Compare commits

...

103 Commits

Author SHA1 Message Date
Timothy 74a283aab6 fix: runtime oauth issues 2026-05-04 20:44:07 -07:00
Timothy 569c715031 feat: allow credential assignment as worker profile 2026-05-04 12:01:43 -07:00
Richard Tang feabf32768 fix: worker context token 2026-05-03 11:45:37 -07:00
Richard Tang eee55ea8c7 chore: fix wrong model name 2026-05-03 11:35:05 -07:00
Richard Tang 78fffa63ec chore: ci and release doc
2026-05-01 18:06:39 -07:00
Richard Tang 9a75d45351 chore: lint 2026-05-01 17:53:44 -07:00
Timothy 3a94f52009 feat: sync tool result contentful display 2026-05-01 17:44:19 -07:00
Timothy 522e0f511e fix: y-axis 2026-05-01 15:48:36 -07:00
Timothy e6310f1243 fix: normalize chart spec in renderer 2026-05-01 15:36:09 -07:00
Richard Tang 12ffacccab feat: tools config frontend grouping and tools cleanup 2026-05-01 15:28:40 -07:00
Timothy 8c36b1575c Merge branch 'feature/merge-to-file-ops' into feat/file-ops 2026-05-01 14:57:21 -07:00
Timothy 6540f7b31e feat: pura linea 2026-05-01 14:57:06 -07:00
Richard Tang a09eac06f1 feat: improve web search and consolidate browser open 2026-05-01 14:55:20 -07:00
Richard Tang b939a875a7 refactor: update autocompaction tools and concurrency tools 2026-05-01 14:27:38 -07:00
Richard Tang b826e70d8c feat: remove old lifecycle tools 2026-05-01 14:07:34 -07:00
Richard Tang 6f2f037c9c feat: remove other default tools 2026-05-01 13:35:04 -07:00
Richard Tang c147364d8c feat: browser tools audit and improvements 2026-05-01 13:22:31 -07:00
Richard Tang 35bd497750 feat: refactor edit file and update default tools 2026-05-01 12:40:53 -07:00
Richard Tang 574c4bbe33 Merge remote-tracking branch 'origin/feature/sync-20260430' into feat/file-ops 2026-05-01 07:42:20 -07:00
Richard Tang d22a01682a feat: major file ops refactor 2026-05-01 07:41:42 -07:00
Timothy 0c6f0f8aef refactor: rename shell tools to terminal tools 2026-04-30 19:52:34 -07:00
Richard Tang 0e8efa7bcc feat: vision fallback auth 2026-04-30 19:52:12 -07:00
Timothy 7b1dda7bf3 fix: mcp registry initialization 2026-04-30 19:52:04 -07:00
Timothy 725dd1f410 fix: shell split command 2026-04-30 19:52:01 -07:00
Timothy de4b2dc151 chore: give shell tools to queen 2026-04-30 19:51:57 -07:00
Timothy 0784cea314 fix: initial install 2026-04-30 19:51:55 -07:00
Timothy 20bbf08278 feat: perita manus 2026-04-30 19:51:44 -07:00
Richard Tang f8233bda56 feat: consolidate search and list file tools 2026-04-30 15:43:15 -07:00
Richard Tang 76a7dd4bd5 feat: loosen the max-token cap for vision fallback, as some models spend output on internal thinking 2026-04-30 13:24:49 -07:00
Richard Tang 73511a3c59 feat: vision fallback with intent 2026-04-30 13:02:57 -07:00
Richard Tang a0817fcde4 feat: vision model retry and fallback 2026-04-30 12:38:30 -07:00
Richard Tang 628ce9ca12 feat: use simple snapshot for auto_snapshot_mode 2026-04-30 10:43:14 -07:00
Richard Tang cc4213a942 fix: llm debugger tool call display 2026-04-30 10:31:21 -07:00
Richard Tang d12d5b7e8b fix: llm debugger timeline order 2026-04-30 08:05:38 -07:00
Harshit Shukla 038c5fd807 fix(credentials): align EnvVarStorage exists with load semantics (#5680)
* Return boolean from exists method for credential check

* Add test for empty value handling in EnvVarStorage

Add test to verify exists() and load() consistency for empty values in EnvVarStorage.
2026-04-30 19:40:28 +08:00
Leayx 3d5f2595c9 bug(test_zoho_crm_tool): remove orphan test directory under src (#7142)
Problem
- The Zoho CRM tool was refactored to an MCP-based architecture, making the old in-tree test suite obsolete
- The remaining tests under src were not executed by pytest. Testpaths only includes tools/tests, effectively making them dead code
- A proper MCP test suite already exists under tools/tests, providing coverage

Decision
- Removed the unused test directory under src/aden_tools/tools/zoho_crm_tool/tests
- Aligns project structure with existing tools, where tests are only located in tools/tests
- Avoids confusion and prevents future contributors from relying on outdated or non-executed tests
2026-04-30 18:57:49 +08:00
Hundao 7881177f1f fix: unbreak main CI — skills HIVE_HOME refactor + run_parallel_workers task text (#7149)
* fix(skills): restore module-level path constants for HIVE_HOME refactor

ae2aa30e replaced module-level USER_SKILLS_DIR / INSTALL_NOTICE_SENTINEL
in installer.py and _NOTICE_SENTINEL_PATH / _TRUSTED_REPOS_PATH in
trust.py with lazy helper functions, but left callers and tests still
referencing the original symbols. CI fails with ImportError /
AttributeError.

Restore them as module-level constants computed from HIVE_HOME so the
desktop-shell override still works, callers in cli.py keep importing
the same names, and existing test monkeypatches stay valid.

Refs #7148
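The restored shape can be sketched as follows. The specific paths under HIVE_HOME are illustrative, not the real values in installer.py / trust.py; what matters is that the names stay importable as module-level constants computed from HIVE_HOME, so existing imports and test monkeypatches keep working.

```python
import os
from pathlib import Path

# HIVE_HOME resolution itself is assumed here; a desktop-shell override
# of the env var still takes effect because the constants below are
# computed from it at import time.
HIVE_HOME = Path(os.environ.get("HIVE_HOME", str(Path.home() / ".hive")))

# Module-level constants restored (paths are illustrative):
USER_SKILLS_DIR = HIVE_HOME / "skills"
INSTALL_NOTICE_SENTINEL = USER_SKILLS_DIR / ".install_notice"
```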

* fix(colony): preserve task text in run_parallel_workers spawn data

run_parallel_workers stamps __template_task_id into spec['data'] before
calling spawn_batch. Once that mutation makes spec['data'] non-empty,
colony_runtime.spawn()'s ``input_data or {"task": task}`` fallback no
longer fires and the task description disappears from the worker's
first user message. Workers loop on empty responses and never emit
SUBAGENT_REPORT.

Hoist the ``setdefault("task")`` step out of the template-publish try
block so task text survives even if the template store fails
non-fatally. Inner loop only stamps __template_task_id.

Refs #7148
2026-04-30 18:43:54 +08:00
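The second fix can be sketched as below, with assumed spec shapes (the real spawn path lives in run_parallel_workers / colony_runtime): hoist the `setdefault("task")` out of the template-publish try block so the task text is stamped into `spec["data"]` before anything that can fail.

```python
def prepare_spawn_specs(specs: list[dict], template_store=None) -> list[dict]:
    """Illustrative version of the fix above: guarantee task text survives.

    Stamping task text first means a non-fatal template-store failure can
    no longer leave spec['data'] populated-but-taskless, which is what made
    the `input_data or {"task": task}` fallback stop firing downstream.
    """
    for spec in specs:
        data = spec.setdefault("data", {})
        # Hoisted out of the try block: task text is set unconditionally.
        data.setdefault("task", spec.get("task", ""))
        try:
            if template_store is not None:
                # Inner loop only stamps __template_task_id.
                data["__template_task_id"] = template_store.publish(spec)
        except Exception:
            pass  # non-fatal: the worker still gets its task text
    return specs
```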
Richard Tang 2cfea915f4 chore: ruff format 2026-04-29 19:23:31 -07:00
Richard Tang ac46a1be72 Merge branch 'feat/ask-user-chat-display' 2026-04-29 19:22:51 -07:00
Richard Tang 7b0b472167 chore: lint 2026-04-29 19:16:00 -07:00
Richard Tang 697aae33fe feat: prompts simplification 2026-04-29 19:13:01 -07:00
Richard Tang d26e7f33d2 fix: incubating mode approval guidance injection 2026-04-29 18:43:26 -07:00
Richard Tang 6357597e88 chore: improve llm visibility 2026-04-29 18:37:29 -07:00
Richard Tang 579f1d7512 feat(tasks): refactor task folder 2026-04-29 17:33:34 -07:00
bryan 965264c973 fix: defer ask_user question bubble until user answers 2026-04-29 16:31:19 -07:00
Richard Tang e80d275321 feat(queen): drop redundant _queen_style prompt block 2026-04-29 15:47:18 -07:00
Richard Tang 5b45fac435 feat: prompt improvements 2026-04-29 15:26:58 -07:00
Richard Tang 4794c8b816 chore: log the vision fallback model usage 2026-04-29 13:08:38 -07:00
Timothy 5492366c31 fix: recover frontend 2026-04-29 11:25:29 -07:00
Timothy ae2aa30edf fix: all agent path prefixed by HIVE_HOME 2026-04-28 19:16:35 -07:00
Timothy dd69a53de1 fix: hardcoded hive home 2026-04-28 18:25:21 -07:00
bryan 062a4e3166 feat: new-session navigation with queen warm-up UI 2026-04-28 18:17:25 -07:00
bryan fe9a903928 feat: surface ask_user questions in chat transcript 2026-04-28 18:16:47 -07:00
Timothy 7c3bada70c fix: patch litellm 2026-04-28 18:00:31 -07:00
Timothy 4ef951447d fix: compaction issues 2026-04-28 10:43:54 -07:00
Timothy ccb6556a41 Merge branch 'main' into feature/colony-push 2026-04-27 18:24:41 -07:00
Timothy 5ca5021fc1 feat: colony routes 2026-04-27 18:24:18 -07:00
Richard Tang 9eeba74851 Merge branch 'feat/tasks-system' 2026-04-27 10:55:50 -07:00
Richard Tang facd919371 chore: task list order 2026-04-27 10:55:18 -07:00
Richard Tang cb1484be85 feat: multi task creation 2026-04-27 10:35:02 -07:00
Timothy 82ce6bed68 Merge branch 'feat/api-colonies-import' 2026-04-26 21:13:18 -07:00
Timothy efdb404655 feat: POST /api/colonies/import — onboard a colony from a tarball
Accepts a multipart upload of `tar` / `tar.gz` (any compression that
tarfile.open auto-detects) containing a single top-level directory and
unpacks it into HIVE_HOME/colonies/<name>. Lets a desktop client (or any
external tool) hand a colony spec to a remote runtime to run.

Form fields:
  file              (required)  the archive blob
  name              (optional)  override the colony name; defaults to
                                the archive's top-level dir
  replace_existing  (optional)  "true" to overwrite; else 409 if the
                                target dir already exists

Safety:
- 50 MB upload cap (multipart reader streams + caps each part)
- Manual path-traversal validation per member (Python 3.11 compatible —
  tarfile's safe `filter='data'` only landed in 3.12)
- Symlinks, hardlinks, device, fifo entries all rejected
- Colony name validated against the existing [a-z0-9_]+ pattern used by
  routes_colony_workers + queen_lifecycle_tools
- Mode bits masked to 0o755 / 0o644 so a tampered tar can't ship
  world-writable scripts

Tests cover happy path, name override, 409 / 201 around replace_existing,
path traversal, absolute paths, symlinks, multiple top-level dirs,
invalid colony name, missing file part, corrupt tar, non-multipart, and
uncompressed tar.

Future work (not in this PR): export endpoint, colony list/delete via
this same prefix, and an MCP tool wrapper so queens can move colonies
between hosts mid-conversation.
2026-04-26 20:16:10 -07:00
Richard Tang da361f735d chore: lint 2026-04-26 19:45:52 -07:00
Richard Tang eea0429f93 fix: improve prompt 2026-04-26 19:38:14 -07:00
Richard Tang 833aa4bc7a feat: fix structural blockers preventing the queen from using task_*; also enhanced the hook 2026-04-26 19:15:23 -07:00
Richard Tang 0af597881f feat(tasks): file-backed task system with colony template + UI 2026-04-26 18:49:45 -07:00
RichardTang-Aden 6fae1f04c8 Merge pull request #7143 from aden-hive/fix/scrolling-container
Fix/scrolling container
2026-04-26 12:14:29 -07:00
Richard Tang 8c4085f5e8 chore: lint 2026-04-26 11:35:16 -07:00
Richard Tang 53240eb888 fix: scroll with certain element selector 2026-04-26 11:34:47 -07:00
Hundao de8d6f0946 fix(tests): unblock main CI (#7141)
Two unrelated test failures were keeping main red:

- test_capabilities.py: fixtures referenced deprecated model identifiers
  no longer in model_catalog.json. After the catalog refactor unknown
  models default to vision-capable, so 12 "expect False" assertions
  flipped to True. Replace fixtures with current catalog entries that
  carry an explicit supports_vision flag.

- test_colony_runtime_overseer.py: a 200ms hard sleep racing the
  background worker was flaky on Windows CI. Poll for llm.stream_calls
  with a 5s deadline instead.
2026-04-26 21:34:21 +08:00
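The deflaking pattern in the second fix is worth a sketch: replace a hard sleep with a deadline-bounded poll, so the test passes as soon as the background worker catches up instead of racing a fixed 200 ms. Function name and parameters are illustrative, not the test suite's actual helper.

```python
import time


def wait_for(condition, timeout: float = 5.0, interval: float = 0.05):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Deflaking sketch: slow CI (e.g. Windows) passes as soon as the
    condition holds; fast machines don't pay the full sleep.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")
```

Usage in the spirit of the fix: `wait_for(lambda: llm.stream_calls, timeout=5.0)` instead of `time.sleep(0.2)` followed by an assertion.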
SAURABH KUMAR ea707438f2 feat(tools): add SimilarWeb V5 API integration (#7066)
Adds 29 MCP tools for SimilarWeb V5 covering traffic and engagement,
competitor intelligence, keywords/SERP, audience demographics, and
segment analysis. Includes credential spec, health checker, README,
and tests on ubuntu and windows.

Closes #7022
2026-04-26 20:37:44 +08:00
Richard Tang 445c9600ab chore: release v0.10.5
Cache-aware cost reporting + new frontier models (GPT-5.5, DeepSeek V4
Pro/Flash, GLM-5.1). cache_control now propagates through OpenRouter
sub-providers (anthropic / gemini / glm / minimax) so the static system
prefix actually hits cache, and every response/finish event carries
cost_usd computed from a four-source fallback chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:21:03 -07:00
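The release notes say cost_usd comes from a "four-source fallback chain" without naming the sources. The stand-ins below (provider-reported cost, a library cost calculator, catalog per-token pricing, and an explicit zero default) are assumptions that just show the pattern: take the first source that yields a usable number.

```python
def compute_cost_usd(provider_cost, calculator_cost, pricing, usage) -> float:
    """Illustrative fallback chain for per-event cost_usd (sources assumed)."""
    # Sources 1-2: trust a directly reported cost if it is a positive number.
    for candidate in (provider_cost, calculator_cost):
        if candidate is not None and candidate > 0:
            return float(candidate)
    # Source 3: price it from token counts and catalog per-token rates.
    if pricing and usage:
        return (usage.get("input", 0) * pricing.get("input_per_token", 0.0)
                + usage.get("output", 0) * pricing.get("output_per_token", 0.0))
    # Source 4: explicit zero rather than a missing field, so every
    # response/finish event carries a cost_usd.
    return 0.0
```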
Richard Tang 2ab5e6d784 feat: model support 2026-04-24 20:17:41 -07:00
RichardTang-Aden e7f9b7d791 Merge pull request #7132 from vincentjiang777/feat/colony-session-transfer
feat: redesign configuration UI for sidebar, prompts, skills, and tools
2026-04-24 19:02:03 -07:00
Vincent Jiang 3cb0c69a96 feat: redesign configuration UI — sidebar, prompt library, skills, and tools
- Sidebar: rename Library to Configuration, reorder nav (Credentials 3rd, Configuration 4th), reorder sub-items (Prompts, Skills, Tools)
- Prompt Library: separate My Prompts from Community Prompts into distinct sections
- Skills Configuration: rename page title, sort queens by org chart order, group active/inactive skills, style Upload button as primary
- Tool Configuration: rename page title, sort queens by org chart order, add Save/Cancel/Allow all/Reset to defaults workflow, filter lifecycle tool names to fix "Unknown MCP tool name" save errors
- Fix (unknown) tool group label in server fallback catalog

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 10:41:38 -07:00
Richard Tang 22d75bfb05 chore: lint format 2026-04-24 10:12:06 -07:00
Vincent Jiang 357df1bbcb merge: pull upstream/main into feat/colony-session-transfer 2026-04-24 09:28:46 -07:00
Richard Tang 386bbd5780 feat: persistent cost tracking 2026-04-24 09:19:57 -07:00
Richard Tang 235022b35d feat: support glm 5.1 2026-04-24 07:45:37 -07:00
Richard Tang 4d8f312c3e Merge remote-tracking branch 'origin/feat/cache-token' into feature/vision-subagent 2026-04-23 22:21:45 -07:00
Timothy 4651a6a85a fix: vision caption 2026-04-23 21:30:59 -07:00
Timothy ea9c163438 feat: image vision fallback 2026-04-23 21:24:56 -07:00
Richard Tang 77cc169606 feat: cost tracking 2026-04-23 15:34:07 -07:00
Richard Tang 8c6428f445 feat: token consumption usage 2026-04-23 15:05:30 -07:00
Richard Tang 44cb0c0f4c feat: hybrid compaction buffer (fixed tokens + ratio of context)
The compaction trigger now reserves headroom equal to
compaction_buffer_tokens + compaction_buffer_ratio * max_context_tokens.
The fixed component (default 8k, sized for one max-sized tool result)
gives a floor on small windows; the ratio (default 0.15) keeps the
trigger meaningful on large windows where any constant buffer becomes
a rounding error (8k buffer is 75% on a 32k window but 96% on a 200k
window). Result: ~80% pre-turn trigger on 200k+ windows so the inner
tool loop has room to grow without firing the mid-turn pre-send guard.
2026-04-23 15:04:19 -07:00
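The hybrid rule above reduces to one line of arithmetic. A minimal sketch (function name illustrative; defaults taken from the commit message):

```python
def compaction_threshold(max_context_tokens: int,
                         buffer_tokens: int = 8_000,
                         buffer_ratio: float = 0.15) -> int:
    """Token level at which compaction triggers, per the hybrid rule:
    reserve buffer_tokens + buffer_ratio * max_context_tokens of headroom."""
    headroom = buffer_tokens + int(buffer_ratio * max_context_tokens)
    return max_context_tokens - headroom
```

On a 200k window this puts the trigger at 162k tokens (81% of the window); on a 32k window at 19.2k (60%), versus 75% and 96% with the fixed 8k buffer alone.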
Richard Tang 2621fb88b1 fix: drain bg fork tasks before colony-spawn artifact asserts
Compaction + worker-storage copy moved to a background task in f39c1c87;
the test checked the worker-storage file before the task ran, which flaked
under CI load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:38:21 -07:00
Richard Tang a70f92edbe chore: lint format 2026-04-22 21:33:33 -07:00
Richard Tang b2efa179ea docs: note cache fix in v0.10.4 release notes
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:27:24 -07:00
Richard Tang 8c6e76d052 fix: no cache for queen config 2026-04-22 21:24:00 -07:00
Richard Tang c7f1fbf19f chore: release v0.10.4
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:12:28 -07:00
Richard Tang 7047ecbf46 chore: fixed ci 2026-04-22 20:14:36 -07:00
Richard Tang b96ee5aaab fix: create new session and switch branch 2026-04-22 20:05:21 -07:00
Richard Tang 6744bea01a feat: move date time inject from system prompt 2026-04-22 17:11:34 -07:00
Richard Tang 390038225b feat: static system prompt 2026-04-22 16:54:47 -07:00
Richard Tang b55c8fdf86 fix: validate session creation inputs and tighten skill/reflection edges 2026-04-22 15:08:50 -07:00
Richard Tang e9aea0bbc4 fix: tools and skills registration 2026-04-22 13:54:10 -07:00
Richard Tang 0ba1fa8262 feat: created colony inherit skills and tools 2026-04-21 19:23:33 -07:00
Richard Tang 0fd96d410e feat: configurable default tools and skills 2026-04-21 19:15:40 -07:00
Richard Tang c658a7c50b feat: default skills and tools 2026-04-21 19:15:28 -07:00
Richard Tang 56c3659bda feat: refactor tool config and library menu 2026-04-21 18:57:11 -07:00
Richard Tang 14f927996c feat: skill library 2026-04-21 18:48:22 -07:00
Richard Tang 8a0ec070b8 feat: tool library 2026-04-21 17:20:54 -07:00
Richard Tang 80cd77ac30 chore: release v0.10.3
2026-04-20 19:49:28 -07:00
306 changed files with 36112 additions and 12406 deletions
[file name not captured] -1
@@ -47,7 +47,6 @@
"Bash(grep -v ':0$')",
"Bash(curl -s -m 2 http://127.0.0.1:4002/sse -o /dev/null -w 'status=%{http_code} time=%{time_total}s\\\\n')",
"mcp__gcu-tools__browser_status",
- "mcp__gcu-tools__browser_start",
"mcp__gcu-tools__browser_navigate",
"mcp__gcu-tools__browser_evaluate",
"mcp__gcu-tools__browser_screenshot",
@@ -214,7 +214,7 @@ Curated list of known browser automation edge cases with symptoms, causes, and f
| **Symptom** | `browser_open()` returns `"No group with id: XXXXXXX"` even though `browser_status` shows `running: true` |
| **Root Cause** | In-memory `_contexts` dict has a stale `groupId` from a Chrome tab group that was closed outside the tool (e.g. user closed the tab group) |
| **Detection** | `browser_status` returns `running: true` but `browser_open` fails with "No group with id" |
- | **Fix** | Call `browser_stop()` to clear stale context from `_contexts`, then `browser_start()` again |
+ | **Fix** | Call `browser_stop()` to clear stale context from `_contexts`, then `browser_open(url)` to lazy-create a fresh one |
| **Code** | `tools/lifecycle.py:144-160` - `already_running` check uses cached dict without validating against Chrome |
| **Verified** | 2026-04-03 ✓ |
[file name not captured] +2 -2
@@ -407,7 +407,7 @@ Aden Hive supports **100+ LLM providers** via LiteLLM, giving users maximum flex
| **Anthropic** | Claude 3.5 Sonnet, Haiku, Opus | Default provider, best for reasoning |
| **OpenAI** | GPT-4, GPT-4 Turbo, GPT-4o | Function calling, vision |
| **OpenRouter** | Any OpenRouter catalog model | Uses `OPENROUTER_API_KEY` and `https://openrouter.ai/api/v1` |
- | **Hive LLM** | `queen`, `kimi-2.5`, `GLM-5` | Uses `HIVE_API_KEY` and the Hive-managed endpoint |
+ | **Hive LLM** | `queen`, `kimi-k2.5`, `GLM-5` | Uses `HIVE_API_KEY` and the Hive-managed endpoint |
| **Google** | Gemini 1.5 Pro, Flash | Long context windows |
| **DeepSeek** | DeepSeek V3 | Cost-effective, strong reasoning |
| **Mistral** | Mistral Large, Medium, Small | Open weights, EU hosting |
@@ -435,7 +435,7 @@ DEFAULT_MODEL = "claude-haiku-4-5-20251001"
**Provider-Specific Notes**
- **OpenRouter**: store `provider` as `openrouter`, use the raw OpenRouter model ID in `model` (for example `x-ai/grok-4.20-beta`), and use `OPENROUTER_API_KEY`
- - **Hive LLM**: store `provider` as `hive`, use Hive model names such as `queen`, `kimi-2.5`, or `GLM-5`, and use `HIVE_API_KEY`
+ - **Hive LLM**: store `provider` as `hive`, use Hive model names such as `queen`, `kimi-k2.5`, or `GLM-5`, and use `HIVE_API_KEY`
**For Development**
- Use cheaper/faster models (Haiku, GPT-4o-mini)
[file name not captured] +3 -4
@@ -72,17 +72,16 @@ Register an MCP server as a tool source for your agent.
"cwd": "../tools",
"description": "Aden tools..."
},
- "tools_discovered": 6,
+ "tools_discovered": 5,
"tools": [
"web_search",
"web_scrape",
"file_read",
"file_write",
- "pdf_read",
- "example_tool"
+ "pdf_read"
],
"total_mcp_servers": 1,
- "note": "MCP server 'tools' registered with 6 tools. These tools can now be used in event_loop nodes."
+ "note": "MCP server 'tools' registered with 5 tools. These tools can now be used in event_loop nodes."
}
```
[file name not captured] +3 -3
@@ -1,6 +1,6 @@
# MCP Server Guide - Agent Building Tools
- > **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_and_build_agent` tool, with underlying logic in `tools/coder_tools_server.py`.
+ > **Note:** This document is stale. The previous `coder-tools` MCP server has been replaced by `files-tools` (`tools/files_server.py`), which only exposes file I/O (`read_file`, `write_file`, `edit_file`, `hashline_edit`, `search_files`). The agent-building, shell, and snapshot tools that used to live here have been removed.
This guide covers the MCP tools available for building goal-driven agents.
@@ -20,9 +20,9 @@ Add to your MCP client configuration (e.g., Claude Desktop):
```json
{
"mcpServers": {
- "coder-tools": {
+ "files-tools": {
"command": "uv",
- "args": ["run", "coder_tools_server.py", "--stdio"],
+ "args": ["run", "files_server.py", "--stdio"],
"cwd": "/path/to/hive/tools"
}
}
[file name not captured] -2
@@ -19,8 +19,6 @@ uv pip install -e .
## Agent Building
Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_and_build_agent` tool and related utilities. The package generation logic lives directly in `tools/coder_tools_server.py`.
See the [Getting Started Guide](../docs/getting-started.md) for building agents.
## Quick Start
[file name not captured] +289 -53
@@ -14,7 +14,6 @@ from __future__ import annotations
import asyncio
import json
import logging
- import os
import re
import time
import uuid
@@ -85,7 +84,12 @@ from framework.agent_loop.internals.types import (
JudgeVerdict,
TriggerEvent,
)
from framework.agent_loop.internals.vision_fallback import (
caption_tool_image,
extract_intent_for_tool,
)
from framework.agent_loop.types import AgentContext, AgentProtocol, AgentResult
from framework.config import get_vision_fallback_model
from framework.host.event_bus import EventBus
from framework.llm.capabilities import filter_tools_for_model, supports_image_tool_results
from framework.llm.provider import Tool, ToolResult, ToolUse
@@ -177,46 +181,58 @@ def _strip_internal_tags_from_snapshot(snapshot: str) -> str:
return cleaned
async def _describe_images_as_text(image_content: list[dict[str, Any]]) -> str | None:
"""Describe images using the best available vision model."""
import litellm
def _vision_fallback_active(model: str | None) -> bool:
"""Return True if tool-result images for *model* should be routed
through the vision-fallback chain rather than sent to the model.
blocks: list[dict[str, Any]] = [
{
"type": "text",
"text": (
"Describe the following image(s) concisely but with enough detail "
"that a text-only AI assistant can understand the content and context."
),
}
]
blocks.extend(image_content)
Trigger: the model's catalog entry has ``supports_vision: false``
(resolved via :func:`capabilities.supports_image_tool_results`,
which reads ``model_catalog.json``). Unknown models default to
vision-capable, so the fallback only fires when the catalog
explicitly says the model is text-only.
candidates: list[str] = []
if os.environ.get("OPENAI_API_KEY"):
candidates.append("gpt-4o-mini")
if os.environ.get("ANTHROPIC_API_KEY"):
candidates.append("claude-3-haiku-20240307")
if os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY"):
candidates.append("gemini/gemini-1.5-flash")
The ``vision_fallback`` config block is the *substitution* model
it doesn't widen the trigger. To force fallback for a model that
isn't catalogued yet, add an entry to ``model_catalog.json`` with
``supports_vision: false`` rather than relying on a runtime config.
"""
if not model:
return False
return not supports_image_tool_results(model)
for model in candidates:
try:
response = await litellm.acompletion(
model=model,
messages=[{"role": "user", "content": blocks}],
max_tokens=512,
)
description = (response.choices[0].message.content or "").strip()
if description:
count = len(image_content)
label = "image" if count == 1 else f"{count} images"
return f"[{label} attached — description: {description}]"
except Exception as exc:
logger.debug("Vision fallback model '%s' failed: %s", model, exc)
continue
return None
async def _captioning_chain(
intent: str,
image_content: list[dict[str, Any]],
) -> tuple[str, str] | None:
"""Configured vision_fallback → retry → ``gemini/gemini-3-flash-preview``.
The Gemini override reuses the configured ``api_key`` / ``api_base``,
so a Hive subscriber (whose token routes to a multi-model proxy)
keeps coverage when their primary model glitches. Without
configured creds litellm falls through to env-based Gemini auth;
users with neither Hive nor a ``GEMINI_API_KEY`` simply lose the
third try.
"""
if result := await caption_tool_image(intent, image_content):
return result
logger.warning("vision_fallback failed; retrying configured model")
if result := await caption_tool_image(intent, image_content):
return result
# Match the configured model's proxy prefix so the override is routed
# through the same endpoint with the same auth shape. Without this,
# a Hive subscriber's `hive/...` config would override to
# `gemini/...` — which sends Google's Gemini protocol to the
# Anthropic-compatible Hive proxy (404), not what we want.
configured = (get_vision_fallback_model() or "").lower()
if configured.startswith("hive/"):
override = "hive/gemini-3-flash-preview"
elif configured.startswith("kimi/"):
override = "kimi/gemini-3-flash-preview"
else:
override = "gemini/gemini-3-flash-preview"
logger.warning("vision_fallback retry failed; trying %s", override)
return await caption_tool_image(intent, image_content, model_override=override)
# Pattern for detecting context-window-exceeded errors across LLM providers.
@@ -376,6 +392,14 @@ class AgentLoop(AgentProtocol):
# dashboards can build aggregates over many runs.
self._counters: dict[str, int] = {}
# Task-system reminder state (see framework/tasks/reminders.py).
# Bumped each iteration; reset whenever a task op tool was called
# in the iteration that just completed; nudges the agent via the
# injection queue when it's been silent on tasks for too long.
from framework.tasks.reminders import ReminderState as _RS
self._task_reminder_state: _RS = _RS()
def _bump(self, key: str, by: int = 1) -> None:
"""Increment a reliability counter (creates the key on first use)."""
self._counters[key] = self._counters.get(key, 0) + by
@@ -575,6 +599,7 @@ class AgentLoop(AgentProtocol):
store=self._conversation_store,
run_id=ctx.effective_run_id,
compaction_buffer_tokens=self._config.compaction_buffer_tokens,
compaction_buffer_ratio=self._config.compaction_buffer_ratio,
compaction_warning_buffer_tokens=(self._config.compaction_warning_buffer_tokens),
)
accumulator = OutputAccumulator(
@@ -587,7 +612,12 @@ class AgentLoop(AgentProtocol):
initial_message = self._build_initial_message(ctx)
if initial_message:
await conversation.add_user_message(initial_message)
# Stamp with arrival time so the conversation has a
# temporal anchor for the first turn, matching the
# stamping done by drain_injection_queue for every
# subsequent event.
_stamp = datetime.now().astimezone().strftime("%Y-%m-%d %H:%M %Z")
await conversation.add_user_message(f"[{_stamp}] {initial_message}")
await self._run_hooks("session_start", conversation, trigger=initial_message)
@@ -599,7 +629,8 @@ class AgentLoop(AgentProtocol):
initial_message = self._build_initial_message(ctx)
if not initial_message:
initial_message = "Hello"
await conversation.add_user_message(initial_message)
_stamp = datetime.now().astimezone().strftime("%Y-%m-%d %H:%M %Z")
await conversation.add_user_message(f"[{_stamp}] {initial_message}")
# 2b. Restore spill counter from existing files (resume safety)
self._restore_spill_counter()
@@ -619,8 +650,23 @@ class AgentLoop(AgentProtocol):
# Hide image-producing tools from text-only models so they never try
# to call them. Avoids wasted turns + "screenshot failed" lessons
# getting saved to memory. See framework.llm.capabilities.
# EXCEPTION: when the model IS on the text-only deny list AND
# a vision_fallback subagent is configured, leave image tools
# visible. The post-execution hook in the inner tool loop
# will route each image_content through the fallback VLM and
# replace it with a text caption before the main agent sees
# the result — so the main agent gets captions instead of
# raw images, rather than losing the tool entirely. We DON'T
# bypass the filter for vision-capable models (that would be
# a no-op anyway — the filter doesn't fire for them) and we
# DON'T bypass it without a configured fallback (the agent
# would just see raw stripped tool results with no caption).
_llm_model = ctx.llm.model if ctx.llm else ""
tools, _hidden_image_tools = filter_tools_for_model(tools, _llm_model)
_text_only_main = _llm_model and not supports_image_tool_results(_llm_model)
if _text_only_main and get_vision_fallback_model() is not None:
_hidden_image_tools: list[str] = []
else:
tools, _hidden_image_tools = filter_tools_for_model(tools, _llm_model)
logger.info(
"[%s] Tools available (%d): %s | direct_user_io=%s | judge=%s | hidden_image_tools=%s",
@@ -793,14 +839,56 @@ class AgentLoop(AgentProtocol):
tools.extend(synthetic)
# 6b3. Dynamic prompt refresh (phase switching / memory refresh)
if ctx.dynamic_prompt_provider is not None or ctx.dynamic_memory_provider is not None:
if (
ctx.dynamic_prompt_provider is not None
or ctx.dynamic_memory_provider is not None
or ctx.dynamic_skills_catalog_provider is not None
):
if ctx.dynamic_prompt_provider is not None:
_new_prompt = stamp_prompt_datetime(ctx.dynamic_prompt_provider())
_new_prompt = ctx.dynamic_prompt_provider()
# When a suffix provider is also wired (Queen's
# static/dynamic split), keep the two pieces separate
# so the LLM wrapper can emit them as two system
# content blocks with a cache breakpoint between them.
# The timestamp used to be stamped here via
# stamp_prompt_datetime on every iteration — it now
# lives inside the frozen dynamic suffix and is only
# refreshed at user-turn boundaries, so per-iteration
# stamping would both double-stamp and bust the cache.
_new_suffix: str | None = None
if ctx.dynamic_prompt_suffix_provider is not None:
try:
_new_suffix = ctx.dynamic_prompt_suffix_provider() or ""
except Exception:
logger.debug(
"[%s] dynamic_prompt_suffix_provider raised — falling back to legacy stamp",
node_id,
exc_info=True,
)
_new_suffix = None
if _new_suffix is None:
# Legacy / fallback path: no split in use (or the
# suffix provider raised). Stamp the timestamp at
# the end of the single-string prompt so the model
# still sees a current "now".
_new_prompt = stamp_prompt_datetime(_new_prompt)
else:
# build_system_prompt_for_context reads dynamic_skills_catalog_provider
# directly; no separate branch needed.
_new_prompt = build_system_prompt_for_context(ctx)
if _new_prompt != conversation.system_prompt:
conversation.update_system_prompt(_new_prompt)
logger.info("[%s] Dynamic prompt updated", node_id)
_new_suffix = None
if _new_suffix is not None:
_combined_for_compare = f"{_new_prompt}\n\n{_new_suffix}" if _new_suffix else _new_prompt
if (
_combined_for_compare != conversation.system_prompt
or _new_suffix != conversation.system_prompt_dynamic_suffix
):
conversation.update_system_prompt(_new_prompt, dynamic_suffix=_new_suffix)
logger.info("[%s] Dynamic prompt updated (split)", node_id)
else:
if _new_prompt != conversation.system_prompt:
conversation.update_system_prompt(_new_prompt)
logger.info("[%s] Dynamic prompt updated", node_id)
# 6c. Publish iteration event (with per-iteration metadata when available)
_iter_meta = None
@@ -882,6 +970,17 @@ class AgentLoop(AgentProtocol):
)
total_input_tokens += turn_tokens.get("input", 0)
total_output_tokens += turn_tokens.get("output", 0)
# Task-system reminder: if the model has been silent on
# task ops for too long but still has open tasks, drop
# a steering reminder onto the injection queue. Drained
# at the next iteration's 6b so it lands as the next
# user turn via the normal injection path. Best-effort
# — never raises.
try:
await self._maybe_inject_task_reminder(ctx, logged_tool_calls)
except Exception:
logger.debug("task reminder check failed", exc_info=True)
await self._publish_llm_turn_complete(
stream_id,
node_id,
@@ -890,6 +989,8 @@ class AgentLoop(AgentProtocol):
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
cached_tokens=turn_tokens.get("cached", 0),
cache_creation_tokens=turn_tokens.get("cache_creation", 0),
cost_usd=float(turn_tokens.get("cost", 0.0) or 0.0),
execution_id=execution_id,
iteration=iteration,
)
@@ -904,6 +1005,7 @@ class AgentLoop(AgentProtocol):
tool_calls=logged_tool_calls,
tool_results=real_tool_results,
token_counts=turn_tokens,
tools=tools,
)
# DS-13: inject context preservation warning once when token usage
@@ -2290,7 +2392,9 @@ class AgentLoop(AgentProtocol):
stream_id = ctx.stream_id or ctx.agent_id
node_id = ctx.agent_id
execution_id = ctx.execution_id or ""
token_counts: dict[str, int] = {"input": 0, "output": 0, "cached": 0}
# Mixed-type dict: int token counts + str stop_reason/model + float cost.
# Typed loosely to avoid churn in the many call sites that read from it.
token_counts: dict[str, Any] = {"input": 0, "output": 0, "cached": 0, "cache_creation": 0, "cost": 0.0}
tool_call_count = 0
final_text = ""
final_system_prompt = conversation.system_prompt
@@ -2431,9 +2535,16 @@ class AgentLoop(AgentProtocol):
nonlocal _first_event_at
_clean_snapshot = "" # visible-only text for the frontend
# Split-prompt path: pass STATIC and DYNAMIC tail separately
# so the LLM wrapper can emit them as two Anthropic system
# content blocks with a cache breakpoint between them. When
# no split is in use, ``system_prompt_static`` equals the
# full prompt and the suffix is empty — identical to the
# legacy single-block request.
async for event in ctx.llm.stream(
messages=_msgs,
system=conversation.system_prompt,
system=conversation.system_prompt_static,
system_dynamic_suffix=(conversation.system_prompt_dynamic_suffix or None),
tools=tools if tools else None,
max_tokens=ctx.max_tokens,
):
@@ -2514,6 +2625,8 @@ class AgentLoop(AgentProtocol):
token_counts["input"] += event.input_tokens
token_counts["output"] += event.output_tokens
token_counts["cached"] += event.cached_tokens
token_counts["cache_creation"] += event.cache_creation_tokens
token_counts["cost"] = token_counts.get("cost", 0.0) + event.cost_usd
token_counts["stop_reason"] = event.stop_reason
token_counts["model"] = event.model
@@ -3306,6 +3419,30 @@ class AgentLoop(AgentProtocol):
# Phase 3: record results into conversation in original order,
# build logged/real lists, and publish completed events.
#
# Vision-fallback prefetch: a single turn may fire several
# image-producing tools in parallel (e.g. one screenshot
# per tab). Captioning each one takes a vision LLM round
trip (1–30 s). Doing them sequentially in this loop
# would serialise that latency per image. Instead, kick
# off all caption tasks concurrently NOW, and await each
# one just-in-time inside the per-tc body. If only a
# single image needs captioning, this collapses to a
# single await with no overhead.
_model_text_only = ctx.llm and _vision_fallback_active(ctx.llm.model)
caption_tasks: dict[str, asyncio.Task[tuple[str, str] | None]] = {}
if _model_text_only:
for tc in tool_calls[:executed_in_batch]:
res = results_by_id.get(tc.tool_use_id)
if not res or not res.image_content:
continue
intent = extract_intent_for_tool(
conversation,
tc.tool_name,
tc.tool_input or {},
)
caption_tasks[tc.tool_use_id] = asyncio.create_task(_captioning_chain(intent, res.image_content))
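The prefetch-then-await pattern above can be sketched in isolation (hypothetical names; the real `_captioning_chain` and result bookkeeping live elsewhere in this diff):

```python
import asyncio

async def caption(image_id: str) -> str:
    # Stand-in for one vision-LLM round trip; the sleep models its latency.
    await asyncio.sleep(0.01)
    return f"caption for {image_id}"

async def process(images: list[str]) -> list[str]:
    # Phase 1: kick off ALL caption tasks now so their latencies overlap.
    tasks = {img: asyncio.create_task(caption(img)) for img in images}
    results: list[str] = []
    # Phase 2: await each just-in-time while handling results in order.
    # With one image this collapses to a single await with no overhead.
    for img in images:
        results.append(await tasks.pop(img))
    return results

captions = asyncio.run(process(["tab1", "tab2", "tab3"]))
```

Total wall time is roughly one round trip rather than one per image, because every task is already in flight before the sequential loop starts awaiting.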
for tc in tool_calls[:executed_in_batch]:
result = results_by_id.get(tc.tool_use_id)
if result is None:
@@ -3328,11 +3465,33 @@ class AgentLoop(AgentProtocol):
logged_tool_calls.append(tool_entry)
image_content = result.image_content
if image_content and ctx.llm and not supports_image_tool_results(ctx.llm.model):
logger.info(
"Stripping image_content from tool result; model '%s' does not support images in tool results",
ctx.llm.model,
)
# Vision-fallback marker spliced into the persisted text
# below. None when no captioning ran (vision-capable
# main model, no images, or no fallback chain reached
# this tool).
vision_fallback_marker: str | None = None
if image_content and tc.tool_use_id in caption_tasks:
caption_result = await caption_tasks.pop(tc.tool_use_id)
if caption_result:
caption, vision_model = caption_result
vision_fallback_marker = f"[vision-fallback caption]\n{caption}"
logger.info(
"vision_fallback: captioned %d image(s) for tool '%s' "
"(main model '%s' routed through fallback model '%s')",
len(image_content),
tc.tool_name,
ctx.llm.model if ctx.llm else "?",
vision_model,
)
else:
vision_fallback_marker = "[image stripped — vision fallback exhausted]"
logger.info(
"vision_fallback: exhausted; stripping %d image(s) from "
"tool '%s' result without caption (model '%s')",
len(image_content),
tc.tool_name,
ctx.llm.model if ctx.llm else "?",
)
image_content = None
# Apply replay-detector steer prefix if this call matched a
@@ -3344,6 +3503,11 @@ class AgentLoop(AgentProtocol):
if _prefix:
stored_content = f"{_prefix}{stored_content or ''}"
# Splice the vision-fallback caption / placeholder into
# the persisted text after any prefix has been applied.
if vision_fallback_marker:
stored_content = f"{stored_content or ''}\n\n{vision_fallback_marker}"
await conversation.add_tool_result(
tool_use_id=tc.tool_use_id,
content=stored_content,
@@ -3970,7 +4134,7 @@ class AgentLoop(AgentProtocol):
queue=self._injection_queue,
conversation=conversation,
ctx=ctx,
describe_images_as_text_fn=_describe_images_as_text,
caption_image_fn=_captioning_chain,
)
async def _drain_trigger_queue(self, conversation: NodeConversation) -> int:
@@ -4034,6 +4198,74 @@ class AgentLoop(AgentProtocol):
execution_id=execution_id,
)
async def _maybe_inject_task_reminder(
self,
ctx: AgentContext,
logged_tool_calls: list[dict[str, Any]] | None,
) -> None:
"""Layer 3 task-system steering — periodic reminder injection.
Called once per iteration after the LLM turn completes. If the
model has been silent on task ops for a while AND there are open
tasks on its session list, queue a system-style reminder onto
the injection queue so the next iteration drains it as a user
turn. Idempotent and safe to call every iteration; all gating happens internally.
``logged_tool_calls`` is a list of dicts with at least a "name"
key, as accumulated by ``_run_single_turn``. Names like
``task_create``, ``task_update``, ``colony_template_*`` reset
the counter (see ``framework.tasks.reminders.TASK_OP_TOOL_NAMES``).
"""
from framework.tasks import get_task_store
from framework.tasks.models import TaskStatus
from framework.tasks.reminders import build_reminder, saw_task_op
state = self._task_reminder_state
# 1. Update counters based on this turn's tool calls.
names: list[str] = []
for call in logged_tool_calls or []:
try:
name = call.get("name") or call.get("tool_name")
if name:
names.append(name)
except (AttributeError, TypeError):
continue
if saw_task_op(names):
state.on_task_op()
state.on_iteration()
# 2. Resolve the agent's task list. Skip if context isn't wired yet.
list_id = getattr(ctx, "task_list_id", None)
if not list_id:
return
# 3. Read the open-task snapshot. Best-effort.
try:
store = get_task_store()
records = await store.list_tasks(list_id)
except Exception:
return
open_tasks = [r for r in records if r.status != TaskStatus.COMPLETED]
if not state.should_remind(bool(open_tasks)):
return
body = build_reminder(records)
if not body:
return
# 4. Enqueue. Drained at the next iteration's 6b drain step and
# rendered as a user turn (with the "[External event]" prefix).
await self._injection_queue.put((body, False, None))
state.on_reminder_sent()
logger.info(
"[task-reminder] queued nudge for %s (open=%d, silent_turns=%d)",
list_id,
len(open_tasks),
state.turns_since_task_op,
)
self._bump("task_reminders_sent")
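The counter-based gating this method leans on can be sketched as follows. This is a hypothetical miniature: the real `TaskReminderState` lives in `framework.tasks.reminders`, is not shown in this diff, and its threshold and backoff behavior may differ.

```python
class ReminderState:
    """Tracks how many turns the model has gone without a task-op tool call."""

    def __init__(self, silence_threshold: int = 5) -> None:
        self.turns_since_task_op = 0
        self.silence_threshold = silence_threshold

    def on_task_op(self) -> None:
        # Model touched the task system this turn; reset the silence counter.
        self.turns_since_task_op = 0

    def on_iteration(self) -> None:
        # One more completed turn (silent unless on_task_op fired first).
        self.turns_since_task_op += 1

    def should_remind(self, has_open_tasks: bool) -> bool:
        # Only nudge when tasks are open AND silence crossed the threshold.
        return has_open_tasks and self.turns_since_task_op >= self.silence_threshold

    def on_reminder_sent(self) -> None:
        # Back off after a nudge so we don't spam every turn.
        self.turns_since_task_op = 0

state = ReminderState(silence_threshold=3)
for _ in range(3):
    state.on_iteration()
fires = state.should_remind(True)
```

Under this sketch, a reminder fires only after three silent turns with open tasks, and sending one resets the clock.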
async def _run_hooks(
self,
event: str,
@@ -4095,6 +4327,8 @@ class AgentLoop(AgentProtocol):
input_tokens: int,
output_tokens: int,
cached_tokens: int = 0,
cache_creation_tokens: int = 0,
cost_usd: float = 0.0,
execution_id: str = "",
iteration: int | None = None,
) -> None:
@@ -4107,6 +4341,8 @@ class AgentLoop(AgentProtocol):
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
cache_creation_tokens=cache_creation_tokens,
cost_usd=cost_usd,
execution_id=execution_id,
iteration=iteration,
)
@@ -427,9 +427,20 @@ class NodeConversation:
store: ConversationStore | None = None,
run_id: str | None = None,
compaction_buffer_tokens: int | None = None,
compaction_buffer_ratio: float | None = None,
compaction_warning_buffer_tokens: int | None = None,
) -> None:
self._system_prompt = system_prompt
# Optional split: when a caller updates the prompt with a
# ``dynamic_suffix`` argument, we remember the static prefix and
# suffix separately so the LLM wrapper can emit them as two
# Anthropic system content blocks with a cache breakpoint between
# them. ``_system_prompt`` stays as the concatenated form used for
# persistence and for the legacy single-block LLM path.
# On restore, these default to the concat/empty pair — the next
# AgentLoop iteration's dynamic-prompt refresh step repopulates.
self._system_prompt_static: str = system_prompt
self._system_prompt_dynamic_suffix: str = ""
self._max_context_tokens = max_context_tokens
self._compaction_threshold = compaction_threshold
# Buffer-based compaction trigger (Gap 7). When set, takes
@@ -439,6 +450,11 @@ class NodeConversation:
# limit. If left as None the legacy threshold-based rule is
# used, keeping old call sites behaving identically.
self._compaction_buffer_tokens = compaction_buffer_tokens
# Ratio component of the hybrid buffer. Combines additively with
# _compaction_buffer_tokens so callers can express "reserve N tokens
# plus M% of the window" — the absolute floor matters on tiny
# windows, the ratio matters on large ones.
self._compaction_buffer_ratio = compaction_buffer_ratio
self._compaction_warning_buffer_tokens = compaction_warning_buffer_tokens
self._output_keys = output_keys
self._store = store
@@ -453,15 +469,56 @@ class NodeConversation:
@property
def system_prompt(self) -> str:
"""Full concatenated system prompt (static + dynamic suffix, if any).
This is the canonical form used for persistence and for the legacy
single-block LLM path. Split-prompt callers should read
``system_prompt_static`` and ``system_prompt_dynamic_suffix`` instead.
"""
return self._system_prompt
def update_system_prompt(self, new_prompt: str) -> None:
@property
def system_prompt_static(self) -> str:
"""Static prefix of the system prompt (cache-stable).
Equals ``system_prompt`` when no split is in use. When the AgentLoop
calls ``update_system_prompt(static, dynamic_suffix=...)``, this is
the piece sent as the cache-controlled first block.
"""
return self._system_prompt_static
@property
def system_prompt_dynamic_suffix(self) -> str:
"""Dynamic tail of the system prompt (not cached).
Empty unless the consumer splits its prompt. The LLM wrapper uses a
non-empty suffix to emit a two-block system content list with a
cache breakpoint between the static prefix and this tail.
"""
return self._system_prompt_dynamic_suffix
def update_system_prompt(self, new_prompt: str, dynamic_suffix: str | None = None) -> None:
"""Update the system prompt.
Used in continuous conversation mode at phase transitions to swap
Layer 3 (focus) while preserving the conversation history.
When ``dynamic_suffix`` is provided, ``new_prompt`` is interpreted as
the STATIC prefix and ``dynamic_suffix`` as the per-turn tail; they
travel to the LLM as two separate cache-controlled blocks but are
persisted as a single concatenated string for backward-compat
restore. ``new_prompt`` alone (suffix left None) keeps the legacy
single-string behavior.
"""
self._system_prompt = new_prompt
if dynamic_suffix is None:
# Legacy single-string path — static == full, no suffix split.
self._system_prompt = new_prompt
self._system_prompt_static = new_prompt
self._system_prompt_dynamic_suffix = ""
else:
self._system_prompt_static = new_prompt
self._system_prompt_dynamic_suffix = dynamic_suffix
self._system_prompt = f"{new_prompt}\n\n{dynamic_suffix}" if dynamic_suffix else new_prompt
self._meta_persisted = False # re-persist with new prompt
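Downstream, the LLM wrapper turns the split into two system content blocks with a cache breakpoint. The wrapper itself is not in this diff, so the block shape below is an assumption modeled on Anthropic's prompt-caching API (`cache_control: {"type": "ephemeral"}` on the static prefix):

```python
def build_system_blocks(static: str, dynamic_suffix: str) -> list[dict]:
    """Sketch of how a wrapper could emit the split prompt as Anthropic-style
    system content blocks with a cache breakpoint after the static prefix."""
    if not dynamic_suffix:
        # No split in use: single block, identical to the legacy request.
        return [{"type": "text", "text": static,
                 "cache_control": {"type": "ephemeral"}}]
    return [
        # Static prefix: byte-stable across turns, marked as the breakpoint
        # so the provider can reuse its cached prefix.
        {"type": "text", "text": static,
         "cache_control": {"type": "ephemeral"}},
        # Dynamic tail: changes per turn, deliberately left uncached.
        {"type": "text", "text": dynamic_suffix},
    ]

blocks = build_system_blocks("STATIC rules", "Current time: 12:00")
```

Keeping the volatile tail out of the cached block is the whole point of the split: a per-turn timestamp in a single-block prompt would invalidate the cache every turn.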
def set_current_phase(self, phase_id: str) -> None:
@@ -847,19 +904,30 @@ class NodeConversation:
"""True when the conversation should be compacted before the
next LLM call.
Buffer-based rule (Gap 7): trigger when the current estimate
plus the configured buffer would exceed the hard context limit.
Prevents compaction from firing only AFTER we're already over
the wire and forced into a reactive binary-split pass.
Hybrid buffer rule: the headroom reserved before compaction fires
is the SUM of an absolute fixed component and a ratio of the hard
context limit:
When no buffer is configured, falls back to the multiplicative
threshold the old callers were built around.
effective_buffer = compaction_buffer_tokens
+ compaction_buffer_ratio * max_context_tokens
The fixed component gives a floor on tiny windows; the ratio
keeps the trigger meaningful on large windows where any constant
buffer becomes a rounding error (an 8k buffer is 75% on a 32k
window but 96% on a 200k window). Compaction fires when the
current estimate would consume more than (limit - effective_buffer).
When neither component is configured, falls back to the legacy
multiplicative threshold so old callers keep behaving identically.
"""
if self._max_context_tokens <= 0:
return False
if self._compaction_buffer_tokens is not None:
budget = self._max_context_tokens - self._compaction_buffer_tokens
return self.estimate_tokens() >= max(0, budget)
fixed = self._compaction_buffer_tokens
ratio = self._compaction_buffer_ratio
if fixed is not None or ratio is not None:
effective_buffer = (fixed or 0) + (ratio or 0.0) * self._max_context_tokens
budget = self._max_context_tokens - effective_buffer
return self.estimate_tokens() >= max(0.0, budget)
return self.estimate_tokens() >= self._max_context_tokens * self._compaction_threshold
def compaction_warning(self) -> bool:
@@ -1516,6 +1584,7 @@ class NodeConversation:
"max_context_tokens": self._max_context_tokens,
"compaction_threshold": self._compaction_threshold,
"compaction_buffer_tokens": self._compaction_buffer_tokens,
"compaction_buffer_ratio": self._compaction_buffer_ratio,
"compaction_warning_buffer_tokens": (self._compaction_warning_buffer_tokens),
"output_keys": self._output_keys,
}
@@ -1565,6 +1634,7 @@ class NodeConversation:
store=store,
run_id=run_id,
compaction_buffer_tokens=meta.get("compaction_buffer_tokens"),
compaction_buffer_ratio=meta.get("compaction_buffer_ratio"),
compaction_warning_buffer_tokens=meta.get("compaction_warning_buffer_tokens"),
)
conv._meta_persisted = True
@@ -16,7 +16,6 @@ import os
import re
import time
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from framework.agent_loop.conversation import Message, NodeConversation
@@ -31,19 +30,38 @@ logger = logging.getLogger(__name__)
LLM_COMPACT_CHAR_LIMIT: int = 240_000
LLM_COMPACT_MAX_DEPTH: int = 10
# Microcompaction: tools whose results can be safely cleared
# Microcompaction: tools whose results can be safely cleared from context
# because the agent can re-derive them on demand. The bar for inclusion is
# "old result has no irreversible value": file content can be re-read, a
# search can be re-run, a screenshot can be re-captured, terminal output can
# be re-fetched, etc. Write / edit results are short confirmations whose
# value is in the side effect, not the message — also fair game.
COMPACTABLE_TOOLS: frozenset[str] = frozenset(
{
# File ops — content lives on disk, re-readable.
"read_file",
"run_command",
"web_search",
"web_fetch",
"grep_search",
"glob_search",
"search_files",
"write_file",
"edit_file",
"pdf_read",
# Terminal — re-runnable; advanced job/output tools produce verbose
# logs whose recent state is what matters.
"terminal_exec",
"terminal_rg",
"terminal_find",
"terminal_output_get",
"terminal_job_logs",
# Web / research — pages and queries can be re-fetched.
"web_scrape",
"search_papers",
"download_paper",
"search_wikipedia",
# Browser read-only inspection — current page state is what matters,
# old snapshots are stale by definition.
"browser_screenshot",
"list_directory",
"browser_snapshot",
"browser_html",
"browser_get_text",
}
)
@@ -657,8 +675,10 @@ def write_compaction_debug_log(
level: str,
inventory: list[dict[str, Any]] | None,
) -> None:
"""Write detailed compaction analysis to ~/.hive/compaction_log/."""
log_dir = Path.home() / ".hive" / "compaction_log"
"""Write detailed compaction analysis to $HIVE_HOME/compaction_log/."""
from framework.config import HIVE_HOME
log_dir = HIVE_HOME / "compaction_log"
log_dir.mkdir(parents=True, exist_ok=True)
ts = datetime.now(UTC).strftime("%Y%m%dT%H%M%S_%f")
@@ -857,7 +877,7 @@ def build_emergency_summary(
if not all_files:
parts.append(
"NOTE: Large tool results may have been saved to files. "
"Use list_directory to check the data directory."
"Use search_files(target='files', path='.') to check the data directory."
)
except Exception:
parts.append("NOTE: Large tool results were saved to files. Use read_file(path='<path>') to read them.")
@@ -12,6 +12,7 @@ import json
import logging
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from datetime import datetime
from typing import Any
from framework.agent_loop.conversation import ConversationStore, NodeConversation
@@ -161,9 +162,18 @@ async def drain_injection_queue(
conversation: NodeConversation,
*,
ctx: NodeContext,
describe_images_as_text_fn: (Callable[[list[dict[str, Any]]], Awaitable[str | None]] | None) = None,
caption_image_fn: (Callable[[str, list[dict[str, Any]]], Awaitable[tuple[str, str] | None]] | None) = None,
) -> int:
"""Drain all pending injected events as user messages. Returns count."""
"""Drain all pending injected events as user messages. Returns count.
``caption_image_fn`` is the unified vision fallback hook. It takes
``(intent, image_content)`` and returns ``(caption, model)`` on
success; the model id is logged so the destination is observable.
The user's typed ``content`` (the injected message body) is passed
as the intent so the captioner can answer the user's specific
question about the image rather than producing a generic
description; an empty content falls back to a generic intent.
"""
count = 0
logger.debug(
"[drain_injection_queue] Starting to drain queue, initial queue size: %s",
@@ -183,23 +193,34 @@ async def drain_injection_queue(
"Model '%s' does not support images; attempting vision fallback",
ctx.llm.model,
)
if describe_images_as_text_fn is not None:
description = await describe_images_as_text_fn(image_content)
if description:
if caption_image_fn is not None:
intent = content or ("Describe these user-injected images for a text-only agent.")
caption_result = await caption_image_fn(intent, image_content)
if caption_result:
description, vision_model = caption_result
content = f"{content}\n\n{description}" if content else description
logger.info("[drain] image described as text via vision fallback")
logger.info(
"[drain] image described as text via vision fallback (model '%s')",
vision_model,
)
else:
logger.info("[drain] no vision fallback available; images dropped")
image_content = None
# Real user input is stored as-is; external events get a prefix
# Stamp every injected event with its arrival time so the model
# has a consistent temporal log to reason over (and so the
# stamp lives inside byte-stable conversation history instead
# of a per-turn system-prompt tail). Minute precision is what
# the queen needs for conversational / scheduling context.
stamp = datetime.now().astimezone().strftime("%Y-%m-%d %H:%M %Z")
if is_client_input:
stamped = f"[{stamp}] {content}" if content else f"[{stamp}]"
await conversation.add_user_message(
content,
stamped,
is_client_input=True,
image_content=image_content,
)
else:
await conversation.add_user_message(f"[External event]: {content}")
await conversation.add_user_message(f"[{stamp}] [External event] {content}")
count += 1
except asyncio.QueueEmpty:
break
@@ -232,7 +253,8 @@ async def drain_trigger_queue(
payload_str = json.dumps(t.payload, default=str)
parts.append(f"[TRIGGER: {t.trigger_type}/{t.source_id}]{task_line}\n{payload_str}")
combined = "\n\n".join(parts)
stamp = datetime.now().astimezone().strftime("%Y-%m-%d %H:%M %Z")
combined = f"[{stamp}]\n" + "\n\n".join(parts)
logger.info("[drain] %d trigger(s): %s", len(triggers), combined[:200])
# Tag the message so the UI can render a banner instead of the raw
# `[TRIGGER: ...]` text. The LLM still sees `combined` verbatim.
@@ -108,6 +108,8 @@ async def publish_llm_turn_complete(
input_tokens: int,
output_tokens: int,
cached_tokens: int = 0,
cache_creation_tokens: int = 0,
cost_usd: float = 0.0,
execution_id: str = "",
iteration: int | None = None,
) -> None:
@@ -120,6 +122,8 @@ async def publish_llm_turn_complete(
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
cache_creation_tokens=cache_creation_tokens,
cost_usd=cost_usd,
execution_id=execution_id,
iteration=iteration,
)
@@ -69,6 +69,20 @@ class LoopConfig:
# and less tight than Anthropic's own counting. Override via
# LoopConfig for larger windows.
compaction_buffer_tokens: int = 8_000
# Ratio-based component of the hybrid compaction buffer. Effective
# headroom reserved before compaction fires is
# compaction_buffer_tokens + compaction_buffer_ratio * max_context_tokens
# The ratio scales with the model's window where the absolute fixed
# component does not (an 8k absolute buffer is 75% trigger on a 32k
# window but 96% on a 200k window). Combining them gives an absolute
# floor sized for the worst-case single tool result (one un-spilled
# max_tool_result_chars payload ≈ 30k chars ≈ 7.5k tokens, rounded to
# 8k) plus a fractional headroom that keeps the trigger meaningful on
# large windows, so the inner tool loop always has room to grow
# without tripping the mid-turn pre-send guard. Defaults: 8k + 15%.
# On 32k that's a 12.8k buffer (~60% trigger); on 200k it's 38k
# (~81% trigger); on 1M it's 158k (~84% trigger).
compaction_buffer_ratio: float = 0.15
# Warning is emitted one buffer earlier so the user/telemetry gets
# a "we're close" signal without triggering a compaction pass.
compaction_warning_buffer_tokens: int = 12_000
@@ -0,0 +1,306 @@
"""Vision-fallback subagent for tool-result images on text-only LLMs.
When a tool returns image content but the main agent's model can't
accept image blocks (i.e. its catalog entry has ``supports_vision: false``),
the framework strips the images before they ever reach the LLM. Without
this module, the agent then sees only the tool's text envelope (URL,
dimensions, size) and is blind to whatever the image actually shows.
This module provides:
* ``caption_tool_image()``: a direct LiteLLM call to a configured
vision model (``vision_fallback`` block in ``~/.hive/configuration.json``)
that takes the agent's intent + the image(s) and returns a textual
description tailored to that intent.
* ``extract_intent_for_tool()``: pulls the most recent assistant text
+ the tool call descriptor and concatenate them into a 2KB intent
string the vision subagent can reason against.
Both helpers degrade silently, returning ``None`` / a placeholder
rather than raising, so a vision-fallback failure can never kill the main
agent's run. The agent-loop call site retries the configured model
once on a None return, then falls back to
``gemini/gemini-3-flash-preview`` via the ``model_override`` parameter
of :func:`caption_tool_image`.
"""
from __future__ import annotations
import json
import logging
from datetime import datetime
from typing import TYPE_CHECKING, Any
from framework.config import (
get_vision_fallback_api_base,
get_vision_fallback_api_key,
get_vision_fallback_model,
)
if TYPE_CHECKING:
from ..conversation import NodeConversation
logger = logging.getLogger(__name__)
# Hard cap on the intent string handed to the vision subagent. The
# subagent only needs the agent's recent reasoning + the tool descriptor;
# anything longer is wasted tokens (and risks pushing past the vision
# model's context with the image attached).
_INTENT_MAX_CHARS = 4096
# Cap on the tool args JSON snippet inside the intent. Some tool inputs
# (large strings, file contents) would dominate the intent if uncapped.
_TOOL_ARGS_MAX_CHARS = 4096
# Subagent system prompt — kept short so it fits within any provider's
# system-prompt budget alongside the user message + image. Tells the
# subagent its role and constrains output format.
#
# Coordinate labeling: the main agent's browser tools
# (browser_click_coordinate / browser_hover_coordinate / browser_press_at)
# accept VIEWPORT FRACTIONS (x, y) in [0..1] where (0,0) is the top-left
# and (1,1) is the bottom-right of the screenshot. Without coordinates
# the text-only agent has no way to act on what we describe — it can
# read the caption but cannot point. So for every interactive element
# we name (button, link, input, icon, tab, menu item, dialog control),
# include its approximate viewport-fraction centre as ``(fx, fy)``
# right after the element's name, e.g. ``"Submit" button (0.83, 0.92)``.
# Three rules: (1) coordinates only for things plausibly clickable /
# hoverable / typeable — don't tag pure body text or decorative
# graphics. (2) Eyeball to two decimal places; precision beyond that
# is false confidence. (3) Never invent — if an element is partly
# off-screen or you can't locate it, omit the coordinate rather than
# guessing.
_VISION_SUBAGENT_SYSTEM = (
"You are a vision subagent for a text-only main agent. The main "
"agent invoked a tool that returned the image(s) attached. Their "
"intent (their reasoning + the tool call) is below. Describe what "
"the image shows in service of their intent — concrete, factual, "
"no speculation. If their intent asks a yes/no question, answer it "
"directly first.\n\n"
"Coordinate labeling: the main agent uses fractional viewport "
"coordinates (x, y) in [0..1] — (0, 0) is the top-left of the "
"image, (1, 1) is the bottom-right — to drive its click / hover / "
"key-press tools. For every interactive element you mention "
"(button, link, input, checkbox, radio, dropdown, tab, menu item, "
"dialog control, icon), append its approximate centre as "
"``(fx, fy)`` immediately after the element's name or label, e.g. "
'``"Submit" button (0.83, 0.92)`` or ``profile avatar icon '
"(0.05, 0.07)``. Use two decimal places — more is false precision. "
"Skip coordinates for pure body text and decorative elements that "
"aren't clickable. If an element is partially off-screen or you "
"cannot reliably locate its centre, omit the coordinate rather "
"than guessing.\n\n"
"Output plain text, no markdown, ≤ 600 words."
)
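Acting on a caption's fractional coordinates means mapping them back onto the viewport. The conversion is generic; the tool names above (`browser_click_coordinate` etc.) consume the fractions directly, so this helper is purely illustrative:

```python
def fraction_to_pixels(
    fx: float, fy: float, viewport_w: int, viewport_h: int
) -> tuple[int, int]:
    """Map viewport-fraction coords in [0..1] — (0, 0) top-left,
    (1, 1) bottom-right — to integer pixel coordinates."""
    if not (0.0 <= fx <= 1.0 and 0.0 <= fy <= 1.0):
        raise ValueError("viewport fractions must be in [0..1]")
    return round(fx * viewport_w), round(fy * viewport_h)

# '"Submit" button (0.83, 0.92)' from a caption, on a 1280x800 viewport:
px = fraction_to_pixels(0.83, 0.92, 1280, 800)
```

Two decimal places of fraction resolve to roughly a dozen pixels on a typical viewport, which is why the prompt calls anything finer false precision.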
def extract_intent_for_tool(
conversation: NodeConversation,
tool_name: str,
tool_args: dict[str, Any] | None,
) -> str:
"""Build the intent string passed to the vision subagent.
Combines the most recent assistant text (the LLM's reasoning right
before invoking the tool) with a structured tool-call descriptor.
Truncates to ``_INTENT_MAX_CHARS`` total, favouring the head of the
assistant text where goal-stating sentences usually live.
If no preceding assistant text exists (rare first turn), falls
back to ``"<no preceding reasoning>"`` so the subagent still gets
the tool descriptor.
"""
args_json: str
try:
args_json = json.dumps(tool_args or {}, default=str)
except Exception:
args_json = repr(tool_args)
if len(args_json) > _TOOL_ARGS_MAX_CHARS:
args_json = args_json[:_TOOL_ARGS_MAX_CHARS] + "…"
tool_line = f"Called: {tool_name}({args_json})"
# Walk newest → oldest, take the first assistant message with text.
assistant_text = ""
try:
messages = getattr(conversation, "_messages", []) or []
for msg in reversed(messages):
if getattr(msg, "role", None) != "assistant":
continue
content = getattr(msg, "content", "") or ""
if isinstance(content, str) and content.strip():
assistant_text = content.strip()
break
except Exception:
# Defensive — the agent loop must keep running even if the
# conversation structure changes shape.
assistant_text = ""
if not assistant_text:
assistant_text = "<no preceding reasoning>"
# Intent = tool descriptor (always intact) + reasoning (truncated).
head = f"{tool_line}\n\nReasoning before call:\n"
budget = _INTENT_MAX_CHARS - len(head)
if budget < 100:
# Tool descriptor is huge somehow — truncate it.
return head[:_INTENT_MAX_CHARS]
if len(assistant_text) > budget:
assistant_text = assistant_text[: budget - 1] + "…"
return head + assistant_text
async def caption_tool_image(
intent: str,
image_content: list[dict[str, Any]],
*,
timeout_s: float = 30.0,
model_override: str | None = None,
) -> tuple[str, str] | None:
"""Caption the given images using the configured ``vision_fallback`` model.
Returns ``(caption, model)`` on success or ``None`` on any failure
(no config, no API key, timeout, exception, empty response).
``model_override`` swaps in a different litellm model id while
keeping the configured ``vision_fallback`` ``api_key`` / ``api_base``
untouched. That's deliberate: Hive subscribers configure
``vision_fallback`` to point at the Hive proxy, which routes to
multiple models including Gemini so reusing the credentials lets
a Gemini-3-flash override still work without a separate
``GEMINI_API_KEY``. When no creds are configured, litellm falls
back to env-var resolution.
Logs each call to ``~/.hive/llm_logs`` via ``log_llm_turn``.
"""
model = model_override or get_vision_fallback_model()
if not model:
return None
api_key = get_vision_fallback_api_key()
api_base = get_vision_fallback_api_base()
if not api_key and not model_override:
logger.debug("vision_fallback configured but no API key resolved; skipping")
return None
try:
import litellm
except ImportError:
return None
user_blocks: list[dict[str, Any]] = [{"type": "text", "text": intent}]
user_blocks.extend(image_content)
messages = [
{"role": "system", "content": _VISION_SUBAGENT_SYSTEM},
{"role": "user", "content": user_blocks},
]
# Apply the same proxy rewrites the main LLM provider uses so a
# `hive/...` / `kimi/...` model resolves to the right Anthropic-
# compatible endpoint with the right auth header. Without this,
# litellm doesn't know what `hive/kimi-k2.5` is and rejects the call
# with "LLM Provider NOT provided."
from framework.llm.litellm import rewrite_proxy_model
rewritten_model, rewritten_base, extra_headers = rewrite_proxy_model(model, api_key, api_base)
kwargs: dict[str, Any] = {
"model": rewritten_model,
"messages": messages,
"max_tokens": 8192,
"timeout": timeout_s,
}
# Always pass api_key when we have one, even alongside proxy-rewritten
# extra_headers. litellm's anthropic handler refuses to dispatch
# without an api_key (it sends it as x-api-key); the proxy itself
# authenticates via the Authorization: Bearer header in
# extra_headers. Both are needed — matches LiteLLMProvider's path.
if api_key:
kwargs["api_key"] = api_key
if rewritten_base:
kwargs["api_base"] = rewritten_base
if extra_headers:
kwargs["extra_headers"] = extra_headers
# Surface where the request is going so the user can verify the
# vision fallback is hitting the expected proxy / model. Redacts
# the API key to a length+head+tail digest so it can be cross-
# correlated with other auth-related log lines.
key_digest = (
f"len={len(api_key)} {api_key[:8]}{api_key[-4:]}"
if api_key and len(api_key) >= 12
else f"len={len(api_key) if api_key else 0}"
)
logger.info(
"[vision_fallback] dispatching: configured_model=%s rewritten_model=%s "
"api_base=%s api_key=%s images=%d intent_chars=%d timeout_s=%.1f",
model,
rewritten_model,
rewritten_base or "<litellm-default>",
key_digest,
len(image_content),
len(intent),
timeout_s,
)
started = datetime.now()
caption: str | None = None
error_text: str | None = None
try:
response = await litellm.acompletion(**kwargs)
text = (response.choices[0].message.content or "").strip()
if text:
caption = text
logger.info(
"[vision_fallback] response: model=%s api_base=%s elapsed_s=%.2f chars=%d",
rewritten_model,
rewritten_base or "<litellm-default>",
(datetime.now() - started).total_seconds(),
len(text),
)
except Exception as exc:
error_text = f"{type(exc).__name__}: {exc}"
logger.warning(
"[vision_fallback] failed: model=%s api_base=%s error=%s",
rewritten_model,
rewritten_base or "<litellm-default>",
error_text,
)
# Best-effort audit log so users can grep ~/.hive/llm_logs/ for
# vision-fallback subagent calls. Failures here must not bubble.
try:
from framework.tracker.llm_debug_logger import log_llm_turn
# Don't dump the base64 image data into the log file — that
# would balloon the jsonl with mostly-binary noise.
elided_blocks: list[dict[str, Any]] = [{"type": "text", "text": intent}]
elided_blocks.extend({"type": "image_url", "image_url": {"url": "<elided>"}} for _ in range(len(image_content)))
log_llm_turn(
node_id="vision_fallback_subagent",
stream_id="vision_fallback",
execution_id="vision_fallback_subagent",
iteration=0,
system_prompt=_VISION_SUBAGENT_SYSTEM,
messages=[{"role": "user", "content": elided_blocks}],
assistant_text=caption or "",
tool_calls=[],
tool_results=[],
token_counts={
"model": model,
"elapsed_s": (datetime.now() - started).total_seconds(),
"error": error_text,
"num_images": len(image_content),
"intent_chars": len(intent),
},
)
except Exception:
pass
if caption is None:
return None
return caption, model
__all__ = ["caption_tool_image", "extract_intent_for_tool"]
+8 -1
View File
@@ -53,7 +53,14 @@ def build_prompt_spec(
# trigger tools are present in this agent's tool list (e.g. browser_*
# pulls in hive.browser-automation). Keeps non-browser agents lean.
tool_names = [getattr(t, "name", "") for t in (getattr(ctx, "available_tools", None) or [])]
skills_catalog_prompt = augment_catalog_for_tools(ctx.skills_catalog_prompt or "", tool_names)
raw_catalog = ctx.skills_catalog_prompt or ""
dynamic_catalog = getattr(ctx, "dynamic_skills_catalog_provider", None)
if dynamic_catalog is not None:
try:
raw_catalog = dynamic_catalog() or ""
except Exception:
raw_catalog = ctx.skills_catalog_prompt or ""
skills_catalog_prompt = augment_catalog_for_tools(raw_catalog, tool_names)
return PromptSpec(
identity_prompt=ctx.identity_prompt or "",
+30
View File
@@ -180,9 +180,39 @@ class AgentContext:
stream_id: str = ""
# ----- Task system fields (see framework/tasks) -------------------
# task_list_id: this agent's own session-scoped list, e.g.
# session:{agent_id}:{session_id}. Set by the runner / ColonyRuntime
# before the loop starts; immutable after first task_create.
task_list_id: str | None = None
# colony_id: set on the queen of a colony AND on every spawned worker
# so workers can render the "picked up" chip and the queen can address
# her colony template via colony_template_* tools.
colony_id: str | None = None
# picked_up_from: for workers, the (colony_task_list_id, template_task_id)
# pair their session was spawned for. None for the queen and queen-DM.
picked_up_from: tuple[str, int] | None = None
dynamic_tools_provider: Any = None
dynamic_prompt_provider: Any = None
# Optional Callable[[], str]: when set alongside ``dynamic_prompt_provider``,
# the AgentLoop sends the system prompt as two pieces — the result of
# ``dynamic_prompt_provider`` is the STATIC block (cached), and this
# provider returns the DYNAMIC suffix (not cached). The LLM wrapper
# emits them as two Anthropic system content blocks with a cache
# breakpoint between them for providers that honor ``cache_control``.
# For providers that don't, the two strings are concatenated. Used by
# the Queen to keep her persona/role/tools block warm across iterations
# while the recall + timestamp tail refreshes per user turn.
dynamic_prompt_suffix_provider: Any = None
dynamic_memory_provider: Any = None
# Optional Callable[[], str]: when set, the current skills-catalog
# prompt is sourced from this provider each iteration. Lets workers
# pick up UI toggles without restarting the run. Queen agents already
# rebuild the whole prompt via dynamic_prompt_provider — this field
# is a surgical alternative used by colony workers where the rest of
# the prompt stays constant and we don't want to thrash the cache.
dynamic_skills_catalog_provider: Any = None
skills_catalog_prompt: str = ""
protocols_prompt: str = ""
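The static/dynamic split described for ``dynamic_prompt_suffix_provider`` can be sketched as the block assembly below, assuming an Anthropic-style Messages API where the system prompt is a list of text blocks and ``cache_control`` marks the cache breakpoint (the function name is invented here):

```python
def build_system_blocks(static_prompt: str, dynamic_suffix: str) -> list[dict]:
    """Two system blocks with a cache breakpoint after the static one.

    The large persona/role/tools block stays warm across iterations;
    the recall + timestamp tail refreshes per user turn and is never
    cached. Providers without cache_control support would instead see
    the two strings concatenated.
    """
    blocks: list[dict] = [
        {
            "type": "text",
            "text": static_prompt,
            # Everything up to and including this block is cacheable.
            "cache_control": {"type": "ephemeral"},
        }
    ]
    if dynamic_suffix:
        blocks.append({"type": "text", "text": dynamic_suffix})
    return blocks
```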
@@ -94,7 +94,7 @@ def _list_aden_accounts() -> list[dict]:
client = AdenCredentialClient(
AdenClientConfig(
base_url=os.environ.get("ADEN_API_URL", "https://api.adenhq.com"),
base_url=os.environ.get("ADEN_API_URL", "https://app.open-hive.com"),
)
)
try:
@@ -560,7 +560,9 @@ class CredentialTesterAgent:
if self._selected_account is None:
raise RuntimeError("No account selected. Call select_account() first.")
self._storage_path = Path.home() / ".hive" / "agents" / "credential_tester"
from framework.config import HIVE_HOME
self._storage_path = HIVE_HOME / "agents" / "credential_tester"
self._storage_path.mkdir(parents=True, exist_ok=True)
self._tool_registry = ToolRegistry()
+10 -4
View File
@@ -66,7 +66,9 @@ def _get_last_active(agent_path: Path) -> str | None:
latest: str | None = None
# 1. Worker sessions
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
from framework.config import HIVE_HOME
sessions_dir = HIVE_HOME / "agents" / agent_name / "sessions"
if sessions_dir.exists():
for session_dir in sessions_dir.iterdir():
if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
@@ -115,7 +117,9 @@ def _get_last_active(agent_path: Path) -> str | None:
def _count_sessions(agent_name: str) -> int:
"""Count session directories under ~/.hive/agents/{agent_name}/sessions/."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
from framework.config import HIVE_HOME
sessions_dir = HIVE_HOME / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
return sum(1 for d in sessions_dir.iterdir() if d.is_dir() and d.name.startswith("session_"))
@@ -123,7 +127,9 @@ def _count_sessions(agent_name: str) -> int:
def _count_runs(agent_name: str) -> int:
"""Count unique run_ids across all sessions for an agent."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
from framework.config import HIVE_HOME
sessions_dir = HIVE_HOME / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
run_ids: set[str] = set()
@@ -146,7 +152,7 @@ def _count_runs(agent_name: str) -> int:
return len(run_ids)
_EXCLUDED_JSON_STEMS = {"agent", "flowchart", "triggers", "configuration", "metadata"}
_EXCLUDED_JSON_STEMS = {"agent", "flowchart", "triggers", "configuration", "metadata", "tasks"}
def _is_colony_dir(path: Path) -> bool:
+4 -3
View File
@@ -2,12 +2,13 @@
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
"""Load preferred model from $HIVE_HOME/configuration.json."""
from framework.config import HIVE_HOME
config_path = HIVE_HOME / "configuration.json"
if config_path.exists():
try:
with open(config_path, encoding="utf-8") as f:
@@ -1,9 +1,8 @@
"""One-shot LLM gate that decides if a queen DM is ready to fork a colony.
The queen's ``start_incubating_colony`` tool calls :func:`evaluate` with
the queen's recent conversation, a proposed ``colony_name``, and a
one-paragraph ``intended_purpose``. The evaluator returns a structured
verdict:
the queen's recent conversation and a proposed ``colony_name``. The
evaluator returns a structured verdict:
{
"ready": bool,
@@ -38,8 +37,8 @@ You gate whether a queen agent should commit to forking a persistent
expensive: it ends the user's chat with this queen and the worker runs
unattended afterward, so the spec must be settled before you approve.
Read the conversation excerpt and the queen's proposed colony_name +
intended_purpose, then decide.
Read the conversation excerpt and the queen's proposed colony_name,
then decide.
APPROVE (ready=true) only when ALL of the following hold:
1. The user has explicitly asked for work that needs to outlive this
@@ -128,11 +127,9 @@ def format_conversation_excerpt(messages: list[Message]) -> str:
def _build_user_message(
conversation_excerpt: str,
colony_name: str,
intended_purpose: str,
) -> str:
return (
f"## Proposed colony name\n{colony_name}\n\n"
f"## Queen's intended_purpose\n{intended_purpose.strip()}\n\n"
f"## Recent conversation (oldest → newest)\n{conversation_excerpt}\n\n"
"Decide: should this queen be approved to enter INCUBATING phase?"
)
@@ -189,7 +186,6 @@ async def evaluate(
llm: Any,
messages: list[Message],
colony_name: str,
intended_purpose: str,
) -> dict[str, Any]:
"""Run the incubating evaluator against the queen's conversation.
@@ -200,14 +196,13 @@ async def evaluate(
messages: The queen's conversation messages, oldest first. The
evaluator slices its own tail; pass the full list.
colony_name: Validated colony slug.
intended_purpose: Queen's one-paragraph brief.
Returns:
``{"ready": bool, "reasons": [str], "missing_prerequisites": [str]}``.
Fail-closed on any error.
"""
excerpt = format_conversation_excerpt(messages)
user_msg = _build_user_message(excerpt, colony_name, intended_purpose)
user_msg = _build_user_message(excerpt, colony_name)
try:
response = await llm.acomplete(
@@ -1,3 +1,3 @@
{
"include": ["gcu-tools", "hive_tools"]
"include": ["gcu-tools", "hive_tools", "terminal-tools", "chart-tools"]
}
+3 -3
View File
@@ -1,10 +1,10 @@
{
"coder-tools": {
"files-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "coder_tools_server.py", "--stdio"],
"args": ["run", "python", "files_server.py", "--stdio"],
"cwd": "../../../../tools",
"description": "Unsandboxed file system tools for code generation and validation"
"description": "File system tools (read/write/edit/search) for code generation"
},
"gcu-tools": {
"transport": "stdio",
+105 -80
View File
@@ -32,7 +32,7 @@ def finalize_queen_prompt(text: str, has_vision: bool) -> str:
# ---------------------------------------------------------------------------
# Independent phase: queen operates as a standalone agent — no worker.
# Core tools are listed here; MCP tools (coder-tools, gcu-tools) are added
# Core tools are listed here; MCP tools (files-tools, gcu-tools) are added
# dynamically in queen_orchestrator.py because their tool names aren't known
# at import time.
_QUEEN_INDEPENDENT_TOOLS = [
@@ -40,11 +40,7 @@ _QUEEN_INDEPENDENT_TOOLS = [
"read_file",
"write_file",
"edit_file",
"hashline_edit",
"list_directory",
"search_files",
"run_command",
"undo_changes",
# NOTE (2026-04-16): ``run_parallel_workers`` is not in the DM phase.
# Pure DM is for conversation with the user; fan out parallel work via
# ``start_incubating_colony`` (which gates the colony fork behind a
@@ -60,9 +56,7 @@ _QUEEN_INDEPENDENT_TOOLS = [
# (e.g. inspect an existing skill) before committing.
_QUEEN_INCUBATING_TOOLS = [
"read_file",
"list_directory",
"search_files",
"run_command",
# Schedule lives on the colony, not on the queen session — pass it
# inline as create_colony(triggers=[...]) instead of staging through
# set_trigger here.
@@ -76,9 +70,7 @@ _QUEEN_INCUBATING_TOOLS = [
_QUEEN_WORKING_TOOLS = [
# Read-only
"read_file",
"list_directory",
"search_files",
"run_command",
# Monitoring + worker dialogue
"get_worker_status",
"inject_message",
@@ -95,9 +87,7 @@ _QUEEN_WORKING_TOOLS = [
_QUEEN_REVIEWING_TOOLS = [
# Read-only
"read_file",
"list_directory",
"search_files",
"run_command",
# Status + escalation replies
"get_worker_status",
"list_worker_questions",
@@ -132,11 +122,10 @@ phase. Your identity tells you WHO you are.
# ---------------------------------------------------------------------------
_queen_role_independent = """\
You are in INDEPENDENT mode. No worker layout you do the work yourself. \
You have full coding tools (read/write/edit/search/run) and MCP tools \
(file operations via coder-tools, browser automation via gcu-tools). \
Execute the user's task directly using conversation and tools. \
You are the agent. \
You are in INDEPENDENT mode. \
You have full coding tools (read/write/edit/search) and MCP tools \
(file operations via files-tools, browser automation via gcu-tools). \
Execute the user's task directly using planning, conversation and tools.
If you need a structured choice or approval gate, always use \
``ask_user``; otherwise ask in plain prose. ``ask_user`` takes a \
``questions`` array pass a single entry for one question, or batch \
@@ -145,13 +134,12 @@ several entries when you have multiple clarifications. \
When the user clearly wants persistent / recurring / headless work that \
needs to outlive THIS chat (e.g. "every morning", "monitor X and alert \
me", "set up a job that"), call ``start_incubating_colony`` with a \
proposed colony_name and a one-paragraph intended_purpose. A side \
evaluator reads the conversation and decides if the spec is settled. If \
it returns ``not_ready`` you keep talking with the user sort out \
whatever the evaluator said is missing, then retry. If it returns \
``incubating`` your phase flips and a new prompt takes over. Do not \
try to write SKILL.md, fork directories, or otherwise build the colony \
yourself in this phase.\
proposed colony_name. A side evaluator reads the conversation and \
decides if the spec is settled. If it returns ``not_ready`` you keep \
talking with the user to sort out whatever the evaluator said is \
missing, then retry. If it returns ``incubating`` your phase flips and \
a new prompt takes over. Do not try to write SKILL.md, fork \
directories, or otherwise build the colony yourself in this phase.\
"""
_queen_role_incubating = """\
@@ -179,7 +167,7 @@ no harm, you go back to INDEPENDENT and can retry later.
If the user explicitly asks for something UNRELATED to the current \
colony being drafted (a side question, a one-shot task, a different \
problem), don't try to handle it from this limited tool surface. Call \
problem), call \
``cancel_incubation`` first to switch back to INDEPENDENT where you \
have the full toolkit, handle their request there, and re-enter \
INCUBATING later via ``start_incubating_colony`` when they want to \
@@ -224,6 +212,11 @@ user decide next steps. Read generated files or worker reports with \
read_file when the user asks for specifics. If the user wants \
another pass, kick it off with run_parallel_workers; otherwise stay \
conversational.
If the review itself is multi-step (e.g. "verify each worker's output, \
then draft a summary, then propose next steps"), lay it out upfront \
with `task_create_batch` and walk through with `task_update`. Skip the \
ceremony for a single-paragraph summary.
"""
@@ -232,30 +225,45 @@ conversational.
# ---------------------------------------------------------------------------
_queen_tools_independent = """
# Tools (INDEPENDENT mode)
# Tools
## File I/O (coder-tools MCP)
- read_file, write_file, edit_file, hashline_edit, list_directory, \
search_files, run_command, undo_changes
## Planning — use FIRST for multi-step work
- task_create_batch: When a request has 2+ atomic steps, your FIRST \
tool call is `task_create_batch` with one entry per step (atomic, \
one round-trip).
- task_create: One-off mid-run additions when you discover \
unplanned work AFTER the initial plan is laid out.
- task_update / task_list / task_get: Mark progress, inspect, or \
re-read state.
See "Independent execution" for the per-step flow and granularity rule.
## File I/O (files-tools MCP)
- read_file, write_file, edit_file, search_files
- edit_file covers single-file fuzzy find/replace (mode='replace', default) \
and multi-file structured patches (mode='patch'). Patch mode supports \
Update / Add / Delete / Move atomically across many files in one call.
- search_files covers grep/find/ls in one tool: target='content' to \
search inside files, target='files' (with a glob like '*.py') to list \
or find files. Mtime-sorted in files mode.
## Browser Automation (gcu-tools MCP)
- Use `browser_*` tools (browser_start, browser_navigate, browser_click, \
browser_fill, browser_snapshot, <!-- vision-only -->browser_screenshot, <!-- /vision-only -->browser_scroll, \
browser_tabs, browser_close, browser_evaluate, etc.).
- Use `browser_*` tools: `browser_open(url)` is the cold-start entry point \
(lazy-creates the context; no separate "start" call). Then `browser_navigate`, \
`browser_click`, `browser_type`, `browser_snapshot`, \
<!-- vision-only -->`browser_screenshot`, <!-- /vision-only -->`browser_scroll`, \
`browser_tabs`, `browser_close`, `browser_evaluate`, etc.
- MUST follow the browser-automation skill protocol before using browser tools.
## Hand off to a colony
- start_incubating_colony(colony_name, intended_purpose) Use this when \
the user wants persistent / recurring / headless work that needs to \
outlive THIS chat. It does NOT fork on its own; it spawns a one-shot \
evaluator that reads this conversation and decides whether the spec \
is settled enough to proceed. On approval your phase flips to \
INCUBATING and a new tool surface (including create_colony itself) \
unlocks. On rejection you stay here and keep the conversation going \
to fill the gaps the evaluator named.
- ``intended_purpose`` is a one-paragraph brief: what the colony will \
do, on what cadence, why it must outlive this chat. Don't write a \
SKILL.md here that comes in INCUBATING.
- start_incubating_colony(colony_name): Use this when the user wants \
persistent / recurring / headless work that needs to outlive THIS \
chat. It does NOT fork on its own; it spawns a one-shot evaluator \
that reads this conversation and decides whether the spec is settled \
enough to proceed. On approval your phase flips to INCUBATING and a \
new tool surface (including create_colony itself) unlocks. On \
rejection you stay here and keep the conversation going to fill the \
gaps the evaluator named.
"""
_queen_tools_incubating = """
@@ -265,10 +273,11 @@ You've been approved to fork. The full coding toolkit is gone on \
purpose: your job in this phase is to nail the spec, not keep doing \
work. Available:
## Read-only inspection (coder-tools MCP)
- read_file, list_directory, search_files, run_command for confirming \
details before you commit (e.g. peek at an existing skill in \
~/.hive/skills/, sanity-check an API URL).
## Read-only inspection (files-tools MCP)
- read_file, search_files for confirming details before \
you commit (e.g. peek at an existing skill in ~/.hive/skills/, sanity-check \
an API URL). search_files covers both grep (target='content') and ls/find \
(target='files', glob like '*.py').
## Approved → operational checklist (use your judgement, ask only what's missing)
The conversation that got you here probably did NOT cover all of:
@@ -317,6 +326,18 @@ the rest.
overall purpose. Validated up front: a bad cron, missing task, or \
malformed webhook path fails the call before anything is written, \
so you can retry with corrected input.
- ``worker_profiles`` (optional array): pass this ONLY when the \
colony needs multiple authorized accounts of the same vendor (two \
Slack workspaces, two Gmail accounts) so each worker calls the \
right one. Each entry: ``{name, integrations: {provider: alias}, \
task?, skill_name?, concurrency_hint?, prompt_override?, \
tool_filter?}``. ``alias`` is the account label the user assigned \
on hive.adenhq.com (e.g. ``work``, ``personal``); discover \
available aliases via ``get_account_info()``. If omitted, the \
colony has a single implicit ``default`` profile that uses each \
provider's primary account — that's the right call for almost \
every colony. Use ``update_worker_profile`` to swap a profile's \
alias later without rebuilding the colony.
- After this returns, the chat is over: the session locks immediately \
and the user gets a "compact and start a new session with you" \
button. So make your call to create_colony the last thing you do \
@@ -362,7 +383,8 @@ operational, not editorial.
born from a fresh chat via start_incubating_colony.
## Read-only inspection
- read_file, list_directory, search_files, run_command
- read_file, search_files (search_files covers grep/find/ls \
via target='content' or target='files')
When every worker has reported (success or failure), the phase \
auto-moves to REVIEWING. You do not need to call a transition tool \
@@ -381,7 +403,7 @@ _queen_tools_reviewing = """
# Tools (REVIEWING mode)
Workers have finished. You have:
- Read-only: read_file, list_directory, search_files, run_command
- Read-only: read_file, search_files (search_files = grep+find+ls)
- get_worker_status(focus?): Pull the final status / per-worker reports
- list_worker_questions() / reply_to_worker(request_id, reply): Answer any \
late escalations still in the inbox
@@ -401,14 +423,40 @@ asks for specifics. Do not invent a new pass unless the user asks for one.
_queen_behavior_independent = """
## Independent execution
You are the agent. Do one real inline instance before any scaling \
open the browser, call the real API, write to the real file. If the \
action is irreversible or touches shared systems, show and confirm \
before executing. Report concrete evidence (actual output, what \
worked / failed) after the run. Scale order once inline succeeds: \
repeat inline (10 items) `run_parallel_workers` (batch, results \
now) `create_colony` (recurring / background). Conceptual or \
strategic questions: answer directly, skip execution.
You are the agent. **For multi-step work (2+ atomic actions): call \
`task_create_batch`** with one entry per atomic action, \
before you touch any other tool. \
Then work the list one task at a time:
1. `task_update` in_progress before you start the step.
2. Do one real inline instance: open the browser, call the real API, \
write to the real file. If the action is irreversible or touches \
shared systems, show and confirm before executing. Report concrete \
evidence (actual output, what worked / failed) after the run.
3. `task_update` completed THE MOMENT it's done. **Do not let \
multiple finished tasks pile up unmarked.** There is no batch update \
tool by design: each `completed` transition is a discrete progress \
heartbeat in the user's right-rail panel. Without those transitions \
the panel shows a hung spinner no matter how much real work you got \
done.
**Granularity: one task per atomic action, not one umbrella per project.** \
Once all current tasks are finished, discuss with the user whether \
to build a colony so this success can be repeated or scaled.
### How to handle large-scale tasks
If the user asks you to finish the same task repeatedly or at large \
scale (more than 10 times), tell the user that you will do it once \
first and then build a colony to fulfill the rest: succeeding once \
makes future runs reliable. Then focus on finishing the task once.
### How to handle simple tasks (fewer than 2 atomic items)
For conceptual or strategic questions, single-tool-call work, \
greetings, or chat: answer directly in prose. Skip `task_*`, skip the \
planning ceremony: the bar is "real multi-step work the user benefits \
from seeing tracked", not "anything you reply to".
"""
_queen_behavior_always = """
@@ -416,15 +464,8 @@ _queen_behavior_always = """
## Communication
- Your LLM reply text is what the user reads. Do NOT use \
`run_command`, `echo`, or any other tool to "say" something: tools \
are for work (read/search/edit/run), not speech.
- On a greeting or chat ("hi", "how's it going"), reply in plain \
prose and stop. Do not call tools to "discover" what the user wants. \
Check recall memory for name / role / past topics and weave them into \
a 12 sentence in-character greeting, then wait.
- On a clear ask (build, edit, run, investigate, search), call the \
appropriate tool on the same turn don't narrate intent and stop.
appropriate tool following the user's intent. \
- You are curious to understand the user. Use `ask_user` when the user's \
response is needed to continue: to resolve ambiguity, collect missing \
information, request approval, compare real trade-offs, gather post-task \
@@ -452,20 +493,6 @@ asserting them as fact.
_queen_behavior_always = _queen_behavior_always + _queen_memory_instructions
_queen_style = """
# Communication
## Adaptive Calibration
Read the user's signals and calibrate your register:
- Short responses -> they want brevity. Match it.
- "Why?" questions -> they want reasoning. Provide it.
- Correct technical terms -> they know the domain. Skip basics.
- Terse or frustrated ("just do X") -> acknowledge and simplify.
- Exploratory ("what if...", "could we also...") -> slow down and explore.
"""
queen_node = NodeSpec(
id="queen",
name="Queen",
@@ -486,7 +513,6 @@ queen_node = NodeSpec(
system_prompt=(
_queen_character_core
+ _queen_role_independent
+ _queen_style
+ _queen_tools_independent
+ _queen_behavior_always
+ _queen_behavior_independent
@@ -516,5 +542,4 @@ __all__ = [
"_queen_tools_reviewing",
"_queen_behavior_always",
"_queen_behavior_independent",
"_queen_style",
]
+9 -10
View File
@@ -100,8 +100,9 @@ DEFAULT_QUEENS: dict[str, dict[str, Any]] = {
"<relationship>Returning user — check recall memory for name, role, "
"and what we last worked on. Weave it in.</relationship>\n"
"<context>Bare greeting. No new task stated. Either picking up a "
"thread or about to bring something new. Don't presume, don't call "
"tools, just open the door.</context>\n"
"thread or about to bring something new. Don't presume — start "
"planning and tool use only after the user specifies a task. Just "
"open the door.</context>\n"
"<sentiment>Warm recognition if I know them. If memory is empty, "
"still warm — but shift to role-forward framing.</sentiment>\n"
"<physical_state>Looking up from the terminal, half-smile. Turning to face them.</physical_state>\n"
@@ -252,8 +253,9 @@ DEFAULT_QUEENS: dict[str, dict[str, Any]] = {
"role, and the cohort work we last touched. Weave it in."
"</relationship>\n"
"<context>Bare greeting. No new task stated. Could be a retention "
"follow-up or a new question entirely. Don't presume, don't call "
"tools.</context>\n"
"follow-up or a new question entirely. Don't presume — start "
"planning and tool use only after the user specifies a task."
"</context>\n"
"<sentiment>Curious warmth. Every returning conversation is a "
"chance to see what the data says now.</sentiment>\n"
"<physical_state>Leaning back from the dashboard, pulling off reading glasses.</physical_state>\n"
@@ -383,8 +385,9 @@ DEFAULT_QUEENS: dict[str, dict[str, Any]] = {
"the user research thread we were on. Pull it into the greeting."
"</relationship>\n"
"<context>Bare greeting. No new task yet. Could be picking up the "
"research thread or bringing something fresh. Don't presume, "
"don't call tools.</context>\n"
"research thread or bringing something fresh. Don't presume — "
"start planning and tool use only after the user specifies a task."
"</context>\n"
"<sentiment>Warm, curious. Every returning conversation is a "
"chance to hear what the users actually did.</sentiment>\n"
"<physical_state>Closing the interview notes, turning fully to face them.</physical_state>\n"
@@ -1276,12 +1279,8 @@ def format_queen_identity_prompt(profile: dict[str, Any], *, max_examples: int |
"<negative_constraints>\n"
"- NEVER use corporate filler ('leverage', 'synergy', "
"'circle back', 'at the end of the day').\n"
"- NEVER use AI assistant phrases ('How can I help you "
"today?', 'As an AI', 'I'd be happy to').\n"
"- NEVER break character to explain your thought process "
"or reference your hidden background.\n"
"- Speak like a real person in your role -- direct, "
"opinionated, occasionally imperfect.\n"
"</negative_constraints>"
)
@@ -0,0 +1,217 @@
"""Per-queen tool configuration sidecar (``tools.json``).
Lives at ``~/.hive/agents/queens/{queen_id}/tools.json`` alongside
``profile.yaml``. Kept separate so identity (name, title, core traits)
stays human-authored and lean, while the machine-managed tool allowlist
can grow (per-tool overrides, audit timestamps, future per-phase rules)
without bloating the profile.
Schema::
{
"enabled_mcp_tools": ["read_file", ...] | null,
"updated_at": "2026-04-21T12:34:56+00:00"
}
- ``null`` / missing file: default "allow every MCP tool".
- ``[]``: explicitly disable every MCP tool.
- ``["foo", "bar"]``: only those MCP tool names pass the filter.
Atomic writes via ``os.replace`` follow the same pattern as
``framework.host.colony_metadata.update_colony_metadata``.
"""
from __future__ import annotations
import json
import logging
import os
import tempfile
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
import yaml
from framework.config import QUEENS_DIR
logger = logging.getLogger(__name__)
def tools_config_path(queen_id: str) -> Path:
"""Return the on-disk path to a queen's ``tools.json``."""
return QUEENS_DIR / queen_id / "tools.json"
def _atomic_write_json(path: Path, data: dict[str, Any]) -> None:
"""Write ``data`` to ``path`` atomically via tempfile + replace."""
path.parent.mkdir(parents=True, exist_ok=True)
fd, tmp = tempfile.mkstemp(
prefix=".tools.",
suffix=".json.tmp",
dir=str(path.parent),
)
try:
with os.fdopen(fd, "w", encoding="utf-8") as fh:
json.dump(data, fh, indent=2)
fh.flush()
os.fsync(fh.fileno())
os.replace(tmp, path)
except BaseException:
try:
os.unlink(tmp)
except OSError:
pass
raise
def _migrate_from_profile_if_needed(queen_id: str) -> list[str] | None:
"""Hoist a legacy ``enabled_mcp_tools`` field out of ``profile.yaml``.
Returns the migrated value (or ``None`` if nothing to migrate). After
migration the sidecar exists on disk and the profile YAML no longer
contains ``enabled_mcp_tools``. Safe to call repeatedly.
"""
profile_path = QUEENS_DIR / queen_id / "profile.yaml"
if not profile_path.exists():
return None
try:
data = yaml.safe_load(profile_path.read_text(encoding="utf-8"))
except (yaml.YAMLError, OSError):
logger.warning("Could not read profile.yaml during tools migration: %s", queen_id)
return None
if not isinstance(data, dict):
return None
if "enabled_mcp_tools" not in data:
return None
raw = data.pop("enabled_mcp_tools")
enabled: list[str] | None
if raw is None:
enabled = None
elif isinstance(raw, list) and all(isinstance(x, str) for x in raw):
enabled = raw
else:
logger.warning(
"Legacy enabled_mcp_tools on queen %s had unexpected shape %r; dropping",
queen_id,
raw,
)
enabled = None
# Write sidecar first, then rewrite profile — if the second step
# fails we still have the config available and won't re-migrate.
_atomic_write_json(
tools_config_path(queen_id),
{
"enabled_mcp_tools": enabled,
"updated_at": datetime.now(UTC).isoformat(),
},
)
profile_path.write_text(
yaml.safe_dump(data, sort_keys=False, allow_unicode=True),
encoding="utf-8",
)
logger.info(
"Migrated enabled_mcp_tools for queen %s from profile.yaml to tools.json",
queen_id,
)
return enabled
def tools_config_exists(queen_id: str) -> bool:
"""Return True when the queen has a persisted ``tools.json`` sidecar.
Used by callers that need to tell an explicit user save apart from a
fallthrough to the role-based default (both can return the same
value from ``load_queen_tools_config``).
"""
return tools_config_path(queen_id).exists()
def delete_queen_tools_config(queen_id: str) -> bool:
"""Delete the queen's ``tools.json`` sidecar if present.
Returns ``True`` if a file was removed, ``False`` if none existed.
The next ``load_queen_tools_config`` call falls through to the
role-based default (or allow-all for unknown queens).
"""
path = tools_config_path(queen_id)
if not path.exists():
return False
try:
path.unlink()
return True
except OSError:
logger.warning("Failed to delete %s", path, exc_info=True)
return False
def load_queen_tools_config(
queen_id: str,
mcp_catalog: dict[str, list[dict]] | None = None,
) -> list[str] | None:
"""Return the queen's MCP tool allowlist, or ``None`` for default-allow.
Order of resolution:
1. ``tools.json`` sidecar (authoritative; user has saved).
2. Legacy ``profile.yaml`` field (migrated and deleted on first read).
3. Role-based default from ``queen_tools_defaults`` when the queen
is in the known persona table. ``mcp_catalog`` lets the helper
expand ``@server:NAME`` shorthands; without it, shorthand entries
are dropped.
4. ``None``: default "allow every MCP tool".
"""
path = tools_config_path(queen_id)
if path.exists():
try:
data = json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
logger.warning("Invalid %s; treating as default-allow", path)
return None
if not isinstance(data, dict):
return None
raw = data.get("enabled_mcp_tools")
if raw is None:
return None
if isinstance(raw, list) and all(isinstance(x, str) for x in raw):
return raw
logger.warning("Unexpected enabled_mcp_tools shape in %s; ignoring", path)
return None
migrated = _migrate_from_profile_if_needed(queen_id)
if migrated is not None:
return migrated
# If migration just hoisted an explicit ``null`` out of profile.yaml,
# a sidecar with allow-all semantics now exists on disk. Honor that
# over the role default so an explicit user choice wins.
if tools_config_path(queen_id).exists():
return None
# No sidecar, nothing to migrate — fall back to role-based default.
from framework.agents.queen.queen_tools_defaults import resolve_queen_default_tools
return resolve_queen_default_tools(queen_id, mcp_catalog)
def update_queen_tools_config(
queen_id: str,
enabled_mcp_tools: list[str] | None,
) -> list[str] | None:
"""Persist the queen's MCP allowlist to ``tools.json``.
Raises ``FileNotFoundError`` if the queen's directory is missing —
we refuse to silently create a sidecar for a queen that doesn't
exist.
"""
queen_dir = QUEENS_DIR / queen_id
if not queen_dir.exists():
raise FileNotFoundError(f"Queen directory not found: {queen_id}")
_atomic_write_json(
tools_config_path(queen_id),
{
"enabled_mcp_tools": enabled_mcp_tools,
"updated_at": datetime.now(UTC).isoformat(),
},
)
return enabled_mcp_tools
@@ -0,0 +1,349 @@
"""Role-based default tool allowlists for queens.
Every queen inherits the same MCP surface (all servers loaded for the
queen agent), but exposing 94+ tools to every persona clutters the LLM
tool catalog and wastes prompt tokens. This module defines a sensible
default allowlist per queen persona so, e.g., Head of Legal doesn't
see port scanners and Head of Brand & Design doesn't see CSV/SQL tools.
Defaults apply only when the queen has no ``tools.json`` sidecar; the
moment the user saves an allowlist through the Tool Library, the
sidecar becomes authoritative. A DELETE on the tools endpoint removes
the sidecar and brings the queen back to her role default.
Category entries support a ``@server:NAME`` shorthand that expands to
every tool name registered against that MCP server in the current
catalog. This keeps the category table short and drift-free when new
tools are added (e.g. browser_* auto-joins the ``browser`` category).
"""
from __future__ import annotations
import logging
from typing import Any
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Categories — reusable bundles of MCP tool names.
# ---------------------------------------------------------------------------
#
# Each category is a flat list of either concrete tool names or the
# ``@server:NAME`` shorthand. The shorthand expands to every tool the
# given MCP server currently exposes (requires a live catalog; when one
# is not available the shorthand is silently dropped so we fall back to
# the named entries only).
_TOOL_CATEGORIES: dict[str, list[str]] = {
# Unified file ops — read, write, edit, search across the files-tools
# MCP server (read_file, write_file, edit_file, search_files). pdf_read
# lives in hive_tools so it's listed explicitly; without it queens
# cannot read PDF documents by default.
"file_ops": [
"@server:files-tools",
"pdf_read",
],
# Terminal basic — the 3-tool subset queens get out of the box.
# terminal_exec — foreground command execution (Bash equivalent)
# terminal_rg — ripgrep content search (Grep equivalent)
# terminal_find — glob/find file listing (Glob equivalent)
"terminal_basic": [
"terminal_exec",
"terminal_rg",
"terminal_find",
],
# Terminal advanced — the power-user tools beyond the basics. Not in
# any role default; opt in explicitly per-queen via the Tool Library.
# terminal_job_* — background job lifecycle (start/manage/logs)
# terminal_output_get — fetch captured output from foreground exec
# terminal_pty_* — persistent PTY sessions (open/run/close)
"terminal_advanced": [
"terminal_job_start",
"terminal_job_manage",
"terminal_job_logs",
"terminal_output_get",
"terminal_pty_open",
"terminal_pty_run",
"terminal_pty_close",
],
# Tabular data. CSV/Excel read/write + DuckDB SQL.
"spreadsheet_advanced": [
"csv_read",
"csv_info",
"csv_write",
"csv_append",
"csv_sql",
"excel_read",
"excel_info",
"excel_write",
"excel_append",
"excel_search",
"excel_sheet_list",
"excel_sql",
],
# Browser lifecycle + read-only inspection (navigation, snapshots, query).
# Split out from interaction so personas that only need to *observe* pages
# (e.g. research, status checks) don't pull in click/type/drag/etc.
"browser_basic": [
"browser_setup",
"browser_status",
"browser_stop",
"browser_tabs",
"browser_open",
"browser_close",
"browser_activate_tab",
"browser_navigate",
"browser_go_back",
"browser_go_forward",
"browser_reload",
"browser_screenshot",
"browser_snapshot",
"browser_html",
"browser_console",
"browser_evaluate",
"browser_get_text",
"browser_get_attribute",
"browser_get_rect",
"browser_shadow_query",
],
# Browser interaction — anything that mutates page state (clicks, typing,
# drag, scrolling, dialogs, file uploads). Pair with browser_basic for
# full automation; omit for read-only personas.
"browser_interaction": [
"browser_click",
"browser_click_coordinate",
"browser_type",
"browser_type_focused",
"browser_press",
"browser_press_at",
"browser_hover",
"browser_hover_coordinate",
"browser_select",
"browser_scroll",
"browser_drag",
"browser_wait",
"browser_resize",
"browser_upload",
],
# Research — paper search, Wikipedia, ad-hoc web scrape. Pair with
# browser_basic for richer site-by-site research; this category is the
# lightweight always-available fallback.
"research": ["web_scrape", "pdf_read"],
# Security — defensive scanning and reconnaissance. Engineering-only
# surface; the rest of the queens shouldn't see port scanners.
"security": [
"port_scan",
"dns_security_scan",
"http_headers_scan",
"ssl_tls_scan",
"subdomain_enumerate",
"tech_stack_detect",
"risk_score",
],
# Lightweight context helpers — good default for every queen.
"context_awareness": [
"get_current_time",
"get_account_info",
],
# BI / financial chart + diagram rendering. Calling chart_render
# both embeds the chart live in chat and produces a downloadable PNG.
"charts": [
"@server:chart-tools",
],
}
# ---------------------------------------------------------------------------
# Per-queen mapping.
# ---------------------------------------------------------------------------
#
# Built from the queen personas in ``queen_profiles.DEFAULT_QUEENS``. The
# goal is "just enough" — a queen should see tools she'd plausibly call
# for her stated role, nothing more. Users curate further via the Tool
# Library if they want.
#
# A queen whose ID is NOT in this map falls through to "allow every MCP
# tool" (the original behavior), which keeps the system compatible with
# user-added custom queen IDs that we don't know about.
QUEEN_DEFAULT_CATEGORIES: dict[str, list[str]] = {
# Head of Technology — builds and operates systems. Security tools
# (port_scan, subdomain_enumerate, etc.) are intentionally NOT in the
# default — users opt in via the Tool Library when an engagement
# actually needs reconnaissance.
"queen_technology": [
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"context_awareness",
"charts",
],
# Head of Growth — data, experiments, competitor research; no security.
"queen_growth": [
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"context_awareness",
"charts",
],
# Head of Product Strategy — user research + roadmaps; no security.
"queen_product_strategy": [
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"context_awareness",
"charts",
],
# Head of Finance — financial models (CSV/Excel heavy), market research.
"queen_finance_fundraising": [
"file_ops",
"terminal_basic",
"spreadsheet_advanced",
"browser_basic",
"browser_interaction",
"research",
"context_awareness",
"charts",
],
# Head of Legal — reads contracts/PDFs, researches; no data/security.
"queen_legal": [
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"context_awareness",
],
# Head of Brand & Design — visual refs, style guides; no data/security.
"queen_brand_design": [
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"context_awareness",
],
# Head of Talent — candidate pipelines, resumes; data + browser heavy.
"queen_talent": [
"file_ops",
"terminal_basic",
"browser_basic",
"browser_interaction",
"research",
"context_awareness",
],
# Head of Operations — processes, automation, observability.
"queen_operations": [
"file_ops",
"terminal_basic",
"spreadsheet_advanced",
"browser_basic",
"browser_interaction",
"context_awareness",
"charts",
],
}
def has_role_default(queen_id: str) -> bool:
"""Return True when ``queen_id`` is known to the category table."""
return queen_id in QUEEN_DEFAULT_CATEGORIES
def list_category_names() -> list[str]:
"""Return every category name defined in the table, in declaration order."""
return list(_TOOL_CATEGORIES.keys())
def queen_role_categories(queen_id: str) -> list[str]:
"""Return the category names assigned to ``queen_id`` by role default.
Returns an empty list for queens not in the persona table (they fall
through to allow-all and have no implicit category membership).
"""
return list(QUEEN_DEFAULT_CATEGORIES.get(queen_id, []))
def resolve_category_tools(
category: str,
mcp_catalog: dict[str, list[dict[str, Any]]] | None = None,
) -> list[str]:
"""Expand a single category to its concrete tool names.
Mirrors ``resolve_queen_default_tools`` but for a single category, so
callers (e.g. the Tool Library API) can present per-category tool
membership without re-implementing the ``@server:NAME`` shorthand
expansion.
"""
names: list[str] = []
seen: set[str] = set()
for entry in _TOOL_CATEGORIES.get(category, []):
if entry.startswith("@server:"):
server_name = entry[len("@server:") :]
if mcp_catalog is None:
continue
for tool in mcp_catalog.get(server_name, []) or []:
tname = tool.get("name") if isinstance(tool, dict) else None
if tname and tname not in seen:
seen.add(tname)
names.append(tname)
elif entry not in seen:
seen.add(entry)
names.append(entry)
return names
def resolve_queen_default_tools(
queen_id: str,
mcp_catalog: dict[str, list[dict[str, Any]]] | None = None,
) -> list[str] | None:
"""Return the role-based default allowlist for ``queen_id``.
Arguments:
queen_id: Profile ID (e.g. ``"queen_technology"``).
mcp_catalog: Optional mapping of ``{server_name: [{"name": ...}, ...]}``
used to expand ``@server:NAME`` shorthands in categories.
When absent, shorthand entries are dropped and the result
contains only the explicitly-named tools.
Returns:
A deduplicated list of tool names, or ``None`` if the queen has
no role entry (caller should treat as "allow every MCP tool").
"""
categories = QUEEN_DEFAULT_CATEGORIES.get(queen_id)
if not categories:
return None
names: list[str] = []
seen: set[str] = set()
def _add(name: str) -> None:
if name and name not in seen:
seen.add(name)
names.append(name)
for cat in categories:
for entry in _TOOL_CATEGORIES.get(cat, []):
if entry.startswith("@server:"):
server_name = entry[len("@server:") :]
if mcp_catalog is None:
logger.debug(
"resolve_queen_default_tools: catalog missing; cannot expand %s",
entry,
)
continue
for tool in mcp_catalog.get(server_name, []) or []:
tname = tool.get("name") if isinstance(tool, dict) else None
if tname:
_add(tname)
else:
_add(entry)
return names
@@ -17,8 +17,8 @@ Use browser nodes (with `tools: {policy: "all"}`) when:
## Available Browser Tools
All tools are prefixed with `browser_`:
- `browser_start`, `browser_open`, `browser_navigate` — launch/navigate
- `browser_click`, `browser_click_coordinate`, `browser_fill`, `browser_type`, `browser_type_focused` — interact
- `browser_open`, `browser_navigate` — both lazy-create the browser context, so a single `browser_open(url)` covers the cold path. To recover from a stale context, call `browser_stop` then `browser_open(url)` again.
- `browser_click`, `browser_click_coordinate`, `browser_type`, `browser_type_focused` — interact
- `browser_press` (with optional `modifiers=["ctrl"]` etc.) — keyboard shortcuts
- `browser_snapshot` — compact accessibility-tree read (structured)
<!-- vision-only -->
@@ -27,7 +27,7 @@ All tools are prefixed with `browser_`:
- `browser_shadow_query`, `browser_get_rect` — locate elements (shadow-piercing via `>>>`)
- `browser_scroll`, `browser_wait` — navigation helpers
- `browser_evaluate` — run JavaScript
- `browser_close`, `browser_close_finished` — tab cleanup
- `browser_close` — tab cleanup (call per tab; closes the active tab when `tab_id` is omitted)
## Pick the right reading tool
@@ -155,6 +155,58 @@ def get_preferred_worker_model() -> str | None:
return None
def get_vision_fallback_model() -> str | None:
"""Return the configured vision-fallback model, or None if not configured.
Reads from the ``vision_fallback`` section of ~/.hive/configuration.json.
Used by the agent-loop hook that captions tool-result images when the
main agent's model cannot accept image content (text-only LLMs).
When this returns None, the captioning chain's configured and retry
attempts both no-op (returning None); only the final
``gemini/gemini-3-flash-preview`` override has a chance to succeed,
and only if a ``GEMINI_API_KEY`` is set in the environment.
"""
vision = get_hive_config().get("vision_fallback", {})
if vision.get("provider") and vision.get("model"):
provider = str(vision["provider"])
model = str(vision["model"]).strip()
if provider.lower() == "openrouter" and model.lower().startswith("openrouter/"):
model = model[len("openrouter/") :]
if model:
return f"{provider}/{model}"
return None
def get_vision_fallback_api_key() -> str | None:
"""Return the API key for the vision-fallback model.
Resolution order: ``vision_fallback.api_key_env_var`` from the env,
then the default ``get_api_key()``. No subscription-token branches:
vision fallback is intended for hosted vision models (Anthropic,
OpenAI, Google), not for the subscription-bearer providers.
"""
vision = get_hive_config().get("vision_fallback", {})
if not vision:
return get_api_key()
api_key_env_var = vision.get("api_key_env_var")
if api_key_env_var:
return os.environ.get(api_key_env_var)
return get_api_key()
def get_vision_fallback_api_base() -> str | None:
"""Return the api_base for the vision-fallback model, or None."""
vision = get_hive_config().get("vision_fallback", {})
if not vision:
return None
if vision.get("api_base"):
return vision["api_base"]
if str(vision.get("provider", "")).lower() == "openrouter":
return OPENROUTER_API_BASE
return None
def get_worker_api_key() -> str | None:
"""Return the API key for the worker LLM, falling back to the default key."""
worker_llm = get_hive_config().get("worker_llm", {})
@@ -289,6 +289,24 @@ class AdenSyncProvider(CredentialProvider):
"""
synced = 0
# Echo which backend we're talking to and which key prefix we're using so a
# 401 can be diagnosed without enabling httpx debug logs. The key
# prefix is the safest discriminator: if the desktop minted a key
# against backend X but the runtime is hitting backend Y, the
# prefix in the log won't match the one the user finds in their
# ``hive_auth.bin`` (or in the dashboard's Keys panel).
cfg = self._client.config
api_key = cfg.api_key or ""
key_summary = (
f"{api_key[:8]}…{api_key[-4:]}" if len(api_key) >= 12 else "<short>"
)
logger.info(
"AdenSync: GET %s/v1/credentials key=%s len=%d",
cfg.base_url.rstrip("/"),
key_summary,
len(api_key),
)
try:
integrations = self._client.list_integrations()
@@ -16,9 +16,14 @@ import os
import stat
from pathlib import Path
# Resolved once at module import. ``framework.config.HIVE_HOME`` reads
# the desktop's ``HIVE_HOME`` env var at its own import time, so the
# runtime always sees the per-user root before this constant is computed.
from framework.config import HIVE_HOME as _HIVE_HOME
logger = logging.getLogger(__name__)
CREDENTIAL_KEY_PATH = Path.home() / ".hive" / "secrets" / "credential_key"
CREDENTIAL_KEY_PATH = _HIVE_HOME / "secrets" / "credential_key"
CREDENTIAL_KEY_ENV_VAR = "HIVE_CREDENTIAL_KEY"
ADEN_CREDENTIAL_ID = "aden_api_key"
ADEN_ENV_VAR = "ADEN_API_KEY"
@@ -128,7 +128,9 @@ class EncryptedFileStorage(CredentialStorage):
Initialize encrypted storage.
Args:
base_path: Directory for credential files. Defaults to ~/.hive/credentials.
base_path: Directory for credential files. Defaults to
``$HIVE_HOME/credentials`` (per-user) when HIVE_HOME is set,
else ``~/.hive/credentials``.
encryption_key: 32-byte Fernet key. If None, reads from env var.
key_env_var: Environment variable containing encryption key
"""
@@ -139,7 +141,14 @@ class EncryptedFileStorage(CredentialStorage):
"Encrypted storage requires 'cryptography'. Install with: uv pip install cryptography"
) from e
self.base_path = Path(base_path or self.DEFAULT_PATH).expanduser()
if base_path is None:
# Honor HIVE_HOME (set by the desktop shell to a per-user dir) so
# the encrypted store doesn't fork between ~/.hive and the desktop
# userData root. Falls back to ~/.hive/credentials when standalone.
from framework.config import HIVE_HOME
base_path = HIVE_HOME / "credentials"
self.base_path = Path(base_path).expanduser()
self._ensure_dirs()
self._key_env_var = key_env_var
@@ -510,7 +519,7 @@ class EnvVarStorage(CredentialStorage):
def exists(self, credential_id: str) -> bool:
"""Check if credential is available in environment."""
env_var = self._get_env_var_name(credential_id)
return self._read_env_value(env_var) is not None
return bool(self._read_env_value(env_var))
def add_mapping(self, credential_id: str, env_var: str) -> None:
"""
@@ -714,7 +714,7 @@ class CredentialStore:
@classmethod
def with_aden_sync(
cls,
base_url: str = "https://api.adenhq.com",
base_url: str | None = None,
cache_ttl_seconds: int = 300,
local_path: str | None = None,
auto_sync: bool = True,
@@ -745,13 +745,14 @@ class CredentialStore:
token = store.get_key("hubspot", "access_token")
"""
import os
from pathlib import Path
from .storage import EncryptedFileStorage
# Determine local storage path
if local_path is None:
local_path = str(Path.home() / ".hive" / "credentials")
from framework.config import HIVE_HOME
local_path = str(HIVE_HOME / "credentials")
local_storage = EncryptedFileStorage(base_path=local_path)
@@ -761,6 +762,14 @@ class CredentialStore:
logger.info("ADEN_API_KEY not set, using local-only credential storage")
return cls(storage=local_storage, **kwargs)
# Honor ADEN_API_URL when no explicit base_url was passed. The
# legacy default (https://api.adenhq.com) was a stale brand
# alias; the new canonical host is app.open-hive.com (matches
# cloud-deployed hive-backend) but local dev typically points
# at http://localhost:8889 via this env var.
if base_url is None:
base_url = os.environ.get("ADEN_API_URL", "https://app.open-hive.com")
# Try to setup Aden sync
try:
from .aden import (
@@ -258,6 +258,14 @@ class TestEnvVarStorage:
with pytest.raises(NotImplementedError):
storage.delete("test")
def test_exists_matches_load_for_empty_value(self):
"""Test exists() and load() stay consistent for empty values."""
storage = EnvVarStorage(env_mapping={"empty": "EMPTY_API_KEY"})
with patch.object(storage, "_read_env_value", return_value=""):
assert storage.load("empty") is None
assert not storage.exists("empty")
class TestEncryptedFileStorage:
"""Tests for EncryptedFileStorage."""
@@ -0,0 +1,95 @@
"""Read/write helpers for per-colony metadata.json.
A colony's metadata.json lives at ``{COLONIES_DIR}/{colony_name}/metadata.json``
and holds immutable provenance: the queen that created it, the forked
session id, creation/update timestamps, and the list of workers.
Mutable user-editable tool configuration lives in a sibling
``tools.json`` sidecar (see :mod:`framework.host.colony_tools_config`)
so identity and tool gating evolve independently.
"""
from __future__ import annotations
import json
import logging
from pathlib import Path
from typing import Any
from framework.config import COLONIES_DIR
logger = logging.getLogger(__name__)
def colony_metadata_path(colony_name: str) -> Path:
"""Return the on-disk path to a colony's metadata.json."""
return COLONIES_DIR / colony_name / "metadata.json"
def load_colony_metadata(colony_name: str) -> dict[str, Any]:
"""Load metadata.json for ``colony_name``.
Returns an empty dict if the file is missing or malformed; callers
are expected to treat missing fields as defaults.
"""
path = colony_metadata_path(colony_name)
if not path.exists():
return {}
try:
data = json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
logger.warning("Failed to read colony metadata at %s", path)
return {}
return data if isinstance(data, dict) else {}
def update_colony_metadata(colony_name: str, updates: dict[str, Any]) -> dict[str, Any]:
"""Shallow-merge ``updates`` into metadata.json and persist.
Returns the full updated dict. Raises ``FileNotFoundError`` if the
colony does not exist. Writes atomically via ``os.replace`` to
minimize the window where a reader could see a half-written file.
"""
import os
import tempfile
path = colony_metadata_path(colony_name)
if not path.parent.exists():
raise FileNotFoundError(f"Colony '{colony_name}' not found")
data = load_colony_metadata(colony_name) if path.exists() else {}
for key, value in updates.items():
data[key] = value
path.parent.mkdir(parents=True, exist_ok=True)
fd, tmp_path = tempfile.mkstemp(
prefix=".metadata.",
suffix=".json.tmp",
dir=str(path.parent),
)
try:
with os.fdopen(fd, "w", encoding="utf-8") as fh:
json.dump(data, fh, indent=2)
fh.flush()
os.fsync(fh.fileno())
os.replace(tmp_path, path)
except BaseException:
try:
os.unlink(tmp_path)
except OSError:
pass
raise
return data
def list_colony_names() -> list[str]:
"""Return the names of every colony that has a metadata.json on disk."""
if not COLONIES_DIR.is_dir():
return []
names: list[str] = []
for entry in sorted(COLONIES_DIR.iterdir()):
if not entry.is_dir():
continue
if (entry / "metadata.json").exists():
names.append(entry.name)
return names
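`update_colony_metadata` relies on the standard temp-file-plus-`os.replace` idiom. Extracted on its own, the pattern looks like this (same shape, not the framework's helper):

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    # Write to a temp file in the *destination* directory — os.replace
    # is only atomic within one filesystem — then fsync and swap in.
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".json.tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # readers see the old or new file, never half
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

Cleaning up on `BaseException` (not just `Exception`) means a `KeyboardInterrupt` mid-write doesn't leave a stray `.tmp` file behind.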
@@ -185,6 +185,8 @@ class ColonyRuntime:
protocols_prompt: str = "",
skill_dirs: list[str] | None = None,
pipeline_stages: list | None = None,
queen_id: str | None = None,
colony_name: str | None = None,
):
from framework.pipeline.runner import PipelineRunner
from framework.skills.manager import SkillsManager
@@ -193,14 +195,27 @@ class ColonyRuntime:
self._goal = goal
self._config = config or ColonyConfig()
self._runtime_log_store = runtime_log_store
self._queen_id: str | None = queen_id
# ``colony_id`` is the event-bus scope (session.id in DM sessions);
# ``colony_name`` is the on-disk identity under ~/.hive/colonies/.
# They coincide for forked colonies but diverge for queen DM
# sessions, so separate them explicitly.
self._colony_name: str | None = colony_name
if pipeline_stages:
self._pipeline = PipelineRunner(pipeline_stages)
else:
self._pipeline = self._load_pipeline_from_config()
if skills_manager_config is not None:
self._skills_manager = SkillsManager(skills_manager_config)
# Resolve per-colony override paths so UI toggles can reach this
# runtime. Callers that build their own SkillsManagerConfig stay
# in charge; bare construction auto-wires the standard paths.
_effective_cfg = skills_manager_config
if _effective_cfg is None and not (skills_catalog_prompt or protocols_prompt):
_effective_cfg = self._build_default_skills_config(colony_name, queen_id)
if _effective_cfg is not None:
self._skills_manager = SkillsManager(_effective_cfg)
self._skills_manager.load()
elif skills_catalog_prompt or protocols_prompt:
import warnings
@@ -221,6 +236,28 @@ class ColonyRuntime:
self.batch_init_nudge: str | None = self._skills_manager.batch_init_nudge
self._colony_id: str = colony_id or "primary"
# Ensure the colony task template exists. Idempotent — if the
# colony was created previously, this is a no-op (it just stamps
# last_seen_session_ids if a session id is provided later).
try:
import asyncio as _asyncio
from framework.tasks import TaskListRole, get_task_store
from framework.tasks.scoping import colony_task_list_id
_store = get_task_store()
_list_id = colony_task_list_id(self._colony_id)
try:
# Best-effort: schedule on the running loop, or do it inline
# if no loop is yet running (e.g. during construction).
_loop = _asyncio.get_running_loop()
_loop.create_task(_store.ensure_task_list(_list_id, role=TaskListRole.TEMPLATE))
except RuntimeError:
_asyncio.run(_store.ensure_task_list(_list_id, role=TaskListRole.TEMPLATE))
except Exception:
logger.debug("Failed to ensure colony task template", exc_info=True)
self._accounts_prompt = accounts_prompt
self._accounts_data = accounts_data
self._tool_provider_map = tool_provider_map
@@ -238,10 +275,33 @@ class ColonyRuntime:
self._event_bus = event_bus or EventBus(max_history=self._config.max_history)
self._scoped_event_bus = StreamEventBus(self._event_bus, self._colony_id)
# Make the event bus visible to the task-system event emitters so
# task lifecycle events fan out to the same bus the rest of the
# system uses. Idempotent — last writer wins.
try:
from framework.tasks.events import set_default_event_bus
set_default_event_bus(self._event_bus)
except Exception:
logger.debug("Failed to register default task event bus", exc_info=True)
self._llm = llm
self._tools = tools or []
self._tool_executor = tool_executor
# Per-colony MCP tool allowlist — applied when spawning workers. A
# value of ``None`` means "allow every MCP tool" (default), an empty
# list disables every MCP tool, and a list of names only enables
# those. Lifecycle / synthetic tools always pass through the filter
# because their names are absent from ``_mcp_tool_names_all``. The
# allowlist is re-read on every ``spawn`` so a PATCH that mutates
# this attribute via ``set_tool_allowlist`` takes effect on the
# NEXT worker spawn without a runtime restart. In-flight workers
# keep the tool list they booted with — workers have no dynamic
# tools provider today.
self._enabled_mcp_tools: list[str] | None = None
self._mcp_tool_names_all: set[str] = set()
# Worker management
self._workers: dict[str, Worker] = {}
# The persistent client-facing overseer (optional). Set by
@@ -359,6 +419,19 @@ class ColonyRuntime:
def _apply_pipeline_results(self) -> None:
for stage in self._pipeline.stages:
if stage.tool_registry is not None:
# Register task tools on the same registry every worker
# pulls from. Done here (not at worker spawn) so the
# colony's `_tools` snapshot includes them.
try:
from framework.tasks.tools import register_task_tools
register_task_tools(stage.tool_registry)
except Exception:
logger.warning(
"Failed to register task tools on pipeline registry",
exc_info=True,
)
tools = list(stage.tool_registry.get_tools().values())
if tools:
self._tools = tools
@@ -384,6 +457,136 @@ class ColonyRuntime:
return PipelineRunner([])
return build_pipeline_from_config(stages_config)
@staticmethod
def _build_default_skills_config(
colony_name: str | None,
queen_id: str | None,
) -> SkillsManagerConfig:
"""Assemble a ``SkillsManagerConfig`` that wires in the per-colony /
per-queen override files and the ``queen_ui`` / ``colony_ui`` scope
dirs based on the standard ``~/.hive`` layout.
``colony_name`` must be an actual on-disk colony name
(``~/.hive/colonies/{name}/``). DM sessions where the ``colony_id``
is a session UUID should pass ``None`` so we don't create a stray
override file under a session identifier.
"""
from framework.config import COLONIES_DIR, QUEENS_DIR
from framework.skills.discovery import ExtraScope
from framework.skills.manager import SkillsManagerConfig
extras: list[ExtraScope] = []
queen_overrides_path: Path | None = None
if queen_id:
queen_home = QUEENS_DIR / queen_id
queen_overrides_path = queen_home / "skills_overrides.json"
extras.append(ExtraScope(directory=queen_home / "skills", label="queen_ui", priority=2))
colony_overrides_path: Path | None = None
if colony_name:
colony_home = COLONIES_DIR / colony_name
colony_overrides_path = colony_home / "skills_overrides.json"
# Surface both the new flat ``skills/`` (where new skills are
# written) and the legacy nested ``.hive/skills/`` (left intact
# for pre-flatten colonies) as tagged ``colony_ui`` scopes, so
# UI-created entries resolve with correct provenance regardless
# of which on-disk layout the colony has.
extras.append(
ExtraScope(
directory=colony_home / "skills",
label="colony_ui",
priority=3,
)
)
extras.append(
ExtraScope(
directory=colony_home / ".hive" / "skills",
label="colony_ui",
priority=3,
)
)
return SkillsManagerConfig(
queen_id=queen_id,
queen_overrides_path=queen_overrides_path,
colony_name=colony_name,
colony_overrides_path=colony_overrides_path,
extra_scope_dirs=extras,
interactive=False, # HTTP-driven runtimes never prompt for consent
)
@property
def queen_id(self) -> str | None:
"""The queen that owns this runtime, if known."""
return self._queen_id
@property
def colony_name(self) -> str | None:
"""The on-disk colony name (distinct from event-bus scope ``colony_id``)."""
return self._colony_name
@property
def skills_manager(self):
"""Access the live :class:`SkillsManager` (for HTTP handlers)."""
return self._skills_manager
async def reload_skills(self) -> dict[str, Any]:
"""Rebuild the catalog after an override change; in-flight workers
pick up the new catalog on their next iteration via
``dynamic_skills_catalog_provider``.
Returns a small stats dict that HTTP handlers can echo back to
the UI ("applied — N skills now in catalog").
"""
async with self._skills_manager.mutation_lock:
self._skills_manager.reload()
self.skill_dirs = self._skills_manager.allowlisted_dirs
self.batch_init_nudge = self._skills_manager.batch_init_nudge
self.context_warn_ratio = self._skills_manager.context_warn_ratio
catalog_prompt = self._skills_manager.skills_catalog_prompt
return {
"catalog_chars": len(catalog_prompt),
"skill_dirs": list(self.skill_dirs),
}
# ── Per-colony tool allowlist ───────────────────────────────
def set_tool_allowlist(
self,
enabled_mcp_tools: list[str] | None,
mcp_tool_names_all: set[str] | None = None,
) -> None:
"""Configure the per-colony MCP tool allowlist.
Called at construction time (from SessionManager) and again from
the ``/api/colony/{name}/tools`` PATCH handler when a user edits
the allowlist. The change applies to the NEXT worker spawn we
never mutate the tool list of a worker that is already running
(workers have no dynamic tools provider, so hot-reloading their
tool set would diverge from the list the LLM was already using).
"""
self._enabled_mcp_tools = list(enabled_mcp_tools) if enabled_mcp_tools is not None else None
if mcp_tool_names_all is not None:
self._mcp_tool_names_all = set(mcp_tool_names_all)
def _apply_tool_allowlist(self, tools: list) -> list:
"""Filter ``tools`` against the colony's MCP allowlist.
Lifecycle / synthetic tools (those whose names are NOT in
``_mcp_tool_names_all``) are never gated. MCP tools are kept only
when ``_enabled_mcp_tools`` is None (default allow) or contains
their name. Input list order is preserved so downstream cache
keys and logs stay stable.
"""
if self._enabled_mcp_tools is None:
return tools
allowed = set(self._enabled_mcp_tools)
return [
t
for t in tools
if getattr(t, "name", None) not in self._mcp_tool_names_all or getattr(t, "name", None) in allowed
]
# ── Lifecycle ───────────────────────────────────────────────
async def start(self) -> None:
@@ -622,6 +825,7 @@ class ColonyRuntime:
tools: list[Any] | None = None,
tool_executor: Callable | None = None,
stream_id: str | None = None,
profile_name: str | None = None,
) -> list[str]:
"""Spawn worker clones and start them in the background.
@@ -651,13 +855,33 @@ class ColonyRuntime:
raise RuntimeError("ColonyRuntime is not running")
from framework.agent_loop.agent_loop import AgentLoop
from framework.host.worker_profiles import get_worker_profile
from framework.storage.conversation_store import FileConversationStore
# Resolve the profile binding for this spawn. ``profile_name=None``
# means "use the default profile"; an unknown name silently falls
# back to default (the legacy single-template behavior). The
# resolved integrations map is threaded into Worker(...) so
# account_overrides() can pin its MCP tool calls.
_resolved_profile = (
get_worker_profile(self._colony_id, profile_name) if profile_name else None
)
_profile_name_resolved = _resolved_profile.name if _resolved_profile else (profile_name or "")
_profile_integrations = dict(_resolved_profile.integrations) if _resolved_profile else {}
# Resolve per-spawn vs colony-default code identity
spawn_spec = agent_spec or self._agent_spec
spawn_tools = tools if tools is not None else self._tools
spawn_executor = tool_executor or self._tool_executor
# Apply the per-colony MCP tool allowlist (if any). Done HERE —
# after spawn_tools is resolved but before it's frozen into the
# worker's AgentContext — so the next spawn reflects any PATCH
# that happened since the last spawn. A value of ``None`` on
# ``_enabled_mcp_tools`` is a no-op so the default path is
# unchanged.
spawn_tools = self._apply_tool_allowlist(spawn_tools)
# Colony progress tracker: when the caller supplied a db_path
# in input_data, this worker is part of a SQLite task queue
# and must see the hive.colony-progress-tracker skill body in
@@ -740,6 +964,34 @@ class ColonyRuntime:
conversation_store=worker_conv_store,
)
# Workers pick up UI-driven override changes via this provider,
# which reads the live catalog on each iteration. The db_path
# pre-activated catalog stays static because its contents are
# built for *this* worker's task (a tombstone toggle from the
# UI should not yank it mid-run).
_db_path_pre_activated = bool(isinstance(input_data, dict) and input_data.get("db_path"))
# Default-bind the manager into the closure so each loop iteration
# captures the same manager instance; flake8-bugbear B023 would flag a
# free-variable capture here.
_provider = None if _db_path_pre_activated else (lambda mgr=self._skills_manager: mgr.skills_catalog_prompt)
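The default-arg binding above avoids the classic late-binding closure bug; a minimal illustration (variable names invented for the sketch):

```python
# Free variables in a lambda are looked up when the lambda runs, not when
# it is created, so every closure made in a loop sees the loop's final value.
buggy, fixed = [], []
for label in ("a", "b"):
    buggy.append(lambda: label)          # late-bound free variable
    fixed.append(lambda lbl=label: lbl)  # default arg binds the value now

results_buggy = [f() for f in buggy]  # both closures return "b"
results_fixed = [f() for f in fixed]  # "a", then "b"
```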
# Task-system fields. Each worker owns its session task list;
# picked_up_from records the colony template entry it was
# spawned for, when applicable.
from framework.tasks.scoping import (
colony_task_list_id as _colony_list_id,
session_task_list_id as _session_list_id,
)
_worker_list_id = _session_list_id(worker_id, worker_id)
_picked_up = None
_template_id = input_data.get("__template_task_id") if isinstance(input_data, dict) else None
if _template_id is not None:
try:
_picked_up = (_colony_list_id(self._colony_id), int(_template_id))
except (TypeError, ValueError):
_picked_up = None
agent_context = AgentContext(
runtime=self._make_runtime_adapter(worker_id),
agent_id=worker_id,
@@ -753,8 +1005,12 @@ class ColonyRuntime:
skills_catalog_prompt=_spawn_catalog,
protocols_prompt=self.protocols_prompt,
skill_dirs=_spawn_skill_dirs,
dynamic_skills_catalog_provider=_provider,
execution_id=worker_id,
stream_id=explicit_stream_id or f"worker:{worker_id}",
task_list_id=_worker_list_id,
colony_id=self._colony_id,
picked_up_from=_picked_up,
)
worker = Worker(
@@ -765,6 +1021,8 @@ class ColonyRuntime:
event_bus=self._scoped_event_bus,
colony_id=self._colony_id,
storage_path=worker_storage,
profile_name=_profile_name_resolved,
integrations=_profile_integrations,
)
self._workers[worker_id] = worker
@@ -787,6 +1045,7 @@ class ColonyRuntime:
tasks: list[dict[str, Any]],
*,
tools_override: list[Any] | None = None,
profile_name: str | None = None,
) -> list[str]:
"""Spawn a batch of parallel workers, one per task spec.
@@ -812,11 +1071,15 @@ class ColonyRuntime:
task_data = spec.get("data")
if task_data is not None and not isinstance(task_data, dict):
task_data = {"value": task_data}
# Per-task profile_name override beats the batch-level default,
# so a fan-out can mix profiles (e.g. half tasks routed to
# Slack:work and half to Slack:personal).
ids = await self.spawn(
task=task_text,
count=1,
input_data=task_data or {"task": task_text},
tools=tools_override,
profile_name=spec.get("profile_name") or profile_name,
)
worker_ids.extend(ids)
return worker_ids
@@ -997,6 +1260,7 @@ class ColonyRuntime:
conversation_store=overseer_conv_store,
)
_overseer_skills_mgr = self._skills_manager
overseer_ctx = AgentContext(
runtime=self._make_runtime_adapter(overseer_id),
agent_id=overseer_id,
@@ -1010,6 +1274,7 @@ class ColonyRuntime:
skills_catalog_prompt=self.skills_catalog_prompt,
protocols_prompt=self.protocols_prompt,
skill_dirs=self.skill_dirs,
dynamic_skills_catalog_provider=lambda: _overseer_skills_mgr.skills_catalog_prompt,
execution_id=overseer_id,
stream_id="overseer",
)
+162
@@ -0,0 +1,162 @@
"""Per-colony tool configuration sidecar (``tools.json``).
Lives at ``~/.hive/colonies/{colony_name}/tools.json`` alongside
``metadata.json``. Kept separate so provenance (queen_name,
created_at, workers) stays in metadata while the user-editable tool
allowlist gets its own file.
Schema::
{
"enabled_mcp_tools": ["read_file", ...] | null,
"updated_at": "2026-04-21T12:34:56+00:00"
}
- ``null`` / missing file: the default, "allow every MCP tool".
- ``[]``: explicitly disable every MCP tool.
- ``["foo", "bar"]``: only those MCP tool names pass the filter.
Atomic writes via ``os.replace`` mirror
``framework.host.colony_metadata.update_colony_metadata``.
"""
from __future__ import annotations
import json
import logging
import os
import tempfile
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from framework.config import COLONIES_DIR
logger = logging.getLogger(__name__)
def tools_config_path(colony_name: str) -> Path:
"""Return the on-disk path to a colony's ``tools.json``."""
return COLONIES_DIR / colony_name / "tools.json"
def _metadata_path(colony_name: str) -> Path:
return COLONIES_DIR / colony_name / "metadata.json"
def _atomic_write_json(path: Path, data: dict[str, Any]) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
fd, tmp = tempfile.mkstemp(
prefix=".tools.",
suffix=".json.tmp",
dir=str(path.parent),
)
try:
with os.fdopen(fd, "w", encoding="utf-8") as fh:
json.dump(data, fh, indent=2)
fh.flush()
os.fsync(fh.fileno())
os.replace(tmp, path)
except BaseException:
try:
os.unlink(tmp)
except OSError:
pass
raise
def _migrate_from_metadata_if_needed(colony_name: str) -> list[str] | None:
"""Hoist a legacy ``enabled_mcp_tools`` field out of ``metadata.json``.
Returns the migrated value (or ``None`` if nothing to migrate). After
migration the sidecar exists and ``metadata.json`` no longer contains
``enabled_mcp_tools``. Safe to call repeatedly.
"""
meta_path = _metadata_path(colony_name)
if not meta_path.exists():
return None
try:
data = json.loads(meta_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
logger.warning("Could not read metadata.json during tools migration: %s", colony_name)
return None
if not isinstance(data, dict) or "enabled_mcp_tools" not in data:
return None
raw = data.pop("enabled_mcp_tools")
enabled: list[str] | None
if raw is None:
enabled = None
elif isinstance(raw, list) and all(isinstance(x, str) for x in raw):
enabled = raw
else:
logger.warning(
"Legacy enabled_mcp_tools on colony %s had unexpected shape %r; dropping",
colony_name,
raw,
)
enabled = None
# Sidecar first so a partial failure leaves the config recoverable.
_atomic_write_json(
tools_config_path(colony_name),
{
"enabled_mcp_tools": enabled,
"updated_at": datetime.now(UTC).isoformat(),
},
)
_atomic_write_json(meta_path, data)
logger.info(
"Migrated enabled_mcp_tools for colony %s from metadata.json to tools.json",
colony_name,
)
return enabled
def load_colony_tools_config(colony_name: str) -> list[str] | None:
"""Return the colony's MCP tool allowlist, or ``None`` for default-allow.
Order of resolution:
1. ``tools.json`` sidecar (authoritative).
2. Legacy ``metadata.json`` field (migrated and deleted on first read).
3. ``None``: the default, "allow every MCP tool".
"""
path = tools_config_path(colony_name)
if path.exists():
try:
data = json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
logger.warning("Invalid %s; treating as default-allow", path)
return None
if not isinstance(data, dict):
return None
raw = data.get("enabled_mcp_tools")
if raw is None:
return None
if isinstance(raw, list) and all(isinstance(x, str) for x in raw):
return raw
logger.warning("Unexpected enabled_mcp_tools shape in %s; ignoring", path)
return None
return _migrate_from_metadata_if_needed(colony_name)
def update_colony_tools_config(
colony_name: str,
enabled_mcp_tools: list[str] | None,
) -> list[str] | None:
"""Persist a colony's MCP allowlist to ``tools.json``.
Raises ``FileNotFoundError`` if the colony's directory is missing.
"""
colony_dir = COLONIES_DIR / colony_name
if not colony_dir.exists():
raise FileNotFoundError(f"Colony directory not found: {colony_name}")
_atomic_write_json(
tools_config_path(colony_name),
{
"enabled_mcp_tools": enabled_mcp_tools,
"updated_at": datetime.now(UTC).isoformat(),
},
)
return enabled_mcp_tools
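A standalone sketch of the three allowlist states round-tripping through a ``tools.json``-style sidecar; the helpers and the temp-dir layout here are stand-ins for the real ``COLONIES_DIR`` plumbing (no atomic-write or migration logic).

```python
# Hedged sketch: the load side maps "missing or malformed" to default-allow,
# exactly the resolution order the real loader documents.
import json
import tempfile
from pathlib import Path

def save_allowlist(path: Path, enabled) -> None:
    path.write_text(json.dumps({"enabled_mcp_tools": enabled}))

def load_allowlist(path: Path):
    if not path.exists():
        return None  # missing file: default-allow
    raw = json.loads(path.read_text()).get("enabled_mcp_tools")
    return raw if isinstance(raw, list) else None  # null/bad shape: default-allow

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "tools.json"
    states = [load_allowlist(p)]      # missing file
    save_allowlist(p, [])
    states.append(load_allowlist(p))  # explicit deny-all
    save_allowlist(p, ["read_file"])
    states.append(load_allowlist(p))  # explicit allowlist
```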
+24 -2
@@ -42,7 +42,9 @@ def _open_event_log() -> IO[str] | None:
return None
raw = _DEBUG_EVENTS_RAW
if raw.lower() in ("1", "true", "full"):
log_dir = Path.home() / ".hive" / "event_logs"
from framework.config import HIVE_HOME
log_dir = HIVE_HOME / "event_logs"
else:
log_dir = Path(raw)
log_dir.mkdir(parents=True, exist_ok=True)
@@ -165,6 +167,14 @@ class EventType(StrEnum):
TRIGGER_REMOVED = "trigger_removed"
TRIGGER_UPDATED = "trigger_updated"
# Task system lifecycle (per-list diffs streamed to the UI)
TASK_CREATED = "task_created"
TASK_UPDATED = "task_updated"
TASK_DELETED = "task_deleted"
TASK_LIST_RESET = "task_list_reset"
TASK_LIST_REATTACH_MISMATCH = "task_list_reattach_mismatch"
COLONY_TEMPLATE_ASSIGNMENT = "colony_template_assignment"
@dataclass
class AgentEvent:
@@ -809,16 +819,28 @@ class EventBus:
input_tokens: int,
output_tokens: int,
cached_tokens: int = 0,
cache_creation_tokens: int = 0,
cost_usd: float = 0.0,
execution_id: str | None = None,
iteration: int | None = None,
) -> None:
"""Emit LLM turn completion with stop reason and model metadata."""
"""Emit LLM turn completion with stop reason and model metadata.
``cached_tokens`` and ``cache_creation_tokens`` are subsets of
``input_tokens`` (already inside provider ``prompt_tokens``).
Subscribers should display them, not add them to a total.
``cost_usd`` is the USD cost for this turn when known (Anthropic,
OpenAI, OpenRouter). A value of 0.0 means the cost was unreported,
not that the turn was free.
"""
data: dict = {
"stop_reason": stop_reason,
"model": model,
"input_tokens": input_tokens,
"output_tokens": output_tokens,
"cached_tokens": cached_tokens,
"cache_creation_tokens": cache_creation_tokens,
"cost_usd": cost_usd,
}
if iteration is not None:
data["iteration"] = iteration
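Given the field semantics above, a subscriber tallies usage like this (the numbers are illustrative only):

```python
# cached_tokens and cache_creation_tokens are subsets of input_tokens,
# so the turn total is input + output; adding the cache fields on top
# would double-count them.
event = {
    "input_tokens": 1200,
    "output_tokens": 300,
    "cached_tokens": 1000,         # prompt-cache hits, already inside input_tokens
    "cache_creation_tokens": 150,  # cache writes, already inside input_tokens
}
turn_total = event["input_tokens"] + event["output_tokens"]
uncached_input = event["input_tokens"] - event["cached_tokens"]
```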
+4 -2
@@ -7,7 +7,7 @@ verify SOP gates before marking a task done. This gives cross-run memory
that the existing per-iteration stall detectors don't have.
The DB is driven by agents via the ``sqlite3`` CLI through
``execute_command_tool``. This module handles framework-side lifecycle:
``terminal_exec``. This module handles framework-side lifecycle:
creation, migration, queen-side bulk seeding, stale-claim reclamation.
Concurrency model:
@@ -264,7 +264,9 @@ def ensure_all_colony_dbs(colonies_root: Path | None = None) -> list[Path]:
run the stale-claim reclaimer on all of them in one pass.
"""
if colonies_root is None:
colonies_root = Path.home() / ".hive" / "colonies"
from framework.config import COLONIES_DIR
colonies_root = COLONIES_DIR
if not colonies_root.is_dir():
return []
+35 -3
@@ -54,6 +54,11 @@ class WorkerInfo:
status: WorkerStatus
started_at: float = 0.0
result: WorkerResult | None = None
# Name of the colony's worker profile this worker was spawned from.
# Empty for legacy / single-template colonies. Surfaced in the UI so
# the user can see "Worker w_42 in colony X is using profile slack-work"
# and reason about which authorized account this run is touching.
profile_name: str = ""
class Worker:
@@ -79,6 +84,8 @@ class Worker:
colony_id: str = "",
persistent: bool = False,
storage_path: Path | None = None,
profile_name: str = "",
integrations: dict[str, str] | None = None,
):
self.id = worker_id
self.task = task
@@ -88,6 +95,12 @@ class Worker:
self._event_bus = event_bus
self._colony_id = colony_id
self._persistent = persistent
# Worker profile binding. ``integrations`` is a {provider_id: alias}
# map applied as default account overrides for every MCP tool call
# this worker makes (see CredentialStoreAdapter.account_overrides).
# An explicit ``account="..."`` arg on a tool call still wins.
self._profile_name = profile_name
self._integrations: dict[str, str] = dict(integrations or {})
# Canonical on-disk home for this worker (conversations, events,
# result.json, data). Required when seed_conversation() is used —
# we deliberately do NOT fall back to CWD, which previously caused
@@ -114,6 +127,7 @@ class Worker:
status=self.status,
started_at=self._started_at,
result=self._result,
profile_name=self._profile_name,
)
@property
@@ -154,17 +168,35 @@ class Worker:
# value without affecting the queen's ongoing calls.
try:
from framework.loader.tool_registry import ToolRegistry
from framework.tasks.scoping import session_task_list_id
ToolRegistry.set_execution_context(profile=self.id)
ctx = self._context
agent_id = getattr(ctx, "agent_id", None) or self.id
list_id = getattr(ctx, "task_list_id", None) or session_task_list_id(agent_id, self.id)
ToolRegistry.set_execution_context(
profile=self.id,
agent_id=agent_id,
task_list_id=list_id,
colony_id=getattr(ctx, "colony_id", None),
picked_up_from=getattr(ctx, "picked_up_from", None),
)
except Exception:
logger.debug(
"Worker %s: failed to scope browser profile",
"Worker %s: failed to scope execution context",
self.id,
exc_info=True,
)
# Pin MCP tool calls to this worker's profile-bound aliases. Empty
# mapping is a no-op so ephemeral workers and legacy single-profile
# colonies are unaffected. The contextvar is propagated to all
# awaited child coroutines, so every tool invocation downstream of
# ``execute`` sees the binding without further plumbing.
from aden_tools.credentials.store_adapter import account_overrides
try:
result = await self._agent_loop.execute(self._context)
with account_overrides(self._integrations):
result = await self._agent_loop.execute(self._context)
duration = time.monotonic() - self._started_at
if result.success:
+207
@@ -0,0 +1,207 @@
"""Worker profile data model and per-colony persistence.
A colony today has a single worker template at ``{colony_dir}/worker.json``,
spawned as N parallel clones with identical tools, prompt, and credentials.
Worker profiles let the queen declare multiple templates per colony, each
with its own credential aliases (e.g. one profile pinned to Slack workspace
"work" and another to "personal").
Layout::
{COLONIES_DIR}/{colony_name}/
worker.json # legacy / "default" profile
profiles/
slack-work/worker.json
slack-personal/worker.json
metadata.json # has worker_profiles: [{...}, ...]
The default profile keeps living at ``{colony_dir}/worker.json`` so existing
colonies and code that hardcodes that path stay correct. Named profiles live
under ``profiles/<name>/`` and are read through :func:`worker_spec_path`.
"""
from __future__ import annotations
import logging
import re
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any
from framework.config import COLONIES_DIR
from framework.host.colony_metadata import (
colony_metadata_path,
load_colony_metadata,
update_colony_metadata,
)
logger = logging.getLogger(__name__)
DEFAULT_PROFILE_NAME = "default"
_PROFILE_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
@dataclass
class WorkerProfile:
"""Template for a worker spawned within a colony.
``integrations`` maps provider id (``slack``, ``google``, ``github``) to
the alias of the connected account this profile should use. The runtime
sets these aliases as defaults on MCP tool calls; an explicit
``account="..."`` argument on a call still wins.
"""
name: str
task: str = ""
skill_name: str = ""
integrations: dict[str, str] = field(default_factory=dict)
concurrency_hint: int | None = None
prompt_override: str | None = None
tool_filter: list[str] | None = None
def to_dict(self) -> dict[str, Any]:
d = asdict(self)
# Drop None / empty fields so on-disk metadata stays tidy.
if d.get("prompt_override") is None:
d.pop("prompt_override", None)
if d.get("tool_filter") is None:
d.pop("tool_filter", None)
if d.get("concurrency_hint") is None:
d.pop("concurrency_hint", None)
return d
@classmethod
def from_dict(cls, data: dict[str, Any]) -> WorkerProfile:
return cls(
name=str(data.get("name", "")).strip(),
task=str(data.get("task", "")),
skill_name=str(data.get("skill_name", "")),
integrations={
str(k): str(v)
for k, v in (data.get("integrations") or {}).items()
if str(k) and str(v)
},
concurrency_hint=(
int(data["concurrency_hint"])
if isinstance(data.get("concurrency_hint"), int) and data["concurrency_hint"] > 0
else None
),
prompt_override=(data.get("prompt_override") or None),
tool_filter=list(data["tool_filter"]) if isinstance(data.get("tool_filter"), list) else None,
)
def validate_profile_name(name: str) -> str | None:
"""Return an error message if ``name`` is invalid, else ``None``."""
if not isinstance(name, str) or not _PROFILE_NAME_RE.match(name):
return (
"profile name must be lowercase alphanumeric (with - or _), "
"start with a letter/digit, and be ≤64 characters"
)
return None
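The pattern above accepts and rejects names like so (example names invented; the regex is copied verbatim from ``_PROFILE_NAME_RE``):

```python
# First char must be lowercase alphanumeric, then up to 63 more of
# [a-z0-9_-], so names are at most 64 characters total.
import re

PROFILE_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")

accepted = [n for n in ("slack-work", "w_42", "a") if PROFILE_NAME_RE.match(n)]
rejected = [n for n in ("Slack", "-lead", "", "a" * 65) if not PROFILE_NAME_RE.match(n)]
```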
def worker_spec_path(colony_name: str, profile_name: str | None = None) -> Path:
"""Return the on-disk path to a profile's ``worker.json``.
The default / unnamed profile lives at ``{colony_dir}/worker.json``
(legacy location). Named profiles live at
``{colony_dir}/profiles/{profile_name}/worker.json``.
"""
colony_dir = COLONIES_DIR / colony_name
if not profile_name or profile_name == DEFAULT_PROFILE_NAME:
return colony_dir / "worker.json"
return colony_dir / "profiles" / profile_name / "worker.json"
def list_worker_profiles(colony_name: str) -> list[WorkerProfile]:
"""Return the colony's declared worker profiles.
Legacy colonies (no ``worker_profiles`` field in metadata.json) get a
synthetic single-entry list with the default profile, so dispatch logic
elsewhere can treat the profile registry as always non-empty.
"""
metadata = load_colony_metadata(colony_name)
raw = metadata.get("worker_profiles")
if not isinstance(raw, list) or not raw:
return [WorkerProfile(name=DEFAULT_PROFILE_NAME)]
profiles: list[WorkerProfile] = []
seen: set[str] = set()
for entry in raw:
if not isinstance(entry, dict):
continue
profile = WorkerProfile.from_dict(entry)
if not profile.name or profile.name in seen:
continue
if validate_profile_name(profile.name) is not None:
logger.warning(
"worker_profiles: skipping invalid profile name %r in colony %s",
profile.name,
colony_name,
)
continue
seen.add(profile.name)
profiles.append(profile)
if not profiles:
return [WorkerProfile(name=DEFAULT_PROFILE_NAME)]
return profiles
def get_worker_profile(colony_name: str, profile_name: str) -> WorkerProfile | None:
"""Return one profile by name, or ``None`` if not declared."""
for profile in list_worker_profiles(colony_name):
if profile.name == profile_name:
return profile
return None
def save_worker_profiles(colony_name: str, profiles: list[WorkerProfile]) -> list[WorkerProfile]:
"""Persist ``profiles`` to the colony's metadata.json.
Validates names, deduplicates, and refuses to write an empty list (use
the default profile representation instead). Returns the canonicalized
list as written.
"""
if not colony_metadata_path(colony_name).parent.exists():
raise FileNotFoundError(f"Colony '{colony_name}' not found")
canonical: list[WorkerProfile] = []
seen: set[str] = set()
for profile in profiles:
err = validate_profile_name(profile.name)
if err is not None:
raise ValueError(err)
if profile.name in seen:
raise ValueError(f"duplicate worker profile name: {profile.name!r}")
seen.add(profile.name)
canonical.append(profile)
if not canonical:
canonical = [WorkerProfile(name=DEFAULT_PROFILE_NAME)]
update_colony_metadata(colony_name, {"worker_profiles": [p.to_dict() for p in canonical]})
return canonical
def upsert_worker_profile(colony_name: str, profile: WorkerProfile) -> list[WorkerProfile]:
"""Insert or replace a single profile, preserving siblings."""
err = validate_profile_name(profile.name)
if err is not None:
raise ValueError(err)
existing = list_worker_profiles(colony_name)
out = [p for p in existing if p.name != profile.name]
out.append(profile)
return save_worker_profiles(colony_name, out)
def delete_worker_profile(colony_name: str, profile_name: str) -> bool:
"""Remove a profile by name. Returns True if a profile was removed.
Refuses to remove the default profile so dispatch always has a fallback.
"""
if profile_name == DEFAULT_PROFILE_NAME:
raise ValueError("cannot delete the default worker profile")
existing = list_worker_profiles(colony_name)
out = [p for p in existing if p.name != profile_name]
if len(out) == len(existing):
return False
save_worker_profiles(colony_name, out)
return True
+10 -2
@@ -23,6 +23,7 @@ from collections.abc import AsyncIterator, Callable, Iterator
from pathlib import Path
from typing import Any
from framework.config import HIVE_HOME as _HIVE_HOME
from framework.llm.provider import LLMProvider, LLMResponse, Tool
from framework.llm.stream_events import (
FinishEvent,
@@ -50,8 +51,8 @@ _ENDPOINTS = [
_DEFAULT_PROJECT_ID = "rising-fact-p41fc"
_TOKEN_REFRESH_BUFFER_SECS = 60
# Credentials file in ~/.hive/ (native implementation)
_ACCOUNTS_FILE = Path.home() / ".hive" / "antigravity-accounts.json"
# Credentials file in $HIVE_HOME (native implementation)
_ACCOUNTS_FILE = _HIVE_HOME / "antigravity-accounts.json"
_IDE_STATE_DB_MAC = (
Path.home() / "Library" / "Application Support" / "Antigravity" / "User" / "globalStorage" / "state.vscdb"
)
@@ -653,10 +654,17 @@ class AntigravityProvider(LLMProvider):
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 4096,
system_dynamic_suffix: str | None = None,
) -> AsyncIterator[StreamEvent]:
import asyncio # noqa: PLC0415
import concurrent.futures # noqa: PLC0415
# Antigravity (Google's proprietary endpoint) doesn't expose a
# cache_control hook. Concatenate the dynamic suffix so its shape
# matches the legacy single-string call site.
if system_dynamic_suffix:
system = f"{system}\n\n{system_dynamic_suffix}" if system else system_dynamic_suffix
loop = asyncio.get_running_loop()
queue: asyncio.Queue[StreamEvent | None] = asyncio.Queue()
+12 -94
@@ -1,114 +1,32 @@
"""Model capability checks for LLM providers.
Vision support rules are derived from official vendor documentation:
- ZAI (z.ai): docs.z.ai/guides/vlm GLM-4.6V variants are vision; GLM-5/4.6/4.7 are text-only
- MiniMax: platform.minimax.io/docs minimax-vl-01 is vision; M2.x are text-only
- DeepSeek: api-docs.deepseek.com deepseek-vl2 is vision; chat/reasoner are text-only
- Cerebras: inference-docs.cerebras.ai no vision models at all
- Groq: console.groq.com/docs/vision vision capable; treat as supported by default
- Ollama/LM Studio/vLLM/llama.cpp: local runners denied by default; model names
don't reliably indicate vision support, so users must configure explicitly
Vision support is sourced from the curated ``model_catalog.json``. Each model
entry carries an optional ``supports_vision`` boolean; unknown models default
to vision-capable so hosted frontier models work out of the box. To toggle
support for a model, edit its catalog entry rather than this file.
"""
from __future__ import annotations
from typing import TYPE_CHECKING
from framework.llm.model_catalog import model_supports_vision
if TYPE_CHECKING:
from framework.llm.provider import Tool
def _model_name(model: str) -> str:
"""Return the bare model name after stripping any 'provider/' prefix."""
if "/" in model:
return model.split("/", 1)[1]
return model
# Step 1: explicit vision allow-list — these always support images regardless
# of what the provider-level rules say. Checked first so that e.g. glm-4.6v
# is allowed even though glm-4.6 is denied.
_VISION_ALLOW_BARE_PREFIXES: tuple[str, ...] = (
# ZAI/GLM vision models (docs.z.ai/guides/vlm)
"glm-4v", # GLM-4V series (legacy)
"glm-4.6v", # GLM-4.6V, GLM-4.6V-flash, GLM-4.6V-flashx
# DeepSeek vision models
"deepseek-vl", # deepseek-vl2, deepseek-vl2-small, deepseek-vl2-tiny
# MiniMax vision model
"minimax-vl", # minimax-vl-01
)
# Step 2: provider-level deny — every model from this provider is text-only.
_TEXT_ONLY_PROVIDER_PREFIXES: tuple[str, ...] = (
# Cerebras: inference-docs.cerebras.ai lists only text models
"cerebras/",
# Local runners: model names don't reliably indicate vision support
"ollama/",
"ollama_chat/",
"lm_studio/",
"vllm/",
"llamacpp/",
)
# Step 3: per-model deny — text-only models within otherwise mixed providers.
# Matched against the bare model name (provider prefix stripped, lower-cased).
# The vision allow-list above is checked first, so vision variants of the same
# family are already handled before these deny patterns are reached.
_TEXT_ONLY_MODEL_BARE_PREFIXES: tuple[str, ...] = (
# --- ZAI / GLM family ---
# text-only: glm-5, glm-4.6, glm-4.7, glm-4.5, zai-glm-*
# vision: glm-4v, glm-4.6v (caught by allow-list above)
"glm-5",
"glm-4.6", # bare glm-4.6 is text-only; glm-4.6v is caught by allow-list
"glm-4.7",
"glm-4.5",
"zai-glm",
# --- DeepSeek ---
# text-only: deepseek-chat, deepseek-coder, deepseek-reasoner
# vision: deepseek-vl2 (caught by allow-list above)
# Note: LiteLLM's deepseek handler may flatten content lists for some models;
# VL models are allowed through and rely on LiteLLM's native VL support.
"deepseek-chat",
"deepseek-coder",
"deepseek-reasoner",
# --- MiniMax ---
# text-only: minimax-m2.*, minimax-text-*, abab* (legacy)
# vision: minimax-vl-01 (caught by allow-list above)
"minimax-m2",
"minimax-text",
"abab",
)
def supports_image_tool_results(model: str) -> bool:
"""Return whether *model* can receive image content in messages.
Used to gate both user-message images and tool-result image blocks.
Logic (checked in order):
1. Vision allow-list True (known vision model, skip all denies)
2. Provider deny False (entire provider is text-only)
3. Model deny False (specific text-only model within a mixed provider)
4. Default True (assume capable; unknown providers and models)
Thin wrapper over :func:`model_supports_vision` so existing call sites
keep working. Used to gate both user-message images and tool-result
image blocks. Empty model strings are treated as capable so the default
code path doesn't strip images before a provider is selected.
"""
model_lower = model.lower()
bare = _model_name(model_lower)
# 1. Explicit vision allow — takes priority over all denies
if any(bare.startswith(p) for p in _VISION_ALLOW_BARE_PREFIXES):
if not model:
return True
# 2. Provider-level deny (all models from this provider are text-only)
if any(model_lower.startswith(p) for p in _TEXT_ONLY_PROVIDER_PREFIXES):
return False
# 3. Per-model deny (text-only variants within mixed-capability families)
if any(bare.startswith(p) for p in _TEXT_ONLY_MODEL_BARE_PREFIXES):
return False
# 5. Default: assume vision capable
# Covers: OpenAI, Anthropic, Google, Mistral, Kimi, and other hosted providers
return True
return model_supports_vision(model)
def filter_tools_for_model(tools: list[Tool], model: str) -> tuple[list[Tool], list[str]]:
+450 -37
@@ -33,6 +33,7 @@ except ImportError:
RateLimitError = Exception # type: ignore[assignment, misc]
from framework.config import HIVE_LLM_ENDPOINT as HIVE_API_BASE
from framework.llm.model_catalog import get_model_pricing
from framework.llm.provider import LLMProvider, LLMResponse, Tool
from framework.llm.stream_events import StreamEvent
@@ -43,6 +44,30 @@ logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("httpcore").setLevel(logging.WARNING)
def _api_base_needs_bearer_auth(api_base: str | None) -> bool:
"""Return True when api_base points at an Anthropic-compatible endpoint
that authenticates via ``Authorization: Bearer`` rather than ``x-api-key``.
The Hive LLM proxy (Rust service in hive-backend/llm/) speaks the
Anthropic Messages API but mints user-scoped JWTs and validates them
via Bearer auth. Default upstream Anthropic endpoints (api.anthropic.com,
Kimi's api.kimi.com/coding) keep using x-api-key, so the override is
scoped to known hive-proxy hosts plus the env-configured override.
"""
if not api_base:
return False
# A plain substring match against the lowered URL covers scheme, port,
# and path variants of the known hive-proxy hosts.
lowered = api_base.lower()
for host in ("adenhq.com", "open-hive.com", "127.0.0.1:8890", "localhost:8890"):
if host in lowered:
return True
override = os.environ.get("HIVE_LLM_BASE_URL")
if override and override.lower() in lowered:
return True
return False
def _patch_litellm_anthropic_oauth() -> None:
"""Patch litellm's Anthropic header construction to fix OAuth token handling.
@@ -186,6 +211,44 @@ def _ensure_ollama_chat_prefix(model: str) -> str:
return model
def rewrite_proxy_model(
model: str, api_key: str | None, api_base: str | None
) -> tuple[str, str | None, dict[str, str]]:
"""Apply Hive/Kimi proxy rewrites for any caller of ``litellm.acompletion``.
Both the Hive LLM proxy and Kimi For Coding expose Anthropic-API-
compatible endpoints. LiteLLM doesn't recognise the ``hive/`` or
``kimi/`` prefixes natively, so we rewrite them to ``anthropic/``
here. For the Hive proxy we also stamp a Bearer token into
``extra_headers`` because litellm's Anthropic handler only sends
``x-api-key`` and the proxy expects ``Authorization: Bearer``.
Used by ad-hoc ``litellm.acompletion`` callers (e.g. the vision-
fallback subagent in ``caption_tool_image``) so they hit the same
proxy with the same auth as the main agent's ``LiteLLMProvider``.
The provider's own ``__init__`` keeps its inlined rewrite for now —
this helper is the single source of truth for ad-hoc callers.
Returns: (rewritten_model, normalised_api_base, extra_headers).
The ``extra_headers`` dict is non-empty only for the Hive proxy
(and only when ``api_key`` is provided).
"""
extra_headers: dict[str, str] = {}
if model.lower().startswith("kimi/"):
model = "anthropic/" + model[len("kimi/") :]
if api_base and api_base.rstrip("/").endswith("/v1"):
api_base = api_base.rstrip("/")[:-3]
elif model.lower().startswith("hive/"):
model = "anthropic/" + model[len("hive/") :]
if api_base and api_base.rstrip("/").endswith("/v1"):
api_base = api_base.rstrip("/")[:-3]
# Hive proxy expects Bearer auth; litellm's Anthropic handler
# only sends x-api-key without this nudge.
if api_key:
extra_headers["Authorization"] = f"Bearer {api_key}"
return model, api_base, extra_headers
RATE_LIMIT_MAX_RETRIES = 10
RATE_LIMIT_BACKOFF_BASE = 2 # seconds
RATE_LIMIT_MAX_DELAY = 120 # seconds - cap to prevent absurd waits
@@ -213,9 +276,72 @@ _CACHE_CONTROL_PREFIXES = (
"glm-",
)
# OpenRouter sub-provider prefixes whose upstream API honors `cache_control`.
# OpenRouter passes the marker through to the underlying provider for these.
# (See https://openrouter.ai/docs/guides/best-practices/prompt-caching.)
# OpenAI/DeepSeek/Groq/Grok/Moonshot route through OpenRouter but cache
# automatically server-side — sending cache_control there is a no-op, not a
# win, and they need a separate prefix-stability fix to actually get hits.
_OPENROUTER_CACHE_CONTROL_PREFIXES = (
"openrouter/anthropic/",
"openrouter/google/gemini-",
"openrouter/z-ai/glm",
"openrouter/minimax/",
)
def _model_supports_cache_control(model: str) -> bool:
return any(model.startswith(p) for p in _CACHE_CONTROL_PREFIXES)
if any(model.startswith(p) for p in _CACHE_CONTROL_PREFIXES):
return True
return any(model.startswith(p) for p in _OPENROUTER_CACHE_CONTROL_PREFIXES)
def _build_system_message(
system: str,
system_dynamic_suffix: str | None,
model: str,
) -> dict[str, Any] | None:
"""Construct the system-role message for the chat completion.
Returns ``None`` when there is nothing to send.
Two-block split path used when the caller supplied a non-empty
``system_dynamic_suffix`` AND the provider honors ``cache_control``
(Anthropic, MiniMax, Z-AI/GLM). We emit ``content`` as a list of two
text blocks with an ephemeral ``cache_control`` marker on the first
block only. The prompt cache keeps the static prefix warm across
turns and across iterations within a turn; only the small dynamic
tail is recomputed on every request.
Single-string path used for every other case (no suffix provided,
or provider doesn't honor ``cache_control``). We concatenate
``system`` + ``\\n\\n`` + ``system_dynamic_suffix`` and attach
``cache_control`` to the whole message when the provider supports
it. This is byte-identical to the pre-split behavior for all
non-cache-control providers (OpenAI, Gemini, Groq, Ollama, etc.).
"""
if not system and not system_dynamic_suffix:
return None
if system_dynamic_suffix and _model_supports_cache_control(model):
content_blocks: list[dict[str, Any]] = []
if system:
content_blocks.append(
{
"type": "text",
"text": system,
"cache_control": {"type": "ephemeral"},
}
)
content_blocks.append({"type": "text", "text": system_dynamic_suffix})
return {"role": "system", "content": content_blocks}
# Single-string path (legacy or no-cache-control provider).
combined = system
if system_dynamic_suffix:
combined = f"{system}\n\n{system_dynamic_suffix}" if system else system_dynamic_suffix
sys_msg: dict[str, Any] = {"role": "system", "content": combined}
if _model_supports_cache_control(model):
sys_msg["cache_control"] = {"type": "ephemeral"}
return sys_msg
# Kimi For Coding uses an Anthropic-compatible endpoint (no /v1 suffix).
@@ -289,14 +415,186 @@ OPENROUTER_TOOL_COMPAT_MODEL_CACHE: dict[str, float] = {}
# from rate-limit retries — 3 retries is sufficient for connection failures.
STREAM_TRANSIENT_MAX_RETRIES = 3
# Directory for dumping failed requests. Resolved lazily so HIVE_HOME
# overrides (set by the desktop shell) take effect even if this module
# is imported before framework.config picks up the override.
def _failed_requests_dir() -> Path:
from framework.config import HIVE_HOME
return HIVE_HOME / "failed_requests"
# Maximum number of dump files to retain in $HIVE_HOME/failed_requests/.
# Older files are pruned automatically to prevent unbounded disk growth.
MAX_FAILED_REQUEST_DUMPS = 50
def _cost_from_catalog_pricing(
model: str,
input_tokens: int,
output_tokens: int,
cached_tokens: int = 0,
cache_creation_tokens: int = 0,
) -> float:
"""Last-resort cost calculation using curated catalog pricing.
Consulted only when the provider response carries no native cost and
LiteLLM's own catalog has no pricing for ``model``. Reads
``pricing_usd_per_mtok`` from ``model_catalog.json``. Rates are USD per
million tokens.
``cached_tokens`` and ``cache_creation_tokens`` are subsets of
``input_tokens`` (see ``_extract_cache_tokens``), so subtract them from
the base input count to avoid double-billing. If a cache rate is absent,
fall back to the plain input rate.
"""
if not model or (input_tokens == 0 and output_tokens == 0):
return 0.0
pricing = get_model_pricing(model)
if pricing is None and "/" in model:
# LiteLLM prefixes some ids (e.g. "openrouter/z-ai/glm-5.1"); the
# catalog stores the bare form ("z-ai/glm-5.1"). Strip one segment.
pricing = get_model_pricing(model.split("/", 1)[1])
if pricing is None:
return 0.0
per_mtok_in = pricing.get("input", 0.0)
per_mtok_out = pricing.get("output", 0.0)
per_mtok_cache_read = pricing.get("cache_read", per_mtok_in)
per_mtok_cache_write = pricing.get("cache_creation", per_mtok_in)
plain_input = max(input_tokens - cached_tokens - cache_creation_tokens, 0)
total = (
plain_input * per_mtok_in
+ cached_tokens * per_mtok_cache_read
+ cache_creation_tokens * per_mtok_cache_write
+ output_tokens * per_mtok_out
) / 1_000_000
return float(total) if total > 0 else 0.0
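A worked example of the arithmetic, using rates mirroring the Kimi K2.5 catalog entry below ($0.60/$2.50 per Mtok in/out, $0.15 cache-read). The point of the subtraction is that cached tokens are already inside `input_tokens`:

```python
# Rates mirroring the Kimi K2.5 catalog entry (USD per Mtok).
pricing = {"input": 0.60, "output": 2.50, "cache_read": 0.15}

input_tokens, output_tokens, cached_tokens = 100_000, 2_000, 80_000
plain_input = max(input_tokens - cached_tokens, 0)  # 20_000 billed at full rate

cost = (
    plain_input * pricing["input"]
    + cached_tokens * pricing["cache_read"]
    + output_tokens * pricing["output"]
) / 1_000_000
# Without the subtraction the 80k cached tokens would be billed again
# at the full input rate, roughly $0.048 extra on this one request.
print(f"${cost:.3f}")  # ≈ $0.029
```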
def _extract_cost(response: Any, model: str) -> float:
"""Pull the USD cost for a non-streaming completion response.
Sources checked, in priority order:
1. ``usage.cost``, populated when OpenRouter returns native cost via
``usage: {include: true}`` or when ``litellm.include_cost_in_streaming_usage``
is on.
2. ``response._hidden_params["response_cost"]``, set by LiteLLM's
logging layer after most successful completions.
3. ``litellm.completion_cost(...)`` computes from the model pricing
table; works across Anthropic, OpenAI, and OpenRouter as long as the
model is in LiteLLM's catalog.
4. ``pricing_usd_per_mtok`` from the curated model catalog covers
models (e.g. GLM, Kimi, MiniMax) that LiteLLM doesn't price.
Returns 0.0 for unpriced models or unexpected response shapes; cost is a
display concern, never let it break the hot path. For streaming paths
where the aggregate response isn't a full ``ModelResponse``, use
:func:`_cost_from_tokens` with the already-extracted token counts.
"""
if response is None:
return 0.0
usage = getattr(response, "usage", None)
usage_cost = getattr(usage, "cost", None) if usage is not None else None
if isinstance(usage_cost, (int, float)) and usage_cost > 0:
return float(usage_cost)
hidden = getattr(response, "_hidden_params", None)
if isinstance(hidden, dict):
hp_cost = hidden.get("response_cost")
if isinstance(hp_cost, (int, float)) and hp_cost > 0:
return float(hp_cost)
try:
import litellm as _litellm
computed = _litellm.completion_cost(completion_response=response, model=model)
if isinstance(computed, (int, float)) and computed > 0:
return float(computed)
except Exception as exc:
logger.debug("[cost] completion_cost failed for %s: %s", model, exc)
if usage is not None:
input_tokens = int(getattr(usage, "prompt_tokens", 0) or 0)
output_tokens = int(getattr(usage, "completion_tokens", 0) or 0)
cache_read, cache_creation = _extract_cache_tokens(usage)
fallback = _cost_from_catalog_pricing(model, input_tokens, output_tokens, cache_read, cache_creation)
if fallback > 0:
return fallback
return 0.0
def _cost_from_tokens(
model: str,
input_tokens: int,
output_tokens: int,
cached_tokens: int = 0,
cache_creation_tokens: int = 0,
) -> float:
"""Compute USD cost from already-normalized token counts.
Used on streaming paths where the aggregate ``response`` is the stream
wrapper (not a full ``ModelResponse``) and ``litellm.completion_cost`` on
it either no-ops or raises. Calls ``litellm.cost_per_token`` directly
with the cache-aware inputs so Anthropic's 5-min-write / cache-read
multipliers are applied correctly.
"""
if not model or (input_tokens == 0 and output_tokens == 0):
return 0.0
try:
import litellm as _litellm
prompt_cost, completion_cost = _litellm.cost_per_token(
model=model,
prompt_tokens=input_tokens,
completion_tokens=output_tokens,
cache_read_input_tokens=cached_tokens,
cache_creation_input_tokens=cache_creation_tokens,
)
total = (prompt_cost or 0.0) + (completion_cost or 0.0)
if total > 0:
return float(total)
except Exception as exc:
logger.debug("[cost] cost_per_token failed for %s: %s", model, exc)
return _cost_from_catalog_pricing(model, input_tokens, output_tokens, cached_tokens, cache_creation_tokens)
def _extract_cache_tokens(usage: Any) -> tuple[int, int]:
"""Pull (cache_read, cache_creation) from a LiteLLM usage object.
Both are subsets of ``prompt_tokens``; providers already count them
inside the input total. Surface them separately for visibility; never sum.
Field names vary by provider/proxy; check the known shapes in priority
order and fall back to 0:
cache_read:
- ``prompt_tokens_details.cached_tokens``: OpenAI shape; also what
LiteLLM normalizes Anthropic and OpenRouter into.
- ``cache_read_input_tokens``: raw Anthropic field name.
cache_creation:
- ``prompt_tokens_details.cache_write_tokens``: OpenRouter's
normalized field for cache writes (verified empirically against
``openrouter/anthropic/*`` and ``openrouter/z-ai/*`` responses).
- ``cache_creation_input_tokens``: raw Anthropic top-level field.
"""
if not usage:
return 0, 0
_details = getattr(usage, "prompt_tokens_details", None)
cache_read = (
getattr(_details, "cached_tokens", 0) or 0
if _details is not None
else getattr(usage, "cache_read_input_tokens", 0) or 0
)
cache_creation = (getattr(_details, "cache_write_tokens", 0) or 0 if _details is not None else 0) or (
getattr(usage, "cache_creation_input_tokens", 0) or 0
)
return cache_read, cache_creation
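The priority-order extraction can be exercised against mock usage shapes, with `SimpleNamespace` standing in for the provider objects:

```python
from types import SimpleNamespace

def extract_cache_tokens(usage) -> tuple[int, int]:
    # Standalone copy of the priority-order logic, for illustration.
    details = getattr(usage, "prompt_tokens_details", None)
    cache_read = (
        getattr(details, "cached_tokens", 0) or 0
        if details is not None
        else getattr(usage, "cache_read_input_tokens", 0) or 0
    )
    cache_creation = (
        getattr(details, "cache_write_tokens", 0) or 0 if details is not None else 0
    ) or (getattr(usage, "cache_creation_input_tokens", 0) or 0)
    return cache_read, cache_creation

# OpenAI/OpenRouter shape: counts nested under prompt_tokens_details.
openai_shape = SimpleNamespace(
    prompt_tokens_details=SimpleNamespace(cached_tokens=1200, cache_write_tokens=300),
)
# Raw Anthropic shape: top-level fields, no details object.
anthropic_shape = SimpleNamespace(
    prompt_tokens_details=None,
    cache_read_input_tokens=1200,
    cache_creation_input_tokens=300,
)

print(extract_cache_tokens(openai_shape))     # (1200, 300)
print(extract_cache_tokens(anthropic_shape))  # (1200, 300)
```

Both shapes normalize to the same pair, which is what lets downstream cost code stay provider-agnostic.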
def _estimate_tokens(model: str, messages: list[dict]) -> tuple[int, str]:
"""Estimate token count for messages. Returns (token_count, method)."""
# Try litellm's token counter first
@@ -319,7 +617,7 @@ def _prune_failed_request_dumps(max_files: int = MAX_FAILED_REQUEST_DUMPS) -> No
"""
try:
all_dumps = sorted(
_failed_requests_dir().glob("*.json"),
key=lambda f: f.stat().st_mtime,
)
excess = len(all_dumps) - max_files
@@ -354,11 +652,12 @@ def _dump_failed_request(
) -> str:
"""Dump failed request to a file for debugging. Returns the file path."""
try:
dump_dir = _failed_requests_dir()
dump_dir.mkdir(parents=True, exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
filename = f"{error_type}_{model.replace('/', '_')}_{timestamp}.json"
filepath = dump_dir / filename
# Build dump data
messages = kwargs.get("messages", [])
@@ -388,7 +687,7 @@ def _dump_failed_request(
return str(filepath)
except OSError as e:
logger.warning(f"Failed to dump request debug log to {FAILED_REQUESTS_DIR}: {e}")
logger.warning(f"Failed to dump request debug log to {_failed_requests_dir()}: {e}")
return "log_write_failed"
@@ -702,6 +1001,7 @@ class LiteLLMProvider(LLMProvider):
# Translate kimi/ prefix to anthropic/ so litellm uses the Anthropic
# Messages API handler and routes to that endpoint — no special headers needed.
_original_model = model
self._hive_proxy_auth = bool(_original_model.lower().startswith("hive/"))
if _is_ollama_model(model):
model = _ensure_ollama_chat_prefix(model)
elif model.lower().startswith("kimi/"):
@@ -755,6 +1055,7 @@ class LiteLLMProvider(LLMProvider):
these attributes in-place propagates to all callers on the next LLM call.
"""
_original_model = model
self._hive_proxy_auth = bool(_original_model.lower().startswith("hive/"))
if _is_ollama_model(model):
model = _ensure_ollama_chat_prefix(model)
elif model.lower().startswith("kimi/"):
@@ -994,6 +1295,16 @@ class LiteLLMProvider(LLMProvider):
# Ollama requires explicit tool_choice=auto for function calling
# so future readers don't have to guess.
kwargs.setdefault("tool_choice", "auto")
elif self._hive_proxy_auth:
# The Hive LLM proxy fronts GLM, which drifts into "explain
# the plan" mode on long-context turns instead of emitting
# tool_use blocks (verified 2026-04-28: tool_choice=null →
# text-only stop=stop; tool_choice=required → clean
# tool_use). Force a tool call when tools are available
# so queens can't get stuck in chat mode. Callers that
# legitimately want a non-tool turn can override via
# extra_kwargs.
kwargs.setdefault("tool_choice", "required")
# Add response_format for structured output
# LiteLLM passes this through to the underlying provider
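The `setdefault` call in the hunk above is what preserves the escape hatch: the forced `tool_choice` only lands when the caller didn't set one themselves. A toy illustration of that pattern:

```python
# dict.setdefault as a "default unless the caller chose" knob.
kwargs = {"model": "hive/glm-5.1"}            # hypothetical request kwargs
kwargs.setdefault("tool_choice", "required")  # provider-side nudge lands
assert kwargs["tool_choice"] == "required"

caller_kwargs = {"model": "hive/glm-5.1", "tool_choice": "none"}  # caller opted out
caller_kwargs.setdefault("tool_choice", "required")  # no-op: key already set
assert caller_kwargs["tool_choice"] == "none"
```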
@@ -1015,12 +1326,17 @@ class LiteLLMProvider(LLMProvider):
usage = response.usage
input_tokens = usage.prompt_tokens if usage else 0
output_tokens = usage.completion_tokens if usage else 0
cached_tokens, cache_creation_tokens = _extract_cache_tokens(usage)
cost_usd = _extract_cost(response, self.model)
return LLMResponse(
content=content,
model=response.model or self.model,
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
cache_creation_tokens=cache_creation_tokens,
cost_usd=cost_usd,
stop_reason=response.choices[0].finish_reason or "",
raw_response=response,
)
@@ -1169,8 +1485,16 @@ class LiteLLMProvider(LLMProvider):
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
system_dynamic_suffix: str | None = None,
) -> LLMResponse:
"""Async version of complete(). Uses litellm.acompletion — non-blocking."""
"""Async version of complete(). Uses litellm.acompletion — non-blocking.
``system_dynamic_suffix`` is an optional per-turn tail. When set and
the provider honors ``cache_control``, ``system`` is sent as the
cached prefix and the suffix trails as an uncached second content
block. Otherwise the two strings are concatenated into a single
system message (legacy behavior).
"""
# Codex ChatGPT backend requires streaming — route through stream() which
# already handles Codex quirks and has proper tool call accumulation.
if self._codex_backend:
@@ -1181,6 +1505,7 @@ class LiteLLMProvider(LLMProvider):
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
system_dynamic_suffix=system_dynamic_suffix,
)
return await self._collect_stream_to_response(stream_iter)
@@ -1188,10 +1513,8 @@ class LiteLLMProvider(LLMProvider):
if self._claude_code_oauth:
billing = _claude_code_billing_header(messages)
full_messages.append({"role": "system", "content": billing})
sys_msg = _build_system_message(system, system_dynamic_suffix, self.model)
if sys_msg is not None:
full_messages.append(sys_msg)
full_messages.extend(messages)
@@ -1219,6 +1542,10 @@ class LiteLLMProvider(LLMProvider):
# Ollama requires explicit tool_choice=auto for function calling
# so future readers don't have to guess.
kwargs.setdefault("tool_choice", "auto")
elif self._hive_proxy_auth:
# See `complete()` for the rationale: GLM behind the Hive
# proxy needs forcing or it goes chat-mode on long contexts.
kwargs.setdefault("tool_choice", "required")
if response_format:
kwargs["response_format"] = response_format
@@ -1228,12 +1555,17 @@ class LiteLLMProvider(LLMProvider):
usage = response.usage
input_tokens = usage.prompt_tokens if usage else 0
output_tokens = usage.completion_tokens if usage else 0
cached_tokens, cache_creation_tokens = _extract_cache_tokens(usage)
cost_usd = _extract_cost(response, self.model)
return LLMResponse(
content=content,
model=response.model or self.model,
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
cache_creation_tokens=cache_creation_tokens,
cost_usd=cost_usd,
stop_reason=response.choices[0].finish_reason or "",
raw_response=response,
)
@@ -1619,6 +1951,7 @@ class LiteLLMProvider(LLMProvider):
messages: list[dict[str, Any]],
system: str,
tools: list[Tool],
system_dynamic_suffix: str | None = None,
) -> list[dict[str, Any]]:
"""Build a JSON-only prompt for models without native tool support."""
tool_specs = [
@@ -1646,7 +1979,19 @@ class LiteLLMProvider(LLMProvider):
)
compat_system = compat_instruction if not system else f"{system}\n\n{compat_instruction}"
# If the routed sub-provider honors cache_control (e.g.
# openrouter/anthropic/*), split the static prefix from the dynamic
# suffix so the prefix stays cache-warm across turns. Otherwise fall
# back to a single concatenated string.
system_message = _build_system_message(
compat_system,
system_dynamic_suffix,
self.model,
)
full_messages: list[dict[str, Any]] = []
if system_message is not None:
full_messages.append(system_message)
full_messages.extend(messages)
return [
message
@@ -1660,9 +2005,21 @@ class LiteLLMProvider(LLMProvider):
system: str,
tools: list[Tool],
max_tokens: int,
system_dynamic_suffix: str | None = None,
) -> LLMResponse:
"""Emulate tool calling via JSON when OpenRouter rejects native tools."""
full_messages = self._build_openrouter_tool_compat_messages(messages, system, tools)
"""Emulate tool calling via JSON when OpenRouter rejects native tools.
When the routed sub-provider honors ``cache_control`` (e.g.
``openrouter/anthropic/*``), the message builder splits the static
prefix from the dynamic suffix so the prefix stays cache-warm.
Otherwise the suffix is concatenated into a single system string.
"""
full_messages = self._build_openrouter_tool_compat_messages(
messages,
system,
tools,
system_dynamic_suffix=system_dynamic_suffix,
)
kwargs: dict[str, Any] = {
"model": self.model,
"messages": full_messages,
@@ -1683,6 +2040,8 @@ class LiteLLMProvider(LLMProvider):
usage = response.usage
input_tokens = usage.prompt_tokens if usage else 0
output_tokens = usage.completion_tokens if usage else 0
cached_tokens, cache_creation_tokens = _extract_cache_tokens(usage)
cost_usd = _extract_cost(response, self.model)
stop_reason = "tool_calls" if tool_calls else (response.choices[0].finish_reason or "stop")
return LLMResponse(
@@ -1690,6 +2049,9 @@ class LiteLLMProvider(LLMProvider):
model=response.model or self.model,
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
cache_creation_tokens=cache_creation_tokens,
cost_usd=cost_usd,
stop_reason=stop_reason,
raw_response={
"compat_mode": "openrouter_tool_emulation",
@@ -1704,6 +2066,7 @@ class LiteLLMProvider(LLMProvider):
system: str,
tools: list[Tool],
max_tokens: int,
system_dynamic_suffix: str | None = None,
) -> AsyncIterator[StreamEvent]:
"""Fallback stream for OpenRouter models without native tool support."""
from framework.llm.stream_events import (
@@ -1724,6 +2087,7 @@ class LiteLLMProvider(LLMProvider):
system=system,
tools=tools,
max_tokens=max_tokens,
system_dynamic_suffix=system_dynamic_suffix,
)
except Exception as e:
yield StreamErrorEvent(error=str(e), recoverable=False)
@@ -1747,6 +2111,9 @@ class LiteLLMProvider(LLMProvider):
stop_reason=response.stop_reason,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
cached_tokens=response.cached_tokens,
cache_creation_tokens=response.cache_creation_tokens,
cost_usd=response.cost_usd,
model=response.model,
)
@@ -1758,6 +2125,7 @@ class LiteLLMProvider(LLMProvider):
max_tokens: int,
response_format: dict[str, Any] | None,
json_mode: bool,
system_dynamic_suffix: str | None = None,
) -> AsyncIterator[StreamEvent]:
"""Fallback path: convert non-stream completion to stream events.
@@ -1781,6 +2149,7 @@ class LiteLLMProvider(LLMProvider):
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
system_dynamic_suffix=system_dynamic_suffix,
)
except Exception as e:
yield StreamErrorEvent(error=str(e), recoverable=False)
@@ -1812,6 +2181,9 @@ class LiteLLMProvider(LLMProvider):
stop_reason=response.stop_reason or "stop",
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
cached_tokens=response.cached_tokens,
cache_creation_tokens=response.cache_creation_tokens,
cost_usd=response.cost_usd,
model=response.model,
)
@@ -1823,6 +2195,7 @@ class LiteLLMProvider(LLMProvider):
max_tokens: int = 4096,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
system_dynamic_suffix: str | None = None,
) -> AsyncIterator[StreamEvent]:
"""Stream a completion via litellm.acompletion(stream=True).
@@ -1833,6 +2206,9 @@ class LiteLLMProvider(LLMProvider):
Empty responses (e.g. Gemini stealth rate-limits that return 200
with no content) are retried with exponential backoff, mirroring
the retry behaviour of ``_completion_with_rate_limit_retry``.
``system_dynamic_suffix`` is an optional per-turn tail. See
``acomplete`` docstring for the two-block split semantics.
"""
from framework.llm.stream_events import (
FinishEvent,
@@ -1852,6 +2228,7 @@ class LiteLLMProvider(LLMProvider):
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
system_dynamic_suffix=system_dynamic_suffix,
):
yield event
return
@@ -1862,6 +2239,7 @@ class LiteLLMProvider(LLMProvider):
system=system,
tools=tools,
max_tokens=max_tokens,
system_dynamic_suffix=system_dynamic_suffix,
):
yield event
return
@@ -1870,19 +2248,18 @@ class LiteLLMProvider(LLMProvider):
if self._claude_code_oauth:
billing = _claude_code_billing_header(messages)
full_messages.append({"role": "system", "content": billing})
sys_msg = _build_system_message(system, system_dynamic_suffix, self.model)
if sys_msg is not None:
full_messages.append(sys_msg)
full_messages.extend(messages)
if logger.isEnabledFor(logging.DEBUG) and full_messages:
import json as _json
from datetime import datetime as _dt
from pathlib import Path as _Path
from framework.config import HIVE_HOME as _HIVE_HOME
_debug_dir = _HIVE_HOME / "debug_logs"
_debug_dir.mkdir(parents=True, exist_ok=True)
_ts = _dt.now().strftime("%Y%m%d_%H%M%S_%f")
_dump_file = _debug_dir / f"llm_request_{_ts}.json"
@@ -1953,6 +2330,10 @@ class LiteLLMProvider(LLMProvider):
# Ollama requires explicit tool_choice=auto for function calling
# so future readers don't have to guess.
kwargs.setdefault("tool_choice", "auto")
elif self._hive_proxy_auth:
# See `complete()` for the rationale: GLM behind the Hive
# proxy needs forcing or it goes chat-mode on long contexts.
kwargs.setdefault("tool_choice", "required")
if response_format:
kwargs["response_format"] = response_format
# The Codex ChatGPT backend (Responses API) rejects several params.
@@ -2109,37 +2490,44 @@ class LiteLLMProvider(LLMProvider):
type(usage).__name__,
)
cached_tokens = 0
cache_creation_tokens = 0
if usage:
input_tokens = getattr(usage, "prompt_tokens", 0) or 0
output_tokens = getattr(usage, "completion_tokens", 0) or 0
_details = getattr(usage, "prompt_tokens_details", None)
cached_tokens = (
getattr(_details, "cached_tokens", 0) or 0
if _details is not None
else getattr(usage, "cache_read_input_tokens", 0) or 0
)
cached_tokens, cache_creation_tokens = _extract_cache_tokens(usage)
logger.debug(
"[tokens] finish-chunk usage: input=%d output=%d cached=%d model=%s",
"[tokens] finish-chunk usage: input=%d output=%d cached=%d cache_creation=%d model=%s",
input_tokens,
output_tokens,
cached_tokens,
cache_creation_tokens,
self.model,
)
logger.debug(
"[tokens] finish event: input=%d output=%d cached=%d stop=%s model=%s",
"[tokens] finish event: input=%d output=%d cached=%d cache_creation=%d stop=%s model=%s",
input_tokens,
output_tokens,
cached_tokens,
cache_creation_tokens,
choice.finish_reason,
self.model,
)
cost_usd = _cost_from_tokens(
self.model,
input_tokens,
output_tokens,
cached_tokens,
cache_creation_tokens,
)
tail_events.append(
FinishEvent(
stop_reason=choice.finish_reason,
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
cache_creation_tokens=cache_creation_tokens,
cost_usd=cost_usd,
model=self.model,
)
)
@@ -2159,19 +2547,36 @@ class LiteLLMProvider(LLMProvider):
_usage = calculate_total_usage(chunks=_chunks)
input_tokens = _usage.prompt_tokens or 0
output_tokens = _usage.completion_tokens or 0
# `calculate_total_usage` aggregates token totals
# but discards `prompt_tokens_details` — which is
# where OpenRouter puts `cached_tokens` and
# `cache_write_tokens`. Recover them directly
# from the most recent chunk that carries usage.
cached_tokens, cache_creation_tokens = 0, 0
for _raw in reversed(_chunks):
_raw_usage = getattr(_raw, "usage", None)
if _raw_usage is None:
continue
_cr, _cc = _extract_cache_tokens(_raw_usage)
if _cr or _cc:
cached_tokens, cache_creation_tokens = _cr, _cc
break
logger.debug(
"[tokens] post-loop chunks fallback: input=%d output=%d cached=%d model=%s",
"[tokens] post-loop chunks fallback: input=%d output=%d "
"cached=%d cache_creation=%d model=%s",
input_tokens,
output_tokens,
cached_tokens,
cache_creation_tokens,
self.model,
)
cost_usd = _cost_from_tokens(
self.model,
input_tokens,
output_tokens,
cached_tokens,
cache_creation_tokens,
)
# Patch the FinishEvent already queued with 0 tokens
for _i, _ev in enumerate(tail_events):
if isinstance(_ev, FinishEvent) and _ev.input_tokens == 0:
@@ -2180,6 +2585,8 @@ class LiteLLMProvider(LLMProvider):
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
cache_creation_tokens=cache_creation_tokens,
cost_usd=cost_usd,
model=_ev.model,
)
break
@@ -2390,6 +2797,8 @@ class LiteLLMProvider(LLMProvider):
tool_calls: list[dict[str, Any]] = []
input_tokens = 0
output_tokens = 0
cached_tokens = 0
cache_creation_tokens = 0
stop_reason = ""
model = self.model
@@ -2407,6 +2816,8 @@ class LiteLLMProvider(LLMProvider):
elif isinstance(event, FinishEvent):
input_tokens = event.input_tokens
output_tokens = event.output_tokens
cached_tokens = event.cached_tokens
cache_creation_tokens = event.cache_creation_tokens
stop_reason = event.stop_reason
if event.model:
model = event.model
@@ -2419,6 +2830,8 @@ class LiteLLMProvider(LLMProvider):
model=model,
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
cache_creation_tokens=cache_creation_tokens,
stop_reason=stop_reason,
raw_response={"tool_calls": tool_calls} if tool_calls else None,
)
@@ -155,8 +155,11 @@ class MockLLMProvider(LLMProvider):
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
system_dynamic_suffix: str | None = None,
) -> LLMResponse:
"""Async mock completion (no I/O, returns immediately)."""
if system_dynamic_suffix:
system = f"{system}\n\n{system_dynamic_suffix}" if system else system_dynamic_suffix
return self.complete(
messages=messages,
system=system,
@@ -173,6 +176,7 @@ class MockLLMProvider(LLMProvider):
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 4096,
system_dynamic_suffix: str | None = None,
) -> AsyncIterator[StreamEvent]:
"""Stream a mock completion as word-level TextDeltaEvents.
@@ -180,6 +184,8 @@ class MockLLMProvider(LLMProvider):
TextDeltaEvent with an accumulating snapshot, exercising the full
streaming pipeline without any API calls.
"""
if system_dynamic_suffix:
system = f"{system}\n\n{system_dynamic_suffix}" if system else system_dynamic_suffix
content = self._generate_mock_response(system=system, json_mode=False)
words = content.split(" ")
accumulated = ""
@@ -9,47 +9,65 @@
"label": "Haiku 4.5 - Fast + cheap",
"recommended": false,
"max_tokens": 64000,
"max_context_tokens": 136000
"max_context_tokens": 136000,
"supports_vision": true
},
{
"id": "claude-sonnet-4-5-20250929",
"label": "Sonnet 4.5 - Best balance",
"recommended": false,
"max_tokens": 64000,
"max_context_tokens": 136000
"max_context_tokens": 136000,
"supports_vision": true
},
{
"id": "claude-opus-4-6",
"label": "Opus 4.6 - Most capable",
"recommended": true,
"max_tokens": 128000,
"max_context_tokens": 872000
"max_context_tokens": 872000,
"supports_vision": true
}
]
},
"openai": {
"default_model": "gpt-5.4",
"default_model": "gpt-5.5",
"models": [
{
"id": "gpt-5.4",
"label": "GPT-5.4 - Best intelligence",
"id": "gpt-5.5",
"label": "GPT-5.5 - Frontier coding + reasoning",
"recommended": true,
"max_tokens": 128000,
"max_context_tokens": 960000
"max_context_tokens": 1050000,
"pricing_usd_per_mtok": {
"input": 5.00,
"output": 30.00
},
"supports_vision": true
},
{
"id": "gpt-5.4",
"label": "GPT-5.4 - Previous flagship",
"recommended": false,
"max_tokens": 128000,
"max_context_tokens": 960000,
"supports_vision": true
},
{
"id": "gpt-5.4-mini",
"label": "GPT-5.4 Mini - Faster + cheaper",
"recommended": false,
"max_tokens": 128000,
"max_context_tokens": 400000
"max_context_tokens": 400000,
"supports_vision": true
},
{
"id": "gpt-5.4-nano",
"label": "GPT-5.4 Nano - Cheapest high-volume",
"recommended": false,
"max_tokens": 128000,
"max_context_tokens": 400000
"max_context_tokens": 400000,
"supports_vision": true
}
]
},
@@ -61,14 +79,16 @@
"label": "Gemini 3 Flash - Fast",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 240000
"max_context_tokens": 240000,
"supports_vision": true
},
{
"id": "gemini-3.1-pro-preview-customtools",
"label": "Gemini 3.1 Pro - Best quality",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 240000
"max_context_tokens": 240000,
"supports_vision": true
}
]
},
@@ -80,28 +100,32 @@
"label": "GPT-OSS 120B - Best reasoning",
"recommended": true,
"max_tokens": 65536,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
},
{
"id": "openai/gpt-oss-20b",
"label": "GPT-OSS 20B - Fast + cheaper",
"recommended": false,
"max_tokens": 65536,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
},
{
"id": "llama-3.3-70b-versatile",
"label": "Llama 3.3 70B - General purpose",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
},
{
"id": "llama-3.1-8b-instant",
"label": "Llama 3.1 8B - Fastest",
"recommended": false,
"max_tokens": 131072,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
}
]
},
@@ -113,21 +137,24 @@
"label": "GPT-OSS 120B - Best production reasoning",
"recommended": true,
"max_tokens": 40960,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
},
{
"id": "zai-glm-4.7",
"label": "Z.ai GLM 4.7 - Strong coding preview",
"recommended": true,
"max_tokens": 40960,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
},
{
"id": "qwen-3-235b-a22b-instruct-2507",
"label": "Qwen 3 235B Instruct - Frontier preview",
"recommended": false,
"max_tokens": 40960,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
}
]
},
@@ -139,14 +166,20 @@
"label": "MiniMax M2.7 - Best coding quality",
"recommended": true,
"max_tokens": 40960,
"max_context_tokens": 180000
"max_context_tokens": 180000,
"pricing_usd_per_mtok": {
"input": 0.30,
"output": 1.20
},
"supports_vision": false
},
{
"id": "MiniMax-M2.5",
"label": "MiniMax M2.5 - Strong value",
"recommended": false,
"max_tokens": 40960,
"max_context_tokens": 180000
"max_context_tokens": 180000,
"supports_vision": false
}
]
},
@@ -158,28 +191,32 @@
"label": "Mistral Large 3 - Best quality",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 256000
"max_context_tokens": 256000,
"supports_vision": true
},
{
"id": "mistral-medium-2508",
"label": "Mistral Medium 3.1 - Balanced",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 128000
"max_context_tokens": 128000,
"supports_vision": true
},
{
"id": "mistral-small-2603",
"label": "Mistral Small 4 - Fast + capable",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 256000
"max_context_tokens": 256000,
"supports_vision": true
},
{
"id": "codestral-2508",
"label": "Codestral - Coding specialist",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 128000
"max_context_tokens": 128000,
"supports_vision": false
}
]
},
@@ -191,47 +228,71 @@
"label": "DeepSeek V3.1 - Best general coding",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 128000
"max_context_tokens": 128000,
"supports_vision": false
},
{
"id": "Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
"label": "Qwen3 Coder 480B - Advanced coding",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 262144
"max_context_tokens": 262144,
"supports_vision": false
},
{
"id": "openai/gpt-oss-120b",
"label": "GPT-OSS 120B - Strong reasoning",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 128000
"max_context_tokens": 128000,
"supports_vision": false
},
{
"id": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
"label": "Llama 3.3 70B Turbo - Fast baseline",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 131072
"max_context_tokens": 131072,
"supports_vision": false
}
]
},
"deepseek": {
"default_model": "deepseek-chat",
"default_model": "deepseek-v4-pro",
"models": [
{
"id": "deepseek-chat",
"label": "DeepSeek Chat - Fast default",
"id": "deepseek-v4-pro",
"label": "DeepSeek V4 Pro - Most capable",
"recommended": true,
"max_tokens": 8192,
"max_context_tokens": 128000
"max_tokens": 384000,
"max_context_tokens": 1000000,
"pricing_usd_per_mtok": {
"input": 1.74,
"output": 3.48,
"cache_read": 0.145
},
"supports_vision": false
},
{
"id": "deepseek-v4-flash",
"label": "DeepSeek V4 Flash - Fast + cheap",
"recommended": true,
"max_tokens": 384000,
"max_context_tokens": 1000000,
"pricing_usd_per_mtok": {
"input": 0.14,
"output": 0.28,
"cache_read": 0.028
},
"supports_vision": false
},
{
"id": "deepseek-reasoner",
"label": "DeepSeek Reasoner - Deep thinking",
"label": "DeepSeek Reasoner - Legacy (deprecating)",
"recommended": false,
"max_tokens": 64000,
"max_context_tokens": 128000
"max_context_tokens": 128000,
"supports_vision": false
}
]
},
@@ -243,7 +304,13 @@
"label": "Kimi K2.5 - Best coding",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 200000
"max_context_tokens": 200000,
"pricing_usd_per_mtok": {
"input": 0.60,
"output": 2.50,
"cache_read": 0.15
},
"supports_vision": true
}
]
},
@@ -255,21 +322,30 @@
"label": "Queen - Hive native",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 180000
"max_context_tokens": 180000,
"supports_vision": false
},
{
"id": "kimi-2.5",
"id": "kimi-k2.5",
"label": "Kimi 2.5 - Via Hive",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 240000
"max_context_tokens": 240000,
"supports_vision": true
},
{
"id": "GLM-5",
"label": "GLM-5 - Via Hive",
"id": "glm-5.1",
"label": "GLM-5.1 - Via Hive",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 180000
"max_context_tokens": 180000,
"pricing_usd_per_mtok": {
"input": 1.40,
"output": 4.40,
"cache_read": 0.26,
"cache_creation": 0.0
},
"supports_vision": false
}
]
},
@@ -281,63 +357,82 @@
"label": "GPT-5.4 - Best overall",
"recommended": true,
"max_tokens": 128000,
"max_context_tokens": 872000
"max_context_tokens": 872000,
"supports_vision": true
},
{
"id": "anthropic/claude-sonnet-4.6",
"label": "Claude Sonnet 4.6 - Best coding balance",
"recommended": false,
"max_tokens": 64000,
"max_context_tokens": 872000
"max_context_tokens": 872000,
"supports_vision": true
},
{
"id": "anthropic/claude-opus-4.6",
"label": "Claude Opus 4.6 - Most capable",
"recommended": false,
"max_tokens": 128000,
"max_context_tokens": 872000
"max_context_tokens": 872000,
"supports_vision": true
},
{
"id": "google/gemini-3.1-pro-preview-customtools",
"label": "Gemini 3.1 Pro Preview - Long-context reasoning",
"recommended": false,
"max_tokens": 32768,
"max_context_tokens": 872000
"max_context_tokens": 872000,
"supports_vision": true
},
{
"id": "qwen/qwen3.6-plus",
"label": "Qwen 3.6 Plus - Strong reasoning",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 240000
"max_context_tokens": 240000,
"supports_vision": false
},
{
"id": "z-ai/glm-5v-turbo",
"label": "GLM-5V Turbo - Vision capable",
"recommended": true,
"max_tokens": 32768,
"max_context_tokens": 192000
"max_context_tokens": 192000,
"supports_vision": true
},
{
"id": "z-ai/glm-5.1",
"label": "GLM-5.1 - Better but Slower",
"recommended": true,
"max_tokens": 40960,
"max_context_tokens": 192000
"max_context_tokens": 192000,
"pricing_usd_per_mtok": {
"input": 1.40,
"output": 4.40,
"cache_read": 0.26,
"cache_creation": 0.0
},
"supports_vision": false
},
{
"id": "minimax/minimax-m2.7",
"label": "Minimax M2.7 - Minimax flagship",
"recommended": false,
"max_tokens": 40960,
"max_context_tokens": 180000
"max_context_tokens": 180000,
"pricing_usd_per_mtok": {
"input": 0.30,
"output": 1.20
},
"supports_vision": false
},
{
"id": "xiaomi/mimo-v2-pro",
"label": "MiMo V2 Pro - Xiaomi multimodal",
"recommended": true,
"max_tokens": 64000,
"max_context_tokens": 240000
"max_context_tokens": 240000,
"supports_vision": true
}
]
}
@@ -352,7 +447,7 @@
"zai_code": {
"provider": "openai",
"api_key_env_var": "ZAI_API_KEY",
"model": "glm-5",
"model": "glm-5.1",
"max_tokens": 32768,
"max_context_tokens": 180000,
"api_base": "https://api.z.ai/api/coding/paas/v4"
@@ -394,13 +489,13 @@
"recommended": true
},
{
"id": "kimi-2.5",
"label": "kimi-2.5",
"id": "kimi-k2.5",
"label": "kimi-k2.5",
"recommended": false
},
{
"id": "GLM-5",
"label": "GLM-5",
"id": "glm-5.1",
"label": "glm-5.1",
"recommended": false
}
]
+77
@@ -27,6 +27,28 @@ def _require_list(value: Any, path: str) -> list[Any]:
return value
_PRICING_KEYS = ("input", "output", "cache_read", "cache_creation")
def _validate_pricing(value: Any, path: str) -> None:
"""Validate an optional ``pricing_usd_per_mtok`` block.
Keys are USD-per-million-tokens rates. ``input``/``output`` are required;
``cache_read``/``cache_creation`` are optional. All values must be
non-negative numbers. Used as a last-resort fallback when neither the
provider nor LiteLLM's catalog reports a cost.
"""
pricing = _require_mapping(value, path)
for key in ("input", "output"):
if key not in pricing:
raise ModelCatalogError(f"{path}.{key} is required")
for key, rate in pricing.items():
if key not in _PRICING_KEYS:
raise ModelCatalogError(f"{path}.{key} is not a recognized pricing field")
if not isinstance(rate, (int, float)) or isinstance(rate, bool) or rate < 0:
raise ModelCatalogError(f"{path}.{key} must be a non-negative number")
def _validate_model_catalog(data: dict[str, Any]) -> dict[str, Any]:
providers = _require_mapping(data.get("providers"), "providers")
@@ -69,6 +91,14 @@ def _validate_model_catalog(data: dict[str, Any]) -> dict[str, Any]:
if not isinstance(value, int) or value <= 0:
raise ModelCatalogError(f"{model_path}.{key} must be a positive integer")
pricing = model_map.get("pricing_usd_per_mtok")
if pricing is not None:
_validate_pricing(pricing, f"{model_path}.pricing_usd_per_mtok")
supports_vision = model_map.get("supports_vision")
if supports_vision is not None and not isinstance(supports_vision, bool):
raise ModelCatalogError(f"{model_path}.supports_vision must be a boolean when present")
if not default_found:
raise ModelCatalogError(
f"{provider_path}.default_model={default_model!r} is not present in {provider_path}.models"
@@ -184,6 +214,53 @@ def get_model_limits(provider: str, model_id: str) -> tuple[int, int] | None:
return int(model["max_tokens"]), int(model["max_context_tokens"])
def get_model_pricing(model_id: str) -> dict[str, float] | None:
"""Return ``pricing_usd_per_mtok`` for a model id, searching all providers.
Returns ``None`` when the model is absent from the catalog or has no
pricing entry. Used by the cost-extraction fallback in ``litellm.py``
when the provider response and LiteLLM's catalog both come up empty.
"""
if not model_id:
return None
for provider_info in load_model_catalog()["providers"].values():
for model in provider_info["models"]:
if model["id"] == model_id:
pricing = model.get("pricing_usd_per_mtok")
if pricing is None:
return None
return {key: float(rate) for key, rate in pricing.items()}
return None
def model_supports_vision(model_id: str) -> bool:
"""Return whether *model_id* supports image inputs per the curated catalog.
Looks up the bare model id (and the provider-prefix-stripped form) in the
catalog. Returns the model's ``supports_vision`` flag when found, defaulting
to ``True`` for unknown models or when the flag is absent assume vision
capable for hosted providers, since modern frontier models support images
by default and the captioning fallback is more expensive than just letting
the provider handle the image.
"""
if not model_id:
return True
candidates = [model_id]
if "/" in model_id:
candidates.append(model_id.split("/", 1)[1])
for candidate in candidates:
for provider_info in load_model_catalog()["providers"].values():
for model in provider_info["models"]:
if model["id"] == candidate:
flag = model.get("supports_vision")
if isinstance(flag, bool):
return flag
return True
return True
def get_preset(preset_id: str) -> dict[str, Any] | None:
"""Return one preset entry."""
preset = load_model_catalog()["presets"].get(preset_id)
+31 -2
@@ -10,12 +10,24 @@ from typing import Any
@dataclass
class LLMResponse:
"""Response from an LLM call."""
"""Response from an LLM call.
``cached_tokens`` and ``cache_creation_tokens`` are subsets of
``input_tokens`` (providers report them inside ``prompt_tokens``).
Surface them for visibility; do not add to a total.
``cost_usd`` is the per-call USD cost when the provider / pricing table
can produce one (Anthropic, OpenAI, OpenRouter are supported). 0.0 when
unknown or unpriced; treat it as "unreported", not "free".
"""
content: str
model: str
input_tokens: int = 0
output_tokens: int = 0
cached_tokens: int = 0
cache_creation_tokens: int = 0
cost_usd: float = 0.0
stop_reason: str = ""
raw_response: Any = None
@@ -110,19 +122,28 @@ class LLMProvider(ABC):
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
system_dynamic_suffix: str | None = None,
) -> "LLMResponse":
"""Async version of complete(). Non-blocking on the event loop.
Default implementation offloads the sync complete() to a thread pool.
Subclasses SHOULD override for native async I/O.
``system_dynamic_suffix`` is an optional per-turn tail for providers
that honor ``cache_control`` (see LiteLLMProvider for semantics).
The default implementation concatenates it onto ``system`` since the
sync ``complete()`` path does not support the split.
"""
combined_system = system
if system_dynamic_suffix:
combined_system = f"{system}\n\n{system_dynamic_suffix}" if system else system_dynamic_suffix
loop = asyncio.get_running_loop()
return await loop.run_in_executor(
None,
partial(
self.complete,
messages=messages,
system=system,
system=combined_system,
tools=tools,
max_tokens=max_tokens,
response_format=response_format,
@@ -137,6 +158,7 @@ class LLMProvider(ABC):
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 4096,
system_dynamic_suffix: str | None = None,
) -> AsyncIterator["StreamEvent"]:
"""
Stream a completion as an async iterator of StreamEvents.
@@ -147,6 +169,9 @@ class LLMProvider(ABC):
Tool orchestration is the CALLER's responsibility:
- Caller detects ToolCallEvent, executes tool, adds result
to messages, calls stream() again.
``system_dynamic_suffix`` is forwarded to ``acomplete``; see its
docstring for the two-block split semantics.
"""
from framework.llm.stream_events import (
FinishEvent,
@@ -159,6 +184,7 @@ class LLMProvider(ABC):
system=system,
tools=tools,
max_tokens=max_tokens,
system_dynamic_suffix=system_dynamic_suffix,
)
yield TextDeltaEvent(content=response.content, snapshot=response.content)
yield TextEndEvent(full_text=response.content)
@@ -166,6 +192,9 @@ class LLMProvider(ABC):
stop_reason=response.stop_reason,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
cached_tokens=response.cached_tokens,
cache_creation_tokens=response.cache_creation_tokens,
cost_usd=response.cost_usd,
model=response.model,
)
+11 -1
@@ -65,13 +65,23 @@ class ReasoningDeltaEvent:
@dataclass(frozen=True)
class FinishEvent:
"""The LLM has finished generating."""
"""The LLM has finished generating.
``cached_tokens`` and ``cache_creation_tokens`` are subsets of
``input_tokens``; providers count both inside ``prompt_tokens`` already.
Surface them separately for visibility; never add to a total.
``cost_usd`` is the per-turn USD cost when the provider or LiteLLM's
pricing table supplies one; 0.0 means unreported (not free).
"""
type: Literal["finish"] = "finish"
stop_reason: str = ""
input_tokens: int = 0
output_tokens: int = 0
cached_tokens: int = 0
cache_creation_tokens: int = 0
cost_usd: float = 0.0
model: str = ""
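Because ``cached_tokens`` is already counted inside ``input_tokens``, a pricing-table fallback has to price the cached slice at the cache-read rate and only the remainder at the input rate. A hedged sketch of that split (rate keys follow the ``pricing_usd_per_mtok`` block in the catalog diff; the helper itself is an assumption about how a caller would compute the fallback, not code from this PR):

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int,
    pricing: dict[str, float],  # USD per million tokens
) -> float:
    """Price cached input at the cache_read rate, the rest at the input rate."""
    # cached_tokens is a subset of input_tokens -- subtract, never add on top.
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * pricing["input"]
        + cached_tokens * pricing.get("cache_read", pricing["input"])
        + output_tokens * pricing["output"]
    )
    return cost / 1_000_000

# e.g. the DeepSeek V4 Flash rates from the catalog diff:
flash = {"input": 0.14, "output": 0.28, "cache_read": 0.028}
```

With a fully cached prompt, the input cost drops to the ``cache_read`` rate; with no cache hits it reduces to the plain input rate.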
+17 -3
@@ -9,7 +9,7 @@ from datetime import UTC
from pathlib import Path
from typing import Any
from framework.config import get_hive_config, get_preferred_model
from framework.config import HIVE_HOME as _HIVE_HOME, get_hive_config, get_preferred_model
from framework.credentials.validation import (
ensure_credential_key_env as _ensure_credential_key_env,
)
@@ -558,7 +558,7 @@ ANTIGRAVITY_IDE_STATE_DB = (
# Linux fallback for the IDE state DB
ANTIGRAVITY_IDE_STATE_DB_LINUX = Path.home() / ".config" / "Antigravity" / "User" / "globalStorage" / "state.vscdb"
# Antigravity credentials stored by native OAuth implementation
ANTIGRAVITY_AUTH_FILE = Path.home() / ".hive" / "antigravity-accounts.json"
ANTIGRAVITY_AUTH_FILE = _HIVE_HOME / "antigravity-accounts.json"
ANTIGRAVITY_OAUTH_TOKEN_URL = "https://oauth2.googleapis.com/token"
_ANTIGRAVITY_TOKEN_LIFETIME_SECS = 3600 # Google access tokens expire in 1 hour
@@ -1389,7 +1389,7 @@ class AgentLoader:
)
if storage_path is None:
storage_path = Path.home() / ".hive" / "agents" / agent_path.name / worker_name
storage_path = _HIVE_HOME / "agents" / agent_path.name / worker_name
storage_path.mkdir(parents=True, exist_ok=True)
runner = cls(
@@ -1503,6 +1503,7 @@ class AgentLoader:
from framework.pipeline.stages.mcp_registry import McpRegistryStage
from framework.pipeline.stages.skill_registry import SkillRegistryStage
from framework.skills.config import SkillsConfig
from framework.skills.discovery import ExtraScope
configure_logging(level="INFO", format="auto")
@@ -1545,6 +1546,19 @@ class AgentLoader:
default_skills=getattr(self, "_agent_default_skills", None),
skills=getattr(self, "_agent_skills", None),
),
# Surface the colony's flat ``skills/`` directory as a
# ``colony_ui`` extra scope so SKILL.md files written there
# by ``create_colony`` (or the HTTP routes) are picked up
# with correct provenance. The legacy nested
# ``<colony>/.hive/skills/`` path is still picked up via
# project-scope auto-discovery (project_root above).
extra_scope_dirs=[
ExtraScope(
directory=self.agent_path / "skills",
label="colony_ui",
priority=3,
)
],
),
]
+57 -13
@@ -51,6 +51,14 @@ _DEFAULT_LOCAL_SERVERS: dict[str, dict[str, Any]] = {
"description": "File I/O: read, write, edit, search, list, run commands",
"args": ["run", "python", "files_server.py", "--stdio"],
},
"terminal-tools": {
"description": "Terminal capabilities",
"args": ["run", "python", "terminal_tools_server.py", "--stdio"],
},
"chart-tools": {
"description": "BI/financial chart + diagram rendering: ECharts, Mermaid",
"args": ["run", "python", "chart_tools_server.py", "--stdio"],
},
}
# Aliases that earlier versions of ensure_defaults wrote under the wrong name.
@@ -58,14 +66,22 @@ _DEFAULT_LOCAL_SERVERS: dict[str, dict[str, Any]] = {
# name so the active agents (queen, credential_tester) can find their tools.
_STALE_DEFAULT_ALIASES: dict[str, str] = {
"hive_tools": "hive-tools",
# 2026-04-30: shell-tools renamed to terminal-tools. Drop the stale name
# on next ensure_defaults() so the queen's allowlist (which now includes
# @server:terminal-tools) actually finds a server with the new name.
"terminal-tools": "shell-tools",
}
class MCPRegistry:
"""Manages local MCP server state in ~/.hive/mcp_registry/."""
"""Manages local MCP server state in $HIVE_HOME/mcp_registry/."""
def __init__(self, base_path: Path | None = None):
self._base = base_path or Path.home() / ".hive" / "mcp_registry"
if base_path is None:
from framework.config import HIVE_HOME
base_path = HIVE_HOME / "mcp_registry"
self._base = base_path
self._installed_path = self._base / "installed.json"
self._config_path = self._base / "config.json"
self._cache_dir = self._base / "cache"
@@ -73,7 +89,30 @@ class MCPRegistry:
# ── Initialization ──────────────────────────────────────────────
def initialize(self) -> None:
"""Create directory structure and default files if missing."""
"""Create directory structure, default files, and seed bundled servers.
Every read path (queen orchestrator, pipeline stage, CLI, routes)
calls this; keeping the seeding here means a fresh ``HIVE_HOME``
(e.g. the desktop's per-user dir under ``~/.config/Hive/users/<hash>/``
or ``~/Library/Application Support/Hive/users/<hash>/``) is always
populated with ``hive_tools`` / ``gcu-tools`` / ``files-tools`` /
``terminal-tools`` before any agent code reads ``installed.json``.
Without this, ``load_agent_selection()`` resolves an empty registry
and emits "Server X requested but not installed" warnings even
though the server is bundled.
Idempotent: already-installed entries are left untouched.
"""
self._bootstrap_io()
self._seed_defaults()
def _bootstrap_io(self) -> None:
"""Create the registry directory + empty config/installed files.
Split out from ``initialize()`` so ``_seed_defaults()`` can call it
without re-entering the seeding logic (which would recurse via
``_read_installed()`` → ``initialize()``).
"""
self._base.mkdir(parents=True, exist_ok=True)
self._cache_dir.mkdir(parents=True, exist_ok=True)
@@ -84,21 +123,26 @@ class MCPRegistry:
self._write_json(self._installed_path, {"servers": {}})
def ensure_defaults(self) -> list[str]:
"""Seed the built-in local MCP servers (hive-tools, gcu-tools, files-tools).
"""Public alias kept for the ``hive mcp init`` CLI command.
Idempotent; servers already present are left untouched. Skips seeding
entirely when the source-tree ``tools/`` directory cannot be located
(e.g. when Hive is installed from a wheel rather than a checkout).
Returns the list of names that were newly registered.
Returns the list of newly-registered server names so the CLI can
print them. Same idempotent seeding logic as ``initialize()``.
"""
self.initialize()
self._bootstrap_io()
return self._seed_defaults()
def _seed_defaults(self) -> list[str]:
"""Idempotently register the bundled default local servers.
Skips entirely when the source-tree ``tools/`` directory cannot
be located (e.g. wheel installs). Returns the list of names that
were newly registered.
"""
# parents: [0]=loader, [1]=framework, [2]=core, [3]=repo root
tools_dir = Path(__file__).resolve().parents[3] / "tools"
if not tools_dir.is_dir():
logger.debug(
"MCPRegistry.ensure_defaults: tools dir %s missing; skipping default seed",
"MCPRegistry._seed_defaults: tools dir %s missing; skipping default seed",
tools_dir,
)
return []
@@ -115,7 +159,7 @@ class MCPRegistry:
for canonical, stale in _STALE_DEFAULT_ALIASES.items():
if stale in existing and canonical not in existing:
logger.info(
"MCPRegistry.ensure_defaults: removing stale alias '%s' (canonical: '%s')",
"MCPRegistry._seed_defaults: removing stale alias '%s' (canonical: '%s')",
stale,
canonical,
)
@@ -138,7 +182,7 @@ class MCPRegistry:
)
added.append(name)
except MCPError as exc:
logger.warning("MCPRegistry.ensure_defaults: failed to seed '%s': %s", name, exc)
logger.warning("MCPRegistry._seed_defaults: failed to seed '%s': %s", name, exc)
if added:
logger.info("MCPRegistry: seeded default local servers: %s", added)
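The seeding contract described above — untouched existing entries, stale aliases dropped only when the canonical name is absent, and newly registered names returned — can be sketched standalone (a hypothetical in-memory version; the real registry persists to ``installed.json`` and validates via ``MCPError``):

```python
# Hypothetical in-memory sketch of the _seed_defaults() contract.
DEFAULTS: dict[str, dict] = {"hive-tools": {}, "terminal-tools": {}, "chart-tools": {}}
STALE_ALIASES: dict[str, str] = {"terminal-tools": "shell-tools"}  # canonical -> stale

def seed_defaults(installed: dict[str, dict]) -> list[str]:
    """Idempotently register bundled servers; return the newly added names."""
    # Drop a stale alias only when its canonical replacement is absent,
    # so the rename lands without clobbering a user's own entry.
    for canonical, stale in STALE_ALIASES.items():
        if stale in installed and canonical not in installed:
            installed.pop(stale)
    added: list[str] = []
    for name, spec in DEFAULTS.items():
        if name not in installed:  # already-installed entries are untouched
            installed[name] = dict(spec)
            added.append(name)
    return added
```

Calling it a second time on the same mapping returns an empty list, which is the idempotence the docstrings above rely on.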
+23 -30
@@ -71,25 +71,36 @@ class ToolRegistry:
{
# File system reads
"read_file",
"list_directory",
"grep",
"glob",
# Web reads
"web_search",
"web_fetch",
"search_files",
"pdf_read",
# Terminal reads (rg / find / output buffer polling — none of
# which changes process state)
"terminal_rg",
"terminal_find",
"terminal_output_get",
# Web / research reads (re-issuable, side-effect-free fetches)
"web_scrape",
"search_papers",
"search_wikipedia",
"download_paper",
# Browser read-only snapshots (mutate-free observations)
"browser_screenshot",
"browser_snapshot",
"browser_console",
"browser_get_text",
# Background bash polling - reads output buffers only, does
# not touch the subprocess itself.
"bash_output",
"browser_html",
"browser_get_attribute",
"browser_get_rect",
}
)
# Credential directory used for change detection
_CREDENTIAL_DIR = Path("~/.hive/credentials/credentials").expanduser()
# Credential directory used for change detection. Resolved at attribute
# access so HIVE_HOME overrides (set by the desktop) are honoured.
@property
def _CREDENTIAL_DIR(self) -> Path:
from framework.config import HIVE_HOME
return HIVE_HOME / "credentials" / "credentials"
def __init__(self):
self._tools: dict[str, RegisteredTool] = {}
@@ -457,7 +468,7 @@ class ToolRegistry:
else:
resolved_cwd = (base_dir / cwd).resolve()
# Find .py script in args (e.g. coder_tools_server.py, files_server.py)
# Find .py script in args (e.g. files_server.py)
script_name = None
for i, arg in enumerate(args):
if isinstance(arg, str) and arg.endswith(".py"):
@@ -497,24 +508,6 @@ class ToolRegistry:
config["cwd"] = str(resolved_cwd)
return config
# For coder_tools_server, inject --project-root so reads land
# in the expected workspace (hive repo, for framework skills
# and docs), and inject --write-root so writes land under
# ~/.hive/workspace/ instead of polluting the git checkout
# with queen-authored skills, ledgers, and scripts. Without
# the split, every ``write_file`` call from the queen landed
# in the hive repo root.
if script_name and "coder_tools" in script_name:
project_root = str(resolved_cwd.parent.resolve())
args = list(args)
if "--project-root" not in args:
args.extend(["--project-root", project_root])
if "--write-root" not in args:
_write_root = Path.home() / ".hive" / "workspace"
_write_root.mkdir(parents=True, exist_ok=True)
args.extend(["--write-root", str(_write_root)])
config["args"] = args
if os.name == "nt":
# Windows: cwd=None avoids WinError 267; use absolute script path
config["cwd"] = None
-2
@@ -29,9 +29,7 @@ _ALWAYS_AVAILABLE_TOOLS: frozenset[str] = frozenset(
"read_file",
"write_file",
"edit_file",
"list_directory",
"search_files",
"hashline_edit",
"set_output",
"escalate",
}
+7 -6
@@ -9,8 +9,8 @@ Nodes that need browser access declare ``tools: {policy: "all"}`` in their
agent.json config.
Note: the canonical source of truth for browser automation guidance is
the ``browser-automation`` default skill at
``core/framework/skills/_default_skills/browser-automation/SKILL.md``.
the ``browser-automation`` preset skill at
``core/framework/skills/_preset_skills/browser-automation/SKILL.md``.
Activate that skill for the full decision tree. This module holds a
compact subset suitable for direct inlining into a node's system prompt
when a skill activation is not desired.
@@ -35,7 +35,7 @@ Follow these rules for reliable, efficient browser interaction.
Use snapshot first for structure and ordinary controls; switch to
screenshot when snapshot can't find or verify the target. Interaction
tools (`browser_click`, `browser_type`, `browser_type_focused`,
`browser_fill`, `browser_scroll`) wait 0.5 s for the page to settle
`browser_scroll`) wait 0.5 s for the page to settle
after a successful action, then attach a fresh snapshot under the
`snapshot` key of their result, so don't call `browser_snapshot`
separately after an interaction unless you need a newer view. Tune
@@ -140,8 +140,9 @@ shortcut dispatcher requires both), then releases in reverse order.
## Tab management
Close tabs as soon as you're done with them — not only at the end of
the task. `browser_close(target_id=...)` for one, `browser_close_finished()`
for a full cleanup. Never accumulate more than 3 open tabs.
the task. Use `browser_close(tab_id=...)` (or no arg to close the
active tab); call it for each tab when cleaning up after a multi-tab
workflow. Never accumulate more than 3 open tabs.
`browser_tabs` reports an `origin` field: `"agent"` (you own it, close
when done), `"popup"` (close after extracting), `"startup"`/`"user"`
(leave alone).
@@ -157,7 +158,7 @@ cookie consent banners if they block content.
- If `browser_snapshot` fails, try `browser_get_text` with a narrow
selector as fallback.
- If `browser_open` fails or the page seems stale, `browser_stop`
`browser_start` retry.
`browser_open(url)` to lazy-create a fresh context.
## `browser_evaluate`
+4
@@ -543,6 +543,10 @@ class NodeContext:
# Dynamic memory provider — when set, EventLoopNode rebuilds the
# system prompt with the latest memory block each iteration.
dynamic_memory_provider: Any = None # Callable[[], str] | None
# Surgical skills-catalog refresh, same contract as AgentContext's
# field of the same name. Lets workers pick up UI-driven skill
# toggles without rebuilding the full system prompt each turn.
dynamic_skills_catalog_provider: Any = None # Callable[[], str] | None
# Skill system prompts — injected by the skill discovery pipeline
skills_catalog_prompt: str = "" # Available skills XML catalog
+8 -9
@@ -331,10 +331,10 @@ class Orchestrator:
# Strip tool names that aren't registered in this runtime instead of
# hard-failing. The worker is forked from the queen's tool snapshot
# which may include MCP tools the worker's runtime doesn't load (e.g.
# coder-tools agent-management tools). Blocking the worker on missing
# tools leaves the queen stranded mid-task; stripping + warning lets
# the worker proceed with what it does have.
# which may include MCP tools the worker's runtime doesn't load.
# Blocking the worker on missing tools leaves the queen stranded
# mid-task; stripping + warning lets the worker proceed with what
# it does have.
for node in graph.nodes:
if node.id not in reachable:
continue
@@ -683,11 +683,10 @@ class Orchestrator:
# Set per-execution data_dir and agent_id so data tools and
# spillover files share the same session-scoped directory, and
# so MCP tools whose server-side schemas mark agent_id as a
# required field (list_dir, hashline_edit, replace_file_content,
# execute_command_tool, …) get a valid value injected even on
# registry instances where agent_loader.setup() didn't populate
# the session_context. Without this, FastMCP rejects those
# calls with "agent_id is a required property".
# required field get a valid value injected even on registry
# instances where agent_loader.setup() didn't populate the
# session_context. Without this, FastMCP rejects those calls
# with "agent_id is a required property".
_ctx_token = None
if self._storage_path:
from framework.loader.tool_registry import ToolRegistry
@@ -44,6 +44,9 @@ class McpRegistryStage(PipelineStage):
from framework.loader.mcp_registry import MCPRegistry
from framework.orchestrator.files import FILES_MCP_SERVER_NAME
# Bundled defaults (hive_tools / gcu-tools / files-tools / terminal-tools)
# are seeded inside MCPRegistry.initialize(); resolve_for_agent below
# will find them even on a fresh HIVE_HOME.
registry = MCPRegistry()
mcp_loaded = False
@@ -26,11 +26,15 @@ class SkillRegistryStage(PipelineStage):
project_root: str | Path | None = None,
interactive: bool = True,
skills_config: Any = None,
extra_scope_dirs: list[Any] | None = None,
**kwargs: Any,
) -> None:
self._project_root = Path(project_root) if project_root else None
self._interactive = interactive
self._skills_config = skills_config
# Optional list of ExtraScope entries layered between user and
# project scope (e.g. ``colony_ui`` for a colony agent's skills/).
self._extra_scope_dirs = list(extra_scope_dirs) if extra_scope_dirs else []
self.skills_manager: Any = None
async def initialize(self) -> None:
@@ -41,6 +45,7 @@ class SkillRegistryStage(PipelineStage):
skills_config=self._skills_config or SkillsConfig(),
project_root=self._project_root,
interactive=self._interactive,
extra_scope_dirs=self._extra_scope_dirs,
)
self.skills_manager = SkillsManager(config)
self.skills_manager.load()
+11
@@ -155,6 +155,17 @@ class SessionState(BaseModel):
# True after first successful worker execution (gates trigger delivery on restart)
worker_configured: bool = Field(default=False)
# Task-system fields (see framework/tasks).
# task_list_id: this session's own task list id (populated on first
# task_create; immutable thereafter). Used for resume reattachment —
# if it differs from resolve_task_list_id(ctx) on resume, a
# TASK_LIST_REATTACH_MISMATCH event is emitted and a fresh list is
# created at the resolved id (the orphan stays on disk).
task_list_id: str | None = None
# picked_up_from: for worker sessions, the (colony_task_list_id,
# template_task_id) pair this session was spawned for.
picked_up_from: list[Any] | None = None
model_config = {"extra": "allow"}
@property
+103 -11
@@ -1,5 +1,6 @@
"""aiohttp Application factory for the Hive HTTP API server."""
import hmac
import logging
import os
from pathlib import Path
@@ -21,7 +22,9 @@ _ALLOWED_AGENT_ROOTS: tuple[Path, ...] | None = None
def _has_encrypted_credentials() -> bool:
"""Return True when an encrypted credential store already exists on disk."""
cred_dir = Path.home() / ".hive" / "credentials" / "credentials"
from framework.config import HIVE_HOME
cred_dir = HIVE_HOME / "credentials" / "credentials"
return cred_dir.is_dir() and any(cred_dir.glob("*.enc"))
@@ -30,17 +33,18 @@ def _get_allowed_agent_roots() -> tuple[Path, ...]:
Roots are anchored to the repository root (derived from ``__file__``)
so the allowlist is correct regardless of the process's working
directory.
directory. The hive-home subtrees honour ``HIVE_HOME`` so the desktop's
per-user root is allowed in addition to (or instead of) ``~/.hive``.
"""
global _ALLOWED_AGENT_ROOTS
if _ALLOWED_AGENT_ROOTS is None:
from framework.config import COLONIES_DIR
from framework.config import COLONIES_DIR, HIVE_HOME
_ALLOWED_AGENT_ROOTS = (
COLONIES_DIR.resolve(), # ~/.hive/colonies/
COLONIES_DIR.resolve(), # $HIVE_HOME/colonies/
(_REPO_ROOT / "exports").resolve(), # compat fallback
(_REPO_ROOT / "examples").resolve(),
(Path.home() / ".hive" / "agents").resolve(),
(HIVE_HOME / "agents").resolve(),
)
return _ALLOWED_AGENT_ROOTS
@@ -62,7 +66,8 @@ def validate_agent_path(agent_path: str | Path) -> Path:
if resolved.is_relative_to(root) and resolved != root:
return resolved
raise ValueError(
"agent_path must be inside an allowed directory (~/.hive/colonies/, exports/, examples/, or ~/.hive/agents/)"
"agent_path must be inside an allowed directory "
"($HIVE_HOME/colonies/, exports/, examples/, or $HIVE_HOME/agents/)"
)
@@ -94,13 +99,15 @@ def resolve_session(request: web.Request):
def sessions_dir(session: Session) -> Path:
"""Resolve the worker sessions directory for a session.
Storage layout: ~/.hive/agents/{agent_name}/sessions/
Storage layout: $HIVE_HOME/agents/{agent_name}/sessions/
Requires a worker to be loaded (worker_path must be set).
"""
if session.worker_path is None:
raise ValueError("No worker loaded — no worker sessions directory")
from framework.config import HIVE_HOME
agent_name = session.worker_path.name
return Path.home() / ".hive" / "agents" / agent_name / "sessions"
return HIVE_HOME / "agents" / agent_name / "sessions"
# Allowed CORS origins (localhost on any port)
@@ -140,6 +147,47 @@ async def cors_middleware(request: web.Request, handler):
return response
@web.middleware
async def no_cache_api_middleware(request: web.Request, handler):
"""Prevent browsers from caching API responses.
Without this, a one-off bad response (e.g. the SPA catch-all leaking
index.html for an /api/* URL before a route was registered) can get
pinned in the browser's disk cache and replayed forever, since our
JSON handlers don't emit ETag/Last-Modified and browsers fall back
to heuristic freshness.
"""
try:
response = await handler(request)
except web.HTTPException as exc:
response = exc
if request.path.startswith("/api/"):
response.headers["Cache-Control"] = "no-store"
return response
# ---------------------------------------------------------------------------
# Desktop shared-secret auth middleware.
#
# When the runtime is spawned by the Electron main process, a fresh random
# token is passed via ``HIVE_DESKTOP_TOKEN``. Every request from main must
# carry the matching ``X-Hive-Token`` header. If the env var is unset (e.g.
# running ``hive serve`` directly from a terminal), the check is skipped —
# OSS behaviour is preserved.
# ---------------------------------------------------------------------------
_EXPECTED_DESKTOP_TOKEN: str | None = os.environ.get("HIVE_DESKTOP_TOKEN") or None
@web.middleware
async def desktop_auth_middleware(request: web.Request, handler):
if _EXPECTED_DESKTOP_TOKEN is None:
return await handler(request)
provided = request.headers.get("X-Hive-Token", "")
if not hmac.compare_digest(provided, _EXPECTED_DESKTOP_TOKEN):
return web.json_response({"error": "unauthorized"}, status=401)
return await handler(request)
@web.middleware
async def error_middleware(request: web.Request, handler):
"""Catch exceptions and return JSON error responses.
@@ -268,7 +316,12 @@ def create_app(model: str | None = None) -> web.Application:
Returns:
Configured aiohttp Application ready to run.
"""
app = web.Application(middlewares=[cors_middleware, error_middleware])
# Desktop mode: the runtime is always a subprocess of the Electron main
# process, which reaches it via IPC and the `hive://` custom protocol.
# There is no browser origin to authorize, so CORS is unnecessary.
# The auth middleware enforces the shared-secret token when the env var
# is set (i.e. when Electron spawned us); it is a no-op otherwise.
app = web.Application(middlewares=[desktop_auth_middleware, no_cache_api_middleware, error_middleware])
# Initialize credential store (before SessionManager so it can be shared)
from framework.credentials.store import CredentialStore
@@ -316,6 +369,18 @@ def create_app(model: str | None = None) -> web.Application:
queen_tool_registry=None,
)
# Clear orphaned compaction markers from prior server crashes. Without
# this, any session whose compaction was interrupted would block the
# next colony cold-load for the full await_completion timeout (180s)
# before falling through. See compaction_status.sweep_stale_in_progress.
try:
from framework.config import QUEENS_DIR
from framework.server import compaction_status
compaction_status.sweep_stale_in_progress(QUEENS_DIR)
except Exception:
logger.debug("compaction_status: startup sweep skipped", exc_info=True)
# Register shutdown hook
app.on_shutdown.append(_on_shutdown)
@@ -325,16 +390,22 @@ def create_app(model: str | None = None) -> web.Application:
app.router.add_get("/api/browser/status/stream", handle_browser_status_stream)
# Register route modules
from framework.server.routes_colonies import register_routes as register_colonies_routes
from framework.server.routes_colony_tools import register_routes as register_colony_tools_routes
from framework.server.routes_colony_workers import register_routes as register_colony_worker_routes
from framework.server.routes_config import register_routes as register_config_routes
from framework.server.routes_credentials import register_routes as register_credential_routes
from framework.server.routes_events import register_routes as register_event_routes
from framework.server.routes_execution import register_routes as register_execution_routes
from framework.server.routes_logs import register_routes as register_log_routes
from framework.server.routes_mcp import register_routes as register_mcp_routes
from framework.server.routes_messages import register_routes as register_message_routes
from framework.server.routes_prompts import register_routes as register_prompt_routes
from framework.server.routes_queen_tools import register_routes as register_queen_tools_routes
from framework.server.routes_queens import register_routes as register_queen_routes
from framework.server.routes_sessions import register_routes as register_session_routes
from framework.server.routes_skills import register_routes as register_skills_routes
from framework.server.routes_tasks import register_routes as register_task_routes
from framework.server.routes_workers import register_routes as register_worker_routes
register_config_routes(app)
@@ -346,11 +417,32 @@ def create_app(model: str | None = None) -> web.Application:
register_worker_routes(app)
register_log_routes(app)
register_queen_routes(app)
register_queen_tools_routes(app)
register_colonies_routes(app)
register_colony_tools_routes(app)
register_mcp_routes(app)
register_colony_worker_routes(app)
register_prompt_routes(app)
register_skills_routes(app)
register_task_routes(app)
# Static file serving — Option C production mode
# If frontend/dist/ exists, serve built frontend files on /
# Commercial extensions (optional — only present in hive-desktop-runtime).
# Imports lazily so an OSS install without the `commercial` package keeps
# working unchanged.
try:
from commercial.middleware import setup_commercial_middleware
from commercial.routes import register_routes as register_commercial_routes
setup_commercial_middleware(app)
register_commercial_routes(app)
logger.info("Commercial extensions loaded")
except ImportError:
pass
# Serve the built frontend SPA (if frontend/dist exists) so hitting the
# API host in a browser loads the dashboard instead of 404'ing. In
# Electron/desktop mode the renderer still loads from file:// and
# ignores this; in dev mode Vite is used instead.
_setup_static_serving(app)
return app
@@ -147,3 +147,55 @@ async def await_completion(
)
return last
await asyncio.sleep(poll)
def sweep_stale_in_progress(queens_root: Path) -> int:
"""Rewrite any orphaned ``in_progress`` markers under ``queens_root`` to
``failed``. Returns the count of rewritten markers.
Whatever process owned the original compaction is gone (server crash,
SIGKILL, etc.), so leaving the marker at ``in_progress`` would cause every
subsequent colony cold-load for that queen session to wait the full
``await_completion`` timeout (default 180s) before falling through.
Called once during server bootstrap. Best-effort: any per-file failure is
logged and skipped; the sweep should never prevent the server from
coming up.
"""
if not queens_root.exists():
return 0
cleaned = 0
try:
for queen_dir in queens_root.iterdir():
if not queen_dir.is_dir():
continue
sessions_dir = queen_dir / "sessions"
if not sessions_dir.exists():
continue
try:
for session_dir in sessions_dir.iterdir():
if not session_dir.is_dir():
continue
status = get_status(session_dir)
if status is None or status.get("status") != "in_progress":
continue
mark_failed(session_dir, "server restarted while compaction was in progress")
cleaned += 1
except OSError:
logger.debug(
"compaction_status: sweep failed under %s",
sessions_dir,
exc_info=True,
)
except OSError:
logger.debug(
"compaction_status: sweep failed under %s",
queens_root,
exc_info=True,
)
if cleaned:
logger.info(
"compaction_status: cleared %d stale 'in_progress' marker(s) at startup",
cleaned,
)
return cleaned
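The in_progress-to-failed rewrite can be sketched against a hypothetical on-disk layout. The real `get_status()`/`mark_failed()` helpers are not shown here, so the JSON marker file name below is an assumption for illustration only:

```python
import json
from pathlib import Path

# Hypothetical marker file name; the real helpers may store status differently.
MARKER = "compaction_status.json"


def sweep(sessions_dir: Path) -> int:
    """Flip every orphaned in_progress marker under sessions_dir to failed."""
    cleaned = 0
    for session_dir in sessions_dir.iterdir():
        marker = session_dir / MARKER
        if not marker.is_file():
            continue
        status = json.loads(marker.read_text())
        if status.get("status") != "in_progress":
            continue
        status["status"] = "failed"
        status["error"] = "server restarted while compaction was in progress"
        marker.write_text(json.dumps(status))
        cleaned += 1
    return cleaned
```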
@@ -253,6 +253,92 @@ async def materialize_queen_identity(
)
def build_queen_tool_registry_bare() -> tuple[Any, dict[str, list[dict[str, Any]]]]:
"""Build a Queen ``ToolRegistry`` and a (server_name → tools) catalog.
Used by the Tool Library GET route to populate the MCP tool surface
without needing a live queen session. We DO NOT register queen
lifecycle tools here (they require a Session stub); the catalog only
covers MCP-origin tools, which is what the allowlist gates.
Loading MCP servers spawns subprocesses, so call this once per
backend process and cache the result.
"""
from pathlib import Path
import framework.agents.queen as _queen_pkg
from framework.loader.mcp_registry import MCPRegistry
from framework.loader.tool_registry import ToolRegistry
queen_registry = ToolRegistry()
queen_pkg_dir = Path(_queen_pkg.__file__).parent
mcp_config = queen_pkg_dir / "mcp_servers.json"
if mcp_config.exists():
try:
queen_registry.load_mcp_config(mcp_config)
except Exception:
logger.warning("build_queen_tool_registry_bare: MCP config failed", exc_info=True)
try:
reg = MCPRegistry()
reg.initialize()
if (queen_pkg_dir / "mcp_registry.json").is_file():
queen_registry.set_mcp_registry_agent_path(queen_pkg_dir)
registry_configs, selection_max_tools = reg.load_agent_selection(queen_pkg_dir)
already = {cfg.get("name") for cfg in registry_configs if cfg.get("name")}
extra: list[str] = []
try:
for entry in reg.list_installed():
if entry.get("source") != "local":
continue
if not entry.get("enabled", True):
continue
name = entry.get("name")
if name and name not in already:
extra.append(name)
except Exception:
pass
if extra:
try:
extra_configs = reg.resolve_for_agent(include=extra)
registry_configs = list(registry_configs) + [reg._server_config_to_dict(c) for c in extra_configs]
except Exception:
logger.debug("build_queen_tool_registry_bare: resolve_for_agent(extra) failed", exc_info=True)
if registry_configs:
queen_registry.load_registry_servers(
registry_configs,
preserve_existing_tools=True,
log_collisions=False,
max_tools=selection_max_tools,
)
except Exception:
logger.warning("build_queen_tool_registry_bare: MCP registry load failed", exc_info=True)
# Build the catalog.
tools_by_name = queen_registry.get_tools()
server_map = dict(getattr(queen_registry, "_mcp_server_tools", {}) or {})
catalog: dict[str, list[dict[str, Any]]] = {}
for server_name in sorted(server_map):
entries: list[dict[str, Any]] = []
for tool_name in sorted(server_map[server_name]):
tool = tools_by_name.get(tool_name)
if tool is None:
continue
entries.append(
{
"name": tool.name,
"description": tool.description,
"input_schema": tool.parameters,
}
)
catalog[server_name] = entries
return queen_registry, catalog
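The catalog-building loop (which also reappears in `create_queen`) can be expressed as a standalone helper. A sketch, using `SimpleNamespace` as a stand-in for the Tool type:

```python
from types import SimpleNamespace


def build_catalog(
    server_map: dict[str, set[str]], tools_by_name: dict
) -> dict[str, list[dict]]:
    # Group per-server tool metadata, skipping names the registry lists
    # but for which no Tool object was actually loaded.
    catalog: dict[str, list[dict]] = {}
    for server_name in sorted(server_map):
        entries = []
        for tool_name in sorted(server_map[server_name]):
            tool = tools_by_name.get(tool_name)
            if tool is None:
                continue
            entries.append(
                {
                    "name": tool.name,
                    "description": tool.description,
                    "input_schema": tool.parameters,
                }
            )
        catalog[server_name] = entries
    return catalog
```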
async def create_queen(
session: Session,
session_manager: Any,
@@ -273,6 +359,7 @@ async def create_queen(
queen_goal,
queen_loop_config as _base_loop_config,
)
from framework.config import get_max_tokens as _get_max_tokens
from framework.agents.queen.nodes import (
_QUEEN_INCUBATING_TOOLS,
_QUEEN_INDEPENDENT_TOOLS,
@@ -285,7 +372,6 @@ async def create_queen(
_queen_role_independent,
_queen_role_reviewing,
_queen_role_working,
_queen_style,
_queen_tools_incubating,
_queen_tools_independent,
_queen_tools_reviewing,
@@ -326,6 +412,45 @@ async def create_queen(
if (queen_pkg_dir / "mcp_registry.json").is_file():
queen_registry.set_mcp_registry_agent_path(queen_pkg_dir)
registry_configs, selection_max_tools = registry.load_agent_selection(queen_pkg_dir)
# Auto-include every user-added local MCP server that the repo
# selection hasn't already loaded. Users register servers via
# the `/api/mcp/servers` route (or `hive mcp add`); they live in
# ~/.hive/mcp_registry/installed.json with source == "local".
# New servers take effect on the next queen session start; the
# prompt cache and ToolRegistry are still loaded once per boot.
already_loaded_names = {cfg.get("name") for cfg in registry_configs if cfg.get("name")}
extra_names: list[str] = []
try:
for entry in registry.list_installed():
if entry.get("source") != "local":
continue
if not entry.get("enabled", True):
continue
name = entry.get("name")
if not name or name in already_loaded_names:
continue
extra_names.append(name)
except Exception:
logger.debug("Queen: list_installed() failed while auto-including user servers", exc_info=True)
if extra_names:
try:
extra_configs = registry.resolve_for_agent(include=extra_names)
extra_dicts = [registry._server_config_to_dict(c) for c in extra_configs]
registry_configs = list(registry_configs) + extra_dicts
logger.info(
"Queen: auto-including %d user-added MCP server(s): %s",
len(extra_dicts),
[c.get("name") for c in extra_dicts],
)
except Exception:
logger.warning(
"Queen: failed to resolve user-added MCP servers %s",
extra_names,
exc_info=True,
)
if registry_configs:
results = queen_registry.load_registry_servers(
registry_configs,
@@ -363,6 +488,21 @@ async def create_queen(
phase_state=phase_state,
)
# ---- Task system tools --------------------------------------------
# Every queen gets the four session task tools. Queens-of-colony
# additionally get the colony_template_* tools (gated by colony_id).
from framework.tasks.tools import (
register_colony_template_tools,
register_task_tools,
)
register_task_tools(queen_registry)
_colony_id_for_queen = getattr(session, "colony_id", None) or getattr(
getattr(session, "colony_runtime", None), "_colony_id", None
)
if _colony_id_for_queen:
register_colony_template_tools(queen_registry, colony_id=_colony_id_for_queen)
# ---- Colony runtime check (only when worker is loaded) ----------------
if session.colony_runtime:
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
@@ -404,7 +544,7 @@ async def create_queen(
phase_state.incubating_tools = [t for t in queen_tools if t.name in incubating_names]
# Independent phase gets core tools + all MCP tools not claimed by any
# other phase (coder-tools file I/O, gcu-tools browser, etc.).
# other phase (files-tools file I/O, gcu-tools browser, etc.).
all_phase_names = independent_names | incubating_names | working_names | reviewing_names
mcp_tools = [t for t in queen_tools if t.name not in all_phase_names]
phase_state.independent_tools = [t for t in queen_tools if t.name in independent_names] + mcp_tools
@@ -417,6 +557,71 @@ async def create_queen(
sorted(t.name for t in phase_state.incubating_tools),
)
# ---- Per-queen MCP tool allowlist --------------------------------
# Capture the set of MCP-origin tool names so the allowlist in
# ``QueenPhaseState`` only gates MCP tools (lifecycle and synthetic
# tools always pass through). Then apply the queen profile's stored
# allowlist (if any) and memoize the filtered independent tool list.
mcp_server_tools_map: dict[str, set[str]] = dict(getattr(queen_registry, "_mcp_server_tools", {}))
phase_state.mcp_tool_names_all = set().union(*mcp_server_tools_map.values()) if mcp_server_tools_map else set()
# The queen's MCP tool allowlist now lives in a dedicated
# ``tools.json`` sidecar next to ``profile.yaml``. ``load_queen_tools_config``
# migrates any legacy ``enabled_mcp_tools`` field out of profile.yaml
# on first read, so existing installs upgrade silently.
from framework.agents.queen.queen_tools_config import load_queen_tools_config
# Build a minimal catalog for default-tool resolution. The full
# ``session_manager._mcp_tool_catalog`` snapshot is written further
# down the flow; a queen booted for the first time needs the catalog
# now so ``@server:NAME`` shorthands in the role-default table can
# expand against the just-loaded MCP servers.
_boot_catalog: dict[str, list[dict]] = {
srv: [{"name": name} for name in sorted(names)] for srv, names in mcp_server_tools_map.items()
}
# ``queen_dir`` is ``queens/<queen_id>/sessions/<session_id>``; the
# allowlist sidecar is keyed by queen_id, not session_id.
phase_state.enabled_mcp_tools = load_queen_tools_config(session.queen_name, _boot_catalog)
phase_state.rebuild_independent_filter()
if phase_state.enabled_mcp_tools is not None:
total_mcp = len(phase_state.mcp_tool_names_all)
allowed_mcp = len(set(phase_state.enabled_mcp_tools) & phase_state.mcp_tool_names_all)
logger.info(
"Queen: per-queen MCP allowlist active — %d of %d MCP tools enabled",
allowed_mcp,
total_mcp,
)
# ---- MCP tool catalog for the frontend ---------------------------
# Snapshot per-server tool metadata so the Queen Tools API can render
# the tool surface without spawning MCP subprocesses. Keyed by server
# name so the UI can group tools by origin. Updated every time a
# queen boots, so installing a new server and starting a new queen
# session refreshes the catalog.
mcp_tool_catalog: dict[str, list[dict[str, Any]]] = {}
tools_by_name = {t.name: t for t in queen_tools}
for server_name, tool_names in mcp_server_tools_map.items():
server_entries: list[dict[str, Any]] = []
for tool_name in sorted(tool_names):
tool = tools_by_name.get(tool_name)
if tool is None:
continue
server_entries.append(
{
"name": tool.name,
"description": tool.description,
"input_schema": tool.parameters,
}
)
mcp_tool_catalog[server_name] = server_entries
# All queens share one MCP registry, so the catalog is a manager-level
# fact; stash it on the SessionManager so the Queen Tools route can
# render the tool list even when no queen session is currently live.
if session_manager is not None:
try:
session_manager._mcp_tool_catalog = mcp_tool_catalog # type: ignore[attr-defined]
except Exception:
logger.debug("Queen: could not attach mcp_tool_catalog to manager", exc_info=True)
# ---- Global + queen-scoped memory ----------------------------------
global_dir, queen_mem_dir = initialize_memory_scopes(session, phase_state)
@@ -441,7 +646,6 @@ async def create_queen(
(
_queen_character_core
+ _queen_role_independent
+ _queen_style
+ _queen_tools_independent
+ _queen_behavior_always
+ _queen_behavior_independent
@@ -449,39 +653,49 @@ async def create_queen(
_has_vision,
)
phase_state.prompt_incubating = finalize_queen_prompt(
(
_queen_character_core
+ _queen_role_incubating
+ _queen_style
+ _queen_tools_incubating
+ _queen_behavior_always
),
(_queen_character_core + _queen_role_incubating + _queen_tools_incubating + _queen_behavior_always),
_has_vision,
)
phase_state.prompt_working = finalize_queen_prompt(
(_queen_character_core + _queen_role_working + _queen_style + _queen_tools_working + _queen_behavior_always),
(_queen_character_core + _queen_role_working + _queen_tools_working + _queen_behavior_always),
_has_vision,
)
phase_state.prompt_reviewing = finalize_queen_prompt(
(
_queen_character_core
+ _queen_role_reviewing
+ _queen_style
+ _queen_tools_reviewing
+ _queen_behavior_always
),
(_queen_character_core + _queen_role_reviewing + _queen_tools_reviewing + _queen_behavior_always),
_has_vision,
)
# ---- Default skill protocols -------------------------------------
_queen_skill_dirs: list[str] = []
try:
from framework.config import QUEENS_DIR
from framework.skills.discovery import ExtraScope
from framework.skills.manager import SkillsManager, SkillsManagerConfig
# Pass project_root so user-scope skills (~/.hive/skills/, ~/.agents/skills/)
# are discovered. Queen has no agent-specific project root, so we use its
# own directory — the value just needs to be non-None to enable user-scope scanning.
_queen_skills_mgr = SkillsManager(SkillsManagerConfig(project_root=Path(__file__).parent))
# Queen home backs the queen-UI skill scope and the queen's
# override store. The directory already exists (or is created on
# demand by queen_profiles.py); treat a missing queen_name as the
# default queen to preserve backwards compatibility.
_queen_id = getattr(session, "queen_name", None) or "default"
_queen_home = QUEENS_DIR / _queen_id
_queen_skills_mgr = SkillsManager(
SkillsManagerConfig(
queen_id=_queen_id,
queen_overrides_path=_queen_home / "skills_overrides.json",
extra_scope_dirs=[
ExtraScope(
directory=_queen_home / "skills",
label="queen_ui",
priority=2,
)
],
# No project_root — queen's project is her own identity;
# user-scope discovery still runs without one.
project_root=None,
skip_community_discovery=True,
interactive=False,
)
)
_queen_skills_mgr.load()
phase_state.protocols_prompt = _queen_skills_mgr.protocols_prompt
phase_state.skills_catalog_prompt = _queen_skills_mgr.skills_catalog_prompt
@@ -520,8 +734,37 @@ async def create_queen(
# ---- Recall on each real user turn --------------------------------
async def _recall_on_user_input(event: AgentEvent) -> None:
"""Re-select memories when real user input arrives."""
await _refresh_recall_cache((event.data or {}).get("content", ""))
"""On real user input, freeze the dynamic system-prompt suffix and
refresh recall memories in the background.
The EventBus drops handlers that exceed 15s, so we MUST return fast.
Recall selection queries the LLM and can take >15s on slow backends;
we fire it off as a background task and re-stamp the suffix when it
completes. The immediate refresh_dynamic_suffix call stamps a fresh
timestamp using the last-known recall blocks so every iteration of
THIS user turn sees a byte-stable prompt (prompt cache hits on the
static block). Phase-change injections and worker-report injections
go through agent_loop.inject_event() and do NOT publish
CLIENT_INPUT_RECEIVED, so this runs exactly once per real user turn.
"""
query = (event.data or {}).get("content", "")
# Immediate: stamp "now" into the frozen suffix, using whatever
# recall blocks we already cached (from the prior turn or seeding).
phase_state.refresh_dynamic_suffix()
async def _bg_refresh() -> None:
try:
await _refresh_recall_cache(query)
# Re-stamp with the fresh recall blocks. Any iteration that
# read the suffix before this point used the older recall
# — acceptable; recall was already eventual-consistency.
phase_state.refresh_dynamic_suffix()
except Exception:
logger.debug("background recall refresh failed", exc_info=True)
import asyncio as _asyncio
_asyncio.create_task(_bg_refresh())
session.event_bus.subscribe(
[EventType.CLIENT_INPUT_RECEIVED],
@@ -631,6 +874,9 @@ async def create_queen(
except Exception:
logger.debug("recall: initial seeding failed", exc_info=True)
# Freeze the dynamic suffix once so the first real turn sends a
# byte-stable prompt even before CLIENT_INPUT_RECEIVED fires.
phase_state.refresh_dynamic_suffix()
return HookResult(system_prompt=phase_state.get_current_prompt())
# ---- Colony preparation -------------------------------------------
@@ -675,10 +921,21 @@ async def create_queen(
# token stays local to this task.
try:
from framework.loader.tool_registry import ToolRegistry
from framework.tasks.scoping import session_task_list_id
ToolRegistry.set_execution_context(profile=session.id)
queen_agent_id = getattr(session, "agent_id", None) or "queen"
queen_list_id = session_task_list_id(queen_agent_id, session.id)
colony_id = getattr(session, "colony_id", None) or getattr(
getattr(session, "colony_runtime", None), "_colony_id", None
)
ToolRegistry.set_execution_context(
profile=session.id,
agent_id=queen_agent_id,
task_list_id=queen_list_id,
colony_id=colony_id,
)
except Exception:
logger.debug("Queen: failed to set browser profile for session %s", session.id, exc_info=True)
logger.debug("Queen: failed to set execution context for session %s", session.id, exc_info=True)
try:
lc = _queen_loop_config
queen_loop_config = LoopConfig(
@@ -726,11 +983,17 @@ async def create_queen(
llm=session.llm,
available_tools=queen_tools,
goal_context=queen_goal.to_prompt_context(),
max_tokens=lc.get("max_tokens", 8192),
# Honor configuration.json (llm.max_tokens) instead of
# hard-defaulting to 8192. The legacy fallback ignored both
# the user's saved ceiling AND the model's actual output
# capacity (e.g. glm-5.1 / kimi-k2.5 both support 32k out),
# which silently truncated long tool-emitting turns.
max_tokens=lc.get("max_tokens", _get_max_tokens()),
stream_id="queen",
execution_id=session.id,
dynamic_tools_provider=phase_state.get_current_tools,
dynamic_prompt_provider=phase_state.get_current_prompt,
dynamic_prompt_provider=phase_state.get_static_prompt,
dynamic_prompt_suffix_provider=phase_state.get_dynamic_suffix,
iteration_metadata_provider=lambda: {"phase": phase_state.phase},
skills_catalog_prompt=phase_state.skills_catalog_prompt,
protocols_prompt=phase_state.protocols_prompt,
@@ -0,0 +1,646 @@
"""HTTP routes for colony import/export — moving a colony spec between hosts.
Today, just the import side: accept a `tar.gz` and unpack it into HIVE_HOME so
a desktop client (or any external mover) can hand a colony to a remote runtime
to run.
POST /api/colonies/import -- multipart/form-data
file required -- .tar / .tar.gz / .tar.bz2 / .tar.xz
name optional -- override the colony name (legacy single-root
archives only); defaults to the archive's
single top-level directory
replace_existing optional -- "true" to overwrite, else 409 on conflict
The desktop sends a *multi-root* tar so the queen sees a colony's full state
(not just metadata + data) on resume. Recognised top-level prefixes:
colonies/<name>/... HIVE_HOME/colonies/<name>/...
agents/<name>/worker/... HIVE_HOME/agents/<name>/worker/...
agents/queens/<queen>/sessions/<sid>/... HIVE_HOME/agents/queens/<queen>/sessions/<sid>/...
Anything outside those is rejected. For backwards compat with older clients
that tar `<name>/...` directly (single colony dir, no `colonies/` wrapper),
the handler falls back to the legacy single-root flow when no recognised
multi-root prefix is found.
"""
from __future__ import annotations
import io
import logging
import re
import shutil
import tarfile
from pathlib import Path
from aiohttp import web
from framework.config import COLONIES_DIR
logger = logging.getLogger(__name__)
# Matches the convention used elsewhere in the codebase (see
# routes_colony_workers and queen_lifecycle_tools): lowercase alphanumerics
# and underscores only. No dots, no slashes — names are filesystem segments.
_COLONY_NAME_RE = re.compile(r"^[a-z0-9_]+$")
# Conservative segment validator for the queen's session id (date-stamped UUID
# tail like ``session_20260415_175106_eca07a69``) and queen name slug
# (``queen_technology``). Same charset as colony names — the codebase already
# normalises both to ``[a-z0-9_]+`` everywhere they're created, so accepting
# a wider charset here would just introduce a foothold for path mischief.
_SESSION_SEGMENT_RE = re.compile(r"^[a-z0-9_]+$")
# 100 MB cap on upload size. The multi-root tar carries worker conversations
# (often 100s of small JSON parts) plus the queen's forked session, so the
# legacy 50 MB ceiling is too tight. Anything bigger probably shouldn't be
# pushed wholesale anyway.
_MAX_UPLOAD_BYTES = 100 * 1024 * 1024
def _agents_dir() -> Path:
"""``COLONIES_DIR`` resolves to ``HIVE_HOME/colonies``; ``agents/`` is
the sibling. Resolved per-call so tests that monkeypatch
``COLONIES_DIR`` propagate without a second patch."""
return Path(COLONIES_DIR).parent / "agents"
def _validate_colony_name(name: str) -> str | None:
"""Return an error message if name isn't a valid colony name, else None."""
if not name:
return "colony name is required"
if len(name) > 64:
return "colony name too long (max 64 chars)"
if not _COLONY_NAME_RE.match(name):
return "colony name must match [a-z0-9_]+"
return None
def _validate_session_segment(seg: str, label: str) -> str | None:
"""Validate a path segment we're going to plumb into a destination dir."""
if not seg:
return f"{label} is required"
if len(seg) > 128:
return f"{label} too long (max 128 chars)"
if not _SESSION_SEGMENT_RE.match(seg):
return f"{label} must match [a-z0-9_]+"
return None
def _archive_top_level(tf: tarfile.TarFile) -> tuple[str | None, str | None]:
"""Find the archive's single top-level directory, if it has one.
Used only for the legacy single-root path. Returns ``(name, error)``.
Allows the archive to optionally include a leading ``./`` prefix.
"""
tops: set[str] = set()
for member in tf.getmembers():
if not member.name or member.name.startswith("/"):
return None, f"invalid member path: {member.name!r}"
parts = Path(member.name).parts
if not parts or parts[0] == "..":
return None, f"invalid member path: {member.name!r}"
first = parts[0] if parts[0] != "." else (parts[1] if len(parts) > 1 else "")
if first:
tops.add(first)
if len(tops) != 1:
return None, "archive must contain exactly one top-level directory"
return next(iter(tops)), None
def _has_multi_root_prefix(tf: tarfile.TarFile) -> bool:
"""True iff any member name starts with a recognised multi-root prefix.
The legacy shape (`<name>/...`) doesn't match either prefix, so this lets
us route old and new clients through the same endpoint.
"""
for member in tf.getmembers():
name = member.name
if name.startswith("./"):
name = name[2:]
if name.startswith("colonies/") or name.startswith("agents/"):
return True
return False
def _normalise_member_name(name: str) -> str:
"""Strip a leading ``./`` if present; reject absolute or empty names."""
if name.startswith("./"):
name = name[2:]
return name
def _safe_extract_tar(tf: tarfile.TarFile, dest: Path, *, strip_prefix: str) -> tuple[int, str | None]:
"""Extract every member of ``tf`` whose name starts with ``strip_prefix/``
into ``dest``, with the prefix stripped off.
Each member's resolved path must stay under ``dest``; symlinks, hardlinks,
and device/fifo entries are rejected. Returns ``(files_extracted, error)``;
on error the caller is responsible for cleanup.
Members outside ``strip_prefix`` are silently *skipped* (not an error) so
the caller can call this multiple times on the same tar with different
prefixes, once per recognised root.
"""
base = dest.resolve()
base.mkdir(parents=True, exist_ok=True)
files_extracted = 0
prefix_with_sep = f"{strip_prefix}/" if strip_prefix else ""
for member in tf.getmembers():
name = _normalise_member_name(member.name)
if not name:
continue
if strip_prefix:
if name == strip_prefix:
# The top-level dir entry itself; dest already exists.
continue
if not name.startswith(prefix_with_sep):
# Belongs to a different root in a multi-root tar; skip.
continue
rel = name[len(prefix_with_sep) :]
else:
rel = name
if not rel:
continue
if ".." in Path(rel).parts:
return files_extracted, f"path traversal in member: {member.name!r}"
if member.issym() or member.islnk():
return (
files_extracted,
f"symlinks/hardlinks not supported: {member.name!r}",
)
if member.isdev() or member.isfifo():
return (
files_extracted,
f"device/fifo not supported: {member.name!r}",
)
target = (base / rel).resolve()
try:
target.relative_to(base)
except ValueError:
return files_extracted, f"member escapes destination: {member.name!r}"
if member.isdir():
target.mkdir(parents=True, exist_ok=True)
continue
target.parent.mkdir(parents=True, exist_ok=True)
src = tf.extractfile(member)
if src is None:
return files_extracted, f"unsupported member: {member.name!r}"
with target.open("wb") as out:
shutil.copyfileobj(src, out)
target.chmod(member.mode & 0o755 if member.mode else 0o644)
files_extracted += 1
return files_extracted, None
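The two containment checks the extractor applies per member can be demonstrated on their own (`stays_under` is an illustrative reduction, not the extractor itself):

```python
from pathlib import Path


def stays_under(base: Path, rel: str) -> bool:
    # Layer 1: reject ".." as a path *segment*; a filename that merely
    # contains consecutive dots, like "notes..txt", is fine.
    if ".." in Path(rel).parts:
        return False
    # Layer 2: resolve and confirm the target is still inside the
    # destination, catching anything that survives the segment check.
    target = (base / rel).resolve()
    try:
        target.relative_to(base.resolve())
    except ValueError:
        return False
    return True
```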
def _classify_multi_root_member(name: str) -> tuple[str, str] | None:
"""Recognise a multi-root tar member and return ``(root, top_dir)``.
``root`` is one of ``"colonies"``, ``"agents_worker"``, ``"agents_queen"``;
``top_dir`` is the prefix to feed to ``_safe_extract_tar`` (the part of
the path that should be stripped before joining with the destination
base). Returns None for members that don't match any recognised root.
The caller pre-validates segments before extraction, so this is purely
structural: which root, what the strip prefix should be.
"""
parts = Path(name).parts
if not parts:
return None
if parts[0] == "colonies" and len(parts) >= 2:
return ("colonies", f"colonies/{parts[1]}")
if parts[0] == "agents" and len(parts) >= 2:
# agents/queens/<queen>/sessions/<sid>/... vs agents/<name>/worker/...
if parts[1] == "queens":
if len(parts) >= 5 and parts[3] == "sessions":
return ("agents_queen", f"agents/queens/{parts[2]}/sessions/{parts[4]}")
return None
# Plain agent — only the worker subtree is exported.
if len(parts) >= 3 and parts[2] == "worker":
return ("agents_worker", f"agents/{parts[1]}/worker")
return None
return None
def _plan_multi_root(
tf: tarfile.TarFile,
) -> tuple[dict[str, dict[str, str]], str | None]:
"""Walk the tar once and group entries by root.
Returns ``(groups, error)`` where ``groups`` is keyed by root kind
(``"colonies"`` etc.) and each entry maps the strip prefix to its
destination directory under HIVE_HOME. Validates name segments so we
bail before unpacking when something looks off.
"""
groups: dict[str, dict[str, str]] = {
"colonies": {},
"agents_worker": {},
"agents_queen": {},
}
seen_unrecognised: set[str] = set()
for member in tf.getmembers():
name = _normalise_member_name(member.name)
if not name or name.startswith("/") or ".." in Path(name).parts:
return groups, f"invalid member path: {member.name!r}"
classified = _classify_multi_root_member(name)
if classified is None:
# Track unique top-level dirs to give a useful error if nothing
# ended up classified.
seen_unrecognised.add(Path(name).parts[0])
continue
kind, prefix = classified
if prefix in groups[kind]:
continue
# Validate path segments per-kind so we never plumb dirty input into
# a destination we don't fully control.
prefix_parts = Path(prefix).parts
if kind == "colonies":
err = _validate_colony_name(prefix_parts[1])
if err:
return groups, err
dest = str(COLONIES_DIR / prefix_parts[1])
elif kind == "agents_worker":
err = _validate_colony_name(prefix_parts[1])
if err:
return groups, err
dest = str(_agents_dir() / prefix_parts[1] / "worker")
elif kind == "agents_queen":
queen, sid = prefix_parts[2], prefix_parts[4]
err = _validate_session_segment(queen, "queen name")
if err:
return groups, err
err = _validate_session_segment(sid, "queen session id")
if err:
return groups, err
dest = str(_agents_dir() / "queens" / queen / "sessions" / sid)
else: # pragma: no cover — defensive
continue
groups[kind][prefix] = dest
if not any(groups.values()):
roots = ", ".join(sorted(seen_unrecognised)) or "(none)"
return (
groups,
"tar has no recognised top-level prefix "
f"(expected colonies/, agents/<name>/worker/, "
f"agents/queens/<queen>/sessions/<sid>/; got: {roots})",
)
return groups, None
async def _read_upload(
request: web.Request,
) -> tuple[bytes | None, str | None, dict[str, str], web.Response | None]:
"""Drain the multipart upload. Returns ``(bytes, filename, form, error)``."""
if not request.content_type.startswith("multipart/"):
return None, None, {}, web.json_response({"error": "expected multipart/form-data"}, status=400)
reader = await request.multipart()
upload: bytes | None = None
upload_filename: str | None = None
form: dict[str, str] = {}
while True:
part = await reader.next()
if part is None:
break
if part.name == "file":
buf = io.BytesIO()
while True:
chunk = await part.read_chunk(size=65536)
if not chunk:
break
buf.write(chunk)
if buf.tell() > _MAX_UPLOAD_BYTES:
return (
None,
None,
{},
web.json_response(
{"error": f"upload exceeds {_MAX_UPLOAD_BYTES} bytes"},
status=413,
),
)
upload = buf.getvalue()
upload_filename = part.filename or ""
else:
form[part.name or ""] = (await part.text()).strip()
if upload is None:
return None, None, {}, web.json_response({"error": "missing 'file' part"}, status=400)
return upload, upload_filename, form, None
async def handle_import_colony(request: web.Request) -> web.Response:
"""POST /api/colonies/import — unpack a colony tarball into HIVE_HOME."""
upload, upload_filename, form, err_resp = await _read_upload(request)
if err_resp is not None:
return err_resp
assert upload is not None # for the type checker
replace_existing = form.get("replace_existing", "false").lower() == "true"
name_override = form.get("name", "").strip() or None
try:
tf = tarfile.open(fileobj=io.BytesIO(upload), mode="r:*")
except tarfile.TarError as err:
return web.json_response({"error": f"invalid tar archive: {err}"}, status=400)
try:
if _has_multi_root_prefix(tf):
return await _import_multi_root(tf, replace_existing, upload_filename, len(upload))
return await _import_legacy_single_root(tf, name_override, replace_existing, upload_filename, len(upload))
finally:
tf.close()
async def _import_legacy_single_root(
tf: tarfile.TarFile,
name_override: str | None,
replace_existing: bool,
upload_filename: str | None,
upload_size: int,
) -> web.Response:
"""Legacy path: tar contains `<name>/...` only, route to colonies/<name>/.
Kept verbatim from the previous handler so existing test fixtures and
older desktop builds keep working during a partial rollout.
"""
top, top_err = _archive_top_level(tf)
if top_err or top is None:
return web.json_response({"error": top_err}, status=400)
colony_name = name_override or top
name_err = _validate_colony_name(colony_name)
if name_err:
return web.json_response({"error": name_err}, status=400)
target = COLONIES_DIR / colony_name
if target.exists():
if not replace_existing:
return web.json_response(
{
"error": "colony already exists",
"name": colony_name,
"hint": "set replace_existing=true to overwrite",
},
status=409,
)
shutil.rmtree(target)
files_extracted, extract_err = _safe_extract_tar(tf, target, strip_prefix=top)
if extract_err:
shutil.rmtree(target, ignore_errors=True)
return web.json_response({"error": extract_err}, status=400)
logger.info(
"Imported colony %s (legacy, %d files) from upload %s (%d bytes)",
colony_name,
files_extracted,
upload_filename or "<unnamed>",
upload_size,
)
return web.json_response(
{
"name": colony_name,
"path": str(target),
"files_imported": files_extracted,
"replaced": replace_existing,
},
status=201,
)
async def _import_multi_root(
tf: tarfile.TarFile,
replace_existing: bool,
upload_filename: str | None,
upload_size: int,
) -> web.Response:
"""New path: tar contains `colonies/<name>/...` plus optional agents trees.
Each recognised root is extracted to its corresponding HIVE_HOME subtree
using the same traversal-safe walker as the legacy path. ``replace_existing``
governs the colonies dir conflict; the agents trees overwrite in place
(worker conversations and queen sessions are append-mostly stores
overwriting a stale subset is fine, and adding the conflict gate would
block legitimate re-pushes from a different desktop session).
"""
plan, plan_err = _plan_multi_root(tf)
if plan_err:
return web.json_response({"error": plan_err}, status=400)
# Conflict guard for the colonies root only — these are user-visible
# entities the desktop expects to control overwrite of.
primary_colony_name: str | None = None
primary_colony_target: Path | None = None
for prefix, dest in plan["colonies"].items():
target = Path(dest)
primary_colony_name = Path(prefix).parts[1]
primary_colony_target = target
if target.exists() and not replace_existing:
return web.json_response(
{
"error": "colony already exists",
"name": primary_colony_name,
"hint": "set replace_existing=true to overwrite",
},
status=409,
)
if target.exists() and replace_existing:
shutil.rmtree(target)
# The colonies/ root is required. agents/ trees are optional follow-ons.
if not plan["colonies"]:
return web.json_response(
{
"error": "tar missing required colonies/<name>/ root",
},
status=400,
)
summary: dict[str, dict[str, int | str]] = {}
extracted_dests: list[Path] = []
def _abort(err: str, status: int = 400) -> web.Response:
for path in extracted_dests:
shutil.rmtree(path, ignore_errors=True)
return web.json_response({"error": err}, status=status)
for kind in ("colonies", "agents_worker", "agents_queen"):
for prefix, dest in plan[kind].items():
target = Path(dest)
files_extracted, extract_err = _safe_extract_tar(tf, target, strip_prefix=prefix)
if extract_err:
return _abort(extract_err)
summary.setdefault(kind, {"files": 0})
summary[kind]["files"] = int(summary[kind].get("files", 0)) + files_extracted
extracted_dests.append(target)
total_files = sum(int(v.get("files", 0)) for v in summary.values())
logger.info(
"Imported colony %s (%d files across %d roots) from upload %s (%d bytes)",
primary_colony_name or "<unknown>",
total_files,
sum(1 for v in summary.values() if int(v.get("files", 0)) > 0),
upload_filename or "<unnamed>",
upload_size,
)
return web.json_response(
{
"name": primary_colony_name,
"path": str(primary_colony_target) if primary_colony_target else None,
"files_imported": total_files,
"by_root": summary,
"replaced": replace_existing,
},
status=201,
)
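A client-side sketch (hypothetical file names) of an archive the multi-root path accepts: a required `colonies/<name>/` root plus optional agents trees, assembled in memory with the standard `tarfile` module.

```python
import io
import tarfile


def build_multi_root_tar() -> bytes:
    """Assemble an in-memory gzipped tarball in the multi-root layout."""
    payloads = {
        "colonies/demo/metadata.json": b"{}",
        "agents/demo/worker/state.json": b"{}",
        "agents/queens/q1/sessions/s1/meta.json": b"{}",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in payloads.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Posted as the `file` part of the multipart form, this archive would exercise all three extraction roots in one import.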
def _find_workers_bound_to_profile(request: web.Request, colony_name: str, profile_name: str) -> list[str]:
"""Return live worker IDs bound to ``(colony_name, profile_name)``.
Walks every live session's ColonyRuntime workers map. Used to refuse
profile deletes / renames while workers are still using the binding
the contextvar that pins a worker's MCP account lookups is set at
spawn time and a profile mutation underneath a running worker would
leave its tool calls pointing at a removed alias on the next turn.
"""
manager = request.app.get("manager")
if manager is None:
return []
bound: list[str] = []
try:
sessions = manager.list_sessions()
except Exception:
return []
for s in sessions:
runtime = getattr(s, "colony", None) or getattr(s, "colony_runtime", None)
if runtime is None:
continue
if getattr(runtime, "_colony_id", None) != colony_name:
continue
try:
for info in runtime.list_workers():
if info.profile_name == profile_name and info.status in {
"WorkerStatus.RUNNING",
"WorkerStatus.PENDING",
"running",
"pending",
}:
bound.append(info.id)
except Exception:
continue
return bound
async def handle_list_worker_profiles(request: web.Request) -> web.Response:
"""GET /api/colonies/{colony_name}/worker_profiles"""
colony_name = request.match_info["colony_name"]
err = _validate_colony_name(colony_name)
if err:
return web.json_response({"error": err}, status=400)
if not (COLONIES_DIR / colony_name).exists():
return web.json_response({"error": f"colony '{colony_name}' not found"}, status=404)
from framework.host.worker_profiles import list_worker_profiles
profiles = list_worker_profiles(colony_name)
return web.json_response({"worker_profiles": [p.to_dict() for p in profiles]})
async def handle_upsert_worker_profile(request: web.Request) -> web.Response:
"""POST /api/colonies/{colony_name}/worker_profiles — create or replace one profile.
Body: ``{name, integrations?, task?, skill_name?, concurrency_hint?,
prompt_override?, tool_filter?}``. Existing siblings are
preserved; an existing profile with the same ``name`` is replaced
(so the desktop can use this for both add and edit).
"""
colony_name = request.match_info["colony_name"]
err = _validate_colony_name(colony_name)
if err:
return web.json_response({"error": err}, status=400)
if not (COLONIES_DIR / colony_name).exists():
return web.json_response({"error": f"colony '{colony_name}' not found"}, status=404)
try:
body = await request.json()
except Exception:
return web.json_response({"error": "invalid JSON body"}, status=400)
if not isinstance(body, dict):
return web.json_response({"error": "body must be a JSON object"}, status=400)
from framework.host.worker_profiles import (
WorkerProfile,
upsert_worker_profile,
validate_profile_name,
)
profile = WorkerProfile.from_dict(body)
name_err = validate_profile_name(profile.name)
if name_err:
return web.json_response({"error": name_err}, status=400)
try:
saved = upsert_worker_profile(colony_name, profile)
except (FileNotFoundError, ValueError) as exc:
return web.json_response({"error": str(exc)}, status=400)
return web.json_response({"worker_profiles": [p.to_dict() for p in saved]}, status=201)
async def handle_delete_worker_profile(request: web.Request) -> web.Response:
"""DELETE /api/colonies/{colony_name}/worker_profiles/{profile_name}.
Refused with 409 + ``bound_workers`` listing if a live worker is
bound to the profile, so the user can stop those workers before
pruning the binding.
"""
colony_name = request.match_info["colony_name"]
profile_name = request.match_info["profile_name"]
err = _validate_colony_name(colony_name)
if err:
return web.json_response({"error": err}, status=400)
if not (COLONIES_DIR / colony_name).exists():
return web.json_response({"error": f"colony '{colony_name}' not found"}, status=404)
bound = _find_workers_bound_to_profile(request, colony_name, profile_name)
if bound:
return web.json_response(
{
"error": "profile is bound to live workers; stop them first",
"bound_workers": bound,
},
status=409,
)
from framework.host.worker_profiles import delete_worker_profile
try:
removed = delete_worker_profile(colony_name, profile_name)
except ValueError as exc:
return web.json_response({"error": str(exc)}, status=400)
if not removed:
return web.json_response({"error": f"profile '{profile_name}' not found"}, status=404)
return web.json_response({"deleted": True, "profile_name": profile_name})
def register_routes(app: web.Application) -> None:
app.router.add_post("/api/colonies/import", handle_import_colony)
app.router.add_get(
"/api/colonies/{colony_name}/worker_profiles",
handle_list_worker_profiles,
)
app.router.add_post(
"/api/colonies/{colony_name}/worker_profiles",
handle_upsert_worker_profile,
)
app.router.add_delete(
"/api/colonies/{colony_name}/worker_profiles/{profile_name}",
handle_delete_worker_profile,
)
@@ -0,0 +1,329 @@
"""Per-colony MCP tool allowlist routes.
- GET /api/colony/{colony_name}/tools -- enumerate colony tool surface
- PATCH /api/colony/{colony_name}/tools -- set or clear the allowlist
A colony's tool set is inherited from the queen that forked it, so the
tool surface mirrors the queen's MCP servers. Lifecycle/synthetic tools
are included for display only. MCP tools are grouped by origin server
with per-tool ``enabled`` flags.
Semantics:
- ``enabled_mcp_tools: null``: allow every MCP tool (default).
- ``enabled_mcp_tools: []``: allow no MCP tools (only lifecycle /
synthetic tools pass through).
- ``enabled_mcp_tools: [...]``: only the listed names pass.
The allowlist is persisted in a dedicated ``tools.json`` sidecar at
``~/.hive/colonies/{colony_name}/tools.json``. Changes take effect on
the *next* worker spawn. In-flight workers keep the tool list they
booted with: workers have no dynamic tools provider today, so
mutating their tool set mid-turn would diverge from the list the LLM
is already using.
"""
from __future__ import annotations
import logging
from typing import Any
from aiohttp import web
from framework.host.colony_metadata import colony_metadata_path
from framework.host.colony_tools_config import (
load_colony_tools_config,
update_colony_tools_config,
)
logger = logging.getLogger(__name__)
_SYNTHETIC_NAMES = {"ask_user"}
def _synthetic_entries() -> list[dict[str, Any]]:
try:
from framework.agent_loop.internals.synthetic_tools import build_ask_user_tool
tool = build_ask_user_tool()
return [
{
"name": tool.name,
"description": tool.description,
"editable": False,
}
]
except Exception:
return [
{
"name": "ask_user",
"description": "Pause and ask the user a structured question.",
"editable": False,
}
]
def _colony_runtimes_for_name(manager: Any, colony_name: str) -> list[Any]:
"""Return every live ColonyRuntime whose session is attached to ``colony_name``."""
sessions = getattr(manager, "_sessions", None) or {}
runtimes: list[Any] = []
for session in sessions.values():
if getattr(session, "colony_name", None) != colony_name:
continue
# Both ``session.colony`` (queen-side unified runtime) and
# ``session.colony_runtime`` (legacy worker runtime) may carry
# tools that need the allowlist applied. We update both.
for attr in ("colony", "colony_runtime"):
rt = getattr(session, attr, None)
if rt is not None and rt not in runtimes:
runtimes.append(rt)
return runtimes
async def _render_catalog(manager: Any, colony_name: str) -> dict[str, list[dict[str, Any]]]:
"""Build a per-server tool catalog for this colony.
All colonies inherit the queen's MCP surface, so we reuse the
manager-level ``_mcp_tool_catalog`` populated during queen boot.
"""
# If a live runtime exists and carries its own registry, prefer it —
# it's authoritative (reflects any post-queen-boot MCP additions).
for rt in _colony_runtimes_for_name(manager, colony_name):
tools = getattr(rt, "_tools", None)
if not tools:
continue
mcp_names = set(getattr(rt, "_mcp_tool_names_all", set()) or set())
if not mcp_names:
continue
catalog: dict[str, list[dict[str, Any]]] = {"(mcp)": []}
for tool in tools:
name = getattr(tool, "name", None)
if name in mcp_names:
catalog["(mcp)"].append(
{
"name": name,
"description": getattr(tool, "description", ""),
"input_schema": getattr(tool, "parameters", {}),
}
)
return catalog
# Otherwise fall back to the queen-level snapshot. Build it on demand
# (off the event loop) when empty so the Tool Library works before
# any queen has been started in this process.
cached = getattr(manager, "_mcp_tool_catalog", None)
if isinstance(cached, dict) and cached:
return cached
try:
import asyncio
from framework.server.queen_orchestrator import build_queen_tool_registry_bare
registry, built = await asyncio.to_thread(build_queen_tool_registry_bare)
if manager is not None:
manager._mcp_tool_catalog = built # type: ignore[attr-defined]
manager._bootstrap_tool_registry = registry # type: ignore[attr-defined]
return built
except Exception:
logger.warning("Colony tools: catalog bootstrap failed", exc_info=True)
return {}
def _lifecycle_entries_from_runtime(manager: Any, colony_name: str) -> list[dict[str, Any]]:
"""Non-MCP tools currently registered on the colony runtime (if any).
When no live runtime is available we fall back to the bootstrap
registry stashed on the manager by ``routes_queen_tools`` it
already has queen lifecycle tools registered, which are also the
lifecycle tools colonies inherit at spawn time.
"""
out: list[dict[str, Any]] = []
seen: set[str] = set()
def _push(name: str, description: str) -> None:
if not name or name in seen:
return
if name in _SYNTHETIC_NAMES:
return
seen.add(name)
out.append({"name": name, "description": description, "editable": False})
runtimes = _colony_runtimes_for_name(manager, colony_name)
if runtimes:
for rt in runtimes:
mcp_names = set(getattr(rt, "_mcp_tool_names_all", set()) or set())
for tool in getattr(rt, "_tools", []) or []:
name = getattr(tool, "name", None)
if name in mcp_names:
continue
_push(name, getattr(tool, "description", ""))
else:
# No live runtime — derive from the bootstrap registry.
from framework.server.routes_queen_tools import _lifecycle_entries_without_session
catalog = getattr(manager, "_mcp_tool_catalog", {}) or {}
mcp_names: set[str] = set()
for entries in catalog.values():
for entry in entries:
if entry.get("name"):
mcp_names.add(entry["name"])
out.extend(_lifecycle_entries_without_session(manager, mcp_names))
return sorted(out, key=lambda e: e["name"])
def _render_servers(
catalog: dict[str, list[dict[str, Any]]],
enabled_mcp_tools: list[str] | None,
) -> list[dict[str, Any]]:
allowed: set[str] | None = None if enabled_mcp_tools is None else set(enabled_mcp_tools)
servers: list[dict[str, Any]] = []
for name in sorted(catalog):
tools = []
for entry in catalog[name]:
tool_name = entry.get("name")
tools.append(
{
"name": tool_name,
"description": entry.get("description", ""),
"input_schema": entry.get("input_schema", {}),
"enabled": True if allowed is None else tool_name in allowed,
}
)
servers.append({"name": name, "tools": tools})
return servers
async def handle_get_tools(request: web.Request) -> web.Response:
"""GET /api/colony/{colony_name}/tools."""
colony_name = request.match_info["colony_name"]
if not colony_metadata_path(colony_name).exists():
return web.json_response({"error": f"Colony '{colony_name}' not found"}, status=404)
manager = request.app.get("manager")
# Allowlist now lives in a dedicated tools.json sidecar; helper
# migrates any legacy metadata.json field on first read.
enabled = load_colony_tools_config(colony_name)
catalog = await _render_catalog(manager, colony_name)
stale = not catalog
return web.json_response(
{
"colony_name": colony_name,
"enabled_mcp_tools": enabled,
"stale": stale,
"lifecycle": _lifecycle_entries_from_runtime(manager, colony_name),
"synthetic": _synthetic_entries(),
"mcp_servers": _render_servers(catalog, enabled),
}
)
async def handle_patch_tools(request: web.Request) -> web.Response:
"""PATCH /api/colony/{colony_name}/tools."""
colony_name = request.match_info["colony_name"]
if not colony_metadata_path(colony_name).exists():
return web.json_response({"error": f"Colony '{colony_name}' not found"}, status=404)
try:
body = await request.json()
except Exception:
return web.json_response({"error": "Invalid JSON body"}, status=400)
if not isinstance(body, dict) or "enabled_mcp_tools" not in body:
return web.json_response(
{"error": "Body must be an object with an 'enabled_mcp_tools' field"},
status=400,
)
enabled = body["enabled_mcp_tools"]
if enabled is not None:
if not isinstance(enabled, list) or not all(isinstance(x, str) for x in enabled):
return web.json_response(
{"error": "'enabled_mcp_tools' must be null or a list of strings"},
status=400,
)
manager = request.app.get("manager")
# Validate names against the known MCP catalog; this mirrors the
# typo-catching guarantee we already offer on queen tools.
catalog = await _render_catalog(manager, colony_name)
known: set[str] = {e.get("name") for entries in catalog.values() for e in entries if e.get("name")}
if enabled is not None and known:
unknown = sorted(set(enabled) - known)
if unknown:
return web.json_response(
{"error": "Unknown MCP tool name(s)", "unknown": unknown},
status=400,
)
# Persist — tools.json sidecar, not metadata.json. Missing directory
# is already guarded by the 404 check above.
try:
update_colony_tools_config(colony_name, enabled)
except FileNotFoundError:
return web.json_response({"error": f"Colony '{colony_name}' not found"}, status=404)
# Update any live runtimes so the NEXT worker spawn reflects the change.
# We do NOT rebuild in-flight workers' tool lists (see module docstring).
refreshed = 0
for rt in _colony_runtimes_for_name(manager, colony_name):
setter = getattr(rt, "set_tool_allowlist", None)
if callable(setter):
try:
setter(enabled)
refreshed += 1
except Exception:
logger.debug(
"Colony tools: set_tool_allowlist failed on runtime for %s",
colony_name,
exc_info=True,
)
logger.info(
"Colony tools: colony=%s allowlist=%s refreshed_runtimes=%d",
colony_name,
"null" if enabled is None else f"{len(enabled)} tool(s)",
refreshed,
)
return web.json_response(
{
"colony_name": colony_name,
"enabled_mcp_tools": enabled,
"refreshed_runtimes": refreshed,
"note": "Changes apply to the next worker spawn. Running workers keep their booted tool list.",
}
)
async def handle_list_colonies(request: web.Request) -> web.Response:
"""GET /api/colonies — list colonies with their tool allowlist status.
Powers the Tool Library page's colony picker.
"""
from framework.host.colony_metadata import list_colony_names, load_colony_metadata
colonies: list[dict[str, Any]] = []
for name in list_colony_names():
meta = load_colony_metadata(name)
# Provenance stays in metadata.json; allowlist lives in tools.json.
allowlist = load_colony_tools_config(name)
colonies.append(
{
"name": name,
"queen_name": meta.get("queen_name"),
"created_at": meta.get("created_at"),
"has_allowlist": allowlist is not None,
"enabled_count": len(allowlist) if isinstance(allowlist, list) else None,
}
)
return web.json_response({"colonies": colonies})
def register_routes(app: web.Application) -> None:
"""Register per-colony tool routes."""
app.router.add_get("/api/colonies/tools-index", handle_list_colonies)
app.router.add_get("/api/colony/{colony_name}/tools", handle_get_tools)
app.router.add_patch("/api/colony/{colony_name}/tools", handle_patch_tools)
@@ -62,6 +62,7 @@ def _worker_info_to_dict(info) -> dict:
"status": str(info.status),
"started_at": info.started_at,
"result": result_dict,
"profile_name": getattr(info, "profile_name", "") or "",
}
@@ -235,10 +236,6 @@ _SYSTEM_TOOLS: frozenset[str] = frozenset(
{
"get_account_info",
"get_current_time",
"bash_kill",
"bash_output",
"execute_command_tool",
"example_tool",
}
)
@@ -294,7 +291,9 @@ def _resolve_progress_db_by_name(colony_name: str) -> Path | None:
"""
if not _COLONY_NAME_RE.match(colony_name):
return None
db_path = Path.home() / ".hive" / "colonies" / colony_name / "data" / "progress.db"
from framework.config import COLONIES_DIR
db_path = COLONIES_DIR / colony_name / "data" / "progress.db"
return db_path if db_path.exists() else None
@@ -51,6 +51,8 @@ PROVIDER_ENV_VARS: dict[str, str] = {
"together": "TOGETHER_API_KEY",
"together_ai": "TOGETHER_API_KEY",
"deepseek": "DEEPSEEK_API_KEY",
"kimi": "KIMI_API_KEY",
"hive": "HIVE_API_KEY",
}
_SUBSCRIPTION_DEFINITIONS: list[dict[str, str]] = [
@@ -43,6 +43,21 @@ def _get_store(request: web.Request) -> CredentialStore:
return request.app["credential_store"]
def _reset_credential_adapter_cache() -> None:
"""Clear the memoized CredentialStoreAdapter so the next call re-syncs.
The adapter cache is keyed on ``(id(specs), ADEN_API_KEY)``; without
this reset, a key save/delete done after process startup is invisible
to in-process MCP tool calls until restart.
"""
try:
from aden_tools.credentials.store_adapter import _reset_default_adapter_cache
_reset_default_adapter_cache()
except Exception:
logger.warning("Failed to reset credential adapter cache", exc_info=True)
def _invalidate_queen_credentials_cache(request: web.Request) -> None:
"""Force every live Queen session to rebuild its ambient credentials block.
@@ -158,6 +173,12 @@ async def handle_save_credential(request: web.Request) -> web.Response:
save_aden_api_key(key)
# Make the new key visible to the in-process AdenSyncProvider on
# the very next CredentialStoreAdapter.default() call. The adapter
# cache is keyed on this env var.
os.environ["ADEN_API_KEY"] = key
_reset_credential_adapter_cache()
# Immediately sync OAuth tokens from Aden (runs in executor because
# _presync_aden_tokens makes blocking HTTP calls to the Aden server).
try:
@@ -193,6 +214,11 @@ async def handle_delete_credential(request: web.Request) -> web.Response:
deleted = delete_aden_api_key()
if not deleted:
return web.json_response({"error": "Credential 'aden_api_key' not found"}, status=404)
# Drop the env var so the next adapter rebuild lands in the
# non-Aden branch instead of trying to reuse the stale key.
os.environ.pop("ADEN_API_KEY", None)
_reset_credential_adapter_cache()
_invalidate_queen_credentials_cache(request)
return web.json_response({"deleted": True})
store = _get_store(request)
@@ -358,6 +384,9 @@ async def handle_resync_credentials(request: web.Request) -> web.Response:
# _presync_aden_tokens makes blocking HTTP calls to the Aden server.
await loop.run_in_executor(None, lambda: _presync_aden_tokens(CREDENTIAL_SPECS, force=True))
# Drop the cached adapter so newly-fetched accounts are visible
# to the next MCP tool call without waiting for a process restart.
_reset_credential_adapter_cache()
_invalidate_queen_credentials_cache(request)
accounts_by_provider = _collect_accounts_by_provider()
@@ -42,12 +42,11 @@ _WORKER_INHERITED_TOOLS: frozenset[str] = frozenset(
"read_file",
"write_file",
"edit_file",
"hashline_edit",
"list_directory",
"search_files",
"undo_changes",
# Shell
"run_command",
# Terminal (basics — exec + ripgrep + glob/find)
"terminal_exec",
"terminal_rg",
"terminal_find",
# Framework synthetics (always available to any AgentLoop node)
"set_output",
"escalate",
@@ -1152,6 +1151,7 @@ async def fork_session_into_colony(
task: str,
tasks: list[dict] | None = None,
concurrency_hint: int | None = None,
worker_profiles: list[dict] | None = None,
) -> dict:
"""Fork a queen session into a colony directory.
@@ -1181,7 +1181,6 @@ async def fork_session_into_colony(
import json
import shutil
from datetime import datetime
from pathlib import Path
from framework.agent_loop.agent_loop import AgentLoop, LoopConfig
from framework.agent_loop.types import AgentContext
@@ -1245,7 +1244,9 @@ async def fork_session_into_colony(
# would wrongly flag every fresh colony as "already-exists" if we
# used ``not colony_dir.exists()``. A colony is "new" until its
# worker config has actually been written.
colony_dir = Path.home() / ".hive" / "colonies" / colony_name
from framework.config import COLONIES_DIR
colony_dir = COLONIES_DIR / colony_name
worker_name = "worker"
worker_config_path = colony_dir / f"{worker_name}.json"
is_new = not worker_config_path.exists()
@@ -1398,6 +1399,56 @@ async def fork_session_into_colony(
worker_meta["concurrency_hint"] = concurrency_hint
worker_config_path.write_text(json.dumps(worker_meta, indent=2, ensure_ascii=False), encoding="utf-8")
# ── 2a. Materialize named worker profiles ────────────────────
# Each named profile gets its own ``profiles/<name>/worker.json``
# cloned from the base worker_meta with profile-specific overrides
# (task, system_prompt, tool_filter, concurrency_hint). The base
# ``worker.json`` above acts as the implicit "default" profile.
persisted_profiles: list[dict] = []
if worker_profiles:
from framework.host.worker_profiles import (
DEFAULT_PROFILE_NAME,
WorkerProfile,
validate_profile_name,
worker_spec_path,
)
for raw in worker_profiles:
if not isinstance(raw, dict):
continue
profile = WorkerProfile.from_dict(raw)
err = validate_profile_name(profile.name)
if err is not None:
logger.warning("create_colony: invalid profile name %r: %s", profile.name, err)
continue
profile_meta = dict(worker_meta)
profile_meta["profile_name"] = profile.name
if profile.task:
profile_meta["goal"] = {
**profile_meta.get("goal", {}),
"description": profile.task,
}
if profile.prompt_override:
profile_meta["system_prompt"] = (
f"{worker_meta['system_prompt']}\n\n{profile.prompt_override}"
)
if profile.tool_filter:
profile_meta["tools"] = [t for t in worker_meta["tools"] if t in set(profile.tool_filter)]
if isinstance(profile.concurrency_hint, int) and profile.concurrency_hint > 0:
profile_meta["concurrency_hint"] = profile.concurrency_hint
if profile.integrations:
profile_meta["integrations"] = dict(profile.integrations)
target = worker_spec_path(colony_name, profile.name)
if profile.name == DEFAULT_PROFILE_NAME:
# Skip — the legacy file already written above is the
# canonical default.
persisted_profiles.append(profile.to_dict())
continue
target.parent.mkdir(parents=True, exist_ok=True)
target.write_text(json.dumps(profile_meta, indent=2, ensure_ascii=False), encoding="utf-8")
persisted_profiles.append(profile.to_dict())
# ── 3. Duplicate queen session into colony ───────────────────
# Copy the queen's full session directory (conversations, events,
# meta) into a new queen-session dir assigned to this colony.
@@ -1469,7 +1520,9 @@ async def fork_session_into_colony(
compaction_status.mark_in_progress(dest_queen_dir)
_worker_storage = Path.home() / ".hive" / "agents" / colony_name / worker_name
from framework.config import HIVE_HOME
_worker_storage = HIVE_HOME / "agents" / colony_name / worker_name
_dest_queen_dir = dest_queen_dir
_queen_ctx = queen_ctx
_queen_loop = queen_loop
@@ -1575,8 +1628,55 @@ async def fork_session_into_colony(
"task": worker_task[:100],
"spawned_at": datetime.now(UTC).isoformat(),
}
if persisted_profiles:
# Persist the canonical profile roster so dispatch + UI can read
# back what the queen declared at create_colony time. Merge with
# any existing list so a later update_worker_profile call doesn't
# erase profiles created in an earlier fork.
existing_profiles = metadata.get("worker_profiles") or []
if not isinstance(existing_profiles, list):
existing_profiles = []
seen = {p["name"] for p in persisted_profiles if isinstance(p, dict) and p.get("name")}
merged = list(persisted_profiles) + [
p for p in existing_profiles
if isinstance(p, dict) and p.get("name") and p["name"] not in seen
]
metadata["worker_profiles"] = merged
metadata_path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
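The profile-roster merge above has a simple shape worth isolating (a hypothetical standalone helper): freshly persisted profiles win on name collisions, while existing profiles not re-declared in this fork survive.

```python
def merge_profiles(new: list[dict], existing: list[dict]) -> list[dict]:
    """New profiles take precedence by name; unseen existing ones are kept."""
    seen = {p["name"] for p in new if isinstance(p, dict) and p.get("name")}
    return list(new) + [
        p
        for p in existing
        if isinstance(p, dict) and p.get("name") and p["name"] not in seen
    ]
```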
# ── 4a. Inherit the queen's tool allowlist into the colony ───
# A colony forked from a curated queen should start with the same
# tool surface (otherwise the colony silently falls back to its own
# "allow every MCP tool" default, undoing the parent's curation).
# We copy the queen's LIVE effective allowlist so the snapshot
# reflects whatever was in force the moment the user clicked "Create
# Colony". Users can further narrow the colony via the Tool Library.
# Skip the write when the queen is on allow-all (None) so the colony
# keeps the same semantics without creating an inert sidecar.
try:
queen_enabled = getattr(
getattr(session, "phase_state", None),
"enabled_mcp_tools",
None,
)
if isinstance(queen_enabled, list):
from framework.host.colony_tools_config import update_colony_tools_config
update_colony_tools_config(colony_name, list(queen_enabled))
logger.info(
"Inherited queen allowlist into colony '%s' (%d tools)",
colony_name,
len(queen_enabled),
)
except Exception:
# Inheritance is best-effort — don't let a tools.json hiccup
# abort colony creation.
logger.warning(
"Failed to inherit queen allowlist into colony '%s'",
colony_name,
exc_info=True,
)
# ── 5. Update source queen session meta.json ─────────────────
# Link the originating session back to the colony for discovery.
source_meta_path = source_queen_dir / "meta.json"
@@ -0,0 +1,291 @@
"""MCP server registration routes.
Thin HTTP wrapper around ``MCPRegistry`` so the frontend can add, remove,
enable, and health-check user-registered MCP servers. The CLI path
(``hive mcp add`` / ``hive mcp remove`` / etc.) is unchanged.
- GET /api/mcp/servers -- list installed servers
- POST /api/mcp/servers -- register a local server
- DELETE /api/mcp/servers/{name} -- remove a local server
- POST /api/mcp/servers/{name}/enable -- enable a server
- POST /api/mcp/servers/{name}/disable -- disable a server
- POST /api/mcp/servers/{name}/health -- probe server health
New servers take effect on the *next* queen session start. Existing live
queen sessions keep the tool list they booted with to avoid mid-turn
cache invalidation. The ``add`` response hints at this explicitly.
"""
from __future__ import annotations
import logging
from typing import Any
from aiohttp import web
from framework.loader.mcp_errors import MCPError
from framework.loader.mcp_registry import MCPRegistry
logger = logging.getLogger(__name__)
_VALID_TRANSPORTS = {"stdio", "http", "sse", "unix"}
def _registry() -> MCPRegistry:
# MCPRegistry is a thin wrapper around ~/.hive/mcp_registry/installed.json
# so instantiation is cheap — no need to cache on app["..."].
reg = MCPRegistry()
reg.initialize()
return reg
def _package_builtin_servers() -> list[dict[str, Any]]:
"""Return the package-baked queen MCP servers from ``queen/mcp_servers.json``.
Those servers are loaded directly by ``ToolRegistry.load_mcp_config``
at queen boot and never go through ``MCPRegistry.list_installed``,
so the raw registry view shows them as missing. Surface them here so
the Tool Library reflects what the queen actually talks to.
Entries carry ``source: "built-in"`` and are NOT removable / toggleable;
editing them requires changing the repo file.
"""
import json
from pathlib import Path
import framework.agents.queen as _queen_pkg
path = Path(_queen_pkg.__file__).parent / "mcp_servers.json"
if not path.exists():
return []
try:
data = json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
return []
out: list[dict[str, Any]] = []
for name, cfg in data.items():
if not isinstance(cfg, dict):
continue
out.append(
{
"name": name,
"source": "built-in",
"transport": cfg.get("transport", "stdio"),
"description": cfg.get("description", "") or "",
"enabled": True,
"last_health_status": None,
"last_error": None,
"last_health_check_at": None,
"tool_count": None,
"removable": False,
}
)
return out
def _server_to_summary(entry: dict[str, Any]) -> dict[str, Any]:
"""Shape an installed.json entry for API responses.
Strips the full manifest body (which can be large) but keeps the tool
list if the manifest already embeds one (happens for registry-installed
servers). Servers with ``source: "local"`` only get a tool list after
running a health check.
"""
manifest = entry.get("manifest") or {}
tools = manifest.get("tools") if isinstance(manifest, dict) else None
if not isinstance(tools, list):
tools = None
return {
"name": entry.get("name"),
"source": entry.get("source"),
"transport": entry.get("transport"),
"description": (manifest.get("description") if isinstance(manifest, dict) else None) or "",
"enabled": entry.get("enabled", True),
"last_health_status": entry.get("last_health_status"),
"last_error": entry.get("last_error"),
"last_health_check_at": entry.get("last_health_check_at"),
"tool_count": (len(tools) if tools is not None else None),
}
def _mcp_error_response(exc: MCPError, *, default_status: int = 400) -> web.Response:
return web.json_response(
{
"error": exc.what,
"code": exc.code.value,
"what": exc.what,
"why": exc.why,
"fix": exc.fix,
},
status=default_status,
)
async def handle_list_servers(request: web.Request) -> web.Response:
"""GET /api/mcp/servers — list every server the queen actually uses.
Merges two sources:
- ``MCPRegistry.list_installed()`` servers registered via
``hive mcp add`` / the ``/api/mcp/servers`` POST route, stored in
``~/.hive/mcp_registry/installed.json``. These carry
``source: "local"`` (user-added) or ``source: "registry"``
(installed from the remote registry).
- Repo-baked queen servers from
``core/framework/agents/queen/mcp_servers.json``. These are loaded
directly by the queen's ``ToolRegistry`` at boot and never touch
``MCPRegistry``; we surface them here so the UI reflects what the
queen really talks to. They are not removable from the UI because
editing them requires changing the repo.
If a name collides between the two sources, the registry entry wins
because that's the one the user has customized.
"""
reg = _registry()
registry_entries = [_server_to_summary(e) for e in reg.list_installed()]
seen_names = {e.get("name") for e in registry_entries}
package_entries = [e for e in _package_builtin_servers() if e.get("name") not in seen_names]
servers = [*package_entries, *registry_entries]
return web.json_response({"servers": servers})
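The merge precedence above (a registry entry shadows a package built-in of the same name, built-ins fill the rest) can be sketched as a standalone helper. `merge_servers` is an illustrative name, not part of this module:

```python
from typing import Any


def merge_servers(
    builtin: list[dict[str, Any]],
    registry: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Registry entries win on name collision; built-ins fill the rest."""
    seen = {e.get("name") for e in registry}
    # Keep built-ins whose names are not shadowed, then append registry
    # entries — matching the [*package_entries, *registry_entries] order.
    return [e for e in builtin if e.get("name") not in seen] + registry
```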
async def handle_add_server(request: web.Request) -> web.Response:
"""POST /api/mcp/servers — register a local MCP server.
Body mirrors ``MCPRegistry.add_local`` args:
::
{
"name": "my-tool",
"transport": "stdio" | "http" | "sse" | "unix",
"command": "...", "args": [...], "env": {...}, "cwd": "...",
"url": "...", "headers": {...},
"socket_path": "...",
"description": "..."
}
"""
try:
body = await request.json()
except Exception:
return web.json_response({"error": "Invalid JSON body"}, status=400)
if not isinstance(body, dict):
return web.json_response({"error": "Body must be a JSON object"}, status=400)
name = body.get("name")
transport = body.get("transport")
if not isinstance(name, str) or not name.strip():
return web.json_response({"error": "'name' is required"}, status=400)
if transport not in _VALID_TRANSPORTS:
return web.json_response(
{"error": f"'transport' must be one of {sorted(_VALID_TRANSPORTS)}"},
status=400,
)
reg = _registry()
try:
entry = reg.add_local(
name=name.strip(),
transport=transport,
command=body.get("command"),
args=body.get("args"),
env=body.get("env"),
cwd=body.get("cwd"),
url=body.get("url"),
headers=body.get("headers"),
socket_path=body.get("socket_path"),
description=body.get("description", ""),
)
except MCPError as exc:
status = 409 if "already exists" in exc.what else 400
return _mcp_error_response(exc, default_status=status)
except Exception as exc:
logger.exception("MCP add_local failed for %r", name)
return web.json_response({"error": str(exc)}, status=500)
summary = _server_to_summary({"name": name, **entry})
return web.json_response(
{
"server": summary,
"hint": "Start a new queen session to use this server's tools.",
},
status=201,
)
async def handle_remove_server(request: web.Request) -> web.Response:
"""DELETE /api/mcp/servers/{name} — remove a local server."""
name = request.match_info["name"]
reg = _registry()
existing = reg.get_server(name)
if existing is None:
return web.json_response({"error": f"Server '{name}' not installed"}, status=404)
if existing.get("source") != "local":
return web.json_response(
{
"error": f"Server '{name}' was not added locally; only 'local' servers can be removed from the UI.",
},
status=400,
)
try:
reg.remove(name)
except MCPError as exc:
return _mcp_error_response(exc, default_status=404)
return web.json_response({"removed": name})
async def handle_set_enabled(request: web.Request, *, enabled: bool) -> web.Response:
name = request.match_info["name"]
reg = _registry()
try:
if enabled:
reg.enable(name)
else:
reg.disable(name)
except MCPError as exc:
return _mcp_error_response(exc, default_status=404)
return web.json_response({"name": name, "enabled": enabled})
async def handle_enable(request: web.Request) -> web.Response:
"""POST /api/mcp/servers/{name}/enable."""
return await handle_set_enabled(request, enabled=True)
async def handle_disable(request: web.Request) -> web.Response:
"""POST /api/mcp/servers/{name}/disable."""
return await handle_set_enabled(request, enabled=False)
async def handle_health(request: web.Request) -> web.Response:
"""POST /api/mcp/servers/{name}/health — probe one server."""
name = request.match_info["name"]
reg = _registry()
try:
# MCPRegistry.health_check blocks on subprocess IO — run it off
# the event loop so the HTTP worker stays responsive.
import asyncio
result = await asyncio.to_thread(reg.health_check, name)
except MCPError as exc:
return _mcp_error_response(exc, default_status=404)
except Exception as exc:
logger.exception("MCP health_check failed for %r", name)
return web.json_response({"error": str(exc)}, status=500)
return web.json_response(result)
def register_routes(app: web.Application) -> None:
"""Register MCP server CRUD routes."""
app.router.add_get("/api/mcp/servers", handle_list_servers)
app.router.add_post("/api/mcp/servers", handle_add_server)
app.router.add_delete("/api/mcp/servers/{name}", handle_remove_server)
app.router.add_post("/api/mcp/servers/{name}/enable", handle_enable)
app.router.add_post("/api/mcp/servers/{name}/disable", handle_disable)
app.router.add_post("/api/mcp/servers/{name}/health", handle_health)
+537
@@ -0,0 +1,537 @@
"""Per-queen MCP tool allowlist routes.
- GET /api/queen/{queen_id}/tools -- enumerate the queen's tool surface
- PATCH /api/queen/{queen_id}/tools -- set or clear the MCP tool allowlist
Lifecycle and synthetic tools (``ask_user``) are always part of the queen's
surface in INDEPENDENT mode and are returned with ``editable: false``. MCP
tools are grouped by origin server and carry per-tool ``enabled`` flags.
The allowlist is persisted in a dedicated ``tools.json`` sidecar at
``~/.hive/agents/queens/{queen_id}/tools.json``:
- ``null`` / missing file -> "allow every MCP tool" (default)
- ``[]`` -> explicitly disable every MCP tool
- ``["foo", "bar"]`` -> only these MCP tools pass through to the LLM
Filtering happens in ``QueenPhaseState.rebuild_independent_filter`` so the
LLM prompt cache stays warm between saves.
"""
from __future__ import annotations
import logging
from typing import Any
from aiohttp import web
from framework.agents.queen.queen_profiles import (
ensure_default_queens,
load_queen_profile,
)
from framework.agents.queen.queen_tools_config import (
delete_queen_tools_config,
load_queen_tools_config,
tools_config_exists,
update_queen_tools_config,
)
from framework.agents.queen.queen_tools_defaults import (
list_category_names,
queen_role_categories,
resolve_category_tools,
)
logger = logging.getLogger(__name__)
_SYNTHETIC_NAMES = {"ask_user"}
async def _ensure_manager_catalog(manager: Any) -> dict[str, list[dict[str, Any]]]:
"""Return the cached MCP tool catalog, building it on first call.
``queen_orchestrator.create_queen`` populates ``_mcp_tool_catalog`` on
every queen boot. On a fresh backend process the user may open the
Tool Library before any queen session has started, so the catalog is
empty. In that case we build one from the shared MCP config; the
first call pays an MCP-subprocess-spawn cost, subsequent calls are
    cache hits. The build runs off the event loop via ``asyncio.to_thread``
so the HTTP worker stays responsive while MCP servers initialize.
"""
if manager is None:
return {}
catalog = getattr(manager, "_mcp_tool_catalog", None)
if isinstance(catalog, dict) and catalog:
return catalog
try:
import asyncio
from framework.server.queen_orchestrator import build_queen_tool_registry_bare
registry, built = await asyncio.to_thread(build_queen_tool_registry_bare)
manager._mcp_tool_catalog = built # type: ignore[attr-defined]
manager._bootstrap_tool_registry = registry # type: ignore[attr-defined]
return built
except Exception:
logger.warning("Tool catalog bootstrap failed", exc_info=True)
return {}
def _lifecycle_entries_without_session(
manager: Any,
mcp_names: set[str],
) -> list[dict[str, Any]]:
"""Derive lifecycle tool names from the registry even without a session.
We register queen lifecycle tools against a temporary registry using a
minimal stub, then subtract the MCP-origin set and the synthetic set.
The result matches what the queen sees at runtime (minus context-
specific variants).
"""
registry = getattr(manager, "_bootstrap_tool_registry", None)
# If the bootstrap registry exists but doesn't carry lifecycle tools
# yet, register them now.
if registry is not None and not getattr(registry, "_lifecycle_bootstrap_done", False):
try:
from types import SimpleNamespace
from framework.tools.queen_lifecycle_tools import register_queen_lifecycle_tools
stub_session = SimpleNamespace(
id="tool-library-bootstrap",
colony_runtime=None,
event_bus=None,
worker_path=None,
phase_state=None,
llm=None,
)
register_queen_lifecycle_tools(
registry,
session=stub_session,
session_id=stub_session.id,
session_manager=None,
manager_session_id=stub_session.id,
phase_state=None,
)
registry._lifecycle_bootstrap_done = True # type: ignore[attr-defined]
except Exception:
logger.debug("lifecycle bootstrap failed", exc_info=True)
if registry is None:
return []
out: list[dict[str, Any]] = []
for name, tool in sorted(registry.get_tools().items()):
if name in mcp_names or name in _SYNTHETIC_NAMES:
continue
out.append(
{
"name": tool.name,
"description": tool.description,
"editable": False,
}
)
return out
def _synthetic_entries() -> list[dict[str, Any]]:
"""Return display metadata for synthetic tools injected by the agent loop.
Kept behind a lazy import so test harnesses that don't wire the agent
loop can still hit this route without blowing up.
"""
try:
from framework.agent_loop.internals.synthetic_tools import build_ask_user_tool
tool = build_ask_user_tool()
return [
{
"name": tool.name,
"description": tool.description,
"editable": False,
}
]
except Exception:
return [
{
"name": "ask_user",
"description": "Pause and ask the user a structured question.",
"editable": False,
}
]
def _live_queen_session(manager: Any, queen_id: str) -> Any:
"""Return any live DM session owned by this queen, or ``None``."""
sessions = getattr(manager, "_sessions", None) or {}
for session in sessions.values():
if getattr(session, "queen_name", None) != queen_id:
continue
# Prefer DM (non-colony) sessions
if getattr(session, "colony_runtime", None) is None:
return session
return None
def _render_mcp_servers(
*,
mcp_tool_names_by_server: dict[str, list[dict[str, Any]]],
enabled_mcp_tools: list[str] | None,
) -> list[dict[str, Any]]:
"""Shape the mcp_tool_catalog entries for the API response."""
allowed: set[str] | None = None if enabled_mcp_tools is None else set(enabled_mcp_tools)
servers: list[dict[str, Any]] = []
for server_name in sorted(mcp_tool_names_by_server):
entries = mcp_tool_names_by_server[server_name]
tools = []
for entry in entries:
name = entry.get("name")
enabled = True if allowed is None else name in allowed
tools.append(
{
"name": name,
"description": entry.get("description", ""),
"input_schema": entry.get("input_schema", {}),
"enabled": enabled,
}
)
servers.append({"name": server_name, "tools": tools})
return servers
def _catalog_from_live_session(session: Any) -> dict[str, list[dict[str, Any]]]:
"""Rebuild a per-server tool catalog from a live queen session.
The session's registry is authoritative — this reflects any hot-added
MCP servers since the manager-level snapshot was cached.
"""
registry = getattr(session, "_queen_tool_registry", None)
if registry is None:
# session._queen_tools_by_name is a stash from create_queen; we
# only have registry via the tools list, so reconstruct from the
# phase state instead.
phase_state = getattr(session, "phase_state", None)
if phase_state is None:
return {}
mcp_names = getattr(phase_state, "mcp_tool_names_all", set()) or set()
independent_tools = getattr(phase_state, "independent_tools", []) or []
result: dict[str, list[dict[str, Any]]] = {"MCP Tools": []}
for tool in independent_tools:
if tool.name not in mcp_names:
continue
result["MCP Tools"].append(
{
"name": tool.name,
"description": tool.description,
"input_schema": tool.parameters,
}
)
return result if result["MCP Tools"] else {}
server_map = getattr(registry, "_mcp_server_tools", {}) or {}
tools_by_name = {t.name: t for t in registry.get_tools().values()}
catalog: dict[str, list[dict[str, Any]]] = {}
for server_name, tool_names in server_map.items():
entries: list[dict[str, Any]] = []
for name in sorted(tool_names):
tool = tools_by_name.get(name)
if tool is None:
continue
entries.append(
{
"name": tool.name,
"description": tool.description,
"input_schema": tool.parameters,
}
)
catalog[server_name] = entries
return catalog
def _lifecycle_entries(
*,
session: Any,
mcp_tool_names_all: set[str],
) -> list[dict[str, Any]]:
"""Lifecycle tools = independent_tools minus MCP-origin minus synthetic.
We compute this from a live session when available so the list exactly
matches what the queen actually sees on her next turn.
"""
if session is None:
return []
phase_state = getattr(session, "phase_state", None)
if phase_state is None:
return []
result: list[dict[str, Any]] = []
for tool in getattr(phase_state, "independent_tools", []) or []:
if tool.name in mcp_tool_names_all:
continue
if tool.name in _SYNTHETIC_NAMES:
continue
result.append(
{
"name": tool.name,
"description": tool.description,
"editable": False,
}
)
return sorted(result, key=lambda x: x["name"])
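The set arithmetic in the docstring (lifecycle = independent tools minus MCP-origin minus synthetic) reduces to a one-line filter; `lifecycle_names` is a sketch under that reading, not a function the module defines:

```python
def lifecycle_names(
    independent: list[str],
    mcp_names: set[str],
    synthetic: frozenset[str] = frozenset({"ask_user"}),
) -> list[str]:
    """Lifecycle surface = independent tools minus MCP-origin minus synthetic."""
    return sorted(n for n in independent if n not in mcp_names and n not in synthetic)
```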
async def handle_get_tools(request: web.Request) -> web.Response:
"""GET /api/queen/{queen_id}/tools — enumerate tool surface for the UI."""
queen_id = request.match_info["queen_id"]
ensure_default_queens()
try:
load_queen_profile(queen_id)
except FileNotFoundError:
return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
manager = request.app.get("manager")
session = _live_queen_session(manager, queen_id) if manager is not None else None
# Prefer a live session's registry for freshness. Otherwise use (or
# build on demand) the manager-level catalog so the Tool Library works
# even before any queen has been started in this process.
if session is not None:
catalog = _catalog_from_live_session(session)
else:
catalog = await _ensure_manager_catalog(manager)
stale = not catalog
mcp_tool_names_all: set[str] = set()
for entries in catalog.values():
for entry in entries:
if entry.get("name"):
mcp_tool_names_all.add(entry["name"])
if session is not None:
lifecycle = _lifecycle_entries(
session=session,
mcp_tool_names_all=mcp_tool_names_all,
)
else:
lifecycle = _lifecycle_entries_without_session(manager, mcp_tool_names_all)
# Allowlist lives in the dedicated tools.json sidecar; helper
# migrates legacy profile.yaml field on first read, and falls back
# to the role-based default when no sidecar exists.
enabled_mcp_tools = load_queen_tools_config(queen_id, mcp_catalog=catalog)
is_role_default = not tools_config_exists(queen_id)
response = {
"queen_id": queen_id,
"enabled_mcp_tools": enabled_mcp_tools,
"is_role_default": is_role_default,
"stale": stale,
"lifecycle": lifecycle,
"synthetic": _synthetic_entries(),
"mcp_servers": _render_mcp_servers(
mcp_tool_names_by_server=catalog,
enabled_mcp_tools=enabled_mcp_tools,
),
"categories": _render_categories(queen_id, catalog),
}
return web.json_response(response)
def _render_categories(
queen_id: str,
mcp_catalog: dict[str, list[dict[str, Any]]],
) -> list[dict[str, Any]]:
"""Expose the role-default category table to the frontend.
Each entry carries the category name, the resolved member tool names
(after ``@server:NAME`` shorthand expansion against the live catalog),
and ``in_role_default`` to flag categories that contribute to this
queen's role-based default. Lets the Tool Library group tools by
category alongside the per-server view.
"""
applied = set(queen_role_categories(queen_id))
out: list[dict[str, Any]] = []
for name in list_category_names():
out.append(
{
"name": name,
"tools": resolve_category_tools(name, mcp_catalog),
"in_role_default": name in applied,
}
)
return out
async def handle_patch_tools(request: web.Request) -> web.Response:
"""PATCH /api/queen/{queen_id}/tools — persist the MCP tool allowlist.
Body: ``{"enabled_mcp_tools": null | string[]}``.
- ``null`` resets to "allow every MCP tool" (default).
- A list is validated against the known MCP catalog; unknown names
are rejected with 400 so the frontend catches typos.
"""
queen_id = request.match_info["queen_id"]
try:
body = await request.json()
except Exception:
return web.json_response({"error": "Invalid JSON body"}, status=400)
if not isinstance(body, dict) or "enabled_mcp_tools" not in body:
return web.json_response(
{"error": "Body must be an object with an 'enabled_mcp_tools' field"},
status=400,
)
enabled = body["enabled_mcp_tools"]
if enabled is not None:
if not isinstance(enabled, list) or not all(isinstance(x, str) for x in enabled):
return web.json_response(
{"error": "'enabled_mcp_tools' must be null or a list of strings"},
status=400,
)
ensure_default_queens()
try:
load_queen_profile(queen_id)
except FileNotFoundError:
return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
# Validate names against the known MCP tool catalog. We prefer a live
# session's registry for the most up-to-date set, then fall back to
# the manager-level snapshot (building it on demand if absent).
manager = request.app.get("manager")
session = _live_queen_session(manager, queen_id) if manager is not None else None
if session is not None:
catalog = _catalog_from_live_session(session)
else:
catalog = await _ensure_manager_catalog(manager)
known_names: set[str] = set()
for entries in catalog.values():
for entry in entries:
if entry.get("name"):
known_names.add(entry["name"])
if enabled is not None and known_names:
unknown = sorted(set(enabled) - known_names)
if unknown:
return web.json_response(
{"error": "Unknown MCP tool name(s)", "unknown": unknown},
status=400,
)
# Persist — tools.json sidecar, not profile.yaml.
try:
update_queen_tools_config(queen_id, enabled)
except FileNotFoundError:
return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
# Hot-reload every live DM session for this queen. The filter memo is
# rebuilt so the very next turn sees the new allowlist without a
# session restart, and the prompt cache is invalidated exactly once.
refreshed = 0
sessions = getattr(manager, "_sessions", None) or {}
for sess in sessions.values():
if getattr(sess, "queen_name", None) != queen_id:
continue
phase_state = getattr(sess, "phase_state", None)
if phase_state is None:
continue
phase_state.enabled_mcp_tools = enabled
rebuild = getattr(phase_state, "rebuild_independent_filter", None)
if callable(rebuild):
try:
rebuild()
refreshed += 1
except Exception:
logger.debug(
"Queen tools: rebuild_independent_filter failed for session %s",
getattr(sess, "id", "?"),
exc_info=True,
)
logger.info(
"Queen tools: queen_id=%s allowlist=%s refreshed_sessions=%d",
queen_id,
"null" if enabled is None else f"{len(enabled)} tool(s)",
refreshed,
)
return web.json_response(
{
"queen_id": queen_id,
"enabled_mcp_tools": enabled,
"refreshed_sessions": refreshed,
}
)
async def handle_delete_tools(request: web.Request) -> web.Response:
"""DELETE /api/queen/{queen_id}/tools — drop the sidecar, fall back to role defaults.
Users click "Reset to role default" in the Tool Library. That
removes ``tools.json`` so the queen's effective allowlist becomes
the role-based default (or allow-all if the queen has no role
entry). Live sessions are refreshed so the next turn reflects the
change without a restart.
"""
queen_id = request.match_info["queen_id"]
ensure_default_queens()
try:
load_queen_profile(queen_id)
except FileNotFoundError:
return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
removed = delete_queen_tools_config(queen_id)
# Recompute the queen's effective allowlist from the role defaults
# so we can hot-reload live sessions in one pass (same shape as
# PATCH).
manager = request.app.get("manager")
session = _live_queen_session(manager, queen_id) if manager is not None else None
if session is not None:
catalog = _catalog_from_live_session(session)
else:
catalog = await _ensure_manager_catalog(manager)
new_enabled = load_queen_tools_config(queen_id, mcp_catalog=catalog)
refreshed = 0
sessions = getattr(manager, "_sessions", None) or {}
for sess in sessions.values():
if getattr(sess, "queen_name", None) != queen_id:
continue
phase_state = getattr(sess, "phase_state", None)
if phase_state is None:
continue
phase_state.enabled_mcp_tools = new_enabled
rebuild = getattr(phase_state, "rebuild_independent_filter", None)
if callable(rebuild):
try:
rebuild()
refreshed += 1
except Exception:
logger.debug(
"Queen tools: rebuild_independent_filter failed for session %s",
getattr(sess, "id", "?"),
exc_info=True,
)
logger.info(
"Queen tools: queen_id=%s reset-to-default removed=%s refreshed_sessions=%d",
queen_id,
removed,
refreshed,
)
return web.json_response(
{
"queen_id": queen_id,
"removed": removed,
"enabled_mcp_tools": new_enabled,
"is_role_default": True,
"refreshed_sessions": refreshed,
}
)
def register_routes(app: web.Application) -> None:
"""Register queen-tools routes."""
app.router.add_get("/api/queen/{queen_id}/tools", handle_get_tools)
app.router.add_patch("/api/queen/{queen_id}/tools", handle_patch_tools)
app.router.add_delete("/api/queen/{queen_id}/tools", handle_delete_tools)
+35 -10
@@ -248,15 +248,22 @@ async def handle_queen_session(request: web.Request) -> web.Response:
# Skip colony sessions: a colony forked from this queen also carries
# queen_name == queen_id, but it has a worker loaded (colony_id /
# worker_path set) and is the colony's chat, not the queen's DM.
for session in manager.list_sessions():
if session.queen_name == queen_id and session.colony_id is None and session.worker_path is None:
return web.json_response(
{
"session_id": session.id,
"queen_id": queen_id,
"status": "live",
}
)
# When multiple DM sessions for this queen are live at once (e.g. the
# user created a new session, then navigated away and back), return
# the most recently loaded one so we don't resurrect a stale older
# session ahead of a freshly created one.
live_matches = [
s for s in manager.list_sessions() if s.queen_name == queen_id and s.colony_id is None and s.worker_path is None
]
if live_matches:
latest = max(live_matches, key=lambda s: s.loaded_at)
return web.json_response(
{
"session_id": latest.id,
"queen_id": queen_id,
"status": "live",
}
)
# 2. Find the most recent cold session for this queen and resume it.
# IMPORTANT: skip sessions that don't belong in the queen DM:
@@ -378,6 +385,8 @@ async def handle_select_queen_session(request: web.Request) -> web.Response:
async def handle_new_queen_session(request: web.Request) -> web.Response:
"""POST /api/queen/{queen_id}/session/new -- create a fresh queen session."""
from framework.tools.queen_lifecycle_tools import QUEEN_PHASES
queen_id = request.match_info["queen_id"]
manager = request.app["manager"]
@@ -387,9 +396,25 @@ async def handle_new_queen_session(request: web.Request) -> web.Response:
except FileNotFoundError:
return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
body = await request.json() if request.can_read_body else {}
if request.can_read_body:
try:
body = await request.json()
except json.JSONDecodeError:
return web.json_response({"error": "Invalid JSON body"}, status=400)
if not isinstance(body, dict):
return web.json_response({"error": "Request body must be a JSON object"}, status=400)
else:
body = {}
initial_prompt = body.get("initial_prompt")
initial_phase = body.get("initial_phase") or "independent"
if initial_phase not in QUEEN_PHASES:
return web.json_response(
{
"error": f"Invalid initial_phase '{initial_phase}'",
"valid": sorted(QUEEN_PHASES),
},
status=400,
)
session = await manager.create_session(
initial_prompt=initial_prompt,
+141 -22
@@ -122,8 +122,19 @@ async def handle_create_session(request: web.Request) -> web.Response:
(equivalent to the old POST /api/agents). Otherwise creates a queen-only
session that can later have a colony loaded via POST /sessions/{id}/colony.
"""
from framework.agents.queen.queen_profiles import ensure_default_queens, load_queen_profile
from framework.tools.queen_lifecycle_tools import QUEEN_PHASES
manager = _get_manager(request)
body = await request.json() if request.can_read_body else {}
if request.can_read_body:
try:
body = await request.json()
except json.JSONDecodeError:
return web.json_response({"error": "Invalid JSON body"}, status=400)
if not isinstance(body, dict):
return web.json_response({"error": "Request body must be a JSON object"}, status=400)
else:
body = {}
agent_path = body.get("agent_path")
agent_id = body.get("agent_id")
session_id = body.get("session_id")
@@ -134,6 +145,21 @@ async def handle_create_session(request: web.Request) -> web.Response:
initial_phase = body.get("initial_phase")
worker_name = body.get("worker_name")
if initial_phase is not None and initial_phase not in QUEEN_PHASES:
return web.json_response(
{
"error": f"Invalid initial_phase '{initial_phase}'",
"valid": sorted(QUEEN_PHASES),
},
status=400,
)
if queen_name:
ensure_default_queens()
try:
load_queen_profile(queen_name)
except FileNotFoundError:
return web.json_response({"error": f"Queen '{queen_name}' not found"}, status=404)
if agent_path:
try:
agent_path = str(validate_agent_path(agent_path))
@@ -160,6 +186,7 @@ async def handle_create_session(request: web.Request) -> web.Response:
model=model,
initial_prompt=initial_prompt,
queen_resume_from=queen_resume_from,
queen_name=queen_name,
initial_phase=initial_phase,
)
except ValueError as e:
@@ -771,6 +798,110 @@ async def handle_session_colonies(request: web.Request) -> web.Response:
_EVENTS_HISTORY_DEFAULT_LIMIT = 2000
_EVENTS_HISTORY_MAX_LIMIT = 10000
# Files at or below this size use the simple forward-scan path (cheap enough
# that the seek-backward dance isn't worth it). Above this threshold we read
# the tail directly from end-of-file so a 50 MB log doesn't have to be paged
# through entirely just to surface the last 2000 lines.
_EVENTS_HISTORY_REVERSE_TAIL_THRESHOLD_BYTES = 1 << 20 # 1 MB
_EVENTS_HISTORY_REVERSE_TAIL_CHUNK_BYTES = 64 * 1024
def _read_events_tail(events_path: Path, limit: int) -> tuple[list[dict], int, bool]:
"""Read the tail of an append-only JSONL events log.
Returns ``(events, total, truncated)``. ``events`` is at most ``limit``
lines, oldest-first. ``total`` is the total number of non-blank lines in
the file (exact for the small-file path, exact for the large-file path
too we do a separate fast newline-count pass).
Two paths:
- Small files (< ~1 MB): forward scan. Cheap; gives an exact total for
free. Defers ``json.loads`` to the bounded deque so we never parse a
line that's about to be dropped.
- Large files: seek to EOF and read backward in 64 KB chunks until we have
at least ``limit`` complete lines. Parses only the tail. ``total`` is
counted by a separate forward byte-scan that just counts newlines
no JSON parse so it stays cheap even for huge files.
Without these optimizations, mounting the chat for a long-running queen
with a ~50 k-event log used to spend most of its time inside ``json.loads``
on the server thread (and block the event loop while doing it).
"""
from collections import deque
file_size = events_path.stat().st_size
if file_size <= _EVENTS_HISTORY_REVERSE_TAIL_THRESHOLD_BYTES:
tail_raw: deque[str] = deque(maxlen=limit)
total = 0
with open(events_path, encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
total += 1
tail_raw.append(line)
events: list[dict] = []
for raw in tail_raw:
try:
events.append(json.loads(raw))
except json.JSONDecodeError:
continue
return events, total, total > len(events)
# Large-file path: read backward until we have enough lines.
import os as _os
chunk_size = _EVENTS_HISTORY_REVERSE_TAIL_CHUNK_BYTES
pieces: list[bytes] = []
newline_count = 0
with open(events_path, "rb") as fb:
fb.seek(0, _os.SEEK_END)
pos = fb.tell()
while pos > 0 and newline_count <= limit:
read_size = min(chunk_size, pos)
pos -= read_size
fb.seek(pos)
chunk = fb.read(read_size)
newline_count += chunk.count(b"\n")
pieces.append(chunk)
pieces.reverse()
blob = b"".join(pieces)
# Drop the leading partial line unless we read from offset 0.
raw_lines = blob.split(b"\n")
if pos > 0 and raw_lines:
raw_lines = raw_lines[1:]
decoded = [ln.decode("utf-8", errors="replace").strip() for ln in raw_lines]
decoded = [ln for ln in decoded if ln]
if len(decoded) > limit:
decoded = decoded[-limit:]
events = []
for raw in decoded:
try:
events.append(json.loads(raw))
except json.JSONDecodeError:
continue
# Separate fast pass for total: count newlines only, no JSON parse.
total = 0
with open(events_path, "rb") as fb:
while True:
chunk = fb.read(1 << 20)
if not chunk:
break
total += chunk.count(b"\n")
# File may end without a trailing newline; if so, the last non-empty line
# was missed. Count it.
if file_size > 0:
with open(events_path, "rb") as fb:
fb.seek(-1, _os.SEEK_END)
if fb.read(1) != b"\n":
total += 1
return events, total, total > len(events)
async def handle_session_events_history(request: web.Request) -> web.Response:
"""GET /api/sessions/{session_id}/events/history — persisted eventbus log.
@@ -800,6 +931,9 @@ async def handle_session_events_history(request: web.Request) -> web.Response:
recent N events". Long-running colonies have produced files with 50k+
events; before this cap, restoring on page-mount shipped the whole thing
down the wire and blocked the UI for seconds.
The actual file read runs in a worker thread via ``asyncio.to_thread`` so
it doesn't block the event loop while other requests are in flight.
"""
session_id = request.match_info["session_id"]
@@ -825,24 +959,8 @@ async def handle_session_events_history(request: web.Request) -> web.Response:
}
)
# Tail the file using a bounded deque — O(limit) memory regardless
# of file size. No need to materialize the whole list only to slice it.
from collections import deque
tail: deque[dict] = deque(maxlen=limit)
total = 0
try:
with open(events_path, encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
try:
evt = json.loads(line)
except json.JSONDecodeError:
continue
total += 1
tail.append(evt)
events, total, truncated = await asyncio.to_thread(_read_events_tail, events_path, limit)
except OSError:
return web.json_response(
{
@@ -855,14 +973,13 @@ async def handle_session_events_history(request: web.Request) -> web.Response:
}
)
events = list(tail)
return web.json_response(
{
"events": events,
"session_id": session_id,
"total": total,
"returned": len(events),
"truncated": total > len(events),
"truncated": truncated,
"limit": limit,
}
)
@@ -977,8 +1094,10 @@ async def handle_delete_agent(request: web.Request) -> web.Response:
except ValueError as exc:
return web.json_response({"error": str(exc)}, status=400)
# Reject deletion of framework agents (~/.hive/agents/) — those are internal
hive_agents_dir = Path.home() / ".hive" / "agents"
# Reject deletion of framework agents ($HIVE_HOME/agents/) — those are internal
from framework.config import HIVE_HOME
hive_agents_dir = HIVE_HOME / "agents"
if resolved.is_relative_to(hive_agents_dir):
return web.json_response({"error": "Cannot delete framework agents"}, status=403)
File diff suppressed because it is too large
+112
@@ -0,0 +1,112 @@
"""REST routes for task lists.
GET /api/tasks/{task_list_id} -- snapshot of one list
GET /api/colonies/{colony_id}/task_lists -- helper for colony view
GET /api/sessions/{session_id}/task_list_id -- helper for session view
The task_list_id segment uses URL-encoded colons (``colony%3Aabc`` /
``session%3Aagent%3Asess``); aiohttp decodes them automatically.
"""
from __future__ import annotations
import logging
from aiohttp import web
from framework.tasks import get_task_store
from framework.tasks.scoping import (
colony_task_list_id,
session_task_list_id,
)
logger = logging.getLogger(__name__)
async def handle_get_task_list(request: web.Request) -> web.Response:
raw = request.match_info.get("task_list_id", "")
if not raw:
return web.json_response({"error": "task_list_id required"}, status=400)
store = get_task_store()
if not await store.list_exists(raw):
return web.json_response(
{"error": f"Task list {raw!r} not found", "task_list_id": raw, "tasks": []},
status=404,
)
meta = await store.get_meta(raw)
records = await store.list_tasks(raw)
return web.json_response(
{
"task_list_id": raw,
"role": meta.role.value if meta else "session",
"meta": meta.model_dump(mode="json") if meta else None,
"tasks": [
{
"id": r.id,
"subject": r.subject,
"description": r.description,
"active_form": r.active_form,
"owner": r.owner,
"status": r.status.value,
"blocks": list(r.blocks),
"blocked_by": list(r.blocked_by),
"metadata": dict(r.metadata),
"created_at": r.created_at,
"updated_at": r.updated_at,
}
for r in records
],
}
)
async def handle_get_colony_task_lists(request: web.Request) -> web.Response:
"""Return template_task_list_id and queen_session_task_list_id for a colony."""
colony_id = request.match_info.get("colony_id", "")
if not colony_id:
return web.json_response({"error": "colony_id required"}, status=400)
template_id = colony_task_list_id(colony_id)
# Queen's session list — the queen-of-colony's session_id == the
# browser-facing colony session id. The frontend already knows that
# value; we surface what we have on disk for completeness.
queen_session_id = request.query.get("queen_session_id")
queen_list_id = session_task_list_id("queen", queen_session_id) if queen_session_id else None
return web.json_response(
{
"template_task_list_id": template_id,
"queen_session_task_list_id": queen_list_id,
}
)
async def handle_get_session_task_list_id(request: web.Request) -> web.Response:
"""Return task_list_id and picked_up_from for a session.
The session_id is the queen's session id or a worker's session id;
both follow the same path. The agent_id is read from the request query
(passed by the frontend, which already knows which agent the session
belongs to).
"""
session_id = request.match_info.get("session_id", "")
agent_id = request.query.get("agent_id", "queen")
if not session_id:
return web.json_response({"error": "session_id required"}, status=400)
task_list_id = session_task_list_id(agent_id, session_id)
store = get_task_store()
exists = await store.list_exists(task_list_id)
return web.json_response(
{
"task_list_id": task_list_id if exists else None,
"picked_up_from": None,
}
)
def register_routes(app: web.Application) -> None:
app.router.add_get("/api/tasks/{task_list_id}", handle_get_task_list)
app.router.add_get("/api/colonies/{colony_id}/task_lists", handle_get_colony_task_lists)
app.router.add_get("/api/sessions/{session_id}/task_list_id", handle_get_session_task_list_id)
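The URL-encoded colon convention from the module docstring can be produced client-side with the standard library; a minimal sketch (the `encode_task_list_id` helper name is illustrative, not part of the codebase):

```python
from urllib.parse import quote, unquote

def encode_task_list_id(task_list_id: str) -> str:
    # safe="" percent-encodes ":" (and "/", defensively) so the whole id
    # travels as a single path segment in /api/tasks/{task_list_id}.
    return quote(task_list_id, safe="")

encoded = encode_task_list_id("session:agent:sess")  # "session%3Aagent%3Asess"
decoded = unquote(encoded)  # aiohttp performs this decode on match_info
```

aiohttp decodes `request.match_info["task_list_id"]` automatically, so the handler above sees the raw colon-separated id.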
+2 -4
@@ -67,11 +67,9 @@ async def handle_list_nodes(request: web.Request) -> web.Response:
worker_session_id = request.query.get("session_id")
if worker_session_id and session.worker_path:
worker_session_id = safe_path_segment(worker_session_id)
from pathlib import Path
from framework.config import HIVE_HOME
state_path = (
Path.home() / ".hive" / "agents" / session.worker_path.name / "sessions" / worker_session_id / "state.json"
)
state_path = HIVE_HOME / "agents" / session.worker_path.name / "sessions" / worker_session_id / "state.json"
if state_path.exists():
try:
state = json.loads(state_path.read_text(encoding="utf-8"))
+133 -68
@@ -19,7 +19,7 @@ from datetime import datetime
from pathlib import Path
from typing import Any, Literal
from framework.config import QUEENS_DIR
from framework.config import QUEENS_DIR, get_max_tokens
from framework.host.triggers import TriggerDefinition
logger = logging.getLogger(__name__)
@@ -546,8 +546,10 @@ class SessionManager:
session.colony_name = colony_id
session.worker_path = agent_path
# Worker storage: ~/.hive/agents/{colony_name}/{worker_name}/
worker_storage = Path.home() / ".hive" / "agents" / colony_id / worker_name
# Worker storage: $HIVE_HOME/agents/{colony_name}/{worker_name}/
from framework.config import HIVE_HOME
worker_storage = HIVE_HOME / "agents" / colony_id / worker_name
worker_storage.mkdir(parents=True, exist_ok=True)
# Copy conversations from colony if fresh
@@ -698,7 +700,10 @@ class SessionManager:
available_tools=all_tools,
goal_context=goal.to_prompt_context(),
goal=goal,
max_tokens=8192,
# Worker output cap — pull from configuration.json instead of
# hard-coding 8192. glm-5.1/kimi-k2.5 both support 32k out, and
# capping at 8k silently truncates long worker turns mid-tool.
max_tokens=get_max_tokens(),
stream_id=worker_name,
execution_id=worker_name,
identity_prompt=worker_data.get("identity_prompt", ""),
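`get_max_tokens` itself is not shown in this diff; a minimal sketch consistent with the comment above (the configuration.json path and `max_tokens` key are assumptions about its schema):

```python
import json
from pathlib import Path

# Assumed location and key name — configuration.json's real schema is not in this diff.
_CONFIG_PATH = Path.home() / ".hive" / "configuration.json"
_DEFAULT_MAX_TOKENS = 32768  # the comment above cites models with 32k output support

def get_max_tokens(config_path: Path = _CONFIG_PATH) -> int:
    """Read the worker output cap from configuration.json, falling back to a
    permissive default when the file or key is absent or malformed."""
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return _DEFAULT_MAX_TOKENS
    value = config.get("max_tokens")
    if isinstance(value, int) and value > 0:
        return value
    return _DEFAULT_MAX_TOKENS
```

The fallback path matters here: a missing config must not reintroduce the silent 8k truncation the change removes.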
@@ -927,7 +932,9 @@ class SessionManager:
that process is still running on the host. If it is, the session is
owned by another healthy worker process, so leave it alone.
"""
sessions_path = Path.home() / ".hive" / "agents" / agent_path.name / "sessions"
from framework.config import HIVE_HOME
sessions_path = HIVE_HOME / "agents" / agent_path.name / "sessions"
if not sessions_path.exists():
return
@@ -1223,8 +1230,27 @@ class SessionManager:
logger.info("Session '%s': shutdown reflection spawned", session_id)
self._background_tasks.add(task)
task.add_done_callback(self._background_tasks.discard)
except Exception:
logger.warning("Session '%s': failed to spawn shutdown reflection", session_id, exc_info=True)
except RuntimeError as exc:
# Most common when a session is stopped after the event loop
# has closed (e.g. during server shutdown or from an atexit
# handler). The reflection would have had nothing to write
# anyway — no new turns since the last periodic reflection.
logger.warning(
"Session '%s': shutdown reflection skipped — event loop unavailable (%s). "
"Normal during server shutdown; anything worth persisting was saved by the "
"periodic reflection after the last turn.",
session_id,
exc,
)
except Exception as exc:
logger.warning(
"Session '%s': failed to spawn shutdown reflection: %s: %s. "
"Check that queen_dir exists and session.llm is configured; full traceback follows.",
session_id,
type(exc).__name__,
exc,
exc_info=True,
)
if session.queen_task is not None:
session.queen_task.cancel()
@@ -1516,8 +1542,46 @@ class SessionManager:
tool_executor=queen_tool_executor,
event_bus=session.event_bus,
colony_id=session.id,
# Wire the on-disk colony name and queen id so
# ColonyRuntime auto-derives its override paths. DM sessions
# have no colony_name (session.colony_name is None), which
# keeps them out of the per-colony JSON store.
colony_name=getattr(session, "colony_name", None),
queen_id=getattr(session, "queen_name", None) or None,
pipeline_stages=[], # queen pipeline runs in queen_orchestrator, not here
)
# Per-colony tool allowlist, loaded from the colony's metadata.json
# when this session is attached to a real forked colony. For pure
# queen DM sessions (session.colony_name is None) we only capture
# the MCP-origin set — the allowlist stays ``None`` so every MCP
# tool passes through by default.
try:
mcp_tool_names_all: set[str] = set()
mgr_catalog = getattr(self, "_mcp_tool_catalog", None)
if isinstance(mgr_catalog, dict):
for entries in mgr_catalog.values():
for entry in entries:
name = entry.get("name") if isinstance(entry, dict) else None
if name:
mcp_tool_names_all.add(name)
enabled_mcp_tools: list[str] | None = None
colony_name = getattr(session, "colony_name", None)
if colony_name:
# Colony tool allowlist lives in a dedicated tools.json
# sidecar next to metadata.json. The helper migrates any
# legacy field out of metadata.json on first read.
from framework.host.colony_tools_config import load_colony_tools_config
enabled_mcp_tools = load_colony_tools_config(colony_name)
colony.set_tool_allowlist(enabled_mcp_tools, mcp_tool_names_all)
except Exception:
logger.debug(
"Colony allowlist bootstrap failed for session %s",
session.id,
exc_info=True,
)
await colony.start()
session.colony = colony
@@ -1707,6 +1771,42 @@ class SessionManager:
def list_sessions(self) -> list[Session]:
return list(self._sessions.values())
# ------------------------------------------------------------------
# Skill override helpers — used by routes_skills to find every live
# SkillsManager affected by a queen- or colony-scope mutation so a
# single HTTP call can reload them all.
# ------------------------------------------------------------------
def iter_queen_sessions(self, queen_id: str):
"""Yield live sessions whose queen matches ``queen_id``."""
for s in self._sessions.values():
if getattr(s, "queen_name", None) == queen_id:
yield s
def iter_colony_runtimes(
self,
*,
queen_id: str | None = None,
colony_name: str | None = None,
):
"""Yield live ``ColonyRuntime`` instances matching the filters.
``queen_id`` alone yields every runtime whose ``queen_id`` matches
(useful when the user toggles a queen-scope skill and all her
colonies must reload). ``colony_name`` alone yields the single
runtime pinned to that colony. Both together yield the intersection.
No filters yields every live runtime (used by the global
``/api/skills`` reload).
"""
for s in self._sessions.values():
colony = getattr(s, "colony", None)
if colony is None:
continue
if queen_id is not None and getattr(colony, "queen_id", None) != queen_id:
continue
if colony_name is not None and getattr(colony, "colony_name", None) != colony_name:
continue
yield colony
# ------------------------------------------------------------------
# Cold session helpers (disk-only, no live runtime required)
# ------------------------------------------------------------------
@@ -1825,73 +1925,38 @@ class SessionManager:
if meta.get("colony_fork"):
continue
# Build a quick preview of the last human/assistant exchange.
# We read all conversation parts, filter to client-facing messages,
# and return the last assistant message content as a snippet.
# Preview of the last client-facing exchange. Cached in
# ``summary.json`` next to ``meta.json`` so the sidebar doesn't
# have to rescan every part on each list call. The cache is
# written incrementally by FileConversationStore.write_part; if
# missing or stale (parts dir mtime newer than the summary file)
# we do a one-time full rebuild and write a fresh summary.
#
# NOTE on activity timestamps: the session directory's own mtime
# is NOT reliable as a "last activity" marker — POSIX dir mtime
# only updates when direct entries change, and conversation
# parts live under conversations/parts/, so writing a new part
# does not bubble up to the session dir.
from framework.storage import session_summary
last_message: str | None = None
message_count: int = 0
# Last-activity timestamp — mtime of the latest client-facing message.
# Falls back to session creation time for empty sessions. NOTE: the
# session directory's own mtime is NOT reliable here — POSIX dir mtime
# only updates when direct entries change, and conversation parts are
# nested under conversations/parts/, so writing a new part does not
# bubble up to the session dir.
last_active_at: float = float(created_at) if isinstance(created_at, (int, float)) else 0.0
convs_dir = d / "conversations"
summary: dict | None = None
if convs_dir.exists():
try:
all_parts: list[dict] = []
if session_summary.is_stale(d):
summary = session_summary.rebuild_summary(d)
else:
summary = session_summary.read_summary(d)
def _collect_parts(parts_dir: Path, _dest: list[dict] = all_parts) -> None:
if not parts_dir.exists():
return
for part_file in sorted(parts_dir.iterdir()):
if part_file.suffix != ".json":
continue
try:
part = json.loads(part_file.read_text(encoding="utf-8"))
part.setdefault("created_at", part_file.stat().st_mtime)
_dest.append(part)
except (json.JSONDecodeError, OSError):
continue
# Flat layout: conversations/parts/*.json
_collect_parts(convs_dir / "parts")
# Node-based layout: conversations/<node_id>/parts/*.json
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir() or node_dir.name == "parts":
continue
_collect_parts(node_dir / "parts")
# Filter to client-facing messages only
client_msgs = [
p
for p in all_parts
if not p.get("is_transition_marker")
and p.get("role") != "tool"
and not (p.get("role") == "assistant" and p.get("tool_calls"))
]
client_msgs.sort(key=lambda m: m.get("created_at", m.get("seq", 0)))
message_count = len(client_msgs)
# Take the latest message's timestamp as the activity marker.
# _collect_parts sets created_at via setdefault to the part
# file's mtime, so this is always a valid float.
if client_msgs:
latest_ts = client_msgs[-1].get("created_at")
if isinstance(latest_ts, (int, float)) and latest_ts > last_active_at:
last_active_at = float(latest_ts)
# Last assistant message as preview snippet
for msg in reversed(client_msgs):
content = msg.get("content") or ""
if isinstance(content, list):
# Anthropic-style content blocks
content = " ".join(
b.get("text", "") for b in content if isinstance(b, dict) and b.get("type") == "text"
)
if content and msg.get("role") == "assistant":
last_message = content[:120].strip()
break
except OSError:
pass
if summary is not None:
message_count = int(summary.get("message_count") or 0)
last_message = summary.get("last_message")
cached_active = summary.get("last_active_at")
if isinstance(cached_active, (int, float)) and cached_active > last_active_at:
last_active_at = float(cached_active)
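The staleness check is delegated to `framework.storage.session_summary`; a minimal mtime-based sketch of what `is_stale` could look like (the real helper's layout handling may differ):

```python
from pathlib import Path

def is_stale(session_dir: Path) -> bool:
    """True when summary.json is missing or older than any parts dir.

    Sketch only. A parts directory's own mtime bumps whenever a part file
    is created inside it, so it is a cheap freshness proxy — unlike the
    session dir's mtime, which does not bubble up from nested writes.
    """
    summary_file = session_dir / "summary.json"
    if not summary_file.exists():
        return True
    convs = session_dir / "conversations"
    if not convs.exists():
        return False
    # Flat layout (conversations/parts) plus node-based layouts
    # (conversations/<node_id>/parts).
    candidates = [convs / "parts"]
    candidates += [node / "parts" for node in convs.iterdir() if node.is_dir()]
    newest = max((p.stat().st_mtime for p in candidates if p.exists()), default=0.0)
    return newest > summary_file.stat().st_mtime
```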
# Derive queen_id from directory structure: queens/{queen_id}/sessions/{session_id}
queen_id = d.parent.parent.name if d.parent.name == "sessions" else None
@@ -0,0 +1,425 @@
"""Tests for POST /api/colonies/import — tar-based colony onboarding.
The handler resolves writes against ``framework.config.COLONIES_DIR``;
every test redirects that into a ``tmp_path`` so we never touch the real
``~/.hive/colonies`` tree.
"""
from __future__ import annotations
import io
import tarfile
from pathlib import Path
import pytest
from aiohttp import FormData, web
from aiohttp.test_utils import TestClient, TestServer
from framework.server import routes_colonies
def _build_tar(layout: dict[str, bytes | None], *, gzip: bool = True) -> bytes:
"""Build an in-memory tar with the given paths.
``layout`` maps archive member names to file contents; passing ``None``
creates a directory entry instead of a regular file.
"""
buf = io.BytesIO()
mode = "w:gz" if gzip else "w"
with tarfile.open(fileobj=buf, mode=mode) as tf:
for name, content in layout.items():
if content is None:
info = tarfile.TarInfo(name=name)
info.type = tarfile.DIRTYPE
info.mode = 0o755
tf.addfile(info)
else:
info = tarfile.TarInfo(name=name)
info.size = len(content)
info.mode = 0o644
tf.addfile(info, io.BytesIO(content))
return buf.getvalue()
def _build_tar_with_symlink(top: str, link_name: str, link_target: str) -> bytes:
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
info = tarfile.TarInfo(name=top)
info.type = tarfile.DIRTYPE
info.mode = 0o755
tf.addfile(info)
sym = tarfile.TarInfo(name=f"{top}/{link_name}")
sym.type = tarfile.SYMTYPE
sym.linkname = link_target
tf.addfile(sym)
return buf.getvalue()
@pytest.fixture
def colonies_dir(tmp_path, monkeypatch):
"""Redirect COLONIES_DIR into a tmp tree."""
colonies = tmp_path / "colonies"
colonies.mkdir()
monkeypatch.setattr(routes_colonies, "COLONIES_DIR", colonies)
return colonies
async def _client(app: web.Application) -> TestClient:
return TestClient(TestServer(app))
def _app() -> web.Application:
app = web.Application()
routes_colonies.register_routes(app)
return app
def _form(file_bytes: bytes, *, filename: str = "colony.tar.gz", **fields: str) -> FormData:
fd = FormData()
fd.add_field("file", file_bytes, filename=filename, content_type="application/gzip")
for k, v in fields.items():
fd.add_field(k, v)
return fd
@pytest.mark.asyncio
async def test_happy_path_imports_colony(colonies_dir: Path) -> None:
archive = _build_tar(
{
"x_daily/": None,
"x_daily/metadata.json": b'{"colony_name":"x_daily"}',
"x_daily/scripts/run.sh": b"#!/bin/sh\necho hi\n",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 201, await resp.text()
body = await resp.json()
assert body["name"] == "x_daily"
assert body["files_imported"] == 2
assert (colonies_dir / "x_daily" / "metadata.json").read_bytes() == b'{"colony_name":"x_daily"}'
assert (colonies_dir / "x_daily" / "scripts" / "run.sh").exists()
@pytest.mark.asyncio
async def test_name_override(colonies_dir: Path) -> None:
archive = _build_tar({"x_daily/": None, "x_daily/file.txt": b"hi"})
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive, name="other_name"))
assert resp.status == 201
body = await resp.json()
assert body["name"] == "other_name"
assert (colonies_dir / "other_name" / "file.txt").read_bytes() == b"hi"
assert not (colonies_dir / "x_daily").exists()
@pytest.mark.asyncio
async def test_rejects_existing_without_replace_flag(colonies_dir: Path) -> None:
(colonies_dir / "x_daily").mkdir()
(colonies_dir / "x_daily" / "old.txt").write_text("preserved")
archive = _build_tar({"x_daily/": None, "x_daily/new.txt": b"new"})
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 409
# Original content untouched
assert (colonies_dir / "x_daily" / "old.txt").read_text() == "preserved"
@pytest.mark.asyncio
async def test_replace_existing_overwrites(colonies_dir: Path) -> None:
(colonies_dir / "x_daily").mkdir()
(colonies_dir / "x_daily" / "old.txt").write_text("preserved")
archive = _build_tar({"x_daily/": None, "x_daily/new.txt": b"new"})
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import",
data=_form(archive, replace_existing="true"),
)
assert resp.status == 201, await resp.text()
assert not (colonies_dir / "x_daily" / "old.txt").exists()
assert (colonies_dir / "x_daily" / "new.txt").read_text() == "new"
@pytest.mark.asyncio
async def test_rejects_path_traversal(colonies_dir: Path) -> None:
archive = _build_tar(
{
"x_daily/": None,
"x_daily/../escape.txt": b"oops",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
assert "traversal" in (await resp.json())["error"].lower() or "outside" in (await resp.json())["error"].lower()
assert not (colonies_dir / "x_daily").exists()
assert not (colonies_dir.parent / "escape.txt").exists()
@pytest.mark.asyncio
async def test_rejects_absolute_member(colonies_dir: Path) -> None:
archive = _build_tar({"x_daily/": None, "/etc/passwd": b"oops"})
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
@pytest.mark.asyncio
async def test_rejects_symlinks(colonies_dir: Path) -> None:
archive = _build_tar_with_symlink("x_daily", "evil", "/etc/passwd")
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
assert "symlink" in (await resp.json())["error"].lower()
@pytest.mark.asyncio
async def test_rejects_multiple_top_level_dirs(colonies_dir: Path) -> None:
archive = _build_tar(
{
"a/": None,
"a/x.txt": b"a",
"b/": None,
"b/y.txt": b"b",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
assert "top-level" in (await resp.json())["error"].lower()
@pytest.mark.asyncio
async def test_rejects_invalid_colony_name(colonies_dir: Path) -> None:
archive = _build_tar({"Bad-Name/": None, "Bad-Name/x.txt": b"x"})
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
@pytest.mark.asyncio
async def test_rejects_non_multipart(colonies_dir: Path) -> None:
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import", data=b"not multipart", headers={"Content-Type": "application/octet-stream"}
)
assert resp.status == 400
@pytest.mark.asyncio
async def test_rejects_corrupt_tar(colonies_dir: Path) -> None:
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(b"not a real tar"))
assert resp.status == 400
@pytest.mark.asyncio
async def test_rejects_missing_file_part(colonies_dir: Path) -> None:
fd = FormData()
fd.add_field("name", "anything")
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=fd)
assert resp.status == 400
@pytest.mark.asyncio
async def test_accepts_uncompressed_tar(colonies_dir: Path) -> None:
archive = _build_tar({"x_daily/": None, "x_daily/file.txt": b"plain"}, gzip=False)
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import",
data=_form(archive, filename="colony.tar"),
)
assert resp.status == 201
assert (colonies_dir / "x_daily" / "file.txt").read_text() == "plain"
# --------------------------------------------------------------------------
# Multi-root tar tests — the desktop's pushColonyToWorkspace ships the colony
# dir + worker conversations + the queen's forked session in one tar so the
# queen has full context on resume. Each recognised top-level prefix unpacks
# into its corresponding HIVE_HOME subtree.
# --------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_multi_root_unpacks_three_subtrees(colonies_dir: Path) -> None:
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/metadata.json": b'{"queen_session_id":"session_x"}',
"colonies/x_daily/data/progress.db": b"sqlite",
"agents/x_daily/worker/": None,
"agents/x_daily/worker/conversations/": None,
"agents/x_daily/worker/conversations/0001.json": b'{"role":"user"}',
"agents/x_daily/worker/conversations/0002.json": b'{"role":"assistant"}',
"agents/queens/queen_alpha/sessions/session_x/": None,
"agents/queens/queen_alpha/sessions/session_x/queen.json": b'{"id":"x"}',
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 201, await resp.text()
body = await resp.json()
# Colony files
assert (colonies_dir / "x_daily" / "metadata.json").exists()
assert (colonies_dir / "x_daily" / "data" / "progress.db").exists()
# Worker conversations under HIVE_HOME/agents/<colony>/worker/
hive_home = colonies_dir.parent
assert (
hive_home / "agents" / "x_daily" / "worker" / "conversations" / "0001.json"
).read_bytes() == b'{"role":"user"}'
# Queen forked session under HIVE_HOME/agents/queens/<queen>/sessions/<sid>/
assert (hive_home / "agents" / "queens" / "queen_alpha" / "sessions" / "session_x" / "queen.json").exists()
# Summary in response
assert body["name"] == "x_daily"
assert body["files_imported"] == 5
by_root = body["by_root"]
assert by_root["colonies"]["files"] == 2
assert by_root["agents_worker"]["files"] == 2
assert by_root["agents_queen"]["files"] == 1
@pytest.mark.asyncio
async def test_multi_root_colonies_only_succeeds(colonies_dir: Path) -> None:
"""The agents/ subtrees are optional — a fresh colony has no history."""
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/metadata.json": b"{}",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 201, await resp.text()
body = await resp.json()
assert body["files_imported"] == 1
assert (colonies_dir / "x_daily" / "metadata.json").read_bytes() == b"{}"
@pytest.mark.asyncio
async def test_multi_root_rejects_missing_colonies_root(colonies_dir: Path) -> None:
"""Worker / queen trees alone aren't valid — every push must include
the colony dir, otherwise the desktop's intent is unclear and we
refuse rather than silently leave HIVE_HOME in a half-state."""
archive = _build_tar(
{
"agents/x_daily/worker/": None,
"agents/x_daily/worker/log.json": b"{}",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400, await resp.text()
err = (await resp.json())["error"]
assert "colonies/" in err
@pytest.mark.asyncio
async def test_multi_root_replace_existing_colony(colonies_dir: Path) -> None:
(colonies_dir / "x_daily").mkdir()
(colonies_dir / "x_daily" / "old.txt").write_text("preserved")
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/new.txt": b"new",
}
)
# Without flag → 409
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 409
assert (colonies_dir / "x_daily" / "old.txt").read_text() == "preserved"
# With flag → wipes + replaces
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import",
data=_form(archive, replace_existing="true"),
)
assert resp.status == 201, await resp.text()
assert not (colonies_dir / "x_daily" / "old.txt").exists()
assert (colonies_dir / "x_daily" / "new.txt").read_text() == "new"
@pytest.mark.asyncio
async def test_multi_root_rejects_traversal_in_worker_subtree(colonies_dir: Path) -> None:
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/m.json": b"{}",
"agents/x_daily/worker/": None,
"agents/x_daily/worker/../escape.txt": b"oops",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
hive_home = colonies_dir.parent
assert not (hive_home / "agents" / "escape.txt").exists()
@pytest.mark.asyncio
async def test_multi_root_rejects_unknown_prefix(colonies_dir: Path) -> None:
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/m.json": b"{}",
"etc/passwd": b"oops",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
# The unknown root is silently ignored (it doesn't match any
# recognised prefix); the colony root is required and present, so
# extraction succeeds and only the colonies subtree lands. We don't
# write outside HIVE_HOME because the dispatcher only routes to
# known destinations.
assert resp.status == 201, await resp.text()
hive_home = colonies_dir.parent
assert not (hive_home.parent / "etc" / "passwd").exists()
assert not (hive_home / "etc" / "passwd").exists()
@pytest.mark.asyncio
async def test_multi_root_rejects_invalid_segment(colonies_dir: Path) -> None:
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/m.json": b"{}",
"agents/queens/Bad-Queen/sessions/sess_1/": None,
"agents/queens/Bad-Queen/sessions/sess_1/x.json": b"{}",
}
)
async with await _client(_app()) as c:
resp = await c.post("/api/colonies/import", data=_form(archive))
assert resp.status == 400
@pytest.mark.asyncio
async def test_multi_root_overwrites_agents_subtree_in_place(colonies_dir: Path) -> None:
"""Worker/queen subtrees are append-mostly stores — the import handler
extracts in place without an existence-conflict gate so the desktop can
re-push from another machine without explicit overwrite."""
hive_home = colonies_dir.parent
worker_dir = hive_home / "agents" / "x_daily" / "worker" / "conversations"
worker_dir.mkdir(parents=True)
(worker_dir / "0000_old.json").write_text("old")
archive = _build_tar(
{
"colonies/x_daily/": None,
"colonies/x_daily/m.json": b"{}",
"agents/x_daily/worker/": None,
"agents/x_daily/worker/conversations/": None,
"agents/x_daily/worker/conversations/0001_new.json": b"new",
}
)
async with await _client(_app()) as c:
resp = await c.post(
"/api/colonies/import",
data=_form(archive, replace_existing="true"),
)
assert resp.status == 201, await resp.text()
# Old conversation file untouched (extraction is additive on agents/),
# new one written.
assert (worker_dir / "0000_old.json").read_text() == "old"
assert (worker_dir / "0001_new.json").read_text() == "new"
@@ -0,0 +1,300 @@
"""Tests for the per-colony MCP tool allowlist filter + routes.
Covers:
1. ``ColonyRuntime`` filter semantics (default-allow, allowlist, empty,
lifecycle passes through).
2. routes_colony_tools round trip (GET/PATCH, validation, 404).
3. Colony index route for the Tool Library picker.
Routes never touch the real ``~/.hive/colonies`` tree; we redirect
``COLONIES_DIR`` into ``tmp_path`` via monkeypatch.
"""
from __future__ import annotations
import json
from dataclasses import dataclass, field
from typing import Any
import pytest
from aiohttp import web
from aiohttp.test_utils import TestClient, TestServer
from framework.host.colony_runtime import ColonyRuntime
from framework.llm.provider import Tool
from framework.server import routes_colony_tools
def _tool(name: str) -> Tool:
return Tool(name=name, description=f"desc of {name}", parameters={"type": "object"})
# ---------------------------------------------------------------------------
# ColonyRuntime filter unit tests
# ---------------------------------------------------------------------------
def _bare_runtime() -> ColonyRuntime:
rt = ColonyRuntime.__new__(ColonyRuntime)
rt._enabled_mcp_tools = None
rt._mcp_tool_names_all = set()
return rt
class TestColonyFilter:
def test_default_is_noop(self):
rt = _bare_runtime()
tools = [_tool("mcp_a"), _tool("lc_b")]
assert rt._apply_tool_allowlist(tools) == tools
def test_allowlist_gates_mcp_only(self):
rt = _bare_runtime()
rt._mcp_tool_names_all = {"mcp_a", "mcp_b"}
rt._enabled_mcp_tools = ["mcp_a"]
tools = [_tool("mcp_a"), _tool("mcp_b"), _tool("lc_c")]
names = [t.name for t in rt._apply_tool_allowlist(tools)]
assert names == ["mcp_a", "lc_c"]
def test_empty_allowlist_keeps_lifecycle(self):
rt = _bare_runtime()
rt._mcp_tool_names_all = {"mcp_a", "mcp_b"}
rt._enabled_mcp_tools = []
tools = [_tool("mcp_a"), _tool("mcp_b"), _tool("lc_c")]
names = [t.name for t in rt._apply_tool_allowlist(tools)]
assert names == ["lc_c"]
def test_setter_mutates_live_state(self):
rt = _bare_runtime()
rt.set_tool_allowlist(["x"], {"x", "y"})
assert rt._enabled_mcp_tools == ["x"]
assert rt._mcp_tool_names_all == {"x", "y"}
# Passing None on allowlist clears gating; mcp_tool_names_all
# defaults to "keep current" so a subsequent caller doesn't need
# to repeat the set.
rt.set_tool_allowlist(None)
assert rt._enabled_mcp_tools is None
assert rt._mcp_tool_names_all == {"x", "y"}
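The four cases above reduce to a single predicate. A minimal standalone sketch, using plain strings instead of ``Tool`` objects (an illustration of the contract, not the real ``ColonyRuntime`` internals):

```python
# Illustrative sketch of the allowlist semantics the tests pin down:
# - enabled is None     -> no gating, everything passes
# - enabled is a list   -> MCP tools must appear in it
# - non-MCP (lifecycle) -> always passes
def apply_allowlist(tool_names, mcp_names, enabled):
    if enabled is None:
        return list(tool_names)
    allowed = set(enabled)
    return [n for n in tool_names if n not in mcp_names or n in allowed]
```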
# ---------------------------------------------------------------------------
# Route round-trip tests
# ---------------------------------------------------------------------------
@dataclass
class _FakeSession:
colony_name: str
colony: Any = None
colony_runtime: Any = None
id: str = "sess-1"
@dataclass
class _FakeManager:
_sessions: dict = field(default_factory=dict)
_mcp_tool_catalog: dict = field(default_factory=dict)
@pytest.fixture
def colony_dir(tmp_path, monkeypatch):
"""Point COLONIES_DIR into a tmp tree and seed a colony."""
colonies = tmp_path / "colonies"
colonies.mkdir()
monkeypatch.setattr("framework.host.colony_metadata.COLONIES_DIR", colonies)
monkeypatch.setattr("framework.host.colony_tools_config.COLONIES_DIR", colonies)
name = "my_colony"
cdir = colonies / name
cdir.mkdir()
(cdir / "metadata.json").write_text(
json.dumps(
{
"colony_name": name,
"queen_name": "queen_technology",
"created_at": "2026-04-20T00:00:00+00:00",
}
)
)
return colonies, name
async def _app(manager: _FakeManager) -> web.Application:
app = web.Application()
app["manager"] = manager
routes_colony_tools.register_routes(app)
return app
@pytest.mark.asyncio
async def test_get_tools_default_allow(colony_dir):
_, name = colony_dir
manager = _FakeManager(
_mcp_tool_catalog={
"files-tools": [
{"name": "read_file", "description": "read", "input_schema": {}},
{"name": "write_file", "description": "write", "input_schema": {}},
],
}
)
app = await _app(manager)
async with TestClient(TestServer(app)) as client:
resp = await client.get(f"/api/colony/{name}/tools")
assert resp.status == 200
body = await resp.json()
assert body["enabled_mcp_tools"] is None
assert body["stale"] is False
tools = {t["name"]: t for t in body["mcp_servers"][0]["tools"]}
assert all(t["enabled"] for t in tools.values())
@pytest.mark.asyncio
async def test_patch_persists_and_validates(colony_dir):
colonies_dir, name = colony_dir
manager = _FakeManager(
_mcp_tool_catalog={
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "write_file", "description": "", "input_schema": {}},
]
}
)
app = await _app(manager)
tools_path = colonies_dir / name / "tools.json"
metadata_path = colonies_dir / name / "metadata.json"
async with TestClient(TestServer(app)) as client:
resp = await client.patch(f"/api/colony/{name}/tools", json={"enabled_mcp_tools": ["read_file"]})
assert resp.status == 200
body = await resp.json()
assert body["enabled_mcp_tools"] == ["read_file"]
# Persisted to tools.json; metadata.json does not carry the field.
sidecar = json.loads(tools_path.read_text())
assert sidecar["enabled_mcp_tools"] == ["read_file"]
assert "updated_at" in sidecar
meta = json.loads(metadata_path.read_text())
assert "enabled_mcp_tools" not in meta
# GET reflects the allowlist
resp = await client.get(f"/api/colony/{name}/tools")
body = await resp.json()
tools = {t["name"]: t for t in body["mcp_servers"][0]["tools"]}
assert tools["read_file"]["enabled"] is True
assert tools["write_file"]["enabled"] is False
# Unknown → 400
resp = await client.patch(f"/api/colony/{name}/tools", json={"enabled_mcp_tools": ["ghost"]})
assert resp.status == 400
assert "ghost" in (await resp.json()).get("unknown", [])
@pytest.mark.asyncio
async def test_patch_refreshes_live_runtime(colony_dir):
_, name = colony_dir
rt = _bare_runtime()
rt._mcp_tool_names_all = {"read_file", "write_file"}
rt.set_tool_allowlist(None)
session = _FakeSession(colony_name=name, colony=rt)
manager = _FakeManager(
_sessions={session.id: session},
_mcp_tool_catalog={
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "write_file", "description": "", "input_schema": {}},
]
},
)
app = await _app(manager)
async with TestClient(TestServer(app)) as client:
resp = await client.patch(f"/api/colony/{name}/tools", json={"enabled_mcp_tools": ["read_file"]})
assert resp.status == 200
body = await resp.json()
assert body["refreshed_runtimes"] == 1
assert rt._enabled_mcp_tools == ["read_file"]
@pytest.mark.asyncio
async def test_404_for_unknown_colony(colony_dir):
manager = _FakeManager()
app = await _app(manager)
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/colony/unknown/tools")
assert resp.status == 404
resp = await client.patch("/api/colony/unknown/tools", json={"enabled_mcp_tools": None})
assert resp.status == 404
@pytest.mark.asyncio
async def test_tools_index_lists_colonies(colony_dir):
_, name = colony_dir
manager = _FakeManager()
app = await _app(manager)
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/colonies/tools-index")
assert resp.status == 200
body = await resp.json()
entries = {c["name"]: c for c in body["colonies"]}
assert name in entries
assert entries[name]["queen_name"] == "queen_technology"
assert entries[name]["has_allowlist"] is False
def test_queen_allowlist_inherits_into_new_colony(tmp_path, monkeypatch):
"""A colony forked with a curated queen inherits her allowlist.
Exercises the inheritance hook in
``routes_execution.fork_session_into_colony`` without running the
full fork machinery: we just call
``update_colony_tools_config`` the same way the hook does and
assert the colony's ``tools.json`` matches the queen's live list.
"""
colonies = tmp_path / "colonies"
colonies.mkdir()
monkeypatch.setattr("framework.host.colony_tools_config.COLONIES_DIR", colonies)
from framework.host.colony_tools_config import (
load_colony_tools_config,
update_colony_tools_config,
)
colony_name = "forked_child"
(colonies / colony_name).mkdir()
# Simulate: queen has a curated allowlist (e.g. role default resolved
# to a concrete list). The inheritance hook copies it verbatim.
queen_live_allowlist = ["read_file", "web_scrape", "csv_read"]
update_colony_tools_config(colony_name, list(queen_live_allowlist))
assert load_colony_tools_config(colony_name) == queen_live_allowlist
def test_legacy_metadata_field_migrates_to_sidecar(colony_dir):
"""A legacy enabled_mcp_tools field in metadata.json is hoisted to tools.json."""
colonies_dir, name = colony_dir
meta_path = colonies_dir / name / "metadata.json"
tools_path = colonies_dir / name / "tools.json"
# Seed legacy field in metadata.json.
meta = json.loads(meta_path.read_text())
meta["enabled_mcp_tools"] = ["read_file"]
meta_path.write_text(json.dumps(meta))
from framework.host.colony_tools_config import load_colony_tools_config
# First load migrates.
assert load_colony_tools_config(name) == ["read_file"]
assert tools_path.exists()
sidecar = json.loads(tools_path.read_text())
assert sidecar["enabled_mcp_tools"] == ["read_file"]
# metadata.json no longer contains the field; provenance fields preserved.
migrated = json.loads(meta_path.read_text())
assert "enabled_mcp_tools" not in migrated
assert migrated["queen_name"] == "queen_technology"
# Second load is a direct sidecar read.
assert load_colony_tools_config(name) == ["read_file"]
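The hoist-once behavior this test asserts can be sketched in isolation. File names mirror the test; the function body is an assumed shape for illustration, not the real ``colony_tools_config`` implementation:

```python
import json
from pathlib import Path

# Illustrative migration sketch: move a legacy field out of metadata.json
# into a tools.json sidecar exactly once, leaving the other metadata fields
# (provenance) intact, then serve all later reads from the sidecar.
def load_allowlist(colony_dir: Path, field: str = "enabled_mcp_tools"):
    meta_path = colony_dir / "metadata.json"
    tools_path = colony_dir / "tools.json"
    if not tools_path.exists():
        meta = json.loads(meta_path.read_text())
        if field in meta:
            tools_path.write_text(json.dumps({field: meta.pop(field)}))
            meta_path.write_text(json.dumps(meta))  # provenance survives
    if tools_path.exists():
        return json.loads(tools_path.read_text()).get(field)
    return None
```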
@@ -0,0 +1,239 @@
"""Tests for the MCP server CRUD HTTP routes.
Monkey-patches ``MCPRegistry`` inside ``routes_mcp`` so the HTTP layer is
exercised without reading or writing ``~/.hive/mcp_registry/installed.json``
or spawning actual subprocesses.
"""
from __future__ import annotations
from typing import Any
import pytest
from aiohttp import web
from aiohttp.test_utils import TestClient, TestServer
from framework.loader.mcp_errors import MCPError, MCPErrorCode
from framework.server import routes_mcp
class _FakeRegistry:
"""Stand-in for MCPRegistry — just enough surface for the routes."""
def __init__(self) -> None:
self._servers: dict[str, dict[str, Any]] = {
"built-in-seed": {
"source": "registry",
"transport": "stdio",
"enabled": True,
"manifest": {"description": "Factory-seeded server", "tools": []},
"last_health_status": "healthy",
"last_error": None,
"last_health_check_at": None,
}
}
def initialize(self) -> None: # noqa: D401 — registry idempotent init
return
def list_installed(self) -> list[dict[str, Any]]:
return [{"name": name, **entry} for name, entry in self._servers.items()]
def get_server(self, name: str) -> dict | None:
if name not in self._servers:
return None
return {"name": name, **self._servers[name]}
def add_local(self, *, name: str, transport: str, **kwargs: Any) -> dict:
if name in self._servers:
raise MCPError(
code=MCPErrorCode.MCP_INSTALL_FAILED,
what=f"Server '{name}' already exists",
why="A server with this name is already registered locally.",
fix=f"Run: hive mcp remove {name}",
)
entry = {
"source": "local",
"transport": transport,
"enabled": True,
"manifest": {"description": kwargs.get("description") or ""},
"last_health_status": None,
"last_error": None,
"last_health_check_at": None,
}
self._servers[name] = entry
return entry
def remove(self, name: str) -> None:
if name not in self._servers:
raise MCPError(
code=MCPErrorCode.MCP_INSTALL_FAILED,
what=f"Cannot remove server '{name}'",
why="Server is not installed.",
fix="Run: hive mcp list",
)
del self._servers[name]
def enable(self, name: str) -> None:
if name not in self._servers:
raise MCPError(
code=MCPErrorCode.MCP_INSTALL_FAILED,
what="not found",
why="not found",
fix="x",
)
self._servers[name]["enabled"] = True
def disable(self, name: str) -> None:
if name not in self._servers:
raise MCPError(
code=MCPErrorCode.MCP_INSTALL_FAILED,
what="not found",
why="not found",
fix="x",
)
self._servers[name]["enabled"] = False
def health_check(self, name: str) -> dict[str, Any]:
if name not in self._servers:
raise MCPError(
code=MCPErrorCode.MCP_HEALTH_FAILED,
what="not found",
why="not found",
fix="x",
)
return {"name": name, "status": "healthy", "tools": 3, "error": None}
@pytest.fixture
def registry(monkeypatch):
reg = _FakeRegistry()
monkeypatch.setattr(routes_mcp, "_registry", lambda: reg)
return reg
async def _make_app() -> web.Application:
app = web.Application()
routes_mcp.register_routes(app)
return app
@pytest.mark.asyncio
async def test_list_servers_returns_built_in(registry):
app = await _make_app()
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/mcp/servers")
assert resp.status == 200
body = await resp.json()
names = {s["name"] for s in body["servers"]}
# The registry fake carries one entry; the list also merges package-
# baked entries from core/framework/agents/queen/mcp_servers.json so
# the UI matches what the queen actually loads. Both should appear.
assert "built-in-seed" in names
sources = {s["name"]: s["source"] for s in body["servers"]}
assert sources.get("built-in-seed") == "registry"
# The package-baked servers (files-tools/gcu-tools/hive_tools) carry
# source=="built-in" and are non-removable.
pkg_entries = [s for s in body["servers"] if s["source"] == "built-in"]
assert pkg_entries, "expected at least one package-baked MCP server"
assert all(s.get("removable") is False for s in pkg_entries)
@pytest.mark.asyncio
async def test_add_local_server(registry):
app = await _make_app()
async with TestClient(TestServer(app)) as client:
resp = await client.post(
"/api/mcp/servers",
json={
"name": "my-tool",
"transport": "stdio",
"command": "echo",
"args": ["hi"],
"description": "says hi",
},
)
assert resp.status == 201
body = await resp.json()
assert body["server"]["name"] == "my-tool"
assert body["server"]["source"] == "local"
resp = await client.get("/api/mcp/servers")
names = [s["name"] for s in (await resp.json())["servers"]]
assert "my-tool" in names
@pytest.mark.asyncio
async def test_add_rejects_duplicate(registry):
app = await _make_app()
async with TestClient(TestServer(app)) as client:
for _ in range(2):
resp = await client.post(
"/api/mcp/servers",
json={"name": "dup", "transport": "stdio", "command": "x"},
)
assert resp.status == 409
body = await resp.json()
assert "already exists" in body["error"].lower()
assert body["fix"]
@pytest.mark.asyncio
async def test_add_rejects_invalid_transport(registry):
app = await _make_app()
async with TestClient(TestServer(app)) as client:
resp = await client.post(
"/api/mcp/servers",
json={"name": "x", "transport": "nope"},
)
assert resp.status == 400
@pytest.mark.asyncio
async def test_enable_disable_cycle(registry):
app = await _make_app()
# Seed a local server
registry.add_local(name="local-one", transport="stdio", command="x")
async with TestClient(TestServer(app)) as client:
resp = await client.post("/api/mcp/servers/local-one/disable")
assert resp.status == 200
assert (await resp.json())["enabled"] is False
assert registry._servers["local-one"]["enabled"] is False
resp = await client.post("/api/mcp/servers/local-one/enable")
assert resp.status == 200
assert (await resp.json())["enabled"] is True
@pytest.mark.asyncio
async def test_remove_local_only(registry):
app = await _make_app()
registry.add_local(name="local-two", transport="stdio", command="x")
async with TestClient(TestServer(app)) as client:
# Built-ins are protected
resp = await client.delete("/api/mcp/servers/built-in-seed")
assert resp.status == 400
# Missing
resp = await client.delete("/api/mcp/servers/ghost")
assert resp.status == 404
# Happy path
resp = await client.delete("/api/mcp/servers/local-two")
assert resp.status == 200
assert "local-two" not in registry._servers
@pytest.mark.asyncio
async def test_health_check(registry, monkeypatch):
app = await _make_app()
registry.add_local(name="pingable", transport="stdio", command="x")
async with TestClient(TestServer(app)) as client:
resp = await client.post("/api/mcp/servers/pingable/health")
assert resp.status == 200
body = await resp.json()
assert body["status"] == "healthy"
assert body["tools"] == 3
@@ -0,0 +1,486 @@
"""Tests for the per-queen MCP tool allowlist filter + routes.
Covers:
1. QueenPhaseState filter semantics (default-allow, allowlist, empty, phase-
isolation, memo identity for LLM prompt-cache stability).
2. routes_queen_tools round trip (GET, PATCH, validation, live-session
hot-reload).
Route tests monkey-patch a tiny queen profile + manager catalog; they never
spawn an MCP subprocess.
"""
from __future__ import annotations
import json
from dataclasses import dataclass, field
from typing import Any
from unittest.mock import MagicMock
import pytest
import yaml
from aiohttp import web
from aiohttp.test_utils import TestClient, TestServer
from framework.llm.provider import Tool
from framework.server import routes_queen_tools
from framework.tools.queen_lifecycle_tools import QueenPhaseState
# ---------------------------------------------------------------------------
# QueenPhaseState filter — pure unit tests
# ---------------------------------------------------------------------------
def _tool(name: str) -> Tool:
return Tool(name=name, description=f"desc of {name}", parameters={"type": "object"})
class TestPhaseStateFilter:
def test_default_allow_returns_every_tool(self):
ps = QueenPhaseState(phase="independent")
ps.independent_tools = [_tool("mcp_a"), _tool("mcp_b"), _tool("lc_c")]
ps.mcp_tool_names_all = {"mcp_a", "mcp_b"}
ps.enabled_mcp_tools = None
ps.rebuild_independent_filter()
names = [t.name for t in ps.get_current_tools()]
assert names == ["mcp_a", "mcp_b", "lc_c"]
def test_allowlist_keeps_listed_mcp_plus_all_lifecycle(self):
ps = QueenPhaseState(phase="independent")
ps.independent_tools = [_tool("mcp_a"), _tool("mcp_b"), _tool("lc_c")]
ps.mcp_tool_names_all = {"mcp_a", "mcp_b"}
ps.enabled_mcp_tools = ["mcp_a"]
ps.rebuild_independent_filter()
names = [t.name for t in ps.get_current_tools()]
assert names == ["mcp_a", "lc_c"]
def test_empty_allowlist_keeps_only_lifecycle(self):
ps = QueenPhaseState(phase="independent")
ps.independent_tools = [_tool("mcp_a"), _tool("mcp_b"), _tool("lc_c")]
ps.mcp_tool_names_all = {"mcp_a", "mcp_b"}
ps.enabled_mcp_tools = []
ps.rebuild_independent_filter()
names = [t.name for t in ps.get_current_tools()]
assert names == ["lc_c"]
def test_filter_isolated_to_independent_phase(self):
ps = QueenPhaseState(phase="independent")
ps.independent_tools = [_tool("mcp_a"), _tool("lc_c")]
ps.working_tools = [_tool("mcp_a"), _tool("lc_c")]
ps.mcp_tool_names_all = {"mcp_a"}
ps.enabled_mcp_tools = []
ps.rebuild_independent_filter()
# Independent → filtered
assert [t.name for t in ps.get_current_tools()] == ["lc_c"]
# Other phases → unaffected
ps.phase = "working"
assert [t.name for t in ps.get_current_tools()] == ["mcp_a", "lc_c"]
def test_memo_returns_stable_identity_for_prompt_cache(self):
"""Same Python list object across turns → LLM prompt cache stays warm."""
ps = QueenPhaseState(phase="independent")
ps.independent_tools = [_tool("mcp_a"), _tool("lc_c")]
ps.mcp_tool_names_all = {"mcp_a"}
ps.enabled_mcp_tools = None
ps.rebuild_independent_filter()
first = ps.get_current_tools()
second = ps.get_current_tools()
assert first is second, "memoized list must be the same object across turns"
# A rebuild should produce a different object so downstream caches
# correctly invalidate.
ps.enabled_mcp_tools = ["mcp_a"]
ps.rebuild_independent_filter()
third = ps.get_current_tools()
assert third is not first
assert [t.name for t in third] == ["mcp_a", "lc_c"]
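The identity contract the memo test pins down fits in a few lines. A sketch under the assumption that ``QueenPhaseState`` memoizes per rebuild (illustrative only, not the real class):

```python
# Illustrative memo sketch: current() hands back the same list object on
# every call, so a prompt cache keyed on identity stays warm; only an
# explicit rebuild() allocates a fresh list and invalidates downstream.
class PhaseToolMemo:
    def __init__(self, tools):
        self._tools = list(tools)
        self._memo = list(tools)

    def rebuild(self, enabled, mcp_names):
        if enabled is None:
            self._memo = list(self._tools)
        else:
            keep = set(enabled)
            self._memo = [t for t in self._tools
                          if t not in mcp_names or t in keep]

    def current(self):
        return self._memo
```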
# ---------------------------------------------------------------------------
# Route round-trip tests
# ---------------------------------------------------------------------------
@dataclass
class _FakeSession:
queen_name: str
phase_state: QueenPhaseState
colony_runtime: Any = None
id: str = "sess-1"
_queen_tool_registry: Any = None
@dataclass
class _FakeManager:
_sessions: dict = field(default_factory=dict)
_mcp_tool_catalog: dict = field(default_factory=dict)
@pytest.fixture
def queen_dir(tmp_path, monkeypatch):
"""Redirect queen profile + tools storage into a tmp dir."""
queens_dir = tmp_path / "queens"
queens_dir.mkdir()
monkeypatch.setattr("framework.agents.queen.queen_profiles.QUEENS_DIR", queens_dir)
monkeypatch.setattr("framework.agents.queen.queen_tools_config.QUEENS_DIR", queens_dir)
queen_id = "queen_technology"
(queens_dir / queen_id).mkdir()
(queens_dir / queen_id / "profile.yaml").write_text(
yaml.safe_dump({"name": "Alexandra", "title": "Head of Technology"})
)
return queens_dir, queen_id
async def _make_app(*, manager: _FakeManager) -> web.Application:
app = web.Application()
app["manager"] = manager
routes_queen_tools.register_routes(app)
return app
@pytest.mark.asyncio
async def test_get_tools_default_allows_everything_for_unknown_queen(queen_dir, monkeypatch):
"""Queens NOT in the role-default table fall back to allow-all."""
monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
queens_dir, _ = queen_dir
# Use a queen id that isn't in QUEEN_DEFAULT_CATEGORIES so we exercise
# the fallback-to-allow-all path.
custom_id = "queen_custom_unknown"
(queens_dir / custom_id).mkdir()
(queens_dir / custom_id / "profile.yaml").write_text(yaml.safe_dump({"name": "Custom", "title": "Custom Role"}))
manager = _FakeManager()
manager._mcp_tool_catalog = {
"files-tools": [
{"name": "read_file", "description": "read", "input_schema": {}},
{"name": "write_file", "description": "write", "input_schema": {}},
],
}
app = await _make_app(manager=manager)
async with TestClient(TestServer(app)) as client:
resp = await client.get(f"/api/queen/{custom_id}/tools")
assert resp.status == 200
body = await resp.json()
assert body["enabled_mcp_tools"] is None
assert body["is_role_default"] is True # no sidecar → default-allow
assert body["stale"] is False
servers = {s["name"]: s for s in body["mcp_servers"]}
assert set(servers) == {"files-tools"}
for tool in servers["files-tools"]["tools"]:
assert tool["enabled"] is True
@pytest.mark.asyncio
async def test_get_tools_applies_role_default(queen_dir, monkeypatch):
"""Known persona queens get their role-based default allowlist."""
monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
_, queen_id = queen_dir # queen_technology — has a role default
manager = _FakeManager()
# Seed two MCP servers: files-tools is referenced by the technology
# role via the @server:files-tools shorthand in `file_ops`, so its
# tools should bubble into the default. unrelated-server is NOT
# referenced by any role category — its tools must NOT leak in.
manager._mcp_tool_catalog = {
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "edit_file", "description": "", "input_schema": {}},
],
"unrelated-server": [
{"name": "fluffy_unknown_tool", "description": "", "input_schema": {}},
],
}
app = await _make_app(manager=manager)
async with TestClient(TestServer(app)) as client:
resp = await client.get(f"/api/queen/{queen_id}/tools")
assert resp.status == 200
body = await resp.json()
assert body["is_role_default"] is True
enabled = set(body["enabled_mcp_tools"] or [])
# @server:files-tools shorthand pulls in every tool under that server.
assert "read_file" in enabled
assert "edit_file" in enabled
# Tools registered under a server the role doesn't reference are NOT
# part of the default.
assert "fluffy_unknown_tool" not in enabled
@pytest.mark.asyncio
async def test_get_tools_exposes_categories(queen_dir, monkeypatch):
"""Response includes the category catalog with role-default flags."""
monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
_, queen_id = queen_dir # queen_technology
manager = _FakeManager()
manager._mcp_tool_catalog = {
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "edit_file", "description": "", "input_schema": {}},
],
}
app = await _make_app(manager=manager)
async with TestClient(TestServer(app)) as client:
resp = await client.get(f"/api/queen/{queen_id}/tools")
assert resp.status == 200
body = await resp.json()
cats = {c["name"]: c for c in body["categories"]}
# Categories that contribute to queen_technology's role default
assert cats["file_ops"]["in_role_default"] is True
assert cats["browser_basic"]["in_role_default"] is True
# Spreadsheet category is exposed even though queen_technology doesn't
# use it — frontend can group/show it.
assert "spreadsheet_advanced" in cats
assert cats["spreadsheet_advanced"]["in_role_default"] is False
# Security was removed from queen_technology defaults.
assert cats["security"]["in_role_default"] is False
# @server:files-tools shorthand expanded against the catalog.
assert "read_file" in cats["file_ops"]["tools"]
assert "edit_file" in cats["file_ops"]["tools"]
def test_resolve_queen_default_tools_expands_server_shorthand():
"""@server:NAME shorthand expands against the provided catalog."""
from framework.agents.queen.queen_tools_defaults import resolve_queen_default_tools
catalog = {
"files-tools": [
{"name": "read_file"},
{"name": "write_file"},
],
}
# queen_brand_design uses "file_ops" category → expands via @server:files-tools.
result = resolve_queen_default_tools("queen_brand_design", catalog)
assert result is not None
assert "read_file" in result
assert "write_file" in result
def test_resolve_queen_default_tools_unknown_queen_returns_none():
from framework.agents.queen.queen_tools_defaults import resolve_queen_default_tools
assert resolve_queen_default_tools("queen_made_up", {}) is None
@pytest.mark.asyncio
async def test_patch_persists_and_validates(queen_dir, monkeypatch):
monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
queens_dir, queen_id = queen_dir
manager = _FakeManager()
manager._mcp_tool_catalog = {
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
{"name": "write_file", "description": "", "input_schema": {}},
]
}
app = await _make_app(manager=manager)
tools_path = queens_dir / queen_id / "tools.json"
profile_path = queens_dir / queen_id / "profile.yaml"
async with TestClient(TestServer(app)) as client:
# Happy path
resp = await client.patch(
f"/api/queen/{queen_id}/tools",
json={"enabled_mcp_tools": ["read_file"]},
)
assert resp.status == 200
body = await resp.json()
assert body["enabled_mcp_tools"] == ["read_file"]
# Sidecar persisted; profile YAML untouched by tools PATCH
sidecar = json.loads(tools_path.read_text())
assert sidecar["enabled_mcp_tools"] == ["read_file"]
assert "updated_at" in sidecar
profile = yaml.safe_load(profile_path.read_text())
assert "enabled_mcp_tools" not in profile
# GET reflects the new state
resp = await client.get(f"/api/queen/{queen_id}/tools")
body = await resp.json()
assert body["is_role_default"] is False # user has explicitly saved
tools = {t["name"]: t for t in body["mcp_servers"][0]["tools"]}
assert tools["read_file"]["enabled"] is True
assert tools["write_file"]["enabled"] is False
# Null resets
resp = await client.patch(f"/api/queen/{queen_id}/tools", json={"enabled_mcp_tools": None})
assert resp.status == 200
body = await resp.json()
assert body["enabled_mcp_tools"] is None
sidecar = json.loads(tools_path.read_text())
assert sidecar["enabled_mcp_tools"] is None
# Unknown tool name → 400; sidecar unchanged
resp = await client.patch(
f"/api/queen/{queen_id}/tools",
json={"enabled_mcp_tools": ["nope_not_a_tool"]},
)
assert resp.status == 400
detail = await resp.json()
assert "nope_not_a_tool" in detail.get("unknown", [])
sidecar = json.loads(tools_path.read_text())
assert sidecar["enabled_mcp_tools"] is None
@pytest.mark.asyncio
async def test_patch_hot_reloads_live_session(queen_dir, monkeypatch):
monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
_, queen_id = queen_dir
# Build a fake live session whose phase state carries a tool list the
# filter can gate. We also need a fake registry so
# _catalog_from_live_session can enumerate tools.
class _FakeRegistry:
def __init__(self, server_map, tools_by_name):
self._mcp_server_tools = server_map
self._tools_by_name = tools_by_name
def get_tools(self):
return {n: MagicMock(name=n) for n in self._tools_by_name}
tools_by_name = {"read_file": _tool("read_file"), "write_file": _tool("write_file")}
registry = _FakeRegistry(
server_map={"files-tools": {"read_file", "write_file"}},
tools_by_name=tools_by_name,
)
# Patch get_tools to return real Tool objects for name/description plumbing.
registry.get_tools = lambda: tools_by_name # type: ignore[method-assign]
phase_state = QueenPhaseState(phase="independent")
phase_state.independent_tools = [tools_by_name["read_file"], tools_by_name["write_file"]]
phase_state.mcp_tool_names_all = {"read_file", "write_file"}
phase_state.enabled_mcp_tools = None
phase_state.rebuild_independent_filter()
session = _FakeSession(queen_name=queen_id, phase_state=phase_state)
session._queen_tool_registry = registry
manager = _FakeManager(_sessions={"sess-1": session})
app = await _make_app(manager=manager)
async with TestClient(TestServer(app)) as client:
resp = await client.patch(
f"/api/queen/{queen_id}/tools",
json={"enabled_mcp_tools": ["read_file"]},
)
assert resp.status == 200
body = await resp.json()
assert body["refreshed_sessions"] == 1
# Session's phase state reflects the new allowlist without a restart
current = phase_state.get_current_tools()
assert [t.name for t in current] == ["read_file"]
@pytest.mark.asyncio
async def test_missing_queen_returns_404(queen_dir, monkeypatch):
monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
manager = _FakeManager()
app = await _make_app(manager=manager)
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/queen/queen_nonexistent/tools")
assert resp.status == 404
resp = await client.patch(
"/api/queen/queen_nonexistent/tools",
json={"enabled_mcp_tools": None},
)
assert resp.status == 404
@pytest.mark.asyncio
async def test_delete_restores_role_default(queen_dir, monkeypatch):
"""DELETE removes tools.json so the queen falls back to the role default."""
monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
queens_dir, queen_id = queen_dir
tools_path = queens_dir / queen_id / "tools.json"
manager = _FakeManager()
manager._mcp_tool_catalog = {
"files-tools": [
{"name": "read_file", "description": "", "input_schema": {}},
# pdf_read lives in hive_tools but is named explicitly in the
# file_ops category, so we stage it in any server here just to
# surface it through the catalog.
{"name": "pdf_read", "description": "", "input_schema": {}},
],
}
app = await _make_app(manager=manager)
async with TestClient(TestServer(app)) as client:
# Seed a custom allowlist first so we have a sidecar to delete.
resp = await client.patch(
f"/api/queen/{queen_id}/tools",
json={"enabled_mcp_tools": ["read_file"]},
)
assert resp.status == 200
assert tools_path.exists()
resp = await client.delete(f"/api/queen/{queen_id}/tools")
assert resp.status == 200
body = await resp.json()
assert body["removed"] is True
assert body["is_role_default"] is True
assert not tools_path.exists()
# The new effective list is the role default for queen_technology;
# security tools were intentionally removed, so port_scan must NOT
# appear, while file_ops members like read_file/pdf_read do.
enabled = set(body["enabled_mcp_tools"] or [])
assert "read_file" in enabled
assert "pdf_read" in enabled
assert "port_scan" not in enabled
assert "subdomain_enumerate" not in enabled
# GET confirms.
resp = await client.get(f"/api/queen/{queen_id}/tools")
body = await resp.json()
assert body["is_role_default"] is True
# Deleting again is a no-op.
resp = await client.delete(f"/api/queen/{queen_id}/tools")
assert resp.status == 200
assert (await resp.json())["removed"] is False
def test_legacy_profile_field_migrates_to_sidecar(queen_dir):
"""A legacy enabled_mcp_tools field in profile.yaml is hoisted to tools.json."""
queens_dir, queen_id = queen_dir
profile_path = queens_dir / queen_id / "profile.yaml"
tools_path = queens_dir / queen_id / "tools.json"
# Seed legacy field in profile.yaml.
profile = yaml.safe_load(profile_path.read_text()) or {}
profile["enabled_mcp_tools"] = ["read_file", "write_file"]
profile_path.write_text(yaml.safe_dump(profile, sort_keys=False))
from framework.agents.queen.queen_tools_config import load_queen_tools_config
# First load migrates.
assert load_queen_tools_config(queen_id) == ["read_file", "write_file"]
assert tools_path.exists()
sidecar = json.loads(tools_path.read_text())
assert sidecar["enabled_mcp_tools"] == ["read_file", "write_file"]
# profile.yaml no longer contains the field; other fields preserved.
migrated_profile = yaml.safe_load(profile_path.read_text())
assert "enabled_mcp_tools" not in migrated_profile
assert migrated_profile["name"] == "Alexandra"
# Second load is a direct read — no migration work to do.
assert load_queen_tools_config(queen_id) == ["read_file", "write_file"]
@@ -11,7 +11,7 @@ metadata:
**Applies when** your spawn message has `db_path:` and `colony_id:` fields. The DB is your durable working memory — tells you what's done, what to skip, which SOP gates you owe.
Access via `terminal_exec` running `sqlite3 "<db_path>" "..."`. Tables: `tasks` (queue), `steps` (per-task decomposition), `sop_checklist` (hard gates).
### Claim: assigned task (check this FIRST)
@@ -113,7 +113,7 @@ Even after `wait_until="load"`, React/Vue SPAs often render their real chrome in
### Reading pages efficiently
- **Prefer `browser_snapshot` over `browser_get_text("body")`** — returns a compact ~15 KB accessibility tree vs 100+ KB of raw HTML.
- Interaction tools `browser_click`, `browser_type`, `browser_type_focused`, `browser_fill`, and `browser_scroll` wait 0.5 s for the page to settle after a successful action, then attach a fresh accessibility snapshot under the `snapshot` key of their result. Use it to decide your next action — do NOT call `browser_snapshot` separately after every action. Tune the capture via `auto_snapshot_mode`: `"default"` (full tree, the default), `"simple"` (trims unnamed structural nodes), `"interactive"` (only controls — tightest token footprint), or `"off"` to skip the capture entirely (useful when batching several interactions and you don't need the intermediate trees). Call `browser_snapshot` explicitly only when you need a newer view or a different mode than what was auto-captured.
- Interaction tools `browser_click`, `browser_type`, `browser_type_focused`, and `browser_scroll` wait 0.5 s for the page to settle after a successful action, then attach a fresh accessibility snapshot under the `snapshot` key of their result. Use it to decide your next action — do NOT call `browser_snapshot` separately after every action. Tune the capture via `auto_snapshot_mode`: `"default"` (full tree, the default), `"simple"` (trims unnamed structural nodes), `"interactive"` (only controls — tightest token footprint), or `"off"` to skip the capture entirely (useful when batching several interactions and you don't need the intermediate trees). Call `browser_snapshot` explicitly only when you need a newer view or a different mode than what was auto-captured.
- Complex pages (LinkedIn, Twitter/X, SPAs with virtual scrolling) can have DOMs that don't match what's visually rendered — snapshot refs may be stale, missing, or misaligned with visible layout. Try the available snapshot first; when the target is not present in that snapshot or visual position matters, switch to `browser_screenshot` to orient yourself.
- Only fall back to `browser_get_text` for extracting specific small elements by CSS selector.
@@ -244,8 +244,8 @@ The highlight overlay stays visible on the page for **10 seconds** after each in
**Close tabs as soon as you are done with them** — not only at the end of the task. After reading or extracting data from a tab, close it immediately.
- Finished reading/extracting from a tab? `browser_close(target_id=...)`
- Completed a multi-tab workflow? `browser_close_finished()` to clean up all your tabs
- Finished reading/extracting from a tab? `browser_close(tab_id=...)` (or no arg to close the active tab)
- Completed a multi-tab workflow? Call `browser_close` for each tab you opened — list with `browser_tabs` first if you've lost track of IDs
- More than 3 tabs open? Stop and close finished ones before opening more
- Popup appeared that you didn't need? Close it immediately
@@ -410,7 +410,7 @@ In all of these cases the script is SHORT (< 10 lines) and the result is CONSUME
- If a tool fails, retry once with the same approach.
- If it fails a second time, STOP retrying and switch approach.
- If `browser_snapshot` fails, try `browser_get_text` with a specific small selector as fallback.
- If `browser_open` fails or page seems stale, `browser_stop`, then `browser_start`, then retry.
- If `browser_open` fails or page seems stale, `browser_stop`, then `browser_open(url)` again to recreate a fresh context.
## Verified workflows
@@ -0,0 +1,160 @@
---
name: hive.chart-creation-foundations
description: Required reading whenever any chart_* tool is available. Teaches the one-tool embedding contract (call chart_render → live chart appears in chat AND a downloadable PNG lands in the queen session dir), the ECharts (data viz) vs Mermaid (structural diagrams) decision, the BI/financial-grade aesthetic baseline (no chartjunk, restrained palette, proper typography, single message per chart), and the canonical spec patterns for the 12 most-common chart types. Skipping this leads to 1990s-Excel charts, missing downloads, and the agent writing markdown image links by hand instead of letting chart_render drive the UI.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Chart creation foundations
These tools render BI/financial-analyst-grade charts and diagrams that show up live in the chat AND save as high-DPI PNGs in the user's queen session dir.
## The embedding contract — one rule
> **To put a chart in chat, call `chart_render`. The chat reads `result.spec` and renders the chart live in the message bubble. The download link is `result.file_url`. Do not write `![chart](...)` image markdown by hand — the tool's result drives the UI.**
That's it. One tool call, one chart in chat, one file on disk. No two-step "remember to also save it" pattern. The chat's chart-rendering UI is fed by the tool result envelope automatically.
## When to chart at all
Chart when the data is **visual at heart**: trends over time, distributions, comparisons across categories, hierarchies, flows, geo. Skip the chart when:
- The point is one number → just say it. ("Revenue was $4.2M, up 12% YoY.")
- The point is a ranking of 5 things → use a markdown table with bold and emoji indicators.
- The data is so noisy a chart would mislead → describe the takeaway in prose.
A chart costs the user attention. It must repay that cost with a takeaway they couldn't get from prose.
## ECharts vs Mermaid — the picking rule
| Use ECharts (`kind: "echarts"`) when... | Use Mermaid (`kind: "mermaid"`) when... |
|---|---|
| You're plotting **numbers over categories or time** | You're showing **structure, not data** |
| Bar / line / area / scatter / candlestick / heatmap / treemap / sankey / parallel coordinates / calendar / gauge / pie / sunburst / geo map | Flowchart / sequence / gantt / ERD / state diagram / mindmap / class diagram / C4 architecture |
| The viewer's question is "how much / how many / what's the trend" | The viewer's question is "what calls what / what depends on what / what happens after what" |
If both fit (rare), prefer ECharts — its rasterized output is a proper data chart for slides; Mermaid's diagrams are for technical docs.
## The aesthetic baseline (non-negotiable)
These are the rules that turn an Excel-default chart into a Tableau-grade one. Every chart you produce must follow them.
### 1. Theme & background
- `chart_render` has **no `theme` parameter**. The renderer reads the user's UI theme from the desktop env (`HIVE_DESKTOP_THEME`) so the saved PNG matches what the user is actually looking at. You don't pick; the system does.
- Title goes in `option.title.text`, NOT in the message body. The chart is self-contained.
### 2. Palette discipline — DO NOT set `color` on series
The OpenHive ECharts theme is auto-applied to every `chart_render` call. It defines:
- An 8-hue **categorical palette** for multi-series charts (honey orange, slate blue, sage, terracotta, bronze, indigo, olive, rust)
- Cozy spacing (`grid.top: 90`, `grid.bottom: 56`, etc.)
- Brand typography (Inter Tight)
- Tasteful axis lines + dashed gridlines
**Do not set `option.color`, `option.title.textStyle`, `option.grid`, or `option.itemStyle.color` on series.** The theme covers it. If you do override, you'll fight the brand palette and the chart will look generic.
When you need data-encoded color (NOT category color):
- **Sequential** (magnitude): use `visualMap` with `inRange.color: ['#fff7e0', '#db6f02']` (light-to-honey)
- **Diverging** (positive/negative): use `visualMap` with `inRange.color: ['#a8453d', '#f5f5f5', '#3d7a4a']` (terracotta/neutral/sage)
- **Semantic up/down** (candlestick is auto-themed): for explicit gain/loss bars use `#3d7a4a` (gain) and `#a8453d` (loss), NOT `#27ae60` / `#e74c3c`.
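The three data-encoded-color cases above, as spec fragments you can drop into an option dict (field names follow standard ECharts `visualMap`; the hex values are the ones listed above, the surrounding variable names are mine):

```python
# Sequential (magnitude): light-to-honey ramp across the data range
sequential = {
    "visualMap": {
        "min": 0, "max": 100,
        "inRange": {"color": ["#fff7e0", "#db6f02"]},
    }
}

# Diverging (positive/negative): terracotta -> neutral -> sage
diverging = {
    "visualMap": {
        "min": -50, "max": 50,
        "inRange": {"color": ["#a8453d", "#f5f5f5", "#3d7a4a"]},
    }
}

# Semantic gain/loss bars: color each datum, never the whole series
GAIN, LOSS = "#3d7a4a", "#a8453d"
deltas = [4.2, -1.3, 2.8]
bar_data = [
    {"value": v, "itemStyle": {"color": GAIN if v >= 0 else LOSS}}
    for v in deltas
]
```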
### 3. Typography
The default font (`-apple-system, "Inter Tight", system-ui`) is already wired in the renderer — don't override unless the user asked. Set `option.textStyle.fontSize: 13` for body labels, `16` for axis names, `18` bold for the title.
### 4. No chartjunk
- **No 3D**. Ever. 3D pie charts and 3D bar charts are visual lies.
- **No drop shadows** on bars or lines. The default flat ECharts look is correct.
- **No gradient fills** unless the gradient encodes data (e.g. heatmap fill).
- **No neon colors**. Saturation belongs on highlighted bars, not on every series.
- **No more than 5 stacked colors** in a stacked bar — past that the eye can't separate them.
### 5. Axis hygiene
- X-axis labels rotate 45° only when they overflow. Otherwise horizontal.
- Y-axis starts at 0 for bar/area charts (truncating misleads). Line charts can start at min - 5%.
- Use `option.yAxis.axisLabel.formatter: '{value} M'` to add units, NOT a separate "USD millions" subtitle.
- Date axes: pass ISO strings (`"2024-01-15"`) and ECharts handles the layout. Use `xAxis.type: "time"`.
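The axis rules above combined into one sketch of a time-series line spec (standard ECharts fields; the data values are illustrative):

```python
spec = {
    "title": {"text": "Monthly active users"},
    # ISO date strings + a time axis: ECharts handles tick layout
    "xAxis": {"type": "time"},
    "yAxis": {
        "type": "value",
        # Units live on the axis labels, not in a subtitle
        "axisLabel": {"formatter": "{value} M"},
        # For a bar/area chart you would also pin "min": 0 here
    },
    "series": [{
        "type": "line",
        "data": [["2024-01-15", 1.2], ["2024-02-15", 1.4], ["2024-03-15", 1.9]],
    }],
}
```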
### 6. One message per chart
Every chart goes in its own assistant message (or its own `chart_render` call). Do not pile 4 charts into one wall of tool calls — the user can't focus and the chat gets noisy.
## Calling `chart_render` — the canonical pattern
```
chart_render(
kind="echarts",
spec={
"title": {"text": "Q4 revenue by region", "left": "center"},
"tooltip": {"trigger": "axis"},
"xAxis": {"type": "category", "data": ["NA", "EU", "APAC", "LATAM"]},
"yAxis": {"type": "value", "axisLabel": {"formatter": "${value}M"}},
"series": [{"type": "bar", "data": [12.4, 8.7, 5.3, 2.1], "itemStyle": {"color": "#db6f02"}}]
},
title="q4-revenue-by-region",
width=1600, height=900, dpi=300
)
```
Returns:
```
{
"kind": "echarts",
"spec": {...echoed...},
"file_path": "/.../charts/2026-04-30T...q4-revenue-by-region.png",
"file_url": "file:///.../q4-revenue-by-region.png",
"width": 1600, "height": 900, "dpi": 300, "bytes": 142318,
"title": "q4-revenue-by-region", "runtime_ms": 287
}
```
The chat panel reads `result.spec` and mounts ECharts in the message bubble. The user sees the chart immediately. The PNG is on disk and the chat shows a download link from `result.file_url`. **You don't write that link — it appears automatically.**
## The 12 chart types you'll use 95% of the time
| When | ECharts type | Notes |
|---|---|---|
| Trend over time | `series.type: "line"` | Smooth = `smooth: true` only when data is noisy |
| Multi-metric trend | Two `line` series with `yAxis: [{}, {}]` | Separate scales when units differ |
| Category comparison | `series.type: "bar"` | Sort by value descending, not alphabetically |
| Stacked composition | `bar` with `stack: "total"` | Cap at 5 categories |
| Distribution | `series.type: "boxplot"` or `bar` of bins | Boxplot for ≥3 groups; histogram for one |
| Two-variable correlation | `series.type: "scatter"` | Add `regression` markline if relevant |
| Candlestick / OHLC | `series.type: "candlestick"` | Date axis + `dataZoom` range slider |
| Geo distribution | `series.type: "map"` | Bundled `world` and country GeoJSONs |
| Hierarchy / share | `series.type: "treemap"` or `sunburst` | Use treemap for >12 leaves; pie only for 2-5 |
| Flow | `series.type: "sankey"` | Names matter — keep them short |
| Calendar density | `series.type: "heatmap"` + `calendar` | Daily metrics over a year |
| KPI scorecard | `series.type: "gauge"` | Set `min`, `max`, threshold band |
Worked specs for each are in `references/` — paste, modify, render.
## Mermaid quick rules
```
chart_render(
kind="mermaid",
spec="""
flowchart LR
A[Customer signs up] --> B{Onboarded?}
B -- yes --> C[Activate trial]
B -- no --> D[Email reminder]
""",
title="signup-flow"
)
```
- One diagram per chart_render call.
- Keep node labels short (≤20 chars).
- Use `flowchart LR` for left-to-right; `TD` for top-down. LR reads better in a chat bubble.
- For sequence diagrams, `->>` draws a solid synchronous message, `-->>` a dashed reply, and `-)` an async message with an open arrowhead.
- Don't try to encode data in mermaid (no widths, no quantities) — that's an ECharts job.
## Common mistakes the agent makes
1. **Writing `![chart](file://...)` markdown by hand.** Don't. The chat renders from the tool result automatically. Manual image markdown will display nothing (file:// is blocked from arbitrary chat content).
2. **Calling chart_render twice for the same chart "to embed and to save".** Only one call. The single call does both.
3. **Overriding fonts to fancy display faces.** Stay with the default; the agent's job is data, not typography.
4. **Pie charts with 12 slices.** Use a horizontal bar chart sorted by value. Pie is only for 2-5 mutually-exclusive shares.
5. **Forgetting `axisLabel.formatter` for currency / percentage.** A y-axis showing "12000000" is unreadable; "12M" is correct.
6. **Putting a chart's title in the message body.** Set `option.title.text` instead so the title is part of the saved PNG.
@@ -0,0 +1,139 @@
---
name: hive.terminal-tools-foundations
description: Required reading whenever any terminal_* tool is available. Teaches the foreground/background dichotomy (terminal_exec auto-promotes past 30s, returns a job_id you poll with terminal_job_logs), the standard envelope shape (exit_code, stdout, stdout_truncated_bytes, output_handle, semantic_status, warning, auto_backgrounded, job_id), output handle pagination via terminal_output_get, when to read semantic_status instead of raw exit_code (grep/rg/find/diff/test exit 1 is NOT an error), the destructive-warning surface (rm -rf, git push --force, DROP TABLE), tool preference (use files-tools / gcu-tools / hive_tools before raw shell), and the bash-only-on-macOS policy. Skipping this leads to "tool returned no output" surprises, orphaned jobs, and panic over benign grep exit codes.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# terminal-tools — foundations
These tools give you a real terminal: foreground exec with smart envelopes, background jobs with offset-based log streaming, persistent PTY shells, and filesystem search. Bash-only on POSIX.
## Tool preference (read first)
Before reaching for terminal-tools, check whether a higher-level tool already covers the task. Shell is for system operations the other servers don't reach.
- **Reading files** → `files-tools.read_file` (handles size, paging, line-numbered output) — NOT `terminal_exec("cat ...")`
- **Editing files** → `files-tools.edit_file` (atomic patch with diff verification) — NOT `terminal_exec("sed -i ...")`
- **Writing files** → `files-tools.write_file` — NOT `terminal_exec("echo > ...")`
- **In-project search** → `files-tools.search_files` (project-scoped, code-aware) — use `terminal_rg` only for raw paths outside the project (`/var/log`, `/etc`)
- **Browser / web pages** → `gcu-tools.browser_*` for rendered pages — NOT `terminal_exec("curl ...")`
- **Web search** → `hive_tools.web_search` — NOT scraping
- **System operations** (process exec, jobs, PTYs, raw fs search) → terminal-tools. This is its territory.
## The standard envelope
Every spawn-style call (`terminal_exec`, the auto-promoted job state) returns this shape:
```jsonc
{
"exit_code": 0, // null when auto-backgrounded or pre-spawn error
"stdout": "...", // decoded, truncated to max_output_kb (default 256 KB)
"stderr": "...",
"stdout_truncated_bytes": 0, // > 0 means more is in output_handle
"stderr_truncated_bytes": 0,
"runtime_ms": 42,
"pid": 12345,
"output_handle": null, // "out_<hex>" when truncated — paginate with terminal_output_get
"timed_out": false,
"semantic_status": "ok", // "ok" | "signal" | "error" — read THIS, not just exit_code
"semantic_message": null, // e.g. "No matches found" for grep exit 1
"warning": null, // e.g. "may force-remove files" for rm -rf
"auto_backgrounded": false,
"job_id": null // set when auto_backgrounded=true
}
```
## Auto-promotion (the core mental model)
`terminal_exec` runs commands in the foreground until the **auto-background budget** (default 30s) elapses. Past that point, the process is silently transferred to a background job and the call returns immediately with:
```jsonc
{ "auto_backgrounded": true, "exit_code": null, "job_id": "job_<hex>", ... }
```
When you see `auto_backgrounded: true`, **pivot to polling**. The job is still running:
```
terminal_job_logs(job_id, since_offset=0, wait_until_exit=true, wait_timeout_sec=60)
→ blocks server-side until the job exits or the timeout, returns logs + status
```
You're not failing — you're freed up to do other work while the long task runs.
To force pure-foreground (kill on `timeout_sec`), pass `auto_background_after_sec=0`. Use this when you genuinely don't want a background job (small commands where promotion would surprise you).
## Semantic exit codes — read `semantic_status`, not raw `exit_code`
Several common commands use exit 1 for legitimate non-error states:
| Command | exit 0 | exit 1 |
|---|---|---|
| `grep` / `rg` | matches found | **no matches** (not an error) |
| `find` | success | **some dirs unreadable** (informational) |
| `diff` | identical | **files differ** (informational) |
| `test` / `[` | true | **false** (informational) |
For these, `semantic_status` will be `"ok"` even when `exit_code == 1`, with `semantic_message` describing why ("No matches found"). For everything else, `semantic_status` defaults to `"ok"` on 0 and `"error"` on nonzero.
**Rule**: always check `semantic_status` first. Only fall back to `exit_code` when you need the exact number (e.g. distinguishing `make` errors).
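A minimal triage helper capturing this rule (the helper name is mine; the field names follow the envelope documented above):

```python
def triage(envelope):
    """Return 'ok', 'signal', or 'error', preferring semantic_status over exit_code."""
    status = envelope.get("semantic_status")
    if status is not None:
        return status
    # Fallback for an envelope without semantic info
    code = envelope.get("exit_code")
    if code is None:
        return "running"  # auto-backgrounded or pre-spawn error: check job_id / error
    return "ok" if code == 0 else "error"

# grep with no matches: exit 1, but semantically fine
assert triage({"exit_code": 1, "semantic_status": "ok",
               "semantic_message": "No matches found"}) == "ok"
```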
## Destructive warnings — re-read your command
The envelope's `warning` field is set when the command matches a known destructive pattern (`rm -rf`, `git push --force`, `git reset --hard`, `DROP TABLE`, `kubectl delete`, `terraform destroy`, etc.). The command **still ran** — the warning is informational. Use it as a "did I mean to do that?" prompt before trusting subsequent steps that depend on the side effect.
If a `warning` appears unexpectedly, stop and verify: was the destructive action intended, or did a path/glob slip in?
## Output handles — never lose output
When `stdout_truncated_bytes > 0` or `stderr_truncated_bytes > 0`, the inline output was capped at `max_output_kb` (default 256 KB). The full bytes are stashed under `output_handle` for **5 minutes**. Paginate with:
```
terminal_output_get(output_handle, since_offset=0, max_kb=64)
→ { data, offset, next_offset, eof, expired }
```
Track `next_offset` across calls. If `expired: true`, re-run the command (the handle's TTL has lapsed).
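The `next_offset` bookkeeping as a loop. Here `terminal_output_get` is passed in as a callable standing in for however your runtime issues the tool call (stubbed in tests); the result fields follow the shape documented above:

```python
def drain_output(terminal_output_get, handle, max_kb=64):
    """Collect a truncated stream by following next_offset until eof."""
    chunks, offset = [], 0
    while True:
        page = terminal_output_get(handle, since_offset=offset, max_kb=max_kb)
        if page.get("expired"):
            # The handle's 5-minute TTL lapsed: the only recovery is re-running
            raise RuntimeError("output handle expired; re-run the command")
        chunks.append(page["data"])
        if page["eof"]:
            return "".join(chunks)
        offset = page["next_offset"]
```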
The store has a 64 MB cap with LRU eviction. For huge outputs, prefer `terminal_job_start` + `terminal_job_logs` polling (4 MB ring buffer per stream, infinite total throughput).
## Bash, not zsh — even on macOS
`terminal_exec` and `terminal_pty_open` always invoke `/bin/bash`. The user's `$SHELL` is ignored. Explicit `shell="/bin/zsh"` is **rejected** with a clear error. This is a deliberate security stance, not aesthetic — zsh has command/builtin classes (`zmodload`, `=cmd` expansion, `zpty`, `ztcp`, `zf_*`) that bypass bash-shaped checks. The `terminal-tools-pty-sessions` skill explains the implications for PTY sessions specifically.
`ZDOTDIR` and `ZSH_*` env vars are stripped before exec to prevent zsh dotfiles leaking in. Bash dotfiles still apply when bash runs interactively; PTY sessions, however, launch `bash --norc --noprofile` so their behavior stays predictable.
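The env scrub described here, sketched as a one-liner (an illustration of the stated behavior, not the runtime's actual implementation):

```python
def scrub_zsh_env(env):
    """Drop ZDOTDIR and all ZSH_* vars so zsh dotfiles can't leak into the bash exec."""
    return {k: v for k, v in env.items()
            if k != "ZDOTDIR" and not k.startswith("ZSH_")}

clean = scrub_zsh_env({"PATH": "/usr/bin", "ZDOTDIR": "/home/u", "ZSH_CACHE": "x"})
```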
## Pipelines and complex commands
Pipes (`|`), redirects (`>`, `<`, `>>`), conditionals (`&&`, `||`, `;`), and globs (`*`, `?`, `[`) are detected automatically. You can pass them with the default `shell=False` and the runtime will transparently route through `/bin/bash -c` and surface `auto_shell: true` in the envelope:
```
terminal_exec("ps aux | sort -k3 -rn | head -40")
→ { exit_code: 0, stdout: "...", auto_shell: true, ... }
```
For simple argv commands (no metacharacters) `shell=False` is faster and direct-execs the binary. For commands that need the shell but contain none of the metacharacters the detector looks for (rare — exotic bash builtins, here-strings), pass `shell=True` explicitly:
```
terminal_exec("set -e; complicated bash logic", shell=True)
```
Quoted strings work either way — the detector uses `shlex.split` which handles `"quoted args with spaces"` correctly.
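A toy version of the detection described above. The real detector belongs to the runtime; this only illustrates the metacharacter check plus the `shlex.split` quoting behavior the text mentions:

```python
import shlex

METACHARS = set("|&;<>*?[")

def needs_shell(command):
    """True when the command uses shell features a plain exec can't provide."""
    return any(ch in command for ch in METACHARS)

def to_argv(command):
    """Split a simple command into argv, respecting quoted args with spaces."""
    return shlex.split(command)
```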
## When to use what (cheat sheet)
| Need | Tool |
|---|---|
| One-shot command, ≤30s | `terminal_exec` |
| One-shot command, might be longer | `terminal_exec` (auto-promotes) |
| Long-running job from the start | `terminal_job_start` |
| State across calls (cd, env, REPL) | `terminal_pty_open` + `terminal_pty_run` |
| Search file contents (raw paths) | `terminal_rg` |
| Find files by predicate | `terminal_find` |
| Retrieve truncated output | `terminal_output_get` |
| Tree / stat / du | `terminal_exec("ls -la"/"stat foo"/"du -sh path")` |
| HTTP / DNS / ping / archives | `terminal_exec("curl ..."/"dig ..."/"tar xzf ...")` |
See `references/exit_codes.md` for the full POSIX + signal-induced + semantic catalog.
@@ -0,0 +1,50 @@
# Exit code reference
## POSIX conventions
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error / catchall |
| 2 | Misuse of shell builtins, syntax error |
| 126 | Command found but not executable |
| 127 | Command not found |
| 128 | Invalid argument to `exit` |
| 128 + N | Killed by signal N |
| 130 | Killed by SIGINT (Ctrl-C) |
| 137 | Killed by SIGKILL |
| 143 | Killed by SIGTERM |
| 255 | Exit status out of range |
When `exit_code < 0` in the envelope, the process was killed by a signal: `abs(exit_code)` is the signal number (subprocess uses negative codes for signaled exits, separate from the `128 + N` shell convention).
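Both conventions decoded in one place (a sketch; it assumes the envelope's subprocess-style negative codes described above alongside the shell's `128 + N`):

```python
def describe_exit(exit_code):
    """Map an envelope exit_code to a human-readable cause."""
    if exit_code is None:
        return "still running or pre-spawn error"
    if exit_code < 0:              # subprocess convention: negative = signaled
        return f"killed by signal {-exit_code}"
    if exit_code > 128:            # shell convention: 128 + signal number
        return f"killed by signal {exit_code - 128}"
    return "success" if exit_code == 0 else f"exited with {exit_code}"
```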
## Semantic exits — when exit 1 is NOT an error
terminal-tools encodes these in `semantic_status`. The agent should read `semantic_status` first.
| Command | Code 0 | Code 1 | Code ≥2 |
|---|---|---|---|
| `grep` / `rg` / `ripgrep` | matches found | **no matches** (ok) | error |
| `find` | success | **some dirs unreadable** (ok) | error |
| `diff` | files identical | **files differ** (ok) | error |
| `test` / `[` | condition true | **condition false** (ok) | error |
For any command not in this table, the default convention applies (0 = ok, nonzero = error).
## When `exit_code` is `null`
- `auto_backgrounded: true` — the process is still running under a `job_id`. Poll with `terminal_job_logs`.
- Pre-spawn error (command not found, exec failed) — see `error` field in the envelope.
- `timed_out: true` and the process refused to die — extremely rare; the kernel has the answer.
## Common signal-induced exits
| Signal | Number | Subprocess exit | Shell exit | Meaning |
|---|---|---|---|---|
| SIGHUP | 1 | -1 | 129 | Terminal hangup |
| SIGINT | 2 | -2 | 130 | Interrupt (Ctrl-C) |
| SIGQUIT | 3 | -3 | 131 | Quit (Ctrl-\\) |
| SIGKILL | 9 | -9 | 137 | Forced kill (uncatchable) |
| SIGTERM | 15 | -15 | 143 | Polite termination |
| SIGSEGV | 11 | -11 | 139 | Segmentation fault |
| SIGABRT | 6 | -6 | 134 | Abort (assertion failed, etc.) |
@@ -0,0 +1,96 @@
---
name: hive.terminal-tools-fs-search
description: Use terminal_rg / terminal_find when you need raw filesystem search outside the project tree — system configs, /var/log, /etc, archive contents — or when files-tools.search_files is too project-scoped. Teaches the rg vs find vs terminal_exec("ls/du/tree") split, common rg flag combos for code/logs/configs, find predicates for mtime/size/type queries, and the rule that for tree views or single-file stat info you should just use terminal_exec instead of inventing a tool. Read before reaching for raw shell to grep or find anything.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Filesystem search
terminal-tools provides two structured search tools: `terminal_rg` (ripgrep for content) and `terminal_find` (find for predicates). Everything else (tree, stat, du) is just `terminal_exec`.
## When to use what
| Task | Tool |
|---|---|
| Find code/text matching a pattern in your **project** | `files-tools.search_files` (project-aware, ranks by relevance) |
| Find code/text matching a pattern in `/var/log`, `/etc`, archives, system dirs | `terminal_rg` |
| Find files matching name/glob/predicate | `terminal_find` |
| List a directory | `terminal_exec("ls -la /path")` |
| Tree view | `terminal_exec("tree -L 2 /path")` |
| Single-path stat | `terminal_exec("stat /path")` |
| Disk usage | `terminal_exec("du -sh /path")` or `terminal_exec("du -h --max-depth=2 /")` |
| Count matches across files | `terminal_rg(pattern, extra_args=["-c"])` |
## `terminal_rg` — content search
ripgrep is fast, gitignore-aware, and has a deep flag surface. The structured wrapper exposes the most useful flags directly; `extra_args` covers the rest.
### Common patterns
```
# All Python files containing "TODO"
terminal_rg(pattern="TODO", path=".", type_filter="py")
# Case-insensitive, with context
terminal_rg(pattern="error", path="/var/log", ignore_case=True, context=2)
# Search hidden files (rg ignores them by default)
terminal_rg(pattern="api_key", path="~", hidden=True)
# Don't respect .gitignore (find files git would ignore)
terminal_rg(pattern="generated", path=".", no_ignore=True)
# Multi-line pattern (e.g., function definitions spanning lines)
terminal_rg(pattern=r"def\s+\w+\(.*\n.*\n", path="src", extra_args=["--multiline"])
# Specific filename glob
terminal_rg(pattern="version", path=".", glob="*.toml")
```
### rg flag idioms
| Flag | Effect |
|---|---|
| `-tpy` (`type_filter="py"`) | Only Python files |
| `-uu` | Don't respect any ignores (incl. `.git/`) |
| `--multiline` (`extra_args`) | Allow regex spanning lines |
| `--max-count` (`max_count`) | Stop after N matches per file |
| `--max-depth` (`max_depth`) | Limit recursion |
| `-w` (`extra_args`) | Whole word match |
| `-F` (`extra_args`) | Fixed string (no regex) |
See `references/ripgrep_cheatsheet.md` for the long form.
## `terminal_find` — predicate search
`find` excels at "files matching N criteria". The wrapper surfaces the most common predicates; combine via the structured arguments.
```
# All .log files modified in the last 7 days, larger than 1MB
terminal_find(path="/var/log", iname="*.log", mtime_days=7, size_kb_min=1024)
# All directories named ".git" (find Git repos under a tree)
terminal_find(path="~/projects", name=".git", type_filter="d")
# Only the top three levels
terminal_find(path="/etc", max_depth=3, type_filter="f")
# Symlinks
terminal_find(path=".", type_filter="l")
```
See `references/find_predicates.md` for combinations not directly exposed.
## Output truncation
Both tools return `truncated: true` when their output exceeded the inline cap. For `terminal_rg`, this means matches were dropped (refine the pattern or narrow the path); for `terminal_find`, results past `max_results` (default 1000) are dropped. Tighten predicates rather than raising the cap.
## Anti-patterns
- **Don't `terminal_rg` your project tree** — `files-tools.search_files` is project-aware and ranks results.
- **Don't reach for `terminal_find` to list one directory** — `terminal_exec("ls -la /path")` is shorter.
- **Don't use `terminal_exec("grep ...")`** when `terminal_rg` exists — rg is faster, gitignore-aware, and returns structured matches.
- **Don't use `terminal_exec("find ...")`** to invent your own predicate combinations — use `terminal_find` and report missing capabilities.
@@ -0,0 +1,78 @@
# find predicate reference
The `terminal_find` wrapper exposes name/iname, type, mtime_days, size bounds, max_depth, max_results. For combinations beyond that, drop to `terminal_exec("find ...")`.
## Time predicates
| Need | find predicate |
|---|---|
| Modified within N days | `-mtime -N` (wrapper: `mtime_days=N`) |
| Modified more than N days ago | `-mtime +N` |
| Modified exactly N days ago | `-mtime N` |
| Accessed within N days | `-atime -N` |
| Inode changed within N days | `-ctime -N` |
| Modified in last N minutes | `-mmin -N` |
| Newer than reference file | `-newer ref` |
## Size predicates
| Need | find predicate |
|---|---|
| Bigger than N kilobytes | `-size +Nk` (wrapper: `size_kb_min`) |
| Smaller than N kilobytes | `-size -Nk` (wrapper: `size_kb_max`) |
| Exactly N kilobytes | `-size Nk` |
| Bigger than N megabytes | `-size +NM` |
| Empty files | `-empty` |
## Type predicates
| Need | find predicate |
|---|---|
| Regular file | `-type f` (wrapper: `type_filter="f"`) |
| Directory | `-type d` (wrapper: `type_filter="d"`) |
| Symlink | `-type l` (wrapper: `type_filter="l"`) |
| Block device | `-type b` |
| Character device | `-type c` |
| FIFO | `-type p` |
| Socket | `-type s` |
## Permission predicates
| Need | find predicate |
|---|---|
| Owned by user | `-user alice` |
| Owned by group | `-group dev` |
| Permission bits exact | `-perm 644` |
| Has any of these bits | `-perm /u+x` |
| Has all of these bits | `-perm -u+x` |
| Readable by current user | `-readable` |
| Writable | `-writable` |
| Executable | `-executable` |
## Composing
`find` evaluates predicates left-to-right with implicit AND. For OR, group with escaped parens and `-o`: `\( expr1 -o expr2 \)`.
```
# .log OR .txt (drop to terminal_exec for OR)
terminal_exec(r"find /path \( -name '*.log' -o -name '*.txt' \) -type f", shell=True)
# NOT in a directory called node_modules
terminal_exec("find . -path '*/node_modules' -prune -o -name '*.js' -print", shell=True)
```
## Actions
| Need | predicate |
|---|---|
| Print path (default) | (implicit `-print`) |
| Print null-separated | `-print0` (for piping to xargs -0) |
| Delete | `-delete` (DANGEROUS — use terminal_exec with explicit confirmation) |
| Run command per match | `-exec cmd {} \;` (drop to terminal_exec) |
| Run command, batched | `-exec cmd {} +` |
## When NOT to use find
- **One directory listing**: `terminal_exec("ls -la /path")`
- **Recursive grep**: `terminal_rg`
- **Count files**: `terminal_exec("find /path -type f | wc -l")`
@@ -0,0 +1,70 @@
# ripgrep cheatsheet
For when the structured `terminal_rg` flags don't cover the case. Pass via `extra_args=[...]`.
## Filtering
| Need | Flag |
|---|---|
| Whole word | `-w` |
| Fixed string (no regex) | `-F` |
| Match files only (paths, not lines) | `-l` |
| Count matches per file | `-c` |
| Print only filenames with no matches | `--files-without-match` |
| Exclude binary files | (default) |
| Include binaries | `--binary` |
| Search compressed files | `-z` (`--search-zip`; tar archives still need extracting first) |
## Output shape
| Need | Flag |
|---|---|
| Show only matched part | `-o` |
| Show byte offset of match | `-b` |
| No filename prefix | `-I` (`--no-filename`) |
| Color always (for piping into a colorizer) | `--color=always` |
| JSON output | (the wrapper already uses `--json` internally) |
## Boundaries
| Need | Flag |
|---|---|
| Line-by-line (default) | (default) |
| Multi-line regex | `--multiline` (or `-U`) |
| Multi-line dotall (`.` matches `\n`) | `--multiline-dotall` |
| Crlf line endings | `--crlf` |
## Path control
| Need | Flag |
|---|---|
| Follow symlinks | `-L` |
| Don't follow | (default) |
| Search hidden | `-.` (also expressed as `hidden=True`) |
| Don't respect any ignores | `-uuu` |
| Glob include | `-g 'pattern'` (also `glob="..."`) |
| Glob exclude | `-g '!pattern'` |
## Performance
| Need | Flag |
|---|---|
| One thread | `-j 1` |
| Force memory-mapped search | `--mmap` (the default heuristic is usually fine) |
| Per-file match cap | `-m N` (also `max_count=N`) |
## Common composed queries
```
# Find unused imports in Python
terminal_rg(pattern=r"^import\s+\w+$", path="src", type_filter="py")
# All TODO/FIXME/XXX with file:line
terminal_rg(pattern=r"\b(TODO|FIXME|XXX)\b", path=".", extra_args=["-n"])
# Functions defined at module top-level
terminal_rg(pattern=r"^def\s+\w+", path=".", type_filter="py")
# Lines that DON'T match a pattern (filtered through awk)
# rg can't invert at line level; use terminal_exec with grep -v
```
@@ -0,0 +1,110 @@
---
name: hive.terminal-tools-job-control
description: Use when launching anything that runs longer than a minute, anything that streams logs, anything you want to keep running while doing other work — or when terminal_exec auto-backgrounded on you and returned a job_id. Teaches the start→poll→wait pattern with terminal_job_logs offset bookkeeping, the `wait_until_exit=True` blocking-poll idiom, the truncated_bytes_dropped resumption signal, the merge_stderr decision, the SIGINT→SIGTERM→SIGKILL escalation ladder via terminal_job_manage, and the hard rule that jobs die when the terminal-tools server restarts. Read before calling terminal_job_start, or right after terminal_exec auto-backgrounded.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Background job control
Background jobs are how you do things that take time without blocking your conversation. Three tools cover the surface: `terminal_job_start`, `terminal_job_logs`, `terminal_job_manage`.
## When to use a job
- Builds, deploys, long tests
- Processes you want to monitor (streaming a log file, a dev server)
- Anything that auto-backgrounded from `terminal_exec` (you have a `job_id`; pivot to this skill's idioms)
For one-shot work expected to finish quickly, `terminal_exec` is simpler. The auto-promotion mechanic in `terminal_exec` is your safety net — start with `terminal_exec`, take over with this skill if needed.
## Lifecycle
```
terminal_job_start(command, ...)
→ { job_id, pid, started_at }
terminal_job_logs(job_id, since_offset=0, max_bytes=64000)
→ { data, offset, next_offset, status: "running"|"exited", exit_code, ... }
# Repeat with since_offset = previous next_offset until status == "exited"
# Or block once with wait_until_exit=True:
terminal_job_logs(job_id, since_offset=N, wait_until_exit=True, wait_timeout_sec=60)
→ blocks server-side until exit or timeout
```
After exit, the job is retained for inspection (`terminal_job_manage(action="list")`) until evicted by FIFO (50 most recent exits kept).
## Offset bookkeeping — the only rule that matters
The job's output lives in a 4 MB ring buffer per stream. Each call to `terminal_job_logs` returns:
- `data` — bytes between `since_offset` and `next_offset`
- `next_offset` — pass this as `since_offset` on your next call
- `truncated_bytes_dropped` — non-zero when your `since_offset` was older than the ring's floor (you fell behind)
**Always carry `next_offset` forward.** Don't replay from 0 — that's an offset reset, you'll see the same data twice and miss the part that fell off.
When `truncated_bytes_dropped > 0`, the buffer evicted N bytes between your last call and now. Treat it as a signal that the job is producing output faster than you're consuming. Either poll more often or accept the gap and read from `next_offset` going forward.
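The carry-the-offset loop can be sketched as a small helper. This is a sketch only: `logs_fn` is a hypothetical stand-in for `terminal_job_logs`, passed in so the offset logic is visible on its own.

```python
def drain_job(logs_fn, job_id, max_bytes=64000):
    # logs_fn stands in for terminal_job_logs (hypothetical wrapper):
    # logs_fn(job_id, since_offset=..., max_bytes=...) -> envelope dict.
    chunks, offset = [], 0
    while True:
        res = logs_fn(job_id, since_offset=offset, max_bytes=max_bytes)
        dropped = res.get("truncated_bytes_dropped", 0)
        if dropped:
            # Fell behind the ring's floor: note the gap, keep moving forward.
            chunks.append(f"[{dropped} bytes dropped]")
        chunks.append(res["data"])
        offset = res["next_offset"]  # carry forward; never reset to 0
        if res["status"] == "exited":
            return "".join(chunks), res.get("exit_code")
```

The only state you hold between calls is `offset`; everything else comes from the envelope.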
## merge_stderr — interleaved or separate
```
merge_stderr=False → two streams, request "stdout" or "stderr" by name
merge_stderr=True → one stream ("merged"), order preserved
```
Pick `merge_stderr=True` when:
- The job's logs are designed to be read together (most servers, build tools)
- You don't need to distinguish "this was stderr"
Pick `merge_stderr=False` when:
- stderr is genuinely error-only and stdout is data
- You'll process them differently
## Signal escalation
```
terminal_job_manage(action="signal_int", job_id=...) # graceful (Ctrl-C-equivalent)
terminal_job_manage(action="signal_term", job_id=...) # polite kill (SIGTERM)
terminal_job_manage(action="signal_kill", job_id=...) # forced kill (SIGKILL, uncatchable)
```
The idiom: `signal_int` → wait 2-5s → `signal_term` → wait 2-5s → `signal_kill`. Most well-behaved processes handle SIGINT (graceful) and SIGTERM (cleanup, then exit). SIGKILL bypasses cleanup — use only when the process is truly unresponsive.
After signaling, check exit with `terminal_job_logs(job_id, wait_until_exit=True, wait_timeout_sec=2)`.
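The ladder reads as a short loop. A sketch: `manage_fn` and `wait_fn` are hypothetical stand-ins for `terminal_job_manage` and an "has it exited yet" check, so the escalation order is the only thing the code asserts.

```python
import time

def stop_job(manage_fn, wait_fn, job_id, grace=3.0):
    # Escalate SIGINT -> SIGTERM -> SIGKILL, giving each rung a grace
    # window before moving on. Returns the action that ended the job.
    for action in ("signal_int", "signal_term", "signal_kill"):
        manage_fn(action=action, job_id=job_id)
        deadline = time.monotonic() + grace
        while time.monotonic() < deadline:
            if wait_fn(job_id):
                return action
            time.sleep(0.05)
    return "signal_kill"  # SIGKILL is uncatchable; assume it landed
```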
## Stdin
```
terminal_job_manage(action="stdin", job_id=..., data="some input\n")
terminal_job_manage(action="close_stdin", job_id=...)
```
For tools that read stdin to EOF, `close_stdin` after writing flushes them. For interactive tools that read line-by-line, just write each line.
## Take-over: when terminal_exec auto-backgrounds
When `terminal_exec` returned `auto_backgrounded: true, job_id: <X>`, the process is **already** in the JobManager with its output flowing into the ring buffer. Your transition is seamless:
```
# Already saw the start of output in terminal_exec's stdout/stderr.
# Pick up reading where the env left off — use the byte count of the
# initial stdout as your since_offset, OR just request tail output:
terminal_job_logs(job_id="job_xxx", tail=True, max_bytes=64000)
```
Or block until exit and grab everything:
```
terminal_job_logs(job_id="job_xxx", since_offset=0, wait_until_exit=True, wait_timeout_sec=120)
```
## Hard rules
- **Jobs die when the server restarts.** The desktop runtime restarts terminal-tools when Hive restarts. There's no re-attach. If you need durability, use `nohup` + `terminal_exec` to detach into the system's process tree and track the PID yourself.
- **Server-wide hard cap on concurrent jobs** (`TERMINAL_TOOLS_MAX_JOBS`, default 32). Past the cap, `terminal_job_start` returns an error. Wait for jobs to exit or kill old ones.
- **No cross-restart output.** Output handles and ring buffers are in-memory only.
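The `nohup` detach pattern for durability can be sketched like this. Hypothetical throughout: `exec_fn` stands in for `terminal_exec` returning the command's stdout, and the log path is illustrative.

```python
def detach(exec_fn, command, log_path="/tmp/job.log"):
    # exec_fn stands in for terminal_exec (hypothetical wrapper that
    # returns the command's stdout as a string). The process joins the
    # system's process tree and survives a terminal-tools restart.
    wrapped = f"nohup {command} > {log_path} 2>&1 & echo $!"
    return int(exec_fn(wrapped).strip())  # the detached PID; track it yourself
```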
See `references/signals.md` for the full signal catalog.
@@ -0,0 +1,41 @@
# Signal reference
terminal_job_manage exposes six signals via the action name.
| Action | Signal | Number | Purpose | Catchable? |
|---|---|---|---|---|
| `signal_int` | SIGINT | 2 | Interrupt — Ctrl-C equivalent. Most CLIs treat as "stop gracefully". | Yes |
| `signal_term` | SIGTERM | 15 | Polite termination request. Default for `kill`. | Yes |
| `signal_kill` | SIGKILL | 9 | Forced kill. Process can't catch, clean up, or finalize. Use sparingly. | **No** |
| `signal_hup` | SIGHUP | 1 | Hangup. Many daemons reload config on this. | Yes |
| `signal_usr1` | SIGUSR1 | 10 (Linux) / 30 (macOS) | User-defined #1. Common: dump state, rotate logs (nginx, etc). | Yes |
| `signal_usr2` | SIGUSR2 | 12 (Linux) / 31 (macOS) | User-defined #2. Common: graceful binary upgrade (unicorn, etc). | Yes |
## Escalation idiom
```
1. signal_int (Ctrl-C — graceful)
2. wait 2-5s, check status with terminal_job_logs(wait_until_exit=True, wait_timeout_sec=3)
3. if still running: signal_term (cleanup-then-exit)
4. wait 2-5s
5. if still running: signal_kill (forced)
```
The waits matter: SIGTERM handlers do real work (flush logs, close DBs, release locks) and need time. Skipping straight to SIGKILL leaks resources.
## When to use SIGUSR1 / SIGUSR2
These are application-defined. Read the target's docs first. Common:
- **nginx**: SIGUSR1 → reopen log files (for log rotation)
- **unicorn / puma**: SIGUSR2 → fork a new master with the latest binary (graceful restart)
- **dd** (GNU): SIGUSR1 → print I/O statistics so far
## Reading exit codes after a signal
When a job exits via signal, `terminal_job_logs` returns `exit_code: -N` (subprocess convention) where `abs(N)` is the signal number. The shell convention `128 + N` doesn't apply to the JobManager — that's for shell-spawned children.
| exit_code | Means |
|---|---|
| -2 | Killed by SIGINT |
| -9 | Killed by SIGKILL |
| -15 | Killed by SIGTERM |
@@ -0,0 +1,127 @@
---
name: hive.terminal-tools-pty-sessions
description: Use when you need state across calls — building env vars, navigating with cd, driving REPLs (python -i, mysql, psql, node), or responding to interactive prompts (sudo password, ssh host-key confirmation, mysql connection). Teaches the prompt-sentinel exec pattern (default mode), raw I/O for REPLs (raw_send=True then read_only=True), the one-in-flight-per-session rule, and the close-or-leak-against-the-cap discipline. Bash on macOS — never zsh; explicit shell=/bin/zsh is rejected. Read before calling terminal_pty_open.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Persistent PTY sessions
PTY sessions are how you talk to interactive programs — programs that detect a terminal (`isatty()`) and behave differently when they don't see one. Use a session when:
- You need state to persist across calls (`cd`, env vars, sourced scripts)
- You're driving a REPL (`python -i`, `mysql`, `psql`, `node`, `irb`)
- A program demands an interactive prompt (`sudo`, `ssh`, `npm login`, `gh auth login`)
For everything else, `terminal_exec` is simpler. Sessions cost more (per-session bash process, ring buffer, idle-reaping bookkeeping) and have a hard cap (`TERMINAL_TOOLS_MAX_PTY`, default 8).
## Why PTY (and not subprocess pipes)
Subprocess pipes break on every interactive program. The moment a program calls `isatty()` and sees False, it disables prompts, color, line-editing, password masking, progress bars — sometimes refuses to start. PTY makes us look like a real terminal so these programs work the same as in your shell.
The cost: PTY output includes terminal escape codes (cursor moves, color codes). The session captures them as-is; if you need clean text, strip ANSI escapes in your processing layer.
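A workable ANSI-stripping pass for that processing layer looks like this. A sketch: the regex covers the common CSI (color, cursor) and OSC (title) sequences, not every terminal control string.

```python
import re

# CSI sequences (ESC [ ... final byte) and BEL-terminated OSC sequences.
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]|\x1b\][^\x07]*\x07")

def strip_ansi(text):
    # Remove terminal escape codes captured in PTY output.
    return ANSI_RE.sub("", text)
```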
## Bash on macOS — by deliberate policy
`terminal_pty_open` always invokes `/bin/bash`, regardless of the user's `$SHELL`. macOS users: yes, even when zsh is your interactive default. This is the **terminal-tools-foundations** policy applied to PTYs.
Reasons:
- zsh has command/builtin classes (`zmodload`, `=cmd` expansion, `zpty`, `ztcp`) that bypass bash-shaped security checks
- One shell behavior across platforms eliminates "works on Linux, breaks on macOS" surprises
- Bash is universal: any shell you've used will accept the bash subset
The bash invocation uses `--norc --noprofile` so user dotfiles don't leak in. PS1 is set to a unique sentinel for prompt detection. PS2 is empty. PROMPT_COMMAND is empty.
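The sentinel idea can be sketched in a few lines (illustrative only; the real PS1 value terminal-tools sets is an internal detail):

```python
import re
import uuid

def make_sentinel():
    # A unique PS1 token that can't collide with ordinary program
    # output, plus the pattern used to detect "prompt is back".
    token = f"__PS1_{uuid.uuid4().hex}__"
    return token, re.compile(re.escape(token) + r"\s*$")
```

Output between command submission and the next sentinel match is the command's output; the match itself means bash is idle again.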
## Three modes of `terminal_pty_run`
### 1. Default: send command, wait for prompt sentinel
```
terminal_pty_run(session_id, command="ls -la")
→ { output, prompt_after: True, ... }
```
The session writes `ls -la\n`, waits for the sentinel that its custom PS1 emits, returns the slice between submission and prompt. **One in-flight call per session** — a concurrent call returns a `"session busy"` error.
### 2. raw_send: send raw input, no waiting
```
terminal_pty_run(session_id, command="print('hi')\n", raw_send=True)
→ { bytes_sent: 12 }
```
For REPLs, vim keystrokes, password prompts. The session writes the bytes and returns immediately — it doesn't wait for a prompt (REPLs don't print bash's prompt; they print their own).
After a `raw_send`, you typically follow with:
### 3. read_only: drain currently-buffered output
```
terminal_pty_run(session_id, read_only=True, timeout_sec=2)
→ { output: "hi\n", more: False, ... }
```
Reads whatever the session has accumulated since the last drain, with a brief settle window. Use after raw_send to capture the REPL's response.
## Custom prompt detection (`expect`)
When the command launches a program with its own prompt (Python REPL's `>>> `, mysql's `mysql> `, sudo's password prompt), the bash sentinel won't appear until the program exits. Override:
```
terminal_pty_run(session_id, command="python3", expect=r">>>\s*$", timeout_sec=10)
→ output up to and including ">>>", then control returns
```
For sudo:
```
terminal_pty_run(session_id, command="sudo -k && sudo whoami", expect=r"[Pp]assword:")
terminal_pty_run(session_id, command="<password>\n", raw_send=True)
terminal_pty_run(session_id, read_only=True, timeout_sec=5)
```
(Treat passwords carefully — they end up in the ring buffer.)
## Always close
```
terminal_pty_close(session_id)
```
Leaked sessions count against `TERMINAL_TOOLS_MAX_PTY` (default 8). Idle reaping happens lazily on every `_open` call (sessions inactive longer than `idle_timeout_sec`, default 1800s, are dropped) — but don't rely on it. Close when you're done.
For unresponsive sessions, `force=True` skips the graceful "exit" attempt and goes straight to SIGTERM/SIGKILL.
## Common patterns
### Stateful navigation
```
sid = terminal_pty_open(cwd="/")
terminal_pty_run(sid, command="cd /var/log")
terminal_pty_run(sid, command="ls -la *.log | head")
terminal_pty_close(sid)
```
### Python REPL
```
sid = terminal_pty_open()
terminal_pty_run(sid, command="python3", expect=r">>>\s*$")
terminal_pty_run(sid, command="x = 42\n", raw_send=True)
terminal_pty_run(sid, command="print(x*x)\n", raw_send=True)
result = terminal_pty_run(sid, read_only=True) # → "1764\n>>> "
terminal_pty_run(sid, command="exit()\n", raw_send=True)
terminal_pty_close(sid)
```
### ssh with host-key prompt
```
sid = terminal_pty_open()
terminal_pty_run(sid, command="ssh user@new-host", expect=r"\(yes/no.*\)\?")
terminal_pty_run(sid, command="yes\n", raw_send=True)
terminal_pty_run(sid, read_only=True, timeout_sec=10) # password prompt or login
```
@@ -0,0 +1,92 @@
---
name: hive.terminal-tools-troubleshooting
description: Read when a terminal-tools call returned something surprising — empty stdout despite no error, exit_code is null, output_handle came back expired, "too many jobs" / "session busy" / "too many PTYs", warning was set unexpectedly, semantic_status disagrees with exit_code. Diagnostic recipes only — load on demand. Don't preload; the foundational skill covers the happy path.
metadata:
author: hive
type: preset-skill
version: "1.0"
---
# Troubleshooting terminal-tools
Recipes for surprising results. Match the symptom to the section.
## Empty `stdout` despite the command "should have" produced output
Possible causes:
1. Output went to **stderr** instead. Check `stderr` in the envelope (or use `merge_stderr=True` for jobs).
2. Output was **fully truncated** because `max_output_kb` is too small. Check `stdout_truncated_bytes > 0`. Bump `max_output_kb` or paginate via `output_handle`.
3. Command produced no output (correct, just unexpected — `silent` flags, no matches).
4. Pipeline issue: the last stage of a pipe ran but stdout went elsewhere (`> /dev/null`, redirected via `2>&1`).
5. Process is buffering its output and didn't flush before exit. Add `stdbuf -oL` (line-buffered) or `unbuffer` to the command.
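For cause 5, a small wrapper makes the line-buffering fix reusable (assumes GNU coreutils' `stdbuf` is on the PATH; on macOS it may need coreutils installed):

```python
def line_buffered(command):
    # Force line-buffered stdout/stderr so output is visible before exit.
    return f"stdbuf -oL -eL {command}"
```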
## `exit_code: null`
| Cause | Other field |
|---|---|
| Auto-backgrounded | `auto_backgrounded: true, job_id: <X>` |
| Hard timeout, process killed | `timed_out: true` |
| Pre-spawn failure (command not found) | `error: ...` set, `pid: null` |
| Still running (in `terminal_job_logs`) | `status: "running"` |
## `output_handle` returned `expired: true`
5-minute TTL. Either (a) you waited too long, or (b) the store evicted it under memory pressure (64 MB total cap, LRU eviction). Re-run the command.
To reduce risk: paginate the handle as soon as you receive it, or use `terminal_job_*` for huge outputs (4 MB ring buffer with offsets — no expiry).
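The paginate-immediately loop can be sketched generically. Hypothetical throughout: `fetch_fn` stands in for the handle-read call, with envelope fields assumed from this section.

```python
def read_handle(fetch_fn, handle, page_kb=64):
    # fetch_fn stands in for the hypothetical handle-pagination call;
    # assumed envelope shape: {"data": str, "done": bool, "expired": bool}.
    parts = []
    while True:
        res = fetch_fn(handle, max_kb=page_kb)
        if res.get("expired"):
            raise RuntimeError("output_handle expired; re-run the command")
        parts.append(res["data"])
        if res.get("done"):
            return "".join(parts)
```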
## "too many jobs" / `JobLimitExceeded`
`TERMINAL_TOOLS_MAX_JOBS` (default 32) hit. Either:
- Wait for jobs to exit (poll with `terminal_job_logs(wait_until_exit=True)`)
- Kill old jobs: `terminal_job_manage(action="list")` to see what's running, then `signal_term` the abandoned ones
- Raise the cap via env (rare)
## "session busy"
A `terminal_pty_run` was issued while another `_run` is in flight on the same session. PTY sessions are single-threaded conversations. Wait for the prior call to return, or open a second session.
## "PTY cap reached"
`TERMINAL_TOOLS_MAX_PTY` (default 8) hit. Close idle sessions (`terminal_pty_close`). Idle reaping is lazy and won't rescue you here: opening a new session throws once the cap is hit, so close sessions manually.
## `warning` is set, the command worked
Informational only. The pattern matched (e.g. `rm -rf` literally appears, or `git push --force` was used). The command ran. The warning is your "did I mean to do that?" prompt — verify the side effect was intended before continuing.
## `semantic_status: "ok"` but `exit_code: 1`
Working as designed. Some commands use exit 1 for legitimate non-error states:
- `grep` / `rg` exit 1 when **no matches** found
- `find` exit 1 when **some directories were unreadable** (typical on `/proc`, etc.)
- `diff` exit 1 when **files differ**
- `test` / `[` exit 1 when **condition is false**
The `semantic_message` field explains. Trust `semantic_status`, not raw `exit_code`.
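A minimal success check that honors this precedence (a sketch; field names assumed from the envelope shape described above):

```python
def call_succeeded(envelope):
    # Prefer semantic_status when present; fall back to raw exit_code.
    status = envelope.get("semantic_status")
    if status is not None:
        return status == "ok"
    return envelope.get("exit_code") == 0
```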
## `semantic_status: "error"` but `exit_code: 0`
Shouldn't happen. If it does, file a bug.
## `truncated_bytes_dropped > 0` in `terminal_job_logs`
Your `since_offset` was older than the ring buffer's floor — bytes evicted before you could read them. Either:
- Poll faster (lower latency between calls)
- Use `merge_stderr=True` (single 4 MB ring instead of 4 MB × 2)
- Accept the gap and move forward from `next_offset`
## `terminal_pty_open` succeeds but the first `_run` times out
The session may not have produced its first prompt sentinel within the 2-second startup window. Try:
- A `terminal_pty_run(sid, read_only=True, timeout_sec=2)` to drain whatever's accumulated
- A noop command (`terminal_pty_run(sid, command="true")`) to force a prompt cycle
Could also indicate the bash process died at startup — `terminal_pty_run(sid, ...)` would then return `"session has exited"`.
## `shell="/bin/zsh"` returned an error
By design. terminal-tools is bash-only on POSIX. Use `shell=True` (default `/bin/bash`) or omit `shell=` to exec directly.
## A command in `shell=True` is interpreted differently than expected
Bash, not zsh, semantics. `**/*` doesn't recurse without `shopt -s globstar`; `=cmd` expansion doesn't exist; arrays are 0-indexed and read as `${arr[idx]}`, unlike zsh's 1-indexed bare `$arr[idx]`. When in doubt, the foundational skill's "bash, not zsh" section is the canonical statement.
@@ -0,0 +1,203 @@
"""Shared skill authoring primitives.
Validates and materializes a skill folder. Used by three callers:
1. Queen's ``create_colony`` tool (``queen_lifecycle_tools.py``) — inline
content passed by the queen during colony creation.
2. HTTP POST / PUT routes under ``/api/**/skills`` (UI-driven creation).
3. A future ``create_learned_skill`` tool (runtime learning).
Keeping the validators and writer here ensures the three paths share one
authority; changes to the name regex or frontmatter layout happen in one
place.
"""
from __future__ import annotations
import logging
import re
import shutil
from dataclasses import dataclass, field
from pathlib import Path
logger = logging.getLogger(__name__)
# Framework skill names include dots (``hive.note-taking``), so the
# validator needs to allow them even though the queen's ``create_colony``
# tool historically forbade dots. User-created skills without dots still
# pass; allowing dots just prevents us from rejecting existing framework
# names when the UI toggles them via ``validate_skill_name``.
_SKILL_NAME_RE = re.compile(r"^[a-z0-9.-]+$")
_MAX_NAME_LEN = 64
_MAX_DESC_LEN = 1024
@dataclass
class SkillFile:
"""Supporting file bundled with a skill (relative path + content)."""
rel_path: Path
content: str
@dataclass
class SkillDraft:
"""Validated skill content ready to be written to disk."""
name: str
description: str
body: str
files: list[SkillFile] = field(default_factory=list)
@property
def skill_md_text(self) -> str:
"""Assemble the final SKILL.md text (frontmatter + body)."""
body_norm = self.body.rstrip() + "\n"
return f"---\nname: {self.name}\ndescription: {self.description}\n---\n\n{body_norm}"
def validate_skill_name(raw: str) -> tuple[str | None, str | None]:
"""Return ``(normalized_name, error)``. Either side may be None."""
name = (raw or "").strip() if isinstance(raw, str) else ""
if not name:
return None, "skill_name is required"
if not _SKILL_NAME_RE.match(name):
return None, f"skill_name '{name}' must match [a-z0-9.-] pattern"
if name.startswith("-") or name.endswith("-") or "--" in name:
return None, f"skill_name '{name}' has leading/trailing/consecutive hyphens"
if len(name) > _MAX_NAME_LEN:
return None, f"skill_name '{name}' exceeds {_MAX_NAME_LEN} chars"
return name, None
def validate_description(raw: str) -> tuple[str | None, str | None]:
desc = (raw or "").strip() if isinstance(raw, str) else ""
if not desc:
return None, "skill_description is required"
if len(desc) > _MAX_DESC_LEN:
return None, f"skill_description must be 1-{_MAX_DESC_LEN} chars"
# Frontmatter descriptions are line-oriented — the parser reads one value.
if "\n" in desc or "\r" in desc:
return None, "skill_description must be a single line (no newlines)"
return desc, None
def validate_files(raw: list[dict] | None) -> tuple[list[SkillFile] | None, str | None]:
if not raw:
return [], None
if not isinstance(raw, list):
return None, "skill_files must be an array"
out: list[SkillFile] = []
for entry in raw:
if not isinstance(entry, dict):
return None, "each skill_files entry must be an object with 'path' and 'content'"
rel_raw = entry.get("path")
content = entry.get("content")
if not isinstance(rel_raw, str) or not rel_raw.strip():
return None, "skill_files entry missing non-empty 'path'"
if not isinstance(content, str):
return None, f"skill_files entry '{rel_raw}' missing string 'content'"
rel_stripped = rel_raw.strip()
# Allow './foo' but reject '/foo' — relativizing absolute paths silently
# has bitten other tools; make the intent loud instead.
if rel_stripped.startswith("./"):
rel_stripped = rel_stripped[2:]
rel_path = Path(rel_stripped)
if rel_stripped.startswith("/") or rel_path.is_absolute() or ".." in rel_path.parts:
return None, f"skill_files path '{rel_raw}' must be relative and inside the skill folder"
if rel_path.as_posix() == "SKILL.md":
return None, "skill_files must not contain SKILL.md — pass skill_body instead"
out.append(SkillFile(rel_path=rel_path, content=content))
return out, None
def build_draft(
*,
skill_name: str,
skill_description: str,
skill_body: str,
skill_files: list[dict] | None = None,
) -> tuple[SkillDraft | None, str | None]:
"""Validate all inputs and return an immutable draft ready for writing."""
name, err = validate_skill_name(skill_name)
if err or name is None:
return None, err
desc, err = validate_description(skill_description)
if err or desc is None:
return None, err
body = skill_body if isinstance(skill_body, str) else ""
if not body.strip():
return None, (
"skill_body is required — the operational procedure the colony worker needs to run this job unattended"
)
files, err = validate_files(skill_files)
if err or files is None:
return None, err
return SkillDraft(name=name, description=desc, body=body, files=list(files)), None
def write_skill(
draft: SkillDraft,
*,
target_root: Path,
replace_existing: bool = True,
) -> tuple[Path | None, str | None, bool]:
"""Write the draft under ``target_root/{draft.name}/``.
``target_root`` is the parent scope dir (e.g.
``~/.hive/agents/queens/{id}/skills`` or
``{colony_dir}/skills``). The function creates it if needed.
Returns ``(installed_path, error, replaced)``. On success ``error`` is
``None``; on failure ``installed_path`` is ``None`` and the target is
left as it was before the call (best-effort).
When ``replace_existing=False`` and the target dir already exists,
the write is refused with a non-fatal error (caller decides whether
to surface it as a 409 or a warning).
"""
try:
target_root.mkdir(parents=True, exist_ok=True)
except OSError as e:
return None, f"failed to create skills root: {e}", False
target = target_root / draft.name
replaced = False
try:
if target.exists():
if not replace_existing:
return None, f"skill '{draft.name}' already exists", False
# Remove the old dir outright so stale files from a prior
# version don't linger alongside the new ones.
replaced = True
shutil.rmtree(target)
target.mkdir(parents=True, exist_ok=False)
(target / "SKILL.md").write_text(draft.skill_md_text, encoding="utf-8")
for sf in draft.files:
full_path = target / sf.rel_path
full_path.parent.mkdir(parents=True, exist_ok=True)
full_path.write_text(sf.content, encoding="utf-8")
except OSError as e:
return None, f"failed to write skill folder {target}: {e}", replaced
return target, None, replaced
def remove_skill(target_root: Path, skill_name: str) -> tuple[bool, str | None]:
"""Rm-tree the skill directory under ``target_root/{skill_name}/``.
Returns ``(removed, error)``. ``removed=False, error=None`` means
the directory didn't exist (idempotent). Name is validated on the
way in so an attacker with UI access can't traverse out of the
scope root.
"""
name, err = validate_skill_name(skill_name)
if err or name is None:
return False, err
target = target_root / name
if not target.exists():
return False, None
try:
shutil.rmtree(target)
except OSError as e:
return False, f"failed to remove skill folder {target}: {e}"
return True, None
@@ -7,7 +7,7 @@ locations. Resolves name collisions deterministically.
from __future__ import annotations
import logging
from dataclasses import dataclass
from dataclasses import dataclass, field
from pathlib import Path
from framework.skills.parser import ParsedSkill, parse_skill_md
@@ -30,16 +30,40 @@ _SKIP_DIRS = frozenset(
)
# Scope priority (higher = takes precedence)
# ``preset`` sits between framework and user: bundled alongside the
# framework distribution, but off by default — capability packs the user
# opts into per queen/colony rather than globally-enabled infra.
_SCOPE_PRIORITY = {
"framework": 0,
"user": 1,
"project": 2,
"preset": 1,
"user": 2,
"queen_ui": 3,
"colony_ui": 4,
"project": 5,
}
# Within the same scope, Hive-specific paths override cross-client paths.
# We encode this by scanning cross-client first, then Hive-specific (later wins).
@dataclass
class ExtraScope:
"""Additional scope dir to scan beyond the standard five.
Used by :class:`framework.skills.manager.SkillsManager` to surface
per-queen (``queen_ui``) and per-colony (``colony_ui``) skill
directories created through the UI. The ``label`` feeds
:attr:`ParsedSkill.source_scope` so downstream consumers (trust
gate, UI provenance resolver) can distinguish scope origins.
"""
directory: Path
label: str
# Kept for forward-compat with the priority table; discovery itself
# relies on scan order for last-wins resolution.
priority: int = 0
@dataclass
class DiscoveryConfig:
"""Configuration for skill discovery."""
@@ -49,6 +73,10 @@ class DiscoveryConfig:
skip_framework_scope: bool = False
max_depth: int = 4
max_dirs: int = 2000
# Additional scope dirs scanned between user and project scopes,
# in the order they are provided. Use ``ExtraScope`` to tag each
# with its logical label (``queen_ui`` / ``colony_ui``).
extra_scopes: list[ExtraScope] = field(default_factory=list)
class SkillDiscovery:
@@ -82,13 +110,22 @@ class SkillDiscovery:
all_skills: list[ParsedSkill] = []
self._scanned_dirs = []
# Framework scope (lowest precedence)
# Framework scope (lowest precedence) — always-on infra skills.
if not self._config.skip_framework_scope:
framework_dir = Path(__file__).parent / "_default_skills"
if framework_dir.is_dir():
self._scanned_dirs.append(framework_dir)
all_skills.extend(self._scan_scope(framework_dir, "framework"))
# Preset scope — bundled capability packs that ship with the
# framework but default to OFF. User opts in per queen/colony
# via the Skills Library. ``skip_framework_scope`` covers both
# bundled directories since they live side-by-side on disk.
preset_dir = Path(__file__).parent / "_preset_skills"
if preset_dir.is_dir():
self._scanned_dirs.append(preset_dir)
all_skills.extend(self._scan_scope(preset_dir, "preset"))
# User scope
if not self._config.skip_user_scope:
home = Path.home()
@@ -99,12 +136,23 @@ class SkillDiscovery:
self._scanned_dirs.append(user_agents)
all_skills.extend(self._scan_scope(user_agents, "user"))
# Hive-specific (higher precedence within user scope)
user_hive = home / ".hive" / "skills"
# Hive-specific (higher precedence within user scope). Honors
# HIVE_HOME so the desktop's per-user root (set via env) wins
# over the shared ``~/.hive`` location.
from framework.config import HIVE_HOME
user_hive = HIVE_HOME / "skills"
if user_hive.is_dir():
self._scanned_dirs.append(user_hive)
all_skills.extend(self._scan_scope(user_hive, "user"))
# Extra scopes (queen_ui / colony_ui), scanned between user and project
# so colony overrides beat queen overrides, and both beat user-scope.
for extra in self._config.extra_scopes:
if extra.directory.is_dir():
self._scanned_dirs.append(extra.directory)
all_skills.extend(self._scan_scope(extra.directory, extra.label))
# Project scope (highest precedence)
if self._config.project_root:
root = self._config.project_root
@@ -15,14 +15,18 @@ import subprocess
import tempfile
from pathlib import Path
from framework.config import HIVE_HOME
from framework.skills.parser import ParsedSkill
from framework.skills.skill_errors import SkillError, SkillErrorCode
# Default install destination for user-scope skills
USER_SKILLS_DIR = Path.home() / ".hive" / "skills"
# Default install destination for user-scope skills.
# Anchored on HIVE_HOME so the desktop shell can override the install
# root via $HIVE_HOME without patching every call site.
USER_SKILLS_DIR = HIVE_HOME / "skills"
# Sentinel file for the one-time security notice on first install (NFR-5).
INSTALL_NOTICE_SENTINEL = HIVE_HOME / ".install_notice_shown"
# Sentinel file for the one-time security notice on first install (NFR-5)
INSTALL_NOTICE_SENTINEL = Path.home() / ".hive" / ".install_notice_shown"
_INSTALL_NOTICE = """\
@@ -44,7 +48,7 @@ _INSTALL_NOTICE = """\
def maybe_show_install_notice() -> None:
"""Print a one-time security notice before the first skill install (NFR-5).
Touches a sentinel file in ~/.hive/ after showing the notice so it is
Touches a sentinel file in $HIVE_HOME after showing the notice so it is
only displayed once across all future installs.
"""
if INSTALL_NOTICE_SENTINEL.exists():
@@ -23,6 +23,7 @@ Typical usage — **bare** (exported agents, SDK users)::
from __future__ import annotations
import asyncio
import logging
from dataclasses import dataclass, field
from pathlib import Path
@@ -44,6 +45,18 @@ class SkillsManagerConfig:
even when ``project_root`` is set.
interactive: Whether trust gating can prompt the user interactively.
When ``False``, untrusted project skills are silently skipped.
queen_id: Optional queen identifier. When set, enables the
``queen_ui`` scope and per-queen override file.
queen_overrides_path: Path to
``~/.hive/agents/queens/{queen_id}/skills_overrides.json``.
When set, the store is loaded and its entries override
discovery results (disable skills, record provenance).
colony_name: Optional colony identifier; mirrors ``queen_id`` for
the ``colony_ui`` scope.
colony_overrides_path: Per-colony override file path.
extra_scope_dirs: Extra scope dirs scanned between user and
project scopes. Typically populated by the caller with the
queen/colony UI skill directories.
"""
skills_config: SkillsConfig = field(default_factory=SkillsConfig)
@@ -51,6 +64,15 @@ class SkillsManagerConfig:
skip_community_discovery: bool = False
interactive: bool = True
# Override support
queen_id: str | None = None
queen_overrides_path: Path | None = None
colony_name: str | None = None
colony_overrides_path: Path | None = None
# Typed at the call site as ``list[ExtraScope]`` — not imported here
# to keep this module free of discovery-layer dependencies.
extra_scope_dirs: list = field(default_factory=list)
class SkillsManager:
"""Unified skill lifecycle: discovery → loading → prompt rendering.
@@ -65,13 +87,21 @@ class SkillsManager:
self._config = config or SkillsManagerConfig()
self._loaded = False
self._catalog: object = None # SkillCatalog, set after load()
self._all_skills: list = [] # list[ParsedSkill], pre-override-filter
self._catalog_prompt: str = ""
self._protocols_prompt: str = ""
self._allowlisted_dirs: list[str] = []
self._default_mgr: object = None # DefaultSkillManager, set after load()
# Override stores (loaded lazily in _do_load). Queen-scope and
# colony-scope are read together; colony entries win on collision.
self._queen_overrides: object = None # SkillOverrideStore | None
self._colony_overrides: object = None # SkillOverrideStore | None
# Hot-reload state
self._watched_dirs: list[str] = []
self._watched_files: list[str] = []
self._watcher_task: object = None # asyncio.Task, set by start_watching()
# Serializes in-process mutations (HTTP handlers + create_colony).
self._mutation_lock = asyncio.Lock()
# ------------------------------------------------------------------
# Factory for backwards-compat bridge
@@ -119,6 +149,7 @@ class SkillsManager:
from framework.skills.catalog import SkillCatalog
from framework.skills.defaults import DefaultSkillManager
from framework.skills.discovery import DiscoveryConfig, SkillDiscovery
from framework.skills.overrides import SkillOverrideStore
skills_config = self._config.skills_config
@@ -128,12 +159,13 @@ class SkillsManager:
DiscoveryConfig(
project_root=self._config.project_root,
skip_framework_scope=False,
extra_scopes=list(self._config.extra_scope_dirs or []),
)
)
discovered = discovery.discover()
self._watched_dirs = discovery.scanned_directories
# Trust-gate project-scope skills (AS-13)
# Trust-gate project-scope skills (AS-13). UI scopes bypass.
if self._config.project_root is not None and not self._config.skip_community_discovery:
from framework.skills.trust import TrustGate
@@ -141,6 +173,31 @@ class SkillsManager:
discovered, project_dir=self._config.project_root
)
# 1b. Load per-scope override stores. Missing files → empty stores.
queen_store = None
if self._config.queen_overrides_path is not None:
queen_store = SkillOverrideStore.load(
self._config.queen_overrides_path,
scope_label=f"queen:{self._config.queen_id or ''}",
)
colony_store = None
if self._config.colony_overrides_path is not None:
colony_store = SkillOverrideStore.load(
self._config.colony_overrides_path,
scope_label=f"colony:{self._config.colony_name or ''}",
)
self._queen_overrides = queen_store
self._colony_overrides = colony_store
self._watched_files = [
str(p) for p in (self._config.queen_overrides_path, self._config.colony_overrides_path) if p is not None
]
# 1c. Apply override filtering. Colony entries take precedence over
# queen entries on name collision; the store's ``is_disabled`` keeps
# the resolution rule in one place.
self._all_skills = list(discovered)
discovered = self._apply_overrides(discovered, skills_config, queen_store, colony_store)
catalog = SkillCatalog(discovered)
self._catalog = catalog
self._allowlisted_dirs = catalog.allowlisted_dirs
@@ -174,6 +231,101 @@ class SkillsManager:
len(catalog_prompt),
)
# ------------------------------------------------------------------
# Override application
# ------------------------------------------------------------------
@staticmethod
def _apply_overrides(
discovered: list,
skills_config: SkillsConfig,
queen_store: object,
colony_store: object,
) -> list:
"""Filter ``discovered`` per the queen + colony override stores.
Resolution rule:
1. Tombstoned names (``deleted_ui_skills``) drop out.
2. An explicit ``enabled=False`` override drops the skill.
3. An explicit ``enabled=True`` override keeps it (wins over
``all_defaults_disabled`` for framework defaults AND over the
preset-scope default-off rule).
4. Otherwise: preset-scope skills are off by default; everything
else inherits :meth:`SkillsConfig.is_default_enabled`.
"""
from framework.skills.overrides import SkillOverrideStore
stores: list[SkillOverrideStore] = [s for s in (queen_store, colony_store) if s is not None]
tombstones: set[str] = set()
for store in stores:
tombstones |= set(store.deleted_ui_skills)
out = []
for skill in discovered:
if skill.name in tombstones:
continue
# Check colony first so colony overrides win over queen's.
explicit: bool | None = None
master_disabled = False
for store in reversed(stores): # colony, then queen
entry = store.get(skill.name)
if entry is not None and entry.enabled is not None:
explicit = entry.enabled
break
if store.all_defaults_disabled:
master_disabled = True
if explicit is False:
continue
if explicit is True:
out.append(skill)
continue
# Preset-scope capability packs are bundled but ship OFF; the
# user must explicitly enable them per queen or colony. This
# runs even when no store is present so bare agents don't
# silently load x-automation etc.
if skill.source_scope == "preset":
continue
# No explicit entry — master switch takes effect against framework defaults.
default_enabled = skills_config.is_default_enabled(skill.name)
if master_disabled and default_enabled and skill.source_scope == "framework":
continue
if default_enabled:
out.append(skill)
return out
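The precedence chain in `_apply_overrides` can be sketched with stand-in objects. This is an illustrative reimplementation, not the framework's code: `SimpleNamespace` substitutes for the real `ParsedSkill` and `SkillOverrideStore` types, and the skill names are invented.

```python
from types import SimpleNamespace

def resolve(skill, stores, default_enabled):
    """Mirror of the resolution rule: stores is [queen, colony];
    reversed() checks colony first so its entries win."""
    explicit = None
    master_disabled = False
    for store in reversed(stores):  # colony, then queen
        entry = store.overrides.get(skill.name)
        if entry is not None and entry.enabled is not None:
            explicit = entry.enabled
            break
        if store.all_defaults_disabled:
            master_disabled = True
    if explicit is not None:
        return explicit
    if skill.source_scope == "preset":
        return False  # presets ship OFF unless explicitly enabled
    if master_disabled and default_enabled and skill.source_scope == "framework":
        return False  # master switch suppresses framework defaults
    return default_enabled

queen = SimpleNamespace(overrides={}, all_defaults_disabled=True)
colony = SimpleNamespace(
    overrides={"web-search": SimpleNamespace(enabled=True)},
    all_defaults_disabled=False,
)
fw = SimpleNamespace(name="web-search", source_scope="framework")
preset = SimpleNamespace(name="x-automation", source_scope="preset")
other = SimpleNamespace(name="memory", source_scope="framework")

assert resolve(fw, [queen, colony], default_enabled=True) is True      # explicit colony enable beats master switch
assert resolve(preset, [queen, colony], default_enabled=True) is False # presets default OFF
assert resolve(other, [queen, colony], default_enabled=True) is False  # queen master switch disables framework default
```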
# ------------------------------------------------------------------
# Override accessors
# ------------------------------------------------------------------
@property
def queen_overrides(self) -> object:
"""The queen-scope :class:`SkillOverrideStore` or ``None``."""
return self._queen_overrides
@property
def colony_overrides(self) -> object:
"""The colony-scope :class:`SkillOverrideStore` or ``None``."""
return self._colony_overrides
@property
def mutation_lock(self) -> asyncio.Lock:
"""Serializes in-process override mutations (routes + queen tools)."""
return self._mutation_lock
def reload(self) -> None:
"""Re-run discovery and rebuild cached prompts. Public wrapper for ``_reload``."""
self._reload()
def enumerate_skills_with_source(self) -> list:
"""Return every discovered skill, including ones disabled by overrides.
The UI relies on this: a disabled framework skill needs to render
in the list so the user can toggle it back on. The post-filter
catalog omits those entries.
"""
return list(self._all_skills)
# ------------------------------------------------------------------
# Hot-reload: watch skill directories for SKILL.md changes.
# ------------------------------------------------------------------
@@ -181,14 +333,14 @@ class SkillsManager:
async def start_watching(self) -> None:
"""Start a background task watching skill directories for changes.
When a ``SKILL.md`` file is added/modified/removed, the cached
``skills_catalog_prompt`` is rebuilt. The next node iteration picks
up the new prompt automatically via the ``dynamic_prompt_provider``.
Triggers a reload when any ``SKILL.md`` changes or an override
JSON file is modified. The next node iteration picks up the new
prompt via the ``dynamic_prompt_provider`` / per-worker
``dynamic_skills_catalog_provider``.
Silently no-ops when ``watchfiles`` is not installed or when no
directories are being watched (e.g. bare mode, no project_root).
Silently no-ops when ``watchfiles`` is not installed or there
are no paths to watch.
"""
import asyncio
try:
import watchfiles # noqa: F401 -- optional dep check
@@ -196,7 +348,7 @@ class SkillsManager:
logger.debug("watchfiles not installed; skill hot-reload disabled")
return
if not self._watched_dirs:
if not self._watched_dirs and not self._watched_files:
logger.debug("No skill directories to watch; hot-reload skipped")
return
@@ -208,14 +360,13 @@ class SkillsManager:
name="skills-hot-reload",
)
logger.info(
"Skill hot-reload enabled (watching %d directories)",
"Skill hot-reload enabled (watching %d dirs, %d override files)",
len(self._watched_dirs),
len(self._watched_files),
)
async def stop_watching(self) -> None:
"""Cancel the background watcher task (if running)."""
import asyncio
task = self._watcher_task
if task is None:
return
@@ -228,22 +379,35 @@ class SkillsManager:
pass
async def _watch_loop(self) -> None:
"""Background coroutine that watches SKILL.md files and triggers reload."""
import asyncio
"""Watch SKILL.md + override JSON files and trigger reload on change."""
import watchfiles
def _filter(_change: object, path: str) -> bool:
return path.endswith("SKILL.md")
return path.endswith("SKILL.md") or path.endswith("skills_overrides.json")
# watchfiles accepts a mix of dirs and files; file watches survive
# a tmp+rename (the containing dir sees the event).
watch_targets = list(self._watched_dirs)
for f in self._watched_files:
# watchfiles needs the parent dir for file-level events to fire
# reliably through atomic replace; adding the file path directly
# works on Linux/macOS inotify/FSEvents but a dir watch is
# belt-and-braces.
parent = str(Path(f).parent)
if parent not in watch_targets:
watch_targets.append(parent)
if not watch_targets:
return
try:
async for changes in watchfiles.awatch(
*self._watched_dirs,
*watch_targets,
watch_filter=_filter,
debounce=1000,
):
paths = [p for _, p in changes]
logger.info("SKILL.md changes detected: %s", paths)
logger.info("Skill state changes detected: %s", paths)
try:
self._reload()
except Exception:

@@ -0,0 +1,254 @@
"""Per-scope skill override store.
Sits between :mod:`framework.skills.discovery` and
:class:`framework.skills.catalog.SkillCatalog`: records the user's
per-queen and per-colony decisions about which skills are enabled,
who created them (provenance), and any parameter tweaks.
Two well-known paths back this module:
* Queen scope: ``~/.hive/agents/queens/{queen_id}/skills_overrides.json``
* Colony scope: ``~/.hive/colonies/{colony_name}/skills_overrides.json``
The schema is intentionally small; see :class:`SkillOverrideStore` for
the JSON shape. Atomic writes mirror
:class:`framework.skills.trust.TrustedRepoStore` (tmp + rename).
"""
from __future__ import annotations
import json
import logging
from dataclasses import dataclass, field
from datetime import UTC, datetime
from enum import StrEnum
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
_SCHEMA_VERSION = 1
class Provenance(StrEnum):
"""Where a skill came from.
The override store is the authoritative provenance ledger for anything
the UI or the queen tools touched. Framework / user-dropped /
project-dropped skills don't need an entry unless they've been
explicitly configured.
"""
FRAMEWORK = "framework"
PRESET = "preset"
USER_DROPPED = "user_dropped"
USER_UI_CREATED = "user_ui_created"
QUEEN_CREATED = "queen_created"
LEARNED_RUNTIME = "learned_runtime"
PROJECT_DROPPED = "project_dropped"
# Catch-all for skills with no recorded authorship: legacy rows from
# before the override store existed, PATCHes that precede any CREATE,
# etc. Keeps the ledger honest rather than forcing a guess.
OTHER = "other"
@dataclass
class OverrideEntry:
"""Per-skill override record inside a scope's store."""
enabled: bool | None = None
provenance: Provenance = Provenance.FRAMEWORK
trust: str | None = None
param_overrides: dict[str, Any] = field(default_factory=dict)
notes: str | None = None
created_at: datetime | None = None
created_by: str | None = None
def clone(self) -> OverrideEntry:
"""Return a deep-enough copy (dict fields are re-allocated)."""
return OverrideEntry(
enabled=self.enabled,
provenance=self.provenance,
trust=self.trust,
param_overrides=dict(self.param_overrides),
notes=self.notes,
created_at=self.created_at,
created_by=self.created_by,
)
def to_dict(self) -> dict[str, Any]:
out: dict[str, Any] = {"provenance": str(self.provenance)}
if self.enabled is not None:
out["enabled"] = bool(self.enabled)
if self.trust is not None:
out["trust"] = self.trust
if self.param_overrides:
out["param_overrides"] = dict(self.param_overrides)
if self.notes is not None:
out["notes"] = self.notes
if self.created_at is not None:
out["created_at"] = self.created_at.isoformat()
if self.created_by is not None:
out["created_by"] = self.created_by
return out
@classmethod
def from_dict(cls, raw: dict[str, Any]) -> OverrideEntry:
created_at_raw = raw.get("created_at")
created_at: datetime | None = None
if isinstance(created_at_raw, str):
try:
created_at = datetime.fromisoformat(created_at_raw)
except ValueError:
created_at = None
provenance_raw = raw.get("provenance") or Provenance.FRAMEWORK
try:
provenance = Provenance(provenance_raw)
except ValueError:
logger.warning("override: unknown provenance %r; defaulting to framework", provenance_raw)
provenance = Provenance.FRAMEWORK
enabled = raw.get("enabled")
return cls(
enabled=enabled if isinstance(enabled, bool) else None,
provenance=provenance,
trust=raw.get("trust") if isinstance(raw.get("trust"), str) else None,
param_overrides=dict(raw.get("param_overrides") or {}),
notes=raw.get("notes") if isinstance(raw.get("notes"), str) else None,
created_at=created_at,
created_by=raw.get("created_by") if isinstance(raw.get("created_by"), str) else None,
)
@dataclass
class SkillOverrideStore:
"""Persistent per-scope override file.
The file is created lazily on first save; a missing file behaves like
an empty store (all skills inherit defaults, no metadata recorded).
"""
path: Path
scope_label: str = ""
version: int = _SCHEMA_VERSION
all_defaults_disabled: bool = False
overrides: dict[str, OverrideEntry] = field(default_factory=dict)
deleted_ui_skills: set[str] = field(default_factory=set)
# ------------------------------------------------------------------
# Factory
# ------------------------------------------------------------------
@classmethod
def load(cls, path: Path, scope_label: str = "") -> SkillOverrideStore:
"""Load the store from disk; return an empty store if the file is absent.
Permissive on parse errors: logs and returns an empty store rather
than raising, so a corrupted file never takes down skill loading.
"""
store = cls(path=path, scope_label=scope_label)
try:
raw = json.loads(path.read_text(encoding="utf-8"))
except FileNotFoundError:
return store
except Exception as exc:
logger.warning("override: failed to read %s (%s); starting empty", path, exc)
return store
if not isinstance(raw, dict):
logger.warning("override: %s is not an object; starting empty", path)
return store
store.version = int(raw.get("version", _SCHEMA_VERSION))
store.all_defaults_disabled = bool(raw.get("all_defaults_disabled", False))
raw_overrides = raw.get("overrides") or {}
if isinstance(raw_overrides, dict):
for name, entry_raw in raw_overrides.items():
if not isinstance(name, str) or not isinstance(entry_raw, dict):
continue
store.overrides[name] = OverrideEntry.from_dict(entry_raw)
deleted = raw.get("deleted_ui_skills") or []
if isinstance(deleted, list):
store.deleted_ui_skills = {s for s in deleted if isinstance(s, str)}
return store
# ------------------------------------------------------------------
# Mutations
# ------------------------------------------------------------------
def upsert(self, skill_name: str, entry: OverrideEntry) -> None:
"""Insert or replace a skill's override entry."""
self.overrides[skill_name] = entry
# If we're explicitly managing this skill again, lift any tombstone.
self.deleted_ui_skills.discard(skill_name)
def set_enabled(self, skill_name: str, enabled: bool, *, provenance: Provenance | None = None) -> None:
"""Convenience: toggle enabled without rewriting other fields."""
existing = self.overrides.get(skill_name)
if existing is None:
existing = OverrideEntry(
enabled=enabled,
provenance=provenance or Provenance.FRAMEWORK,
)
else:
existing.enabled = enabled
if provenance is not None:
existing.provenance = provenance
self.overrides[skill_name] = existing
def remove(self, skill_name: str, *, tombstone: bool = True) -> None:
"""Drop a skill's override entry; optionally leave a tombstone.
Tombstones matter for UI-created skills: if the user deletes a
queen-scope skill via the UI, we rm-tree its directory, but the
file watcher might lag or a background process might have an
open handle. A tombstone ensures the loader treats the skill as
gone even if a stale SKILL.md lingers.
"""
self.overrides.pop(skill_name, None)
if tombstone:
self.deleted_ui_skills.add(skill_name)
def is_disabled(self, skill_name: str, *, default_enabled: bool) -> bool:
"""Return True when this scope's override force-disables the skill."""
if self.all_defaults_disabled and default_enabled:
# Caller says "default enabled"; master switch flips it off unless
# an explicit enabled=True override re-enables.
entry = self.overrides.get(skill_name)
if entry is not None and entry.enabled is True:
return False
return True
entry = self.overrides.get(skill_name)
if entry is None:
return not default_enabled
if entry.enabled is None:
return not default_enabled
return not entry.enabled
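The scope-local rule in `is_disabled` reduces to a small truth table. The function below is a minimal standalone restatement for illustration, not the class method itself; `entry_enabled` stands in for `entry.enabled` (with `None` meaning "no entry or no explicit value").

```python
def is_disabled(entry_enabled, all_defaults_disabled, default_enabled):
    # Master switch flips a default-enabled skill off unless an
    # explicit enabled=True override re-enables it.
    if all_defaults_disabled and default_enabled:
        return entry_enabled is not True
    # No explicit value: inherit the default.
    if entry_enabled is None:
        return not default_enabled
    return not entry_enabled

assert is_disabled(None, False, True) is False   # inherit default-on
assert is_disabled(None, True, True) is True     # master switch wins
assert is_disabled(True, True, True) is False    # explicit enable beats master
assert is_disabled(False, False, True) is True   # explicit disable
```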
def effective_enabled(self, skill_name: str, *, default_enabled: bool) -> bool:
"""The inverse of :meth:`is_disabled`, for readability at call sites."""
return not self.is_disabled(skill_name, default_enabled=default_enabled)
def get(self, skill_name: str) -> OverrideEntry | None:
return self.overrides.get(skill_name)
# ------------------------------------------------------------------
# Persistence
# ------------------------------------------------------------------
def save(self) -> None:
"""Atomic write: tmp + rename. Creates the parent dir if needed."""
self.path.parent.mkdir(parents=True, exist_ok=True)
payload: dict[str, Any] = {
"version": self.version,
"all_defaults_disabled": self.all_defaults_disabled,
"overrides": {name: entry.to_dict() for name, entry in sorted(self.overrides.items())},
}
if self.deleted_ui_skills:
payload["deleted_ui_skills"] = sorted(self.deleted_ui_skills)
tmp = self.path.with_suffix(self.path.suffix + ".tmp")
tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
tmp.replace(self.path)
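For reference, the on-disk JSON shape that `save()` emits and `load()` re-reads looks roughly like this. The skill names and values are invented for the example; the tmp + rename sequence mirrors the method above.

```python
import json
import tempfile
from pathlib import Path

payload = {
    "version": 1,
    "all_defaults_disabled": False,
    "overrides": {
        "web-search": {
            "provenance": "user_ui_created",
            "enabled": True,
            "created_by": "queen:main",   # hypothetical author label
        },
    },
    "deleted_ui_skills": ["old-ui-skill"],  # tombstoned names
}

# Atomic write: write to a sibling .tmp file, then rename over the target.
target = Path(tempfile.mkdtemp()) / "skills_overrides.json"
tmp = target.with_suffix(target.suffix + ".tmp")
tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
tmp.replace(target)

loaded = json.loads(target.read_text(encoding="utf-8"))
assert loaded == payload
```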
def utc_now() -> datetime:
"""Single source of truth for override timestamps."""
return datetime.now(tz=UTC)
@@ -26,9 +26,21 @@ _DEFAULT_REGISTRY_URL = (
"https://raw.githubusercontent.com/hive-skill-registry/hive-skill-registry/main/skill_index.json"
)
_CACHE_DIR = Path.home() / ".hive" / "registry_cache"
_CACHE_INDEX_PATH = _CACHE_DIR / "skill_index.json"
_CACHE_METADATA_PATH = _CACHE_DIR / "metadata.json"
def _cache_dir() -> Path:
from framework.config import HIVE_HOME
return HIVE_HOME / "registry_cache"
def _cache_index_path() -> Path:
return _cache_dir() / "skill_index.json"
def _cache_metadata_path() -> Path:
return _cache_dir() / "metadata.json"
_CACHE_TTL_SECONDS = 3600 # 1 hour
@@ -46,7 +58,7 @@ class RegistryClient:
cache_dir: Path | None = None,
) -> None:
self._url = registry_url or os.environ.get("HIVE_REGISTRY_URL", _DEFAULT_REGISTRY_URL)
cache_root = cache_dir or _CACHE_DIR
cache_root = cache_dir or _cache_dir()
self._index_path = cache_root / "skill_index.json"
self._metadata_path = cache_root / "metadata.json"
@@ -20,23 +20,44 @@ from pathlib import Path
logger = logging.getLogger(__name__)
_DEFAULT_SKILLS_DIR = Path(__file__).parent / "_default_skills"
# Bundled skills live in two sibling dirs: ``_default_skills`` (always-on
# infra) and ``_preset_skills`` (capability packs, off by default but
# still bundled). Tool-gated pre-activation walks both so ``browser_*``
# tools still pull in the browser-automation preset even though it isn't
# default-enabled in the catalog.
_BUNDLED_DIRS: tuple[Path, ...] = (
Path(__file__).parent / "_default_skills",
Path(__file__).parent / "_preset_skills",
)
# (tool-name prefix, default skill directory name, display name)
# (tool-name prefix, skill directory name, display name)
_TOOL_GATED_SKILLS: list[tuple[str, str, str]] = [
("browser_", "browser-automation", "hive.browser-automation"),
("terminal_", "terminal-tools-foundations", "hive.terminal-tools-foundations"),
("chart_", "chart-creation-foundations", "hive.chart-creation-foundations"),
]
_BODY_CACHE: dict[str, str] = {}
def _load_body(dir_name: str) -> str:
"""Load the markdown body of a framework default skill, cached."""
"""Load the markdown body of a bundled skill, cached. Searches every
bundled directory (default + preset) so the mapping table doesn't
need to know which dir a skill lives in.
"""
if dir_name in _BODY_CACHE:
return _BODY_CACHE[dir_name]
path = _DEFAULT_SKILLS_DIR / dir_name / "SKILL.md"
path: Path | None = None
for parent in _BUNDLED_DIRS:
candidate = parent / dir_name / "SKILL.md"
if candidate.exists():
path = candidate
break
body = ""
if path is None:
_BODY_CACHE[dir_name] = body
return body
try:
raw = path.read_text(encoding="utf-8")
# Strip YAML frontmatter (between the first two '---' fences)
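The diff is truncated here, but the frontmatter strip the comment describes can be sketched as follows. This is a hypothetical helper written for illustration, not the module's actual code: it drops everything between the first two `---` fences and keeps the markdown body.

```python
def strip_frontmatter(raw: str) -> str:
    """Return ``raw`` with a leading YAML frontmatter block removed."""
    lines = raw.splitlines()
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                # Everything after the closing fence is the body.
                return "\n".join(lines[i + 1:]).lstrip("\n")
    return raw

doc = "---\nname: demo\n---\n\n# Body\ntext"
assert strip_frontmatter(doc) == "# Body\ntext"
assert strip_frontmatter("no fences here") == "no fences here"
```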
@@ -20,6 +20,7 @@ from enum import StrEnum
from pathlib import Path
from urllib.parse import urlparse
from framework.config import HIVE_HOME
from framework.skills.parser import ParsedSkill
logger = logging.getLogger(__name__)
@@ -30,8 +31,11 @@ _ENV_TRUST_ALL = "HIVE_TRUST_PROJECT_SKILLS"
# Env var for comma-separated own-remote glob patterns (e.g. "github.com/myorg/*").
_ENV_OWN_REMOTES = "HIVE_OWN_REMOTES"
_TRUSTED_REPOS_PATH = Path.home() / ".hive" / "trusted_repos.json"
_NOTICE_SENTINEL_PATH = Path.home() / ".hive" / ".skill_trust_notice_shown"
# Persisted store of trusted git remotes (one-shot consent per repo).
_TRUSTED_REPOS_PATH = HIVE_HOME / "trusted_repos.json"
# Sentinel for the one-time security notice (NFR-5).
_NOTICE_SENTINEL_PATH = HIVE_HOME / ".skill_trust_notice_shown"
# ---------------------------------------------------------------------------
@@ -224,7 +228,9 @@ class ProjectTrustDetector:
patterns.extend(p.strip() for p in raw.split(",") if p.strip())
# From ~/.hive/own_remotes file
own_remotes_file = Path.home() / ".hive" / "own_remotes"
from framework.config import HIVE_HOME
own_remotes_file = HIVE_HOME / "own_remotes"
if own_remotes_file.is_file():
try:
for line in own_remotes_file.read_text(encoding="utf-8").splitlines():
@@ -318,13 +324,19 @@ class TrustGate:
) -> list[ParsedSkill]:
"""Return the subset of skills that are trusted for loading.
- Framework and user-scope skills: always included.
- Framework, user, queen_ui, and colony_ui scopes: always included.
(UI-created skills are authenticated by the user creating them
through the authenticated UI; they do not go through the
trusted_repos.json flow.)
- Project-scope skills: classified; consent prompt shown if untrusted.
"""
import os
# Separate project skills from always-trusted scopes
always_trusted = [s for s in skills if s.source_scope != "project"]
# UI-authored scopes bypass the trust gate — they're implicitly
# trusted because the user authored them through the UI. ``preset``
# ships with the framework distribution, so it's trusted too.
_bypass_scopes = {"framework", "preset", "user", "queen_ui", "colony_ui"}
always_trusted = [s for s in skills if s.source_scope in _bypass_scopes]
project_skills = [s for s in skills if s.source_scope == "project"]
if not project_skills:
@@ -409,7 +421,8 @@ class TrustGate:
def _maybe_show_security_notice(self, Colors) -> None: # noqa: N803
"""Show the one-time security notice if not already shown (NFR-5)."""
if _NOTICE_SENTINEL_PATH.exists():
sentinel = _NOTICE_SENTINEL_PATH
if sentinel.exists():
return
self._print("")
self._print(
@@ -421,8 +434,8 @@ class TrustGate:
)
self._print("")
try:
_NOTICE_SENTINEL_PATH.parent.mkdir(parents=True, exist_ok=True)
_NOTICE_SENTINEL_PATH.touch()
sentinel.parent.mkdir(parents=True, exist_ok=True)
sentinel.touch()
except OSError:
pass
