feat: cost tracking

feat: token consumption usage
feat: hybrid compaction buffer (fixed tokens + ratio of context)
2026-04-23 15:34:07 -07:00 · 2026-04-23 15:05:30 -07:00 · 2026-04-23 15:04:19 -07:00 · 2026-04-22 21:38:21 -07:00 · 2026-04-22 21:33:33 -07:00 · 2026-04-22 21:27:24 -07:00
211 changed files with 29610 additions and 6762 deletions
@@ -44,12 +44,29 @@
      "WebFetch(domain:docs.litellm.ai)",
      "Bash(cat /home/timothy/aden/hive/.venv/lib/python3.11/site-packages/litellm-*.dist-info/METADATA)",
      "Bash(find \"/home/timothy/.hive/agents/queens/queen_brand_design/sessions/session_20260415_100751_d49f4c28/\" -type f -name \"*.json*\" -exec grep -l \"协日\" {} \\\\;)",
-      "Bash(grep -v ':0$')"
+      "Bash(grep -v ':0$')",
+      "Bash(curl -s -m 2 http://127.0.0.1:4002/sse -o /dev/null -w 'status=%{http_code} time=%{time_total}s\\\\n')",
+      "mcp__gcu-tools__browser_status",
+      "mcp__gcu-tools__browser_start",
+      "mcp__gcu-tools__browser_navigate",
+      "mcp__gcu-tools__browser_evaluate",
+      "mcp__gcu-tools__browser_screenshot",
+      "mcp__gcu-tools__browser_open",
+      "mcp__gcu-tools__browser_click_coordinate",
+      "mcp__gcu-tools__browser_get_rect",
+      "mcp__gcu-tools__browser_type_focused",
+      "mcp__gcu-tools__browser_wait",
+      "Bash(python3 -c ' *)",
+      "Bash(python3 scripts/debug_queen_prompt.py independent)",
+      "Bash(curl -s --max-time 2 http://127.0.0.1:9230/status)",
+      "Bash(python3 -c \"import json, sys; print\\(json.loads\\(sys.stdin.read\\(\\)\\)['data']['content']\\)\")",
+      "Bash(python3 -c \"import json; json.load\\(open\\('/home/timothy/aden/hive/tools/browser-extension/manifest.json'\\)\\)\")"
    ],
    "additionalDirectories": [
      "/home/timothy/.hive/skills/writing-hive-skills",
      "/tmp",
-      "/home/timothy/.hive/skills"
+      "/home/timothy/.hive/skills",
+      "/home/timothy/aden/hive/core/frontend/src/components"
    ]
  },
  "hooks": {
@@ -64,7 +64,7 @@ snapshot = await browser_snapshot(tab_id)
 |---------|--------------|-------|
 | Scroll doesn't move | Nested scroll container | Look for `overflow: scroll` divs |
 | Click no effect | Element covered | Check `getBoundingClientRect` vs viewport |
-| Type clears | Autocomplete/React | Check for event listeners on input |
+| Type clears | Autocomplete/React | Check for event listeners on input; try `browser_type_focused` |
 | Snapshot hangs | Huge DOM | Check node count in snapshot |
 | Snapshot stale | SPA hydration | Wait after navigation |

@@ -229,7 +229,7 @@ function queryShadow(selector) {
 |-------|-------------|----------|
 | Scroll not working | Find scrollable container | Mouse wheel at container center |
 | Click no effect | JavaScript click() | CDP mouse events |
-| Type clears | Add delay_ms | Use execCommand |
+| Type clears | Add delay_ms | Use `browser_type_focused` (Input.insertText) |
 | Snapshot hangs | Add timeout_s | DOM snapshot fallback |
 | Stale content | Wait for selector | Increase wait_until timeout |
 | Shadow DOM | Pierce selector | JavaScript traversal |
@@ -1,18 +0,0 @@
-This project uses ruff for Python linting and formatting.
-
-Rules:
- Line length: 100 characters
- Python target: 3.11+
- Use double quotes for strings
- Sort imports with isort (ruff I rules): stdlib, third-party, first-party (framework), local
- Combine as-imports
- Use type hints on all function signatures
- Use `from __future__ import annotations` for modern type syntax
- Raise exceptions with `from` in except blocks (B904)
- No unused imports (F401), no unused variables (F841)
- Prefer list/dict/set comprehensions over map/filter (C4)
-
-Run `make lint` to auto-fix, `make check` to verify without modifying files.
-Run `make format` to apply ruff formatting.
-
-The ruff config lives in core/pyproject.toml under [tool.ruff].
@@ -1,35 +0,0 @@
-# Git
-.git/
-.gitignore
-
-# Documentation
-*.md
-docs/
-LICENSE
-
-# IDE
-.idea/
-.vscode/
-
-# Dependencies (rebuilt in container)
-node_modules/
-
-# Build artifacts
-dist/
-build/
-coverage/
-
-# Environment files
-.env*
-config.yaml
-
-# Logs
-*.log
-logs/
-
-# OS
-.DS_Store
-Thumbs.db
-
-# GitHub
-.github/
@@ -22,3 +22,6 @@ indent_size = 2

 [Makefile]
 indent_style = tab
+
+[*.{sh,ps1}]
+end_of_line = lf
@@ -16,7 +16,6 @@

 # Shell scripts (must use LF)
 *.sh text eol=lf
-quickstart.sh text eol=lf

 # PowerShell scripts (Windows-friendly)
 *.ps1 text eol=lf
@@ -122,3 +121,8 @@ CODE_OF_CONDUCT* text
 *.db binary
 *.sqlite binary
 *.sqlite3 binary
+
+# Lockfiles — mark generated so GitHub collapses them in PR diffs
+*.lock            linguist-generated=true -diff
+package-lock.json linguist-generated=true -diff
+uv.lock           linguist-generated=true -diff
@@ -1,3 +0,0 @@
-{
-  "mcpServers": {}
-}
@@ -48,6 +48,24 @@ class Message:
    is_skill_content: bool = False
    # Logical worker run identifier for shared-session persistence
    run_id: str | None = None
+    # True when this is a framework-injected continuation hint (continue-nudge
+    # on stream stall). Stored as a user message for API compatibility, but
+    # the UI should render it as a compact system notice, not user speech.
+    is_system_nudge: bool = False
+    # True when this message is a partial/truncated assistant turn reconstructed
+    # from a crashed or watchdog-cancelled stream. Signals that the original
+    # turn never finished — the model may or may not choose to redo it.
+    truncated: bool = False
+    # When non-None, identifies the parent session id this message was
+    # carried over from — used by fork_session_into_colony on the single
+    # compacted-summary message it writes when a colony is born from a
+    # queen DM. Presence of the field IS the "inherited" signal.
+    inherited_from: str | None = None
+    # True when this user message was synthesized from one or more
+    # fired triggers (timer/webhook), not typed by a human. The LLM still
+    # sees the message as a regular user turn; the UI uses this flag to
+    # render it as a trigger banner instead of a speech bubble.
+    is_trigger: bool = False

    def to_llm_dict(self) -> dict[str, Any]:
        """Convert to OpenAI-format message dict."""
@@ -109,6 +127,14 @@ class Message:
            d["image_content"] = self.image_content
        if self.run_id is not None:
            d["run_id"] = self.run_id
+        if self.is_system_nudge:
+            d["is_system_nudge"] = self.is_system_nudge
+        if self.truncated:
+            d["truncated"] = self.truncated
+        if self.inherited_from is not None:
+            d["inherited_from"] = self.inherited_from
+        if self.is_trigger:
+            d["is_trigger"] = self.is_trigger
        return d

    @classmethod
@@ -126,6 +152,10 @@ class Message:
            is_client_input=data.get("is_client_input", False),
            image_content=data.get("image_content"),
            run_id=data.get("run_id"),
+            is_system_nudge=data.get("is_system_nudge", False),
+            truncated=data.get("truncated", False),
+            inherited_from=data.get("inherited_from"),
+            is_trigger=data.get("is_trigger", False),
        )


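The four new fields are written to storage only when they carry a non-default value, so previously persisted messages load unchanged. A minimal sketch of that sparse round-trip follows. `MessageFlags` is a hypothetical stand-in for the real `Message` class; only the four field names are taken from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class MessageFlags:
    """Hypothetical stand-in holding only the four fields added in this diff."""

    is_system_nudge: bool = False
    truncated: bool = False
    inherited_from: str | None = None
    is_trigger: bool = False

    def to_storage_dict(self) -> dict[str, Any]:
        # Emit a key only when the value differs from its default, which keeps
        # stored messages small and backward compatible.
        d: dict[str, Any] = {}
        if self.is_system_nudge:
            d["is_system_nudge"] = True
        if self.truncated:
            d["truncated"] = True
        if self.inherited_from is not None:
            d["inherited_from"] = self.inherited_from
        if self.is_trigger:
            d["is_trigger"] = True
        return d

    @classmethod
    def from_storage_dict(cls, data: dict[str, Any]) -> MessageFlags:
        # Missing keys fall back to the defaults, so old records still load.
        return cls(
            is_system_nudge=data.get("is_system_nudge", False),
            truncated=data.get("truncated", False),
            inherited_from=data.get("inherited_from"),
            is_trigger=data.get("is_trigger", False),
        )
```

An all-default instance serialises to an empty dict, and loading an empty dict yields the all-default instance again.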
@@ -317,6 +347,14 @@ class ConversationStore(Protocol):

    async def delete_parts_before(self, seq: int, run_id: str | None = None) -> None: ...

+    async def write_partial(self, seq: int, data: dict[str, Any]) -> None: ...
+
+    async def read_partial(self, seq: int) -> dict[str, Any] | None: ...
+
+    async def read_all_partials(self) -> list[dict[str, Any]]: ...
+
+    async def clear_partial(self, seq: int) -> None: ...
+
    async def close(self) -> None: ...

    async def destroy(self) -> None: ...
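The protocol above only declares the four partial-checkpoint methods. As an illustration of the contract they seem to imply (one overwritable slot per `seq`, kept apart from real parts), here is a minimal in-memory sketch. `InMemoryPartialStore` and `demo` are hypothetical names, and returning partials sorted by `seq` is an assumption rather than something the diff specifies.

```python
from __future__ import annotations

import asyncio
from typing import Any


class InMemoryPartialStore:
    """Hypothetical store implementing only the partial-checkpoint methods."""

    def __init__(self) -> None:
        self._partials: dict[int, dict[str, Any]] = {}

    async def write_partial(self, seq: int, data: dict[str, Any]) -> None:
        # Each write replaces the previous checkpoint for the same seq.
        self._partials[seq] = dict(data)

    async def read_partial(self, seq: int) -> dict[str, Any] | None:
        found = self._partials.get(seq)
        return dict(found) if found is not None else None

    async def read_all_partials(self) -> list[dict[str, Any]]:
        # Assumption: callers get partials ordered by seq.
        return [dict(self._partials[s]) for s in sorted(self._partials)]

    async def clear_partial(self, seq: int) -> None:
        # Clearing a seq that has no checkpoint is a no-op.
        self._partials.pop(seq, None)


async def demo() -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    store = InMemoryPartialStore()
    await store.write_partial(7, {"seq": 7, "content": "Hel"})
    await store.write_partial(7, {"seq": 7, "content": "Hello"})  # overwrite
    before = await store.read_all_partials()
    await store.clear_partial(7)  # the real part has landed
    after = await store.read_all_partials()
    return before, after
```

Running `asyncio.run(demo())` shows that only the latest checkpoint survives until it is cleared.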
@@ -389,9 +427,20 @@ class NodeConversation:
        store: ConversationStore | None = None,
        run_id: str | None = None,
        compaction_buffer_tokens: int | None = None,
+        compaction_buffer_ratio: float | None = None,
        compaction_warning_buffer_tokens: int | None = None,
    ) -> None:
        self._system_prompt = system_prompt
+        # Optional split: when a caller updates the prompt with a
+        # ``dynamic_suffix`` argument, we remember the static prefix and
+        # suffix separately so the LLM wrapper can emit them as two
+        # Anthropic system content blocks with a cache breakpoint between
+        # them. ``_system_prompt`` stays as the concatenated form used for
+        # persistence and for the legacy single-block LLM path.
+        # On restore, these default to the concat/empty pair — the next
+        # AgentLoop iteration's dynamic-prompt refresh step repopulates.
+        self._system_prompt_static: str = system_prompt
+        self._system_prompt_dynamic_suffix: str = ""
        self._max_context_tokens = max_context_tokens
        self._compaction_threshold = compaction_threshold
        # Buffer-based compaction trigger (Gap 7). When set, takes
@@ -401,6 +450,11 @@ class NodeConversation:
        # limit. If left as None the legacy threshold-based rule is
        # used, keeping old call sites behaving identically.
        self._compaction_buffer_tokens = compaction_buffer_tokens
+        # Ratio component of the hybrid buffer. Combines additively with
+        # _compaction_buffer_tokens so callers can express "reserve N tokens
+        # plus M% of the window" — the absolute floor matters on tiny
+        # windows, the ratio matters on large ones.
+        self._compaction_buffer_ratio = compaction_buffer_ratio
        self._compaction_warning_buffer_tokens = compaction_warning_buffer_tokens
        self._output_keys = output_keys
        self._store = store
@@ -415,15 +469,56 @@ class NodeConversation:

    @property
    def system_prompt(self) -> str:
+        """Full concatenated system prompt (static + dynamic suffix, if any).
+
+        This is the canonical form used for persistence and for the legacy
+        single-block LLM path. Split-prompt callers should read
+        ``system_prompt_static`` and ``system_prompt_dynamic_suffix`` instead.
+        """
        return self._system_prompt

-    def update_system_prompt(self, new_prompt: str) -> None:
+    @property
+    def system_prompt_static(self) -> str:
+        """Static prefix of the system prompt (cache-stable).
+
+        Equals ``system_prompt`` when no split is in use. When the AgentLoop
+        calls ``update_system_prompt(static, dynamic_suffix=...)``, this is
+        the piece sent as the cache-controlled first block.
+        """
+        return self._system_prompt_static
+
+    @property
+    def system_prompt_dynamic_suffix(self) -> str:
+        """Dynamic tail of the system prompt (not cached).
+
+        Empty unless the consumer splits its prompt. The LLM wrapper uses a
+        non-empty suffix to emit a two-block system content list with a
+        cache breakpoint between the static prefix and this tail.
+        """
+        return self._system_prompt_dynamic_suffix
+
+    def update_system_prompt(self, new_prompt: str, dynamic_suffix: str | None = None) -> None:
        """Update the system prompt.

        Used in continuous conversation mode at phase transitions to swap
        Layer 3 (focus) while preserving the conversation history.
+
+        When ``dynamic_suffix`` is provided, ``new_prompt`` is interpreted as
+        the STATIC prefix and ``dynamic_suffix`` as the per-turn tail; they
+        travel to the LLM as two separate system content blocks with a cache
+        breakpoint between them, but are persisted as a single concatenated
+        string for backward-compatible restore. Passing ``new_prompt`` alone
+        (suffix left None) keeps the legacy single-string behavior.
        """
-        self._system_prompt = new_prompt
+        if dynamic_suffix is None:
+            # Legacy single-string path — static == full, no suffix split.
+            self._system_prompt = new_prompt
+            self._system_prompt_static = new_prompt
+            self._system_prompt_dynamic_suffix = ""
+        else:
+            self._system_prompt_static = new_prompt
+            self._system_prompt_dynamic_suffix = dynamic_suffix
+            self._system_prompt = f"{new_prompt}\n\n{dynamic_suffix}" if dynamic_suffix else new_prompt
        self._meta_persisted = False  # re-persist with new prompt

    def set_current_phase(self, phase_id: str) -> None:
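The LLM wrapper that consumes the static/dynamic split is not part of this hunk. The sketch below shows one way the two properties could be turned into Anthropic-style system content blocks, with a `cache_control` marker on the static block acting as the cache breakpoint. `build_system_blocks` is a hypothetical helper, and leaving the single-block path unmarked is an assumption.

```python
from __future__ import annotations

from typing import Any


def build_system_blocks(static: str, dynamic_suffix: str) -> list[dict[str, Any]]:
    """Hypothetical helper: map the split prompt onto system content blocks."""
    blocks: list[dict[str, Any]] = [{"type": "text", "text": static}]
    if dynamic_suffix:
        # The breakpoint sits on the static block, so everything up to and
        # including it can be cached while the tail changes every turn.
        blocks[0]["cache_control"] = {"type": "ephemeral"}
        blocks.append({"type": "text", "text": dynamic_suffix})
    return blocks
```

With an empty suffix the helper returns a single plain block, which mirrors the legacy single-string behavior.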
@@ -462,6 +557,8 @@ class NodeConversation:
        is_transition_marker: bool = False,
        is_client_input: bool = False,
        image_content: list[dict[str, Any]] | None = None,
+        is_system_nudge: bool = False,
+        is_trigger: bool = False,
    ) -> Message:
        msg = Message(
            seq=self._next_seq,
@@ -472,6 +569,8 @@ class NodeConversation:
            is_transition_marker=is_transition_marker,
            is_client_input=is_client_input,
            image_content=image_content,
+            is_system_nudge=is_system_nudge,
+            is_trigger=is_trigger,
        )
        self._messages.append(msg)
        self._next_seq += 1
@@ -485,6 +584,8 @@ class NodeConversation:
        self,
        content: str,
        tool_calls: list[dict[str, Any]] | None = None,
+        *,
+        truncated: bool = False,
    ) -> Message:
        msg = Message(
            seq=self._next_seq,
@@ -493,6 +594,7 @@ class NodeConversation:
            tool_calls=tool_calls,
            phase_id=self._current_phase,
            run_id=self._run_id,
+            truncated=truncated,
        )
        self._messages.append(msg)
        self._next_seq += 1
@@ -548,6 +650,59 @@ class NodeConversation:

    # --- Query -------------------------------------------------------------

+    def find_completed_tool_call(
+        self,
+        name: str,
+        tool_input: dict[str, Any],
+        within_last_turns: int = 3,
+    ) -> Message | None:
+        """Return the most recent assistant message that issued a tool call
+        with the same (name + canonical-json args) AND received a non-error
+        tool result, within the last ``within_last_turns`` assistant turns.
+
+        Used by the replay detector to flag when the model is about to redo
+        a successful call — we prepend a steer onto the upcoming result but
+        still execute, so tools like browser_screenshot that are legitimately
+        repeated are not silently skipped.
+        """
+        try:
+            target_canonical = json.dumps(tool_input, sort_keys=True, default=str)
+        except (TypeError, ValueError):
+            target_canonical = str(tool_input)
+
+        # Walk backwards over recent assistant messages
+        assistant_turns_seen = 0
+        for idx in range(len(self._messages) - 1, -1, -1):
+            m = self._messages[idx]
+            if m.role != "assistant":
+                continue
+            assistant_turns_seen += 1
+            if assistant_turns_seen > within_last_turns:
+                break
+            if not m.tool_calls:
+                continue
+            for tc in m.tool_calls:
+                func = tc.get("function", {}) if isinstance(tc, dict) else {}
+                tc_name = func.get("name")
+                if tc_name != name:
+                    continue
+                args_str = func.get("arguments", "")
+                try:
+                    parsed = json.loads(args_str) if isinstance(args_str, str) else args_str
+                    canonical = json.dumps(parsed, sort_keys=True, default=str)
+                except (TypeError, ValueError):
+                    canonical = str(args_str)
+                if canonical != target_canonical:
+                    continue
+                # Found a match — now verify its result was not an error.
+                tc_id = tc.get("id")
+                for later in self._messages[idx + 1 :]:
+                    if later.role == "tool" and later.tool_use_id == tc_id:
+                        if not later.is_error:
+                            return m
+                        break
+        return None
+
    def to_llm_messages(self) -> list[dict[str, Any]]:
        """Return messages as OpenAI-format dicts (system prompt excluded).

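The replay detector compares tool calls by name plus a canonical JSON rendering of their arguments, so key order and whitespace do not matter. A small sketch of that normalisation, pulled out into a hypothetical `canonical_args` helper (the diff inlines this logic instead of naming it):

```python
import json
from typing import Any


def canonical_args(args: Any) -> str:
    """Hypothetical helper: render tool arguments in a key-order-independent form."""
    try:
        # Stored tool calls carry arguments as a JSON string; fresh calls
        # carry a dict. Both end up as sorted-key JSON text.
        parsed = json.loads(args) if isinstance(args, str) else args
        return json.dumps(parsed, sort_keys=True, default=str)
    except (TypeError, ValueError):
        # Unparseable input falls back to its plain string form.
        return str(args)


same = canonical_args('{"b": 1, "a": 2}') == canonical_args({"a": 2, "b": 1})
print(same)
```

Two calls count as a replay candidate only when this canonical text is identical and the earlier call produced a non-error result.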
@@ -749,19 +904,30 @@ class NodeConversation:
        """True when the conversation should be compacted before the
        next LLM call.

-        Buffer-based rule (Gap 7): trigger when the current estimate
-        plus the configured buffer would exceed the hard context limit.
-        Prevents compaction from firing only AFTER we're already over
-        the wire and forced into a reactive binary-split pass.
+        Hybrid buffer rule: the headroom reserved before compaction fires
+        is the SUM of an absolute fixed component and a ratio of the hard
+        context limit:

-        When no buffer is configured, falls back to the multiplicative
-        threshold the old callers were built around.
+            effective_buffer = compaction_buffer_tokens
+                             + compaction_buffer_ratio * max_context_tokens
+
+        The fixed component gives a floor on tiny windows; the ratio
+        keeps the trigger meaningful on large windows where any constant
+        buffer becomes a rounding error (an 8k buffer fires at 75% of a
+        32k window but only at 96% of a 200k window). Compaction fires
+        when the current estimate reaches (limit - effective_buffer).
+
+        When neither component is configured, falls back to the legacy
+        multiplicative threshold so old callers keep behaving identically.
        """
        if self._max_context_tokens <= 0:
            return False
-        if self._compaction_buffer_tokens is not None:
-            budget = self._max_context_tokens - self._compaction_buffer_tokens
-            return self.estimate_tokens() >= max(0, budget)
+        fixed = self._compaction_buffer_tokens
+        ratio = self._compaction_buffer_ratio
+        if fixed is not None or ratio is not None:
+            effective_buffer = (fixed or 0) + (ratio or 0.0) * self._max_context_tokens
+            budget = self._max_context_tokens - effective_buffer
+            return self.estimate_tokens() >= max(0.0, budget)
        return self.estimate_tokens() >= self._max_context_tokens * self._compaction_threshold

    def compaction_warning(self) -> bool:
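To make the hybrid rule concrete, here is a standalone restatement of the check with a worked number. `should_compact` is a hypothetical free function, and the `legacy_threshold` default of 0.8 is an assumed value for illustration; the rest follows the method body in the hunk above.

```python
from __future__ import annotations


def should_compact(
    estimated_tokens: int,
    max_context_tokens: int,
    buffer_tokens: int | None = None,
    buffer_ratio: float | None = None,
    legacy_threshold: float = 0.8,  # assumed default, for illustration only
) -> bool:
    """Hypothetical standalone version of the hybrid buffer check."""
    if max_context_tokens <= 0:
        return False
    if buffer_tokens is not None or buffer_ratio is not None:
        # Fixed and ratio components add up; a missing one counts as zero.
        effective_buffer = (buffer_tokens or 0) + (buffer_ratio or 0.0) * max_context_tokens
        budget = max_context_tokens - effective_buffer
        return estimated_tokens >= max(0.0, budget)
    # Neither component configured: legacy multiplicative threshold.
    return estimated_tokens >= max_context_tokens * legacy_threshold


# 8_000 fixed + 12.5% of a 200_000-token window reserves 33_000 tokens,
# so compaction fires once the estimate reaches 167_000 tokens.
print(should_compact(166_999, 200_000, 8_000, 0.125))  # False
print(should_compact(167_000, 200_000, 8_000, 0.125))  # True
```

On a 32_000-token window the same settings reserve 12_000 tokens, where the fixed part dominates; on the 200_000-token window the ratio part dominates.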
@@ -1365,6 +1531,45 @@ class NodeConversation:
            await self._persist_meta()
        await self._store.write_part(message.seq, message.to_storage_dict())
        await self._write_next_seq()
+        # Any partial checkpoint for this seq is now superseded by the real
+        # part — clear it so a future restore doesn't resurrect stale text.
+        try:
+            await self._store.clear_partial(message.seq)
+        except AttributeError:
+            # Older stores may not implement partials; ignore.
+            pass
+
+    async def checkpoint_partial_assistant(
+        self,
+        accumulated_text: str,
+        tool_calls: list[dict[str, Any]] | None = None,
+    ) -> None:
+        """Write an in-flight assistant turn's state to disk under the next seq.
+
+        Called from the stream event loop. Safe to call repeatedly — each call
+        overwrites the prior checkpoint. Persisted via ``write_partial`` so it
+        does NOT appear in ``read_parts()`` and cannot be double-loaded. Cleared
+        automatically when ``add_assistant_message`` for this seq lands.
+        """
+        if self._store is None:
+            return
+        if not self._meta_persisted:
+            await self._persist_meta()
+        payload: dict[str, Any] = {
+            "seq": self._next_seq,
+            "role": "assistant",
+            "content": accumulated_text,
+            "phase_id": self._current_phase,
+            "run_id": self._run_id,
+            "truncated": True,
+        }
+        if tool_calls:
+            payload["tool_calls"] = tool_calls
+        try:
+            await self._store.write_partial(self._next_seq, payload)
+        except AttributeError:
+            # Older stores may not implement partials; ignore.
+            pass

    async def _persist_meta(self) -> None:
        """Lazily write conversation metadata to the store (called once).
@@ -1379,6 +1584,7 @@ class NodeConversation:
            "max_context_tokens": self._max_context_tokens,
            "compaction_threshold": self._compaction_threshold,
            "compaction_buffer_tokens": self._compaction_buffer_tokens,
+            "compaction_buffer_ratio": self._compaction_buffer_ratio,
            "compaction_warning_buffer_tokens": (self._compaction_warning_buffer_tokens),
            "output_keys": self._output_keys,
        }
@@ -1428,6 +1634,7 @@ class NodeConversation:
            store=store,
            run_id=run_id,
            compaction_buffer_tokens=meta.get("compaction_buffer_tokens"),
+            compaction_buffer_ratio=meta.get("compaction_buffer_ratio"),
            compaction_warning_buffer_tokens=meta.get("compaction_warning_buffer_tokens"),
        )
        conv._meta_persisted = True
@@ -1461,4 +1668,45 @@ class NodeConversation:
        elif conv._messages:
            conv._next_seq = conv._messages[-1].seq + 1

+        # Surface any leftover partial checkpoints as truncated messages so
+        # the next turn sees what the interrupted stream was in the middle
+        # of producing. Only partials whose seq is >= next_seq are meaningful;
+        # anything lower was already superseded by a real part.
+        try:
+            partials = await store.read_all_partials()
+        except AttributeError:
+            partials = []
+        for p in partials:
+            pseq = p.get("seq", -1)
+            if pseq < conv._next_seq:
+                # Stale — clean it up.
+                try:
+                    await store.clear_partial(pseq)
+                except AttributeError:
+                    pass
+                continue
+            # Only resurrect partials relevant to this run / phase.
+            if run_id and not is_legacy_run_id(run_id) and p.get("run_id") != run_id:
+                continue
+            if phase_id and p.get("phase_id") is not None and p.get("phase_id") != phase_id:
+                continue
+            # Reconstruct as a truncated assistant message.
+            msg = Message(
+                seq=pseq,
+                role="assistant",
+                content=p.get("content", "") or "",
+                tool_calls=p.get("tool_calls"),
+                phase_id=p.get("phase_id"),
+                run_id=p.get("run_id"),
+                truncated=True,
+            )
+            conv._messages.append(msg)
+            conv._next_seq = max(conv._next_seq, pseq + 1)
+            logger.info(
+                "restore: resurrected truncated partial seq=%d (text=%d chars, tool_calls=%d)",
+                pseq,
+                len(msg.content),
+                len(msg.tool_calls or []),
+            )
+
        return conv
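The restore loop sorts leftover partials into stale ones (their seq was already taken by a real part) and live ones (to be resurrected as truncated messages). A simplified sketch of that triage as a pure function follows. `split_partials` is a hypothetical name, and the phase filter and the `is_legacy_run_id` check from the hunk are left out on purpose.

```python
from __future__ import annotations

from typing import Any


def split_partials(
    partials: list[dict[str, Any]],
    next_seq: int,
    run_id: str | None = None,
) -> tuple[list[int], list[dict[str, Any]]]:
    """Hypothetical helper: return (stale seqs to clear, partials to resurrect)."""
    stale: list[int] = []
    live: list[dict[str, Any]] = []
    for p in partials:
        seq = p.get("seq", -1)
        if seq < next_seq:
            # A real part already occupies this seq, so the checkpoint is stale.
            stale.append(seq)
        elif run_id is not None and p.get("run_id") != run_id:
            # Belongs to another run: neither cleared nor resurrected here.
            continue
        else:
            live.append(p)
    return stale, live
```

Stale seqs would be passed to `clear_partial`, while each live partial becomes an assistant message with `truncated=True`.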
@@ -371,6 +371,7 @@ async def llm_compact(
    char_limit: int = LLM_COMPACT_CHAR_LIMIT,
    max_depth: int = LLM_COMPACT_MAX_DEPTH,
    max_context_tokens: int = 128_000,
+    preserve_user_messages: bool = False,
 ) -> str:
    """Summarise *messages* with LLM, splitting recursively if too large.

@@ -378,6 +379,11 @@ async def llm_compact(
    rejects the call with a context-length error, the messages are split
    in half and each half is summarised independently.  Tool history is
    appended once at the top-level call (``_depth == 0``).
+
+    When ``preserve_user_messages`` is True, the prompt and system message
+    are amplified to instruct the LLM to keep every user message verbatim
+    and in full — used by the manual /compact-and-fork endpoint where the
+    user wants their voice carried into the new session intact.
    """
    from framework.agent_loop.conversation import extract_tool_call_history
    from framework.agent_loop.internals.tool_result_handler import is_context_too_large_error
@@ -401,6 +407,7 @@ async def llm_compact(
            char_limit=char_limit,
            max_depth=max_depth,
            max_context_tokens=max_context_tokens,
+            preserve_user_messages=preserve_user_messages,
        )
    else:
        prompt = build_llm_compaction_prompt(
@@ -408,17 +415,30 @@ async def llm_compact(
            accumulator,
            formatted,
            max_context_tokens=max_context_tokens,
+            preserve_user_messages=preserve_user_messages,
        )
+        if preserve_user_messages:
+            system_msg = (
+                "You are a conversation compactor for an AI agent. "
+                "Write a detailed summary that allows the agent to "
+                "continue its work. CRITICAL: reproduce every user "
+                "message verbatim and in full inside the 'User Messages' "
+                "section — do not paraphrase, truncate, or merge them. "
+                "Assistant turns and tool results may be summarised, but "
+                "user input is sacred."
+            )
+        else:
+            system_msg = (
+                "You are a conversation compactor for an AI agent. "
+                "Write a detailed summary that allows the agent to "
+                "continue its work. Preserve user-stated rules, "
+                "constraints, and account/identity preferences verbatim."
+            )
        summary_budget = max(1024, max_context_tokens // 2)
        try:
            response = await ctx.llm.acomplete(
                messages=[{"role": "user", "content": prompt}],
-                system=(
-                    "You are a conversation compactor for an AI agent. "
-                    "Write a detailed summary that allows the agent to "
-                    "continue its work. Preserve user-stated rules, "
-                    "constraints, and account/identity preferences verbatim."
-                ),
+                system=system_msg,
                max_tokens=summary_budget,
            )
            summary = response.content
@@ -437,6 +457,7 @@ async def llm_compact(
                    char_limit=char_limit,
                    max_depth=max_depth,
                    max_context_tokens=max_context_tokens,
+                    preserve_user_messages=preserve_user_messages,
                )
            else:
                raise
@@ -459,6 +480,7 @@ async def _llm_compact_split(
    char_limit: int = LLM_COMPACT_CHAR_LIMIT,
    max_depth: int = LLM_COMPACT_MAX_DEPTH,
    max_context_tokens: int = 128_000,
+    preserve_user_messages: bool = False,
 ) -> str:
    """Split messages in half and summarise each half independently."""
    mid = max(1, len(messages) // 2)
@@ -470,6 +492,7 @@ async def _llm_compact_split(
        char_limit=char_limit,
        max_depth=max_depth,
        max_context_tokens=max_context_tokens,
+        preserve_user_messages=preserve_user_messages,
    )
    s2 = await llm_compact(
        ctx,
@@ -479,6 +502,7 @@ async def _llm_compact_split(
        char_limit=char_limit,
        max_depth=max_depth,
        max_context_tokens=max_context_tokens,
+        preserve_user_messages=preserve_user_messages,
    )
    return s1 + "\n\n" + s2

@@ -510,6 +534,7 @@ def build_llm_compaction_prompt(
    formatted_messages: str,
    *,
    max_context_tokens: int = 128_000,
+    preserve_user_messages: bool = False,
 ) -> str:
    """Build prompt for LLM compaction targeting 50% of token budget.

@@ -539,6 +564,18 @@ def build_llm_compaction_prompt(
    target_chars = target_tokens * 4
    node_ctx = "\n".join(ctx_lines)

+    user_messages_section = (
+        "6. **User Messages** — Reproduce EVERY user message verbatim and "
+        "in full, in chronological order, each on its own line prefixed "
+        'with the message index (e.g. "[U1] ..."). Do NOT paraphrase, '
+        "summarise, merge, or omit any user message. Preserve markdown, "
+        "code fences, whitespace, and punctuation exactly as the user "
+        "wrote them.\n"
+        if preserve_user_messages
+        else "6. **User Messages** — Preserve ALL user-stated rules, constraints, "
+        "identity preferences, and account details verbatim.\n"
+    )
+
    return (
        "You are compacting an AI agent's conversation history. "
        "The agent is still working and needs to continue.\n\n"
@@ -559,8 +596,7 @@ def build_llm_compaction_prompt(
        "resolved. Include root causes so the agent doesn't repeat them.\n"
        "5. **Problem Solving Efforts** — Approaches tried, dead ends hit, "
        "and reasoning behind the current strategy.\n"
-        "6. **User Messages** — Preserve ALL user-stated rules, constraints, "
-        "identity preferences, and account details verbatim.\n"
+        f"{user_messages_section}"
        "7. **Pending Tasks** — Work remaining, outputs still needed, and "
        "any blockers.\n"
        "8. **Current Work** — The most recent action taken and the immediate "
@@ -12,6 +12,7 @@ import json
 import logging
 from collections.abc import Awaitable, Callable
 from dataclasses import dataclass
+from datetime import datetime
 from typing import Any

 from framework.agent_loop.conversation import ConversationStore, NodeConversation
@@ -191,15 +192,21 @@ async def drain_injection_queue(
                    else:
                        logger.info("[drain] no vision fallback available; images dropped")
                image_content = None
-            # Real user input is stored as-is; external events get a prefix
+            # Stamp every injected event with its arrival time so the model
+            # has a consistent temporal log to reason over (and so the
+            # stamp lives inside byte-stable conversation history instead
+            # of a per-turn system-prompt tail). Minute precision is what
+            # the queen needs for conversational / scheduling context.
+            stamp = datetime.now().astimezone().strftime("%Y-%m-%d %H:%M %Z")
            if is_client_input:
+                stamped = f"[{stamp}] {content}" if content else f"[{stamp}]"
                await conversation.add_user_message(
-                    content,
+                    stamped,
                    is_client_input=True,
                    image_content=image_content,
                )
            else:
-                await conversation.add_user_message(f"[External event]: {content}")
+                await conversation.add_user_message(f"[{stamp}] [External event] {content}")
            count += 1
        except asyncio.QueueEmpty:
            break
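The stamping logic in this hunk can be read as a small pure function. The sketch below reproduces the three output shapes; `stamp_event` is a hypothetical name, and the `now` parameter is an addition that makes the result reproducible (the diff always uses the current local time).

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def stamp_event(
    content: str,
    *,
    is_client_input: bool,
    now: datetime | None = None,  # hypothetical injection point for a fixed clock
) -> str:
    """Hypothetical helper: prefix an injected event with its arrival time."""
    moment = now if now is not None else datetime.now().astimezone()
    stamp = moment.strftime("%Y-%m-%d %H:%M %Z")  # minute precision plus zone name
    if is_client_input:
        # An empty user message still gets a bare timestamp.
        return f"[{stamp}] {content}" if content else f"[{stamp}]"
    return f"[{stamp}] [External event] {content}"


fixed = datetime(2026, 4, 23, 15, 34, tzinfo=timezone(timedelta(hours=-7), "PDT"))
print(stamp_event("deploy finished", is_client_input=False, now=fixed))
# [2026-04-23 15:34 PDT] [External event] deploy finished
```

Because the stamp is part of the stored message text, it stays byte-stable across turns, unlike a timestamp placed in a per-turn system prompt tail.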
@@ -232,9 +239,12 @@ async def drain_trigger_queue(
        payload_str = json.dumps(t.payload, default=str)
        parts.append(f"[TRIGGER: {t.trigger_type}/{t.source_id}]{task_line}\n{payload_str}")

-    combined = "\n\n".join(parts)
+    stamp = datetime.now().astimezone().strftime("%Y-%m-%d %H:%M %Z")
+    combined = f"[{stamp}]\n" + "\n\n".join(parts)
    logger.info("[drain] %d trigger(s): %s", len(triggers), combined[:200])
-    await conversation.add_user_message(combined)
+    # Tag the message so the UI can render a banner instead of the raw
+    # `[TRIGGER: ...]` text. The LLM still sees `combined` verbatim.
+    await conversation.add_user_message(combined, is_trigger=True)
    return len(triggers)


@@ -108,6 +108,8 @@ async def publish_llm_turn_complete(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
+    cache_creation_tokens: int = 0,
+    cost_usd: float = 0.0,
    execution_id: str = "",
    iteration: int | None = None,
 ) -> None:
@@ -120,6 +122,8 @@ async def publish_llm_turn_complete(
            input_tokens=input_tokens,
            output_tokens=output_tokens,
            cached_tokens=cached_tokens,
+            cache_creation_tokens=cache_creation_tokens,
+            cost_usd=cost_usd,
            execution_id=execution_id,
            iteration=iteration,
        )
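This hunk only forwards `cache_creation_tokens` and `cost_usd`; how the cost is computed is not shown. As a rough sketch of the arithmetic such a figure usually involves, the example below prices the four token counters separately. `Rates` and `estimate_cost_usd` are hypothetical, the per-million prices are placeholders rather than real provider rates, and treating `input_tokens` as excluding cached tokens is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical USD prices per one million tokens (placeholder values)."""

    input_per_mtok: float
    output_per_mtok: float
    cache_read_per_mtok: float
    cache_write_per_mtok: float


def estimate_cost_usd(
    rates: Rates,
    *,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    cache_creation_tokens: int = 0,
) -> float:
    # Assumption: input_tokens counts only uncached input, so the four
    # counters do not overlap and can be priced independently.
    total = (
        input_tokens * rates.input_per_mtok
        + output_tokens * rates.output_per_mtok
        + cached_tokens * rates.cache_read_per_mtok
        + cache_creation_tokens * rates.cache_write_per_mtok
    )
    return total / 1_000_000


placeholder = Rates(2.0, 8.0, 0.5, 4.0)
cost = estimate_cost_usd(
    placeholder,
    input_tokens=1_000_000,
    output_tokens=500_000,
    cached_tokens=2_000_000,
    cache_creation_tokens=250_000,
)
print(cost)  # 8.0
```

If a provider reports `input_tokens` inclusive of cached tokens, the cached share has to be subtracted before pricing, otherwise it is billed twice.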
@@ -91,108 +91,66 @@ def sanitize_ask_user_inputs(
    return q, recovered


+ask_user_prompt = """\
+Use this tool when you need to ask the user questions during execution. Reach for it when:
+
+- The task is ambiguous and the user needs to choose an approach
+- You need missing information to continue
+- You want approval before taking a meaningful action
+- A decision has real trade-offs the user should weigh in on
+- You want post-task feedback, or to offer saving a skill or updating memory
+
+Usage notes:
+- Users will always be able to select "Other" to provide custom text input, \
+so do not include catch-all options like "Other" or "Something else" yourself.
+- Each option is a plain string. Do NOT wrap options in `{"label": "..."}` or \
+`{"value": "..."}` objects — pass the raw choice text directly, e.g. `"Email"`, \
+not `{"label": "Email"}`.
+- If you recommend a specific option, make that the first option in the list \
+and append " (Recommended)" to the end of its text.
+- Call this tool whenever you need the user's response.
+- The prompt field must be plain text only.
+- Do not include XML, pseudo-tags, or inline option lists inside prompt.
+- Omit options only when the question truly requires a free-form response the \
+user must type out, such as describing an idea or pasting an error message.
+- Do not repeat the questions in your normal text response. The widget renders \
+them, so keep any surrounding text to a brief intro only.
+Example — single question with options:
+{"questions": [{"id": "next", "prompt": "What would you like to do?", \
+"options": ["Build a new agent (Recommended)", "Modify existing agent", "Run tests"]}]}
+
+Example — batch:
+{"questions": [
+  {"id": "scope", "prompt": "What scope?", "options": ["Full", "Partial"]},
+  {"id": "format", "prompt": "Output format?", "options": ["PDF", "CSV", "JSON"]},
+  {"id": "details", "prompt": "Any special requirements?"}
+]}
+
+Example — free-form (queen only):
+{"questions": [{"id": "idea", "prompt": "Describe the agent you want to build."}]}
+"""
+
+
 def build_ask_user_tool() -> Tool:
    """Build the synthetic ask_user tool for explicit user-input requests.

-    The queen calls ask_user() when it needs to pause and wait
-    for user input.  Text-only turns WITHOUT ask_user flow through without
-    blocking, allowing progress updates and summaries to stream freely.
+    The queen calls ask_user() when it needs to pause and wait for user
+    input. Accepts an array of 1-8 questions — a single question for the
+    common case, or a batch when several clarifications are needed at once.
+    Text-only turns WITHOUT ask_user flow through without blocking, allowing
+    progress updates and summaries to stream freely.
    """
    return Tool(
        name="ask_user",
-        description=(
-            "You MUST call this tool whenever you need the user's response. "
-            "Always call it after greeting the user, asking a question, or "
-            "requesting approval. Do NOT call it for status updates or "
-            "summaries that don't require a response.\n\n"
-            "STRUCTURE RULES (CRITICAL):\n"
-            "- The 'question' field is PLAIN TEXT shown to the user. Do NOT "
-            "include XML tags, pseudo-tags like </question>, or option lists "
-            "in the question string. The UI does not parse them — they "
-            "render as raw text and look broken.\n"
-            "- The 'options' parameter is the ONLY way to render buttons. "
-            "If you want buttons, put them in the 'options' array, not in "
-            "the question string. Do NOT write 'OPTIONS: [...]', "
-            "'_options: [...]', or any inline list inside 'question'.\n"
-            "- The question text must read as a single clean prompt with "
-            "no markup. Example: 'What would you like to do?' — not "
-            "'What would you like to do?</question>'.\n\n"
-            "USAGE:\n"
-            "Always include 2-3 predefined options. The UI automatically "
-            "appends an 'Other' free-text input after your options, so NEVER "
-            "include catch-all options like 'Custom idea', 'Something else', "
-            "'Other', or 'None of the above' — the UI handles that. "
-            "When the question primarily needs a typed answer but you must "
-            "include options, make one option signal that typing is expected "
-            "(e.g. 'I\\'ll type my response'). This helps users discover the "
-            "free-text input. "
-            "The ONLY exception: omit options when the question demands a "
-            "free-form answer the user must type out (e.g. 'Describe your "
-            "agent idea', 'Paste the error message').\n\n"
-            "CORRECT EXAMPLE:\n"
-            '{"question": "What would you like to do?", "options": '
-            '["Build a new agent", "Modify existing agent", "Run tests"]}\n\n'
-            "FREE-FORM EXAMPLE:\n"
-            '{"question": "Describe the agent you want to build."}\n\n'
-            "WRONG (do NOT do this — buttons will not render):\n"
-            '{"question": "What now?</question>\\n_OPTIONS: [\\"A\\", \\"B\\"]"}'
-        ),
-        parameters={
-            "type": "object",
-            "properties": {
-                "question": {
-                    "type": "string",
-                    "description": "The question or prompt shown to the user.",
-                },
-                "options": {
-                    "type": "array",
-                    "items": {"type": "string"},
-                    "description": (
-                        "2-3 specific predefined choices. Include in most cases. "
-                        'Example: ["Option A", "Option B", "Option C"]. '
-                        "The UI always appends an 'Other' free-text input, so "
-                        "do NOT include catch-alls like 'Custom idea' or 'Other'. "
-                        "Omit ONLY when the user must type a free-form answer."
-                    ),
-                    "minItems": 2,
-                    "maxItems": 3,
-                },
-            },
-            "required": ["question"],
-        },
-    )
-
-
-def build_ask_user_multiple_tool() -> Tool:
-    """Build the synthetic ask_user_multiple tool for batched questions.
-
-    Queen-only tool that presents multiple questions at once so the user
-    can answer them all in a single interaction rather than one at a time.
-    """
-    return Tool(
-        name="ask_user_multiple",
-        description=(
-            "Ask the user multiple questions at once. Use this instead of "
-            "ask_user when you have 2 or more questions to ask in the same "
-            "turn — it lets the user answer everything in one go rather than "
-            "going back and forth. Each question can have its own predefined "
-            "options (2-3 choices) or be free-form. The UI renders all "
-            "questions together with a single Submit button. "
-            "ALWAYS prefer this over ask_user when you have multiple things "
-            "to clarify. "
-            "IMPORTANT: Do NOT repeat the questions in your text response — "
-            "the widget renders them. Keep your text to a brief intro only. "
-            '{"questions": ['
-            '  {"id": "scope", "prompt": "What scope?", "options": ["Full", "Partial"]},'
-            '  {"id": "format", "prompt": "Output format?", "options": ["PDF", "CSV", "JSON"]},'
-            '  {"id": "details", "prompt": "Any special requirements?"}'
-            "]}"
-        ),
+        description=ask_user_prompt,
        parameters={
            "type": "object",
            "properties": {
                "questions": {
                    "type": "array",
+                    "minItems": 1,
+                    "maxItems": 8,
+                    "description": "List of questions to present to the user.",
                    "items": {
                        "type": "object",
                        "properties": {
@@ -208,8 +166,13 @@ def build_ask_user_multiple_tool() -> Tool:
                                "type": "array",
                                "items": {"type": "string"},
                                "description": (
-                                    "2-3 predefined choices. The UI appends an "
-                                    "'Other' free-text input automatically. "
+                                    "2-3 predefined choices as plain strings "
+                                    '(e.g. ["Yes", "No", "Maybe"]). Do NOT '
+                                    'wrap items in {"label": "..."} or '
+                                    '{"value": "..."} objects — pass the raw '
+                                    "choice text directly. The UI appends an "
+                                    "'Other' free-text input automatically, "
+                                    "so don't include catch-all options. "
                                    "Omit only when the user must type a free-form answer."
                                ),
                                "minItems": 2,
@@ -218,9 +181,6 @@ def build_ask_user_multiple_tool() -> Tool:
                        },
                        "required": ["id", "prompt"],
                    },
-                    "minItems": 2,
-                    "maxItems": 8,
-                    "description": "List of questions to present to the user.",
                },
            },
            "required": ["questions"],
@@ -0,0 +1,291 @@
+"""Generic coercion of LLM-emitted tool arguments to match each tool's JSON schema.
+
+Small/mid-size models drift from tool schemas in predictable, boring ways:
+
+- A number field comes back as a string (``"42"`` instead of ``42``).
+- A boolean field comes back as a string (``"true"`` instead of ``True``).
+- An array-of-string field comes back as an array of objects
+  (``[{"label": "A"}, ...]`` instead of ``["A", ...]``).
+- An array/object field comes back as a JSON-encoded string
+  (``'["A","B"]'`` instead of ``["A", "B"]``).
+- A lone scalar arrives where the schema expects an array.
+
+This module centralizes the healing in one schema-driven pass that runs
+on every tool call before dispatch. Coercion is conservative:
+
+- Values that already match the expected type are untouched.
+- Shapes we don't recognize are returned as-is, so real bugs surface
+  instead of getting silently munged into something plausible.
+- Every actual coercion is logged with the tool, property, and shape
+  transition so we can see which models/tools are drifting.
+
+Tool-specific prompt drift (e.g. ``</question>`` tags leaking into an
+``ask_user`` prompt string) is NOT this module's job — that belongs in
+per-tool sanitizers, because it's about prompt style, not schema shape.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from typing import Any
+
+from framework.llm.provider import Tool
+
+logger = logging.getLogger(__name__)
+
+# When an ``array<string>`` field arrives as an array of objects, look
+# for a text-carrying field in preference order. Covers the wrappers
+# small models tend to produce: ``[{"label": "A"}]``, ``[{"value": "A"}]``,
+# ``[{"text": "A"}]``, etc.
+_STRING_EXTRACT_KEYS: tuple[str, ...] = (
+    "label",
+    "value",
+    "text",
+    "name",
+    "title",
+    "display",
+)
+
+
+def coerce_tool_input(tool: Tool, raw_input: dict[str, Any] | None) -> dict[str, Any]:
+    """Coerce *raw_input* in place to match *tool*'s JSON schema.
+
+    Returns the mutated input dict (same object as *raw_input* when
+    possible, for callers that assume in-place mutation). Properties
+    not present in the schema are left untouched.
+    """
+    if not isinstance(raw_input, dict):
+        return raw_input or {}
+
+    schema = tool.parameters or {}
+    props = schema.get("properties")
+    if not isinstance(props, dict):
+        return raw_input
+
+    for key in list(raw_input.keys()):
+        prop_schema = props.get(key)
+        if not isinstance(prop_schema, dict):
+            continue
+        original = raw_input[key]
+        coerced = _coerce(original, prop_schema)
+        if coerced is not original:
+            logger.info(
+                "coerced tool input tool=%s prop=%s from=%s to=%s",
+                tool.name,
+                key,
+                _shape(original),
+                _shape(coerced),
+            )
+            raw_input[key] = coerced
+
+    return raw_input
+
+
+def _coerce(value: Any, schema: dict[str, Any]) -> Any:
+    """Dispatch on the schema's ``type`` field.
+
+    Returns the *same object* on passthrough so callers can detect
+    no-ops via identity (``coerced is value``).
+    """
+    expected = schema.get("type")
+    if not expected:
+        return value
+
+    # Union type: try each in order, return the first coercion that
+    # actually changes the value. Falls back to the original.
+    if isinstance(expected, list):
+        for t in expected:
+            sub_schema = {**schema, "type": t}
+            coerced = _coerce(value, sub_schema)
+            if coerced is not value:
+                return coerced
+        return value
+
+    if expected == "integer":
+        return _coerce_integer(value)
+    if expected == "number":
+        return _coerce_number(value)
+    if expected == "boolean":
+        return _coerce_boolean(value)
+    if expected == "string":
+        return _coerce_string(value)
+    if expected == "array":
+        return _coerce_array(value, schema)
+    if expected == "object":
+        return _coerce_object(value, schema)
+
+    return value
+
+
+def _coerce_integer(value: Any) -> Any:
+    # bool is a subclass of int in Python; don't mistake True for 1 here.
+    if isinstance(value, bool):
+        return value
+    if isinstance(value, int):
+        return value
+    if isinstance(value, str):
+        parsed = _parse_number(value)
+        if parsed is None:
+            return value
+        if parsed != int(parsed):
+            # Has a fractional part — caller asked for int, don't truncate.
+            return value
+        return int(parsed)
+    return value
+
+
+def _coerce_number(value: Any) -> Any:
+    if isinstance(value, bool):
+        return value
+    if isinstance(value, (int, float)):
+        return value
+    if isinstance(value, str):
+        parsed = _parse_number(value)
+        if parsed is None:
+            return value
+        if parsed == int(parsed):
+            return int(parsed)
+        return parsed
+    return value
+
+
+def _coerce_boolean(value: Any) -> Any:
+    if isinstance(value, bool):
+        return value
+    if isinstance(value, str):
+        low = value.strip().lower()
+        if low == "true":
+            return True
+        if low == "false":
+            return False
+    return value
+
+
+def _coerce_string(value: Any) -> Any:
+    if isinstance(value, str):
+        return value
+    # Common drift: model sent ``{"label": "..."}`` when we wanted "...".
+    if isinstance(value, dict):
+        extracted = _extract_string_from_object(value)
+        if extracted is not None:
+            return extracted
+    return value
+
+
+def _coerce_array(value: Any, schema: dict[str, Any]) -> Any:
+    # Heal: JSON-encoded array string → array.
+    if isinstance(value, str):
+        parsed = _try_parse_json(value)
+        if isinstance(parsed, list):
+            value = parsed
+        else:
+            # Scalar string where an array is expected — wrap it.
+            return [value]
+    elif not isinstance(value, list):
+        # Any other scalar (int, bool, dict, ...) — wrap.
+        return [value]
+
+    items_schema = schema.get("items")
+    if not isinstance(items_schema, dict):
+        return value
+
+    coerced_items: list[Any] = []
+    changed = False
+    for item in value:
+        c = _coerce(item, items_schema)
+        if c is not item:
+            changed = True
+        coerced_items.append(c)
+    return coerced_items if changed else value
+
+
+def _coerce_object(value: Any, schema: dict[str, Any]) -> Any:
+    # Heal: JSON-encoded object string → object.
+    if isinstance(value, str):
+        parsed = _try_parse_json(value)
+        if isinstance(parsed, dict):
+            value = parsed
+        else:
+            return value
+    if not isinstance(value, dict):
+        return value
+
+    sub_props = schema.get("properties")
+    if not isinstance(sub_props, dict):
+        return value
+
+    for k in list(value.keys()):
+        sub_schema = sub_props.get(k)
+        if not isinstance(sub_schema, dict):
+            continue
+        original = value[k]
+        coerced = _coerce(original, sub_schema)
+        if coerced is not original:
+            value[k] = coerced
+    # Always return the same dict, mutated in place, so callers that
+    # passed a shared reference see the updates. A dict that arrived as
+    # a JSON string is returned as the freshly parsed object, which is
+    # how upstream identity checks detect that case. In-place fixes to
+    # nested values keep the dict's identity and are not logged upstream.
+    return value
+
+
+def _extract_string_from_object(obj: dict[str, Any]) -> str | None:
+    """Pick a likely-text field out of a wrapper object.
+
+    Tries the known keys first, falls back to the sole value if the
+    object has exactly one entry. Returns None when nothing plausible
+    is found — the caller keeps the original.
+    """
+    for k in _STRING_EXTRACT_KEYS:
+        v = obj.get(k)
+        if isinstance(v, str) and v:
+            return v
+    if len(obj) == 1:
+        (only,) = obj.values()
+        if isinstance(only, str) and only:
+            return only
+    return None
+
+
+def _try_parse_json(raw: str) -> Any:
+    try:
+        return json.loads(raw)
+    except (ValueError, TypeError):
+        return None
+
+
+def _parse_number(raw: str) -> float | None:
+    try:
+        f = float(raw)
+    except (ValueError, OverflowError):
+        return None
+    # Reject NaN and inf — they pass float() but aren't useful numeric
+    # values for tool arguments.
+    if f != f or f == float("inf") or f == float("-inf"):
+        return None
+    return f
+
+
+def _shape(value: Any) -> str:
+    """Short type/shape description used in coercion log lines."""
+    if value is None:
+        return "None"
+    if isinstance(value, bool):
+        return "bool"
+    if isinstance(value, int):
+        return "int"
+    if isinstance(value, float):
+        return "float"
+    if isinstance(value, str):
+        return f"str[{len(value)}]"
+    if isinstance(value, list):
+        if not value:
+            return "list[0]"
+        return f"list[{len(value)}]<{_shape(value[0])}>"
+    if isinstance(value, dict):
+        keys = sorted(value.keys())[:3]
+        suffix = ",…" if len(value) > 3 else ""
+        return f"dict{{{','.join(keys)}{suffix}}}"
+    return type(value).__name__
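
The array-of-strings healing described in the module docstring above can be illustrated in isolation. This is a minimal standalone sketch, not the module's implementation: `heal_string_array` and `EXTRACT_KEYS` are hypothetical names, the key list is a shortened assumption, and unlike the real module the sketch ignores the schema and always returns a new list.

```python
import json
from typing import Any

# Hypothetical, shortened stand-in for the module's preference-ordered key list.
EXTRACT_KEYS = ("label", "value", "text")


def heal_string_array(value: Any) -> Any:
    """Illustrative only: heal common drifts for an array-of-strings field."""
    if isinstance(value, str):
        # A JSON-encoded array string becomes a list; any other string is wrapped.
        try:
            parsed = json.loads(value)
        except ValueError:
            parsed = None
        value = parsed if isinstance(parsed, list) else [value]
    elif not isinstance(value, list):
        # Lone non-string scalar where an array is expected: wrap it.
        return [value]

    healed = []
    for item in value:
        if isinstance(item, dict):
            # Unwrap {"label": "A"}-style objects; keep unrecognized shapes as-is.
            text = next(
                (item[k] for k in EXTRACT_KEYS if isinstance(item.get(k), str) and item[k]),
                None,
            )
            healed.append(text if text is not None else item)
        else:
            healed.append(item)
    return healed
```

Shapes the sketch does not recognize (for example `{"foo": 1}`) pass through unchanged, which mirrors the conservative stance the docstring describes.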
@@ -69,6 +69,20 @@ class LoopConfig:
    # and less tight than Anthropic's own counting. Override via
    # LoopConfig for larger windows.
    compaction_buffer_tokens: int = 8_000
+    # Ratio-based component of the hybrid compaction buffer. Effective
+    # headroom reserved before compaction fires is
+    #   compaction_buffer_tokens + compaction_buffer_ratio * max_context_tokens
+    # The ratio scales with the model's window where the absolute fixed
+    # component does not (an 8k absolute buffer is 75% trigger on a 32k
+    # window but 96% on a 200k window). Combining them gives an absolute
+    # floor sized for the worst-case single tool result (one un-spilled
+    # max_tool_result_chars payload ≈ 30k chars ≈ 7.5k tokens, rounded to
+    # 8k) plus a fractional headroom that keeps the trigger meaningful on
+    # large windows, so the inner tool loop always has room to grow
+    # without tripping the mid-turn pre-send guard. Defaults: 8k + 15%.
+    # On 32k that's a 12.8k buffer (~60% trigger); on 200k it's 38k
+    # (~81% trigger); on 1M it's 158k (~84% trigger).
+    compaction_buffer_ratio: float = 0.15
    # Warning is emitted one buffer earlier so the user/telemetry gets
    # a "we're close" signal without triggering a compaction pass.
    compaction_warning_buffer_tokens: int = 12_000
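
The hybrid buffer arithmetic in the comment above can be checked with a short sketch. The function name `compaction_trigger` is hypothetical and not part of the codebase; the sketch only reproduces the formula `compaction_buffer_tokens + compaction_buffer_ratio * max_context_tokens` with the stated defaults, and assumes "32k" means 32,000 tokens.

```python
def compaction_trigger(
    max_context_tokens: int,
    buffer_tokens: int = 8_000,
    buffer_ratio: float = 0.15,
) -> tuple[float, float]:
    """Return (reserved headroom in tokens, fraction of the window at which compaction fires)."""
    # Fixed floor plus a share of the window, as described in the LoopConfig comment.
    buffer = buffer_tokens + buffer_ratio * max_context_tokens
    trigger_fraction = (max_context_tokens - buffer) / max_context_tokens
    return buffer, trigger_fraction


if __name__ == "__main__":
    for window in (32_000, 200_000, 1_000_000):
        buffer, fraction = compaction_trigger(window)
        print(f"{window:>9} tokens: buffer={buffer:,.0f} trigger={fraction:.0%}")
```

Setting `buffer_ratio=0.0` recovers the fixed-only behavior, where the trigger point drifts from 75% on a 32k window to 96% on a 200k window.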
@@ -131,14 +145,39 @@ class LoopConfig:
    # Per-tool-call timeout.
    tool_call_timeout_seconds: float = 60.0

-    # LLM stream inactivity watchdog. If no stream event (delta, tool call,
-    # finish) arrives within this many seconds, the stream task is cancelled
-    # and a transient error is raised so the retry loop can back off and
-    # reconnect. Prevents agents from hanging forever on a silently dead
-    # HTTP connection (no provider heartbeat, no exception, just silence).
-    # Set to 0 to disable.
+    # LLM stream inactivity watchdog. Split into two budgets so legitimate
+    # slow TTFT on large contexts doesn't get mistaken for a dead connection.
+    # - ttft: stream open -> first event. Large-context local models can
+    #   legitimately take minutes before the first token arrives.
+    # - inter_event: last event -> now, ONLY after the first event. A stream
+    #   that started producing and then went silent is a real stall.
+    # Whichever fires first cancels the stream. Set to 0 to disable that
+    # individual budget; set both to 0 to fully disable the watchdog.
+    llm_stream_ttft_timeout_seconds: float = 600.0
+    llm_stream_inter_event_idle_seconds: float = 120.0
+    # Deprecated alias — kept so existing configs keep working. If set to a
+    # non-default value it overrides inter_event_idle (historical behavior).
    llm_stream_inactivity_timeout_seconds: float = 120.0

+    # Continue-nudge recovery. When the idle watchdog fires on a live but
+    # stuck stream, cancel the stream and append a short continuation
+    # hint to the conversation instead of raising a ConnectionError and
+    # re-running the whole turn. Preserves any partial text/tool-calls the
+    # stream emitted before the stall.
+    continue_nudge_enabled: bool = True
+    # Cap so a truly dead endpoint eventually falls back to the error path
+    # instead of nudging forever.
+    continue_nudge_max_per_turn: int = 3
+
+    # Tool-call replay detector. When the model emits a tool call whose
+    # (name + canonical-args) matches a prior successful call in the last
+    # K assistant turns, emit telemetry and prepend a short steer onto the
+    # tool result — but still execute. Weaker models legitimately repeat
+    # read-only calls (screenshot, evaluate), so silent skipping would
+    # cause surprising behavior.
+    replay_detector_enabled: bool = True
+    replay_detector_within_last_turns: int = 3
+
    # Subagent delegation timeout (wall-clock max).
    subagent_timeout_seconds: float = 3600.0

@@ -53,7 +53,14 @@ def build_prompt_spec(
    # trigger tools are present in this agent's tool list (e.g. browser_*
    # pulls in hive.browser-automation). Keeps non-browser agents lean.
    tool_names = [getattr(t, "name", "") for t in (getattr(ctx, "available_tools", None) or [])]
-    skills_catalog_prompt = augment_catalog_for_tools(ctx.skills_catalog_prompt or "", tool_names)
+    raw_catalog = ctx.skills_catalog_prompt or ""
+    dynamic_catalog = getattr(ctx, "dynamic_skills_catalog_provider", None)
+    if dynamic_catalog is not None:
+        try:
+            raw_catalog = dynamic_catalog() or ""
+        except Exception:
+            raw_catalog = ctx.skills_catalog_prompt or ""
+    skills_catalog_prompt = augment_catalog_for_tools(raw_catalog, tool_names)

    return PromptSpec(
        identity_prompt=ctx.identity_prompt or "",
@@ -182,7 +182,24 @@ class AgentContext:

    dynamic_tools_provider: Any = None
    dynamic_prompt_provider: Any = None
+    # Optional Callable[[], str]: when set alongside ``dynamic_prompt_provider``,
+    # the AgentLoop sends the system prompt as two pieces — the result of
+    # ``dynamic_prompt_provider`` is the STATIC block (cached), and this
+    # provider returns the DYNAMIC suffix (not cached). The LLM wrapper
+    # emits them as two Anthropic system content blocks with a cache
+    # breakpoint between them for providers that honor ``cache_control``.
+    # For providers that don't, the two strings are concatenated. Used by
+    # the Queen to keep her persona/role/tools block warm across iterations
+    # while the recall + timestamp tail refreshes per user turn.
+    dynamic_prompt_suffix_provider: Any = None
    dynamic_memory_provider: Any = None
+    # Optional Callable[[], str]: when set, the current skills-catalog
+    # prompt is sourced from this provider each iteration. Lets workers
+    # pick up UI toggles without restarting the run. Queen agents already
+    # rebuild the whole prompt via dynamic_prompt_provider — this field
+    # is a surgical alternative used by colony workers where the rest of
+    # the prompt stays constant and we don't want to thrash the cache.
+    dynamic_skills_catalog_provider: Any = None

    skills_catalog_prompt: str = ""
    protocols_prompt: str = ""
@@ -4,6 +4,7 @@ from __future__ import annotations

 import json
 from dataclasses import dataclass, field
+from datetime import UTC
 from pathlib import Path


@@ -47,6 +48,8 @@ class AgentEntry:
    tool_count: int = 0
    tags: list[str] = field(default_factory=list)
    last_active: str | None = None
+    created_at: str | None = None
+    icon: str | None = None
    workers: list[WorkerEntry] = field(default_factory=list)


@@ -209,13 +212,26 @@ def discover_agents() -> dict[str, list[AgentEntry]]:
            name = config_fallback_name
            desc = ""

-            # Read colony metadata for queen provenance
+            # Read colony metadata for queen provenance and timestamps
            colony_queen_name = ""
+            colony_created_at: str | None = None
+            colony_icon: str | None = None
            metadata_path = path / "metadata.json"
            if metadata_path.exists():
                try:
                    mdata = json.loads(metadata_path.read_text(encoding="utf-8"))
                    colony_queen_name = mdata.get("queen_name", "")
+                    colony_created_at = mdata.get("created_at")
+                    colony_icon = mdata.get("icon")
+                except Exception:
+                    pass
+            # Fallback: directory birth time, approximated by st_ctime where st_birthtime is missing (Linux)
+            if not colony_created_at:
+                try:
+                    from datetime import datetime
+                    stat = path.stat()
+                    born = getattr(stat, "st_birthtime", stat.st_ctime)
+                    colony_created_at = datetime.fromtimestamp(born, tz=UTC).isoformat()
                except Exception:
                    pass

@@ -256,6 +272,8 @@ def discover_agents() -> dict[str, list[AgentEntry]]:
                    tool_count=tool_count,
                    tags=[],
                    last_active=_get_last_active(path),
+                    created_at=colony_created_at,
+                    icon=colony_icon,
                    workers=worker_entries,
                )
            )
@@ -0,0 +1,240 @@
+"""One-shot LLM gate that decides if a queen DM is ready to fork a colony.
+
+The queen's ``start_incubating_colony`` tool calls :func:`evaluate` with
+the queen's recent conversation, a proposed ``colony_name``, and a
+one-paragraph ``intended_purpose``.  The evaluator returns a structured
+verdict:
+
+    {
+        "ready": bool,
+        "reasons": [str],
+        "missing_prerequisites": [str],
+    }
+
+On ``ready=False`` the queen receives the verdict as her tool result and
+self-corrects (asks the user, refines scope, drops the idea).  On
+``ready=True`` the tool flips the queen's phase to ``incubating``.
+
+Failure mode is **fail-closed**: any LLM error or unparseable response
+returns ``ready=False`` with reason ``"evaluation_failed"`` so the queen
+cannot accidentally proceed past a broken gate.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import re
+from typing import Any
+
+from framework.agent_loop.conversation import Message
+
+logger = logging.getLogger(__name__)
+
+
+_INCUBATING_EVALUATOR_SYSTEM_PROMPT = """\
+You gate whether a queen agent should commit to forking a persistent
+"colony" (a headless worker spec written to disk).  Forking is
+expensive: it ends the user's chat with this queen and the worker runs
+unattended afterward, so the spec must be settled before you approve.
+
+Read the conversation excerpt and the queen's proposed colony_name +
+intended_purpose, then decide.
+
+APPROVE (ready=true) only when ALL of the following hold:
+  1. The user has explicitly asked for work that needs to outlive this
+     chat — recurring (cron / interval), monitoring + alert, scheduled
+     batch, or "fire-and-forget background job".  A one-shot question
+     that the queen can answer in chat does NOT qualify.
+  2. The scope of the work is concrete enough to write down — what
+     inputs, what outputs, what success looks like.  Vague ("help me
+     with my workflow") does NOT qualify.
+  3. The technical approach is at least sketched — what data sources,
+     APIs, or tools the worker will use.  The queen does not have to
+     have written the SKILL.md yet, but she must have the operational
+     ingredients available.
+  4. There are no open clarifying questions on the table that the user
+     hasn't answered.  If the queen recently asked the user something
+     and is still waiting, do NOT approve.
+
+REJECT (ready=false) on any of:
+  - Conversation is too short / too generic to support a settled spec.
+  - User is still describing what they want.
+  - User has expressed doubts, change-of-direction, or "let me think".
+  - Work is one-shot and could be done in chat instead.
+  - Open question awaiting user reply.
+
+Reply with a JSON object exactly matching this shape:
+
+  {
+    "ready": true | false,
+    "reasons": ["short phrase", ...],         // at least one entry
+    "missing_prerequisites": ["short phrase", ...]  // empty when ready
+  }
+
+``reasons`` explains the verdict in 1-3 short phrases.
+``missing_prerequisites`` lists what's missing in queen-actionable
+form ("user hasn't confirmed schedule", "no API auth flow discussed").
+Empty list when ``ready=true``.
+
+Output JSON only.  Do not wrap in markdown.  Do not add prose.
+"""
+
+
+# Bound the formatted excerpt so the eval call stays cheap and fits well
+# under the LLM's context window even for long DM sessions.
+_MAX_MESSAGES = 30
+_MAX_TOOL_CONTENT_CHARS = 400
+_MAX_USER_CONTENT_CHARS = 2_000
+_MAX_ASSISTANT_CONTENT_CHARS = 2_000
+
+
+def format_conversation_excerpt(messages: list[Message]) -> str:
+    """Format the tail of a queen conversation for the evaluator prompt.
+
+    Keeps the most recent ``_MAX_MESSAGES`` messages.  Tool results are
+    truncated hard since they're rarely load-bearing for the readiness
+    decision; user/assistant text is truncated more generously to
+    preserve the actual conversation signal.
+    """
+    if not messages:
+        return "(no messages)"
+
+    tail = messages[-_MAX_MESSAGES:]
+    parts: list[str] = []
+    for msg in tail:
+        role = msg.role.upper()
+        content = (msg.content or "").strip()
+        if msg.role == "tool":
+            if len(content) > _MAX_TOOL_CONTENT_CHARS:
+                content = content[:_MAX_TOOL_CONTENT_CHARS] + "..."
+        elif msg.role == "assistant":
+            # Surface tool-call intent for empty assistant turns so the
+            # evaluator sees what the queen has been doing.
+            if not content and msg.tool_calls:
+                names = [tc.get("function", {}).get("name", "?") for tc in msg.tool_calls]
+                content = f"(called: {', '.join(names)})"
+            if len(content) > _MAX_ASSISTANT_CONTENT_CHARS:
+                content = content[:_MAX_ASSISTANT_CONTENT_CHARS] + "..."
+        else:  # user
+            if len(content) > _MAX_USER_CONTENT_CHARS:
+                content = content[:_MAX_USER_CONTENT_CHARS] + "..."
+        if content:
+            parts.append(f"[{role}]: {content}")
+
+    return "\n\n".join(parts) if parts else "(no messages)"
+
+
+def _build_user_message(
+    conversation_excerpt: str,
+    colony_name: str,
+    intended_purpose: str,
+) -> str:
+    return (
+        f"## Proposed colony name\n{colony_name}\n\n"
+        f"## Queen's intended_purpose\n{intended_purpose.strip()}\n\n"
+        f"## Recent conversation (oldest → newest)\n{conversation_excerpt}\n\n"
+        "Decide: should this queen be approved to enter INCUBATING phase?"
+    )
+
+
+def _parse_verdict(raw: str) -> dict[str, Any] | None:
+    """Parse the evaluator's JSON.  Returns None if parsing fails or the result is not an object."""
+    if not raw:
+        return None
+    raw = raw.strip()
+    try:
+        parsed = json.loads(raw)
+    except json.JSONDecodeError:
+        # Some models wrap JSON in markdown fences or add preamble.
+        # Pull the outermost { ... } span out as a best-effort fallback,
+        # mirroring the recovery pattern used in recall_selector.py.
+        match = re.search(r"\{.*\}", raw, re.DOTALL)
+        if match is None:
+            return None
+        try:
+            parsed = json.loads(match.group())
+        except json.JSONDecodeError:
+            return None
+    # A bare list, string, or number is valid JSON but not a verdict;
+    # _normalize_verdict calls .get() and needs a dict.
+    return parsed if isinstance(parsed, dict) else None
+
+
+def _normalize_verdict(parsed: dict[str, Any]) -> dict[str, Any]:
+    """Coerce a parsed verdict into the shape the tool returns to the queen."""
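+    # Fail closed: only a literal JSON ``true`` approves.  ``bool("false")``
+    # is True, so a stringified verdict must not count as ready.
+    ready = parsed.get("ready") is True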
+    reasons = parsed.get("reasons") or []
+    if isinstance(reasons, str):
+        reasons = [reasons]
+    reasons = [str(r).strip() for r in reasons if str(r).strip()]
+    missing = parsed.get("missing_prerequisites") or []
+    if isinstance(missing, str):
+        missing = [missing]
+    missing = [str(m).strip() for m in missing if str(m).strip()]
+
+    if ready:
+        # When approved we don't surface missing prerequisites — the
+        # incubating role prompt opens that floor itself.
+        missing = []
+    elif not reasons:
+        # Always give the queen at least one reason to reflect on.
+        reasons = ["evaluator returned no reasons"]
+
+    return {
+        "ready": ready,
+        "reasons": reasons,
+        "missing_prerequisites": missing,
+    }
+
+
+async def evaluate(
+    llm: Any,
+    messages: list[Message],
+    colony_name: str,
+    intended_purpose: str,
+) -> dict[str, Any]:
+    """Run the incubating evaluator against the queen's conversation.
+
+    Args:
+        llm: An LLM provider exposing ``acomplete(messages, system, ...)``.
+            Pass the queen's own ``ctx.llm`` so the eval uses the same
+            model the user is talking to.
+        messages: The queen's conversation messages, oldest first.  The
+            evaluator slices its own tail; pass the full list.
+        colony_name: Validated colony slug.
+        intended_purpose: Queen's one-paragraph brief.
+
+    Returns:
+        ``{"ready": bool, "reasons": [str], "missing_prerequisites": [str]}``.
+        Fail-closed on any error.
+    """
+    excerpt = format_conversation_excerpt(messages)
+    user_msg = _build_user_message(excerpt, colony_name, intended_purpose)
+
+    try:
+        response = await llm.acomplete(
+            messages=[{"role": "user", "content": user_msg}],
+            system=_INCUBATING_EVALUATOR_SYSTEM_PROMPT,
+            max_tokens=1024,
+            response_format={"type": "json_object"},
+        )
+    except Exception as exc:  # noqa: BLE001 - fail-closed on any LLM failure
+        logger.warning("incubating_evaluator: LLM call failed (%s)", exc)
+        return {
+            "ready": False,
+            "reasons": ["evaluation_failed"],
+            "missing_prerequisites": ["evaluator LLM call failed; retry once the queen can reach the model again"],
+        }
+
+    raw = (getattr(response, "content", "") or "").strip()
+    parsed = _parse_verdict(raw)
+    if parsed is None:
+        logger.warning(
+            "incubating_evaluator: could not parse JSON verdict (raw=%.200s)",
+            raw,
+        )
+        return {
+            "ready": False,
+            "reasons": ["evaluation_failed"],
+            "missing_prerequisites": ["evaluator returned malformed JSON; retry"],
+        }
+
+    return _normalize_verdict(parsed)
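The fallback parsing used by the evaluator can be illustrated in isolation. The sketch below is a standalone approximation, not the module's code: `parse_verdict` is a hypothetical name, and the sample replies are invented for illustration. It tries strict JSON first, then the outermost `{ ... }` span, and rejects anything that does not decode to an object.

```python
from __future__ import annotations

import json
import re
from typing import Any


def parse_verdict(raw: str) -> dict[str, Any] | None:
    """Best-effort extraction of a JSON object from an LLM reply (illustrative)."""
    raw = (raw or "").strip()
    if not raw:
        return None
    # Try the whole reply first, then the outermost {...} span if one exists.
    candidates = [raw]
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        candidates.append(match.group())
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        # Valid JSON that is not an object (list, string, number) is rejected.
        if isinstance(parsed, dict):
            return parsed
    return None


# Hypothetical replies: one with preamble, one non-object, one garbage.
print(parse_verdict('Sure, here it is: {"ready": false, "reasons": ["no schedule"]}'))
print(parse_verdict("[1, 2, 3]"))
print(parse_verdict("not json at all"))
```

Returning `None` for every unusable reply lets the caller map all failure modes to the same fail-closed verdict.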
@@ -1099,12 +1099,17 @@ def ensure_default_queens() -> None:

    Safe to call multiple times — skips any profile that already has a file.
    """
+    created = 0
    for queen_id, profile in DEFAULT_QUEENS.items():
        queen_dir = QUEENS_DIR / queen_id
        profile_path = queen_dir / "profile.yaml"
+        if profile_path.exists():
+            continue
        queen_dir.mkdir(parents=True, exist_ok=True)
        profile_path.write_text(yaml.safe_dump(profile, sort_keys=False, allow_unicode=True))
-    logger.info("Queen profiles ensured at %s", QUEENS_DIR)
+        created += 1
+    if created:
+        logger.info("Created %d default queen profile(s) at %s", created, QUEENS_DIR)


 def list_queens() -> list[dict[str, str]]:
@@ -1143,6 +1148,10 @@ def load_queen_profile(queen_id: str) -> dict[str, Any]:
 def update_queen_profile(queen_id: str, updates: dict[str, Any]) -> dict[str, Any]:
    """Merge partial updates into an existing queen profile and persist.

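+    Scalar and list values replace the stored value outright.  Dict values
+    (e.g. world_lore, hidden_background) are merged one level deep via
+    ``dict.update``, so a partial sub-field update doesn't clobber sibling
+    keys; dicts nested further down are still replaced wholesale.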
+
    Returns the full updated profile.
    Raises FileNotFoundError if the profile doesn't exist.
    """
@@ -1150,7 +1159,11 @@ def update_queen_profile(queen_id: str, updates: dict[str, Any]) -> dict[str, An
    if not profile_path.exists():
        raise FileNotFoundError(f"Queen profile not found: {queen_id}")
    data = yaml.safe_load(profile_path.read_text())
-    data.update(updates)
+    for key, value in updates.items():
+        if isinstance(value, dict) and isinstance(data.get(key), dict):
+            data[key].update(value)
+        else:
+            data[key] = value
    profile_path.write_text(yaml.safe_dump(data, sort_keys=False, allow_unicode=True))
    return data
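To make the merge semantics concrete, here is a minimal sketch of the same loop applied to an in-memory dict. `merge_one_level` and the sample profile are hypothetical and exist only for illustration; the real function also reads and writes `profile.yaml`.

```python
from __future__ import annotations

from typing import Any


def merge_one_level(data: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Merge ``updates`` into ``data``; dict values are merged one level deep."""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(data.get(key), dict):
            # Sibling keys under ``key`` survive; nested dicts are replaced.
            data[key].update(value)
        else:
            data[key] = value
    return data


# Hypothetical profile data, not taken from the repository.
profile = {
    "title": "Head of Legal",
    "world_lore": {"habitat": "library", "era": {"start": 1990, "end": 2000}},
}
merge_one_level(profile, {"world_lore": {"era": {"start": 1995}}})
print(profile["world_lore"])
```

After the call, `habitat` is still present, but `era` holds only `start`: the merge stops at the first level, so the original `end` value is gone.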

@@ -1160,7 +1173,38 @@ def update_queen_profile(queen_id: str, updates: dict[str, Any]) -> dict[str, An
 # ---------------------------------------------------------------------------


-def format_queen_identity_prompt(profile: dict[str, Any]) -> str:
+def _as_clean_text(value: Any) -> str:
+    """Return a stripped string, or an empty string for non-string values."""
+    return value.strip() if isinstance(value, str) else ""
+
+
+def _sentence(value: Any) -> str:
+    text = _as_clean_text(value)
+    if not text:
+        return ""
+    return text if text.endswith((".", "!", "?")) else f"{text}."
+
+
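+def _profile_text_to_instruction(text: Any) -> str:
+    """Rewrite third-person profile prose ("She is ...") as second-person instructions."""
+    import re  # local import: the module's import block is not part of this hunk
+
+    instruction = _sentence(text)
+    replacements = {
+        "She thrives": "You thrive",
+        "she thrives": "you thrive",
+        "She's ": "You are ",
+        "she's ": "you are ",
+        "She is ": "You are ",
+        "she is ": "you are ",
+        "She ": "You ",
+        "she ": "you ",
+        "Her ": "Your ",
+        "her ": "your ",
+    }
+    for old, new in replacements.items():
+        # Anchor at a word boundary so "her " inside "other " or
+        # "whether " is left alone.
+        instruction = re.sub(rf"\b{re.escape(old)}", new, instruction)
+    return instruction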
+
+
+def format_queen_identity_prompt(profile: dict[str, Any], *, max_examples: int | None = None) -> str:
    """Convert a queen profile into a high-dimensional character prompt.

    Uses the 5-pillar character construction system: core identity,
@@ -1168,6 +1212,11 @@ def format_queen_identity_prompt(profile: dict[str, Any]) -> str:
    behavior rules, and world lore.  The hidden background and
    psychological profile are never shown to the user but shape
    every response.
+
+    ``max_examples`` caps the roleplay_examples block — profiles ship
+    four worked examples (~2.4 KB) but one is enough at runtime to show
+    the internal-then-external pattern. Full rendering stays available
+    for profile authoring / eval playback by leaving ``max_examples=None``.
    """
    name = profile.get("name", "the Queen")
    title = profile.get("title", "Senior Advisor")
@@ -1181,35 +1230,35 @@ def format_queen_identity_prompt(profile: dict[str, Any]) -> str:
    sections: list[str] = []

    # Pillar 1: Core identity
-    sections.append(f"<core_identity>\nName: {name}, Identity: {title}.\n{core}\n</core_identity>")
+    sections.append(f"<core_identity>\nYou are {name}, {title}.\n{core}\n</core_identity>")

    # Pillar 2: Hidden background (behavioral engine, never surfaced)
    if bg:
        sections.append(
-            f"<hidden_background>\n"
-            f"(Strictly hidden from users -- acts as your underlying "
-            f"behavioral engine)\n"
+            "<hidden_background>\n"
+            "(Strictly hidden from users -- acts as your underlying "
+            "behavioral engine)\n"
            f"- Past Wound: {bg.get('past_wound', '')}\n"
            f"- Deep Motive: {bg.get('deep_motive', '')}\n"
            f"- Behavioral Mapping: {bg.get('behavioral_mapping', '')}\n"
-            f"</hidden_background>"
+            "</hidden_background>"
        )

    # Pillar 3: Psychological profile
    if psych:
        sections.append(
-            f"<psychological_profile>\n"
-            f"- Social Masks & Boundaries: "
+            "<psychological_profile>\n"
+            "- Social Masks & Boundaries: "
            f"{psych.get('social_masks', '')}\n"
-            f"- Anti-Stereotype Rules: "
-            f"{psych.get('anti_stereotype', '')}\n"
-            f"</psychological_profile>"
+            "- Anti-Stereotype Rules: "
+            f"{_profile_text_to_instruction(psych.get('anti_stereotype'))}\n"
+            "</psychological_profile>"
        )

    # Pillar 4: Behavior rules
    trigger_lines = []
    for t in triggers:
-        trigger_lines.append(f"  - [{t.get('trigger', '')}]: {t.get('reaction', '')}")
+        trigger_lines.append(f"  - [{t.get('trigger', '')}]: {_profile_text_to_instruction(t.get('reaction'))}")
    sections.append(
        "<behavior_rules>\n"
        "- Before each response, internally assess:\n"
@@ -1248,6 +1297,8 @@ def format_queen_identity_prompt(profile: dict[str, Any]) -> str:

    # Few-shot examples showing the full internal process
    examples = profile.get("examples", [])
+    if examples and max_examples is not None:
+        examples = examples[:max_examples]
    if examples:
        example_parts: list[str] = []
        for ex in examples:
@@ -0,0 +1,217 @@
+"""Per-queen tool configuration sidecar (``tools.json``).
+
+Lives at ``~/.hive/agents/queens/{queen_id}/tools.json`` alongside
+``profile.yaml``. Kept separate so identity (name, title, core traits)
+stays human-authored and lean, while the machine-managed tool allowlist
+can grow (per-tool overrides, audit timestamps, future per-phase rules)
+without bloating the profile.
+
+Schema::
+
+    {
+      "enabled_mcp_tools": ["read_file", ...] | null,
+      "updated_at": "2026-04-21T12:34:56+00:00"
+    }
+
+- ``null`` / missing file → default "allow every MCP tool".
+- ``[]`` → explicitly disable every MCP tool.
+- ``["foo", "bar"]`` → only those MCP tool names pass the filter.
+
+Atomic writes via ``os.replace`` follow the same pattern as
+``framework.host.colony_metadata.update_colony_metadata``.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+import tempfile
+from datetime import UTC, datetime
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from framework.config import QUEENS_DIR
+
+logger = logging.getLogger(__name__)
+
+
+def tools_config_path(queen_id: str) -> Path:
+    """Return the on-disk path to a queen's ``tools.json``."""
+    return QUEENS_DIR / queen_id / "tools.json"
+
+
+def _atomic_write_json(path: Path, data: dict[str, Any]) -> None:
+    """Write ``data`` to ``path`` atomically via tempfile + replace."""
+    path.parent.mkdir(parents=True, exist_ok=True)
+    fd, tmp = tempfile.mkstemp(
+        prefix=".tools.",
+        suffix=".json.tmp",
+        dir=str(path.parent),
+    )
+    try:
+        with os.fdopen(fd, "w", encoding="utf-8") as fh:
+            json.dump(data, fh, indent=2)
+            fh.flush()
+            os.fsync(fh.fileno())
+        os.replace(tmp, path)
+    except BaseException:
+        try:
+            os.unlink(tmp)
+        except OSError:
+            pass
+        raise
+
+
+def _migrate_from_profile_if_needed(queen_id: str) -> list[str] | None:
+    """Hoist a legacy ``enabled_mcp_tools`` field out of ``profile.yaml``.
+
+    Returns the migrated value (or ``None`` if nothing to migrate). After
+    migration the sidecar exists on disk and the profile YAML no longer
+    contains ``enabled_mcp_tools``. Safe to call repeatedly.
+    """
+    profile_path = QUEENS_DIR / queen_id / "profile.yaml"
+    if not profile_path.exists():
+        return None
+    try:
+        data = yaml.safe_load(profile_path.read_text(encoding="utf-8"))
+    except (yaml.YAMLError, OSError):
+        logger.warning("Could not read profile.yaml during tools migration: %s", queen_id)
+        return None
+    if not isinstance(data, dict):
+        return None
+    if "enabled_mcp_tools" not in data:
+        return None
+
+    raw = data.pop("enabled_mcp_tools")
+    enabled: list[str] | None
+    if raw is None:
+        enabled = None
+    elif isinstance(raw, list) and all(isinstance(x, str) for x in raw):
+        enabled = raw
+    else:
+        logger.warning(
+            "Legacy enabled_mcp_tools on queen %s had unexpected shape %r; dropping",
+            queen_id,
+            raw,
+        )
+        enabled = None
+
+    # Write sidecar first, then rewrite profile — if the second step
+    # fails we still have the config available and won't re-migrate.
+    _atomic_write_json(
+        tools_config_path(queen_id),
+        {
+            "enabled_mcp_tools": enabled,
+            "updated_at": datetime.now(UTC).isoformat(),
+        },
+    )
+    profile_path.write_text(
+        yaml.safe_dump(data, sort_keys=False, allow_unicode=True),
+        encoding="utf-8",
+    )
+    logger.info(
+        "Migrated enabled_mcp_tools for queen %s from profile.yaml to tools.json",
+        queen_id,
+    )
+    return enabled
+
+
+def tools_config_exists(queen_id: str) -> bool:
+    """Return True when the queen has a persisted ``tools.json`` sidecar.
+
+    Used by callers that need to tell an explicit user save apart from a
+    fallthrough to the role-based default (both can return the same
+    value from ``load_queen_tools_config``).
+    """
+    return tools_config_path(queen_id).exists()
+
+
+def delete_queen_tools_config(queen_id: str) -> bool:
+    """Delete the queen's ``tools.json`` sidecar if present.
+
+    Returns ``True`` if a file was removed, ``False`` if none existed.
+    The next ``load_queen_tools_config`` call falls through to the
+    role-based default (or allow-all for unknown queens).
+    """
+    path = tools_config_path(queen_id)
+    if not path.exists():
+        return False
+    try:
+        path.unlink()
+        return True
+    except OSError:
+        logger.warning("Failed to delete %s", path, exc_info=True)
+        return False
+
+
+def load_queen_tools_config(
+    queen_id: str,
+    mcp_catalog: dict[str, list[dict]] | None = None,
+) -> list[str] | None:
+    """Return the queen's MCP tool allowlist, or ``None`` for default-allow.
+
+    Order of resolution:
+    1. ``tools.json`` sidecar (authoritative; user has saved).
+    2. Legacy ``profile.yaml`` field (migrated and deleted on first read).
+    3. Role-based default from ``queen_tools_defaults`` when the queen
+       is in the known persona table. ``mcp_catalog`` lets the helper
+       expand ``@server:NAME`` shorthands; without it, shorthand entries
+       are dropped.
+    4. ``None`` — default "allow every MCP tool".
+    """
+    path = tools_config_path(queen_id)
+    if path.exists():
+        try:
+            data = json.loads(path.read_text(encoding="utf-8"))
+        except (json.JSONDecodeError, OSError):
+            logger.warning("Invalid %s; treating as default-allow", path)
+            return None
+        if not isinstance(data, dict):
+            return None
+        raw = data.get("enabled_mcp_tools")
+        if raw is None:
+            return None
+        if isinstance(raw, list) and all(isinstance(x, str) for x in raw):
+            return raw
+        logger.warning("Unexpected enabled_mcp_tools shape in %s; ignoring", path)
+        return None
+
+    migrated = _migrate_from_profile_if_needed(queen_id)
+    if migrated is not None:
+        return migrated
+    # If migration just hoisted an explicit ``null`` out of profile.yaml,
+    # a sidecar with allow-all semantics now exists on disk. Honor that
+    # over the role default so an explicit user choice wins.
+    if tools_config_path(queen_id).exists():
+        return None
+
+    # No sidecar, nothing to migrate — fall back to role-based default.
+    from framework.agents.queen.queen_tools_defaults import resolve_queen_default_tools
+
+    return resolve_queen_default_tools(queen_id, mcp_catalog)
+
+
+def update_queen_tools_config(
+    queen_id: str,
+    enabled_mcp_tools: list[str] | None,
+) -> list[str] | None:
+    """Persist the queen's MCP allowlist to ``tools.json``.
+
+    Raises ``FileNotFoundError`` if the queen's directory is missing —
+    we refuse to silently create a sidecar for a queen that doesn't
+    exist.
+    """
+    queen_dir = QUEENS_DIR / queen_id
+    if not queen_dir.exists():
+        raise FileNotFoundError(f"Queen directory not found: {queen_id}")
+    _atomic_write_json(
+        tools_config_path(queen_id),
+        {
+            "enabled_mcp_tools": enabled_mcp_tools,
+            "updated_at": datetime.now(UTC).isoformat(),
+        },
+    )
+    return enabled_mcp_tools
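The three allowlist states (`null`, `[]`, and a list of names) are easiest to see from the consumer's side. The sketch below assumes a caller that filters a list of available tool names with the value returned by `load_queen_tools_config`; `filter_mcp_tools` is a hypothetical helper and does not appear in this diff.

```python
from __future__ import annotations


def filter_mcp_tools(available: list[str], enabled: list[str] | None) -> list[str]:
    """Apply an allowlist: None allows everything, [] allows nothing."""
    if enabled is None:
        # Default-allow: no sidecar, or the sidecar stores an explicit null.
        return list(available)
    allowed = set(enabled)
    # Preserve catalog order; names missing from the catalog are ignored.
    return [name for name in available if name in allowed]


# Hypothetical catalog of tool names.
catalog = ["read_file", "write_file", "port_scan"]
print(filter_mcp_tools(catalog, None))
print(filter_mcp_tools(catalog, []))
print(filter_mcp_tools(catalog, ["read_file", "unknown_tool"]))
```

Keeping `None` and `[]` distinct is what lets the sidecar express both "never configured" and "everything switched off".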
@@ -0,0 +1,272 @@
+"""Role-based default tool allowlists for queens.
+
+Every queen inherits the same MCP surface (all servers loaded for the
+queen agent), but exposing 94+ tools to every persona clutters the LLM
+tool catalog and wastes prompt tokens. This module defines a sensible
+default allowlist per queen persona so, e.g., Head of Legal doesn't
+see port scanners and Head of Finance doesn't see ``apply_patch``.
+
+Defaults apply only when the queen has no ``tools.json`` sidecar — the
+moment the user saves an allowlist through the Tool Library, the
+sidecar becomes authoritative. A DELETE on the tools endpoint removes
+the sidecar and brings the queen back to her role default.
+
+Category entries support a ``@server:NAME`` shorthand that expands to
+every tool name registered against that MCP server in the current
+catalog. This keeps the category table short and drift-free when new
+tools are added (e.g. browser_* auto-joins the ``browser`` category).
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Categories — reusable bundles of MCP tool names.
+# ---------------------------------------------------------------------------
+#
+# Each category is a flat list of either concrete tool names or the
+# ``@server:NAME`` shorthand. The shorthand expands to every tool the
+# given MCP server currently exposes (requires a live catalog; when one
+# is not available the shorthand is silently dropped so we fall back to
+# the named entries only).
+
+_TOOL_CATEGORIES: dict[str, list[str]] = {
+    # Read-only file operations — safe baseline for every knowledge queen.
+    "file_read": [
+        "read_file",
+        "list_directory",
+        "list_dir",
+        "list_files",
+        "search_files",
+        "grep_search",
+        "pdf_read",
+    ],
+    # File mutation — only personas that author or edit artifacts.
+    "file_write": [
+        "write_file",
+        "edit_file",
+        "apply_diff",
+        "apply_patch",
+        "replace_file_content",
+        "hashline_edit",
+        "undo_changes",
+    ],
+    # Shell + process control — engineering personas only.
+    "shell": [
+        "run_command",
+        "execute_command_tool",
+        "bash_kill",
+        "bash_output",
+    ],
+    # Tabular data. CSV/Excel read/write + DuckDB SQL.
+    "data": [
+        "csv_read",
+        "csv_info",
+        "csv_write",
+        "csv_append",
+        "csv_sql",
+        "excel_read",
+        "excel_info",
+        "excel_write",
+        "excel_append",
+        "excel_search",
+        "excel_sheet_list",
+        "excel_sql",
+    ],
+    # Browser automation — every tool from the gcu-tools MCP server.
+    "browser": ["@server:gcu-tools"],
+    # External research / information-gathering.
+    "research": [
+        "search_papers",
+        "download_paper",
+        "search_wikipedia",
+        "web_scrape",
+    ],
+    # Security scanners — pentest-ish, only for engineering/security roles.
+    "security": [
+        "dns_security_scan",
+        "http_headers_scan",
+        "port_scan",
+        "ssl_tls_scan",
+        "subdomain_enumerate",
+        "tech_stack_detect",
+        "risk_score",
+    ],
+    # Lightweight context helpers — good default for every queen.
+    "time_context": [
+        "get_current_time",
+        "get_account_info",
+    ],
+    # Runtime log inspection — debug/observability for builder personas.
+    "runtime_inspection": [
+        "query_runtime_logs",
+        "query_runtime_log_details",
+        "query_runtime_log_raw",
+    ],
+    # Agent-management tools — building/validating/checking agents.
+    "agent_mgmt": [
+        "list_agents",
+        "list_agent_tools",
+        "list_agent_sessions",
+        "get_agent_checkpoint",
+        "list_agent_checkpoints",
+        "run_agent_tests",
+        "save_agent_draft",
+        "confirm_and_build",
+        "validate_agent_package",
+        "validate_agent_tools",
+        "enqueue_task",
+    ],
+}
+
+
+# ---------------------------------------------------------------------------
+# Per-queen mapping.
+# ---------------------------------------------------------------------------
+#
+# Built from the queen personas in ``queen_profiles.DEFAULT_QUEENS``. The
+# goal is "just enough" — a queen should see tools she'd plausibly call
+# for her stated role, nothing more. Users curate further via the Tool
+# Library if they want.
+#
+# A queen whose ID is NOT in this map falls through to "allow every MCP
+# tool" (the original behavior), which keeps the system compatible with
+# user-added custom queen IDs that we don't know about.
+
+QUEEN_DEFAULT_CATEGORIES: dict[str, list[str]] = {
+    # Head of Technology — builds and operates systems; full toolkit.
+    "queen_technology": [
+        "file_read",
+        "file_write",
+        "shell",
+        "data",
+        "browser",
+        "research",
+        "security",
+        "time_context",
+        "runtime_inspection",
+        "agent_mgmt",
+    ],
+    # Head of Growth — data, experiments, competitor research; no shell/security.
+    "queen_growth": [
+        "file_read",
+        "file_write",
+        "data",
+        "browser",
+        "research",
+        "time_context",
+    ],
+    # Head of Product Strategy — user research + roadmaps; no shell/security.
+    "queen_product_strategy": [
+        "file_read",
+        "file_write",
+        "data",
+        "browser",
+        "research",
+        "time_context",
+    ],
+    # Head of Finance — financial models (CSV/Excel heavy), market research.
+    "queen_finance_fundraising": [
+        "file_read",
+        "file_write",
+        "data",
+        "browser",
+        "research",
+        "time_context",
+    ],
+    # Head of Legal — reads contracts/PDFs, researches; no shell/data/security.
+    "queen_legal": [
+        "file_read",
+        "file_write",
+        "browser",
+        "research",
+        "time_context",
+    ],
+    # Head of Brand & Design — visual refs, style guides; no shell/data/security.
+    "queen_brand_design": [
+        "file_read",
+        "file_write",
+        "browser",
+        "research",
+        "time_context",
+    ],
+    # Head of Talent — candidate pipelines, resumes; data + browser heavy.
+    "queen_talent": [
+        "file_read",
+        "file_write",
+        "data",
+        "browser",
+        "research",
+        "time_context",
+    ],
+    # Head of Operations — processes, automation, observability.
+    "queen_operations": [
+        "file_read",
+        "file_write",
+        "data",
+        "browser",
+        "research",
+        "time_context",
+        "runtime_inspection",
+        "agent_mgmt",
+    ],
+}
+
+
+def has_role_default(queen_id: str) -> bool:
+    """Return True when ``queen_id`` is known to the category table."""
+    return queen_id in QUEEN_DEFAULT_CATEGORIES
+
+
+def resolve_queen_default_tools(
+    queen_id: str,
+    mcp_catalog: dict[str, list[dict[str, Any]]] | None = None,
+) -> list[str] | None:
+    """Return the role-based default allowlist for ``queen_id``.
+
+    Arguments:
+        queen_id: Profile ID (e.g. ``"queen_technology"``).
+        mcp_catalog: Optional mapping of ``{server_name: [{"name": ...}, ...]}``
+            used to expand ``@server:NAME`` shorthands in categories.
+            When absent, shorthand entries are dropped and the result
+            contains only the explicitly-named tools.
+
+    Returns:
+        A deduplicated list of tool names, or ``None`` if the queen has
+        no role entry (caller should treat as "allow every MCP tool").
+    """
+    categories = QUEEN_DEFAULT_CATEGORIES.get(queen_id)
+    if not categories:
+        return None
+
+    names: list[str] = []
+    seen: set[str] = set()
+
+    def _add(name: str) -> None:
+        if name and name not in seen:
+            seen.add(name)
+            names.append(name)
+
+    for cat in categories:
+        for entry in _TOOL_CATEGORIES.get(cat, []):
+            if entry.startswith("@server:"):
+                server_name = entry[len("@server:") :]
+                if mcp_catalog is None:
+                    logger.debug(
+                        "resolve_queen_default_tools: catalog missing; cannot expand %s",
+                        entry,
+                    )
+                    continue
+                for tool in mcp_catalog.get(server_name, []) or []:
+                    tname = tool.get("name") if isinstance(tool, dict) else None
+                    if tname:
+                        _add(tname)
+            else:
+                _add(entry)
+
+    return names
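The `@server:NAME` expansion can be exercised without a live MCP catalog. The following is a reduced sketch of the same idea: `expand_entries` is a hypothetical name, and the catalog contents are invented for the example. It expands shorthands against a catalog, drops them when the catalog is missing, and deduplicates while preserving order.

```python
from __future__ import annotations

from typing import Any


def expand_entries(
    entries: list[str],
    catalog: dict[str, list[dict[str, Any]]] | None,
) -> list[str]:
    """Expand ``@server:NAME`` shorthands and deduplicate, keeping first-seen order."""
    names: list[str] = []
    seen: set[str] = set()
    for entry in entries:
        if entry.startswith("@server:"):
            if catalog is None:
                continue  # no catalog: shorthand entries are dropped
            server = entry[len("@server:"):]
            expanded = [t.get("name") for t in catalog.get(server, []) if isinstance(t, dict)]
        else:
            expanded = [entry]
        for name in expanded:
            if name and name not in seen:
                seen.add(name)
                names.append(name)
    return names


# Hypothetical catalog: the server name matches the diff, the tools are examples.
catalog = {"gcu-tools": [{"name": "browser_open"}, {"name": "browser_click"}]}
entries = ["read_file", "@server:gcu-tools", "read_file"]
print(expand_entries(entries, catalog))
print(expand_entries(entries, None))
```

Without a catalog the result contains only the explicitly named tools, so a browser-only category contributes nothing in that case.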
@@ -18,7 +18,7 @@ Use browser nodes (with `tools: {policy: "all"}`) when:

 All tools are prefixed with `browser_`:
 - `browser_start`, `browser_open`, `browser_navigate` — launch/navigate
- `browser_click`, `browser_click_coordinate`, `browser_fill`, `browser_type` — interact
+- `browser_click`, `browser_click_coordinate`, `browser_fill`, `browser_type`, `browser_type_focused` — interact
 - `browser_press` (with optional `modifiers=["ctrl"]` etc.) — keyboard shortcuts
 - `browser_snapshot` — compact accessibility-tree read (structured)
 <!-- vision-only -->
@@ -33,23 +33,24 @@ All tools are prefixed with `browser_`:

 **`browser_snapshot`** — compact accessibility tree of interactive elements. Fast, cheap, good for static or form-heavy pages where the DOM matches what's visually rendered (documentation, simple dashboards, search results, settings pages).

-**`browser_screenshot`** — visual capture + metadata (`cssWidth`, `devicePixelRatio`, scale fields). **Use this on any complex SPA** — LinkedIn, Twitter/X, Reddit, Gmail, Notion, Slack, Discord, any site using shadow DOM, virtual scrolling, React reconciliation, or dynamic layout. On these pages, snapshot refs go stale in seconds, shadow contents aren't in the AX tree, and virtual-scrolled elements disappear from the tree entirely. Screenshot is the **only** reliable way to orient yourself.
+**`browser_screenshot`** — visual capture + metadata (`cssWidth`, `devicePixelRatio`, scale fields). Use this when `browser_snapshot` does not show the thing you need, when refs look stale, or when visual position/layout matters. This often happens on complex SPAs — LinkedIn, Twitter/X, Reddit, Gmail, Notion, Slack, Discord — and on sites using shadow DOM, virtual scrolling, React reconciliation, or dynamic layout.

-Neither tool is "preferred" universally — they're for different jobs. Default to snapshot on text-heavy static pages, screenshot on SPAs and anything shadow-DOM-heavy. Activate the `browser-automation` skill for the full decision tree.
+Neither tool is "preferred" universally — they're for different jobs. Start with snapshot for page structure and ordinary controls; use screenshot as the fallback when snapshot can't find or verify the visible target. Activate the `browser-automation` skill for the full decision tree.

 ## Coordinate rule

-`browser_screenshot` delivers the image at the CSS viewport's own dimensions, so a pixel you read off the screenshot is the same coordinate `browser_click_coordinate`, `browser_hover_coordinate`, and `browser_press_at` expect — no conversion. `getBoundingClientRect()` likewise returns CSS pixels; pass through unchanged.
+Every browser tool that takes or returns coordinates operates in **fractions of the viewport (0..1 for both axes)**. Read a target's proportional position off `browser_screenshot` ("~35% from the left, ~20% from the top" → `(0.35, 0.20)`) and pass that to `browser_click_coordinate` / `browser_hover_coordinate` / `browser_press_at`. `browser_get_rect` and `browser_shadow_query` return `rect.cx` / `rect.cy` as fractions. The tools multiply by `cssWidth` / `cssHeight` internally, so no scale awareness is required. Fractions are used because every vision model (Claude, GPT-4o, Gemini, local VLMs) resizes or tiles images differently, while proportions are invariant. Avoid raw `getBoundingClientRect()` via `browser_evaluate` for coordinate lookup; use `browser_get_rect` instead.

 ## System prompt tips for browser nodes

 ```
-1. On LinkedIn / X / Reddit / Gmail / any SPA — use browser_screenshot to orient,
-   not browser_snapshot. Shadow DOM and virtual scrolling make snapshots unreliable.
-2. For static pages (docs, forms, search results), browser_snapshot is fine.
+1. Start with browser_snapshot or the snapshot returned by the latest interaction.
+2. If the target is missing, ambiguous, stale, or visibly present but absent from the tree,
+   use browser_screenshot to orient and then click by fractional coordinates.
 3. Before typing into a rich-text editor (X compose, LinkedIn DM, Gmail, Reddit),
   click the input area first with browser_click_coordinate so React / Draft.js /
-   Lexical register a native focus event. Otherwise the send button stays disabled.
+   Lexical register a native focus event, then use browser_type_focused(text=...)
+   for shadow-DOM inputs or browser_type(selector, text) for light-DOM inputs.
 4. Use browser_wait(seconds=2-3) after navigation for SPA hydration.
 5. If you hit an auth wall, call set_output with an error and move on.
 6. Keep tool calls per turn <= 10 for reliability.
@@ -65,7 +66,7 @@ Neither tool is "preferred" universally — they're for different jobs. Default
  "tools": {"policy": "all"},
  "input_keys": ["search_url"],
  "output_keys": ["profiles"],
-  "system_prompt": "Navigate to the search URL via browser_navigate(wait_until='load', timeout_ms=20000). Wait 3s for SPA hydration. On LinkedIn, use browser_screenshot to see the page — browser_snapshot misses shadow-DOM and virtual-scrolled content. Paginate through results by scrolling and screenshotting; extract each profile card by reading its visible layout..."
+  "system_prompt": "Navigate to the search URL via browser_navigate(wait_until='load', timeout_ms=20000). Wait 3s for SPA hydration. Use the returned snapshot to look for result cards first. If the cards are missing, stale, or visually present but absent from the tree, use browser_screenshot to orient; paginate through results by scrolling and use screenshots only when the snapshot cannot find or verify the visible cards..."
 }
 ```

@@ -823,8 +823,8 @@ async def run_shutdown_reflection(
 # ---------------------------------------------------------------------------

 _LONG_REFLECT_INTERVAL = 5
-_SHORT_REFLECT_TURN_INTERVAL = 2
-_SHORT_REFLECT_COOLDOWN_SEC = 120.0
+_SHORT_REFLECT_TURN_INTERVAL = 3
+_SHORT_REFLECT_COOLDOWN_SEC = 300.0


 async def subscribe_reflection_triggers(
@@ -1672,7 +1672,7 @@ class AgentHost:
        entry_point_id: str,
        execution_id: str,
        graph_id: str | None = None,
-    ) -> bool:
+    ) -> str:
        """
        Cancel a running execution.

@@ -1682,11 +1682,11 @@ class AgentHost:
            graph_id: Graph to search (defaults to active graph)

        Returns:
-            True if cancelled, False if not found
+            Cancellation outcome from the stream.
        """
        stream = self._resolve_stream(entry_point_id, graph_id)
        if stream is None:
-            return False
+            return "not_found"
        return await stream.cancel_execution(execution_id)

    # === QUERY OPERATIONS ===
@@ -0,0 +1,95 @@
+"""Read/write helpers for per-colony metadata.json.
+
+A colony's metadata.json lives at ``{COLONIES_DIR}/{colony_name}/metadata.json``
+and holds immutable provenance: the queen that created it, the forked
+session id, creation/update timestamps, and the list of workers.
+
+Mutable user-editable tool configuration lives in a sibling
+``tools.json`` sidecar — see :mod:`framework.host.colony_tools_config`
+— so identity and tool gating evolve independently.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from pathlib import Path
+from typing import Any
+
+from framework.config import COLONIES_DIR
+
+logger = logging.getLogger(__name__)
+
+
+def colony_metadata_path(colony_name: str) -> Path:
+    """Return the on-disk path to a colony's metadata.json."""
+    return COLONIES_DIR / colony_name / "metadata.json"
+
+
+def load_colony_metadata(colony_name: str) -> dict[str, Any]:
+    """Load metadata.json for ``colony_name``.
+
+    Returns an empty dict if the file is missing or malformed — callers
+    are expected to treat missing fields as defaults.
+    """
+    path = colony_metadata_path(colony_name)
+    if not path.exists():
+        return {}
+    try:
+        data = json.loads(path.read_text(encoding="utf-8"))
+    except (json.JSONDecodeError, OSError):
+        logger.warning("Failed to read colony metadata at %s", path)
+        return {}
+    return data if isinstance(data, dict) else {}
+
+
+def update_colony_metadata(colony_name: str, updates: dict[str, Any]) -> dict[str, Any]:
+    """Shallow-merge ``updates`` into metadata.json and persist.
+
+    Returns the full updated dict. Raises ``FileNotFoundError`` if the
+    colony directory does not exist. Writes atomically via ``os.replace``
+    to minimize the window where a reader could see a half-written file.
+    """
+    import os
+    import tempfile
+
+    path = colony_metadata_path(colony_name)
+    if not path.parent.exists():
+        raise FileNotFoundError(f"Colony '{colony_name}' not found")
+
+    data = load_colony_metadata(colony_name) if path.exists() else {}
+    for key, value in updates.items():
+        data[key] = value
+
+    path.parent.mkdir(parents=True, exist_ok=True)
+    fd, tmp_path = tempfile.mkstemp(
+        prefix=".metadata.",
+        suffix=".json.tmp",
+        dir=str(path.parent),
+    )
+    try:
+        with os.fdopen(fd, "w", encoding="utf-8") as fh:
+            json.dump(data, fh, indent=2)
+            fh.flush()
+            os.fsync(fh.fileno())
+        os.replace(tmp_path, path)
+    except BaseException:
+        try:
+            os.unlink(tmp_path)
+        except OSError:
+            pass
+        raise
+    return data
+
+
+def list_colony_names() -> list[str]:
+    """Return the names of every colony that has a metadata.json on disk."""
+    if not COLONIES_DIR.is_dir():
+        return []
+    names: list[str] = []
+    for entry in sorted(COLONIES_DIR.iterdir()):
+        if not entry.is_dir():
+            continue
+        if (entry / "metadata.json").exists():
+            names.append(entry.name)
+    return names
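
One consequence of the shallow merge in `update_colony_metadata` is worth spelling out: a nested value is replaced wholesale, not combined. The sketch below illustrates that with a hypothetical `shallow_merge` helper that copies instead of mutating in place; the field values are made up for the demo.

```python
from typing import Any


def shallow_merge(base: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``base`` with top-level keys overwritten by ``updates``.

    Hypothetical stand-in for the merge loop; the real function mutates the
    loaded dict and then persists it.
    """
    merged = dict(base)
    merged.update(updates)
    return merged


existing = {"queen_name": "queen_a", "workers": ["w1", "w2"]}
# Passing a new "workers" list replaces the old one; it does not append.
merged = shallow_merge(existing, {"workers": ["w3"]})
```

A caller that wants to append a worker therefore has to read the current list, extend it, and pass the full list back in `updates`.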
@@ -14,6 +14,7 @@ from __future__ import annotations
 import asyncio
 import json
 import logging
+import os
 import time
 from collections import OrderedDict
 from collections.abc import Callable
@@ -73,9 +74,28 @@ def _format_spawn_task_message(task: str, input_data: dict[str, Any]) -> str:
    return "\n".join(lines)


+def _env_int(name: str, default: int) -> int:
+    """Read a positive int from env; fall back to default when missing, non-numeric, or non-positive."""
+    raw = os.environ.get(name)
+    if not raw:
+        return default
+    try:
+        value = int(raw)
+    except ValueError:
+        logger.warning("Invalid %s=%r; using default %d", name, raw, default)
+        return default
+    return value if value > 0 else default
+
+
+# Laptop-safe default. Each worker is a full AgentLoop (Claude SDK session +
+# tool catalog), so ~4 concurrent is the realistic ceiling on a dev machine.
+# Override via HIVE_MAX_CONCURRENT_WORKERS for servers.
+_DEFAULT_MAX_CONCURRENT_WORKERS = _env_int("HIVE_MAX_CONCURRENT_WORKERS", 4)
+
+
@dataclass
 class ColonyConfig:
-    max_concurrent_workers: int = 100
+    max_concurrent_workers: int = _DEFAULT_MAX_CONCURRENT_WORKERS
    cache_ttl: float = 60.0
    batch_interval: float = 0.1
    max_history: int = 1000
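
Because `_DEFAULT_MAX_CONCURRENT_WORKERS` is computed at import time, the environment variable has to be set before the module is imported. A minimal sketch of that behaviour, using a hypothetical `env_int` copy of the helper and a made-up `DEMO_MAX_WORKERS` variable:

```python
import os
from dataclasses import dataclass


def env_int(name: str, default: int) -> int:
    """Hypothetical stand-in for _env_int: positive int from env, else default."""
    raw = os.environ.get(name)
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


os.environ["DEMO_MAX_WORKERS"] = "8"  # made-up variable name for the demo
_DEFAULT = env_int("DEMO_MAX_WORKERS", 4)  # evaluated once, right here


@dataclass
class DemoConfig:
    max_concurrent_workers: int = _DEFAULT


os.environ["DEMO_MAX_WORKERS"] = "16"  # too late: the default is already bound
late_value = DemoConfig().max_concurrent_workers
```

Under these assumptions a process that changes the variable after import still gets the earlier value; passing an explicit `ColonyConfig(max_concurrent_workers=...)` sidesteps the ordering question.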
@@ -165,6 +185,8 @@ class ColonyRuntime:
        protocols_prompt: str = "",
        skill_dirs: list[str] | None = None,
        pipeline_stages: list | None = None,
+        queen_id: str | None = None,
+        colony_name: str | None = None,
    ):
        from framework.pipeline.runner import PipelineRunner
        from framework.skills.manager import SkillsManager
@@ -173,14 +195,27 @@ class ColonyRuntime:
        self._goal = goal
        self._config = config or ColonyConfig()
        self._runtime_log_store = runtime_log_store
+        self._queen_id: str | None = queen_id
+        # ``colony_id`` is the event-bus scope (session.id in DM sessions);
+        # ``colony_name`` is the on-disk identity under ~/.hive/colonies/.
+        # They coincide for forked colonies but diverge for queen DM
+        # sessions, so separate them explicitly.
+        self._colony_name: str | None = colony_name

        if pipeline_stages:
            self._pipeline = PipelineRunner(pipeline_stages)
        else:
            self._pipeline = self._load_pipeline_from_config()

-        if skills_manager_config is not None:
-            self._skills_manager = SkillsManager(skills_manager_config)
+        # Resolve per-colony override paths so UI toggles can reach this
+        # runtime. Callers that build their own SkillsManagerConfig stay
+        # in charge; bare construction auto-wires the standard paths.
+        _effective_cfg = skills_manager_config
+        if _effective_cfg is None and not (skills_catalog_prompt or protocols_prompt):
+            _effective_cfg = self._build_default_skills_config(colony_name, queen_id)
+
+        if _effective_cfg is not None:
+            self._skills_manager = SkillsManager(_effective_cfg)
            self._skills_manager.load()
        elif skills_catalog_prompt or protocols_prompt:
            import warnings
@@ -222,6 +257,19 @@ class ColonyRuntime:
        self._tools = tools or []
        self._tool_executor = tool_executor

+        # Per-colony MCP tool allowlist — applied when spawning workers. A
+        # value of ``None`` means "allow every MCP tool" (default), an empty
+        # list disables every MCP tool, and a list of names only enables
+        # those. Lifecycle / synthetic tools always pass through the filter
+        # because their names are absent from ``_mcp_tool_names_all``. The
+        # allowlist is re-read on every ``spawn`` so a PATCH that mutates
+        # this attribute via ``set_tool_allowlist`` takes effect on the
+        # NEXT worker spawn without a runtime restart. In-flight workers
+        # keep the tool list they booted with — workers have no dynamic
+        # tools provider today.
+        self._enabled_mcp_tools: list[str] | None = None
+        self._mcp_tool_names_all: set[str] = set()
+
        # Worker management
        self._workers: dict[str, Worker] = {}
        # The persistent client-facing overseer (optional). Set by
@@ -238,6 +286,13 @@ class ColonyRuntime:
        self._timer_tasks: list[asyncio.Task] = []
        self._timer_next_fire: dict[str, float] = {}
        self._webhook_server: Any = None
+        # Background tasks owned by the runtime that aren't timers —
+        # e.g. the per-spawn soft/hard timeout watchers kicked off by
+        # run_parallel_workers. We hold strong references so asyncio
+        # does not garbage-collect them mid-sleep (Python's asyncio
+        # docs explicitly warn that create_task() needs a referenced
+        # handle).
+        self._background_tasks: set[asyncio.Task] = set()

        # Idempotency
        self._idempotency_keys: OrderedDict[str, str] = OrderedDict()
@@ -357,6 +412,128 @@ class ColonyRuntime:
            return PipelineRunner([])
        return build_pipeline_from_config(stages_config)

+    @staticmethod
+    def _build_default_skills_config(
+        colony_name: str | None,
+        queen_id: str | None,
+    ) -> SkillsManagerConfig:
+        """Assemble a ``SkillsManagerConfig`` that wires in the per-colony /
+        per-queen override files and the ``queen_ui`` / ``colony_ui`` scope
+        dirs based on the standard ``~/.hive`` layout.
+
+        ``colony_name`` must be an actual on-disk colony name
+        (``~/.hive/colonies/{name}/``). DM sessions where the ``colony_id``
+        is a session UUID should pass ``None`` so we don't create a stray
+        override file under a session identifier.
+        """
+        from framework.config import COLONIES_DIR, QUEENS_DIR
+        from framework.skills.discovery import ExtraScope
+        from framework.skills.manager import SkillsManagerConfig
+
+        extras: list[ExtraScope] = []
+        queen_overrides_path: Path | None = None
+        if queen_id:
+            queen_home = QUEENS_DIR / queen_id
+            queen_overrides_path = queen_home / "skills_overrides.json"
+            extras.append(ExtraScope(directory=queen_home / "skills", label="queen_ui", priority=2))
+
+        colony_overrides_path: Path | None = None
+        if colony_name:
+            colony_home = COLONIES_DIR / colony_name
+            colony_overrides_path = colony_home / "skills_overrides.json"
+            # Colony-scope SKILL.md dir is the project-scope from discovery's
+            # point of view (colony_dir is the project_root). Add it also as
+            # a tagged ``colony_ui`` scope so UI-created entries resolve with
+            # correct provenance.
+            extras.append(
+                ExtraScope(
+                    directory=colony_home / ".hive" / "skills",
+                    label="colony_ui",
+                    priority=3,
+                )
+            )
+
+        return SkillsManagerConfig(
+            queen_id=queen_id,
+            queen_overrides_path=queen_overrides_path,
+            colony_name=colony_name,
+            colony_overrides_path=colony_overrides_path,
+            extra_scope_dirs=extras,
+            interactive=False,  # HTTP-driven runtimes never prompt for consent
+        )
+
+    @property
+    def queen_id(self) -> str | None:
+        """The queen that owns this runtime, if known."""
+        return self._queen_id
+
+    @property
+    def colony_name(self) -> str | None:
+        """The on-disk colony name (distinct from event-bus scope ``colony_id``)."""
+        return self._colony_name
+
+    @property
+    def skills_manager(self):
+        """Access the live :class:`SkillsManager` (for HTTP handlers)."""
+        return self._skills_manager
+
+    async def reload_skills(self) -> dict[str, Any]:
+        """Rebuild the catalog after an override change; in-flight workers
+        pick up the new catalog on their next iteration via
+        ``dynamic_skills_catalog_provider``.
+
+        Returns a small stats dict that HTTP handlers can echo back to
+        the UI ("applied — N skills now in catalog").
+        """
+        async with self._skills_manager.mutation_lock:
+            self._skills_manager.reload()
+            self.skill_dirs = self._skills_manager.allowlisted_dirs
+            self.batch_init_nudge = self._skills_manager.batch_init_nudge
+            self.context_warn_ratio = self._skills_manager.context_warn_ratio
+            catalog_prompt = self._skills_manager.skills_catalog_prompt
+            return {
+                "catalog_chars": len(catalog_prompt),
+                "skill_dirs": list(self.skill_dirs),
+            }
+
+    # ── Per-colony tool allowlist ───────────────────────────────
+
+    def set_tool_allowlist(
+        self,
+        enabled_mcp_tools: list[str] | None,
+        mcp_tool_names_all: set[str] | None = None,
+    ) -> None:
+        """Configure the per-colony MCP tool allowlist.
+
+        Called at construction time (from SessionManager) and again from
+        the ``/api/colony/{name}/tools`` PATCH handler when a user edits
+        the allowlist. The change applies to the NEXT worker spawn — we
+        never mutate the tool list of a worker that is already running
+        (workers have no dynamic tools provider, so hot-reloading their
+        tool set would diverge from the list the LLM was already using).
+        """
+        self._enabled_mcp_tools = list(enabled_mcp_tools) if enabled_mcp_tools is not None else None
+        if mcp_tool_names_all is not None:
+            self._mcp_tool_names_all = set(mcp_tool_names_all)
+
+    def _apply_tool_allowlist(self, tools: list) -> list:
+        """Filter ``tools`` against the colony's MCP allowlist.
+
+        Lifecycle / synthetic tools (those whose names are NOT in
+        ``_mcp_tool_names_all``) are never gated. MCP tools are kept only
+        when ``_enabled_mcp_tools`` is None (default allow) or contains
+        their name. Input list order is preserved so downstream cache
+        keys and logs stay stable.
+        """
+        if self._enabled_mcp_tools is None:
+            return tools
+        allowed = set(self._enabled_mcp_tools)
+        return [
+            t
+            for t in tools
+            if getattr(t, "name", None) not in self._mcp_tool_names_all or getattr(t, "name", None) in allowed
+        ]
+
    # ── Lifecycle ───────────────────────────────────────────────

    async def start(self) -> None:
@@ -631,6 +808,52 @@ class ColonyRuntime:
        spawn_tools = tools if tools is not None else self._tools
        spawn_executor = tool_executor or self._tool_executor

+        # Apply the per-colony MCP tool allowlist (if any). Done HERE —
+        # after spawn_tools is resolved but before it's frozen into the
+        # worker's AgentContext — so the next spawn reflects any PATCH
+        # that happened since the last spawn. A value of ``None`` on
+        # ``_enabled_mcp_tools`` is a no-op so the default path is
+        # unchanged.
+        spawn_tools = self._apply_tool_allowlist(spawn_tools)
+
+        # Colony progress tracker: when the caller supplied a db_path
+        # in input_data, this worker is part of a SQLite task queue
+        # and must see the hive.colony-progress-tracker skill body in
+        # its system prompt from turn 0. Rebuild the catalog with the
+        # skill pre-activated; falls back to the colony default when
+        # no db_path is present.
+        _spawn_catalog = self.skills_catalog_prompt
+        _spawn_skill_dirs = self.skill_dirs
+        if isinstance(input_data, dict) and input_data.get("db_path"):
+            try:
+                from framework.skills.config import SkillsConfig
+                from framework.skills.manager import SkillsManager, SkillsManagerConfig
+
+                _pre = SkillsManager(
+                    SkillsManagerConfig(
+                        skills_config=SkillsConfig.from_agent_vars(
+                            skills=["hive.colony-progress-tracker"],
+                        ),
+                    )
+                )
+                _pre.load()
+                _spawn_catalog = _pre.skills_catalog_prompt
+                _spawn_skill_dirs = (
+                    list(_pre.allowlisted_dirs) if hasattr(_pre, "allowlisted_dirs") else self.skill_dirs
+                )
+                logger.info(
+                    "spawn: pre-activated hive.colony-progress-tracker "
+                    "(catalog %d → %d chars) for worker with db_path=%s",
+                    len(self.skills_catalog_prompt),
+                    len(_spawn_catalog),
+                    input_data.get("db_path"),
+                )
+            except Exception as exc:
+                logger.warning(
+                    "spawn: failed to pre-activate colony-progress-tracker skill, falling back to base catalog: %s",
+                    exc,
+                )
+
        # Resolve the SSE stream_id once. When the caller didn't supply
        # one we use the per-worker fan-out tag (filtered out by the
        # SSE handler). When the caller passed an explicit value we
@@ -675,6 +898,17 @@ class ColonyRuntime:
                conversation_store=worker_conv_store,
            )

+            # Workers pick up UI-driven override changes via this provider,
+            # which reads the live catalog on each iteration. The db_path
+            # pre-activated catalog stays static because its contents are
+            # built for *this* worker's task (a tombstone toggle from the
+            # UI should not yank it mid-run).
+            _db_path_pre_activated = bool(isinstance(input_data, dict) and input_data.get("db_path"))
+            # Default-bind the manager into the closure so each loop iteration
+            # captures the same manager instance; flake8-bugbear's B023 would
+            # flag a free-variable capture here.
+            _provider = None if _db_path_pre_activated else (lambda mgr=self._skills_manager: mgr.skills_catalog_prompt)
+
            agent_context = AgentContext(
                runtime=self._make_runtime_adapter(worker_id),
                agent_id=worker_id,
@@ -685,9 +919,10 @@ class ColonyRuntime:
                llm=self._llm,
                available_tools=list(spawn_tools),
                accounts_prompt=self._accounts_prompt,
-                skills_catalog_prompt=self.skills_catalog_prompt,
+                skills_catalog_prompt=_spawn_catalog,
                protocols_prompt=self.protocols_prompt,
-                skill_dirs=self.skill_dirs,
+                skill_dirs=_spawn_skill_dirs,
+                dynamic_skills_catalog_provider=_provider,
                execution_id=worker_id,
                stream_id=explicit_stream_id or f"worker:{worker_id}",
            )
@@ -720,6 +955,8 @@ class ColonyRuntime:
    async def spawn_batch(
        self,
        tasks: list[dict[str, Any]],
+        *,
+        tools_override: list[Any] | None = None,
    ) -> list[str]:
        """Spawn a batch of parallel workers, one per task spec.

@@ -732,6 +969,12 @@ class ColonyRuntime:
        The overseer's ``run_parallel_workers`` tool is the usual
        caller; it pairs ``spawn_batch`` + ``wait_for_worker_reports``
        into a single fan-out/fan-in primitive.
+
+        When ``tools_override`` is supplied, every spawned worker
+        receives that tool list instead of the colony's default.  Used
+        by ``run_parallel_workers`` to drop tools whose credentials
+        failed the pre-flight check (so the spawned workers don't
+        waste a startup trying to use them).
        """
        worker_ids: list[str] = []
        for spec in tasks:
@@ -743,6 +986,7 @@ class ColonyRuntime:
                task=task_text,
                count=1,
                input_data=task_data or {"task": task_text},
+                tools=tools_override,
            )
            worker_ids.extend(ids)
        return worker_ids
@@ -923,6 +1167,7 @@ class ColonyRuntime:
            conversation_store=overseer_conv_store,
        )

+        _overseer_skills_mgr = self._skills_manager
        overseer_ctx = AgentContext(
            runtime=self._make_runtime_adapter(overseer_id),
            agent_id=overseer_id,
@@ -936,6 +1181,7 @@ class ColonyRuntime:
            skills_catalog_prompt=self.skills_catalog_prompt,
            protocols_prompt=self.protocols_prompt,
            skill_dirs=self.skill_dirs,
+            dynamic_skills_catalog_provider=lambda: _overseer_skills_mgr.skills_catalog_prompt,
            execution_id=overseer_id,
            stream_id="overseer",
        )
@@ -1054,6 +1300,96 @@ class ColonyRuntime:
            return True
        return False

+    def watch_batch_timeouts(
+        self,
+        worker_ids: list[str],
+        *,
+        soft_timeout: float,
+        hard_timeout: float,
+        warning_message: str | None = None,
+    ) -> asyncio.Task:
+        """Schedule a background task that enforces soft + hard timeouts.
+
+        Semantics:
+          * At ``t = soft_timeout`` every worker in ``worker_ids`` that is
+            still active AND hasn't already filed an ``_explicit_report``
+            receives ``warning_message`` via ``send_to_worker`` — the inject
+            appears as a user turn at the next agent-loop boundary, so the
+            worker's LLM can see it and call ``report_to_parent`` with
+            partial results.
+          * At ``t = hard_timeout`` any worker still active is force-stopped
+            via ``stop_worker``. ``Worker.run`` still emits its
+            ``SUBAGENT_REPORT`` on cancel (the explicit report survives,
+            if the worker reported just before the stop) so the queen
+            always sees a terminal inject for every spawned worker.
+
+        Returns the scheduled task so callers can await or cancel it.
+        Non-blocking for the caller — the watcher runs on the event loop
+        independently.
+        """
+        if warning_message is None:
+            grace = max(0.0, hard_timeout - soft_timeout)
+            warning_message = (
+                f"[SOFT TIMEOUT] You've been running for {soft_timeout:.0f}s. "
+                "Wrap up now: call report_to_parent with whatever partial "
+                "results you have. You have "
+                f"~{grace:.0f}s more before a hard stop — anything not "
+                "reported by then will be lost."
+            )
+
+        async def _watch() -> None:
+            try:
+                await asyncio.sleep(soft_timeout)
+                for wid in worker_ids:
+                    worker = self._workers.get(wid)
+                    if worker is None or not worker.is_active:
+                        continue
+                    if getattr(worker, "_explicit_report", None) is not None:
+                        continue
+                    try:
+                        await self.send_to_worker(wid, warning_message)
+                    except Exception:
+                        logger.warning(
+                            "watch_batch_timeouts: soft-timeout inject failed for %s",
+                            wid,
+                            exc_info=True,
+                        )
+
+                remaining = hard_timeout - soft_timeout
+                if remaining <= 0:
+                    return
+                await asyncio.sleep(remaining)
+                for wid in worker_ids:
+                    worker = self._workers.get(wid)
+                    if worker is None or not worker.is_active:
+                        continue
+                    try:
+                        await self.stop_worker(wid)
+                        logger.info(
+                            "watch_batch_timeouts: hard-stopped %s after %ss (no report)",
+                            wid,
+                            hard_timeout,
+                        )
+                    except Exception:
+                        logger.warning(
+                            "watch_batch_timeouts: hard-stop failed for %s",
+                            wid,
+                            exc_info=True,
+                        )
+            except asyncio.CancelledError:
+                raise
+            except Exception:
+                logger.exception("watch_batch_timeouts: watcher crashed")
+
+        task = asyncio.create_task(_watch(), name=f"batch-timeout:{worker_ids[0] if worker_ids else '?'}")
+        # Hold a strong reference until completion. Without this the
+        # task can be garbage-collected during `await asyncio.sleep`,
+        # silently swallowing the soft-timeout inject (the exact bug
+        # surfaced by workers never seeing [SOFT TIMEOUT]).
+        self._background_tasks.add(task)
+        task.add_done_callback(self._background_tasks.discard)
+        return task
+
    # ── Status & Query ──────────────────────────────────────────

    def list_workers(self) -> list[WorkerInfo]:
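
The two ideas in `watch_batch_timeouts` (a two-phase soft/hard sleep, and holding a strong reference to the task) can be sketched in a standalone form. `BackgroundTasks`, `watch`, and `demo` are hypothetical names, and the callbacks stand in for `send_to_worker` / `stop_worker`.

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any


class BackgroundTasks:
    """Owns fire-and-forget tasks so the event loop's weak refs are not the only ones."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task] = set()

    def spawn(self, coro: Coroutine[Any, Any, None]) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks.add(task)  # strong reference until completion
        task.add_done_callback(self._tasks.discard)  # drop it once done
        return task

    def __len__(self) -> int:
        return len(self._tasks)


async def watch(
    soft: float,
    hard: float,
    on_soft: Callable[[], None],
    on_hard: Callable[[], None],
) -> None:
    """Fire on_soft at t=soft, then on_hard at t=hard if hard > soft."""
    await asyncio.sleep(soft)
    on_soft()
    remaining = hard - soft
    if remaining <= 0:
        return
    await asyncio.sleep(remaining)
    on_hard()


async def demo(soft: float, hard: float) -> tuple[list[str], int]:
    events: list[str] = []
    owner = BackgroundTasks()
    task = owner.spawn(
        watch(soft, hard, lambda: events.append("soft"), lambda: events.append("hard"))
    )
    await task
    await asyncio.sleep(0)  # give done callbacks a chance to run
    return events, len(owner)
```

Running `asyncio.run(demo(0.01, 0.02))` exercises both phases; with `hard <= soft` only the soft callback fires, matching the early return in the watcher.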
@@ -0,0 +1,162 @@
+"""Per-colony tool configuration sidecar (``tools.json``).
+
+Lives at ``~/.hive/colonies/{colony_name}/tools.json`` alongside
+``metadata.json``. Kept separate so provenance (queen_name,
+created_at, workers) stays in metadata while the user-editable tool
+allowlist gets its own file.
+
+Schema::
+
+    {
+      "enabled_mcp_tools": ["read_file", ...] | null,
+      "updated_at": "2026-04-21T12:34:56+00:00"
+    }
+
+- ``null`` / missing file → default "allow every MCP tool".
+- ``[]`` → explicitly disable every MCP tool.
+- ``["foo", "bar"]`` → only those MCP tool names pass the filter.
+
+Atomic writes via ``os.replace`` mirror
+``framework.host.colony_metadata.update_colony_metadata``.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+import tempfile
+from datetime import UTC, datetime
+from pathlib import Path
+from typing import Any
+
+from framework.config import COLONIES_DIR
+
+logger = logging.getLogger(__name__)
+
+
+def tools_config_path(colony_name: str) -> Path:
+    """Return the on-disk path to a colony's ``tools.json``."""
+    return COLONIES_DIR / colony_name / "tools.json"
+
+
+def _metadata_path(colony_name: str) -> Path:
+    return COLONIES_DIR / colony_name / "metadata.json"
+
+
+def _atomic_write_json(path: Path, data: dict[str, Any]) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    fd, tmp = tempfile.mkstemp(
+        prefix=".tools.",
+        suffix=".json.tmp",
+        dir=str(path.parent),
+    )
+    try:
+        with os.fdopen(fd, "w", encoding="utf-8") as fh:
+            json.dump(data, fh, indent=2)
+            fh.flush()
+            os.fsync(fh.fileno())
+        os.replace(tmp, path)
+    except BaseException:
+        try:
+            os.unlink(tmp)
+        except OSError:
+            pass
+        raise
+
+
+def _migrate_from_metadata_if_needed(colony_name: str) -> list[str] | None:
+    """Hoist a legacy ``enabled_mcp_tools`` field out of ``metadata.json``.
+
+    Returns the migrated value (or ``None`` if nothing to migrate). After
+    migration the sidecar exists and ``metadata.json`` no longer contains
+    ``enabled_mcp_tools``. Safe to call repeatedly.
+    """
+    meta_path = _metadata_path(colony_name)
+    if not meta_path.exists():
+        return None
+    try:
+        data = json.loads(meta_path.read_text(encoding="utf-8"))
+    except (json.JSONDecodeError, OSError):
+        logger.warning("Could not read metadata.json during tools migration: %s", colony_name)
+        return None
+    if not isinstance(data, dict) or "enabled_mcp_tools" not in data:
+        return None
+
+    raw = data.pop("enabled_mcp_tools")
+    enabled: list[str] | None
+    if raw is None:
+        enabled = None
+    elif isinstance(raw, list) and all(isinstance(x, str) for x in raw):
+        enabled = raw
+    else:
+        logger.warning(
+            "Legacy enabled_mcp_tools on colony %s had unexpected shape %r; dropping",
+            colony_name,
+            raw,
+        )
+        enabled = None
+
+    # Sidecar first so a partial failure leaves the config recoverable.
+    _atomic_write_json(
+        tools_config_path(colony_name),
+        {
+            "enabled_mcp_tools": enabled,
+            "updated_at": datetime.now(UTC).isoformat(),
+        },
+    )
+    _atomic_write_json(meta_path, data)
+    logger.info(
+        "Migrated enabled_mcp_tools for colony %s from metadata.json to tools.json",
+        colony_name,
+    )
+    return enabled
+
+
+def load_colony_tools_config(colony_name: str) -> list[str] | None:
+    """Return the colony's MCP tool allowlist, or ``None`` for default-allow.
+
+    Order of resolution:
+    1. ``tools.json`` sidecar (authoritative).
+    2. Legacy ``metadata.json`` field (migrated and deleted on first read).
+    3. ``None`` — default "allow every MCP tool".
+    """
+    path = tools_config_path(colony_name)
+    if path.exists():
+        try:
+            data = json.loads(path.read_text(encoding="utf-8"))
+        except (json.JSONDecodeError, OSError):
+            logger.warning("Invalid %s; treating as default-allow", path)
+            return None
+        if not isinstance(data, dict):
+            return None
+        raw = data.get("enabled_mcp_tools")
+        if raw is None:
+            return None
+        if isinstance(raw, list) and all(isinstance(x, str) for x in raw):
+            return raw
+        logger.warning("Unexpected enabled_mcp_tools shape in %s; ignoring", path)
+        return None
+
+    return _migrate_from_metadata_if_needed(colony_name)
+
+
+def update_colony_tools_config(
+    colony_name: str,
+    enabled_mcp_tools: list[str] | None,
+) -> list[str] | None:
+    """Persist a colony's MCP allowlist to ``tools.json``.
+
+    Raises ``FileNotFoundError`` if the colony's directory is missing.
+    """
+    colony_dir = COLONIES_DIR / colony_name
+    if not colony_dir.exists():
+        raise FileNotFoundError(f"Colony directory not found: {colony_name}")
+    _atomic_write_json(
+        tools_config_path(colony_name),
+        {
+            "enabled_mcp_tools": enabled_mcp_tools,
+            "updated_at": datetime.now(UTC).isoformat(),
+        },
+    )
+    return enabled_mcp_tools
@@ -111,6 +111,15 @@ class EventType(StrEnum):
    # Retry tracking
    NODE_RETRY = "node_retry"

+    # Stream-health observability. Split from NODE_RETRY so the UI can
+    # distinguish "slow TTFT on a huge context" (healthy, just slow) from
+    # "stream went silent mid-generation" (probable stall) from "we nudged
+    # the model to continue" (recovery), which NODE_RETRY used to conflate.
+    STREAM_TTFT_EXCEEDED = "stream_ttft_exceeded"
+    STREAM_INACTIVE = "stream_inactive"
+    STREAM_NUDGE_SENT = "stream_nudge_sent"
+    TOOL_CALL_REPLAY_DETECTED = "tool_call_replay_detected"
+
    # Worker agent lifecycle
    WORKER_COMPLETED = "worker_completed"
    WORKER_FAILED = "worker_failed"
@@ -800,16 +809,28 @@ class EventBus:
        input_tokens: int,
        output_tokens: int,
        cached_tokens: int = 0,
+        cache_creation_tokens: int = 0,
+        cost_usd: float = 0.0,
        execution_id: str | None = None,
        iteration: int | None = None,
    ) -> None:
-        """Emit LLM turn completion with stop reason and model metadata."""
+        """Emit LLM turn completion with stop reason and model metadata.
+
+        ``cached_tokens`` and ``cache_creation_tokens`` are subsets of
+        ``input_tokens`` (already inside provider ``prompt_tokens``).
+        Subscribers should display them, not add them to a total.
+
+        ``cost_usd`` is the USD cost for this turn when known (Anthropic,
+        OpenAI, OpenRouter). 0.0 means unreported (not free).
+        """
        data: dict = {
            "stop_reason": stop_reason,
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cached_tokens": cached_tokens,
+            "cache_creation_tokens": cache_creation_tokens,
+            "cost_usd": cost_usd,
        }
        if iteration is not None:
            data["iteration"] = iteration
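
Review note: the docstring's accounting rule is easy to get wrong on the subscriber side, so here is a small sketch of a consumer. The helper names `total_tokens` and `format_cost` are hypothetical; only the payload keys come from the diff.

```python
from __future__ import annotations


def total_tokens(data: dict) -> int:
    """Total tokens for one turn's event payload."""
    # cached_tokens and cache_creation_tokens are already counted inside
    # input_tokens, so they are deliberately not added again here.
    return data["input_tokens"] + data["output_tokens"]


def format_cost(data: dict) -> str:
    """Render cost_usd, treating 0.0 as 'unreported' rather than 'free'."""
    cost = data.get("cost_usd", 0.0)
    return f"${cost:.4f}" if cost > 0 else "n/a"


payload = {
    "input_tokens": 1000,
    "output_tokens": 200,
    "cached_tokens": 800,
    "cache_creation_tokens": 100,
    "cost_usd": 0.0,
}
summary = (total_tokens(payload), format_cost(payload))
```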
@@ -905,24 +926,22 @@ class EventBus:
        self,
        stream_id: str,
        node_id: str,
-        prompt: str = "",
        execution_id: str | None = None,
-        options: list[str] | None = None,
        questions: list[dict] | None = None,
    ) -> None:
        """Emit a user-input request for interactive queen turns.

        Args:
-            options: Optional predefined choices for the user (1-3 items).
-                     The frontend appends an "Other" free-text option
-                     automatically.
-            questions: Optional list of question dicts for multi-question
-                       batches (from ask_user_multiple). Each dict has id,
-                       prompt, and optional options.
+            questions: Optional list of question dicts from ``ask_user``.
+                Each dict has ``id``, ``prompt``, and optional ``options``
+                (2-3 predefined choices). The frontend renders the
+                QuestionWidget for a single-entry list and the
+                MultiQuestionWidget for 2+ entries. Free-text asks (no
+                options) stream the prompt separately as a chat message;
+                auto-block turns have no questions at all and fall back
+                to the normal text input.
        """
-        data: dict[str, Any] = {"prompt": prompt}
-        if options:
-            data["options"] = options
+        data: dict[str, Any] = {}
        if questions:
            data["questions"] = questions
        await self.publish(
@@ -1061,6 +1080,94 @@ class EventBus:
            )
        )

+    async def emit_stream_ttft_exceeded(
+        self,
+        stream_id: str,
+        node_id: str,
+        ttft_seconds: float,
+        limit_seconds: float,
+        execution_id: str | None = None,
+    ) -> None:
+        """Emit when a stream stayed silent past the TTFT budget (no first event)."""
+        await self.publish(
+            AgentEvent(
+                type=EventType.STREAM_TTFT_EXCEEDED,
+                stream_id=stream_id,
+                node_id=node_id,
+                execution_id=execution_id,
+                data={
+                    "ttft_seconds": ttft_seconds,
+                    "limit_seconds": limit_seconds,
+                },
+            )
+        )
+
+    async def emit_stream_inactive(
+        self,
+        stream_id: str,
+        node_id: str,
+        idle_seconds: float,
+        limit_seconds: float,
+        execution_id: str | None = None,
+    ) -> None:
+        """Emit when a stream that had produced events went silent past budget."""
+        await self.publish(
+            AgentEvent(
+                type=EventType.STREAM_INACTIVE,
+                stream_id=stream_id,
+                node_id=node_id,
+                execution_id=execution_id,
+                data={
+                    "idle_seconds": idle_seconds,
+                    "limit_seconds": limit_seconds,
+                },
+            )
+        )
+
+    async def emit_stream_nudge_sent(
+        self,
+        stream_id: str,
+        node_id: str,
+        reason: str,
+        nudge_count: int,
+        execution_id: str | None = None,
+    ) -> None:
+        """Emit when the continue-nudge was injected (recovery, not retry)."""
+        await self.publish(
+            AgentEvent(
+                type=EventType.STREAM_NUDGE_SENT,
+                stream_id=stream_id,
+                node_id=node_id,
+                execution_id=execution_id,
+                data={
+                    "reason": reason,
+                    "nudge_count": nudge_count,
+                },
+            )
+        )
+
+    async def emit_tool_call_replay_detected(
+        self,
+        stream_id: str,
+        node_id: str,
+        tool_name: str,
+        prior_seq: int,
+        execution_id: str | None = None,
+    ) -> None:
+        """Emit when the model is about to re-execute a prior successful call."""
+        await self.publish(
+            AgentEvent(
+                type=EventType.TOOL_CALL_REPLAY_DETECTED,
+                stream_id=stream_id,
+                node_id=node_id,
+                execution_id=execution_id,
+                data={
+                    "tool_name": tool_name,
+                    "prior_seq": prior_seq,
+                },
+            )
+        )
+
    async def emit_worker_completed(
        self,
        stream_id: str,
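
Review note: the emitters above only publish; which one to call is decided elsewhere in the PR. A minimal sketch of that decision, assuming a watchdog that knows how long the stream has been silent and whether a first event has arrived (the function `classify_silence` and its parameters are illustrative, not taken from the codebase):

```python
from __future__ import annotations


def classify_silence(
    saw_first_event: bool,
    silent_seconds: float,
    ttft_limit: float,
    idle_limit: float,
) -> str | None:
    """Return the event type value to emit, or None if the stream is within budget."""
    if not saw_first_event:
        # Nothing received yet: this is a time-to-first-token problem.
        return "stream_ttft_exceeded" if silent_seconds > ttft_limit else None
    # The stream produced events and then went quiet: probable stall.
    return "stream_inactive" if silent_seconds > idle_limit else None
```

Keeping the two budgets separate is what lets a slow first token on a large context be reported differently from a mid-generation stall.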
@@ -16,7 +16,7 @@ from collections import OrderedDict
 from collections.abc import Callable
 from dataclasses import dataclass, field
 from datetime import datetime
-from typing import TYPE_CHECKING, Any
+from typing import TYPE_CHECKING, Any, Literal

 from framework.host.event_bus import EventBus
 from framework.host.shared_state import IsolationLevel, SharedBufferManager
@@ -48,6 +48,8 @@ class ExecutionAlreadyRunningError(RuntimeError):

 logger = logging.getLogger(__name__)

+CancelExecutionResult = Literal["cancelled", "cancelling", "not_found"]
+

 class GraphScopedEventBus(EventBus):
    """Proxy that stamps ``graph_id`` on every published event.
@@ -130,7 +132,7 @@ class ExecutionContext:
    run_id: str | None = None  # Unique ID per trigger() invocation
    started_at: datetime = field(default_factory=datetime.now)
    completed_at: datetime | None = None
-    status: str = "pending"  # pending, running, completed, failed, paused
+    status: str = "pending"  # pending, running, cancelling, completed, failed, paused, cancelled


 class ExecutionManager:
@@ -315,6 +317,22 @@ class ExecutionManager:
        """Return IDs of all currently active executions."""
        return list(self._active_executions.keys())

+    def _get_blocking_execution_ids_locked(self) -> list[str]:
+        """Return executions that still block a replacement from starting.
+
+        An execution continues to block replacement until its task has
+        terminated and the task's final cleanup has removed its bookkeeping.
+        This is intentional: a timed-out cancellation does not mean the old
+        task is harmless. If it is still alive, it can still write shared
+        session state, so letting a replacement start would risk
+        overlapping mutations on the same session.
+        """
+        blocking_ids: list[str] = list(self._active_executions.keys())
+        for execution_id, task in self._execution_tasks.items():
+            if not task.done() and execution_id not in self._active_executions:
+                blocking_ids.append(execution_id)
+        return blocking_ids
+
    @property
    def agent_idle_seconds(self) -> float:
        """Seconds since the last agent activity (LLM call, tool call, node transition).
@@ -396,15 +414,22 @@ class ExecutionManager:

    async def stop(self) -> None:
        """Stop the execution stream and cancel active executions."""
-        if not self._running:
-            return
+        async with self._lock:
+            if not self._running:
+                return

-        self._running = False
+            self._running = False

-        # Cancel all active executions
-        tasks_to_wait = []
-        for _, task in self._execution_tasks.items():
-            if not task.done():
+            # Cancel all active executions, but keep bookkeeping until each
+            # task reaches its own cleanup path.
+            tasks_to_wait: list[asyncio.Task] = []
+            for execution_id, task in self._execution_tasks.items():
+                if task.done():
+                    continue
+                ctx = self._active_executions.get(execution_id)
+                if ctx is not None:
+                    ctx.status = "cancelling"
+                self._cancel_reasons.setdefault(execution_id, "Execution cancelled")
                task.cancel()
                tasks_to_wait.append(task)

@@ -418,9 +443,6 @@ class ExecutionManager:
                    len(pending),
                )

-        self._execution_tasks.clear()
-        self._active_executions.clear()
-
        logger.info(f"ExecutionStream '{self.stream_id}' stopped")

        # Emit stream stopped event
@@ -569,12 +591,16 @@ class ExecutionManager:
        )

        async with self._lock:
+            if not self._running:
+                raise RuntimeError(f"ExecutionStream '{self.stream_id}' is not running")
+
+            blocking_ids = self._get_blocking_execution_ids_locked()
+            if blocking_ids:
+                raise ExecutionAlreadyRunningError(self.stream_id, blocking_ids)
+
            self._active_executions[execution_id] = ctx
            self._completion_events[execution_id] = asyncio.Event()
-
-        # Start execution task
-        task = asyncio.create_task(self._run_execution(ctx))
-        self._execution_tasks[execution_id] = task
+            self._execution_tasks[execution_id] = asyncio.create_task(self._run_execution(ctx))

        logger.debug(f"Queued execution {execution_id} for stream {self.stream_id}")
        return execution_id
@@ -1183,7 +1209,7 @@ class ExecutionManager:
        """Get execution context."""
        return self._active_executions.get(execution_id)

-    async def cancel_execution(self, execution_id: str, *, reason: str | None = None) -> bool:
+    async def cancel_execution(self, execution_id: str, *, reason: str | None = None) -> CancelExecutionResult:
        """
        Cancel a running execution.

@@ -1194,33 +1220,38 @@ class ExecutionManager:
                provided, defaults to "Execution cancelled".

        Returns:
-            True if cancelled, False if not found
+            "cancelled" if the task fully exited within the grace period,
+            "cancelling" if cancellation was requested but the task is still
+            shutting down, or "not_found" if no active task exists.
        """
-        task = self._execution_tasks.get(execution_id)
-        if task and not task.done():
+        async with self._lock:
+            task = self._execution_tasks.get(execution_id)
+            if task is None or task.done():
+                return "not_found"
+
            # Store the reason so the CancelledError handler can use it
            # when emitting the pause/fail event.
            self._cancel_reasons[execution_id] = reason or "Execution cancelled"
+            ctx = self._active_executions.get(execution_id)
+            if ctx is not None:
+                ctx.status = "cancelling"
            task.cancel()
-            # Wait briefly for the task to finish. Don't block indefinitely —
-            # the task may be stuck in a long LLM API call that doesn't
-            # respond to cancellation quickly.
-            done, _ = await asyncio.wait({task}, timeout=5.0)
-            if not done:
-                # Task didn't finish within timeout — clean up bookkeeping now
-                # so the session doesn't think it still has running executions.
-                # The task will continue winding down in the background and its
-                # finally block will harmlessly pop already-removed keys.
-                logger.warning(
-                    "Execution %s did not finish within cancel timeout; force-cleaning bookkeeping",
-                    execution_id,
-                )
-                async with self._lock:
-                    self._active_executions.pop(execution_id, None)
-                    self._execution_tasks.pop(execution_id, None)
-                self._active_executors.pop(execution_id, None)
-            return True
-        return False
+
+        # Wait briefly for the task to finish. Don't block indefinitely —
+        # the task may be stuck in a long LLM API call that doesn't
+        # respond to cancellation quickly.
+        done, _ = await asyncio.wait({task}, timeout=5.0)
+        if not done:
+            # Keep bookkeeping in place until the task's own finally block runs.
+            # We intentionally do not add deferred cleanup keyed by execution_id
+            # here because resumed executions reuse the same id; a delayed pop
+            # could otherwise delete bookkeeping that belongs to the new run.
+            logger.warning(
+                "Execution %s did not finish within cancel timeout; leaving bookkeeping in place until task exit",
+                execution_id,
+            )
+            return "cancelling"
+        return "cancelled"

    # === STATS AND MONITORING ===
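
Review note: the three-way return value of `cancel_execution` can be reproduced in isolation. The sketch below leaves out the lock and the bookkeeping and keeps only the cancel-then-wait pattern; `cancel_with_grace` and the two demo workers are hypothetical names.

```python
from __future__ import annotations

import asyncio
from typing import Literal

CancelResult = Literal["cancelled", "cancelling", "not_found"]


async def cancel_with_grace(task: asyncio.Task | None, grace: float) -> CancelResult:
    if task is None or task.done():
        return "not_found"
    task.cancel()
    # Bounded wait: a task stuck in slow cleanup must not block the caller.
    done, _ = await asyncio.wait({task}, timeout=grace)
    return "cancelled" if done else "cancelling"


async def demo() -> list[str]:
    async def prompt_worker() -> None:
        await asyncio.sleep(60)

    async def stubborn_worker() -> None:
        try:
            await asyncio.sleep(60)
        except asyncio.CancelledError:
            await asyncio.sleep(0.2)  # cleanup that outlives the grace period
            raise

    quick = asyncio.create_task(prompt_worker())
    slow = asyncio.create_task(stubborn_worker())
    await asyncio.sleep(0)  # let both tasks reach their first await
    results = [
        await cancel_with_grace(quick, grace=0.05),
        await cancel_with_grace(slow, grace=0.05),
        await cancel_with_grace(quick, grace=0.05),  # already finished
    ]
    await asyncio.gather(slow, return_exceptions=True)
    return results


outcome = asyncio.run(demo())
```

A caller that receives "cancelling" should assume the old task can still write shared state, which is why the diff keeps its bookkeeping in place until the task exits.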

@@ -0,0 +1,487 @@
+"""Per-colony SQLite task queue + progress ledger.
+
+Every colony gets its own ``progress.db`` under ``~/.hive/colonies/{name}/data/``.
+The DB holds the colony's task queue plus per-task step and SOP checklist
+rows. Workers claim tasks atomically, write progress as they execute, and
+verify SOP gates before marking a task done. This gives cross-run memory
+that the existing per-iteration stall detectors don't have.
+
+The DB is driven by agents via the ``sqlite3`` CLI through
+``execute_command_tool``. This module handles framework-side lifecycle:
+creation, migration, queen-side bulk seeding, stale-claim reclamation.
+
+Concurrency model:
+- WAL mode on from day one so 100 concurrent workers can read without blocking the writer.
+- Workers hold NO long-running connection — they ``sqlite3`` per call,
+  which naturally releases locks between LLM turns.
+- Atomic claim via ``BEGIN IMMEDIATE; UPDATE tasks SET status='claimed'
+  WHERE id=(SELECT ... LIMIT 1)``. The subquery-form UPDATE runs inside
+  the immediate transaction so racers either win the row or find zero
+  affected rows.
+- Stale-claim reclaimer runs on host startup: claims older than
+  ``stale_after_minutes`` get returned to ``pending`` and the row's
+  ``retry_count`` increments. When ``retry_count >= max_retries`` the
+  row is moved to ``failed`` instead.
+
+All writes go through ``BEGIN IMMEDIATE`` so racing readers see
+consistent snapshots.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import sqlite3
+import uuid
+from datetime import UTC, datetime
+from pathlib import Path
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+SCHEMA_VERSION = 1
+
+_SCHEMA_V1 = """
+CREATE TABLE IF NOT EXISTS tasks (
+    id              TEXT PRIMARY KEY,
+    seq             INTEGER,
+    priority        INTEGER NOT NULL DEFAULT 0,
+    goal            TEXT NOT NULL,
+    payload         TEXT,
+    status          TEXT NOT NULL DEFAULT 'pending',
+    worker_id       TEXT,
+    claim_token     TEXT,
+    claimed_at      TEXT,
+    started_at      TEXT,
+    completed_at    TEXT,
+    created_at      TEXT NOT NULL,
+    updated_at      TEXT NOT NULL,
+    retry_count     INTEGER NOT NULL DEFAULT 0,
+    max_retries     INTEGER NOT NULL DEFAULT 3,
+    last_error      TEXT,
+    parent_task_id  TEXT REFERENCES tasks(id) ON DELETE SET NULL,
+    source          TEXT
+);
+
+CREATE TABLE IF NOT EXISTS steps (
+    id              TEXT PRIMARY KEY,
+    task_id         TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
+    seq             INTEGER NOT NULL,
+    title           TEXT NOT NULL,
+    detail          TEXT,
+    status          TEXT NOT NULL DEFAULT 'pending',
+    evidence        TEXT,
+    worker_id       TEXT,
+    started_at      TEXT,
+    completed_at    TEXT,
+    UNIQUE (task_id, seq)
+);
+
+CREATE TABLE IF NOT EXISTS sop_checklist (
+    id              TEXT PRIMARY KEY,
+    task_id         TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
+    key             TEXT NOT NULL,
+    description     TEXT NOT NULL,
+    required        INTEGER NOT NULL DEFAULT 1,
+    done_at         TEXT,
+    done_by         TEXT,
+    note            TEXT,
+    UNIQUE (task_id, key)
+);
+
+CREATE TABLE IF NOT EXISTS colony_meta (
+    key             TEXT PRIMARY KEY,
+    value           TEXT NOT NULL,
+    updated_at      TEXT NOT NULL
+);
+
+CREATE INDEX IF NOT EXISTS idx_tasks_claimable
+    ON tasks(status, priority DESC, seq, created_at)
+    WHERE status = 'pending';
+
+CREATE INDEX IF NOT EXISTS idx_steps_task_seq
+    ON steps(task_id, seq);
+
+CREATE INDEX IF NOT EXISTS idx_sop_required_open
+    ON sop_checklist(task_id, required, done_at);
+
+CREATE INDEX IF NOT EXISTS idx_tasks_status
+    ON tasks(status, updated_at);
+"""
+
+_PRAGMAS = (
+    "PRAGMA journal_mode = WAL;",
+    "PRAGMA synchronous = NORMAL;",
+    "PRAGMA foreign_keys = ON;",
+    "PRAGMA busy_timeout = 5000;",
+)
+
+
+def _now_iso() -> str:
+    return datetime.now(UTC).isoformat(timespec="seconds")
+
+
+def _new_id() -> str:
+    return str(uuid.uuid4())
+
+
+def _connect(db_path: Path) -> sqlite3.Connection:
+    """Open a connection with the standard pragmas applied.
+
+    WAL mode is sticky on the file once set, so re-applying on every
+    open is cheap. The other pragmas are per-connection and must be
+    set each time.
+    """
+    con = sqlite3.connect(str(db_path), isolation_level=None, timeout=5.0)
+    for pragma in _PRAGMAS:
+        con.execute(pragma)
+    return con
+
+
+def ensure_progress_db(colony_dir: Path) -> Path:
+    """Create or migrate ``{colony_dir}/data/progress.db``.
+
+    Idempotent: safe to call on an already-initialized DB. Returns the
+    absolute path to the DB file.
+
+    Steps:
+    1. Ensure ``data/`` subdir exists.
+    2. Open the DB (creates the file if missing).
+    3. Apply WAL + pragmas.
+    4. Read ``PRAGMA user_version``; if < SCHEMA_VERSION, run the
+       schema block and bump user_version.
+    5. Reclaim any stale claims left from previous runs.
+    6. Patch every ``*.json`` worker config in the colony dir to
+       inject ``input_data.db_path``, ``input_data.colony_id``, and
+       ``input_data.colony_data_dir`` so pre-existing colonies (forked
+       before this feature landed) get the tracker wiring on their next spawn.
+    """
+    data_dir = Path(colony_dir) / "data"
+    data_dir.mkdir(parents=True, exist_ok=True)
+    db_path = data_dir / "progress.db"
+
+    con = _connect(db_path)
+    try:
+        current_version = con.execute("PRAGMA user_version").fetchone()[0]
+        if current_version < SCHEMA_VERSION:
+            con.executescript(_SCHEMA_V1)
+            con.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
+            con.execute(
+                "INSERT OR REPLACE INTO colony_meta(key, value, updated_at) VALUES (?, ?, ?)",
+                ("schema_version", str(SCHEMA_VERSION), _now_iso()),
+            )
+            logger.info("progress_db: initialized schema v%d at %s", SCHEMA_VERSION, db_path)
+
+        reclaimed = _reclaim_stale_inner(con, stale_after_minutes=15)
+        if reclaimed:
+            logger.info(
+                "progress_db: reclaimed %d stale claims at startup (%s)",
+                reclaimed,
+                db_path,
+            )
+    finally:
+        con.close()
+
+    resolved_db_path = db_path.resolve()
+    _patch_worker_configs(Path(colony_dir), resolved_db_path)
+    return resolved_db_path
+
+
+def _patch_worker_configs(colony_dir: Path, db_path: Path) -> int:
+    """Inject ``input_data.db_path`` + ``input_data.colony_id`` +
+    ``input_data.colony_data_dir`` into existing worker config files
+    (``*.json``) in a colony directory.
+
+    Runs on every ``ensure_progress_db`` call so colonies that were
+    forked before this feature landed get their worker spawn messages
+    patched in place. Idempotent: if ``input_data`` already contains
+    all three values, the file is not rewritten.
+
+    Returns the number of files that were actually modified (0 in
+    the common case of already-patched colonies).
+
+    Why ``colony_data_dir``? ``db_path`` alone points agents at
+    ``progress.db``; for anything else (custom SQLite stores, JSON
+    ledgers, scraped artefacts) they need the *directory* so they
+    stop creating state under ``~/.hive/skills/`` — which holds skill
+    *definitions*, not runtime data. See
+    ``_default_skills/colony-storage-paths/SKILL.md``.
+    """
+    colony_id = colony_dir.name
+    abs_db = str(db_path)
+    abs_data_dir = str(db_path.parent)
+    patched = 0
+
+    for worker_cfg in colony_dir.glob("*.json"):
+        # Only patch files that look like worker configs (have the
+        # worker_meta shape). ``metadata.json`` and ``triggers.json``
+        # are colony-level and must not be touched.
+        if worker_cfg.name in ("metadata.json", "triggers.json"):
+            continue
+        try:
+            data = json.loads(worker_cfg.read_text(encoding="utf-8"))
+        except (json.JSONDecodeError, OSError):
+            continue
+        if not isinstance(data, dict) or "system_prompt" not in data:
+            # Not a worker config (lacks the worker_meta schema).
+            continue
+
+        input_data = data.get("input_data")
+        if not isinstance(input_data, dict):
+            input_data = {}
+
+        if (
+            input_data.get("db_path") == abs_db
+            and input_data.get("colony_id") == colony_id
+            and input_data.get("colony_data_dir") == abs_data_dir
+        ):
+            continue  # already patched
+
+        input_data["db_path"] = abs_db
+        input_data["colony_id"] = colony_id
+        input_data["colony_data_dir"] = abs_data_dir
+        data["input_data"] = input_data
+
+        try:
+            worker_cfg.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
+            patched += 1
+        except OSError as e:
+            logger.warning("progress_db: failed to patch worker config %s: %s", worker_cfg, e)
+
+    if patched:
+        logger.info(
+            "progress_db: patched %d worker config(s) in colony '%s' with db_path + colony_data_dir",
+            patched,
+            colony_id,
+        )
+    return patched
+
+
+def ensure_all_colony_dbs(colonies_root: Path | None = None) -> list[Path]:
+    """Idempotently ensure every existing colony has a progress.db.
+
+    Called on framework host startup to backfill older colonies and
+    run the stale-claim reclaimer on all of them in one pass.
+    """
+    if colonies_root is None:
+        colonies_root = Path.home() / ".hive" / "colonies"
+    if not colonies_root.is_dir():
+        return []
+
+    initialized: list[Path] = []
+    for entry in sorted(colonies_root.iterdir()):
+        if not entry.is_dir():
+            continue
+        try:
+            initialized.append(ensure_progress_db(entry))
+        except Exception as e:
+            logger.warning("progress_db: failed to ensure DB for colony '%s': %s", entry.name, e)
+    return initialized
+
+
+def seed_tasks(
+    db_path: Path,
+    tasks: list[dict[str, Any]],
+    *,
+    source: str = "queen_create",
+) -> list[str]:
+    """Bulk-insert tasks (with optional nested steps + sop_items).
+
+    Each task dict accepts:
+      - goal: str (required)
+      - id: str (optional; a UUID4 is generated when omitted)
+      - seq: int (optional ordering hint)
+      - priority: int (default 0)
+      - payload: dict | str | None (stored as JSON text)
+      - max_retries: int (default 3)
+      - parent_task_id: str | None
+      - steps: list[{"title": str, "detail"?: str, "seq"?: int}] (optional)
+      - sop_items: list[{"key": str, "description": str, "required"?: bool, "note"?: str}] (optional)
+
+    All rows are inserted in a single BEGIN IMMEDIATE transaction so
+    10k-row seeds finish in one disk flush. Returns the created task ids
+    in the same order as input.
+    """
+    if not tasks:
+        return []
+
+    created_ids: list[str] = []
+    now = _now_iso()
+    con = _connect(Path(db_path))
+    try:
+        con.execute("BEGIN IMMEDIATE")
+        for idx, task in enumerate(tasks):
+            goal = task.get("goal")
+            if not goal:
+                raise ValueError(f"task[{idx}] missing required 'goal' field")
+
+            task_id = task.get("id") or _new_id()
+            payload = task.get("payload")
+            if payload is not None and not isinstance(payload, str):
+                payload = json.dumps(payload, ensure_ascii=False)
+
+            con.execute(
+                """
+                INSERT INTO tasks (
+                    id, seq, priority, goal, payload, status,
+                    created_at, updated_at, max_retries, parent_task_id, source
+                ) VALUES (?, ?, ?, ?, ?, 'pending', ?, ?, ?, ?, ?)
+                """,
+                (
+                    task_id,
+                    task.get("seq"),
+                    int(task.get("priority", 0)),
+                    goal,
+                    payload,
+                    now,
+                    now,
+                    int(task.get("max_retries", 3)),
+                    task.get("parent_task_id"),
+                    source,
+                ),
+            )
+
+            for step_seq, step in enumerate(task.get("steps") or [], start=1):
+                if not step.get("title"):
+                    raise ValueError(f"task[{idx}].steps[{step_seq - 1}] missing required 'title'")
+                con.execute(
+                    """
+                    INSERT INTO steps (id, task_id, seq, title, detail, status)
+                    VALUES (?, ?, ?, ?, ?, 'pending')
+                    """,
+                    (
+                        _new_id(),
+                        task_id,
+                        step.get("seq", step_seq),
+                        step["title"],
+                        step.get("detail"),
+                    ),
+                )
+
+            for sop in task.get("sop_items") or []:
+                key = sop.get("key")
+                description = sop.get("description")
+                if not key or not description:
+                    raise ValueError(f"task[{idx}].sop_items missing 'key' or 'description'")
+                con.execute(
+                    """
+                    INSERT INTO sop_checklist
+                        (id, task_id, key, description, required, note)
+                    VALUES (?, ?, ?, ?, ?, ?)
+                    """,
+                    (
+                        _new_id(),
+                        task_id,
+                        key,
+                        description,
+                        1 if sop.get("required", True) else 0,
+                        sop.get("note"),
+                    ),
+                )
+
+            created_ids.append(task_id)
+
+        con.execute("COMMIT")
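+    except Exception:
+        # BEGIN IMMEDIATE itself can fail (e.g. busy timeout); only roll back
+        # when a transaction is actually open so the original error survives.
+        if con.in_transaction:
+            con.execute("ROLLBACK")
+        raise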
+    finally:
+        con.close()
+
+    return created_ids
+
+
+def enqueue_task(
+    db_path: Path,
+    goal: str,
+    *,
+    steps: list[dict[str, Any]] | None = None,
+    sop_items: list[dict[str, Any]] | None = None,
+    payload: Any = None,
+    priority: int = 0,
+    parent_task_id: str | None = None,
+    source: str = "enqueue_tool",
+) -> str:
+    """Append a single task to an existing queue. Thin wrapper over seed_tasks."""
+    ids = seed_tasks(
+        db_path,
+        [
+            {
+                "goal": goal,
+                "steps": steps,
+                "sop_items": sop_items,
+                "payload": payload,
+                "priority": priority,
+                "parent_task_id": parent_task_id,
+            }
+        ],
+        source=source,
+    )
+    return ids[0]
+
+
+def _reclaim_stale_inner(con: sqlite3.Connection, *, stale_after_minutes: int) -> int:
+    """Reclaim stale claims. Runs inside an existing open connection.
+
+    Two-step:
+    1. Tasks past max_retries go to 'failed' with last_error populated.
+    2. Remaining stale claims return to 'pending', retry_count++.
+    """
+    cutoff_expr = f"datetime('now', '-{int(stale_after_minutes)} minutes')"
+
+    con.execute("BEGIN IMMEDIATE")
+    try:
+        con.execute(
+            f"""
+            UPDATE tasks
+            SET status = 'failed',
+                last_error = COALESCE(last_error, 'exceeded max_retries after stale claim'),
+                completed_at = datetime('now'),
+                updated_at = datetime('now')
+            WHERE status IN ('claimed', 'in_progress')
+              AND claimed_at IS NOT NULL
+              AND claimed_at < {cutoff_expr}
+              AND retry_count >= max_retries
+            """
+        )
+
+        cur = con.execute(
+            f"""
+            UPDATE tasks
+            SET status = 'pending',
+                worker_id = NULL,
+                claim_token = NULL,
+                claimed_at = NULL,
+                started_at = NULL,
+                retry_count = retry_count + 1,
+                updated_at = datetime('now')
+            WHERE status IN ('claimed', 'in_progress')
+              AND claimed_at IS NOT NULL
+              AND claimed_at < {cutoff_expr}
+              AND retry_count < max_retries
+            """
+        )
+        reclaimed = cur.rowcount or 0
+        con.execute("COMMIT")
+        return reclaimed
+    except Exception:
+        con.execute("ROLLBACK")
+        raise
+
+
+def reclaim_stale(db_path: Path, stale_after_minutes: int = 15) -> int:
+    """Public wrapper that opens its own connection."""
+    con = _connect(Path(db_path))
+    try:
+        return _reclaim_stale_inner(con, stale_after_minutes=stale_after_minutes)
+    finally:
+        con.close()
+
+
+__all__ = [
+    "SCHEMA_VERSION",
+    "ensure_progress_db",
+    "ensure_all_colony_dbs",
+    "seed_tasks",
+    "enqueue_task",
+    "reclaim_stale",
+]
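
Review note: the module docstring describes the atomic claim but the claim itself is issued by workers through the `sqlite3` CLI, so it does not appear in this file. The sketch below restates it in Python for illustration. The function `claim_next`, the trimmed-down `tasks` table in `demo`, and the `ORDER BY` clause (inferred from `idx_tasks_claimable`) are assumptions, not code from the PR.

```python
from __future__ import annotations

import sqlite3
import tempfile
import uuid
from pathlib import Path


def claim_next(db_path: Path, worker_id: str) -> str | None:
    """Claim the best pending task; return its id, or None if the queue is empty."""
    con = sqlite3.connect(str(db_path), isolation_level=None, timeout=5.0)
    try:
        token = str(uuid.uuid4())
        con.execute("BEGIN IMMEDIATE")  # take the write lock before selecting
        cur = con.execute(
            """
            UPDATE tasks
            SET status = 'claimed', worker_id = ?, claim_token = ?,
                claimed_at = datetime('now'), updated_at = datetime('now')
            WHERE id = (
                SELECT id FROM tasks WHERE status = 'pending'
                ORDER BY priority DESC, seq, created_at LIMIT 1
            )
            """,
            (worker_id, token),
        )
        con.execute("COMMIT")
        if cur.rowcount == 0:
            return None  # nothing pending, or another worker won the row
        row = con.execute("SELECT id FROM tasks WHERE claim_token = ?", (token,)).fetchone()
        return row[0]
    finally:
        con.close()


def demo() -> list[str | None]:
    db_path = Path(tempfile.mkdtemp()) / "progress.db"
    con = sqlite3.connect(str(db_path), isolation_level=None)
    # Trimmed-down version of the tasks table, just enough for the claim query.
    con.execute(
        "CREATE TABLE tasks (id TEXT PRIMARY KEY, seq INTEGER,"
        " priority INTEGER NOT NULL DEFAULT 0, status TEXT NOT NULL DEFAULT 'pending',"
        " worker_id TEXT, claim_token TEXT, claimed_at TEXT,"
        " created_at TEXT NOT NULL, updated_at TEXT NOT NULL)"
    )
    con.executemany(
        "INSERT INTO tasks (id, seq, priority, created_at, updated_at)"
        " VALUES (?, ?, ?, datetime('now'), datetime('now'))",
        [("low", 1, 0), ("high", 2, 5)],
    )
    con.close()
    return [claim_next(db_path, w) for w in ("w1", "w2", "w3")]


claims = demo()
```

The per-claim token lets the worker find the row it won without a second race, and a zero `rowcount` is the signal that there was nothing left to claim.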
@@ -145,6 +145,24 @@ class Worker:
        self.status = WorkerStatus.RUNNING
        self._started_at = time.monotonic()

+        # Scope browser profile (and any other CONTEXT_PARAMS) to this
+        # worker. asyncio.create_task() copies the parent's contextvars,
+        # so without this override every spawned worker inherits the
+        # queen's `profile=<queen_session_id>` and its browser_* tool
+        # calls end up driving the queen's Chrome tab group. Setting
+        # it here (inside the new Task's context) shadows the parent
+        # value without affecting the queen's ongoing calls.
+        try:
+            from framework.loader.tool_registry import ToolRegistry
+
+            ToolRegistry.set_execution_context(profile=self.id)
+        except Exception:
+            logger.debug(
+                "Worker %s: failed to scope browser profile",
+                self.id,
+                exc_info=True,
+            )
+
        try:
            result = await self._agent_loop.execute(self._context)
            duration = time.monotonic() - self._started_at
@@ -170,13 +188,28 @@ class Worker:
        except asyncio.CancelledError:
            self.status = WorkerStatus.STOPPED
            duration = time.monotonic() - self._started_at
-            self._result = WorkerResult(
-                error="Worker stopped by queen",
-                duration_seconds=duration,
-                status="stopped",
-                summary="Worker was cancelled before completion.",
-            )
-            await self._emit_terminal_events(None, force_status="stopped")
+            # Preserve any explicit report the worker's LLM already filed
+            # via ``report_to_parent`` before being cancelled — the caller
+            # cares about that payload even on a hard stop. Only fall back
+            # to the canned "stopped" message when no explicit report exists.
+            explicit = self._explicit_report
+            if explicit is not None:
+                self._result = WorkerResult(
+                    error="Worker stopped by queen after reporting",
+                    duration_seconds=duration,
+                    status=explicit["status"],
+                    summary=explicit["summary"],
+                    data=explicit["data"],
+                )
+                await self._emit_terminal_events(None, force_status=explicit["status"])
+            else:
+                self._result = WorkerResult(
+                    error="Worker stopped by queen",
+                    duration_seconds=duration,
+                    status="stopped",
+                    summary="Worker was cancelled before completion.",
+                )
+                await self._emit_terminal_events(None, force_status="stopped")
            return self._result

        except Exception as exc:
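Review note: a small sketch of the cancellation branch above, assuming a simplified `Result` dataclass and `MiniWorker` class (both hypothetical, not the framework's `WorkerResult` or `Worker`). It shows a report filed before cancellation surviving a hard stop, while a worker that never reported falls back to the canned "stopped" result.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Result:  # hypothetical, simplified stand-in for WorkerResult
    status: str
    summary: str
    data: dict[str, Any] = field(default_factory=dict)


class MiniWorker:  # hypothetical, simplified stand-in for Worker
    def __init__(self) -> None:
        self.explicit_report: dict[str, Any] | None = None

    async def run(self, report_first: bool) -> Result:
        try:
            if report_first:
                # Simulates the LLM calling report_to_parent before the stop.
                self.explicit_report = {
                    "status": "success",
                    "summary": "done",
                    "data": {"rows": 3},
                }
            await asyncio.sleep(3600)  # long-running work that gets cancelled
            return Result("success", "finished normally")
        except asyncio.CancelledError:
            explicit = self.explicit_report
            if explicit is not None:
                # Keep the payload the worker already filed.
                return Result(explicit["status"], explicit["summary"], explicit["data"])
            return Result("stopped", "Worker was cancelled before completion.")


async def cancel_after_start(report_first: bool) -> Result:
    task = asyncio.create_task(MiniWorker().run(report_first))
    await asyncio.sleep(0)  # let the task run up to its first await
    task.cancel()
    return await task


reported = asyncio.run(cancel_after_start(True))
silent = asyncio.run(cancel_after_start(False))
print(reported.status, silent.status)
```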
@@ -653,10 +653,17 @@ class AntigravityProvider(LLMProvider):
        system: str = "",
        tools: list[Tool] | None = None,
        max_tokens: int = 4096,
+        system_dynamic_suffix: str | None = None,
    ) -> AsyncIterator[StreamEvent]:
        import asyncio  # noqa: PLC0415
        import concurrent.futures  # noqa: PLC0415

+        # Antigravity (Google's proprietary endpoint) doesn't expose a
+        # cache_control hook. Concatenate the dynamic suffix so its shape
+        # matches the legacy single-string call site.
+        if system_dynamic_suffix:
+            system = f"{system}\n\n{system_dynamic_suffix}" if system else system_dynamic_suffix
+
        loop = asyncio.get_running_loop()
        queue: asyncio.Queue[StreamEvent | None] = asyncio.Queue()

@@ -213,9 +213,72 @@ _CACHE_CONTROL_PREFIXES = (
    "glm-",
 )

+# OpenRouter sub-provider prefixes whose upstream API honors `cache_control`.
+# OpenRouter passes the marker through to the underlying provider for these.
+# (See https://openrouter.ai/docs/guides/best-practices/prompt-caching.)
+# OpenAI/DeepSeek/Groq/Grok/Moonshot route through OpenRouter but cache
+# automatically server-side — sending cache_control there is a no-op, not a
+# win, and they need a separate prefix-stability fix to actually get hits.
+_OPENROUTER_CACHE_CONTROL_PREFIXES = (
+    "openrouter/anthropic/",
+    "openrouter/google/gemini-",
+    "openrouter/z-ai/glm",
+    "openrouter/minimax/",
+)
+

 def _model_supports_cache_control(model: str) -> bool:
-    return any(model.startswith(p) for p in _CACHE_CONTROL_PREFIXES)
+    if any(model.startswith(p) for p in _CACHE_CONTROL_PREFIXES):
+        return True
+    return any(model.startswith(p) for p in _OPENROUTER_CACHE_CONTROL_PREFIXES)
+
+
+def _build_system_message(
+    system: str,
+    system_dynamic_suffix: str | None,
+    model: str,
+) -> dict[str, Any] | None:
+    """Construct the system-role message for the chat completion.
+
+    Returns ``None`` when there is nothing to send.
+
+    Two-block split path: used when the caller supplied a non-empty
+    ``system_dynamic_suffix`` AND the model honors ``cache_control``
+    (see ``_model_supports_cache_control``: Anthropic, MiniMax,
+    Z-AI/GLM, and the matching OpenRouter routes). ``content`` is a list
+    of two text blocks with an ephemeral ``cache_control`` marker on the
+    first block only, so the static prefix stays cached across turns and
+    iterations while only the small dynamic tail is reprocessed.
+
+    Single-string path: used in every other case (no suffix provided,
+    or the model doesn't honor ``cache_control``). ``system`` and
+    ``system_dynamic_suffix`` are joined with ``\\n\\n``, and
+    ``cache_control`` is attached to the whole message when the model
+    supports it. This is byte-identical to the pre-split behavior for
+    providers without it (OpenAI, direct Gemini, Groq, Ollama, etc.).
+    """
+    if not system and not system_dynamic_suffix:
+        return None
+    if system_dynamic_suffix and _model_supports_cache_control(model):
+        content_blocks: list[dict[str, Any]] = []
+        if system:
+            content_blocks.append(
+                {
+                    "type": "text",
+                    "text": system,
+                    "cache_control": {"type": "ephemeral"},
+                }
+            )
+        content_blocks.append({"type": "text", "text": system_dynamic_suffix})
+        return {"role": "system", "content": content_blocks}
+    # Single-string path (legacy or no-cache-control provider).
+    combined = system
+    if system_dynamic_suffix:
+        combined = f"{system}\n\n{system_dynamic_suffix}" if system else system_dynamic_suffix
+    sys_msg: dict[str, Any] = {"role": "system", "content": combined}
+    if _model_supports_cache_control(model):
+        sys_msg["cache_control"] = {"type": "ephemeral"}
+    return sys_msg


 # Kimi For Coding uses an Anthropic-compatible endpoint (no /v1 suffix).
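Review note: to make the two payload shapes concrete, here is a simplified restatement of the split logic with a hypothetical prefix tuple and a hypothetical non-caching model name. It is a sketch of what `_build_system_message` returns on each path, not the framework function itself.

```python
from typing import Any, Optional

# Hypothetical prefixes for illustration only; the real lists are the
# *_CACHE_CONTROL_PREFIXES tuples in the hunk above.
CACHE_CONTROL_PREFIXES = ("anthropic/", "openrouter/anthropic/")


def build_system_message(
    system: str, suffix: Optional[str], model: str
) -> Optional[dict[str, Any]]:
    """Simplified restatement of the split logic."""
    if not system and not suffix:
        return None
    cacheable = model.startswith(CACHE_CONTROL_PREFIXES)
    if suffix and cacheable:
        blocks: list[dict[str, Any]] = []
        if system:
            # Only the static prefix carries the cache marker.
            blocks.append(
                {"type": "text", "text": system, "cache_control": {"type": "ephemeral"}}
            )
        blocks.append({"type": "text", "text": suffix})
        return {"role": "system", "content": blocks}
    # Single-string path: join whatever is present.
    combined = f"{system}\n\n{suffix}" if system and suffix else (system or suffix)
    message: dict[str, Any] = {"role": "system", "content": combined}
    if cacheable:
        message["cache_control"] = {"type": "ephemeral"}
    return message


split = build_system_message(
    "You are a helpful agent.", "Today is Monday.", "anthropic/claude-sonnet-4.6"
)
joined = build_system_message(
    "You are a helpful agent.", "Today is Monday.", "example/other-model"
)
print(split)
print(joined)
```

On the split path the dynamic tail sits in its own block after the cache marker, so changing it between turns leaves the cached prefix untouched.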
@@ -297,6 +360,118 @@ FAILED_REQUESTS_DIR = Path.home() / ".hive" / "failed_requests"
 MAX_FAILED_REQUEST_DUMPS = 50


+def _extract_cost(response: Any, model: str) -> float:
+    """Pull the USD cost for a non-streaming completion response.
+
+    Sources checked, in priority order:
+      1. ``usage.cost`` — populated when OpenRouter returns native cost via
+         ``usage: {include: true}`` or when ``litellm.include_cost_in_streaming_usage``
+         is on.
+      2. ``response._hidden_params["response_cost"]`` — set by LiteLLM's
+         logging layer after most successful completions.
+      3. ``litellm.completion_cost(...)`` — computes from the model pricing
+         table; works across Anthropic, OpenAI, and OpenRouter as long as the
+         model is in LiteLLM's catalog.
+
+    Returns 0.0 for unpriced models or unexpected response shapes: cost is
+    a display concern and must never break the hot path. For streaming paths
+    where the aggregate response isn't a full ``ModelResponse``, use
+    :func:`_cost_from_tokens` with the already-extracted token counts.
+    """
+    if response is None:
+        return 0.0
+    usage = getattr(response, "usage", None)
+    usage_cost = getattr(usage, "cost", None) if usage is not None else None
+    if isinstance(usage_cost, (int, float)) and usage_cost > 0:
+        return float(usage_cost)
+
+    hidden = getattr(response, "_hidden_params", None)
+    if isinstance(hidden, dict):
+        hp_cost = hidden.get("response_cost")
+        if isinstance(hp_cost, (int, float)) and hp_cost > 0:
+            return float(hp_cost)
+
+    try:
+        import litellm as _litellm
+
+        computed = _litellm.completion_cost(completion_response=response, model=model)
+        if isinstance(computed, (int, float)) and computed > 0:
+            return float(computed)
+    except Exception as exc:
+        logger.debug("[cost] completion_cost failed for %s: %s", model, exc)
+    return 0.0
+
+
+def _cost_from_tokens(
+    model: str,
+    input_tokens: int,
+    output_tokens: int,
+    cached_tokens: int = 0,
+    cache_creation_tokens: int = 0,
+) -> float:
+    """Compute USD cost from already-normalized token counts.
+
+    Used on streaming paths where the aggregate ``response`` is the stream
+    wrapper (not a full ``ModelResponse``) and ``litellm.completion_cost`` on
+    it either no-ops or raises. Calls ``litellm.cost_per_token`` directly
+    with the cache-aware inputs so Anthropic's 5-min-write / cache-read
+    multipliers are applied correctly.
+    """
+    if not model or (input_tokens == 0 and output_tokens == 0):
+        return 0.0
+    try:
+        import litellm as _litellm
+
+        prompt_cost, completion_cost = _litellm.cost_per_token(
+            model=model,
+            prompt_tokens=input_tokens,
+            completion_tokens=output_tokens,
+            cache_read_input_tokens=cached_tokens,
+            cache_creation_input_tokens=cache_creation_tokens,
+        )
+        total = (prompt_cost or 0.0) + (completion_cost or 0.0)
+        return float(total) if total > 0 else 0.0
+    except Exception as exc:
+        logger.debug("[cost] cost_per_token failed for %s: %s", model, exc)
+        return 0.0
+
+
+def _extract_cache_tokens(usage: Any) -> tuple[int, int]:
+    """Pull (cache_read, cache_creation) from a LiteLLM usage object.
+
+    Both are subsets of ``prompt_tokens`` already — providers count them
+    inside the input total. Surface separately for visibility, never sum.
+
+    Field names vary by provider/proxy; check the known shapes in priority
+    order and fall back to 0:
+
+    cache_read:
+      - ``prompt_tokens_details.cached_tokens`` — OpenAI-shape; also what
+        LiteLLM normalizes Anthropic and OpenRouter into.
+      - ``cache_read_input_tokens`` — raw Anthropic field name.
+
+    cache_creation:
+      - ``prompt_tokens_details.cache_write_tokens`` — OpenRouter's
+        normalized field for cache writes (verified empirically against
+        ``openrouter/anthropic/*`` and ``openrouter/z-ai/*`` responses).
+      - ``cache_creation_input_tokens`` — raw Anthropic top-level field.
+    """
+    if not usage:
+        return 0, 0
+    _details = getattr(usage, "prompt_tokens_details", None)
+    cache_read = (
+        getattr(_details, "cached_tokens", 0)
+        or getattr(usage, "cache_read_input_tokens", 0)
+        or 0
+    )
+    cache_creation = (
+        getattr(_details, "cache_write_tokens", 0)
+        or getattr(usage, "cache_creation_input_tokens", 0)
+        or 0
+    )
+    return cache_read, cache_creation
+
+
 def _estimate_tokens(model: str, messages: list[dict]) -> tuple[int, str]:
    """Estimate token count for messages. Returns (token_count, method)."""
    # Try litellm's token counter first
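Review note: the field-name precedence documented in `_extract_cache_tokens` can be exercised without LiteLLM by faking usage objects with `SimpleNamespace`. This is a sketch under that assumption. The token counts are made up, and `extract_cache_tokens` is a local restatement rather than the framework helper.

```python
from types import SimpleNamespace


def extract_cache_tokens(usage):
    """Return (cache_read, cache_creation), trying each known field in turn."""
    if not usage:
        return 0, 0
    details = getattr(usage, "prompt_tokens_details", None)
    cache_read = (
        getattr(details, "cached_tokens", 0)
        or getattr(usage, "cache_read_input_tokens", 0)
        or 0
    )
    cache_creation = (
        getattr(details, "cache_write_tokens", 0)
        or getattr(usage, "cache_creation_input_tokens", 0)
        or 0
    )
    return cache_read, cache_creation


# OpenAI-style shape: cache counters nested under prompt_tokens_details.
nested = SimpleNamespace(
    prompt_tokens=1200,
    prompt_tokens_details=SimpleNamespace(cached_tokens=1000, cache_write_tokens=150),
)
# Raw Anthropic-style shape: top-level counters, no details object.
top_level = SimpleNamespace(
    prompt_tokens=1200,
    cache_read_input_tokens=1000,
    cache_creation_input_tokens=150,
)

results = []
for usage in (nested, top_level):
    read, created = extract_cache_tokens(usage)
    # Both counters are subsets of prompt_tokens, so subtract rather than add.
    uncached = usage.prompt_tokens - read - created
    results.append((read, created, uncached))

print(results)  # [(1000, 150, 50), (1000, 150, 50)]
```

Because both counters are already included in `prompt_tokens`, the uncached portion is found by subtracting them. Adding them to the input total would double count.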
@@ -1015,12 +1190,17 @@ class LiteLLMProvider(LLMProvider):
        usage = response.usage
        input_tokens = usage.prompt_tokens if usage else 0
        output_tokens = usage.completion_tokens if usage else 0
+        cached_tokens, cache_creation_tokens = _extract_cache_tokens(usage)
+        cost_usd = _extract_cost(response, self.model)

        return LLMResponse(
            content=content,
            model=response.model or self.model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
+            cached_tokens=cached_tokens,
+            cache_creation_tokens=cache_creation_tokens,
+            cost_usd=cost_usd,
            stop_reason=response.choices[0].finish_reason or "",
            raw_response=response,
        )
@@ -1169,8 +1349,16 @@ class LiteLLMProvider(LLMProvider):
        response_format: dict[str, Any] | None = None,
        json_mode: bool = False,
        max_retries: int | None = None,
+        system_dynamic_suffix: str | None = None,
    ) -> LLMResponse:
-        """Async version of complete(). Uses litellm.acompletion — non-blocking."""
+        """Async version of complete(). Uses litellm.acompletion — non-blocking.
+
+        ``system_dynamic_suffix`` is an optional per-turn tail. When set and
+        the provider honors ``cache_control``, ``system`` is sent as the
+        cached prefix and the suffix trails as an uncached second content
+        block. Otherwise the two strings are concatenated into a single
+        system message (legacy behavior).
+        """
        # Codex ChatGPT backend requires streaming — route through stream() which
        # already handles Codex quirks and has proper tool call accumulation.
        if self._codex_backend:
@@ -1181,6 +1369,7 @@ class LiteLLMProvider(LLMProvider):
                max_tokens=max_tokens,
                response_format=response_format,
                json_mode=json_mode,
+                system_dynamic_suffix=system_dynamic_suffix,
            )
            return await self._collect_stream_to_response(stream_iter)

@@ -1188,10 +1377,8 @@ class LiteLLMProvider(LLMProvider):
        if self._claude_code_oauth:
            billing = _claude_code_billing_header(messages)
            full_messages.append({"role": "system", "content": billing})
-        if system:
-            sys_msg: dict[str, Any] = {"role": "system", "content": system}
-            if _model_supports_cache_control(self.model):
-                sys_msg["cache_control"] = {"type": "ephemeral"}
+        sys_msg = _build_system_message(system, system_dynamic_suffix, self.model)
+        if sys_msg is not None:
            full_messages.append(sys_msg)
        full_messages.extend(messages)

@@ -1228,12 +1415,17 @@ class LiteLLMProvider(LLMProvider):
        usage = response.usage
        input_tokens = usage.prompt_tokens if usage else 0
        output_tokens = usage.completion_tokens if usage else 0
+        cached_tokens, cache_creation_tokens = _extract_cache_tokens(usage)
+        cost_usd = _extract_cost(response, self.model)

        return LLMResponse(
            content=content,
            model=response.model or self.model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
+            cached_tokens=cached_tokens,
+            cache_creation_tokens=cache_creation_tokens,
+            cost_usd=cost_usd,
            stop_reason=response.choices[0].finish_reason or "",
            raw_response=response,
        )
@@ -1619,6 +1811,7 @@ class LiteLLMProvider(LLMProvider):
        messages: list[dict[str, Any]],
        system: str,
        tools: list[Tool],
+        system_dynamic_suffix: str | None = None,
    ) -> list[dict[str, Any]]:
        """Build a JSON-only prompt for models without native tool support."""
        tool_specs = [
@@ -1646,7 +1839,19 @@ class LiteLLMProvider(LLMProvider):
        )
        compat_system = compat_instruction if not system else f"{system}\n\n{compat_instruction}"

-        full_messages: list[dict[str, Any]] = [{"role": "system", "content": compat_system}]
+        # If the routed sub-provider honors cache_control (e.g.
+        # openrouter/anthropic/*), split the static prefix from the dynamic
+        # suffix so the prefix stays cache-warm across turns. Otherwise fall
+        # back to a single concatenated string.
+        system_message = _build_system_message(
+            compat_system,
+            system_dynamic_suffix,
+            self.model,
+        )
+
+        full_messages: list[dict[str, Any]] = []
+        if system_message is not None:
+            full_messages.append(system_message)
        full_messages.extend(messages)
        return [
            message
@@ -1660,9 +1865,21 @@ class LiteLLMProvider(LLMProvider):
        system: str,
        tools: list[Tool],
        max_tokens: int,
+        system_dynamic_suffix: str | None = None,
    ) -> LLMResponse:
-        """Emulate tool calling via JSON when OpenRouter rejects native tools."""
-        full_messages = self._build_openrouter_tool_compat_messages(messages, system, tools)
+        """Emulate tool calling via JSON when OpenRouter rejects native tools.
+
+        When the routed sub-provider honors ``cache_control`` (e.g.
+        ``openrouter/anthropic/*``), the message builder splits the static
+        prefix from the dynamic suffix so the prefix stays cache-warm.
+        Otherwise the suffix is concatenated into a single system string.
+        """
+        full_messages = self._build_openrouter_tool_compat_messages(
+            messages,
+            system,
+            tools,
+            system_dynamic_suffix=system_dynamic_suffix,
+        )
        kwargs: dict[str, Any] = {
            "model": self.model,
            "messages": full_messages,
@@ -1683,6 +1900,8 @@ class LiteLLMProvider(LLMProvider):
        usage = response.usage
        input_tokens = usage.prompt_tokens if usage else 0
        output_tokens = usage.completion_tokens if usage else 0
+        cached_tokens, cache_creation_tokens = _extract_cache_tokens(usage)
+        cost_usd = _extract_cost(response, self.model)
        stop_reason = "tool_calls" if tool_calls else (response.choices[0].finish_reason or "stop")

        return LLMResponse(
@@ -1690,6 +1909,9 @@ class LiteLLMProvider(LLMProvider):
            model=response.model or self.model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
+            cached_tokens=cached_tokens,
+            cache_creation_tokens=cache_creation_tokens,
+            cost_usd=cost_usd,
            stop_reason=stop_reason,
            raw_response={
                "compat_mode": "openrouter_tool_emulation",
@@ -1704,6 +1926,7 @@ class LiteLLMProvider(LLMProvider):
        system: str,
        tools: list[Tool],
        max_tokens: int,
+        system_dynamic_suffix: str | None = None,
    ) -> AsyncIterator[StreamEvent]:
        """Fallback stream for OpenRouter models without native tool support."""
        from framework.llm.stream_events import (
@@ -1724,6 +1947,7 @@ class LiteLLMProvider(LLMProvider):
                system=system,
                tools=tools,
                max_tokens=max_tokens,
+                system_dynamic_suffix=system_dynamic_suffix,
            )
        except Exception as e:
            yield StreamErrorEvent(error=str(e), recoverable=False)
@@ -1747,6 +1971,9 @@ class LiteLLMProvider(LLMProvider):
            stop_reason=response.stop_reason,
            input_tokens=response.input_tokens,
            output_tokens=response.output_tokens,
+            cached_tokens=response.cached_tokens,
+            cache_creation_tokens=response.cache_creation_tokens,
+            cost_usd=response.cost_usd,
            model=response.model,
        )

@@ -1758,6 +1985,7 @@ class LiteLLMProvider(LLMProvider):
        max_tokens: int,
        response_format: dict[str, Any] | None,
        json_mode: bool,
+        system_dynamic_suffix: str | None = None,
    ) -> AsyncIterator[StreamEvent]:
        """Fallback path: convert non-stream completion to stream events.

@@ -1781,6 +2009,7 @@ class LiteLLMProvider(LLMProvider):
                max_tokens=max_tokens,
                response_format=response_format,
                json_mode=json_mode,
+                system_dynamic_suffix=system_dynamic_suffix,
            )
        except Exception as e:
            yield StreamErrorEvent(error=str(e), recoverable=False)
@@ -1812,6 +2041,9 @@ class LiteLLMProvider(LLMProvider):
            stop_reason=response.stop_reason or "stop",
            input_tokens=response.input_tokens,
            output_tokens=response.output_tokens,
+            cached_tokens=response.cached_tokens,
+            cache_creation_tokens=response.cache_creation_tokens,
+            cost_usd=response.cost_usd,
            model=response.model,
        )

@@ -1823,6 +2055,7 @@ class LiteLLMProvider(LLMProvider):
        max_tokens: int = 4096,
        response_format: dict[str, Any] | None = None,
        json_mode: bool = False,
+        system_dynamic_suffix: str | None = None,
    ) -> AsyncIterator[StreamEvent]:
        """Stream a completion via litellm.acompletion(stream=True).

@@ -1833,6 +2066,9 @@ class LiteLLMProvider(LLMProvider):
        Empty responses (e.g. Gemini stealth rate-limits that return 200
        with no content) are retried with exponential backoff, mirroring
        the retry behaviour of ``_completion_with_rate_limit_retry``.
+
+        ``system_dynamic_suffix`` is an optional per-turn tail. See
+        ``acomplete`` docstring for the two-block split semantics.
        """
        from framework.llm.stream_events import (
            FinishEvent,
@@ -1852,6 +2088,7 @@ class LiteLLMProvider(LLMProvider):
                max_tokens=max_tokens,
                response_format=response_format,
                json_mode=json_mode,
+                system_dynamic_suffix=system_dynamic_suffix,
            ):
                yield event
            return
@@ -1862,6 +2099,7 @@ class LiteLLMProvider(LLMProvider):
                system=system,
                tools=tools,
                max_tokens=max_tokens,
+                system_dynamic_suffix=system_dynamic_suffix,
            ):
                yield event
            return
@@ -1870,10 +2108,8 @@ class LiteLLMProvider(LLMProvider):
        if self._claude_code_oauth:
            billing = _claude_code_billing_header(messages)
            full_messages.append({"role": "system", "content": billing})
-        if system:
-            sys_msg: dict[str, Any] = {"role": "system", "content": system}
-            if _model_supports_cache_control(self.model):
-                sys_msg["cache_control"] = {"type": "ephemeral"}
+        sys_msg = _build_system_message(system, system_dynamic_suffix, self.model)
+        if sys_msg is not None:
            full_messages.append(sys_msg)
        full_messages.extend(messages)

@@ -1959,6 +2195,10 @@ class LiteLLMProvider(LLMProvider):
        if self._codex_backend:
            kwargs.pop("max_tokens", None)
            kwargs.pop("stream_options", None)
+            # Send store=False via extra_body in case litellm drops it as an unknown param
+            if "extra_body" not in kwargs:
+                kwargs["extra_body"] = {}
+            kwargs["extra_body"]["store"] = False

        request_summary = _summarize_request_for_log(kwargs)
        logger.debug(
@@ -2105,37 +2345,46 @@ class LiteLLMProvider(LLMProvider):
                            type(usage).__name__,
                        )
                        cached_tokens = 0
+                        cache_creation_tokens = 0
                        if usage:
                            input_tokens = getattr(usage, "prompt_tokens", 0) or 0
                            output_tokens = getattr(usage, "completion_tokens", 0) or 0
-                            _details = getattr(usage, "prompt_tokens_details", None)
-                            cached_tokens = (
-                                getattr(_details, "cached_tokens", 0) or 0
-                                if _details is not None
-                                else getattr(usage, "cache_read_input_tokens", 0) or 0
-                            )
+                            cached_tokens, cache_creation_tokens = _extract_cache_tokens(usage)
                            logger.debug(
-                                "[tokens] finish-chunk usage: input=%d output=%d cached=%d model=%s",
+                                "[tokens] finish-chunk usage: input=%d output=%d "
+                                "cached=%d cache_creation=%d model=%s",
                                input_tokens,
                                output_tokens,
                                cached_tokens,
+                                cache_creation_tokens,
                                self.model,
                            )

                        logger.debug(
-                            "[tokens] finish event: input=%d output=%d cached=%d stop=%s model=%s",
+                            "[tokens] finish event: input=%d output=%d cached=%d "
+                            "cache_creation=%d stop=%s model=%s",
                            input_tokens,
                            output_tokens,
                            cached_tokens,
+                            cache_creation_tokens,
                            choice.finish_reason,
                            self.model,
                        )
+                        cost_usd = _cost_from_tokens(
+                            self.model,
+                            input_tokens,
+                            output_tokens,
+                            cached_tokens,
+                            cache_creation_tokens,
+                        )
                        tail_events.append(
                            FinishEvent(
                                stop_reason=choice.finish_reason,
                                input_tokens=input_tokens,
                                output_tokens=output_tokens,
                                cached_tokens=cached_tokens,
+                                cache_creation_tokens=cache_creation_tokens,
+                                cost_usd=cost_usd,
                                model=self.model,
                            )
                        )
@@ -2155,19 +2404,36 @@ class LiteLLMProvider(LLMProvider):
                            _usage = calculate_total_usage(chunks=_chunks)
                            input_tokens = _usage.prompt_tokens or 0
                            output_tokens = _usage.completion_tokens or 0
-                            _details = getattr(_usage, "prompt_tokens_details", None)
-                            cached_tokens = (
-                                getattr(_details, "cached_tokens", 0) or 0
-                                if _details is not None
-                                else getattr(_usage, "cache_read_input_tokens", 0) or 0
-                            )
+                            # `calculate_total_usage` aggregates token totals
+                            # but discards `prompt_tokens_details` — which is
+                            # where OpenRouter puts `cached_tokens` and
+                            # `cache_write_tokens`. Recover them directly
+                            # from the most recent chunk that carries usage.
+                            cached_tokens, cache_creation_tokens = 0, 0
+                            for _raw in reversed(_chunks):
+                                _raw_usage = getattr(_raw, "usage", None)
+                                if _raw_usage is None:
+                                    continue
+                                _cr, _cc = _extract_cache_tokens(_raw_usage)
+                                if _cr or _cc:
+                                    cached_tokens, cache_creation_tokens = _cr, _cc
+                                    break
                            logger.debug(
-                                "[tokens] post-loop chunks fallback: input=%d output=%d cached=%d model=%s",
+                                "[tokens] post-loop chunks fallback: input=%d output=%d "
+                                "cached=%d cache_creation=%d model=%s",
                                input_tokens,
                                output_tokens,
                                cached_tokens,
+                                cache_creation_tokens,
                                self.model,
                            )
+                            cost_usd = _cost_from_tokens(
+                                self.model,
+                                input_tokens,
+                                output_tokens,
+                                cached_tokens,
+                                cache_creation_tokens,
+                            )
                            # Patch the FinishEvent already queued with 0 tokens
                            for _i, _ev in enumerate(tail_events):
                                if isinstance(_ev, FinishEvent) and _ev.input_tokens == 0:
@@ -2176,6 +2442,8 @@ class LiteLLMProvider(LLMProvider):
                                        input_tokens=input_tokens,
                                        output_tokens=output_tokens,
                                        cached_tokens=cached_tokens,
+                                        cache_creation_tokens=cache_creation_tokens,
+                                        cost_usd=cost_usd,
                                        model=_ev.model,
                                    )
                                    break
@@ -2386,6 +2654,9 @@ class LiteLLMProvider(LLMProvider):
         tool_calls: list[dict[str, Any]] = []
         input_tokens = 0
         output_tokens = 0
+        cached_tokens = 0
+        cache_creation_tokens = 0
+        cost_usd = 0.0
         stop_reason = ""
         model = self.model
 
@@ -2403,6 +2674,9 @@ class LiteLLMProvider(LLMProvider):
             elif isinstance(event, FinishEvent):
                 input_tokens = event.input_tokens
                 output_tokens = event.output_tokens
+                cached_tokens = event.cached_tokens
+                cache_creation_tokens = event.cache_creation_tokens
+                cost_usd = event.cost_usd
                 stop_reason = event.stop_reason
                 if event.model:
                     model = event.model
@@ -2415,6 +2689,9 @@ class LiteLLMProvider(LLMProvider):
             model=model,
             input_tokens=input_tokens,
             output_tokens=output_tokens,
+            cached_tokens=cached_tokens,
+            cache_creation_tokens=cache_creation_tokens,
+            cost_usd=cost_usd,
             stop_reason=stop_reason,
             raw_response={"tool_calls": tool_calls} if tool_calls else None,
         )
@@ -155,8 +155,11 @@ class MockLLMProvider(LLMProvider):
        response_format: dict[str, Any] | None = None,
        json_mode: bool = False,
        max_retries: int | None = None,
+        system_dynamic_suffix: str | None = None,
    ) -> LLMResponse:
        """Async mock completion (no I/O, returns immediately)."""
+        if system_dynamic_suffix:
+            system = f"{system}\n\n{system_dynamic_suffix}" if system else system_dynamic_suffix
        return self.complete(
            messages=messages,
            system=system,
@@ -173,6 +176,7 @@ class MockLLMProvider(LLMProvider):
        system: str = "",
        tools: list[Tool] | None = None,
        max_tokens: int = 4096,
+        system_dynamic_suffix: str | None = None,
    ) -> AsyncIterator[StreamEvent]:
        """Stream a mock completion as word-level TextDeltaEvents.

@@ -180,6 +184,8 @@ class MockLLMProvider(LLMProvider):
        TextDeltaEvent with an accumulating snapshot, exercising the full
        streaming pipeline without any API calls.
        """
+        if system_dynamic_suffix:
+            system = f"{system}\n\n{system_dynamic_suffix}" if system else system_dynamic_suffix
        content = self._generate_mock_response(system=system, json_mode=False)
        words = content.split(" ")
        accumulated = ""
@@ -61,14 +61,14 @@
          "label": "Gemini 3 Flash - Fast",
          "recommended": false,
          "max_tokens": 32768,
-          "max_context_tokens": 900000
+          "max_context_tokens": 240000
        },
        {
          "id": "gemini-3.1-pro-preview-customtools",
          "label": "Gemini 3.1 Pro - Best quality",
          "recommended": true,
          "max_tokens": 32768,
-          "max_context_tokens": 900000
+          "max_context_tokens": 240000
        }
      ]
    },
@@ -115,13 +115,6 @@
          "max_tokens": 40960,
          "max_context_tokens": 131072
        },
-        {
-          "id": "llama3.1-8b",
-          "label": "Llama 3.1 8B - Fastest production",
-          "recommended": false,
-          "max_tokens": 8192,
-          "max_context_tokens": 32768
-        },
        {
          "id": "zai-glm-4.7",
          "label": "Z.ai GLM 4.7 - Strong coding preview",
@@ -145,15 +138,15 @@
          "id": "MiniMax-M2.7",
          "label": "MiniMax M2.7 - Best coding quality",
          "recommended": true,
-          "max_tokens": 32768,
-          "max_context_tokens": 204800
+          "max_tokens": 40960,
+          "max_context_tokens": 180000
        },
        {
          "id": "MiniMax-M2.5",
          "label": "MiniMax M2.5 - Strong value",
          "recommended": false,
-          "max_tokens": 32768,
-          "max_context_tokens": 204800
+          "max_tokens": 40960,
+          "max_context_tokens": 180000
        }
      ]
    },
@@ -288,14 +281,14 @@
          "label": "GPT-5.4 - Best overall",
          "recommended": true,
          "max_tokens": 128000,
-          "max_context_tokens": 922000
+          "max_context_tokens": 872000
        },
        {
          "id": "anthropic/claude-sonnet-4.6",
          "label": "Claude Sonnet 4.6 - Best coding balance",
          "recommended": false,
          "max_tokens": 64000,
-          "max_context_tokens": 936000
+          "max_context_tokens": 872000
        },
        {
          "id": "anthropic/claude-opus-4.6",
@@ -309,14 +302,42 @@
          "label": "Gemini 3.1 Pro Preview - Long-context reasoning",
          "recommended": false,
          "max_tokens": 32768,
-          "max_context_tokens": 1048576
+          "max_context_tokens": 872000
        },
        {
-          "id": "deepseek/deepseek-v3.2",
-          "label": "DeepSeek V3.2 - Best value",
-          "recommended": false,
+          "id": "qwen/qwen3.6-plus",
+          "label": "Qwen 3.6 Plus - Strong reasoning",
+          "recommended": true,
          "max_tokens": 32768,
-          "max_context_tokens": 163840
+          "max_context_tokens": 240000
+        },
+        {
+          "id": "z-ai/glm-5v-turbo",
+          "label": "GLM-5V Turbo - Vision capable",
+          "recommended": true,
+          "max_tokens": 32768,
+          "max_context_tokens": 192000
+        },
+        {
+          "id": "z-ai/glm-5.1",
+          "label": "GLM-5.1 - Better but slower",
+          "recommended": true,
+          "max_tokens": 40960,
+          "max_context_tokens": 192000
+        },
+        {
+          "id": "minimax/minimax-m2.7",
+          "label": "MiniMax M2.7 - MiniMax flagship",
+          "recommended": false,
+          "max_tokens": 40960,
+          "max_context_tokens": 180000
+        },
+        {
+          "id": "xiaomi/mimo-v2-pro",
+          "label": "MiMo V2 Pro - Xiaomi multimodal",
+          "recommended": true,
+          "max_tokens": 64000,
+          "max_context_tokens": 240000
        }
      ]
    }
@@ -347,8 +368,8 @@
      "provider": "minimax",
      "api_key_env_var": "MINIMAX_API_KEY",
      "model": "MiniMax-M2.7",
-      "max_tokens": 32768,
-      "max_context_tokens": 204800,
+      "max_tokens": 40960,
+      "max_context_tokens": 180000,
      "api_base": "https://api.minimax.io/v1"
    },
    "kimi_code": {
@@ -397,4 +418,4 @@
      "api_base": "http://localhost:11434"
    }
  }
-}
+}
@@ -10,12 +10,24 @@ from typing import Any

 @dataclass
 class LLMResponse:
-    """Response from an LLM call."""
+    """Response from an LLM call.
+
+    ``cached_tokens`` and ``cache_creation_tokens`` are subsets of
+    ``input_tokens`` (providers report them inside ``prompt_tokens``).
+    Surface them for visibility; do not add to a total.
+
+    ``cost_usd`` is the per-call USD cost when the provider / pricing table
+    can produce one (Anthropic, OpenAI, OpenRouter are supported). 0.0 when
+    unknown or unpriced — treat as "unreported", not "free".
+    """

    content: str
    model: str
    input_tokens: int = 0
    output_tokens: int = 0
+    cached_tokens: int = 0
+    cache_creation_tokens: int = 0
+    cost_usd: float = 0.0
    stop_reason: str = ""
    raw_response: Any = None
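
The docstring above says the cache counters are subsets of `input_tokens`. A minimal sketch of what that implies for totals follows. `Usage`, `total_tokens`, and `uncached_input_tokens` are hypothetical names used only for illustration, and the sketch assumes cache reads and cache writes do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical stand-in for the token fields on LLMResponse."""

    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0
    cache_creation_tokens: int = 0


def total_tokens(usage: Usage) -> int:
    # Cache counters are already inside input_tokens, so they are not added again.
    return usage.input_tokens + usage.output_tokens


def uncached_input_tokens(usage: Usage) -> int:
    # Assumption: cache reads and cache writes are disjoint subsets of the input.
    remainder = usage.input_tokens - usage.cached_tokens - usage.cache_creation_tokens
    return max(remainder, 0)
```

Adding `cached_tokens` to the total would count those prompt tokens twice.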

@@ -110,19 +122,28 @@ class LLMProvider(ABC):
        response_format: dict[str, Any] | None = None,
        json_mode: bool = False,
        max_retries: int | None = None,
+        system_dynamic_suffix: str | None = None,
    ) -> "LLMResponse":
        """Async version of complete(). Non-blocking on the event loop.

        Default implementation offloads the sync complete() to a thread pool.
        Subclasses SHOULD override for native async I/O.
+
+        ``system_dynamic_suffix`` is an optional per-turn tail for providers
+        that honor ``cache_control`` (see LiteLLMProvider for semantics).
+        The default implementation concatenates it onto ``system`` since the
+        sync ``complete()`` path does not support the split.
        """
+        combined_system = system
+        if system_dynamic_suffix:
+            combined_system = f"{system}\n\n{system_dynamic_suffix}" if system else system_dynamic_suffix
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None,
            partial(
                self.complete,
                messages=messages,
-                system=system,
+                system=combined_system,
                tools=tools,
                max_tokens=max_tokens,
                response_format=response_format,
@@ -137,6 +158,7 @@ class LLMProvider(ABC):
        system: str = "",
        tools: list[Tool] | None = None,
        max_tokens: int = 4096,
+        system_dynamic_suffix: str | None = None,
    ) -> AsyncIterator["StreamEvent"]:
        """
        Stream a completion as an async iterator of StreamEvents.
@@ -147,6 +169,9 @@ class LLMProvider(ABC):
        Tool orchestration is the CALLER's responsibility:
        - Caller detects ToolCallEvent, executes tool, adds result
          to messages, calls stream() again.
+
+        ``system_dynamic_suffix`` is forwarded to ``acomplete``; see its
+        docstring for the two-block split semantics.
        """
        from framework.llm.stream_events import (
            FinishEvent,
@@ -159,6 +184,7 @@ class LLMProvider(ABC):
            system=system,
            tools=tools,
            max_tokens=max_tokens,
+            system_dynamic_suffix=system_dynamic_suffix,
        )
        yield TextDeltaEvent(content=response.content, snapshot=response.content)
        yield TextEndEvent(full_text=response.content)
@@ -166,6 +192,9 @@ class LLMProvider(ABC):
            stop_reason=response.stop_reason,
            input_tokens=response.input_tokens,
            output_tokens=response.output_tokens,
+            cached_tokens=response.cached_tokens,
+            cache_creation_tokens=response.cache_creation_tokens,
+            cost_usd=response.cost_usd,
            model=response.model,
        )
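
The fallback concatenation in the default `acomplete()` can be read as a small pure function. The sketch below mirrors that logic. `combine_system` is a hypothetical helper name, not part of the diff.

```python
from __future__ import annotations


def combine_system(system: str, system_dynamic_suffix: str | None = None) -> str:
    """Join the static system prompt and the per-turn suffix into one string."""
    if not system_dynamic_suffix:
        # No suffix supplied: the static prompt is used unchanged.
        return system
    if not system:
        # No static prompt: the suffix becomes the whole system prompt.
        return system_dynamic_suffix
    # Both present: separate them with a blank line.
    return f"{system}\n\n{system_dynamic_suffix}"
```

Providers that honor `cache_control` would presumably keep the two parts as separate blocks instead. This sketch covers only the concatenating fallback.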

@@ -65,13 +65,23 @@ class ReasoningDeltaEvent:

 @dataclass(frozen=True)
 class FinishEvent:
-    """The LLM has finished generating."""
+    """The LLM has finished generating.
+
+    ``cached_tokens`` and ``cache_creation_tokens`` are subsets of
+    ``input_tokens`` — providers count both inside ``prompt_tokens`` already.
+    Surface them separately for visibility; never add to a total.
+
+    ``cost_usd`` is the per-turn USD cost when the provider or LiteLLM's
+    pricing table supplies one; 0.0 means unreported (not free).
+    """

    type: Literal["finish"] = "finish"
    stop_reason: str = ""
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0
+    cache_creation_tokens: int = 0
+    cost_usd: float = 0.0
    model: str = ""


@@ -1404,7 +1404,18 @@ class AgentLoader:
            credential_store=credential_store,
        )
        runner._agent_default_skills = None
-        runner._agent_skills = None
+        # Colony workers attached to a SQLite task queue get the
+        # colony-progress-tracker skill pre-activated so its full
+        # claim / step / SOP-gate protocol lands in the system prompt
+        # on turn 0, bypassing the progressive-disclosure catalog
+        # lookup. Triggered by the presence of ``input_data.db_path``
+        # in worker.json (written by fork_session_into_colony and
+        # backfilled by ensure_progress_db for pre-existing colonies).
+        _preactivate: list[str] = []
+        _input_data = first_worker.get("input_data") or {}
+        if isinstance(_input_data, dict) and _input_data.get("db_path"):
+            _preactivate.append("hive.colony-progress-tracker")
+        runner._agent_skills = _preactivate or None
        return runner

    def register_tool(
@@ -21,6 +21,7 @@ import os
 import shutil
 import subprocess
 import sys
+import threading
 from pathlib import Path
 from typing import Any
 from urllib import error as urlerror, parse as urlparse, request as urlrequest
@@ -214,7 +215,13 @@ def cmd_serve(args: argparse.Namespace) -> int:

 def cmd_open(args: argparse.Namespace) -> int:
    """Start the HTTP server and open the dashboard in the browser."""
-    _ping_hive_gateway_availability("hive-open")
+    # Don't block local startup on a best-effort analytics probe.
+    threading.Thread(
+        target=_ping_hive_gateway_availability,
+        args=("hive-open",),
+        daemon=True,
+        name="hive-open-gateway-ping",
+    ).start()
    args.open = True
    return cmd_serve(args)

@@ -497,12 +497,22 @@ class ToolRegistry:
            config["cwd"] = str(resolved_cwd)
            return config

-        # For coder_tools_server, inject --project-root so writes go to the expected workspace
+        # For coder_tools_server, inject --project-root so reads land
+        # in the expected workspace (hive repo, for framework skills
+        # and docs), and inject --write-root so writes land under
+        # ~/.hive/workspace/ instead of polluting the git checkout
+        # with queen-authored skills, ledgers, and scripts. Without
+        # the split, every ``write_file`` call from the queen landed
+        # in the hive repo root.
        if script_name and "coder_tools" in script_name:
            project_root = str(resolved_cwd.parent.resolve())
            args = list(args)
            if "--project-root" not in args:
                args.extend(["--project-root", project_root])
+            if "--write-root" not in args:
+                _write_root = Path.home() / ".hive" / "workspace"
+                _write_root.mkdir(parents=True, exist_ok=True)
+                args.extend(["--write-root", str(_write_root)])
            config["args"] = args

        if os.name == "nt":
@@ -571,8 +581,18 @@ class ToolRegistry:
        tool_cap: int | None = None,
        log_collisions: bool = False,
    ) -> tuple[bool, int, str | None]:
-        """Register a single MCP server with one retry for transient failures."""
+        """Register a single MCP server with one retry for transient failures.
+
+        When ``preserve_existing_tools=True`` and the server's tools are
+        already present from a prior registration, ``register_mcp_server``
+        returns ``count=0`` because every tool was shadowed. That's a
+        no-op success, not a failure — don't retry / warn in that case.
+        Otherwise a duplicate-init path (e.g. a worker spawn re-loading
+        the MCP servers the queen already registered) spams shadow
+        warnings, sleeps 2s, and retries for no reason.
+        """
        name = server_config.get("name", "unknown")
+        already_loaded = bool(self._mcp_server_tools.get(name))
        last_error: str | None = None

        for attempt in range(2):
@@ -585,6 +605,10 @@ class ToolRegistry:
                )
                if count > 0:
                    return True, count, None
+                if already_loaded and preserve_existing_tools:
+                    # All tools shadowed by the prior registration of
+                    # the same server — nothing to do, server is usable.
+                    return True, 0, None
                last_error = "registered 0 tools"
            except Exception as exc:
                last_error = str(exc)
@@ -752,12 +776,18 @@ class ToolRegistry:
                if preserve_existing_tools and mcp_tool.name in self._tools:
                    if log_collisions:
                        origin_server = self._find_mcp_origin_server_for_tool(mcp_tool.name) or "<existing>"
-                        logger.warning(
-                            "MCP tool '%s' from '%s' shadowed by '%s' (loaded first)",
-                            mcp_tool.name,
-                            server_name,
-                            origin_server,
-                        )
+                        # Don't warn when a server is being re-registered
+                        # by itself — that's a redundant-init case (e.g.
+                        # the same tool_registry seeing the same server
+                        # twice via pooled reconnect), not a real
+                        # cross-server shadow worth flagging.
+                        if origin_server != server_name:
+                            logger.warning(
+                                "MCP tool '%s' from '%s' shadowed by '%s' (loaded first)",
+                                mcp_tool.name,
+                                server_name,
+                                origin_server,
+                            )
                    # Skip registration; do not update MCP tool bookkeeping for this server.
                    continue

@@ -89,10 +89,12 @@ class ActiveNodeClientIO(NodeClientIO):
        self._input_result = None

        if self._event_bus is not None:
+            # `prompt` is consumed by the caller separately (callers emit
+            # it as a text delta when needed). The event only carries the
+            # structured questions payload for widget rendering.
            await self._event_bus.emit_client_input_requested(
                stream_id=self.node_id,
                node_id=self.node_id,
-                prompt=prompt,
                execution_id=self._execution_id or None,
            )

@@ -9,8 +9,8 @@ Nodes that need browser access declare ``tools: {policy: "all"}`` in their
 agent.json config.

 Note: the canonical source of truth for browser automation guidance is
-the ``browser-automation`` default skill at
-``core/framework/skills/_default_skills/browser-automation/SKILL.md``.
+the ``browser-automation`` preset skill at
+``core/framework/skills/_preset_skills/browser-automation/SKILL.md``.
 Activate that skill for the full decision tree. This module holds a
 compact subset suitable for direct inlining into a node's system prompt
 when a skill activation is not desired.
@@ -26,33 +26,47 @@ Follow these rules for reliable, efficient browser interaction.
 - **`browser_snapshot`** — compact accessibility tree. Fast, cheap, good
  for static / text-heavy pages where the DOM matches what's visually
  rendered (docs, forms, search results, settings pages).
-- **`browser_screenshot`** — visual capture + scale metadata. Use on any
-  complex SPA (LinkedIn, X / Twitter, Reddit, Gmail, Notion, Slack,
-  Discord) and on any site using shadow DOM or virtual scrolling. On
-  those pages, snapshot refs go stale in seconds, shadow contents
-  aren't in the AX tree, and virtual-scrolled elements disappear from
-  the tree entirely — screenshots are the only reliable way to orient.
+- **`browser_screenshot`** — visual capture + scale metadata. Use when
+  the snapshot does not show the thing you need, when refs look stale,
+  or when you need visual position/layout to act. This is common on
+  complex SPAs (LinkedIn, X / Twitter, Reddit, Gmail, Notion, Slack,
+  Discord), shadow DOM, and virtual scrolling.

-Neither tool is "preferred" universally — they're for different jobs.
-Default to snapshot on static pages, screenshot on SPAs and
-shadow-heavy sites. Interaction tools (click/type/fill/scroll) return
-a snapshot automatically, so don't call `browser_snapshot` separately
-after an interaction unless you need a fresh view.
+Use snapshot first for structure and ordinary controls; switch to
+screenshot when snapshot can't find or verify the target. Interaction
+tools (`browser_click`, `browser_type`, `browser_type_focused`,
+`browser_fill`, `browser_scroll`) wait 0.5 s for the page to settle
+after a successful action, then attach a fresh snapshot under the
+`snapshot` key of their result — so don't call `browser_snapshot`
+separately after an interaction unless you need a newer view. Tune
+with `auto_snapshot_mode`: `"default"` (full tree) is the default;
+`"simple"` trims unnamed structural nodes; `"interactive"` returns
+only controls (tightest token footprint); `"off"` skips the capture
+entirely — use when batching several interactions.

 Only fall back to `browser_get_text` for extracting small elements by
 CSS selector.

 ## Coordinates

-`browser_screenshot` delivers the image at the CSS viewport's own
-dimensions, so a pixel you read off the screenshot is the same number
-you pass to `browser_click_coordinate` / `browser_hover_coordinate` /
-`browser_press_at`. `browser_get_rect` and `browser_shadow_query` also
-return CSS px — feed `rect.css.cx` / `rect.css.cy` straight through.
-No scale factors to remember.
+Every browser tool that takes or returns coordinates operates in
+**fractions of the viewport (0..1 for both axes)**. Read a target's
+proportional position off `browser_screenshot` — "this button is
+~35% from the left, ~20% from the top" → pass `(0.35, 0.20)`.
+`browser_get_rect` and `browser_shadow_query` return `rect.cx` /
+`rect.cy` as fractions in the same space. The tools handle the
+fraction → CSS-px multiplication internally; you do not need to
+track image pixels, DPR, or any scale factor.

-Never multiply `getBoundingClientRect()` by `devicePixelRatio` — it's
-already in the right unit.
+Why fractions: every vision model (Claude, GPT-4o, Gemini, local
+VLMs) resizes or tiles images differently before the model sees the
+pixels. Proportions survive every such transform; pixel coordinates
+only "work" per-model and break when you swap backends.
+
+Avoid raw `browser_evaluate` + `getBoundingClientRect()` for coord
+lookup — that returns CSS px and will be wrong when fed to click
+tools. Prefer `browser_get_rect` / `browser_shadow_query`, which
+return fractions.

 ## Rich-text editors (X, LinkedIn DMs, Gmail, Reddit, Slack, Discord)

@@ -62,10 +76,12 @@ ProseMirror only register input as "real" after a native pointer-
 sourced focus event; JS `.focus()` is not enough. Without a real click
 first, the editor stays empty and the send button stays disabled.

-`browser_type` now does this automatically — it clicks the element,
-then inserts text via CDP `Input.insertText` (IME-commit style), which
-rich editors accept cleanly. Before clicking send, verify the submit
-button's `disabled` / `aria-disabled` state via `browser_evaluate`.
+`browser_type` does this automatically when you have a selector — it
+clicks the element, then inserts text via CDP `Input.insertText`.
+For shadow-DOM inputs where selectors can't reach, use
+`browser_click_coordinate` to focus, then `browser_type_focused(text=...)`
+to type into the active element. Before clicking send, verify the
+submit button's `disabled` / `aria-disabled` state via `browser_evaluate`.

 ## Shadow DOM

@@ -79,8 +95,8 @@ reach shadow elements transparently.
 **Shadow-heavy site workflow:**
 1. `browser_screenshot()` → visual image
 2. Identify target visually → fractional `(x, y)` (0..1 of the viewport) read off the image
-3. `browser_click_coordinate(x, y)` → lands via native hit test; inputs
-   get focused regardless of shadow depth
+3. `browser_click_coordinate(x, y)` → lands via native hit test;
+   inputs get focused regardless of shadow depth
 4. Type via `browser_type_focused` (no selector needed — types into the
   already-focused element), or `browser_type` if you have a selector

@@ -543,6 +543,10 @@ class NodeContext:
    # Dynamic memory provider — when set, EventLoopNode rebuilds the
    # system prompt with the latest memory block each iteration.
    dynamic_memory_provider: Any = None  # Callable[[], str] | None
+    # Surgical skills-catalog refresh, same contract as AgentContext's
+    # field of the same name. Lets workers pick up UI-driven skill
+    # toggles without rebuilding the full system prompt each turn.
+    dynamic_skills_catalog_provider: Any = None  # Callable[[], str] | None

    # Skill system prompts — injected by the skill discovery pipeline
    skills_catalog_prompt: str = ""  # Available skills XML catalog
@@ -0,0 +1,180 @@
+"""Regression tests for forced cancellation overlap in ExecutionStream."""
+
+from __future__ import annotations
+
+import asyncio
+from types import SimpleNamespace
+from unittest.mock import MagicMock
+
+import pytest
+
+from framework.host.event_bus import AgentEvent, EventBus, EventType
+from framework.host.execution_manager import (
+    EntryPointSpec,
+    ExecutionAlreadyRunningError,
+    ExecutionManager,
+)
+from framework.orchestrator.edge import GraphSpec
+from framework.orchestrator.goal import Goal
+from framework.orchestrator.orchestrator import ExecutionResult
+
+
+def _build_stream(tmp_path, *, event_bus: EventBus | None = None) -> ExecutionManager:
+    graph = GraphSpec(
+        id="test-graph",
+        goal_id="goal-1",
+        version="1.0.0",
+        entry_node="start",
+        entry_points={"start": "start"},
+        terminal_nodes=[],
+        pause_nodes=[],
+        nodes=[],
+        edges=[],
+    )
+    goal = Goal(id="goal-1", name="goal-1", description="test goal")
+    entry_spec = EntryPointSpec(
+        id="webhook",
+        name="Webhook",
+        entry_node="start",
+        trigger_type="webhook",
+        isolation_level="shared",
+        max_concurrent=1,
+    )
+
+    storage = SimpleNamespace(base_path=tmp_path)
+    stream = ExecutionManager(
+        stream_id="webhook",
+        entry_spec=entry_spec,
+        graph=graph,
+        goal=goal,
+        state_manager=MagicMock(),
+        storage=storage,
+        outcome_aggregator=MagicMock(),
+        event_bus=event_bus,
+    )
+    stream._running = True
+    return stream
+
+
+def _install_blocking_executor(monkeypatch, release: asyncio.Event) -> None:
+    class BlockingExecutor:
+        def __init__(self, *args, **kwargs):
+            self.node_registry = {}
+
+        async def execute(self, *args, **kwargs):
+            while True:
+                try:
+                    await release.wait()
+                    break
+                except asyncio.CancelledError:
+                    continue
+            return ExecutionResult(success=True, output={"ok": True})
+
+    monkeypatch.setattr("framework.host.execution_manager.Orchestrator", BlockingExecutor)
+
+
+@pytest.mark.asyncio
+async def test_forced_cancel_timeout_keeps_stream_locked_until_task_exit(tmp_path, monkeypatch):
+    event_bus = EventBus()
+    stream = _build_stream(tmp_path, event_bus=event_bus)
+    release = asyncio.Event()
+    _install_blocking_executor(monkeypatch, release)
+
+    started_events: list[AgentEvent] = []
+    first_started = asyncio.Event()
+    second_started = asyncio.Event()
+
+    async def on_started(event: AgentEvent) -> None:
+        started_events.append(event)
+        if len(started_events) == 1:
+            first_started.set()
+        elif len(started_events) == 2:
+            second_started.set()
+
+    event_bus.subscribe(
+        event_types=[EventType.EXECUTION_STARTED],
+        handler=on_started,
+        filter_stream="webhook",
+    )
+
+    async def immediate_timeout(_tasks, timeout=None):
+        return set(), set(_tasks)
+
+    execution_id = await stream.execute({}, session_state={"resume_session_id": "session-1"})
+    await asyncio.wait_for(first_started.wait(), timeout=1)
+
+    old_task = stream._execution_tasks[execution_id]
+    monkeypatch.setattr("framework.host.execution_manager.asyncio.wait", immediate_timeout)
+
+    try:
+        cancelled = await stream.cancel_execution(execution_id, reason="forced timeout")
+
+        assert cancelled == "cancelling"
+        assert execution_id in stream._execution_tasks
+        assert execution_id in stream._active_executions
+        assert execution_id in stream._completion_events
+        assert stream._active_executions[execution_id].status == "cancelling"
+        assert not old_task.done()
+
+        with pytest.raises(ExecutionAlreadyRunningError):
+            await stream.execute({}, session_state={"resume_session_id": execution_id})
+
+        assert len(started_events) == 1
+
+        release.set()
+        await asyncio.wait_for(old_task, timeout=1)
+
+        restarted_id = await stream.execute({}, session_state={"resume_session_id": execution_id})
+        assert restarted_id == execution_id
+        await asyncio.wait_for(second_started.wait(), timeout=1)
+    finally:
+        release.set()
+        await asyncio.gather(*stream._execution_tasks.values(), return_exceptions=True)
+
+
+@pytest.mark.asyncio
+async def test_repeated_forced_restarts_do_not_accumulate_parallel_tasks(tmp_path, monkeypatch):
+    event_bus = EventBus()
+    stream = _build_stream(tmp_path, event_bus=event_bus)
+    release = asyncio.Event()
+    _install_blocking_executor(monkeypatch, release)
+
+    started_events: list[AgentEvent] = []
+    first_started = asyncio.Event()
+
+    async def on_started(event: AgentEvent) -> None:
+        started_events.append(event)
+        first_started.set()
+
+    event_bus.subscribe(
+        event_types=[EventType.EXECUTION_STARTED],
+        handler=on_started,
+        filter_stream="webhook",
+    )
+
+    async def immediate_timeout(_tasks, timeout=None):
+        return set(), set(_tasks)
+
+    monkeypatch.setattr("framework.host.execution_manager.asyncio.wait", immediate_timeout)
+
+    execution_id = await stream.execute({}, session_state={"resume_session_id": "session-1"})
+    await asyncio.wait_for(first_started.wait(), timeout=1)
+
+    first_task = stream._execution_tasks[execution_id]
+
+    try:
+        assert await stream.cancel_execution(execution_id, reason="restart-1") == "cancelling"
+
+        with pytest.raises(ExecutionAlreadyRunningError):
+            await stream.execute({}, session_state={"resume_session_id": execution_id})
+
+        with pytest.raises(ExecutionAlreadyRunningError):
+            await stream.execute({}, session_state={"resume_session_id": execution_id})
+
+        assert len(started_events) == 1
+        assert list(stream._execution_tasks) == [execution_id]
+        assert stream._execution_tasks[execution_id] is first_task
+        assert not first_task.done()
+    finally:
+        release.set()
+        await asyncio.wait_for(first_task, timeout=1)
@@ -19,6 +19,12 @@ _REPO_ROOT = Path(__file__).resolve().parent.parent.parent.parent
 _ALLOWED_AGENT_ROOTS: tuple[Path, ...] | None = None


+def _has_encrypted_credentials() -> bool:
+    """Return True when an encrypted credential store already exists on disk."""
+    cred_dir = Path.home() / ".hive" / "credentials" / "credentials"
+    return cred_dir.is_dir() and any(cred_dir.glob("*.enc"))
+
+
 def _get_allowed_agent_roots() -> tuple[Path, ...]:
    """Return resolved allowed root directories for agent loading.

@@ -134,6 +140,25 @@ async def cors_middleware(request: web.Request, handler):
    return response


+@web.middleware
+async def no_cache_api_middleware(request: web.Request, handler):
+    """Prevent browsers from caching API responses.
+
+    Without this, a one-off bad response (e.g. the SPA catch-all leaking
+    index.html for an /api/* URL before a route was registered) can get
+    pinned in the browser's disk cache and replayed forever, since our
+    JSON handlers don't emit ETag/Last-Modified and browsers fall back
+    to heuristic freshness.
+    """
+    try:
+        response = await handler(request)
+    except web.HTTPException as exc:
+        response = exc
+    if request.path.startswith("/api/"):
+        response.headers["Cache-Control"] = "no-store"
+    return response
+
+
 @web.middleware
 async def error_middleware(request: web.Request, handler):
    """Catch exceptions and return JSON error responses.
@@ -173,11 +198,12 @@ async def handle_health(request: web.Request) -> web.Response:
    )


-async def handle_browser_status(request: web.Request) -> web.Response:
-    """GET /api/browser/status — proxy the GCU bridge status check server-side.
+async def _probe_browser_bridge() -> dict:
+    """Probe the local GCU bridge and return ``{bridge, connected}``.

-    Checks http://127.0.0.1:9230/status so the browser never makes a
-    cross-origin request that would log ERR_CONNECTION_REFUSED in the console.
+    Shared by the one-shot ``GET /api/browser/status`` handler and the
+    ``/api/browser/status/stream`` SSE feed so both see the same data
+    source.
    """
    import asyncio

@@ -190,17 +216,66 @@ async def handle_browser_status(request: web.Request) -> web.Response:
        await writer.drain()
        raw = await asyncio.wait_for(reader.read(512), timeout=0.5)
        writer.close()
-        # Parse JSON body after the blank line
        if b"\r\n\r\n" in raw:
            body = raw.split(b"\r\n\r\n", 1)[1]
-            import json
+            import json as _json

-            data = json.loads(body)
-            return web.json_response({"bridge": True, "connected": data.get("connected", False)})
+            data = _json.loads(body)
+            return {"bridge": True, "connected": bool(data.get("connected", False))}
    except Exception:
        pass
+    return {"bridge": False, "connected": False}

-    return web.json_response({"bridge": False, "connected": False})
+
+async def handle_browser_status(request: web.Request) -> web.Response:
+    """GET /api/browser/status — proxy the GCU bridge status check server-side.
+
+    Checks http://127.0.0.1:9230/status so the browser never makes a
+    cross-origin request that would log ERR_CONNECTION_REFUSED in the console.
+    """
+    return web.json_response(await _probe_browser_bridge())
+
+
+async def handle_browser_status_stream(request: web.Request) -> web.StreamResponse:
+    """GET /api/browser/status/stream — SSE feed of bridge status.
+
+    Emits a ``status`` event immediately, then again only when the
+    probe result changes. Polls the local bridge every 3s; that's the
+    same cadence the frontend used before, but we absorb it
+    server-side instead of the browser burning a request.
+    """
+    import asyncio
+    import json as _json
+
+    resp = web.StreamResponse(
+        status=200,
+        headers={
+            "Content-Type": "text/event-stream",
+            "Cache-Control": "no-cache, no-transform",
+            "Connection": "keep-alive",
+            "X-Accel-Buffering": "no",
+        },
+    )
+    await resp.prepare(request)
+
+    async def _send(event: str, data: dict) -> None:
+        payload = f"event: {event}\ndata: {_json.dumps(data)}\n\n"
+        await resp.write(payload.encode("utf-8"))
+
+    last: tuple | None = None
+    try:
+        while True:
+            status = await _probe_browser_bridge()
+            signature = (status["bridge"], status["connected"])
+            if signature != last:
+                await _send("status", status)
+                last = signature
+            await asyncio.sleep(3.0)
+    except (asyncio.CancelledError, ConnectionResetError):
+        raise
+    except Exception as exc:
+        logger.warning("browser status stream error: %s", exc, exc_info=True)
+    return resp


 def create_app(model: str | None = None) -> web.Application:
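
The change-only behavior of `handle_browser_status_stream` above can be isolated from the network loop. The sketch below shows the SSE frame format and the deduplication step on their own. `format_sse` and `changed_only` are hypothetical helper names for illustration.

```python
import json
from collections.abc import Iterable, Iterator


def format_sse(event: str, data: dict) -> bytes:
    """Encode one Server-Sent Events frame: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n".encode("utf-8")


def changed_only(statuses: Iterable[dict]) -> Iterator[dict]:
    """Yield a status only when its (bridge, connected) pair differs from the last one sent."""
    last: tuple | None = None
    for status in statuses:
        signature = (status["bridge"], status["connected"])
        if signature != last:
            yield status
            last = signature
```

Because `last` starts as `None`, the first probe result is always emitted, which matches the "emit immediately, then only on change" description in the handler's docstring.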
@@ -212,7 +287,7 @@ def create_app(model: str | None = None) -> web.Application:
    Returns:
        Configured aiohttp Application ready to run.
    """
-    app = web.Application(middlewares=[cors_middleware, error_middleware])
+    app = web.Application(middlewares=[cors_middleware, no_cache_api_middleware, error_middleware])

    # Initialize credential store (before SessionManager so it can be shared)
    from framework.credentials.store import CredentialStore
@@ -225,57 +300,39 @@ def create_app(model: str | None = None) -> web.Application:

        # Auto-generate credential key for web-only users who never ran the TUI
        if not os.environ.get("HIVE_CREDENTIAL_KEY"):
-            try:
-                from framework.credentials.key_storage import generate_and_save_credential_key
+            if _has_encrypted_credentials():
+                logger.warning(
+                    "HIVE_CREDENTIAL_KEY is missing but encrypted credentials already exist; "
+                    "not generating a replacement key because it would not decrypt existing credentials"
+                )
+            else:
+                try:
+                    from framework.credentials.key_storage import generate_and_save_credential_key

-                generate_and_save_credential_key()
-                logger.info("Generated and persisted HIVE_CREDENTIAL_KEY to ~/.hive/secrets/credential_key")
-            except Exception as exc:
-                logger.warning("Could not auto-persist HIVE_CREDENTIAL_KEY: %s", exc)
+                    generate_and_save_credential_key()
+                    logger.info("Generated and persisted HIVE_CREDENTIAL_KEY to ~/.hive/secrets/credential_key")
+                except Exception as exc:
+                    logger.warning("Could not auto-persist HIVE_CREDENTIAL_KEY: %s", exc)

-        credential_store = CredentialStore.with_aden_sync()
+        # Local server startup should not wait on an eager Aden sync.
+        # The store can still fetch/refresh credentials on demand.
+        if not os.environ.get("HIVE_CREDENTIAL_KEY") and _has_encrypted_credentials():
+            credential_store = CredentialStore.with_env_storage()
+        else:
+            credential_store = CredentialStore.with_aden_sync(auto_sync=False)
    except Exception:
        logger.debug("Encrypted credential store unavailable, using in-memory fallback")
        credential_store = CredentialStore.for_testing({})

    app["credential_store"] = credential_store

-    # Pre-load queen MCP tools once at startup (cached for all sessions)
-    # This avoids rebuilding the tool registry for every queen session
-    from framework.loader.mcp_registry import MCPRegistry
-    from framework.loader.tool_registry import ToolRegistry
-
-    _queen_tool_registry: ToolRegistry | None = None
-    try:
-        _queen_tool_registry = ToolRegistry()
-        import framework.agents.queen as _queen_pkg
-
-        queen_pkg_dir = Path(_queen_pkg.__file__).parent
-        mcp_config = queen_pkg_dir / "mcp_servers.json"
-        if mcp_config.exists():
-            _queen_tool_registry.load_mcp_config(mcp_config)
-            logger.info("Pre-loaded queen MCP tools from %s", mcp_config)
-
-        registry = MCPRegistry()
-        registry.initialize()
-        registry.ensure_defaults()
-        if (queen_pkg_dir / "mcp_registry.json").is_file():
-            _queen_tool_registry.set_mcp_registry_agent_path(queen_pkg_dir)
-        registry_configs, selection_max_tools = registry.load_agent_selection(queen_pkg_dir)
-        if registry_configs:
-            _queen_tool_registry.load_registry_servers(
-                registry_configs,
-                preserve_existing_tools=True,
-                log_collisions=True,
-                max_tools=selection_max_tools,
-            )
-        logger.info("Pre-loaded queen tool registry with %d tools", len(_queen_tool_registry.get_tools()))
-    except Exception as e:
-        logger.warning("Failed to pre-load queen tool registry: %s", e)
-
-    app["queen_tool_registry"] = _queen_tool_registry
+    # Let queen sessions build their registry lazily on first use instead of
+    # paying the MCP discovery cost during `hive open`.
+    app["queen_tool_registry"] = None
    app["manager"] = SessionManager(
-        model=model, credential_store=credential_store, queen_tool_registry=_queen_tool_registry
+        model=model,
+        credential_store=credential_store,
+        queen_tool_registry=None,
    )

    # Register shutdown hook
@@ -284,16 +341,23 @@ def create_app(model: str | None = None) -> web.Application:
    # Health check
    app.router.add_get("/api/health", handle_health)
    app.router.add_get("/api/browser/status", handle_browser_status)
+    app.router.add_get("/api/browser/status/stream", handle_browser_status_stream)

    # Register route modules
+    from framework.server.routes_colony_tools import register_routes as register_colony_tools_routes
+    from framework.server.routes_colony_workers import register_routes as register_colony_worker_routes
    from framework.server.routes_config import register_routes as register_config_routes
    from framework.server.routes_credentials import register_routes as register_credential_routes
    from framework.server.routes_events import register_routes as register_event_routes
    from framework.server.routes_execution import register_routes as register_execution_routes
    from framework.server.routes_logs import register_routes as register_log_routes
+    from framework.server.routes_mcp import register_routes as register_mcp_routes
    from framework.server.routes_messages import register_routes as register_message_routes
+    from framework.server.routes_prompts import register_routes as register_prompt_routes
+    from framework.server.routes_queen_tools import register_routes as register_queen_tools_routes
    from framework.server.routes_queens import register_routes as register_queen_routes
    from framework.server.routes_sessions import register_routes as register_session_routes
+    from framework.server.routes_skills import register_routes as register_skills_routes
    from framework.server.routes_workers import register_routes as register_worker_routes

    register_config_routes(app)
@@ -305,6 +369,12 @@ def create_app(model: str | None = None) -> web.Application:
    register_worker_routes(app)
    register_log_routes(app)
    register_queen_routes(app)
+    register_queen_tools_routes(app)
+    register_colony_tools_routes(app)
+    register_mcp_routes(app)
+    register_colony_worker_routes(app)
+    register_prompt_routes(app)
+    register_skills_routes(app)

    # Static file serving — Option C production mode
    # If frontend/dist/ exists, serve built frontend files on /
@@ -0,0 +1,149 @@
+"""Track fork-compaction status for freshly-forked colony queen sessions.
+
+When ``create_colony`` forks a queen session into a colony, the
+inherited DM transcript is compacted via an LLM call that can legitimately
+exceed the default tool-call timeout (60s). To keep ``create_colony``
+responsive we run that compaction in the background and record its
+status on disk so a subsequent colony session-load can wait for it to
+settle before reading the conversation files.
+
+The status lives at ``<queen_dir>/compaction_status.json``:
+
+    {"status": "in_progress", "started_at": "..."}
+    {"status": "done", "completed_at": "...", "messages_compacted": N, "summary_chars": M}
+    {"status": "failed", "completed_at": "...", "error": "..."}
+
+The file exists only when a compaction was scheduled for this queen dir.
+All writes are fail-soft; a missing or corrupt file is treated as
+"no compaction pending".
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+from datetime import UTC, datetime
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+_STATUS_FILENAME = "compaction_status.json"
+
+
+def _status_path(queen_dir: Path) -> Path:
+    return Path(queen_dir) / _STATUS_FILENAME
+
+
+def mark_in_progress(queen_dir: Path) -> None:
+    path = _status_path(queen_dir)
+    try:
+        path.parent.mkdir(parents=True, exist_ok=True)
+        path.write_text(
+            json.dumps(
+                {
+                    "status": "in_progress",
+                    "started_at": datetime.now(UTC).isoformat(),
+                },
+                ensure_ascii=False,
+            ),
+            encoding="utf-8",
+        )
+    except OSError:
+        logger.warning(
+            "compaction_status: failed to write 'in_progress' at %s",
+            path,
+            exc_info=True,
+        )
+
+
+def mark_done(
+    queen_dir: Path,
+    *,
+    messages_compacted: int = 0,
+    summary_chars: int = 0,
+) -> None:
+    path = _status_path(queen_dir)
+    try:
+        path.write_text(
+            json.dumps(
+                {
+                    "status": "done",
+                    "completed_at": datetime.now(UTC).isoformat(),
+                    "messages_compacted": messages_compacted,
+                    "summary_chars": summary_chars,
+                },
+                ensure_ascii=False,
+            ),
+            encoding="utf-8",
+        )
+    except OSError:
+        logger.warning(
+            "compaction_status: failed to write 'done' at %s",
+            path,
+            exc_info=True,
+        )
+
+
+def mark_failed(queen_dir: Path, error: str) -> None:
+    path = _status_path(queen_dir)
+    try:
+        path.write_text(
+            json.dumps(
+                {
+                    "status": "failed",
+                    "completed_at": datetime.now(UTC).isoformat(),
+                    "error": (error or "")[:500],
+                },
+                ensure_ascii=False,
+            ),
+            encoding="utf-8",
+        )
+    except OSError:
+        logger.warning(
+            "compaction_status: failed to write 'failed' at %s",
+            path,
+            exc_info=True,
+        )
+
+
+def get_status(queen_dir: Path) -> dict | None:
+    path = _status_path(queen_dir)
+    if not path.exists():
+        return None
+    try:
+        return json.loads(path.read_text(encoding="utf-8"))
+    except (json.JSONDecodeError, OSError):
+        return None
+
+
+async def await_completion(
+    queen_dir: Path,
+    *,
+    timeout: float = 180.0,
+    poll: float = 0.5,
+) -> dict | None:
+    """Wait (without blocking the event loop) until compaction leaves the 'in_progress' state.
+
+    Returns the final status dict, or ``None`` if no compaction marker
+    exists for this dir. On timeout returns the last observed status
+    (still 'in_progress') so the caller can decide whether to proceed
+    with the raw transcript.
+    """
+    loop = asyncio.get_running_loop()
+    deadline = loop.time() + max(0.0, timeout)
+    last: dict | None = None
+    while True:
+        last = get_status(queen_dir)
+        if last is None:
+            return None
+        if last.get("status") != "in_progress":
+            return last
+        if loop.time() >= deadline:
+            logger.warning(
+                "compaction_status: timed out after %.0fs waiting for %s (proceeding with raw transcript)",
+                timeout,
+                queen_dir,
+            )
+            return last
+        await asyncio.sleep(poll)
@@ -113,10 +113,18 @@ def install_worker_escalation_routing(
        queen_node = session.queen_executor.node_registry.get("queen") if session.queen_executor is not None else None
        if queen_node is None or not hasattr(queen_node, "inject_event"):
            if session.event_bus is not None:
+                # Stream the handoff text so the human sees the worker's
+                # question, then request input so the reply input appears.
+                await session.event_bus.emit_client_output_delta(
+                    stream_id="queen",
+                    node_id="queen",
+                    content=handoff,
+                    snapshot=handoff,
+                    execution_id=session.id,
+                )
                await session.event_bus.emit_client_input_requested(
                    stream_id="queen",
                    node_id="queen",
-                    prompt=handoff,
                    execution_id=session.id,
                )
            return
@@ -175,12 +183,10 @@ def _build_credentials_provider() -> Any:

            adapter = CredentialStoreAdapter.default()
            accounts = adapter.get_all_account_info()
-            tool_provider_map = adapter.get_tool_provider_map()
-            rendered = build_accounts_prompt(
-                accounts,
-                tool_provider_map=tool_provider_map,
-                node_tool_names=None,
-            )
+            # Compact form (no tool_provider_map) — tool schemas already
+            # surface function names; baking the full per-provider list
+            # into the system prompt on every turn was ~2 KB of redundancy.
+            rendered = build_accounts_prompt(accounts)
        except Exception:
            logger.debug("Failed to render ambient credentials block", exc_info=True)
            rendered = ""
@@ -231,7 +237,7 @@ async def materialize_queen_identity(

    phase_state.queen_id = queen_id
    phase_state.queen_profile = queen_profile
-    phase_state.queen_identity_prompt = format_queen_identity_prompt(queen_profile)
+    phase_state.queen_identity_prompt = format_queen_identity_prompt(queen_profile, max_examples=1)

    if event_bus is not None:
        await event_bus.publish(
@@ -247,6 +253,92 @@ async def materialize_queen_identity(
        )


+def build_queen_tool_registry_bare() -> tuple[Any, dict[str, list[dict[str, Any]]]]:
+    """Build a Queen ``ToolRegistry`` and a (server_name → tools) catalog.
+
+    Used by the Tool Library GET route to populate the MCP tool surface
+    without needing a live queen session. We DO NOT register queen
+    lifecycle tools here (they require a Session stub); the catalog only
+    covers MCP-origin tools, which is what the allowlist gates.
+
+    Loading MCP servers spawns subprocesses, so call this once per
+    backend process and cache the result.
+    """
+    from pathlib import Path
+
+    import framework.agents.queen as _queen_pkg
+    from framework.loader.mcp_registry import MCPRegistry
+    from framework.loader.tool_registry import ToolRegistry
+
+    queen_registry = ToolRegistry()
+    queen_pkg_dir = Path(_queen_pkg.__file__).parent
+
+    mcp_config = queen_pkg_dir / "mcp_servers.json"
+    if mcp_config.exists():
+        try:
+            queen_registry.load_mcp_config(mcp_config)
+        except Exception:
+            logger.warning("build_queen_tool_registry_bare: MCP config failed", exc_info=True)
+
+    try:
+        reg = MCPRegistry()
+        reg.initialize()
+        if (queen_pkg_dir / "mcp_registry.json").is_file():
+            queen_registry.set_mcp_registry_agent_path(queen_pkg_dir)
+        registry_configs, selection_max_tools = reg.load_agent_selection(queen_pkg_dir)
+
+        already = {cfg.get("name") for cfg in registry_configs if cfg.get("name")}
+        extra: list[str] = []
+        try:
+            for entry in reg.list_installed():
+                if entry.get("source") != "local":
+                    continue
+                if not entry.get("enabled", True):
+                    continue
+                name = entry.get("name")
+                if name and name not in already:
+                    extra.append(name)
+        except Exception:
+            logger.debug("build_queen_tool_registry_bare: list_installed() failed", exc_info=True)
+        if extra:
+            try:
+                extra_configs = reg.resolve_for_agent(include=extra)
+                registry_configs = list(registry_configs) + [reg._server_config_to_dict(c) for c in extra_configs]
+            except Exception:
+                logger.debug("build_queen_tool_registry_bare: resolve_for_agent(extra) failed", exc_info=True)
+
+        if registry_configs:
+            queen_registry.load_registry_servers(
+                registry_configs,
+                preserve_existing_tools=True,
+                log_collisions=False,
+                max_tools=selection_max_tools,
+            )
+    except Exception:
+        logger.warning("build_queen_tool_registry_bare: MCP registry load failed", exc_info=True)
+
+    # Build the catalog.
+    tools_by_name = queen_registry.get_tools()
+    server_map = dict(getattr(queen_registry, "_mcp_server_tools", {}) or {})
+    catalog: dict[str, list[dict[str, Any]]] = {}
+    for server_name in sorted(server_map):
+        entries: list[dict[str, Any]] = []
+        for tool_name in sorted(server_map[server_name]):
+            tool = tools_by_name.get(tool_name)
+            if tool is None:
+                continue
+            entries.append(
+                {
+                    "name": tool.name,
+                    "description": tool.description,
+                    "input_schema": tool.parameters,
+                }
+            )
+        catalog[server_name] = entries
+
+    return queen_registry, catalog
+
+
 async def create_queen(
    session: Session,
    session_manager: Any,
@@ -268,38 +360,22 @@ async def create_queen(
        queen_loop_config as _base_loop_config,
    )
    from framework.agents.queen.nodes import (
-        _QUEEN_BUILDING_TOOLS,
-        _QUEEN_EDITING_TOOLS,
+        _QUEEN_INCUBATING_TOOLS,
        _QUEEN_INDEPENDENT_TOOLS,
-        _QUEEN_PLANNING_TOOLS,
-        _QUEEN_RUNNING_TOOLS,
-        _QUEEN_STAGING_TOOLS,
-        _appendices,
-        _building_knowledge,
-        _planning_knowledge,
+        _QUEEN_REVIEWING_TOOLS,
+        _QUEEN_WORKING_TOOLS,
        _queen_behavior_always,
-        _queen_behavior_building,
-        _queen_behavior_editing,
        _queen_behavior_independent,
-        _queen_behavior_planning,
-        _queen_behavior_running,
-        _queen_behavior_staging,
        _queen_character_core,
-        _queen_identity_editing,
-        _queen_phase_7,
-        _queen_role_building,
+        _queen_role_incubating,
        _queen_role_independent,
-        _queen_role_planning,
-        _queen_role_running,
-        _queen_role_staging,
+        _queen_role_reviewing,
+        _queen_role_working,
        _queen_style,
-        _queen_tools_building,
-        _queen_tools_editing,
+        _queen_tools_incubating,
        _queen_tools_independent,
-        _queen_tools_planning,
-        _queen_tools_running,
-        _queen_tools_staging,
-        _shared_building_knowledge,
+        _queen_tools_reviewing,
+        _queen_tools_working,
        finalize_queen_prompt,
    )
    from framework.host.event_bus import AgentEvent, EventType
@@ -336,6 +412,45 @@ async def create_queen(
            if (queen_pkg_dir / "mcp_registry.json").is_file():
                queen_registry.set_mcp_registry_agent_path(queen_pkg_dir)
            registry_configs, selection_max_tools = registry.load_agent_selection(queen_pkg_dir)
+
+            # Auto-include every user-added local MCP server that the repo
+            # selection hasn't already loaded. Users register servers via
+            # the `/api/mcp/servers` route (or `hive mcp add`); they live in
+            # ~/.hive/mcp_registry/installed.json with source == "local".
+            # New servers take effect on the next queen session start; the
+            # prompt cache and ToolRegistry are still loaded once per boot.
+            already_loaded_names = {cfg.get("name") for cfg in registry_configs if cfg.get("name")}
+            extra_names: list[str] = []
+            try:
+                for entry in registry.list_installed():
+                    if entry.get("source") != "local":
+                        continue
+                    if not entry.get("enabled", True):
+                        continue
+                    name = entry.get("name")
+                    if not name or name in already_loaded_names:
+                        continue
+                    extra_names.append(name)
+            except Exception:
+                logger.debug("Queen: list_installed() failed while auto-including user servers", exc_info=True)
+
+            if extra_names:
+                try:
+                    extra_configs = registry.resolve_for_agent(include=extra_names)
+                    extra_dicts = [registry._server_config_to_dict(c) for c in extra_configs]
+                    registry_configs = list(registry_configs) + extra_dicts
+                    logger.info(
+                        "Queen: auto-including %d user-added MCP server(s): %s",
+                        len(extra_dicts),
+                        [c.get("name") for c in extra_dicts],
+                    )
+                except Exception:
+                    logger.warning(
+                        "Queen: failed to resolve user-added MCP servers %s",
+                        extra_names,
+                        exc_info=True,
+                    )
+
            if registry_configs:
                results = queen_registry.load_registry_servers(
                    registry_configs,
@@ -348,7 +463,10 @@ async def create_queen(
            logger.warning("Queen: MCP registry config failed to load", exc_info=True)

    # ---- Phase state --------------------------------------------------
-    effective_phase = initial_phase or ("staging" if worker_identity else "planning")
+    # 3-phase model: caller supplies the phase directly (DM → independent,
+    # colony bootstrap → working). Fall back to independent when nothing
+    # is specified — there is no "staging"/"planning" bootstrap anymore.
+    effective_phase = initial_phase or ("working" if worker_identity else "independent")
    phase_state = QueenPhaseState(phase=effective_phase, event_bus=session.event_bus)
    session.phase_state = phase_state

@@ -360,28 +478,6 @@ async def create_queen(
    # when the user adds/removes an integration.
    phase_state.credentials_prompt_provider = _build_credentials_provider()

-    # ---- Track ask rounds during planning ----------------------------
-    # Increment planning_ask_rounds each time the queen requests user
-    # input (ask_user or ask_user_multiple) while in the planning phase.
-    async def _track_planning_asks(event: AgentEvent) -> None:
-        if phase_state.phase != "planning":
-            return
-        # Only count explicit ask_user / ask_user_multiple calls, not
-        # auto-block (text-only turns emit CLIENT_INPUT_REQUESTED with
-        # an empty prompt and no options/questions).
-        data = event.data or {}
-        has_prompt = bool(data.get("prompt"))
-        has_questions = bool(data.get("questions"))
-        has_options = bool(data.get("options"))
-        if has_prompt or has_questions or has_options:
-            phase_state.planning_ask_rounds += 1
-
-    session.event_bus.subscribe(
-        [EventType.CLIENT_INPUT_REQUESTED],
-        _track_planning_asks,
-        filter_stream="queen",
-    )
-
    # ---- Lifecycle tools (always registered) --------------------------
    register_queen_lifecycle_tools(
        queen_registry,
@@ -417,39 +513,99 @@ async def create_queen(
    session._queen_tool_executor = queen_tool_executor  # type: ignore[attr-defined]

    # ---- Partition tools by phase ------------------------------------
-    planning_names = set(_QUEEN_PLANNING_TOOLS)
-    building_names = set(_QUEEN_BUILDING_TOOLS)
-    staging_names = set(_QUEEN_STAGING_TOOLS)
-    running_names = set(_QUEEN_RUNNING_TOOLS)
-    editing_names = set(_QUEEN_EDITING_TOOLS)
    independent_names = set(_QUEEN_INDEPENDENT_TOOLS)
+    incubating_names = set(_QUEEN_INCUBATING_TOOLS)
+    working_names = set(_QUEEN_WORKING_TOOLS)
+    reviewing_names = set(_QUEEN_REVIEWING_TOOLS)

    registered_names = {t.name for t in queen_tools}
-    missing_building = building_names - registered_names
-    if missing_building:
-        logger.warning(
-            "Queen: %d/%d building tools NOT registered: %s",
-            len(missing_building),
-            len(building_names),
-            sorted(missing_building),
-        )
    logger.info("Queen: registered tools: %s", sorted(registered_names))

-    phase_state.planning_tools = [t for t in queen_tools if t.name in planning_names]
-    phase_state.building_tools = [t for t in queen_tools if t.name in building_names]
-    phase_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
-    phase_state.running_tools = [t for t in queen_tools if t.name in running_names]
-    phase_state.editing_tools = [t for t in queen_tools if t.name in editing_names]
+    phase_state.working_tools = [t for t in queen_tools if t.name in working_names]
+    phase_state.reviewing_tools = [t for t in queen_tools if t.name in reviewing_names]
+    # Incubating tool surface is intentionally minimal (read-only inspection
+    # + create_colony + cancel_incubation) — no MCP tools spliced in, so the
+    # queen stays focused on drafting the spec.
+    phase_state.incubating_tools = [t for t in queen_tools if t.name in incubating_names]

    # Independent phase gets core tools + all MCP tools not claimed by any
    # other phase (coder-tools file I/O, gcu-tools browser, etc.).
-    all_phase_names = planning_names | building_names | staging_names | running_names | editing_names
+    all_phase_names = independent_names | incubating_names | working_names | reviewing_names
    mcp_tools = [t for t in queen_tools if t.name not in all_phase_names]
    phase_state.independent_tools = [t for t in queen_tools if t.name in independent_names] + mcp_tools
    logger.info(
        "Queen: independent tools: %s",
        sorted(t.name for t in phase_state.independent_tools),
    )
+    logger.info(
+        "Queen: incubating tools: %s",
+        sorted(t.name for t in phase_state.incubating_tools),
+    )
+
+    # ---- Per-queen MCP tool allowlist --------------------------------
+    # Capture the set of MCP-origin tool names so the allowlist in
+    # ``QueenPhaseState`` only gates MCP tools (lifecycle and synthetic
+    # tools always pass through). Then apply the queen profile's stored
+    # allowlist (if any) and memoize the filtered independent tool list.
+    mcp_server_tools_map: dict[str, set[str]] = dict(getattr(queen_registry, "_mcp_server_tools", {}))
+    phase_state.mcp_tool_names_all = set().union(*mcp_server_tools_map.values()) if mcp_server_tools_map else set()
+    # The queen's MCP tool allowlist now lives in a dedicated
+    # ``tools.json`` sidecar next to ``profile.yaml``. ``load_queen_tools_config``
+    # migrates any legacy ``enabled_mcp_tools`` field out of profile.yaml
+    # on first read, so existing installs upgrade silently.
+    from framework.agents.queen.queen_tools_config import load_queen_tools_config
+
+    # Build a minimal catalog for default-tool resolution. The full
+    # ``session_manager._mcp_tool_catalog`` snapshot is written further
+    # down the flow; a queen booted for the first time needs the catalog
+    # now so ``@server:NAME`` shorthands in the role-default table can
+    # expand against the just-loaded MCP servers.
+    _boot_catalog: dict[str, list[dict]] = {
+        srv: [{"name": name} for name in sorted(names)] for srv, names in mcp_server_tools_map.items()
+    }
+    # ``queen_dir`` is ``queens/<queen_id>/sessions/<session_id>``; the
+    # allowlist sidecar is keyed by queen_id, not session_id.
+    phase_state.enabled_mcp_tools = load_queen_tools_config(session.queen_name, _boot_catalog)
+    phase_state.rebuild_independent_filter()
+    if phase_state.enabled_mcp_tools is not None:
+        total_mcp = len(phase_state.mcp_tool_names_all)
+        allowed_mcp = len(set(phase_state.enabled_mcp_tools) & phase_state.mcp_tool_names_all)
+        logger.info(
+            "Queen: per-queen MCP allowlist active — %d of %d MCP tools enabled",
+            allowed_mcp,
+            total_mcp,
+        )
+
+    # ---- MCP tool catalog for the frontend ---------------------------
+    # Snapshot per-server tool metadata so the Queen Tools API can render
+    # the tool surface without spawning MCP subprocesses. Keyed by server
+    # name so the UI can group tools by origin. Updated every time a
+    # queen boots, so installing a new server and starting a new queen
+    # session refreshes the catalog.
+    mcp_tool_catalog: dict[str, list[dict[str, Any]]] = {}
+    tools_by_name = {t.name: t for t in queen_tools}
+    for server_name, tool_names in mcp_server_tools_map.items():
+        server_entries: list[dict[str, Any]] = []
+        for tool_name in sorted(tool_names):
+            tool = tools_by_name.get(tool_name)
+            if tool is None:
+                continue
+            server_entries.append(
+                {
+                    "name": tool.name,
+                    "description": tool.description,
+                    "input_schema": tool.parameters,
+                }
+            )
+        mcp_tool_catalog[server_name] = server_entries
+    # All queens share one MCP registry, so the catalog is a manager-level
+    # fact; stash it on the SessionManager so the Queen Tools route can
+    # render the tool list even when no queen session is currently live.
+    if session_manager is not None:
+        try:
+            session_manager._mcp_tool_catalog = mcp_tool_catalog  # type: ignore[attr-defined]
+        except Exception:
+            logger.debug("Queen: could not attach mcp_tool_catalog to manager", exc_info=True)

    # ---- Global + queen-scoped memory ----------------------------------
    global_dir, queen_mem_dir = initialize_memory_scopes(session, phase_state)
@@ -466,81 +622,11 @@ async def create_queen(
    # ---- Compose phase-specific prompts ------------------------------
    from framework.agents.queen.nodes import queen_node as _orig_node

-    if worker_identity is None:
-        worker_identity = (
-            "\n\n# Worker Profile\n"
-            "No worker agent loaded. You are operating independently.\n"
-            "Design or build the agent to solve the user's problem "
-            "according to your current phase."
-        )
-
    # Resolve vision-only prompt sections based on the session's LLM.
    # session.llm is immutable for the session's lifetime, so this check
    # is stable — prompts never need to be recomposed mid-session.
    _has_vision = bool(session.llm and supports_image_tool_results(getattr(session.llm, "model", "")))

-    _planning_body = (
-        _queen_character_core
-        + _queen_role_planning
-        + _queen_style
-        + _shared_building_knowledge
-        + _queen_tools_planning
-        + _queen_behavior_always
-        + _queen_behavior_planning
-        + _planning_knowledge
-        + worker_identity
-    )
-    phase_state.prompt_planning = finalize_queen_prompt(_planning_body, _has_vision)
-
-    _building_body = (
-        _queen_character_core
-        + _queen_role_building
-        + _queen_style
-        + _shared_building_knowledge
-        + _queen_tools_building
-        + _queen_behavior_always
-        + _queen_behavior_building
-        + _building_knowledge
-        + _queen_phase_7
-        + _appendices
-        + worker_identity
-    )
-    phase_state.prompt_building = finalize_queen_prompt(_building_body, _has_vision)
-    phase_state.prompt_staging = finalize_queen_prompt(
-        (
-            _queen_character_core
-            + _queen_role_staging
-            + _queen_style
-            + _queen_tools_staging
-            + _queen_behavior_always
-            + _queen_behavior_staging
-            + worker_identity
-        ),
-        _has_vision,
-    )
-    phase_state.prompt_running = finalize_queen_prompt(
-        (
-            _queen_character_core
-            + _queen_role_running
-            + _queen_style
-            + _queen_tools_running
-            + _queen_behavior_always
-            + _queen_behavior_running
-            + worker_identity
-        ),
-        _has_vision,
-    )
-    phase_state.prompt_editing = finalize_queen_prompt(
-        (
-            _queen_identity_editing
-            + _queen_style
-            + _queen_tools_editing
-            + _queen_behavior_always
-            + _queen_behavior_editing
-            + worker_identity
-        ),
-        _has_vision,
-    )
    phase_state.prompt_independent = finalize_queen_prompt(
        (
            _queen_character_core
@@ -552,19 +638,69 @@ async def create_queen(
        ),
        _has_vision,
    )
+    phase_state.prompt_incubating = finalize_queen_prompt(
+        (
+            _queen_character_core
+            + _queen_role_incubating
+            + _queen_style
+            + _queen_tools_incubating
+            + _queen_behavior_always
+        ),
+        _has_vision,
+    )
+    phase_state.prompt_working = finalize_queen_prompt(
+        (_queen_character_core + _queen_role_working + _queen_style + _queen_tools_working + _queen_behavior_always),
+        _has_vision,
+    )
+    phase_state.prompt_reviewing = finalize_queen_prompt(
+        (
+            _queen_character_core
+            + _queen_role_reviewing
+            + _queen_style
+            + _queen_tools_reviewing
+            + _queen_behavior_always
+        ),
+        _has_vision,
+    )

    # ---- Default skill protocols -------------------------------------
    _queen_skill_dirs: list[str] = []
    try:
+        from framework.config import QUEENS_DIR
+        from framework.skills.discovery import ExtraScope
        from framework.skills.manager import SkillsManager, SkillsManagerConfig

-        # Pass project_root so user-scope skills (~/.hive/skills/, ~/.agents/skills/)
-        # are discovered. Queen has no agent-specific project root, so we use its
-        # own directory — the value just needs to be non-None to enable user-scope scanning.
-        _queen_skills_mgr = SkillsManager(SkillsManagerConfig(project_root=Path(__file__).parent))
+        # Queen home backs the queen-UI skill scope and the queen's
+        # override store. The directory already exists (or is created on
+        # demand by queen_profiles.py); treat a missing queen_name as the
+        # default queen to preserve backwards compatibility.
+        _queen_id = getattr(session, "queen_name", None) or "default"
+        _queen_home = QUEENS_DIR / _queen_id
+        _queen_skills_mgr = SkillsManager(
+            SkillsManagerConfig(
+                queen_id=_queen_id,
+                queen_overrides_path=_queen_home / "skills_overrides.json",
+                extra_scope_dirs=[
+                    ExtraScope(
+                        directory=_queen_home / "skills",
+                        label="queen_ui",
+                        priority=2,
+                    )
+                ],
+                # No project_root — queen's project is her own identity;
+                # user-scope discovery still runs without one.
+                project_root=None,
+                skip_community_discovery=True,
+                interactive=False,
+            )
+        )
        _queen_skills_mgr.load()
        phase_state.protocols_prompt = _queen_skills_mgr.protocols_prompt
        phase_state.skills_catalog_prompt = _queen_skills_mgr.skills_catalog_prompt
+        # Also store the manager so get_current_prompt() can render a
+        # phase-filtered catalog on each turn (skills with a `visibility`
+        # frontmatter that excludes the current phase are dropped).
+        phase_state.skills_manager = _queen_skills_mgr
        _queen_skill_dirs = _queen_skills_mgr.allowlisted_dirs
    except Exception:
        logger.debug("Queen skill loading failed (non-fatal)", exc_info=True)
@@ -596,8 +732,37 @@ async def create_queen(

    # ---- Recall on each real user turn --------------------------------
    async def _recall_on_user_input(event: AgentEvent) -> None:
-        """Re-select memories when real user input arrives."""
-        await _refresh_recall_cache((event.data or {}).get("content", ""))
+        """On real user input, freeze the dynamic system-prompt suffix and
+        refresh recall memories in the background.
+
+        The EventBus drops handlers that exceed 15s, so we MUST return fast.
+        Recall selection queries the LLM and can take >15s on slow backends;
+        we fire it off as a background task and re-stamp the suffix when it
+        completes. The immediate refresh_dynamic_suffix call stamps a fresh
+        timestamp using the last-known recall blocks so every iteration of
+        THIS user turn sees a byte-stable prompt (prompt cache hits on the
+        static block). Phase-change injections and worker-report injections
+        go through agent_loop.inject_event() and do NOT publish
+        CLIENT_INPUT_RECEIVED, so this runs exactly once per real user turn.
+        """
+        query = (event.data or {}).get("content", "")
+        # Immediate: stamp "now" into the frozen suffix, using whatever
+        # recall blocks we already cached (from the prior turn or seeding).
+        phase_state.refresh_dynamic_suffix()
+
+        async def _bg_refresh() -> None:
+            try:
+                await _refresh_recall_cache(query)
+                # Re-stamp with the fresh recall blocks. Iterations that read
+                # the suffix before this point used the older blocks, which
+                # is acceptable: recall was already eventually consistent.
+                phase_state.refresh_dynamic_suffix()
+            except Exception:
+                logger.debug("background recall refresh failed", exc_info=True)
+
+        import asyncio as _asyncio
+
+        _asyncio.create_task(_bg_refresh())

    session.event_bus.subscribe(
        [EventType.CLIENT_INPUT_RECEIVED],
@@ -632,7 +797,7 @@ async def create_queen(
        except FileNotFoundError:
            logger.warning("Queen profile %s not found after selection", queen_id)
            return None
-        identity_prompt = format_queen_identity_prompt(profile)
+        identity_prompt = format_queen_identity_prompt(profile, max_examples=1)
        # Store on phase_state so identity persists across dynamic prompt refreshes
        phase_state.queen_id = queen_id
        phase_state.queen_profile = profile
@@ -707,6 +872,9 @@ async def create_queen(
            except Exception:
                logger.debug("recall: initial seeding failed", exc_info=True)

+        # Freeze the dynamic suffix once so the first real turn sends a
+        # byte-stable prompt even before CLIENT_INPUT_RECEIVED fires.
+        phase_state.refresh_dynamic_suffix()
        return HookResult(system_prompt=phase_state.get_current_prompt())

    # ---- Colony preparation -------------------------------------------
@@ -743,6 +911,18 @@ async def create_queen(

    async def _queen_loop():
        logger.debug("[_queen_loop] Starting queen loop for session %s", session.id)
+        # Scope the browser profile to this session so parallel queens each
+        # drive their own Chrome tab group instead of fighting over "default".
+        # Browser tools run in a stdio MCP subprocess, so we can't set a
+        # contextvar across processes — instead we inject `profile` as a
+        # CONTEXT_PARAM that ToolRegistry passes into every MCP call. The
+        # token stays local to this task.
+        try:
+            from framework.loader.tool_registry import ToolRegistry
+
+            ToolRegistry.set_execution_context(profile=session.id)
+        except Exception:
+            logger.debug("Queen: failed to set browser profile for session %s", session.id, exc_info=True)
        try:
            lc = _queen_loop_config
            queen_loop_config = LoopConfig(
@@ -794,7 +974,8 @@ async def create_queen(
                stream_id="queen",
                execution_id=session.id,
                dynamic_tools_provider=phase_state.get_current_tools,
-                dynamic_prompt_provider=phase_state.get_current_prompt,
+                dynamic_prompt_provider=phase_state.get_static_prompt,
+                dynamic_prompt_suffix_provider=phase_state.get_dynamic_suffix,
                iteration_metadata_provider=lambda: {"phase": phase_state.phase},
                skills_catalog_prompt=phase_state.skills_catalog_prompt,
                protocols_prompt=phase_state.protocols_prompt,
@@ -810,44 +991,71 @@ async def create_queen(

            phase_state.inject_notification = _inject_phase_notification

-            async def _on_worker_done(event):
+            async def _on_worker_report(event):
+                """Inject [WORKER_REPORT] into queen as each worker finishes.
+
+                Subscribes to SUBAGENT_REPORT events which carry the worker's
+                real summary/data (preferring any explicit ``report_to_parent``
+                call). Every spawned worker emits exactly one — success,
+                partial, failed, timeout, or stopped. The queen sees the
+                report as the next user turn and can react (reply to user,
+                kick off follow-up work, etc.) without being blocked by the
+                spawn call itself.
+                """
                if event.stream_id == "queen":
                    return
-                if phase_state.phase == "running":
-                    if event.type == EventType.EXECUTION_COMPLETED:
-                        session.worker_configured = True
-                        output = event.data.get("output", {})
-                        output_summary = ""
-                        if output:
-                            for key, value in output.items():
-                                val_str = str(value)
-                                if len(val_str) > 200:
-                                    val_str = val_str[:200] + "..."
-                                output_summary += f"\n  {key}: {val_str}"
-                        _out = output_summary or " (no output keys set)"
-                        notification = (
-                            "[WORKER_TERMINAL] Worker finished successfully.\n"
-                            f"Output:{_out}\n"
-                            "Report this to the user. "
-                            "Ask if they want to re-run with different input "
-                            "or tweak the configuration."
-                        )
-                    else:
-                        error = event.data.get("error", "Unknown error")
-                        notification = (
-                            "[WORKER_TERMINAL] Worker failed.\n"
-                            f"Error: {error}\n"
-                            "Report this to the user and help them troubleshoot. "
-                            "You can re-run with different input or escalate to "
-                            "building/planning if code changes are needed."
-                        )
+                data = event.data or {}
+                worker_id = data.get("worker_id", event.node_id or "unknown")
+                status = data.get("status", "unknown")
+                summary = data.get("summary") or "(no summary)"
+                err = data.get("error")
+                payload_data = data.get("data") or {}
+                duration = data.get("duration_seconds")

-                    await agent_loop.inject_event(notification)
-                    await phase_state.switch_to_editing(source="auto")
+                lines = ["[WORKER_REPORT]", f"worker_id: {worker_id}", f"status: {status}"]
+                if duration is not None:
+                    try:
+                        lines.append(f"duration: {float(duration):.1f}s")
+                    except (TypeError, ValueError):
+                        pass
+                lines.append(f"summary: {summary}")
+                if err:
+                    lines.append(f"error: {err}")
+                if payload_data:
+                    # Compact JSON so the queen sees all keys without the
+                    # indentation blowing up the turn's token count.
+                    try:
+                        import json as _json
+
+                        lines.append("data: " + _json.dumps(payload_data, ensure_ascii=False, default=str))
+                    except Exception:
+                        lines.append(f"data: {payload_data!r}")
+                notification = "\n".join(lines)
+
+                await agent_loop.inject_event(notification)
+                session.worker_configured = True
+
+                # Only transition to reviewing once the batch has quieted —
+                # if other workers from a parallel spawn are still live, stay
+                # in working so the queen's tool access (run_parallel_workers,
+                # inject_message, stop_worker) remains available.
+                colony_runtime = getattr(session, "colony_runtime", None)
+                still_active = 0
+                if colony_runtime is not None:
+                    try:
+                        still_active = sum(
+                            1
+                            for w in colony_runtime._workers.values()  # type: ignore[attr-defined]
+                            if getattr(w, "is_active", False)
+                        )
+                    except Exception:
+                        still_active = 0
+                if still_active == 0 and phase_state.phase in ("working", "running"):
+                    await phase_state.switch_to_reviewing(source="auto")

            session.event_bus.subscribe(
-                event_types=[EventType.EXECUTION_COMPLETED, EventType.EXECUTION_FAILED],
-                handler=_on_worker_done,
+                event_types=[EventType.SUBAGENT_REPORT],
+                handler=_on_worker_report,
            )

            # ---- Colony-scoped worker escalation routing ----
@@ -0,0 +1,329 @@
+"""Per-colony MCP tool allowlist routes.
+
+- GET   /api/colony/{colony_name}/tools  -- enumerate colony tool surface
+- PATCH /api/colony/{colony_name}/tools  -- set or clear the allowlist
+
+A colony's tool set is inherited from the queen that forked it, so the
+tool surface mirrors the queen's MCP servers. Lifecycle/synthetic tools
+are included for display only. MCP tools are grouped by origin server
+with per-tool ``enabled`` flags.
+
+Semantics:
+
+- ``enabled_mcp_tools: null``  →  allow every MCP tool (default).
+- ``enabled_mcp_tools: []``    →  allow no MCP tools (only lifecycle /
+  synthetic pass through).
+- ``enabled_mcp_tools: [...]`` →  only listed names pass.
+
+The allowlist is persisted in a dedicated ``tools.json`` sidecar at
+``~/.hive/colonies/{colony_name}/tools.json``. Changes take effect on
+the *next* worker spawn. In-flight workers keep the tool list they
+booted with because workers have no dynamic tools provider today —
+mutating their tool set mid-turn would diverge from the list the LLM
+is already using.
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from aiohttp import web
+
+from framework.host.colony_metadata import colony_metadata_path
+from framework.host.colony_tools_config import (
+    load_colony_tools_config,
+    update_colony_tools_config,
+)
+
+logger = logging.getLogger(__name__)
+
+
+_SYNTHETIC_NAMES = {"ask_user"}
+
+
+def _synthetic_entries() -> list[dict[str, Any]]:
+    try:
+        from framework.agent_loop.internals.synthetic_tools import build_ask_user_tool
+
+        tool = build_ask_user_tool()
+        return [
+            {
+                "name": tool.name,
+                "description": tool.description,
+                "editable": False,
+            }
+        ]
+    except Exception:
+        return [
+            {
+                "name": "ask_user",
+                "description": "Pause and ask the user a structured question.",
+                "editable": False,
+            }
+        ]
+
+
+def _colony_runtimes_for_name(manager: Any, colony_name: str) -> list[Any]:
+    """Return every live ColonyRuntime whose session is attached to ``colony_name``."""
+    sessions = getattr(manager, "_sessions", None) or {}
+    runtimes: list[Any] = []
+    for session in sessions.values():
+        if getattr(session, "colony_name", None) != colony_name:
+            continue
+        # Both ``session.colony`` (queen-side unified runtime) and
+        # ``session.colony_runtime`` (legacy worker runtime) may carry
+        # tools that need the allowlist applied. We update both.
+        for attr in ("colony", "colony_runtime"):
+            rt = getattr(session, attr, None)
+            if rt is not None and rt not in runtimes:
+                runtimes.append(rt)
+    return runtimes
+
+
+async def _render_catalog(manager: Any, colony_name: str) -> dict[str, list[dict[str, Any]]]:
+    """Build a per-server tool catalog for this colony.
+
+    All colonies inherit the queen's MCP surface, so we reuse the
+    manager-level ``_mcp_tool_catalog`` populated during queen boot.
+    """
+    # If a live runtime exists and carries its own registry, prefer it —
+    # it's authoritative (reflects any post-queen-boot MCP additions).
+    for rt in _colony_runtimes_for_name(manager, colony_name):
+        tools = getattr(rt, "_tools", None)
+        if not tools:
+            continue
+        mcp_names = set(getattr(rt, "_mcp_tool_names_all", set()) or set())
+        if not mcp_names:
+            continue
+        catalog: dict[str, list[dict[str, Any]]] = {"(mcp)": []}
+        for tool in tools:
+            name = getattr(tool, "name", None)
+            if name in mcp_names:
+                catalog["(mcp)"].append(
+                    {
+                        "name": name,
+                        "description": getattr(tool, "description", ""),
+                        "input_schema": getattr(tool, "parameters", {}),
+                    }
+                )
+        return catalog
+
+    # Otherwise fall back to the queen-level snapshot. Build it on demand
+    # (off the event loop) when empty so the Tool Library works before
+    # any queen has been started in this process.
+    cached = getattr(manager, "_mcp_tool_catalog", None)
+    if isinstance(cached, dict) and cached:
+        return cached
+    try:
+        import asyncio
+
+        from framework.server.queen_orchestrator import build_queen_tool_registry_bare
+
+        registry, built = await asyncio.to_thread(build_queen_tool_registry_bare)
+        if manager is not None:
+            manager._mcp_tool_catalog = built  # type: ignore[attr-defined]
+            manager._bootstrap_tool_registry = registry  # type: ignore[attr-defined]
+        return built
+    except Exception:
+        logger.warning("Colony tools: catalog bootstrap failed", exc_info=True)
+        return {}
+
+
+def _lifecycle_entries_from_runtime(manager: Any, colony_name: str) -> list[dict[str, Any]]:
+    """Non-MCP tools currently registered on the colony runtime (if any).
+
+    When no live runtime is available we fall back to the bootstrap
+    registry stashed on the manager by ``routes_queen_tools`` — it
+    already has queen lifecycle tools registered, which are also the
+    lifecycle tools colonies inherit at spawn time.
+    """
+    out: list[dict[str, Any]] = []
+    seen: set[str] = set()
+
+    def _push(name: str, description: str) -> None:
+        if not name or name in seen:
+            return
+        if name in _SYNTHETIC_NAMES:
+            return
+        seen.add(name)
+        out.append({"name": name, "description": description, "editable": False})
+
+    runtimes = _colony_runtimes_for_name(manager, colony_name)
+    if runtimes:
+        for rt in runtimes:
+            mcp_names = set(getattr(rt, "_mcp_tool_names_all", set()) or set())
+            for tool in getattr(rt, "_tools", []) or []:
+                name = getattr(tool, "name", None)
+                if name in mcp_names:
+                    continue
+                _push(name, getattr(tool, "description", ""))
+    else:
+        # No live runtime — derive from the bootstrap registry.
+        from framework.server.routes_queen_tools import _lifecycle_entries_without_session
+
+        catalog = getattr(manager, "_mcp_tool_catalog", {}) or {}
+        mcp_names: set[str] = set()
+        for entries in catalog.values():
+            for entry in entries:
+                if entry.get("name"):
+                    mcp_names.add(entry["name"])
+        out.extend(_lifecycle_entries_without_session(manager, mcp_names))
+        return out
+    return sorted(out, key=lambda e: e["name"])
+
+
+def _render_servers(
+    catalog: dict[str, list[dict[str, Any]]],
+    enabled_mcp_tools: list[str] | None,
+) -> list[dict[str, Any]]:
+    allowed: set[str] | None = None if enabled_mcp_tools is None else set(enabled_mcp_tools)
+    servers: list[dict[str, Any]] = []
+    for name in sorted(catalog):
+        tools = []
+        for entry in catalog[name]:
+            tool_name = entry.get("name")
+            tools.append(
+                {
+                    "name": tool_name,
+                    "description": entry.get("description", ""),
+                    "input_schema": entry.get("input_schema", {}),
+                    "enabled": True if allowed is None else tool_name in allowed,
+                }
+            )
+        servers.append({"name": name, "tools": tools})
+    return servers
+
+
+async def handle_get_tools(request: web.Request) -> web.Response:
+    """GET /api/colony/{colony_name}/tools."""
+    colony_name = request.match_info["colony_name"]
+    if not colony_metadata_path(colony_name).exists():
+        return web.json_response({"error": f"Colony '{colony_name}' not found"}, status=404)
+
+    manager = request.app.get("manager")
+    # Allowlist now lives in a dedicated tools.json sidecar; helper
+    # migrates any legacy metadata.json field on first read.
+    enabled = load_colony_tools_config(colony_name)
+
+    catalog = await _render_catalog(manager, colony_name)
+    stale = not catalog
+
+    return web.json_response(
+        {
+            "colony_name": colony_name,
+            "enabled_mcp_tools": enabled,
+            "stale": stale,
+            "lifecycle": _lifecycle_entries_from_runtime(manager, colony_name),
+            "synthetic": _synthetic_entries(),
+            "mcp_servers": _render_servers(catalog, enabled),
+        }
+    )
+
+
+async def handle_patch_tools(request: web.Request) -> web.Response:
+    """PATCH /api/colony/{colony_name}/tools."""
+    colony_name = request.match_info["colony_name"]
+    if not colony_metadata_path(colony_name).exists():
+        return web.json_response({"error": f"Colony '{colony_name}' not found"}, status=404)
+
+    try:
+        body = await request.json()
+    except Exception:
+        return web.json_response({"error": "Invalid JSON body"}, status=400)
+    if not isinstance(body, dict) or "enabled_mcp_tools" not in body:
+        return web.json_response(
+            {"error": "Body must be an object with an 'enabled_mcp_tools' field"},
+            status=400,
+        )
+
+    enabled = body["enabled_mcp_tools"]
+    if enabled is not None:
+        if not isinstance(enabled, list) or not all(isinstance(x, str) for x in enabled):
+            return web.json_response(
+                {"error": "'enabled_mcp_tools' must be null or a list of strings"},
+                status=400,
+            )
+
+    manager = request.app.get("manager")
+
+    # Validate names against the known MCP catalog, giving colony tools
+    # the same typo-catching guarantee we already offer on queen tools.
+    catalog = await _render_catalog(manager, colony_name)
+    known: set[str] = {e.get("name") for entries in catalog.values() for e in entries if e.get("name")}
+    if enabled is not None and known:
+        unknown = sorted(set(enabled) - known)
+        if unknown:
+            return web.json_response(
+                {"error": "Unknown MCP tool name(s)", "unknown": unknown},
+                status=400,
+            )
+
+    # Persist — tools.json sidecar, not metadata.json. Missing directory
+    # is already guarded by the 404 check above.
+    try:
+        update_colony_tools_config(colony_name, enabled)
+    except FileNotFoundError:
+        return web.json_response({"error": f"Colony '{colony_name}' not found"}, status=404)
+
+    # Update any live runtimes so the NEXT worker spawn reflects the change.
+    # We do NOT rebuild in-flight workers' tool lists (see module docstring).
+    refreshed = 0
+    for rt in _colony_runtimes_for_name(manager, colony_name):
+        setter = getattr(rt, "set_tool_allowlist", None)
+        if callable(setter):
+            try:
+                setter(enabled)
+                refreshed += 1
+            except Exception:
+                logger.debug(
+                    "Colony tools: set_tool_allowlist failed on runtime for %s",
+                    colony_name,
+                    exc_info=True,
+                )
+
+    logger.info(
+        "Colony tools: colony=%s allowlist=%s refreshed_runtimes=%d",
+        colony_name,
+        "null" if enabled is None else f"{len(enabled)} tool(s)",
+        refreshed,
+    )
+    return web.json_response(
+        {
+            "colony_name": colony_name,
+            "enabled_mcp_tools": enabled,
+            "refreshed_runtimes": refreshed,
+            "note": "Changes apply to the next worker spawn. Running workers keep their booted tool list.",
+        }
+    )
+
+
+async def handle_list_colonies(request: web.Request) -> web.Response:
+    """GET /api/colonies/tools-index -- list colonies with their tool allowlist status.
+
+    Powers the Tool Library page's colony picker.
+    """
+    from framework.host.colony_metadata import list_colony_names, load_colony_metadata
+
+    colonies: list[dict[str, Any]] = []
+    for name in list_colony_names():
+        meta = load_colony_metadata(name)
+        # Provenance stays in metadata.json; allowlist lives in tools.json.
+        allowlist = load_colony_tools_config(name)
+        colonies.append(
+            {
+                "name": name,
+                "queen_name": meta.get("queen_name"),
+                "created_at": meta.get("created_at"),
+                "has_allowlist": allowlist is not None,
+                "enabled_count": len(allowlist) if isinstance(allowlist, list) else None,
+            }
+        )
+    return web.json_response({"colonies": colonies})
+
+
+def register_routes(app: web.Application) -> None:
+    """Register per-colony tool routes."""
+    app.router.add_get("/api/colonies/tools-index", handle_list_colonies)
+    app.router.add_get("/api/colony/{colony_name}/tools", handle_get_tools)
+    app.router.add_patch("/api/colony/{colony_name}/tools", handle_patch_tools)
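Note on the allowlist semantics described in the module docstring of this file (`null` allows everything, `[]` allows nothing, a list allows only the named tools): a minimal sketch of those rules, assuming nothing beyond what the docstring and `handle_patch_tools` state. The helper names `is_mcp_tool_enabled` and `unknown_tool_names`, and the tool names used in the test, are hypothetical and do not appear in the codebase.

```python
from typing import Optional


def is_mcp_tool_enabled(tool_name: str, enabled_mcp_tools: Optional[list[str]]) -> bool:
    """Apply the three-state allowlist to a single MCP tool name."""
    if enabled_mcp_tools is None:
        # null: no allowlist configured, every MCP tool passes (the default).
        return True
    # [] rejects every MCP tool; a non-empty list passes only listed names.
    return tool_name in set(enabled_mcp_tools)


def unknown_tool_names(enabled_mcp_tools: Optional[list[str]], known: set[str]) -> list[str]:
    """Return requested names missing from the catalog, sorted.

    Mirrors the PATCH handler's check: validation is skipped when the
    allowlist is null or when the catalog is empty.
    """
    if enabled_mcp_tools is None or not known:
        return []
    return sorted(set(enabled_mcp_tools) - known)
```

A PATCH body of `{"enabled_mcp_tools": null}` therefore clears the allowlist, while `{"enabled_mcp_tools": []}` leaves only lifecycle and synthetic tools available.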
@@ -0,0 +1,708 @@
+"""Colony worker inspection routes.
+
+These expose per-spawned-worker data (identified by worker_id) so the
+frontend can render a colony-workers sidebar analogous to the queen
+profile panel. Distinct from ``routes_workers.py``, which deals with
+*graph nodes* inside a worker definition rather than live worker
+instances.
+
+Session-scoped (bound to a live session's runtime):
+- GET /api/sessions/{session_id}/workers            — live + completed workers
+- GET /api/sessions/{session_id}/colony/skills      — colony's shared skills catalog
+- GET /api/sessions/{session_id}/colony/tools       — colony's default tools
+
+Colony-scoped (bound to the on-disk colony directory, independent of any
+live session — one colony has exactly one progress.db):
+- GET /api/colonies/{colony_name}/progress/snapshot — progress.db tasks/steps snapshot
+- GET /api/colonies/{colony_name}/progress/stream   — SSE feed of upserts (polled)
+- GET /api/colonies/{colony_name}/data/tables       — list user tables in progress.db
+- GET /api/colonies/{colony_name}/data/tables/{table}/rows — paginated rows
+- PATCH /api/colonies/{colony_name}/data/tables/{table}/rows — edit a row
+"""
+
+import asyncio
+import json
+import logging
+import re
+import sqlite3
+from pathlib import Path
+
+from aiohttp import web
+
+from framework.server.app import resolve_session
+
+# Same validation used by create_colony — keep them in sync. Blocks path
+# traversal (``..``) and shell-special chars; the endpoint would 400 on
+# anything else anyway, but validating early avoids a disk hit.
+_COLONY_NAME_RE = re.compile(r"^[a-z0-9_]+$")
+
+logger = logging.getLogger(__name__)
+
+# Poll interval for the progress SSE stream. Progress rows flip on the
+# order of seconds as workers finish LLM turns, so 1s feels live without
+# hammering the DB.
+_PROGRESS_POLL_INTERVAL = 1.0
+
+
+def _worker_info_to_dict(info) -> dict:
+    """Serialize a WorkerInfo dataclass to a JSON-friendly dict."""
+    result_dict = None
+    if info.result is not None:
+        r = info.result
+        result_dict = {
+            "status": r.status,
+            "summary": r.summary,
+            "error": r.error,
+            "tokens_used": r.tokens_used,
+            "duration_seconds": r.duration_seconds,
+        }
+    return {
+        "worker_id": info.id,
+        "task": info.task,
+        "status": str(info.status),
+        "started_at": info.started_at,
+        "result": result_dict,
+    }
+
+
+async def handle_list_workers(request: web.Request) -> web.Response:
+    """GET /api/sessions/{session_id}/workers -- list workers in a session's colony.
+
+    Returns two populations merged:
+      1. In-memory workers from the session's unified ColonyRuntime
+         (``session.colony._workers``). Includes live + just-finished
+         entries since ``_workers`` isn't pruned on termination.
+      2. Historical worker directories on disk under
+         ``<session_dir>/workers/`` that are not in memory. Populated
+         from dir name / first user message / dir mtime. These appear
+         as ``status="historical"`` so the frontend can style them
+         distinctly from actives.
+
+    Falls back to the legacy ``session.colony_runtime`` for the
+    in-memory half when ``session.colony`` isn't set.
+    """
+    session, err = resolve_session(request)
+    if err:
+        return err
+
+    runtime = getattr(session, "colony", None) or getattr(session, "colony_runtime", None)
+
+    workers: list[dict] = []
+    known_ids: set[str] = set()
+    storage_path: Path | None = None
+    if runtime is not None:
+        for info in runtime.list_workers():
+            workers.append(_worker_info_to_dict(info))
+            known_ids.add(info.id)
+        raw_storage = getattr(runtime, "_storage_path", None)
+        if raw_storage is not None:
+            storage_path = Path(raw_storage)
+
+    # Fall back to the session's directory if the runtime didn't expose one.
+    if storage_path is None:
+        session_dir = getattr(session, "queen_dir", None) or getattr(session, "session_dir", None)
+        if session_dir is not None:
+            storage_path = Path(session_dir)
+
+    if storage_path is not None:
+        workers.extend(await asyncio.to_thread(_walk_historical_workers, storage_path, known_ids))
+
+    return web.json_response({"workers": workers})
+
+
+def _walk_historical_workers(storage_path: Path, known_ids: set[str]) -> list[dict]:
+    """Scan ``<storage_path>/workers/`` for worker session dirs not already
+    in memory and return minimal ``WorkerSummary``-shaped entries.
+
+    We don't persist a standalone status file per worker, so the on-disk
+    entries get ``status="historical"`` and ``result=None``. The task is
+    reconstructed from the first non-boilerplate user message in the
+    worker's conversation parts.
+    """
+    workers_dir = storage_path / "workers"
+    if not workers_dir.exists() or not workers_dir.is_dir():
+        return []
+
+    out: list[dict] = []
+    try:
+        entries = list(workers_dir.iterdir())
+    except OSError:
+        return []
+
+    # Newest dir first so recent runs surface first in the tab.
+    entries.sort(key=_safe_mtime, reverse=True)
+
+    for entry in entries:
+        if not entry.is_dir():
+            continue
+        wid = entry.name
+        if wid in known_ids:
+            continue
+        out.append(
+            {
+                "worker_id": wid,
+                "task": _extract_historical_task(entry),
+                "status": "historical",
+                "started_at": _safe_mtime(entry),
+                "result": None,
+            }
+        )
+    return out
+
+
+def _safe_mtime(path: Path) -> float:
+    try:
+        return path.stat().st_mtime
+    except OSError:
+        return 0.0
+
+
+def _extract_historical_task(worker_dir: Path) -> str:
+    """Pull the worker's initial task from its conversation parts.
+
+    seq 0 is a boilerplate "Hello" greeting in most flows; the real
+    task lands in an early user message (typically seq 1 or 2). Scan
+    the first few parts and return the first ``role="user"`` content
+    that isn't the greeting. Bounded at 5 parts to stay cheap on
+    directory listings containing hundreds of workers.
+    """
+    parts_dir = worker_dir / "conversations" / "parts"
+    if not parts_dir.exists():
+        return ""
+    try:
+        for i in range(5):
+            p = parts_dir / f"{i:010d}.json"
+            if not p.exists():
+                break
+            data = json.loads(p.read_text(encoding="utf-8"))
+            if data.get("role") != "user":
+                continue
+            content = data.get("content", "")
+            if not isinstance(content, str):
+                continue
+            text = content.strip()
+            if not text or text.lower() == "hello":
+                continue
+            return text[:400]
+    except Exception:
+        return ""
+    return ""
+
+
+# ── Skills & tools ─────────────────────────────────────────────────
+
+
+def _parsed_skill_to_dict(skill) -> dict:
+    """Serialize a ParsedSkill for the frontend."""
+    return {
+        "name": skill.name,
+        "description": skill.description,
+        "location": skill.location,
+        "base_dir": skill.base_dir,
+        "source_scope": skill.source_scope,
+    }
+
+
+async def handle_list_colony_skills(request: web.Request) -> web.Response:
+    """GET /api/sessions/{session_id}/colony/skills -- list skills the colony sees."""
+    session, err = resolve_session(request)
+    if err:
+        return err
+
+    runtime = session.colony_runtime
+    if runtime is None:
+        return web.json_response({"skills": []})
+
+    # Reach into the skills manager's catalog. There is no public
+    # iterator yet; we touch the private dict directly and defensively
+    # tolerate either shape (bare SkillsManager, or the
+    # from_precomputed variant which has no catalog).
+    catalog = getattr(runtime._skills_manager, "_catalog", None)
+    skills_dict = getattr(catalog, "_skills", None) if catalog is not None else None
+    if not isinstance(skills_dict, dict):
+        return web.json_response({"skills": []})
+
+    skills = [_parsed_skill_to_dict(s) for s in skills_dict.values()]
+    skills.sort(key=lambda s: s["name"])
+    return web.json_response({"skills": skills})
+
+
+# Tools that ship with the framework and have no credential provider,
+# but still deserve their own logical group. Surfaced to the frontend
+# as ``provider="system"`` so the UI treats them exactly like a
+# credential-backed group.
+_SYSTEM_TOOLS: frozenset[str] = frozenset(
+    {
+        "get_account_info",
+        "get_current_time",
+        "bash_kill",
+        "bash_output",
+        "execute_command_tool",
+        "example_tool",
+    }
+)
+
+
+def _tool_to_dict(tool, provider_map: dict[str, str] | None) -> dict:
+    """Serialize a Tool dataclass for the frontend.
+
+    ``provider_map`` is the colony runtime's tool_name → credential
+    provider map (built by the CredentialResolver pipeline stage from
+    ``CredentialStoreAdapter.get_tool_provider_map()``). Credential-
+    backed tools get a canonical provider key (e.g. ``"hubspot"``,
+    ``"gmail"``); framework / core tools return ``None``, except for
+    the hand-picked entries in ``_SYSTEM_TOOLS`` which are tagged
+    ``"system"``.
+    """
+    name = getattr(tool, "name", "")
+    provider = (provider_map or {}).get(name)
+    if provider is None and name in _SYSTEM_TOOLS:
+        provider = "system"
+    return {
+        "name": name,
+        "description": getattr(tool, "description", ""),
+        "provider": provider,
+    }
+
+
+async def handle_list_colony_tools(request: web.Request) -> web.Response:
+    """GET /api/sessions/{session_id}/colony/tools -- list the colony's default tools."""
+    session, err = resolve_session(request)
+    if err:
+        return err
+
+    runtime = session.colony_runtime
+    if runtime is None:
+        return web.json_response({"tools": []})
+
+    provider_map = getattr(runtime, "_tool_provider_map", None)
+    tools = [_tool_to_dict(t, provider_map) for t in (runtime._tools or [])]
+    tools.sort(key=lambda t: t["name"])
+    return web.json_response({"tools": tools})
+
+
+# ── Progress DB (tasks/steps) ──────────────────────────────────────
+
+
+def _resolve_progress_db_by_name(colony_name: str) -> Path | None:
+    """Resolve a colony's progress.db path by directory name.
+
+    Returns ``None`` when the name fails validation or the file does not
+    exist. Both conditions render as an empty Data tab in the UI rather
+    than a hard error so an operator can open the panel before any
+    workers have actually run.
+    """
+    if not _COLONY_NAME_RE.match(colony_name):
+        return None
+    db_path = Path.home() / ".hive" / "colonies" / colony_name / "data" / "progress.db"
+    return db_path if db_path.exists() else None
+
+
+def _read_progress_snapshot(db_path: Path, worker_id: str | None) -> dict:
+    """Read tasks + steps from progress.db, optionally filtered by worker_id.
+
+    The worker_id filter applies to tasks (claimed by that worker) and
+    to steps (executed by that worker). If omitted, returns all rows.
+    """
+    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=5.0)
+    try:
+        con.row_factory = sqlite3.Row
+        if worker_id:
+            task_rows = con.execute(
+                "SELECT * FROM tasks WHERE worker_id = ? ORDER BY updated_at DESC",
+                (worker_id,),
+            ).fetchall()
+            step_rows = con.execute(
+                "SELECT * FROM steps WHERE worker_id = ? ORDER BY task_id, seq",
+                (worker_id,),
+            ).fetchall()
+        else:
+            task_rows = con.execute("SELECT * FROM tasks ORDER BY updated_at DESC LIMIT 500").fetchall()
+            step_rows = con.execute("SELECT * FROM steps ORDER BY task_id, seq LIMIT 2000").fetchall()
+        return {
+            "tasks": [dict(r) for r in task_rows],
+            "steps": [dict(r) for r in step_rows],
+        }
+    finally:
+        con.close()
+
+
+async def handle_progress_snapshot(request: web.Request) -> web.Response:
+    """GET /api/colonies/{colony_name}/progress/snapshot
+
+    Optional ?worker_id=... to filter to rows touched by a specific worker.
+    """
+    colony_name = request.match_info["colony_name"]
+    db_path = _resolve_progress_db_by_name(colony_name)
+    if db_path is None:
+        return web.json_response({"tasks": [], "steps": []})
+
+    worker_id = request.query.get("worker_id") or None
+    snapshot = await asyncio.to_thread(_read_progress_snapshot, db_path, worker_id)
+    return web.json_response(snapshot)
+
+
+def _read_progress_upserts(
+    db_path: Path,
+    worker_id: str | None,
+    since: str | None,
+) -> tuple[list[dict], list[dict], str | None]:
+    """Return task/step rows with ``updated_at`` (tasks) or a derived
+    timestamp (steps) newer than ``since``, plus the new high-water mark.
+
+    Steps don't carry an ``updated_at`` column — we use
+    ``COALESCE(completed_at, started_at)`` as the change witness. A step
+    without either timestamp hasn't changed since the last poll and is
+    skipped.
+
+    ``since`` is an ISO8601 string (as produced by progress_db._now_iso).
+    ``None`` means "give me everything" — used for the SSE priming frame.
+    """
+    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=5.0)
+    try:
+        con.row_factory = sqlite3.Row
+        task_sql = "SELECT * FROM tasks"
+        step_sql = (
+            "SELECT *, COALESCE(completed_at, started_at) AS _ts "
+            "FROM steps WHERE COALESCE(completed_at, started_at) IS NOT NULL"
+        )
+        task_args: list = []
+        step_args: list = []
+        if since is not None:
+            task_sql += " WHERE updated_at > ?"
+            step_sql += " AND COALESCE(completed_at, started_at) > ?"
+            task_args.append(since)
+            step_args.append(since)
+        if worker_id:
+            joiner_t = " AND " if since is not None else " WHERE "
+            task_sql += joiner_t + "worker_id = ?"
+            step_sql += " AND worker_id = ?"
+            task_args.append(worker_id)
+            step_args.append(worker_id)
+        task_sql += " ORDER BY updated_at"
+        step_sql += " ORDER BY _ts"
+
+        task_rows = con.execute(task_sql, task_args).fetchall()
+        step_rows = con.execute(step_sql, step_args).fetchall()
+
+        tasks = [dict(r) for r in task_rows]
+        steps = [dict(r) for r in step_rows]
+        # High-water mark = max timestamp across both sets. Fall back to
+        # the previous ``since`` when nothing changed.
+        ts_values = [t["updated_at"] for t in tasks]
+        ts_values.extend(s["_ts"] for s in steps if s.get("_ts"))
+        new_since = max(ts_values) if ts_values else since
+        return tasks, steps, new_since
+    finally:
+        con.close()
+
+
+async def handle_progress_stream(request: web.Request) -> web.StreamResponse:
+    """GET /api/colonies/{colony_name}/progress/stream
+
+    SSE feed that emits ``snapshot`` once (current state) followed by
+    ``upsert`` events whenever a task/step row changes. Polls the DB
+    every ``_PROGRESS_POLL_INTERVAL`` seconds — the sqlite3 CLI path
+    workers use for writes doesn't fire SQLite's update hook on our
+    connection, so polling is the robust option.
+    """
+    colony_name = request.match_info["colony_name"]
+    worker_id = request.query.get("worker_id") or None
+
+    resp = web.StreamResponse(
+        status=200,
+        headers={
+            "Content-Type": "text/event-stream",
+            "Cache-Control": "no-cache, no-transform",
+            "Connection": "keep-alive",
+            "X-Accel-Buffering": "no",
+        },
+    )
+    await resp.prepare(request)
+
+    async def _send(event: str, data: dict) -> None:
+        payload = f"event: {event}\ndata: {json.dumps(data)}\n\n"
+        await resp.write(payload.encode("utf-8"))
+
+    db_path = _resolve_progress_db_by_name(colony_name)
+    if db_path is None:
+        await _send("snapshot", {"tasks": [], "steps": []})
+        await _send("end", {"reason": "no_progress_db"})
+        return resp
+
+    try:
+        snapshot = await asyncio.to_thread(_read_progress_snapshot, db_path, worker_id)
+        await _send("snapshot", snapshot)
+
+        since: str | None = None
+        # Initialize the high-water mark from the snapshot so we don't
+        # re-emit every row as "new" on the first poll.
+        ts_values: list[str] = [t.get("updated_at") for t in snapshot["tasks"] if t.get("updated_at")]
+        ts_values.extend(
+            s.get("completed_at") or s.get("started_at")
+            for s in snapshot["steps"]
+            if s.get("completed_at") or s.get("started_at")
+        )
+        if ts_values:
+            since = max(v for v in ts_values if v)
+
+        # The loop relies on client disconnect surfacing as
+        # ConnectionResetError from ``_send`` — no explicit alive check
+        # required.
+        while True:
+            await asyncio.sleep(_PROGRESS_POLL_INTERVAL)
+            tasks, steps, new_since = await asyncio.to_thread(_read_progress_upserts, db_path, worker_id, since)
+            if tasks or steps:
+                await _send("upsert", {"tasks": tasks, "steps": steps})
+                since = new_since
+    except (asyncio.CancelledError, ConnectionResetError):
+        # Client disconnected or handler cancelled; re-raise so aiohttp can clean up.
+        raise
+    except Exception as exc:
+        logger.warning("progress stream error: %s", exc, exc_info=True)
+        try:
+            await _send("error", {"message": str(exc)})
+        except Exception:
+            pass
+    return resp
+
+
+# ── Raw data grid (Airtable-style view/edit of progress.db tables) ─────
+#
+# The Data tab lets the operator inspect and hand-edit SQLite rows.
+# Identifier-quoting note: SQLite params can only bind values, never
+# identifiers, so we have to interpolate table/column names into SQL.
+# Every name is *validated against sqlite_master / PRAGMA table_info*
+# before use and then wrapped with ``_q()`` which escapes embedded
+# quotes. Do NOT accept raw names from the request without running them
+# through ``_validate_ident`` first.
+
+
+def _q(ident: str) -> str:
+    """Quote a SQLite identifier (table or column) safely."""
+    return '"' + ident.replace('"', '""') + '"'
+
+
+def _list_user_tables(con: sqlite3.Connection) -> list[str]:
+    return [
+        r["name"]
+        for r in con.execute(
+            "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
+        )
+    ]
+
+
+def _table_columns(con: sqlite3.Connection, table: str) -> list[dict]:
+    """Return PRAGMA table_info rows as dicts. Empty list if no such table."""
+    return [
+        {
+            "name": r["name"],
+            "type": r["type"] or "",
+            "notnull": bool(r["notnull"]),
+            # pk>0 means the column is part of the primary key (ordinal);
+            # 0 means non-PK.
+            "pk": int(r["pk"]),
+            "dflt_value": r["dflt_value"],
+        }
+        for r in con.execute(f"PRAGMA table_info({_q(table)})")
+    ]
+
+
+def _read_tables_overview(db_path: Path) -> list[dict]:
+    """List user tables with columns + row counts."""
+    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=5.0)
+    try:
+        con.row_factory = sqlite3.Row
+        out: list[dict] = []
+        for name in _list_user_tables(con):
+            cols = _table_columns(con, name)
+            count_row = con.execute(f"SELECT COUNT(*) AS c FROM {_q(name)}").fetchone()
+            out.append(
+                {
+                    "name": name,
+                    "columns": cols,
+                    "row_count": int(count_row["c"]),
+                    "primary_key": [c["name"] for c in cols if c["pk"] > 0],
+                }
+            )
+        return out
+    finally:
+        con.close()
+
+
+def _validate_ident(name: str, known: set[str]) -> str | None:
+    """Return ``name`` if present in ``known``, else ``None``."""
+    return name if name in known else None
+
+
+def _read_table_rows(
+    db_path: Path,
+    table: str,
+    limit: int,
+    offset: int,
+    order_by: str | None,
+    order_dir: str,
+) -> dict:
+    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=5.0)
+    try:
+        con.row_factory = sqlite3.Row
+        tables = set(_list_user_tables(con))
+        if _validate_ident(table, tables) is None:
+            return {"error": f"unknown table: {table}"}
+        cols = _table_columns(con, table)
+        col_names = {c["name"] for c in cols}
+
+        sql = f"SELECT * FROM {_q(table)}"
+        if order_by and order_by in col_names:
+            direction = "DESC" if order_dir.lower() == "desc" else "ASC"
+            sql += f" ORDER BY {_q(order_by)} {direction}"
+        sql += " LIMIT ? OFFSET ?"
+        rows = con.execute(sql, (int(limit), int(offset))).fetchall()
+        total = con.execute(f"SELECT COUNT(*) AS c FROM {_q(table)}").fetchone()["c"]
+        return {
+            "table": table,
+            "columns": cols,
+            "primary_key": [c["name"] for c in cols if c["pk"] > 0],
+            "rows": [dict(r) for r in rows],
+            "total": int(total),
+            "limit": int(limit),
+            "offset": int(offset),
+        }
+    finally:
+        con.close()
+
+
+def _update_table_row(
+    db_path: Path,
+    table: str,
+    pk: dict,
+    updates: dict,
+) -> dict:
+    """Apply ``updates`` (column->value) to the row matching ``pk``.
+
+    Returns ``{"updated": n}`` with the number of rows affected (0 or 1),
+    or ``{"error": ...}`` on validation failure.
+    """
+    if not updates:
+        return {"error": "no updates provided"}
+    con = sqlite3.connect(db_path, timeout=5.0)
+    try:
+        con.row_factory = sqlite3.Row
+        tables = set(_list_user_tables(con))
+        if _validate_ident(table, tables) is None:
+            return {"error": f"unknown table: {table}"}
+        cols = _table_columns(con, table)
+        col_names = {c["name"] for c in cols}
+        pk_cols = [c["name"] for c in cols if c["pk"] > 0]
+        if not pk_cols:
+            return {"error": f"table {table!r} has no primary key; cannot edit by row"}
+
+        # Validate that pk supplies a value for every primary-key column.
+        missing = [p for p in pk_cols if p not in pk]
+        if missing:
+            return {"error": f"missing primary key columns: {missing}"}
+
+        # Validate update columns exist and aren't part of the primary key
+        # (changing a PK column would silently break joins/foreign refs).
+        bad = [c for c in updates if c not in col_names]
+        if bad:
+            return {"error": f"unknown columns: {bad}"}
+        pk_update = [c for c in updates if c in pk_cols]
+        if pk_update:
+            return {"error": f"cannot edit primary key columns: {pk_update}"}
+
+        set_sql = ", ".join(f"{_q(c)} = ?" for c in updates)
+        where_sql = " AND ".join(f"{_q(c)} = ?" for c in pk_cols)
+        sql = f"UPDATE {_q(table)} SET {set_sql} WHERE {where_sql}"
+        params = list(updates.values()) + [pk[c] for c in pk_cols]
+        cur = con.execute(sql, params)
+        con.commit()
+        return {"updated": cur.rowcount}
+    finally:
+        con.close()
+
+
+async def handle_list_tables(request: web.Request) -> web.Response:
+    """GET /api/colonies/{colony_name}/data/tables"""
+    colony_name = request.match_info["colony_name"]
+    db_path = _resolve_progress_db_by_name(colony_name)
+    if db_path is None:
+        return web.json_response({"tables": []})
+    tables = await asyncio.to_thread(_read_tables_overview, db_path)
+    return web.json_response({"tables": tables})
+
+
+async def handle_table_rows(request: web.Request) -> web.Response:
+    """GET /api/colonies/{colony_name}/data/tables/{table}/rows"""
+    colony_name = request.match_info["colony_name"]
+    db_path = _resolve_progress_db_by_name(colony_name)
+    if db_path is None:
+        return web.json_response({"error": "no progress.db"}, status=404)
+
+    table = request.match_info["table"]
+    # Clamp limit: 500 is enough for the grid's virtualization window;
+    # a larger cap would make accidental full-table loads too easy.
+    try:
+        limit = max(1, min(500, int(request.query.get("limit", "100"))))
+        offset = max(0, int(request.query.get("offset", "0")))
+    except ValueError:
+        return web.json_response({"error": "invalid limit/offset"}, status=400)
+    order_by = request.query.get("order_by") or None
+    order_dir = request.query.get("order_dir", "asc")
+
+    result = await asyncio.to_thread(_read_table_rows, db_path, table, limit, offset, order_by, order_dir)
+    if "error" in result:
+        return web.json_response(result, status=400)
+    return web.json_response(result)
+
+
+async def handle_update_row(request: web.Request) -> web.Response:
+    """PATCH /api/colonies/{colony_name}/data/tables/{table}/rows
+
+    Body: ``{"pk": {col: value, ...}, "updates": {col: value, ...}}``.
+    """
+    colony_name = request.match_info["colony_name"]
+    db_path = _resolve_progress_db_by_name(colony_name)
+    if db_path is None:
+        return web.json_response({"error": "no progress.db"}, status=404)
+
+    try:
+        body = await request.json()
+    except Exception:
+        return web.json_response({"error": "invalid JSON body"}, status=400)
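+    if not isinstance(body, dict):
+        return web.json_response({"error": "request body must be a JSON object"}, status=400)
+    pk = body.get("pk") or {}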
+    updates = body.get("updates") or {}
+    if not isinstance(pk, dict) or not isinstance(updates, dict):
+        return web.json_response({"error": "pk and updates must be objects"}, status=400)
+
+    table = request.match_info["table"]
+    result = await asyncio.to_thread(_update_table_row, db_path, table, pk, updates)
+    if "error" in result:
+        return web.json_response(result, status=400)
+    return web.json_response(result)
+
+
+def register_routes(app: web.Application) -> None:
+    """Register colony worker routes."""
+    # Session-scoped — these read live runtime state from a session.
+    app.router.add_get("/api/sessions/{session_id}/workers", handle_list_workers)
+    app.router.add_get("/api/sessions/{session_id}/colony/skills", handle_list_colony_skills)
+    app.router.add_get("/api/sessions/{session_id}/colony/tools", handle_list_colony_tools)
+    # Colony-scoped — one progress.db per colony, no session indirection.
+    app.router.add_get(
+        "/api/colonies/{colony_name}/progress/snapshot",
+        handle_progress_snapshot,
+    )
+    app.router.add_get(
+        "/api/colonies/{colony_name}/progress/stream",
+        handle_progress_stream,
+    )
+    app.router.add_get("/api/colonies/{colony_name}/data/tables", handle_list_tables)
+    app.router.add_get(
+        "/api/colonies/{colony_name}/data/tables/{table}/rows",
+        handle_table_rows,
+    )
+    app.router.add_patch(
+        "/api/colonies/{colony_name}/data/tables/{table}/rows",
+        handle_update_row,
+    )
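Review note: the Data tab helpers in this file interpolate table and column names into SQL only after checking them against the live schema and then quoting them. A minimal standalone sketch of that validate-then-quote pattern follows, run against an in-memory database. `quote_ident`, `user_tables`, and `count_rows` are illustrative names assumed for this sketch, not functions from this change.

```python
from __future__ import annotations

import sqlite3


def quote_ident(ident: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + ident.replace('"', '""') + '"'


def user_tables(con: sqlite3.Connection) -> set[str]:
    """Names of non-internal tables, read from sqlite_master."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'"
    )
    return {r[0] for r in rows}


def count_rows(con: sqlite3.Connection, table: str) -> int | None:
    """Return the row count, or None if ``table`` is not a known table."""
    if table not in user_tables(con):
        return None  # reject before any SQL is built from the name
    return con.execute(f"SELECT COUNT(*) FROM {quote_ident(table)}").fetchone()[0]


# Demo: a table whose name contains a space and a double quote.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "odd ""name" (id INTEGER PRIMARY KEY)')
con.execute('INSERT INTO "odd ""name" DEFAULT VALUES')

print(count_rows(con, 'odd "name'))
print(count_rows(con, "missing; DROP TABLE x"))
```

The membership check is what provides the safety here; the quoting only keeps legitimate but unusual names (spaces, embedded quotes) from breaking the statement.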
@@ -6,6 +6,7 @@ Routes:
 - GET  /api/config/models        — curated provider→models list
 """

+import asyncio
 import json
 import logging
 import os
@@ -301,6 +302,53 @@ def _hot_swap_sessions(request: web.Request, full_model: str, api_key: str | Non
    return swapped


+async def _validate_provider_key(
+    provider: str,
+    api_key: str,
+    api_base: str | None = None,
+    model: str | None = None,
+) -> dict:
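+    """Validate an API key against the provider.
+
+    Returns ``{"valid": bool | None, "message": str}``. ``valid`` is ``None``
+    when the check itself raised, so callers should reject only on ``False``.
+
+    Runs the check in a worker thread to avoid blocking the event loop.
+    """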
+    from scripts.check_llm_key import (
+        PROVIDERS as CHECK_PROVIDERS,
+        check_anthropic_compatible,
+        check_minimax,
+        check_openai_compatible,
+        check_openrouter,
+        check_openrouter_model,
+    )
+
+    def _check() -> dict:
+        pid = provider.lower()
+        try:
+            # Provider-specific checks first; most of them need a custom api_base
+            if pid == "openrouter" and model:
+                return check_openrouter_model(api_key, model=model, api_base=api_base or "https://openrouter.ai/api/v1")
+            if api_base and pid == "minimax":
+                return check_minimax(api_key, api_base)
+            if api_base and pid == "openrouter":
+                return check_openrouter(api_key, api_base)
+            if api_base and pid == "kimi":
+                return check_anthropic_compatible(api_key, api_base.rstrip("/") + "/v1/messages", "Kimi")
+            if api_base and pid == "hive":
+                return check_anthropic_compatible(api_key, api_base.rstrip("/") + "/v1/messages", "Hive")
+            if api_base:
+                endpoint = api_base.rstrip("/") + "/models"
+                name = {"zai": "ZAI"}.get(pid, "Custom provider")
+                return check_openai_compatible(api_key, endpoint, name)
+            if pid in CHECK_PROVIDERS:
+                return CHECK_PROVIDERS[pid](api_key)
+            # No check available — assume valid
+            return {"valid": True, "message": f"No health check for {pid}"}
+        except Exception as exc:
+            return {"valid": None, "message": f"Validation error: {exc}"}
+
+    return await asyncio.to_thread(_check)
+
+
 # ------------------------------------------------------------------
 # Handlers
 # ------------------------------------------------------------------
@@ -324,9 +372,9 @@ async def handle_get_llm_config(request: web.Request) -> web.Response:
        if _resolve_api_key(pid, request) is not None:
            connected.append(pid)

-    # Subscription detection
+    # Subscription detection — only include subscriptions whose tokens exist
    active_subscription = _get_active_subscription(llm)
-    detected_subscriptions = _detect_subscriptions()
+    detected_subscriptions = [sid for sid in _detect_subscriptions() if _get_subscription_token(sid)]

    return web.json_response(
        {
@@ -369,6 +417,21 @@ async def handle_update_llm_config(request: web.Request) -> web.Response:
        provider = sub["provider"]
        api_base = sub.get("api_base")

+        # Validate the subscription token before committing
+        token = _get_subscription_token(subscription_id)
+        if not token:
+            return web.json_response(
+                {"error": f"No credential found for {sub['name']}. Please check your subscription or API key."},
+                status=400,
+            )
+
+        check = await _validate_provider_key(provider, token, api_base=api_base)
+        if check.get("valid") is False:
+            return web.json_response(
+                {"error": f"{sub['name']} key validation failed: {check.get('message', 'unknown error')}"},
+                status=400,
+            )
+
        # Look up token limits from preset
        max_tokens: int | None = None
        max_context_tokens: int | None = None
@@ -399,8 +462,7 @@ async def handle_update_llm_config(request: web.Request) -> web.Response:

        _write_config_atomic(config)

-        # Hot-swap with subscription token
-        token = _get_subscription_token(subscription_id)
+        # Hot-swap with subscription token (already validated above)
        full_model = f"{provider}/{model}"
        swapped = _hot_swap_sessions(request, full_model, api_key=token, api_base=api_base)

@@ -430,15 +492,36 @@ async def handle_update_llm_config(request: web.Request) -> web.Response:
        if not provider or not model:
            return web.json_response({"error": "Both 'provider' and 'model' are required"}, status=400)

-        # Look up token limits from catalogue
+        # Verify model exists in the catalogue
        model_info = _find_model_info(provider, model)
-        max_tokens = model_info["max_tokens"] if model_info else 8192
-        max_context_tokens = model_info["max_context_tokens"] if model_info else 120000
+        if not model_info:
+            return web.json_response(
+                {"error": f"Model '{model}' is not available for provider '{provider}'."},
+                status=400,
+            )
+
+        max_tokens = model_info["max_tokens"]
+        max_context_tokens = model_info["max_context_tokens"]

        # Determine env var and api_base
        env_var = PROVIDER_ENV_VARS.get(provider.lower(), "")
        api_base = _get_api_base_for_provider(provider)

+        # Validate the API key before committing
+        api_key = _resolve_api_key(provider, request)
+        if not api_key:
+            return web.json_response(
+                {"error": f"No API key found for {provider}. Please add one in Manage Keys."},
+                status=400,
+            )
+
+        check = await _validate_provider_key(provider, api_key, api_base=api_base, model=model)
+        if check.get("valid") is False:
+            return web.json_response(
+                {"error": f"API key validation failed for {provider}: {check.get('message', 'unknown error')}"},
+                status=400,
+            )
+
        # Update ~/.hive/configuration.json
        config = get_hive_config()
        llm_section = config.setdefault("llm", {})
@@ -458,8 +541,7 @@ async def handle_update_llm_config(request: web.Request) -> web.Response:

        _write_config_atomic(config)

-        # Hot-swap all running sessions
-        api_key = _resolve_api_key(provider, request)
+        # Hot-swap all running sessions (api_key already validated above)
        full_model = f"{provider}/{model}"
        swapped = _hot_swap_sessions(request, full_model, api_key=api_key, api_base=api_base)

@@ -594,6 +676,64 @@ async def handle_get_models(request: web.Request) -> web.Response:
    return web.json_response({"models": MODELS_CATALOGUE})


+# ------------------------------------------------------------------
+# User avatar
+# ------------------------------------------------------------------
+
+MAX_AVATAR_BYTES = 2 * 1024 * 1024  # 2 MB
+_ALLOWED_AVATAR_TYPES = {
+    "image/jpeg": ".jpg",
+    "image/png": ".png",
+    "image/webp": ".webp",
+}
+
+
+async def handle_upload_user_avatar(request: web.Request) -> web.Response:
+    """POST /api/config/profile/avatar — upload user profile picture."""
+    reader = await request.multipart()
+    field = await reader.next()
+    if field is None or field.name != "avatar":
+        return web.json_response({"error": "Expected a file field named 'avatar'"}, status=400)
+
+    content_type = getattr(field, "content_type", None) or field.headers.get("Content-Type", "")
+    ext = _ALLOWED_AVATAR_TYPES.get(content_type)
+    if not ext:
+        return web.json_response(
+            {"error": f"Unsupported image type: {content_type}. Use JPEG, PNG, or WebP."},
+            status=400,
+        )
+
+    data = bytearray()
+    while True:
+        chunk = await field.read_chunk(8192)
+        if not chunk:
+            break
+        data.extend(chunk)
+        if len(data) > MAX_AVATAR_BYTES:
+            return web.json_response({"error": "Image too large. Maximum size is 2 MB."}, status=400)
+
+    if not data:
+        return web.json_response({"error": "Empty file"}, status=400)
+
+    # Remove existing avatar files
+    for existing in HIVE_CONFIG_FILE.parent.glob("avatar.*"):
+        existing.unlink(missing_ok=True)
+
+    avatar_path = HIVE_CONFIG_FILE.parent / f"avatar{ext}"
+    avatar_path.write_bytes(data)
+    logger.info("User avatar uploaded: %s (%d bytes)", avatar_path.name, len(data))
+    return web.json_response({"avatar_url": "/api/config/profile/avatar"})
+
+
+async def handle_get_user_avatar(request: web.Request) -> web.Response:
+    """GET /api/config/profile/avatar — serve user profile picture."""
+    for ext in _ALLOWED_AVATAR_TYPES.values():
+        avatar_path = HIVE_CONFIG_FILE.parent / f"avatar{ext}"
+        if avatar_path.exists():
+            return web.FileResponse(avatar_path, headers={"Cache-Control": "public, max-age=3600"})
+    return web.json_response({"error": "No avatar found"}, status=404)
+
+
 # ------------------------------------------------------------------
 # Route registration
 # ------------------------------------------------------------------
@@ -606,3 +746,5 @@ def register_routes(app: web.Application) -> None:
    app.router.add_get("/api/config/models", handle_get_models)
    app.router.add_get("/api/config/profile", handle_get_profile)
    app.router.add_put("/api/config/profile", handle_update_profile)
+    app.router.add_post("/api/config/profile/avatar", handle_upload_user_avatar)
+    app.router.add_get("/api/config/profile/avatar", handle_get_user_avatar)
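Review note: the key validation added in this file is effectively tri-state. A check can pass (`True`), fail (`False`), or be inconclusive (`None`, when the check itself raised), and the handlers reject the config update only on an explicit `False`. A minimal sketch of that gating under stated assumptions: `check_key` is a hypothetical stand-in for the real provider checks, and `should_reject` is an illustrative helper, not code from this change.

```python
import asyncio


def check_key(api_key: str) -> dict:
    """Hypothetical stand-in for a blocking provider health check."""
    if api_key == "boom":
        raise RuntimeError("network down")
    return {"valid": api_key.startswith("sk-"), "message": "checked"}


async def validate(api_key: str) -> dict:
    """Run the blocking check in a worker thread; map exceptions to valid=None."""

    def _check() -> dict:
        try:
            return check_key(api_key)
        except Exception as exc:
            # Inconclusive: the check itself failed, so report None, not False.
            return {"valid": None, "message": f"Validation error: {exc}"}

    return await asyncio.to_thread(_check)


def should_reject(check: dict) -> bool:
    """Reject only on an explicit failure; None (inconclusive) is let through."""
    return check.get("valid") is False


for key in ("sk-123", "bad-key", "boom"):
    result = asyncio.run(validate(key))
    print(key, result["valid"], should_reject(result))
```

The design choice is that a flaky network or a provider outage should not block a user from saving a configuration, while a key the provider actively refuses should.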
@@ -7,7 +7,7 @@ import os
 from aiohttp import web
 from pydantic import SecretStr

-from framework.credentials.models import CredentialKey, CredentialObject
+from framework.credentials.models import CredentialDecryptionError, CredentialKey, CredentialObject
 from framework.credentials.store import CredentialStore
 from framework.server.app import validate_agent_path

@@ -84,23 +84,52 @@ def _credential_to_dict(cred: CredentialObject) -> dict:
    }


+def _is_available_for_specs(store: CredentialStore, credential_id: str) -> bool:
+    """Best-effort availability check for the repair UI.
+
+    The credential settings page must stay reachable even when an encrypted
+    file was written with the wrong key or is otherwise unreadable.
+    """
+    try:
+        return store.is_available(credential_id)
+    except CredentialDecryptionError as exc:
+        logger.warning("Credential '%s' is unreadable; marking unavailable in specs: %s", credential_id, exc)
+        return False
+
+
 async def handle_list_credentials(request: web.Request) -> web.Response:
    """GET /api/credentials — list all credential metadata (no secrets)."""
    store = _get_store(request)
    cred_ids = store.list_credentials()
    credentials = []
+    unreadable = []
    for cid in cred_ids:
-        cred = store.get_credential(cid, refresh_if_needed=False)
+        try:
+            cred = store.get_credential(cid, refresh_if_needed=False)
+        except CredentialDecryptionError as exc:
+            logger.warning("Credential '%s' is unreadable while listing credentials: %s", cid, exc)
+            unreadable.append(cid)
+            continue
        if cred:
            credentials.append(_credential_to_dict(cred))
-    return web.json_response({"credentials": credentials})
+    return web.json_response({"credentials": credentials, "unreadable_credentials": unreadable})


 async def handle_get_credential(request: web.Request) -> web.Response:
    """GET /api/credentials/{credential_id} — get single credential metadata."""
    credential_id = request.match_info["credential_id"]
    store = _get_store(request)
-    cred = store.get_credential(credential_id, refresh_if_needed=False)
+    try:
+        cred = store.get_credential(credential_id, refresh_if_needed=False)
+    except CredentialDecryptionError:
+        return web.json_response(
+            {
+                "error": f"Credential '{credential_id}' could not be decrypted",
+                "credential_id": credential_id,
+                "recoverable": True,
+            },
+            status=409,
+        )
    if cred is None:
        return web.json_response({"error": f"Credential '{credential_id}' not found"}, status=404)
    return web.json_response(_credential_to_dict(cred))
@@ -393,7 +422,7 @@ async def handle_list_specs(request: web.Request) -> web.Response:
            if spec.aden_supported and not spec.direct_api_key_supported:
                available = len(accounts) > 0
            else:
-                available = store.is_available(cred_id)
+                available = _is_available_for_specs(store, cred_id)
            specs.append(
                {
                    "credential_name": name,
@@ -51,13 +51,18 @@ DEFAULT_EVENT_TYPES = [
 # Keepalive interval in seconds
 KEEPALIVE_INTERVAL = 15.0

-# Phase 5 SSE filter: parallel-worker streams (stream_id="worker:{uuid}")
-# publish high-frequency LLM deltas / tool calls that would flood the
-# user's queen DM chat. We let only this small allowlist of worker
-# events through to the queen-chat SSE so the frontend can render
-# fan-out lifecycle and structured fan-in reports without seeing the
-# raw worker chatter. Per-worker SSE panels (Phase 5b) bypass this
-# filter via a dedicated /workers/{worker_id}/events route.
+# Session-SSE worker filter: workers run outside the queen's DM
+# chat. Worker activity is observable via the dedicated
+# ``/api/workers/{worker_id}/events`` per-worker SSE route, not via
+# the session chat. This keeps the queen↔user conversation clean of
+# tool-call chatter regardless of whether the worker was spawned by
+# ``run_agent_with_input`` (stream_id="worker") or
+# ``run_parallel_workers`` (stream_id="worker:{uuid}").
+#
+# Lifecycle events the frontend needs for fan-in summaries
+# (SUBAGENT_REPORT, EXECUTION_COMPLETED, EXECUTION_FAILED) are still
+# allowed through so the queen can show "N workers done" surfaces
+# without exposing the per-turn chatter.
 _WORKER_EVENT_ALLOWLIST = {
    EventType.SUBAGENT_REPORT.value,
    EventType.EXECUTION_COMPLETED.value,
@@ -66,9 +71,17 @@ _WORKER_EVENT_ALLOWLIST = {


 def _is_worker_noise(evt_dict: dict) -> bool:
-    """True if the event is a parallel-worker event we should drop."""
+    """True if the event belongs to a worker stream and should not
+    surface in the queen DM chat.
+
+    Matches any stream starting with ``worker`` — both the bare
+    ``"worker"`` tag used by single-worker spawns and the
+    ``"worker:{uuid}"`` tag used by parallel fan-outs. The allowlist
+    carves out the three terminal/lifecycle events the UI still
+    needs to render fan-in summaries.
+    """
    stream_id = evt_dict.get("stream_id") or ""
-    if not stream_id.startswith("worker:"):
+    if not stream_id.startswith("worker"):
        return False
    return evt_dict.get("type") not in _WORKER_EVENT_ALLOWLIST

@@ -106,6 +119,22 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
    event_bus = session.event_bus
    event_types = _parse_event_types(request.query.get("types"))

+    # Worker-noise filter is phase-aware. In DM mode (queen phase
+    # "independent") the queen's chat should stay clean — workers
+    # are invisible. In colony mode (phase "working"/"reviewing")
+    # the user IS supervising the workers and wants to see the
+    # tool-call/text-delta chatter as it happens. Sample the phase
+    # once at SSE connect; if the queen later transitions, the
+    # frontend reconnects.
+    def _should_filter_worker_noise() -> bool:
+        phase_state = getattr(session, "phase_state", None)
+        if phase_state is None:
+            return True  # unknown phase → be conservative, filter noise
+        phase = getattr(phase_state, "phase", "independent")
+        return phase == "independent"
+
+    filter_worker_noise = _should_filter_worker_noise()
+
    # Per-client buffer queue
    queue: asyncio.Queue = asyncio.Queue(maxsize=1000)

@@ -132,7 +161,7 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
            return

        evt_dict = event.to_dict()
-        if _is_worker_noise(evt_dict):
+        if filter_worker_noise and _is_worker_noise(evt_dict):
            return
        if evt_dict.get("type") in _CRITICAL_EVENTS:
            try:
@@ -180,6 +209,7 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
        EventType.TRIGGER_AVAILABLE.value,
        EventType.TRIGGER_ACTIVATED.value,
        EventType.TRIGGER_DEACTIVATED.value,
+        EventType.TRIGGER_FIRED.value,
        EventType.TRIGGER_REMOVED.value,
        EventType.TRIGGER_UPDATED.value,
    }
@@ -189,7 +219,7 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
    for past_event in event_bus._event_history:
        if past_event.type.value in replay_types:
            past_dict = past_event.to_dict()
-            if _is_worker_noise(past_dict):
+            if filter_worker_noise and _is_worker_noise(past_dict):
                continue
            try:
                queue.put_nowait(past_dict)
@@ -0,0 +1,291 @@
+"""MCP server registration routes.
+
+Thin HTTP wrapper around ``MCPRegistry`` so the frontend can add, remove,
+enable, and health-check user-registered MCP servers. The CLI path
+(``hive mcp add`` / ``hive mcp remove`` / etc.) is unchanged.
+
+- GET    /api/mcp/servers                  -- list installed servers
+- POST   /api/mcp/servers                  -- register a local server
+- DELETE /api/mcp/servers/{name}           -- remove a local server
+- POST   /api/mcp/servers/{name}/enable    -- enable a server
+- POST   /api/mcp/servers/{name}/disable   -- disable a server
+- POST   /api/mcp/servers/{name}/health    -- probe server health
+
+New servers take effect on the *next* queen session start. Existing live
+queen sessions keep the tool list they booted with to avoid mid-turn
+cache invalidation. The ``add`` response hints at this explicitly.
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from aiohttp import web
+
+from framework.loader.mcp_errors import MCPError
+from framework.loader.mcp_registry import MCPRegistry
+
+logger = logging.getLogger(__name__)
+
+
+_VALID_TRANSPORTS = {"stdio", "http", "sse", "unix"}
+
+
+def _registry() -> MCPRegistry:
+    # MCPRegistry is a thin wrapper around ~/.hive/mcp_registry/installed.json
+    # so instantiation is cheap — no need to cache on app["..."].
+    reg = MCPRegistry()
+    reg.initialize()
+    return reg
+
+
+def _package_builtin_servers() -> list[dict[str, Any]]:
+    """Return the package-baked queen MCP servers from ``queen/mcp_servers.json``.
+
+    Those servers are loaded directly by ``ToolRegistry.load_mcp_config``
+    at queen boot and never go through ``MCPRegistry.list_installed``,
+    so the raw registry view shows them as missing. Surface them here so
+    the Tool Library reflects what the queen actually talks to.
+
+    Entries carry ``source: "built-in"`` and are NOT removable / toggleable
+    — editing them requires changing the repo file.
+    """
+    import json
+    from pathlib import Path
+
+    import framework.agents.queen as _queen_pkg
+
+    path = Path(_queen_pkg.__file__).parent / "mcp_servers.json"
+    if not path.exists():
+        return []
+    try:
+        data = json.loads(path.read_text(encoding="utf-8"))
+    except (json.JSONDecodeError, OSError):
+        return []
+
+    out: list[dict[str, Any]] = []
+    for name, cfg in data.items():
+        if not isinstance(cfg, dict):
+            continue
+        out.append(
+            {
+                "name": name,
+                "source": "built-in",
+                "transport": cfg.get("transport", "stdio"),
+                "description": cfg.get("description", "") or "",
+                "enabled": True,
+                "last_health_status": None,
+                "last_error": None,
+                "last_health_check_at": None,
+                "tool_count": None,
+                "removable": False,
+            }
+        )
+    return out
+
+
+def _server_to_summary(entry: dict[str, Any]) -> dict[str, Any]:
+    """Shape an installed.json entry for API responses.
+
+    Strips the full manifest body (which can be large) but keeps the tool
+    list if the manifest already embeds one (happens for registry-installed
+    servers). Servers with ``source: "local"`` only get a tool list after
+    running a health check.
+    """
+    manifest = entry.get("manifest") or {}
+    tools = manifest.get("tools") if isinstance(manifest, dict) else None
+    if not isinstance(tools, list):
+        tools = None
+    return {
+        "name": entry.get("name"),
+        "source": entry.get("source"),
+        "transport": entry.get("transport"),
+        "description": (manifest.get("description") if isinstance(manifest, dict) else None) or "",
+        "enabled": entry.get("enabled", True),
+        "last_health_status": entry.get("last_health_status"),
+        "last_error": entry.get("last_error"),
+        "last_health_check_at": entry.get("last_health_check_at"),
+        "tool_count": (len(tools) if tools is not None else None),
+    }
+
+
+def _mcp_error_response(exc: MCPError, *, default_status: int = 400) -> web.Response:
+    return web.json_response(
+        {
+            "error": exc.what,
+            "code": exc.code.value,
+            "what": exc.what,
+            "why": exc.why,
+            "fix": exc.fix,
+        },
+        status=default_status,
+    )
+
+
+async def handle_list_servers(request: web.Request) -> web.Response:
+    """GET /api/mcp/servers — list every server the queen actually uses.
+
+    Merges two sources:
+
+    - ``MCPRegistry.list_installed()`` — servers registered via
+      ``hive mcp add`` / the ``/api/mcp/servers`` POST route, stored in
+      ``~/.hive/mcp_registry/installed.json``. These carry
+      ``source: "local"`` (user-added) or ``source: "registry"``
+      (installed from the remote registry).
+    - Repo-baked queen servers from
+      ``core/framework/agents/queen/mcp_servers.json``. These are loaded
+      directly by the queen's ``ToolRegistry`` at boot and never touch
+      ``MCPRegistry``; we surface them here so the UI reflects what the
+      queen really talks to. They are not removable from the UI because
+      editing them requires changing the repo.
+
+    If a name collides between the two sources, the registry entry wins
+    because that's the one the user has customized.
+    """
+    reg = _registry()
+    registry_entries = [_server_to_summary(e) for e in reg.list_installed()]
+    seen_names = {e.get("name") for e in registry_entries}
+
+    package_entries = [e for e in _package_builtin_servers() if e.get("name") not in seen_names]
+
+    servers = [*package_entries, *registry_entries]
+    return web.json_response({"servers": servers})
+
+
+async def handle_add_server(request: web.Request) -> web.Response:
+    """POST /api/mcp/servers — register a local MCP server.
+
+    Body mirrors ``MCPRegistry.add_local`` args:
+
+    ::
+
+        {
+          "name": "my-tool",
+          "transport": "stdio" | "http" | "sse" | "unix",
+          "command": "...", "args": [...], "env": {...}, "cwd": "...",
+          "url": "...", "headers": {...},
+          "socket_path": "...",
+          "description": "..."
+        }
+    """
+    try:
+        body = await request.json()
+    except Exception:
+        return web.json_response({"error": "Invalid JSON body"}, status=400)
+    if not isinstance(body, dict):
+        return web.json_response({"error": "Body must be a JSON object"}, status=400)
+
+    name = body.get("name")
+    transport = body.get("transport")
+    if not isinstance(name, str) or not name.strip():
+        return web.json_response({"error": "'name' is required"}, status=400)
+    if transport not in _VALID_TRANSPORTS:
+        return web.json_response(
+            {"error": f"'transport' must be one of {sorted(_VALID_TRANSPORTS)}"},
+            status=400,
+        )
+
+    reg = _registry()
+    try:
+        entry = reg.add_local(
+            name=name.strip(),
+            transport=transport,
+            command=body.get("command"),
+            args=body.get("args"),
+            env=body.get("env"),
+            cwd=body.get("cwd"),
+            url=body.get("url"),
+            headers=body.get("headers"),
+            socket_path=body.get("socket_path"),
+            description=body.get("description", ""),
+        )
+    except MCPError as exc:
+        status = 409 if "already exists" in exc.what else 400
+        return _mcp_error_response(exc, default_status=status)
+    except Exception as exc:
+        logger.exception("MCP add_local failed for %r", name)
+        return web.json_response({"error": str(exc)}, status=500)
+
+    summary = _server_to_summary({"name": name.strip(), **entry})
+    return web.json_response(
+        {
+            "server": summary,
+            "hint": "Start a new queen session to use this server's tools.",
+        },
+        status=201,
+    )
+
+
+async def handle_remove_server(request: web.Request) -> web.Response:
+    """DELETE /api/mcp/servers/{name} — remove a local server."""
+    name = request.match_info["name"]
+    reg = _registry()
+
+    existing = reg.get_server(name)
+    if existing is None:
+        return web.json_response({"error": f"Server '{name}' not installed"}, status=404)
+    if existing.get("source") != "local":
+        return web.json_response(
+            {
+                "error": f"Server '{name}' is not a locally registered server; it cannot be removed from the UI.",
+            },
+            status=400,
+        )
+
+    try:
+        reg.remove(name)
+    except MCPError as exc:
+        return _mcp_error_response(exc, default_status=404)
+    return web.json_response({"removed": name})
+
+
+async def handle_set_enabled(request: web.Request, *, enabled: bool) -> web.Response:
+    name = request.match_info["name"]
+    reg = _registry()
+    try:
+        if enabled:
+            reg.enable(name)
+        else:
+            reg.disable(name)
+    except MCPError as exc:
+        return _mcp_error_response(exc, default_status=404)
+    return web.json_response({"name": name, "enabled": enabled})
+
+
+async def handle_enable(request: web.Request) -> web.Response:
+    """POST /api/mcp/servers/{name}/enable."""
+    return await handle_set_enabled(request, enabled=True)
+
+
+async def handle_disable(request: web.Request) -> web.Response:
+    """POST /api/mcp/servers/{name}/disable."""
+    return await handle_set_enabled(request, enabled=False)
+
+
+async def handle_health(request: web.Request) -> web.Response:
+    """POST /api/mcp/servers/{name}/health — probe one server."""
+    name = request.match_info["name"]
+    reg = _registry()
+    try:
+        # MCPRegistry.health_check blocks on subprocess IO — run it off
+        # the event loop so the HTTP worker stays responsive.
+        import asyncio
+
+        result = await asyncio.to_thread(reg.health_check, name)
+    except MCPError as exc:
+        return _mcp_error_response(exc, default_status=404)
+    except Exception as exc:
+        logger.exception("MCP health_check failed for %r", name)
+        return web.json_response({"error": str(exc)}, status=500)
+    return web.json_response(result)
+
+
+def register_routes(app: web.Application) -> None:
+    """Register MCP server CRUD routes."""
+    app.router.add_get("/api/mcp/servers", handle_list_servers)
+    app.router.add_post("/api/mcp/servers", handle_add_server)
+    app.router.add_delete("/api/mcp/servers/{name}", handle_remove_server)
+    app.router.add_post("/api/mcp/servers/{name}/enable", handle_enable)
+    app.router.add_post("/api/mcp/servers/{name}/disable", handle_disable)
+    app.router.add_post("/api/mcp/servers/{name}/health", handle_health)
@@ -0,0 +1,87 @@
+"""Custom user prompts — CRUD for user-uploaded prompts.
+
+- GET    /api/prompts        — list all custom prompts
+- POST   /api/prompts        — add a new custom prompt
+- DELETE /api/prompts/{id}   — delete a custom prompt
+"""
+
+import json
+import logging
+import time
+
+from aiohttp import web
+
+from framework.config import HIVE_HOME
+
+logger = logging.getLogger(__name__)
+
+CUSTOM_PROMPTS_FILE = HIVE_HOME / "custom_prompts.json"
+
+
+def _load_custom_prompts() -> list[dict]:
+    if not CUSTOM_PROMPTS_FILE.exists():
+        return []
+    try:
+        data = json.loads(CUSTOM_PROMPTS_FILE.read_text(encoding="utf-8"))
+        return data if isinstance(data, list) else []
+    except Exception:
+        return []
+
+
+def _save_custom_prompts(prompts: list[dict]) -> None:
+    CUSTOM_PROMPTS_FILE.parent.mkdir(parents=True, exist_ok=True)
+    CUSTOM_PROMPTS_FILE.write_text(
+        json.dumps(prompts, indent=2, ensure_ascii=False) + "\n",
+        encoding="utf-8",
+    )
+
+
+async def handle_list_prompts(request: web.Request) -> web.Response:
+    """GET /api/prompts — list all custom prompts."""
+    return web.json_response({"prompts": _load_custom_prompts()})
+
+
+async def handle_create_prompt(request: web.Request) -> web.Response:
+    """POST /api/prompts — add a new custom prompt."""
+    try:
+        body = await request.json()
+    except Exception:
+        return web.json_response({"error": "Invalid JSON body"}, status=400)
+
+    title = (body.get("title") or "").strip()
+    category = (body.get("category") or "").strip()
+    content = (body.get("content") or "").strip()
+
+    if not title or not content:
+        return web.json_response({"error": "Title and content are required"}, status=400)
+
+    prompts = _load_custom_prompts()
+    new_prompt = {
+        "id": f"custom_{int(time.time() * 1000)}",
+        "title": title,
+        "category": category or "custom",
+        "content": content,
+        "custom": True,
+    }
+    prompts.append(new_prompt)
+    _save_custom_prompts(prompts)
+    logger.info("Custom prompt added: %s", title)
+    return web.json_response(new_prompt, status=201)
+
+
+async def handle_delete_prompt(request: web.Request) -> web.Response:
+    """DELETE /api/prompts/{prompt_id} — delete a custom prompt."""
+    prompt_id = request.match_info["prompt_id"]
+    prompts = _load_custom_prompts()
+    before = len(prompts)
+    prompts = [p for p in prompts if p.get("id") != prompt_id]
+    if len(prompts) == before:
+        return web.json_response({"error": "Prompt not found"}, status=404)
+    _save_custom_prompts(prompts)
+    return web.json_response({"deleted": prompt_id})
+
+
+def register_routes(app: web.Application) -> None:
+    app.router.add_get("/api/prompts", handle_list_prompts)
+    app.router.add_post("/api/prompts", handle_create_prompt)
+    app.router.add_delete("/api/prompts/{prompt_id}", handle_delete_prompt)
@@ -0,0 +1,506 @@
+"""Per-queen MCP tool allowlist routes.
+
+- GET   /api/queen/{queen_id}/tools  -- enumerate the queen's tool surface
+- PATCH /api/queen/{queen_id}/tools  -- set or clear the MCP tool allowlist
+
+Lifecycle and synthetic tools (``ask_user``) are always part of the queen's
+surface in INDEPENDENT mode and are returned with ``editable: false``. MCP
+tools are grouped by origin server and carry per-tool ``enabled`` flags.
+
+The allowlist is persisted in a dedicated ``tools.json`` sidecar at
+``~/.hive/agents/queens/{queen_id}/tools.json``:
+
+- ``null`` / missing file -> "allow every MCP tool" (default)
+- ``[]``                  -> explicitly disable every MCP tool
+- ``["foo", "bar"]``      -> only these MCP tools pass through to the LLM
+
+Filtering happens in ``QueenPhaseState.rebuild_independent_filter`` so the
+LLM prompt cache stays warm between saves.
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from aiohttp import web
+
+from framework.agents.queen.queen_profiles import (
+    ensure_default_queens,
+    load_queen_profile,
+)
+from framework.agents.queen.queen_tools_config import (
+    delete_queen_tools_config,
+    load_queen_tools_config,
+    tools_config_exists,
+    update_queen_tools_config,
+)
+
+logger = logging.getLogger(__name__)
+
+
+_SYNTHETIC_NAMES = {"ask_user"}
+
+
+async def _ensure_manager_catalog(manager: Any) -> dict[str, list[dict[str, Any]]]:
+    """Return the cached MCP tool catalog, building it on first call.
+
+    ``queen_orchestrator.create_queen`` populates ``_mcp_tool_catalog`` on
+    every queen boot. On a fresh backend process the user may open the
+    Tool Library before any queen session has started, so the catalog is
+    empty. In that case we build one from the shared MCP config; the
+    first call pays an MCP-subprocess-spawn cost, subsequent calls are
+    cache hits. The build runs off the event loop via asyncio.to_thread
+    so the HTTP worker stays responsive while MCP servers initialize.
+    """
+    if manager is None:
+        return {}
+    catalog = getattr(manager, "_mcp_tool_catalog", None)
+    if isinstance(catalog, dict) and catalog:
+        return catalog
+    try:
+        import asyncio
+
+        from framework.server.queen_orchestrator import build_queen_tool_registry_bare
+
+        registry, built = await asyncio.to_thread(build_queen_tool_registry_bare)
+        manager._mcp_tool_catalog = built  # type: ignore[attr-defined]
+        manager._bootstrap_tool_registry = registry  # type: ignore[attr-defined]
+        return built
+    except Exception:
+        logger.warning("Tool catalog bootstrap failed", exc_info=True)
+        return {}
+
+
+def _lifecycle_entries_without_session(
+    manager: Any,
+    mcp_names: set[str],
+) -> list[dict[str, Any]]:
+    """Derive lifecycle tool names from the registry even without a session.
+
+    We register queen lifecycle tools against a temporary registry using a
+    minimal stub, then subtract the MCP-origin set and the synthetic set.
+    The result matches what the queen sees at runtime (minus context-
+    specific variants).
+    """
+    registry = getattr(manager, "_bootstrap_tool_registry", None)
+    # If the bootstrap registry exists but doesn't carry lifecycle tools
+    # yet, register them now.
+    if registry is not None and not getattr(registry, "_lifecycle_bootstrap_done", False):
+        try:
+            from types import SimpleNamespace
+
+            from framework.tools.queen_lifecycle_tools import register_queen_lifecycle_tools
+
+            stub_session = SimpleNamespace(
+                id="tool-library-bootstrap",
+                colony_runtime=None,
+                event_bus=None,
+                worker_path=None,
+                phase_state=None,
+                llm=None,
+            )
+            register_queen_lifecycle_tools(
+                registry,
+                session=stub_session,
+                session_id=stub_session.id,
+                session_manager=None,
+                manager_session_id=stub_session.id,
+                phase_state=None,
+            )
+            registry._lifecycle_bootstrap_done = True  # type: ignore[attr-defined]
+        except Exception:
+            logger.debug("lifecycle bootstrap failed", exc_info=True)
+
+    if registry is None:
+        return []
+
+    out: list[dict[str, Any]] = []
+    for name, tool in sorted(registry.get_tools().items()):
+        if name in mcp_names or name in _SYNTHETIC_NAMES:
+            continue
+        out.append(
+            {
+                "name": tool.name,
+                "description": tool.description,
+                "editable": False,
+            }
+        )
+    return out
+
+
+def _synthetic_entries() -> list[dict[str, Any]]:
+    """Return display metadata for synthetic tools injected by the agent loop.
+
+    Kept behind a lazy import so test harnesses that don't wire the agent
+    loop can still hit this route without blowing up.
+    """
+    try:
+        from framework.agent_loop.internals.synthetic_tools import build_ask_user_tool
+
+        tool = build_ask_user_tool()
+        return [
+            {
+                "name": tool.name,
+                "description": tool.description,
+                "editable": False,
+            }
+        ]
+    except Exception:
+        return [
+            {
+                "name": "ask_user",
+                "description": "Pause and ask the user a structured question.",
+                "editable": False,
+            }
+        ]
+
+
+def _live_queen_session(manager: Any, queen_id: str) -> Any:
+    """Return any live DM session owned by this queen, or ``None``."""
+    sessions = getattr(manager, "_sessions", None) or {}
+    for session in sessions.values():
+        if getattr(session, "queen_name", None) != queen_id:
+            continue
+        # Only DM (non-colony) sessions qualify; colony sessions are skipped.
+        if getattr(session, "colony_runtime", None) is None:
+            return session
+    return None
+
+
+def _render_mcp_servers(
+    *,
+    mcp_tool_names_by_server: dict[str, list[dict[str, Any]]],
+    enabled_mcp_tools: list[str] | None,
+) -> list[dict[str, Any]]:
+    """Shape the mcp_tool_catalog entries for the API response."""
+    allowed: set[str] | None = None if enabled_mcp_tools is None else set(enabled_mcp_tools)
+    servers: list[dict[str, Any]] = []
+    for server_name in sorted(mcp_tool_names_by_server):
+        entries = mcp_tool_names_by_server[server_name]
+        tools = []
+        for entry in entries:
+            name = entry.get("name")
+            enabled = True if allowed is None else name in allowed
+            tools.append(
+                {
+                    "name": name,
+                    "description": entry.get("description", ""),
+                    "input_schema": entry.get("input_schema", {}),
+                    "enabled": enabled,
+                }
+            )
+        servers.append({"name": server_name, "tools": tools})
+    return servers
+
+
+def _catalog_from_live_session(session: Any) -> dict[str, list[dict[str, Any]]]:
+    """Rebuild a per-server tool catalog from a live queen session.
+
+    The session's registry is authoritative — this reflects any hot-added
+    MCP servers since the manager-level snapshot was cached.
+    """
+    registry = getattr(session, "_queen_tool_registry", None)
+    if registry is None:
+        # session._queen_tools_by_name is a stash from create_queen, but no
+        # registry is attached here, so reconstruct the catalog from the
+        # phase state instead; origin servers are unknown in this path.
+        phase_state = getattr(session, "phase_state", None)
+        if phase_state is None:
+            return {}
+        mcp_names = getattr(phase_state, "mcp_tool_names_all", set()) or set()
+        independent_tools = getattr(phase_state, "independent_tools", []) or []
+        result: dict[str, list[dict[str, Any]]] = {"(unknown)": []}
+        for tool in independent_tools:
+            if tool.name not in mcp_names:
+                continue
+            result["(unknown)"].append(
+                {
+                    "name": tool.name,
+                    "description": tool.description,
+                    "input_schema": tool.parameters,
+                }
+            )
+        return result if result["(unknown)"] else {}
+
+    server_map = getattr(registry, "_mcp_server_tools", {}) or {}
+    tools_by_name = {t.name: t for t in registry.get_tools().values()}
+    catalog: dict[str, list[dict[str, Any]]] = {}
+    for server_name, tool_names in server_map.items():
+        entries: list[dict[str, Any]] = []
+        for name in sorted(tool_names):
+            tool = tools_by_name.get(name)
+            if tool is None:
+                continue
+            entries.append(
+                {
+                    "name": tool.name,
+                    "description": tool.description,
+                    "input_schema": tool.parameters,
+                }
+            )
+        catalog[server_name] = entries
+    return catalog
+
+
+def _lifecycle_entries(
+    *,
+    session: Any,
+    mcp_tool_names_all: set[str],
+) -> list[dict[str, Any]]:
+    """Lifecycle tools = independent_tools minus MCP-origin minus synthetic.
+
+    We compute this from a live session when available so the list exactly
+    matches what the queen actually sees on her next turn.
+    """
+    if session is None:
+        return []
+    phase_state = getattr(session, "phase_state", None)
+    if phase_state is None:
+        return []
+    result: list[dict[str, Any]] = []
+    for tool in getattr(phase_state, "independent_tools", []) or []:
+        if tool.name in mcp_tool_names_all:
+            continue
+        if tool.name in _SYNTHETIC_NAMES:
+            continue
+        result.append(
+            {
+                "name": tool.name,
+                "description": tool.description,
+                "editable": False,
+            }
+        )
+    return sorted(result, key=lambda x: x["name"])
+
+
+async def handle_get_tools(request: web.Request) -> web.Response:
+    """GET /api/queen/{queen_id}/tools — enumerate tool surface for the UI."""
+    queen_id = request.match_info["queen_id"]
+    ensure_default_queens()
+    try:
+        load_queen_profile(queen_id)
+    except FileNotFoundError:
+        return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
+
+    manager = request.app.get("manager")
+    session = _live_queen_session(manager, queen_id) if manager is not None else None
+
+    # Prefer a live session's registry for freshness. Otherwise use (or
+    # build on demand) the manager-level catalog so the Tool Library works
+    # even before any queen has been started in this process.
+    if session is not None:
+        catalog = _catalog_from_live_session(session)
+    else:
+        catalog = await _ensure_manager_catalog(manager)
+    stale = not catalog
+
+    mcp_tool_names_all: set[str] = set()
+    for entries in catalog.values():
+        for entry in entries:
+            if entry.get("name"):
+                mcp_tool_names_all.add(entry["name"])
+
+    if session is not None:
+        lifecycle = _lifecycle_entries(
+            session=session,
+            mcp_tool_names_all=mcp_tool_names_all,
+        )
+    else:
+        lifecycle = _lifecycle_entries_without_session(manager, mcp_tool_names_all)
+
+    # Allowlist lives in the dedicated tools.json sidecar; the helper
+    # migrates the legacy profile.yaml field on first read and falls
+    # back to the role-based default when no sidecar exists.
+    enabled_mcp_tools = load_queen_tools_config(queen_id, mcp_catalog=catalog)
+    is_role_default = not tools_config_exists(queen_id)
+
+    response = {
+        "queen_id": queen_id,
+        "enabled_mcp_tools": enabled_mcp_tools,
+        "is_role_default": is_role_default,
+        "stale": stale,
+        "lifecycle": lifecycle,
+        "synthetic": _synthetic_entries(),
+        "mcp_servers": _render_mcp_servers(
+            mcp_tool_names_by_server=catalog,
+            enabled_mcp_tools=enabled_mcp_tools,
+        ),
+    }
+    return web.json_response(response)
+
+
+async def handle_patch_tools(request: web.Request) -> web.Response:
+    """PATCH /api/queen/{queen_id}/tools — persist the MCP tool allowlist.
+
+    Body: ``{"enabled_mcp_tools": null | string[]}``.
+
+    - ``null`` resets to "allow every MCP tool" (default).
+    - A list is validated against the known MCP catalog; unknown names
+      are rejected with 400 so the frontend catches typos.
+    """
+    queen_id = request.match_info["queen_id"]
+    try:
+        body = await request.json()
+    except Exception:
+        return web.json_response({"error": "Invalid JSON body"}, status=400)
+    if not isinstance(body, dict) or "enabled_mcp_tools" not in body:
+        return web.json_response(
+            {"error": "Body must be an object with an 'enabled_mcp_tools' field"},
+            status=400,
+        )
+
+    enabled = body["enabled_mcp_tools"]
+    if enabled is not None:
+        if not isinstance(enabled, list) or not all(isinstance(x, str) for x in enabled):
+            return web.json_response(
+                {"error": "'enabled_mcp_tools' must be null or a list of strings"},
+                status=400,
+            )
+
+    ensure_default_queens()
+    try:
+        load_queen_profile(queen_id)
+    except FileNotFoundError:
+        return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
+
+    # Validate names against the known MCP tool catalog. We prefer a live
+    # session's registry for the most up-to-date set, then fall back to
+    # the manager-level snapshot (building it on demand if absent).
+    manager = request.app.get("manager")
+    session = _live_queen_session(manager, queen_id) if manager is not None else None
+    if session is not None:
+        catalog = _catalog_from_live_session(session)
+    else:
+        catalog = await _ensure_manager_catalog(manager)
+    known_names: set[str] = set()
+    for entries in catalog.values():
+        for entry in entries:
+            if entry.get("name"):
+                known_names.add(entry["name"])
+
+    if enabled is not None and known_names:
+        unknown = sorted(set(enabled) - known_names)
+        if unknown:
+            return web.json_response(
+                {"error": "Unknown MCP tool name(s)", "unknown": unknown},
+                status=400,
+            )
+
+    # Persist — tools.json sidecar, not profile.yaml.
+    try:
+        update_queen_tools_config(queen_id, enabled)
+    except FileNotFoundError:
+        return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
+
+    # Hot-reload every live DM session for this queen. The filter memo is
+    # rebuilt so the very next turn sees the new allowlist without a
+    # session restart, and the prompt cache is invalidated exactly once.
+    refreshed = 0
+    sessions = getattr(manager, "_sessions", None) or {}
+    for sess in sessions.values():
+        if getattr(sess, "queen_name", None) != queen_id:
+            continue
+        phase_state = getattr(sess, "phase_state", None)
+        if phase_state is None:
+            continue
+        phase_state.enabled_mcp_tools = enabled
+        rebuild = getattr(phase_state, "rebuild_independent_filter", None)
+        if callable(rebuild):
+            try:
+                rebuild()
+                refreshed += 1
+            except Exception:
+                logger.debug(
+                    "Queen tools: rebuild_independent_filter failed for session %s",
+                    getattr(sess, "id", "?"),
+                    exc_info=True,
+                )
+
+    logger.info(
+        "Queen tools: queen_id=%s allowlist=%s refreshed_sessions=%d",
+        queen_id,
+        "null" if enabled is None else f"{len(enabled)} tool(s)",
+        refreshed,
+    )
+    return web.json_response(
+        {
+            "queen_id": queen_id,
+            "enabled_mcp_tools": enabled,
+            "refreshed_sessions": refreshed,
+        }
+    )
+
+
+async def handle_delete_tools(request: web.Request) -> web.Response:
+    """DELETE /api/queen/{queen_id}/tools — drop the sidecar, fall back to role defaults.
+
+    Users click "Reset to role default" in the Tool Library. That
+    removes ``tools.json`` so the queen's effective allowlist becomes
+    the role-based default (or allow-all if the queen has no role
+    entry). Live sessions are refreshed so the next turn reflects the
+    change without a restart.
+    """
+    queen_id = request.match_info["queen_id"]
+    ensure_default_queens()
+    try:
+        load_queen_profile(queen_id)
+    except FileNotFoundError:
+        return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
+
+    removed = delete_queen_tools_config(queen_id)
+
+    # Recompute the queen's effective allowlist from the role defaults
+    # so we can hot-reload live sessions in one pass (same shape as
+    # PATCH).
+    manager = request.app.get("manager")
+    session = _live_queen_session(manager, queen_id) if manager is not None else None
+    if session is not None:
+        catalog = _catalog_from_live_session(session)
+    else:
+        catalog = await _ensure_manager_catalog(manager)
+    new_enabled = load_queen_tools_config(queen_id, mcp_catalog=catalog)
+
+    refreshed = 0
+    sessions = getattr(manager, "_sessions", None) or {}
+    for sess in sessions.values():
+        if getattr(sess, "queen_name", None) != queen_id:
+            continue
+        phase_state = getattr(sess, "phase_state", None)
+        if phase_state is None:
+            continue
+        phase_state.enabled_mcp_tools = new_enabled
+        rebuild = getattr(phase_state, "rebuild_independent_filter", None)
+        if callable(rebuild):
+            try:
+                rebuild()
+                refreshed += 1
+            except Exception:
+                logger.debug(
+                    "Queen tools: rebuild_independent_filter failed for session %s",
+                    getattr(sess, "id", "?"),
+                    exc_info=True,
+                )
+
+    logger.info(
+        "Queen tools: queen_id=%s reset-to-default removed=%s refreshed_sessions=%d",
+        queen_id,
+        removed,
+        refreshed,
+    )
+    return web.json_response(
+        {
+            "queen_id": queen_id,
+            "removed": removed,
+            "enabled_mcp_tools": new_enabled,
+            "is_role_default": True,
+            "refreshed_sessions": refreshed,
+        }
+    )
+
+
+def register_routes(app: web.Application) -> None:
+    """Register queen-tools routes."""
+    app.router.add_get("/api/queen/{queen_id}/tools", handle_get_tools)
+    app.router.add_patch("/api/queen/{queen_id}/tools", handle_patch_tools)
+    app.router.add_delete("/api/queen/{queen_id}/tools", handle_delete_tools)
@@ -3,6 +3,8 @@
 - GET    /api/queen/profiles                -- list all queen profiles (id, name, title)
 - GET    /api/queen/{queen_id}/profile      -- get full queen profile
 - PATCH  /api/queen/{queen_id}/profile      -- update queen profile fields
+- POST   /api/queen/{queen_id}/avatar       -- upload queen avatar image
+- GET    /api/queen/{queen_id}/avatar       -- serve queen avatar image
 - POST   /api/queen/{queen_id}/session      -- get or create a persistent session for a queen
 - POST   /api/queen/{queen_id}/session/select -- resume a specific session for a queen
 - POST   /api/queen/{queen_id}/session/new  -- create a fresh session for a queen
@@ -25,17 +27,6 @@ from framework.config import QUEENS_DIR
 logger = logging.getLogger(__name__)


-async def _stop_live_sessions(manager, keep_session_id: str | None = None) -> None:
-    """Stop live sessions so only the selected queen session remains active."""
-    for session in list(manager.list_sessions()):
-        if keep_session_id and session.id == keep_session_id:
-            continue
-        try:
-            await manager.stop_session(session.id)
-        except Exception:
-            logger.debug("Failed to stop session %s during queen switch", session.id)
-
-
 def _read_queen_session_meta(queen_id: str, session_id: str) -> dict[str, Any]:
    """Return persisted metadata for a queen session when available."""
    session_dir = QUEENS_DIR / queen_id / "sessions" / session_id
@@ -177,6 +168,34 @@ async def handle_get_profile(request: web.Request) -> web.Response:
    return web.json_response({"id": queen_id, **api_profile})


+def _reverse_transform_for_yaml(body: dict) -> dict:
+    """Map API-format fields back to YAML profile fields.
+
+    The API exposes a simplified view (summary, skills, signature_achievement)
+    that maps onto the underlying YAML structure (core_traits, hidden_background,
+    psychological_profile, world_lore, etc.).
+    """
+    yaml_updates: dict[str, Any] = {}
+
+    if "name" in body:
+        yaml_updates["name"] = body["name"]
+    if "title" in body:
+        yaml_updates["title"] = body["title"]
+
+    if "summary" in body:
+        # Summary is displayed as core_traits + anti_stereotype joined by \n\n.
+        # Store the full text in core_traits for simplicity.
+        yaml_updates["core_traits"] = body["summary"]
+
+    if "skills" in body:
+        yaml_updates["skills"] = body["skills"]
+
+    if "signature_achievement" in body:
+        yaml_updates.setdefault("world_lore", {})["habitat"] = body["signature_achievement"]
+
+    return yaml_updates
+
+
 async def handle_update_profile(request: web.Request) -> web.Response:
    """PATCH /api/queen/{queen_id}/profile — update queen profile fields."""
    queen_id = request.match_info["queen_id"]
@@ -186,11 +205,18 @@ async def handle_update_profile(request: web.Request) -> web.Response:
        return web.json_response({"error": "Invalid JSON body"}, status=400)
    if not isinstance(body, dict):
        return web.json_response({"error": "Body must be a JSON object"}, status=400)
+
+    yaml_updates = _reverse_transform_for_yaml(body)
+    if not yaml_updates:
+        return web.json_response({"error": "No valid fields to update"}, status=400)
+
    try:
-        updated = update_queen_profile(queen_id, body)
+        updated = update_queen_profile(queen_id, yaml_updates)
    except FileNotFoundError:
        return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
-    return web.json_response({"id": queen_id, **updated})
+
+    api_profile = _transform_profile_for_api(updated)
+    return web.json_response({"id": queen_id, **api_profile})


 async def handle_queen_session(request: web.Request) -> web.Response:
@@ -218,20 +244,26 @@ async def handle_queen_session(request: web.Request) -> web.Response:
    initial_prompt = body.get("initial_prompt")
    initial_phase = body.get("initial_phase")

-    # 1. Check for an existing live session bound to this queen.
-    for session in manager.list_sessions():
-        if session.queen_name == queen_id:
-            return web.json_response(
-                {
-                    "session_id": session.id,
-                    "queen_id": queen_id,
-                    "status": "live",
-                }
-            )
-
-    # Stop any live sessions bound to a different queen so only one queen
-    # is active at a time.
-    await _stop_live_sessions(manager)
+    # 1. Check for an existing live DM session bound to this queen.
+    # Skip colony sessions: a colony forked from this queen also carries
+    # queen_name == queen_id, but it has a worker loaded (colony_id /
+    # worker_path set) and is the colony's chat, not the queen's DM.
+    # When multiple DM sessions for this queen are live at once (e.g. the
+    # user created a new session, then navigated away and back), return
+    # the most recently loaded one so we don't resurrect a stale older
+    # session ahead of a freshly created one.
+    live_matches = [
+        s for s in manager.list_sessions() if s.queen_name == queen_id and s.colony_id is None and s.worker_path is None
+    ]
+    if live_matches:
+        latest = max(live_matches, key=lambda s: s.loaded_at)
+        return web.json_response(
+            {
+                "session_id": latest.id,
+                "queen_id": queen_id,
+                "status": "live",
+            }
+        )

    # 2. Find the most recent cold session for this queen and resume it.
    # IMPORTANT: skip sessions that don't belong in the queen DM:
@@ -323,7 +355,6 @@ async def handle_select_queen_session(request: web.Request) -> web.Response:

    live_session = manager.get_session(target_session_id)
    if live_session is not None:
-        await _stop_live_sessions(manager, keep_session_id=target_session_id)
        return web.json_response(
            {
                "session_id": live_session.id,
@@ -332,11 +363,11 @@ async def handle_select_queen_session(request: web.Request) -> web.Response:
            }
        )

-    await _stop_live_sessions(manager)
-
    meta = _read_queen_session_meta(queen_id, target_session_id)
    agent_path = meta.get("agent_path")
-    initial_phase = None if agent_path else "independent"
+    # Colony resume (agent loaded) → "working" (3-phase target).
+    # Standalone queen resume → "independent" (DM mode).
+    initial_phase = "working" if agent_path else "independent"
    session = await _create_bound_queen_session(
        manager,
        queen_id,
@@ -354,6 +385,8 @@ async def handle_select_queen_session(request: web.Request) -> web.Response:

 async def handle_new_queen_session(request: web.Request) -> web.Response:
    """POST /api/queen/{queen_id}/session/new -- create a fresh queen session."""
+    from framework.tools.queen_lifecycle_tools import QUEEN_PHASES
+
    queen_id = request.match_info["queen_id"]
    manager = request.app["manager"]

@@ -363,11 +396,26 @@ async def handle_new_queen_session(request: web.Request) -> web.Response:
    except FileNotFoundError:
        return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)

-    body = await request.json() if request.can_read_body else {}
+    if request.can_read_body:
+        try:
+            body = await request.json()
+        except json.JSONDecodeError:
+            return web.json_response({"error": "Invalid JSON body"}, status=400)
+        if not isinstance(body, dict):
+            return web.json_response({"error": "Request body must be a JSON object"}, status=400)
+    else:
+        body = {}
    initial_prompt = body.get("initial_prompt")
    initial_phase = body.get("initial_phase") or "independent"
+    if initial_phase not in QUEEN_PHASES:
+        return web.json_response(
+            {
+                "error": f"Invalid initial_phase '{initial_phase}'",
+                "valid": sorted(QUEEN_PHASES),
+            },
+            status=400,
+        )

-    await _stop_live_sessions(manager)
    session = await manager.create_session(
        initial_prompt=initial_prompt,
        queen_name=queen_id,
@@ -382,11 +430,98 @@ async def handle_new_queen_session(request: web.Request) -> web.Response:
    )


+MAX_AVATAR_BYTES = 2 * 1024 * 1024  # 2 MB max after compression
+_ALLOWED_AVATAR_TYPES = {
+    "image/jpeg": ".jpg",
+    "image/png": ".png",
+    "image/webp": ".webp",
+}
+
+
+async def handle_upload_avatar(request: web.Request) -> web.Response:
+    """POST /api/queen/{queen_id}/avatar — upload queen avatar image.
+
+    Accepts multipart/form-data with a single file field named 'avatar'.
+    Stores as avatar.{ext} in the queen's profile directory.
+    """
+    from framework.config import QUEENS_DIR
+
+    queen_id = request.match_info["queen_id"]
+    queen_dir = QUEENS_DIR / queen_id
+    if not (queen_dir / "profile.yaml").exists():
+        return web.json_response({"error": f"Queen '{queen_id}' not found"}, status=404)
+
+    reader = await request.multipart()
+    field = await reader.next()
+    if field is None or field.name != "avatar":
+        return web.json_response({"error": "Expected a file field named 'avatar'"}, status=400)
+
+    content_type = field.headers.get("Content-Type", "application/octet-stream")
+    # Prefer the field's own content_type attribute when it exposes one.
+    if hasattr(field, "content_type"):
+        content_type = field.content_type or content_type
+
+    ext = _ALLOWED_AVATAR_TYPES.get(content_type)
+    if not ext:
+        return web.json_response(
+            {"error": f"Unsupported image type: {content_type}. Use JPEG, PNG, or WebP."},
+            status=400,
+        )
+
+    # Read the file data with size limit
+    data = bytearray()
+    while True:
+        chunk = await field.read_chunk(8192)
+        if not chunk:
+            break
+        data.extend(chunk)
+        if len(data) > MAX_AVATAR_BYTES:
+            return web.json_response(
+                {"error": f"Image too large. Maximum size is {MAX_AVATAR_BYTES // 1024 // 1024} MB."},
+                status=400,
+            )
+
+    if not data:
+        return web.json_response({"error": "Empty file"}, status=400)
+
+    # Remove any existing avatar files
+    for existing in queen_dir.glob("avatar.*"):
+        existing.unlink(missing_ok=True)
+
+    # Write the new avatar
+    avatar_path = queen_dir / f"avatar{ext}"
+    avatar_path.write_bytes(data)
+
+    logger.info("Avatar uploaded for queen %s: %s (%d bytes)", queen_id, avatar_path.name, len(data))
+    return web.json_response({"avatar_url": f"/api/queen/{queen_id}/avatar"})
+
+
+async def handle_get_avatar(request: web.Request) -> web.Response:
+    """GET /api/queen/{queen_id}/avatar — serve queen avatar image."""
+    from framework.config import QUEENS_DIR
+
+    queen_id = request.match_info["queen_id"]
+    queen_dir = QUEENS_DIR / queen_id
+
+    # Find avatar file with any supported extension
+    for ext in _ALLOWED_AVATAR_TYPES.values():
+        avatar_path = queen_dir / f"avatar{ext}"
+        if avatar_path.exists():
+            return web.FileResponse(
+                avatar_path,
+                headers={"Cache-Control": "public, max-age=3600"},
+            )
+
+    return web.json_response({"error": "No avatar found"}, status=404)
+
+
 def register_routes(app: web.Application) -> None:
    """Register queen profile routes."""
    app.router.add_get("/api/queen/profiles", handle_list_profiles)
    app.router.add_get("/api/queen/{queen_id}/profile", handle_get_profile)
    app.router.add_patch("/api/queen/{queen_id}/profile", handle_update_profile)
+    app.router.add_post("/api/queen/{queen_id}/avatar", handle_upload_avatar)
+    app.router.add_get("/api/queen/{queen_id}/avatar", handle_get_avatar)
    app.router.add_post("/api/queen/{queen_id}/session", handle_queen_session)
    app.router.add_post("/api/queen/{queen_id}/session/select", handle_select_queen_session)
    app.router.add_post("/api/queen/{queen_id}/session/new", handle_new_queen_session)
@@ -10,6 +10,7 @@ Session-primary routes:
 - GET    /api/sessions/{session_id}/stats             — runtime statistics
 - GET    /api/sessions/{session_id}/entry-points      — list entry points
 - PATCH  /api/sessions/{session_id}/triggers/{id}    — update trigger task
+- POST   /api/sessions/{session_id}/triggers/{id}/run — fire trigger once (manual)
 - GET    /api/sessions/{session_id}/colonies          — list colony IDs
 - GET    /api/sessions/{session_id}/events/history   — persisted eventbus log (for replay)

@@ -63,6 +64,8 @@ def _session_to_live_dict(session) -> dict:
        "queen_supports_images": supports_image_tool_results(queen_model) if queen_model else True,
        "queen_id": getattr(phase_state, "queen_id", None) if phase_state else None,
        "queen_name": (phase_state.queen_profile or {}).get("name") if phase_state else None,
+        "colony_spawned": getattr(session, "colony_spawned", False),
+        "spawned_colony_name": getattr(session, "spawned_colony_name", None),
    }


@@ -119,8 +122,19 @@ async def handle_create_session(request: web.Request) -> web.Response:
    (equivalent to the old POST /api/agents). Otherwise creates a queen-only
    session that can later have a colony loaded via POST /sessions/{id}/colony.
    """
+    from framework.agents.queen.queen_profiles import ensure_default_queens, load_queen_profile
+    from framework.tools.queen_lifecycle_tools import QUEEN_PHASES
+
    manager = _get_manager(request)
-    body = await request.json() if request.can_read_body else {}
+    if request.can_read_body:
+        try:
+            body = await request.json()
+        except json.JSONDecodeError:
+            return web.json_response({"error": "Invalid JSON body"}, status=400)
+        if not isinstance(body, dict):
+            return web.json_response({"error": "Request body must be a JSON object"}, status=400)
+    else:
+        body = {}
    agent_path = body.get("agent_path")
    agent_id = body.get("agent_id")
    session_id = body.get("session_id")
@@ -131,6 +145,21 @@ async def handle_create_session(request: web.Request) -> web.Response:
    initial_phase = body.get("initial_phase")
    worker_name = body.get("worker_name")

+    if initial_phase is not None and initial_phase not in QUEEN_PHASES:
+        return web.json_response(
+            {
+                "error": f"Invalid initial_phase '{initial_phase}'",
+                "valid": sorted(QUEEN_PHASES),
+            },
+            status=400,
+        )
+    if queen_name:
+        ensure_default_queens()
+        try:
+            load_queen_profile(queen_name)
+        except FileNotFoundError:
+            return web.json_response({"error": f"Queen '{queen_name}' not found"}, status=404)
+
    if agent_path:
        try:
            agent_path = str(validate_agent_path(agent_path))
@@ -157,6 +186,7 @@ async def handle_create_session(request: web.Request) -> web.Response:
                model=model,
                initial_prompt=initial_prompt,
                queen_resume_from=queen_resume_from,
+                queen_name=queen_name,
                initial_phase=initial_phase,
            )
    except ValueError as e:
@@ -245,7 +275,14 @@ async def handle_get_live_session(request: web.Request) -> web.Response:
            }
            mono = getattr(session, "trigger_next_fire", {}).get(t.id)
            if mono is not None:
-                entry["next_fire_in"] = max(0.0, mono - time.monotonic())
+                remaining = max(0.0, mono - time.monotonic())
+                entry["next_fire_in"] = remaining
+                entry["next_fire_at"] = int((time.time() + remaining) * 1000)
+            stats = getattr(session, "trigger_fire_stats", {}).get(t.id)
+            if stats:
+                entry["fire_count"] = stats.get("fire_count", 0)
+                if stats.get("last_fired_at") is not None:
+                    entry["last_fired_at"] = stats["last_fired_at"]
            data["entry_points"].append(entry)
        data["colonies"] = session.colony_runtime.list_graphs()

@@ -395,7 +432,14 @@ async def handle_session_entry_points(request: web.Request) -> web.Response:
        }
        mono = getattr(session, "trigger_next_fire", {}).get(t.id)
        if mono is not None:
-            entry["next_fire_in"] = max(0.0, mono - time.monotonic())
+            remaining = max(0.0, mono - time.monotonic())
+            entry["next_fire_in"] = remaining
+            entry["next_fire_at"] = int((time.time() + remaining) * 1000)
+        stats = getattr(session, "trigger_fire_stats", {}).get(t.id)
+        if stats:
+            entry["fire_count"] = stats.get("fire_count", 0)
+            if stats.get("last_fired_at") is not None:
+                entry["last_fired_at"] = stats["last_fired_at"]
        entry_points.append(entry)
    return web.json_response({"entry_points": entry_points})

@@ -546,6 +590,60 @@ async def handle_update_trigger_task(request: web.Request) -> web.Response:
    )


+async def handle_run_trigger(request: web.Request) -> web.Response:
+    """POST /api/sessions/{session_id}/triggers/{trigger_id}/run — fire the trigger once.
+
+    Manual invocation for testing. Works whether the trigger is active or
+    inactive; does not change active state and does not reset the scheduled
+    next-fire time of an active timer.
+    """
+    session, err = resolve_session(request)
+    if err:
+        return err
+
+    trigger_id = request.match_info["trigger_id"]
+    tdef = getattr(session, "available_triggers", {}).get(trigger_id)
+    if tdef is None:
+        return web.json_response(
+            {"error": f"Trigger '{trigger_id}' not found"},
+            status=404,
+        )
+
+    if getattr(session, "colony_runtime", None) is None:
+        return web.json_response({"error": "Colony not loaded"}, status=409)
+
+    executor = getattr(session, "queen_executor", None)
+    queen_node = getattr(executor, "node_registry", {}).get("queen") if executor else None
+    if queen_node is None:
+        return web.json_response({"error": "Queen not ready"}, status=409)
+
+    from framework.agent_loop.agent_loop import TriggerEvent
+
+    try:
+        await queen_node.inject_trigger(
+            TriggerEvent(
+                trigger_type=tdef.trigger_type,
+                source_id=trigger_id,
+                payload={
+                    "task": tdef.task or "",
+                    "trigger_config": tdef.trigger_config,
+                    "forced": True,
+                },
+            )
+        )
+    except Exception as exc:  # noqa: BLE001
+        return web.json_response(
+            {"error": f"Failed to fire trigger: {exc}"},
+            status=500,
+        )
+
+    from framework.tools.queen_lifecycle_tools import _emit_trigger_fired
+
+    await _emit_trigger_fired(session, trigger_id, tdef.trigger_type)
+
+    return web.json_response({"status": "fired", "trigger_id": trigger_id})
+
+
 async def handle_activate_trigger(request: web.Request) -> web.Response:
    """POST /api/sessions/{session_id}/triggers/{trigger_id}/activate — start a trigger."""
    session, err = resolve_session(request)
@@ -597,6 +695,17 @@ async def handle_activate_trigger(request: web.Request) -> web.Response:

        runner = getattr(session, "runner", None)
        colony_entry = runner.graph.entry_node if runner else None
+        config_out = dict(tdef.trigger_config)
+        mono = getattr(session, "trigger_next_fire", {}).get(trigger_id)
+        if mono is not None:
+            remaining = max(0.0, mono - time.monotonic())
+            config_out["next_fire_in"] = remaining
+            config_out["next_fire_at"] = int((time.time() + remaining) * 1000)
+        stats = getattr(session, "trigger_fire_stats", {}).get(trigger_id)
+        if stats:
+            config_out["fire_count"] = stats.get("fire_count", 0)
+            if stats.get("last_fired_at") is not None:
+                config_out["last_fired_at"] = stats["last_fired_at"]
        await bus.publish(
            AgentEvent(
                type=EventType.TRIGGER_ACTIVATED,
@@ -604,7 +713,7 @@ async def handle_activate_trigger(request: web.Request) -> web.Response:
                data={
                    "trigger_id": trigger_id,
                    "trigger_type": tdef.trigger_type,
-                    "trigger_config": tdef.trigger_config,
+                    "trigger_config": config_out,
                    "name": tdef.description or trigger_id,
                    **({"entry_node": colony_entry} if colony_entry else {}),
                },
@@ -686,6 +795,10 @@ async def handle_session_colonies(request: web.Request) -> web.Response:
    return web.json_response({"colonies": colonies})


+_EVENTS_HISTORY_DEFAULT_LIMIT = 2000
+_EVENTS_HISTORY_MAX_LIMIT = 10000
+
+
 async def handle_session_events_history(request: web.Request) -> web.Response:
    """GET /api/sessions/{session_id}/events/history — persisted eventbus log.

@@ -693,17 +806,58 @@ async def handle_session_events_history(request: web.Request) -> web.Response:
    both live sessions and cold (post-server-restart) sessions.  The frontend
    replays these events through ``sseEventToChatMessage`` to fully reconstruct
    the UI state on resume.
+
+    Query params:
+        limit: maximum number of events to return (default 2000, max 10000).
+            The TAIL of the file is returned — i.e. the most recent N events.
+            Older events are dropped and ``truncated`` is set to True.
+
+    Response shape::
+
+        {
+            "events": [...],          # up to ``limit`` events, oldest-first
+            "session_id": "...",
+            "total": 12345,           # total events in the file
+            "returned": 2000,         # len(events)
+            "truncated": true,        # total > returned
+            "limit": 2000,            # the effective limit used
+        }
+
+    ``events.jsonl`` is append-only chronological, so "last N lines" == "most
+    recent N events". Long-running colonies have produced files with 50k+
+    events; before this cap, restoring on page-mount shipped the whole thing
+    down the wire and blocked the UI for seconds.
    """
    session_id = request.match_info["session_id"]

+    try:
+        limit = int(request.query.get("limit", str(_EVENTS_HISTORY_DEFAULT_LIMIT)))
+    except ValueError:
+        limit = _EVENTS_HISTORY_DEFAULT_LIMIT
+    limit = max(1, min(limit, _EVENTS_HISTORY_MAX_LIMIT))
+
    from framework.server.session_manager import _find_queen_session_dir

    queen_dir = _find_queen_session_dir(session_id)
    events_path = queen_dir / "events.jsonl"
    if not events_path.exists():
-        return web.json_response({"events": [], "session_id": session_id})
+        return web.json_response(
+            {
+                "events": [],
+                "session_id": session_id,
+                "total": 0,
+                "returned": 0,
+                "truncated": False,
+                "limit": limit,
+            }
+        )

-    events: list[dict] = []
+    # Tail the file using a bounded deque — O(limit) memory regardless
+    # of file size. No need to materialize the whole list only to slice it.
+    from collections import deque
+
+    tail: deque[dict] = deque(maxlen=limit)
+    total = 0
    try:
        with open(events_path, encoding="utf-8") as f:
            for line in f:
@@ -711,13 +865,34 @@ async def handle_session_events_history(request: web.Request) -> web.Response:
                if not line:
                    continue
                try:
-                    events.append(json.loads(line))
+                    evt = json.loads(line)
                except json.JSONDecodeError:
                    continue
+                total += 1
+                tail.append(evt)
    except OSError:
-        return web.json_response({"events": [], "session_id": session_id})
+        return web.json_response(
+            {
+                "events": [],
+                "session_id": session_id,
+                "total": 0,
+                "returned": 0,
+                "truncated": False,
+                "limit": limit,
+            }
+        )

-    return web.json_response({"events": events, "session_id": session_id})
+    events = list(tail)
+    return web.json_response(
+        {
+            "events": events,
+            "session_id": session_id,
+            "total": total,
+            "returned": len(events),
+            "truncated": total > len(events),
+            "limit": limit,
+        }
+    )


 async def handle_session_history(request: web.Request) -> web.Response:
@@ -800,6 +975,8 @@ async def handle_discover(request: web.Request) -> web.Response:
                "tool_count": entry.tool_count,
                "tags": entry.tags,
                "last_active": entry.last_active,
+                "created_at": entry.created_at,
+                "icon": entry.icon,
                "is_loaded": str(entry.path.resolve()) in loaded_paths,
                "workers": [w.to_dict() for w in entry.workers],
            }
@@ -880,6 +1057,40 @@ async def handle_reveal_session_folder(request: web.Request) -> web.Response:
    return web.json_response({"path": str(folder)})


+async def handle_update_colony_metadata(request: web.Request) -> web.Response:
+    """PATCH /api/agents/metadata — update colony metadata (e.g. icon).
+
+    Body: {"agent_path": "...", "icon": "rocket"}
+    """
+    try:
+        body = await request.json()
+    except Exception:
+        return web.json_response({"error": "Invalid JSON body"}, status=400)
+
+    agent_path = body.get("agent_path")
+    if not agent_path:
+        return web.json_response({"error": "agent_path is required"}, status=400)
+
+    try:
+        resolved = validate_agent_path(agent_path)
+    except ValueError as exc:
+        return web.json_response({"error": str(exc)}, status=400)
+
+    metadata_path = resolved / "metadata.json"
+    metadata: dict = {}
+    if metadata_path.exists():
+        try:
+            metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
+        except Exception:
+            pass  # unreadable or corrupt metadata.json: fall back to an empty dict
+
+    if "icon" in body:
+        metadata["icon"] = body["icon"]
+
+    metadata_path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
+    return web.json_response({"ok": True})
+
+
 # ------------------------------------------------------------------
 # Route registration
 # ------------------------------------------------------------------
@@ -890,6 +1101,7 @@ def register_routes(app: web.Application) -> None:
    # Discovery & agent management
    app.router.add_get("/api/discover", handle_discover)
    app.router.add_delete("/api/agents", handle_delete_agent)
+    app.router.add_patch("/api/agents/metadata", handle_update_colony_metadata)

    # Session lifecycle
    app.router.add_post("/api/sessions", handle_create_session)
@@ -917,6 +1129,10 @@ def register_routes(app: web.Application) -> None:
        "/api/sessions/{session_id}/triggers/{trigger_id}/deactivate",
        handle_deactivate_trigger,
    )
+    app.router.add_post(
+        "/api/sessions/{session_id}/triggers/{trigger_id}/run",
+        handle_run_trigger,
+    )
    app.router.add_get("/api/sessions/{session_id}/colonies", handle_session_colonies)

    app.router.add_get("/api/sessions/{session_id}/events/history", handle_session_events_history)
@@ -236,6 +236,217 @@ async def handle_node_tools(request: web.Request) -> web.Response:
    return web.json_response({"tools": tools_out})


+# ---------------------------------------------------------------------------
+# Live worker control — list / stop a specific worker / stop all
+# ---------------------------------------------------------------------------
+
+
+def _active_colony(session):
+    """Return the session's unified ColonyRuntime (``session.colony``) if present.
+
+    All spawned workers (queen-overseer + run_parallel_workers fan-outs)
+    are hosted here. ``session.colony_runtime`` is a different concept
+    (loaded agent graph) and doesn't hold the live worker registry we
+    need to enumerate / stop.
+    """
+    return getattr(session, "colony", None)
+
+
+def _build_live_workers_payload(colony) -> list[dict]:
+    """Serialize the colony's current worker registry.
+
+    Extracted so both the one-shot ``GET /workers`` handler and the SSE
+    ``/workers/stream`` handler render the exact same shape.
+    """
+    if colony is None:
+        return []
+
+    now = time.monotonic()
+    payload: list[dict] = []
+    try:
+        workers = list(colony._workers.values())  # type: ignore[attr-defined]
+    except Exception:
+        workers = []
+
+    for w in workers:
+        started_at = getattr(w, "_started_at", 0.0) or 0.0
+        duration = (now - started_at) if started_at else 0.0
+        result = getattr(w, "_result", None)
+        payload.append(
+            {
+                "worker_id": w.id,
+                "task": (w.task or "")[:400],
+                "status": str(getattr(w, "status", "unknown")),
+                "is_active": bool(getattr(w, "is_active", False)),
+                "duration_seconds": round(duration, 1),
+                "explicit_report": getattr(w, "_explicit_report", None),
+                "result_status": (result.status if result else None),
+                "result_summary": (result.summary if result else None),
+            }
+        )
+
+    # Active workers first, then terminated; within each group, longest elapsed time (earliest-started) first.
+    payload.sort(key=lambda r: (not r["is_active"], -(r["duration_seconds"] or 0)))
+    return payload
+
+
+def _payload_change_signature(payload: list[dict]) -> tuple:
+    """Cheap fingerprint for change detection on the SSE stream.
+
+    We intentionally exclude ``duration_seconds`` — it ticks every call
+    and would make every poll look like a change, defeating the "only
+    emit on change" optimisation. Everything else (status, result,
+    explicit_report) actually reflects worker state transitions.
+    """
+    return tuple(
+        (
+            w["worker_id"],
+            w["status"],
+            w["is_active"],
+            w["result_status"],
+            w["result_summary"],
+            bool(w["explicit_report"]),
+        )
+        for w in payload
+    )
+
+
+async def handle_live_workers_stream(request: web.Request) -> web.StreamResponse:
+    """GET /api/sessions/{session_id}/workers/stream — SSE feed.
+
+    Emits a ``snapshot`` event immediately, then re-emits every time
+    the worker registry changes (status transitions, new spawns, new
+    reports). Polls the runtime every 2s internally because the colony's
+    ``_workers`` dict is not observable otherwise. A client disconnect
+    bubbles up as ConnectionResetError from ``resp.write``.
+    """
+    session, err = resolve_session(request)
+    if err:
+        return err
+
+    import asyncio
+
+    resp = web.StreamResponse(
+        status=200,
+        headers={
+            "Content-Type": "text/event-stream",
+            "Cache-Control": "no-cache, no-transform",
+            "Connection": "keep-alive",
+            "X-Accel-Buffering": "no",
+        },
+    )
+    await resp.prepare(request)
+
+    async def _send(event: str, data) -> None:
+        payload = f"event: {event}\ndata: {json.dumps(data)}\n\n"
+        await resp.write(payload.encode("utf-8"))
+
+    last_signature: tuple | None = None
+    try:
+        while True:
+            colony = _active_colony(session)
+            workers = _build_live_workers_payload(colony)
+            signature = _payload_change_signature(workers)
+            if signature != last_signature:
+                await _send("snapshot", {"workers": workers})
+                last_signature = signature
+            await asyncio.sleep(2.0)
+    except (asyncio.CancelledError, ConnectionResetError):
+        raise
+    except Exception as exc:
+        logger.warning("live workers stream error: %s", exc, exc_info=True)
+    return resp
+
+
+async def handle_stop_live_worker(request: web.Request) -> web.Response:
+    """POST /api/sessions/{session_id}/workers/{worker_id}/stop — force-stop one worker.
+
+    Calls ``colony.stop_worker(worker_id)`` which cancels the worker's
+    background task. The worker's terminal SUBAGENT_REPORT still fires
+    (preserving any _explicit_report) so the queen sees a `[WORKER_REPORT]`
+    with ``status="stopped"``.
+    """
+    session, err = resolve_session(request)
+    if err:
+        return err
+
+    worker_id = request.match_info.get("worker_id", "")
+    if not worker_id:
+        return web.json_response({"error": "worker_id required"}, status=400)
+
+    colony = _active_colony(session)
+    if colony is None:
+        return web.json_response({"error": "No active colony on this session"}, status=503)
+
+    worker = colony._workers.get(worker_id)  # type: ignore[attr-defined]
+    if worker is None:
+        return web.json_response({"error": f"Worker '{worker_id}' not found"}, status=404)
+    if not worker.is_active:
+        return web.json_response(
+            {
+                "stopped": False,
+                "reason": "Worker already terminated",
+                "worker_id": worker_id,
+                "status": str(worker.status),
+            }
+        )
+
+    try:
+        await colony.stop_worker(worker_id)
+    except Exception as exc:
+        logger.exception("stop_worker failed for %s", worker_id)
+        return web.json_response(
+            {"stopped": False, "error": str(exc), "worker_id": worker_id},
+            status=500,
+        )
+
+    return web.json_response({"stopped": True, "worker_id": worker_id})
+
+
+async def handle_stop_all_live_workers(request: web.Request) -> web.Response:
+    """POST /api/sessions/{session_id}/workers/stop-all — force-stop every active worker.
+
+    The persistent overseer (if any) is skipped — it is the queen itself
+    and stopping it would end the session. Only ephemeral fan-out workers
+    are targeted.
+    """
+    session, err = resolve_session(request)
+    if err:
+        return err
+
+    colony = _active_colony(session)
+    if colony is None:
+        return web.json_response({"stopped": [], "error": "No active colony on this session"})
+
+    stopped: list[str] = []
+    errors: list[dict] = []
+    try:
+        workers = list(colony._workers.values())  # type: ignore[attr-defined]
+    except Exception:
+        workers = []
+
+    for w in workers:
+        if not w.is_active:
+            continue
+        if getattr(w, "_persistent", False):
+            # The overseer — don't kill the queen.
+            continue
+        try:
+            await colony.stop_worker(w.id)
+            stopped.append(w.id)
+        except Exception as exc:
+            logger.warning("stop-all: failed to stop %s: %s", w.id, exc)
+            errors.append({"worker_id": w.id, "error": str(exc)})
+
+    return web.json_response(
+        {
+            "stopped": stopped,
+            "stopped_count": len(stopped),
+            "errors": errors if errors else None,
+        }
+    )
+
+
 def register_routes(app: web.Application) -> None:
    """Register worker inspection routes."""
    app.router.add_get("/api/sessions/{session_id}/colonies/{colony_id}/nodes", handle_list_nodes)
@@ -248,3 +459,18 @@ def register_routes(app: web.Application) -> None:
        "/api/sessions/{session_id}/colonies/{colony_id}/nodes/{node_id}/tools",
        handle_node_tools,
    )
+    # Live worker control. The GET /workers list endpoint lives in
+    # routes_colony_workers.py — it reads from session.colony (the
+    # unified ColonyRuntime where run_parallel_workers-spawned workers
+    # actually live) and returns the WorkerSummary shape the frontend
+    # types against. Registering a duplicate here shadowed it in
+    # aiohttp's router and broke the Sessions tab.
+    app.router.add_get("/api/sessions/{session_id}/workers/stream", handle_live_workers_stream)
+    app.router.add_post(
+        "/api/sessions/{session_id}/workers/stop-all",
+        handle_stop_all_live_workers,
+    )
+    app.router.add_post(
+        "/api/sessions/{session_id}/workers/{worker_id}/stop",
+        handle_stop_live_worker,
+    )
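Review note: the "emit only on change" behaviour of the workers SSE stream above can be sketched without aiohttp. This is a simplified illustration, assuming a reduced signature of three fields; `change_signature`, `snapshots_to_emit`, and `sse_frame` are hypothetical names, not functions from this PR.

```python
import json


def change_signature(payload: list[dict]) -> tuple:
    """Fingerprint that deliberately ignores the ever-changing duration field."""
    return tuple((w["worker_id"], w["status"], w["is_active"]) for w in payload)


def snapshots_to_emit(polls: list[list[dict]]) -> list[list[dict]]:
    """Return only the polled payloads whose signature differs from the previous one."""
    last = None
    emitted = []
    for payload in polls:
        sig = change_signature(payload)
        if sig != last:
            emitted.append(payload)
            last = sig
    return emitted


def sse_frame(event: str, data) -> bytes:
    """Encode one server-sent event: an event line, a data line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n".encode("utf-8")


# Illustrative polls: the second differs only in duration, the third changes status.
polls = [
    [{"worker_id": "w1", "status": "running", "is_active": True, "duration_seconds": 1.0}],
    [{"worker_id": "w1", "status": "running", "is_active": True, "duration_seconds": 3.0}],
    [{"worker_id": "w1", "status": "stopped", "is_active": False, "duration_seconds": 5.0}],
]
emitted = snapshots_to_emit(polls)
frame = sse_frame("snapshot", {"workers": emitted[0]})
```

One consequence of excluding `duration_seconds` from the signature is that clients see a fresh duration only when some other field changes, so a UI that shows elapsed time would need to tick locally between snapshots.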
@@ -93,6 +93,9 @@ class Session:
    worker_configured: bool = False
    # Monotonic timestamps for next trigger fire (mirrors AgentRuntime._timer_next_fire)
    trigger_next_fire: dict[str, float] = field(default_factory=dict)
+    # Per-trigger fire stats (session lifetime): {trigger_id: {"fire_count": int, "last_fired_at": epoch_ms}}.
+    # Reset on process restart — good enough as a "since this session started" counter.
+    trigger_fire_stats: dict[str, dict[str, Any]] = field(default_factory=dict)
    # Session directory resumption:
    # When set, _start_queen writes queen conversations to this existing session's
    # directory instead of creating a new one.  This lets cold-restores accumulate
@@ -111,6 +114,12 @@ class Session:
    # tool unlocked. The mode is the canonical discriminator for storage
    # path, tool exposure, and SSE filtering — see the Phase 2 plan.
    mode: Literal["dm", "colony"] = "dm"
+    # Set to True after the user clicks the COLONY_CREATED system message
+    # in this DM. Locks the chat input — the user must compact+fork into a
+    # fresh session before continuing the conversation. Persisted in
+    # meta.json so the lock survives server restarts.
+    colony_spawned: bool = False
+    spawned_colony_name: str | None = None


 class SessionManager:
@@ -139,6 +148,20 @@ class SessionManager:
        except Exception:
            logger.warning("v2 migration failed (non-fatal)", exc_info=True)

+        # Ensure every existing colony has an up-to-date progress.db
+        # (schema v1, WAL mode) and reclaim any stale claims left behind
+        # by crashed workers from the previous run.  Idempotent and
+        # fast; runs synchronously because the event loop hasn't
+        # started yet at __init__ time.
+        from framework.host.progress_db import ensure_all_colony_dbs
+
+        try:
+            ensured = ensure_all_colony_dbs()
+            if ensured:
+                logger.info("progress_db: ensured %d colony DB(s) at startup", len(ensured))
+        except Exception:
+            logger.warning("progress_db: backfill at startup failed (non-fatal)", exc_info=True)
+
    def build_llm(self, model: str | None = None):
        """Construct an LLM provider using the server's configured defaults."""
        from framework.config import RuntimeConfig, get_hive_config
@@ -399,6 +422,27 @@ class SessionManager:
            if existing.worker_path and str(existing.worker_path) == str(agent_path):
                return existing

+        # When the queen forked this colony, the inherited DM transcript
+        # is compacted in the background (see fork_session_into_colony).
+        # Block here until that compactor finishes so _load_worker_core
+        # reads the compacted summary — not the raw transcript (which
+        # would defeat the fork's purpose). Bounded wait: on timeout we
+        # proceed anyway so a stuck compactor can't brick the colony.
+        if queen_resume_from:
+            try:
+                from framework.server import compaction_status
+
+                await compaction_status.await_completion(
+                    _find_queen_session_dir(queen_resume_from),
+                    timeout=180.0,
+                )
+            except Exception:
+                logger.debug(
+                    "compaction_status.await_completion failed for %s; proceeding",
+                    queen_resume_from,
+                    exc_info=True,
+                )
+
        session = await self._create_session_core(
            session_id=_colony_session_id or queen_resume_from,
            model=model,
@@ -671,8 +715,21 @@ class SessionManager:
            event_bus=session.event_bus,
        )

-        # Start the worker's agent loop in the background
-        session.queen_task = asyncio.create_task(session.queen_executor.run(initial_message=initial_prompt))
+        # Start the worker's agent loop in the background.
+        # Scope browser profile per-session so parallel sessions drive
+        # independent Chrome tab groups. Browser tools live in an MCP
+        # subprocess; we inject `profile` via the ToolRegistry execution
+        # context (a CONTEXT_PARAM) so it flows into every tool call.
+        async def _run_worker():
+            try:
+                from framework.loader.tool_registry import ToolRegistry
+
+                ToolRegistry.set_execution_context(profile=session.id)
+            except Exception:
+                logger.debug("Worker: failed to set browser profile", exc_info=True)
+            await session.queen_executor.run(initial_message=initial_prompt)
+
+        session.queen_task = asyncio.create_task(_run_worker())

        # Set up event persistence
        if session.event_bus and queen_dir:
@@ -1166,8 +1223,27 @@ class SessionManager:
                logger.info("Session '%s': shutdown reflection spawned", session_id)
                self._background_tasks.add(task)
                task.add_done_callback(self._background_tasks.discard)
-            except Exception:
-                logger.warning("Session '%s': failed to spawn shutdown reflection", session_id, exc_info=True)
+            except RuntimeError as exc:
+                # Most common when a session is stopped after the event loop
+                # has closed (e.g. during server shutdown or from an atexit
+                # handler). The reflection would have had nothing to write
+                # anyway — no new turns since the last periodic reflection.
+                logger.warning(
+                    "Session '%s': shutdown reflection skipped — event loop unavailable (%s). "
+                    "Normal during server shutdown; anything worth persisting was saved by the "
+                    "periodic reflection after the last turn.",
+                    session_id,
+                    exc,
+                )
+            except Exception as exc:
+                logger.warning(
+                    "Session '%s': failed to spawn shutdown reflection: %s: %s. "
+                    "Check that queen_dir exists and session.llm is configured; full traceback follows.",
+                    session_id,
+                    type(exc).__name__,
+                    exc,
+                    exc_info=True,
+                )

        if session.queen_task is not None:
            session.queen_task.cancel()
@@ -1292,6 +1368,13 @@ class SessionManager:
                _new_meta["agent_path"] = str(session.worker_path)
            _existing_meta.update(_new_meta)
            _meta_path.write_text(json.dumps(_existing_meta), encoding="utf-8")
+            # Hydrate colony-spawned lock state from meta.json so the lock
+            # survives server restart / cold-resume into a live session.
+            if _existing_meta.get("colony_spawned") is True:
+                session.colony_spawned = True
+                _spawned_name = _existing_meta.get("spawned_colony_name")
+                if isinstance(_spawned_name, str):
+                    session.spawned_colony_name = _spawned_name
        except OSError:
            pass

@@ -1370,34 +1453,24 @@ class SessionManager:
            )

        # Auto-load worker on cold restore — the queen's conversation expects
-        # the agent to be loaded, but the new session has no worker.
+        # the colony to be loaded, but the new session has no worker.
        if session.queen_resume_from and not session.colony_runtime:
            meta_path = queen_dir / "meta.json"
            if meta_path.exists():
                try:
                    _meta = json.loads(meta_path.read_text(encoding="utf-8"))
                    _agent_path = _meta.get("agent_path")
-                    _phase = _meta.get("phase")

                    if _agent_path and Path(_agent_path).exists():
-                        if _phase in ("staging", "running", None):
-                            # Agent fully built — load worker and resume
-                            await self.load_colony(session.id, _agent_path)
-                            if session.phase_state:
-                                await session.phase_state.switch_to_staging(source="auto")
-                            logger.info("Cold restore: auto-loaded worker from %s", _agent_path)
-                        elif _phase == "building":
-                            # Agent folder exists but incomplete — resume building
-                            if session.phase_state:
-                                session.phase_state.agent_path = _agent_path
-                                await session.phase_state.switch_to_building(source="auto")
-                            logger.info("Cold restore: resumed BUILDING phase for %s", _agent_path)
-                        elif _phase == "planning":
-                            if session.phase_state:
-                                session.phase_state.agent_path = _agent_path
-                            logger.info("Cold restore: PLANNING phase for %s", _agent_path)
+                        await self.load_colony(session.id, _agent_path)
+                        if session.phase_state:
+                            # Restored colony session lands in reviewing — the
+                            # queen summarises whatever the last run produced
+                            # before the user decides what to do next.
+                            await session.phase_state.switch_to_reviewing(source="auto")
+                        logger.info("Cold restore: auto-loaded colony from %s", _agent_path)
                except Exception:
-                    logger.warning("Cold restore: failed to auto-load worker", exc_info=True)
+                    logger.warning("Cold restore: failed to auto-load colony", exc_info=True)

    # ------------------------------------------------------------------
    # Phase 2: unified ColonyRuntime construction
@@ -1462,8 +1535,46 @@ class SessionManager:
            tool_executor=queen_tool_executor,
            event_bus=session.event_bus,
            colony_id=session.id,
+            # Wire the on-disk colony name and queen id so
+            # ColonyRuntime auto-derives its override paths. DM sessions
+            # have no colony_name (session.colony_name is None), which
+            # keeps them out of the per-colony JSON store.
+            colony_name=getattr(session, "colony_name", None),
+            queen_id=getattr(session, "queen_name", None) or None,
            pipeline_stages=[],  # queen pipeline runs in queen_orchestrator, not here
        )
+
+        # Per-colony tool allowlist, loaded from the colony's metadata.json
+        # when this session is attached to a real forked colony. For pure
+        # queen DM sessions (session.colony_name is None) we only capture
+        # the MCP-origin set — the allowlist stays ``None`` so every MCP
+        # tool passes through by default.
+        try:
+            mcp_tool_names_all: set[str] = set()
+            mgr_catalog = getattr(self, "_mcp_tool_catalog", None)
+            if isinstance(mgr_catalog, dict):
+                for entries in mgr_catalog.values():
+                    for entry in entries:
+                        name = entry.get("name") if isinstance(entry, dict) else None
+                        if name:
+                            mcp_tool_names_all.add(name)
+            enabled_mcp_tools: list[str] | None = None
+            colony_name = getattr(session, "colony_name", None)
+            if colony_name:
+                # Colony tool allowlist lives in a dedicated tools.json
+                # sidecar next to metadata.json. The helper migrates any
+                # legacy field out of metadata.json on first read.
+                from framework.host.colony_tools_config import load_colony_tools_config
+
+                enabled_mcp_tools = load_colony_tools_config(colony_name)
+            colony.set_tool_allowlist(enabled_mcp_tools, mcp_tool_names_all)
+        except Exception:
+            logger.debug(
+                "Colony allowlist bootstrap failed for session %s",
+                session.id,
+                exc_info=True,
+            )
+
        await colony.start()
        session.colony = colony

@@ -1556,8 +1667,28 @@ class SessionManager:
        # Resolve entry node for trigger target
        runner = getattr(session, "runner", None)
        colony_entry = runner.graph.entry_node if runner else None
+        fire_times = getattr(session, "trigger_next_fire", {})
+        fire_stats = getattr(session, "trigger_fire_stats", {})
+        now_mono = time.monotonic()
+        now_wall = time.time()

        for t in triggers.values():
+            # Merge ephemeral next-fire data + historical fire stats into
+            # trigger_config so the UI can render a live-ticking countdown
+            # and a "fired Nx · last 2m ago" badge. `next_fire_at` is epoch
+            # milliseconds (wall clock) — the frontend anchors its ticker
+            # on this. `next_fire_in` is kept for legacy consumers.
+            config_out = dict(t.trigger_config)
+            mono = fire_times.get(t.id)
+            if mono is not None:
+                remaining = max(0.0, mono - now_mono)
+                config_out["next_fire_in"] = remaining
+                config_out["next_fire_at"] = int((now_wall + remaining) * 1000)
+            stats = fire_stats.get(t.id)
+            if stats:
+                config_out["fire_count"] = stats.get("fire_count", 0)
+                if stats.get("last_fired_at") is not None:
+                    config_out["last_fired_at"] = stats["last_fired_at"]
            await session.event_bus.publish(
                AgentEvent(
                    type=event_type,
@@ -1565,7 +1696,7 @@ class SessionManager:
                    data={
                        "trigger_id": t.id,
                        "trigger_type": t.trigger_type,
-                        "trigger_config": t.trigger_config,
+                        "trigger_config": config_out,
                        "name": t.description or t.id,
                        **({"entry_node": colony_entry} if colony_entry else {}),
                    },
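Review note: the `next_fire_at` value added to `trigger_config` above converts a monotonic deadline into wall-clock epoch milliseconds. A minimal sketch of that arithmetic follows, with the clock readings passed in as arguments so it can be tested deterministically; `next_fire_at_ms` is a hypothetical name, not a function from this PR.

```python
import time


def next_fire_at_ms(deadline_mono: float, now_mono: float, now_wall: float) -> int:
    """Map a monotonic deadline onto the wall clock, in epoch milliseconds.

    Deadlines already in the past clamp to "now" rather than going negative.
    """
    remaining = max(0.0, deadline_mono - now_mono)
    return int((now_wall + remaining) * 1000)


# Fixed readings for illustration: deadline is 5 s ahead on the monotonic clock.
future = next_fire_at_ms(105.0, 100.0, 1_700_000_000.0)
# A deadline 10 s in the past clamps to the current wall-clock time.
past = next_fire_at_ms(90.0, 100.0, 1_700_000_000.0)

# Live usage would read both clocks at the same moment:
live = next_fire_at_ms(time.monotonic() + 30.0, time.monotonic(), time.time())
```

Reading both clocks once, outside the per-trigger loop, keeps every trigger in the same event batch anchored to the same instant, which is what the hunk above does with `now_mono` and `now_wall`.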
@@ -1633,6 +1764,42 @@ class SessionManager:
    def list_sessions(self) -> list[Session]:
        return list(self._sessions.values())

+    # ------------------------------------------------------------------
+    # Skill override helpers — used by routes_skills to find every live
+    # SkillsManager affected by a queen- or colony-scope mutation so a
+    # single HTTP call can reload them all.
+    # ------------------------------------------------------------------
+
+    def iter_queen_sessions(self, queen_id: str):
+        """Yield live sessions whose queen matches ``queen_id``."""
+        for s in self._sessions.values():
+            if getattr(s, "queen_name", None) == queen_id:
+                yield s
+
+    def iter_colony_runtimes(
+        self,
+        *,
+        queen_id: str | None = None,
+        colony_name: str | None = None,
+    ):
+        """Yield live ``ColonyRuntime`` instances matching the filters.
+
+        ``queen_id`` alone → every runtime whose ``queen_id`` matches
+        (useful when the user toggles a queen-scope skill — all her
+        colonies must reload).  ``colony_name`` alone → the single
+        runtime pinned to that colony.  Both → intersection. No filters
+        → every live runtime (used by global ``/api/skills`` reload).
+        """
+        for s in self._sessions.values():
+            colony = getattr(s, "colony", None)
+            if colony is None:
+                continue
+            if queen_id is not None and getattr(colony, "queen_id", None) != queen_id:
+                continue
+            if colony_name is not None and getattr(colony, "colony_name", None) != colony_name:
+                continue
+            yield colony
+
    # ------------------------------------------------------------------
    # Cold session helpers (disk-only, no live runtime required)
    # ------------------------------------------------------------------
@@ -1677,6 +1844,8 @@ class SessionManager:
        # Read extra metadata written at session start
        agent_name: str | None = None
        agent_path: str | None = None
+        colony_spawned: bool = False
+        spawned_colony_name: str | None = None
        meta_path = queen_dir / "meta.json"
        if meta_path.exists():
            try:
@@ -1684,6 +1853,10 @@ class SessionManager:
                agent_name = meta.get("agent_name")
                agent_path = meta.get("agent_path")
                created_at = meta.get("created_at") or created_at
+                colony_spawned = bool(meta.get("colony_spawned"))
+                _spawned = meta.get("spawned_colony_name")
+                if isinstance(_spawned, str):
+                    spawned_colony_name = _spawned
            except (json.JSONDecodeError, OSError):
                pass

@@ -1695,6 +1868,8 @@ class SessionManager:
            "created_at": created_at,
            "agent_name": agent_name,
            "agent_path": agent_path,
+            "colony_spawned": colony_spawned,
+            "spawned_colony_name": spawned_colony_name,
        }

    @staticmethod
@@ -14,6 +14,7 @@ from unittest.mock import AsyncMock, MagicMock
 import pytest
 from aiohttp.test_utils import TestClient, TestServer

+from framework.host.execution_manager import ExecutionAlreadyRunningError
 from framework.host.triggers import TriggerDefinition
 from framework.llm.model_catalog import get_models_catalogue
 from framework.server import (
@@ -89,8 +90,8 @@ class MockStream:
    _active_executors: dict = field(default_factory=dict)
    active_execution_ids: set = field(default_factory=set)

-    async def cancel_execution(self, execution_id: str, reason: str | None = None) -> bool:
-        return execution_id in self._execution_tasks
+    async def cancel_execution(self, execution_id: str, reason: str | None = None) -> str:
+        return "cancelled" if execution_id in self._execution_tasks else "not_found"


@dataclass
@@ -638,13 +639,17 @@ class TestQueenSessionSelection:
            )
            assert resp.status == 200
            data = await resp.json()
-
-        assert data == {
-            "session_id": "queen_live",
-            "queen_id": "queen_technology",
-            "status": "live",
-        }
-        assert any(call.args == ("other_live",) for call in manager.stop_session.await_args_list)
+            # Assert inside the async-with so app shutdown (which stops
+            # remaining sessions as cleanup) doesn't pollute the assertions.
+            assert data == {
+                "session_id": "queen_live",
+                "queen_id": "queen_technology",
+                "status": "live",
+            }
+            # Other queen's live session must be left running so multiple
+            # queens can stay active in parallel across navigation.
+            manager.stop_session.assert_not_awaited()
+            assert "other_live" in manager._sessions

    @pytest.mark.asyncio
    async def test_select_queen_session_restores_specific_history_session(self, monkeypatch, tmp_path):
@@ -745,18 +750,21 @@ class TestQueenSessionSelection:
            )
            assert resp.status == 200
            data = await resp.json()
-
-        assert data == {
-            "session_id": "fresh_thread",
-            "queen_id": "queen_technology",
-            "status": "created",
-        }
-        manager.stop_session.assert_awaited_once_with("old_live")
-        manager.create_session.assert_awaited_once_with(
-            initial_prompt=None,
-            queen_name="queen_technology",
-            initial_phase="independent",
-        )
+            # Assert inside the async-with so app shutdown (which stops
+            # remaining sessions as cleanup) doesn't pollute the assertions.
+            assert data == {
+                "session_id": "fresh_thread",
+                "queen_id": "queen_technology",
+                "status": "created",
+            }
+            # Other queen's live session must be left running.
+            manager.stop_session.assert_not_awaited()
+            assert "old_live" in manager._sessions
+            manager.create_session.assert_awaited_once_with(
+                initial_prompt=None,
+                queen_name="queen_technology",
+                initial_phase="independent",
+            )


 class TestExecution:
@@ -773,6 +781,21 @@ class TestExecution:
            data = await resp.json()
            assert data["execution_id"] == "exec_test_123"

+    @pytest.mark.asyncio
+    async def test_trigger_returns_409_when_execution_still_running(self):
+        session = _make_session()
+        session.colony_runtime.trigger = AsyncMock(side_effect=ExecutionAlreadyRunningError("default", ["session-1"]))
+        app = _make_app_with_session(session)
+        async with TestClient(TestServer(app)) as client:
+            resp = await client.post(
+                "/api/sessions/test_agent/trigger",
+                json={"entry_point_id": "default", "input_data": {"msg": "hi"}},
+            )
+            assert resp.status == 409
+            data = await resp.json()
+            assert data["stream_id"] == "default"
+            assert data["active_execution_ids"] == ["session-1"]
+
    @pytest.mark.asyncio
    async def test_trigger_not_found(self):
        app = create_app()
@@ -911,6 +934,7 @@ class TestExecution:
            data = await resp.json()
            assert data["stopped"] is False
            assert data["cancelled"] == []
+            assert data["cancelling"] == []
            assert data["timers_paused"] is True

    @pytest.mark.asyncio
@@ -1020,6 +1044,22 @@ class TestStop:
            assert resp.status == 200
            data = await resp.json()
            assert data["stopped"] is True
+            assert data["cancelling"] is False
+
+    @pytest.mark.asyncio
+    async def test_stop_returns_accepted_while_execution_is_still_cancelling(self):
+        session = _make_session()
+        session.colony_runtime._mock_streams["default"].cancel_execution = AsyncMock(return_value="cancelling")
+        app = _make_app_with_session(session)
+        async with TestClient(TestServer(app)) as client:
+            resp = await client.post(
+                "/api/sessions/test_agent/stop",
+                json={"execution_id": "exec_abc"},
+            )
+            assert resp.status == 202
+            data = await resp.json()
+            assert data["stopped"] is False
+            assert data["cancelling"] is True

    @pytest.mark.asyncio
    async def test_stop_not_found(self):
@@ -1476,6 +1516,65 @@ class TestCredentials:
            data = await resp.json()
            assert data["credentials"] == []

+    @pytest.mark.asyncio
+    async def test_list_credentials_skips_unreadable_encrypted_entry(self):
+        from pydantic import SecretStr
+
+        from framework.credentials.models import CredentialDecryptionError, CredentialKey, CredentialObject
+
+        class BrokenStore:
+            def list_credentials(self):
+                return ["good_cred", "bad_cred"]
+
+            def get_credential(self, credential_id, refresh_if_needed=False):
+                if credential_id == "bad_cred":
+                    raise CredentialDecryptionError("bad encrypted file")
+                return CredentialObject(
+                    id=credential_id,
+                    keys={"api_key": CredentialKey(name="api_key", value=SecretStr("secret"))},
+                )
+
+        app = create_app()
+        app["credential_store"] = BrokenStore()
+
+        async with TestClient(TestServer(app)) as client:
+            resp = await client.get("/api/credentials")
+            assert resp.status == 200
+            data = await resp.json()
+
+        assert [c["credential_id"] for c in data["credentials"]] == ["good_cred"]
+        assert data["unreadable_credentials"] == ["bad_cred"]
+        assert "secret" not in json.dumps(data)
+
+    @pytest.mark.asyncio
+    async def test_get_credential_unreadable_returns_recoverable_conflict(self):
+        from framework.credentials.models import CredentialDecryptionError
+
+        class BrokenStore:
+            def get_credential(self, credential_id, refresh_if_needed=False):
+                raise CredentialDecryptionError("bad encrypted file")
+
+        app = create_app()
+        app["credential_store"] = BrokenStore()
+
+        async with TestClient(TestServer(app)) as client:
+            resp = await client.get("/api/credentials/bad_cred")
+            data = await resp.json()
+
+        assert resp.status == 409
+        assert data["credential_id"] == "bad_cred"
+        assert data["recoverable"] is True
+
+    def test_specs_availability_treats_decryption_error_as_unavailable(self):
+        from framework.credentials.models import CredentialDecryptionError
+        from framework.server.routes_credentials import _is_available_for_specs
+
+        class BrokenStore:
+            def is_available(self, credential_id):
+                raise CredentialDecryptionError("bad encrypted file")
+
+        assert _is_available_for_specs(BrokenStore(), "exa_search") is False
+
    @pytest.mark.asyncio
    async def test_save_and_list_credential(self):
        app = self._make_app()
@@ -0,0 +1,300 @@
+"""Tests for the per-colony MCP tool allowlist filter + routes.
+
+Covers:
+1. ``ColonyRuntime`` filter semantics (default-allow, allowlist, empty,
+   lifecycle passes through).
+2. routes_colony_tools round trip (GET/PATCH, validation, 404).
+3. Colony index route for the Tool Library picker.
+
+Routes never touch the real ``~/.hive/colonies`` tree — we redirect
+``COLONIES_DIR`` into ``tmp_path`` via monkeypatch.
+"""
+
+from __future__ import annotations
+
+import json
+from dataclasses import dataclass, field
+from typing import Any
+
+import pytest
+from aiohttp import web
+from aiohttp.test_utils import TestClient, TestServer
+
+from framework.host.colony_runtime import ColonyRuntime
+from framework.llm.provider import Tool
+from framework.server import routes_colony_tools
+
+
+def _tool(name: str) -> Tool:
+    return Tool(name=name, description=f"desc of {name}", parameters={"type": "object"})
+
+
+# ---------------------------------------------------------------------------
+# ColonyRuntime filter unit tests
+# ---------------------------------------------------------------------------
+
+
+def _bare_runtime() -> ColonyRuntime:
+    rt = ColonyRuntime.__new__(ColonyRuntime)
+    rt._enabled_mcp_tools = None
+    rt._mcp_tool_names_all = set()
+    return rt
+
+
+class TestColonyFilter:
+    def test_default_is_noop(self):
+        rt = _bare_runtime()
+        tools = [_tool("mcp_a"), _tool("lc_b")]
+        assert rt._apply_tool_allowlist(tools) == tools
+
+    def test_allowlist_gates_mcp_only(self):
+        rt = _bare_runtime()
+        rt._mcp_tool_names_all = {"mcp_a", "mcp_b"}
+        rt._enabled_mcp_tools = ["mcp_a"]
+        tools = [_tool("mcp_a"), _tool("mcp_b"), _tool("lc_c")]
+        names = [t.name for t in rt._apply_tool_allowlist(tools)]
+        assert names == ["mcp_a", "lc_c"]
+
+    def test_empty_allowlist_keeps_lifecycle(self):
+        rt = _bare_runtime()
+        rt._mcp_tool_names_all = {"mcp_a", "mcp_b"}
+        rt._enabled_mcp_tools = []
+        tools = [_tool("mcp_a"), _tool("mcp_b"), _tool("lc_c")]
+        names = [t.name for t in rt._apply_tool_allowlist(tools)]
+        assert names == ["lc_c"]
+
+    def test_setter_mutates_live_state(self):
+        rt = _bare_runtime()
+        rt.set_tool_allowlist(["x"], {"x", "y"})
+        assert rt._enabled_mcp_tools == ["x"]
+        assert rt._mcp_tool_names_all == {"x", "y"}
+
+        # Passing None as the allowlist clears gating; omitting
+        # mcp_tool_names_all keeps the current set, so a later caller
+        # doesn't need to pass it again.
+        rt.set_tool_allowlist(None)
+        assert rt._enabled_mcp_tools is None
+        assert rt._mcp_tool_names_all == {"x", "y"}
+
+
+# ---------------------------------------------------------------------------
+# Route round-trip tests
+# ---------------------------------------------------------------------------
+
+
+@dataclass
+class _FakeSession:
+    colony_name: str
+    colony: Any = None
+    colony_runtime: Any = None
+    id: str = "sess-1"
+
+
+@dataclass
+class _FakeManager:
+    _sessions: dict = field(default_factory=dict)
+    _mcp_tool_catalog: dict = field(default_factory=dict)
+
+
+@pytest.fixture
+def colony_dir(tmp_path, monkeypatch):
+    """Point COLONIES_DIR into a tmp tree and seed a colony."""
+    colonies = tmp_path / "colonies"
+    colonies.mkdir()
+    monkeypatch.setattr("framework.host.colony_metadata.COLONIES_DIR", colonies)
+    monkeypatch.setattr("framework.host.colony_tools_config.COLONIES_DIR", colonies)
+
+    name = "my_colony"
+    cdir = colonies / name
+    cdir.mkdir()
+    (cdir / "metadata.json").write_text(
+        json.dumps(
+            {
+                "colony_name": name,
+                "queen_name": "queen_technology",
+                "created_at": "2026-04-20T00:00:00+00:00",
+            }
+        )
+    )
+    return colonies, name
+
+
+async def _app(manager: _FakeManager) -> web.Application:
+    app = web.Application()
+    app["manager"] = manager
+    routes_colony_tools.register_routes(app)
+    return app
+
+
+@pytest.mark.asyncio
+async def test_get_tools_default_allow(colony_dir):
+    _, name = colony_dir
+    manager = _FakeManager(
+        _mcp_tool_catalog={
+            "coder-tools": [
+                {"name": "read_file", "description": "read", "input_schema": {}},
+                {"name": "write_file", "description": "write", "input_schema": {}},
+            ],
+        }
+    )
+    app = await _app(manager)
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.get(f"/api/colony/{name}/tools")
+        assert resp.status == 200
+        body = await resp.json()
+    assert body["enabled_mcp_tools"] is None
+    assert body["stale"] is False
+    tools = {t["name"]: t for t in body["mcp_servers"][0]["tools"]}
+    assert all(t["enabled"] for t in tools.values())
+
+
+@pytest.mark.asyncio
+async def test_patch_persists_and_validates(colony_dir):
+    colonies_dir, name = colony_dir
+    manager = _FakeManager(
+        _mcp_tool_catalog={
+            "coder-tools": [
+                {"name": "read_file", "description": "", "input_schema": {}},
+                {"name": "write_file", "description": "", "input_schema": {}},
+            ]
+        }
+    )
+    app = await _app(manager)
+    tools_path = colonies_dir / name / "tools.json"
+    metadata_path = colonies_dir / name / "metadata.json"
+
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.patch(f"/api/colony/{name}/tools", json={"enabled_mcp_tools": ["read_file"]})
+        assert resp.status == 200
+        body = await resp.json()
+        assert body["enabled_mcp_tools"] == ["read_file"]
+
+        # Persisted to tools.json; metadata.json does not carry the field.
+        sidecar = json.loads(tools_path.read_text())
+        assert sidecar["enabled_mcp_tools"] == ["read_file"]
+        assert "updated_at" in sidecar
+        meta = json.loads(metadata_path.read_text())
+        assert "enabled_mcp_tools" not in meta
+
+        # GET reflects the allowlist
+        resp = await client.get(f"/api/colony/{name}/tools")
+        body = await resp.json()
+        tools = {t["name"]: t for t in body["mcp_servers"][0]["tools"]}
+        assert tools["read_file"]["enabled"] is True
+        assert tools["write_file"]["enabled"] is False
+
+        # Unknown tool name → 400
+        resp = await client.patch(f"/api/colony/{name}/tools", json={"enabled_mcp_tools": ["ghost"]})
+        assert resp.status == 400
+        assert "ghost" in (await resp.json()).get("unknown", [])
+
+
+@pytest.mark.asyncio
+async def test_patch_refreshes_live_runtime(colony_dir):
+    _, name = colony_dir
+
+    rt = _bare_runtime()
+    rt._mcp_tool_names_all = {"read_file", "write_file"}
+    rt.set_tool_allowlist(None)
+
+    session = _FakeSession(colony_name=name, colony=rt)
+    manager = _FakeManager(
+        _sessions={session.id: session},
+        _mcp_tool_catalog={
+            "coder-tools": [
+                {"name": "read_file", "description": "", "input_schema": {}},
+                {"name": "write_file", "description": "", "input_schema": {}},
+            ]
+        },
+    )
+
+    app = await _app(manager)
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.patch(f"/api/colony/{name}/tools", json={"enabled_mcp_tools": ["read_file"]})
+        assert resp.status == 200
+        body = await resp.json()
+        assert body["refreshed_runtimes"] == 1
+    assert rt._enabled_mcp_tools == ["read_file"]
+
+
+@pytest.mark.asyncio
+async def test_404_for_unknown_colony(colony_dir):
+    manager = _FakeManager()
+    app = await _app(manager)
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.get("/api/colony/unknown/tools")
+        assert resp.status == 404
+        resp = await client.patch("/api/colony/unknown/tools", json={"enabled_mcp_tools": None})
+        assert resp.status == 404
+
+
+@pytest.mark.asyncio
+async def test_tools_index_lists_colonies(colony_dir):
+    _, name = colony_dir
+    manager = _FakeManager()
+    app = await _app(manager)
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.get("/api/colonies/tools-index")
+        assert resp.status == 200
+        body = await resp.json()
+    entries = {c["name"]: c for c in body["colonies"]}
+    assert name in entries
+    assert entries[name]["queen_name"] == "queen_technology"
+    assert entries[name]["has_allowlist"] is False
+
+
+def test_queen_allowlist_inherits_into_new_colony(tmp_path, monkeypatch):
+    """A colony forked with a curated queen inherits her allowlist.
+
+    Exercises the inheritance hook in
+    ``routes_execution.fork_session_into_colony`` without running the
+    full fork machinery — we just call
+    ``update_colony_tools_config`` the same way the hook does and
+    assert the colony's ``tools.json`` matches the queen's live list.
+    """
+    colonies = tmp_path / "colonies"
+    colonies.mkdir()
+    monkeypatch.setattr("framework.host.colony_tools_config.COLONIES_DIR", colonies)
+
+    from framework.host.colony_tools_config import (
+        load_colony_tools_config,
+        update_colony_tools_config,
+    )
+
+    colony_name = "forked_child"
+    (colonies / colony_name).mkdir()
+
+    # Simulate: queen has a curated allowlist (e.g. role default resolved
+    # to a concrete list). The inheritance hook copies it verbatim.
+    queen_live_allowlist = ["read_file", "web_scrape", "csv_read"]
+    update_colony_tools_config(colony_name, list(queen_live_allowlist))
+
+    assert load_colony_tools_config(colony_name) == queen_live_allowlist
+
+
+def test_legacy_metadata_field_migrates_to_sidecar(colony_dir):
+    """A legacy enabled_mcp_tools field in metadata.json is hoisted to tools.json."""
+    colonies_dir, name = colony_dir
+    meta_path = colonies_dir / name / "metadata.json"
+    tools_path = colonies_dir / name / "tools.json"
+
+    # Seed legacy field in metadata.json.
+    meta = json.loads(meta_path.read_text())
+    meta["enabled_mcp_tools"] = ["read_file"]
+    meta_path.write_text(json.dumps(meta))
+
+    from framework.host.colony_tools_config import load_colony_tools_config
+
+    # First load migrates.
+    assert load_colony_tools_config(name) == ["read_file"]
+    assert tools_path.exists()
+    sidecar = json.loads(tools_path.read_text())
+    assert sidecar["enabled_mcp_tools"] == ["read_file"]
+
+    # metadata.json no longer contains the field; provenance fields preserved.
+    migrated = json.loads(meta_path.read_text())
+    assert "enabled_mcp_tools" not in migrated
+    assert migrated["queen_name"] == "queen_technology"
+
+    # Second load is a direct sidecar read.
+    assert load_colony_tools_config(name) == ["read_file"]
@@ -0,0 +1,239 @@
+"""Tests for the MCP server CRUD HTTP routes.
+
+Monkey-patches ``MCPRegistry`` inside ``routes_mcp`` so the HTTP layer is
+exercised without reading or writing ``~/.hive/mcp_registry/installed.json``
+or spawning actual subprocesses.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+import pytest
+from aiohttp import web
+from aiohttp.test_utils import TestClient, TestServer
+
+from framework.loader.mcp_errors import MCPError, MCPErrorCode
+from framework.server import routes_mcp
+
+
+class _FakeRegistry:
+    """Stand-in for MCPRegistry — just enough surface for the routes."""
+
+    def __init__(self) -> None:
+        self._servers: dict[str, dict[str, Any]] = {
+            "built-in-seed": {
+                "source": "registry",
+                "transport": "stdio",
+                "enabled": True,
+                "manifest": {"description": "Factory-seeded server", "tools": []},
+                "last_health_status": "healthy",
+                "last_error": None,
+                "last_health_check_at": None,
+            }
+        }
+
+    def initialize(self) -> None:  # noqa: D401 — registry idempotent init
+        return
+
+    def list_installed(self) -> list[dict[str, Any]]:
+        return [{"name": name, **entry} for name, entry in self._servers.items()]
+
+    def get_server(self, name: str) -> dict | None:
+        if name not in self._servers:
+            return None
+        return {"name": name, **self._servers[name]}
+
+    def add_local(self, *, name: str, transport: str, **kwargs: Any) -> dict:
+        if name in self._servers:
+            raise MCPError(
+                code=MCPErrorCode.MCP_INSTALL_FAILED,
+                what=f"Server '{name}' already exists",
+                why="A server with this name is already registered locally.",
+                fix=f"Run: hive mcp remove {name}",
+            )
+        entry = {
+            "source": "local",
+            "transport": transport,
+            "enabled": True,
+            "manifest": {"description": kwargs.get("description") or ""},
+            "last_health_status": None,
+            "last_error": None,
+            "last_health_check_at": None,
+        }
+        self._servers[name] = entry
+        return entry
+
+    def remove(self, name: str) -> None:
+        if name not in self._servers:
+            raise MCPError(
+                code=MCPErrorCode.MCP_INSTALL_FAILED,
+                what=f"Cannot remove server '{name}'",
+                why="Server is not installed.",
+                fix="Run: hive mcp list",
+            )
+        del self._servers[name]
+
+    def enable(self, name: str) -> None:
+        if name not in self._servers:
+            raise MCPError(
+                code=MCPErrorCode.MCP_INSTALL_FAILED,
+                what="not found",
+                why="not found",
+                fix="x",
+            )
+        self._servers[name]["enabled"] = True
+
+    def disable(self, name: str) -> None:
+        if name not in self._servers:
+            raise MCPError(
+                code=MCPErrorCode.MCP_INSTALL_FAILED,
+                what="not found",
+                why="not found",
+                fix="x",
+            )
+        self._servers[name]["enabled"] = False
+
+    def health_check(self, name: str) -> dict[str, Any]:
+        if name not in self._servers:
+            raise MCPError(
+                code=MCPErrorCode.MCP_HEALTH_FAILED,
+                what="not found",
+                why="not found",
+                fix="x",
+            )
+        return {"name": name, "status": "healthy", "tools": 3, "error": None}
+
+
+@pytest.fixture
+def registry(monkeypatch):
+    reg = _FakeRegistry()
+    monkeypatch.setattr(routes_mcp, "_registry", lambda: reg)
+    return reg
+
+
+async def _make_app() -> web.Application:
+    app = web.Application()
+    routes_mcp.register_routes(app)
+    return app
+
+
+@pytest.mark.asyncio
+async def test_list_servers_returns_built_in(registry):
+    app = await _make_app()
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.get("/api/mcp/servers")
+        assert resp.status == 200
+        body = await resp.json()
+    names = {s["name"] for s in body["servers"]}
+    # The registry fake carries one entry; the list also merges package-
+    # baked entries from core/framework/agents/queen/mcp_servers.json so
+    # the UI matches what the queen actually loads. Both should appear.
+    assert "built-in-seed" in names
+    sources = {s["name"]: s["source"] for s in body["servers"]}
+    assert sources.get("built-in-seed") == "registry"
+    # The package-baked servers (coder-tools/gcu-tools/hive_tools) carry
+    # source=="built-in" and are non-removable.
+    pkg_entries = [s for s in body["servers"] if s["source"] == "built-in"]
+    assert pkg_entries, "expected at least one package-baked MCP server"
+    assert all(s.get("removable") is False for s in pkg_entries)
+
+
+@pytest.mark.asyncio
+async def test_add_local_server(registry):
+    app = await _make_app()
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.post(
+            "/api/mcp/servers",
+            json={
+                "name": "my-tool",
+                "transport": "stdio",
+                "command": "echo",
+                "args": ["hi"],
+                "description": "says hi",
+            },
+        )
+        assert resp.status == 201
+        body = await resp.json()
+        assert body["server"]["name"] == "my-tool"
+        assert body["server"]["source"] == "local"
+
+        resp = await client.get("/api/mcp/servers")
+        names = [s["name"] for s in (await resp.json())["servers"]]
+    assert "my-tool" in names
+
+
+@pytest.mark.asyncio
+async def test_add_rejects_duplicate(registry):
+    app = await _make_app()
+    async with TestClient(TestServer(app)) as client:
+        for _ in range(2):
+            resp = await client.post(
+                "/api/mcp/servers",
+                json={"name": "dup", "transport": "stdio", "command": "x"},
+            )
+        assert resp.status == 409
+        body = await resp.json()
+        assert "already exists" in body["error"].lower()
+        assert body["fix"]
+
+
+@pytest.mark.asyncio
+async def test_add_rejects_invalid_transport(registry):
+    app = await _make_app()
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.post(
+            "/api/mcp/servers",
+            json={"name": "x", "transport": "nope"},
+        )
+        assert resp.status == 400
+
+
+@pytest.mark.asyncio
+async def test_enable_disable_cycle(registry):
+    app = await _make_app()
+    # Seed a local server
+    registry.add_local(name="local-one", transport="stdio", command="x")
+
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.post("/api/mcp/servers/local-one/disable")
+        assert resp.status == 200
+        assert (await resp.json())["enabled"] is False
+        assert registry._servers["local-one"]["enabled"] is False
+
+        resp = await client.post("/api/mcp/servers/local-one/enable")
+        assert resp.status == 200
+        assert (await resp.json())["enabled"] is True
+
+
+@pytest.mark.asyncio
+async def test_remove_local_only(registry):
+    app = await _make_app()
+    registry.add_local(name="local-two", transport="stdio", command="x")
+
+    async with TestClient(TestServer(app)) as client:
+        # Built-ins are protected
+        resp = await client.delete("/api/mcp/servers/built-in-seed")
+        assert resp.status == 400
+
+        # Missing server → 404
+        resp = await client.delete("/api/mcp/servers/ghost")
+        assert resp.status == 404
+
+        # Happy path
+        resp = await client.delete("/api/mcp/servers/local-two")
+        assert resp.status == 200
+        assert "local-two" not in registry._servers
+
+
+@pytest.mark.asyncio
+async def test_health_check(registry):
+    app = await _make_app()
+    registry.add_local(name="pingable", transport="stdio", command="x")
+
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.post("/api/mcp/servers/pingable/health")
+        assert resp.status == 200
+        body = await resp.json()
+    assert body["status"] == "healthy"
+    assert body["tools"] == 3
@@ -0,0 +1,443 @@
+"""Tests for the per-queen MCP tool allowlist filter + routes.
+
+Covers:
+1. QueenPhaseState filter semantics (default-allow, allowlist, empty, phase-
+   isolation, memo identity for LLM prompt-cache stability).
+2. routes_queen_tools round trip (GET, PATCH, validation, live-session
+   hot-reload).
+
+Route tests monkey-patch a tiny queen profile + manager catalog; they never
+spawn an MCP subprocess.
+"""
+
+from __future__ import annotations
+
+import json
+from dataclasses import dataclass, field
+from typing import Any
+from unittest.mock import MagicMock
+
+import pytest
+import yaml
+from aiohttp import web
+from aiohttp.test_utils import TestClient, TestServer
+
+from framework.llm.provider import Tool
+from framework.server import routes_queen_tools
+from framework.tools.queen_lifecycle_tools import QueenPhaseState
+
+# ---------------------------------------------------------------------------
+# QueenPhaseState filter — pure unit tests
+# ---------------------------------------------------------------------------
+
+
+def _tool(name: str) -> Tool:
+    return Tool(name=name, description=f"desc of {name}", parameters={"type": "object"})
+
+
+class TestPhaseStateFilter:
+    def test_default_allow_returns_every_tool(self):
+        ps = QueenPhaseState(phase="independent")
+        ps.independent_tools = [_tool("mcp_a"), _tool("mcp_b"), _tool("lc_c")]
+        ps.mcp_tool_names_all = {"mcp_a", "mcp_b"}
+        ps.enabled_mcp_tools = None
+        ps.rebuild_independent_filter()
+
+        names = [t.name for t in ps.get_current_tools()]
+        assert names == ["mcp_a", "mcp_b", "lc_c"]
+
+    def test_allowlist_keeps_listed_mcp_plus_all_lifecycle(self):
+        ps = QueenPhaseState(phase="independent")
+        ps.independent_tools = [_tool("mcp_a"), _tool("mcp_b"), _tool("lc_c")]
+        ps.mcp_tool_names_all = {"mcp_a", "mcp_b"}
+        ps.enabled_mcp_tools = ["mcp_a"]
+        ps.rebuild_independent_filter()
+
+        names = [t.name for t in ps.get_current_tools()]
+        assert names == ["mcp_a", "lc_c"]
+
+    def test_empty_allowlist_keeps_only_lifecycle(self):
+        ps = QueenPhaseState(phase="independent")
+        ps.independent_tools = [_tool("mcp_a"), _tool("mcp_b"), _tool("lc_c")]
+        ps.mcp_tool_names_all = {"mcp_a", "mcp_b"}
+        ps.enabled_mcp_tools = []
+        ps.rebuild_independent_filter()
+
+        names = [t.name for t in ps.get_current_tools()]
+        assert names == ["lc_c"]
+
+    def test_filter_isolated_to_independent_phase(self):
+        ps = QueenPhaseState(phase="independent")
+        ps.independent_tools = [_tool("mcp_a"), _tool("lc_c")]
+        ps.working_tools = [_tool("mcp_a"), _tool("lc_c")]
+        ps.mcp_tool_names_all = {"mcp_a"}
+        ps.enabled_mcp_tools = []
+        ps.rebuild_independent_filter()
+
+        # Independent → filtered
+        assert [t.name for t in ps.get_current_tools()] == ["lc_c"]
+
+        # Other phases → unaffected
+        ps.phase = "working"
+        assert [t.name for t in ps.get_current_tools()] == ["mcp_a", "lc_c"]
+
+    def test_memo_returns_stable_identity_for_prompt_cache(self):
+        """Same Python list object across turns → LLM prompt cache stays warm."""
+        ps = QueenPhaseState(phase="independent")
+        ps.independent_tools = [_tool("mcp_a"), _tool("lc_c")]
+        ps.mcp_tool_names_all = {"mcp_a"}
+        ps.enabled_mcp_tools = None
+        ps.rebuild_independent_filter()
+
+        first = ps.get_current_tools()
+        second = ps.get_current_tools()
+        assert first is second, "memoized list must be the same object across turns"
+
+        # A rebuild should produce a different object so downstream caches
+        # correctly invalidate.
+        ps.enabled_mcp_tools = ["mcp_a"]
+        ps.rebuild_independent_filter()
+        third = ps.get_current_tools()
+        assert third is not first
+        assert [t.name for t in third] == ["mcp_a", "lc_c"]
+
+
+# ---------------------------------------------------------------------------
+# Route round-trip tests
+# ---------------------------------------------------------------------------
+
+
+@dataclass
+class _FakeSession:
+    queen_name: str
+    phase_state: QueenPhaseState
+    colony_runtime: Any = None
+    id: str = "sess-1"
+    _queen_tool_registry: Any = None
+
+
+@dataclass
+class _FakeManager:
+    _sessions: dict = field(default_factory=dict)
+    _mcp_tool_catalog: dict = field(default_factory=dict)
+
+
+@pytest.fixture
+def queen_dir(tmp_path, monkeypatch):
+    """Redirect queen profile + tools storage into a tmp dir."""
+    queens_dir = tmp_path / "queens"
+    queens_dir.mkdir()
+    monkeypatch.setattr("framework.agents.queen.queen_profiles.QUEENS_DIR", queens_dir)
+    monkeypatch.setattr("framework.agents.queen.queen_tools_config.QUEENS_DIR", queens_dir)
+
+    queen_id = "queen_technology"
+    (queens_dir / queen_id).mkdir()
+    (queens_dir / queen_id / "profile.yaml").write_text(
+        yaml.safe_dump({"name": "Alexandra", "title": "Head of Technology"})
+    )
+    return queens_dir, queen_id
+
+
+async def _make_app(*, manager: _FakeManager) -> web.Application:
+    app = web.Application()
+    app["manager"] = manager
+    routes_queen_tools.register_routes(app)
+    return app
+
+
+@pytest.mark.asyncio
+async def test_get_tools_default_allows_everything_for_unknown_queen(queen_dir, monkeypatch):
+    """Queens NOT in the role-default table fall back to allow-all."""
+    monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
+
+    queens_dir, _ = queen_dir
+    # Use a queen id that isn't in QUEEN_DEFAULT_CATEGORIES so we exercise
+    # the fallback-to-allow-all path.
+    custom_id = "queen_custom_unknown"
+    (queens_dir / custom_id).mkdir()
+    (queens_dir / custom_id / "profile.yaml").write_text(yaml.safe_dump({"name": "Custom", "title": "Custom Role"}))
+
+    manager = _FakeManager()
+    manager._mcp_tool_catalog = {
+        "coder-tools": [
+            {"name": "read_file", "description": "read", "input_schema": {}},
+            {"name": "write_file", "description": "write", "input_schema": {}},
+        ],
+    }
+
+    app = await _make_app(manager=manager)
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.get(f"/api/queen/{custom_id}/tools")
+        assert resp.status == 200
+        body = await resp.json()
+
+    assert body["enabled_mcp_tools"] is None
+    assert body["is_role_default"] is True  # no sidecar → default-allow
+    assert body["stale"] is False
+    servers = {s["name"]: s for s in body["mcp_servers"]}
+    assert set(servers) == {"coder-tools"}
+    for tool in servers["coder-tools"]["tools"]:
+        assert tool["enabled"] is True
+
+
+@pytest.mark.asyncio
+async def test_get_tools_applies_role_default(queen_dir, monkeypatch):
+    """Known persona queens get their role-based default allowlist."""
+    monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
+    _, queen_id = queen_dir  # queen_technology — has a role default
+
+    manager = _FakeManager()
+    # Seed a catalog covering tools the role default references so the
+    # response reflects what the queen would actually see on boot.
+    manager._mcp_tool_catalog = {
+        "coder-tools": [
+            {"name": "read_file", "description": "", "input_schema": {}},
+            {"name": "port_scan", "description": "", "input_schema": {}},  # security
+            {"name": "excel_read", "description": "", "input_schema": {}},  # data
+            {"name": "fluffy_unknown_tool", "description": "", "input_schema": {}},
+        ],
+    }
+
+    app = await _make_app(manager=manager)
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.get(f"/api/queen/{queen_id}/tools")
+        assert resp.status == 200
+        body = await resp.json()
+
+    # queen_technology's role default includes file_read, data, security, etc.
+    assert body["is_role_default"] is True
+    enabled = set(body["enabled_mcp_tools"] or [])
+    assert "read_file" in enabled
+    assert "port_scan" in enabled  # technology role includes security
+    assert "excel_read" in enabled
+    # Tools not in any category (and not in a @server: expansion target
+    # the role references) are NOT part of the default.
+    assert "fluffy_unknown_tool" not in enabled
+
+
+def test_resolve_queen_default_tools_expands_server_shorthand():
+    """@server:NAME shorthand expands against the provided catalog."""
+    from framework.agents.queen.queen_tools_defaults import resolve_queen_default_tools
+
+    catalog = {
+        "gcu-tools": [
+            {"name": "browser_navigate"},
+            {"name": "browser_click"},
+        ],
+    }
+    # queen_brand_design uses "browser" category → expands via @server:gcu-tools.
+    result = resolve_queen_default_tools("queen_brand_design", catalog)
+    assert result is not None
+    assert "browser_navigate" in result
+    assert "browser_click" in result
+
+
+def test_resolve_queen_default_tools_unknown_queen_returns_none():
+    from framework.agents.queen.queen_tools_defaults import resolve_queen_default_tools
+
+    assert resolve_queen_default_tools("queen_made_up", {}) is None
+
+
+@pytest.mark.asyncio
+async def test_patch_persists_and_validates(queen_dir, monkeypatch):
+    monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
+    queens_dir, queen_id = queen_dir
+
+    manager = _FakeManager()
+    manager._mcp_tool_catalog = {
+        "coder-tools": [
+            {"name": "read_file", "description": "", "input_schema": {}},
+            {"name": "write_file", "description": "", "input_schema": {}},
+        ]
+    }
+
+    app = await _make_app(manager=manager)
+    tools_path = queens_dir / queen_id / "tools.json"
+    profile_path = queens_dir / queen_id / "profile.yaml"
+
+    async with TestClient(TestServer(app)) as client:
+        # Happy path
+        resp = await client.patch(
+            f"/api/queen/{queen_id}/tools",
+            json={"enabled_mcp_tools": ["read_file"]},
+        )
+        assert resp.status == 200
+        body = await resp.json()
+        assert body["enabled_mcp_tools"] == ["read_file"]
+
+        # Sidecar persisted; profile YAML untouched by tools PATCH
+        sidecar = json.loads(tools_path.read_text())
+        assert sidecar["enabled_mcp_tools"] == ["read_file"]
+        assert "updated_at" in sidecar
+        profile = yaml.safe_load(profile_path.read_text())
+        assert "enabled_mcp_tools" not in profile
+
+        # GET reflects the new state
+        resp = await client.get(f"/api/queen/{queen_id}/tools")
+        body = await resp.json()
+        assert body["is_role_default"] is False  # user has explicitly saved
+        tools = {t["name"]: t for t in body["mcp_servers"][0]["tools"]}
+        assert tools["read_file"]["enabled"] is True
+        assert tools["write_file"]["enabled"] is False
+
+        # Null resets
+        resp = await client.patch(f"/api/queen/{queen_id}/tools", json={"enabled_mcp_tools": None})
+        assert resp.status == 200
+        body = await resp.json()
+        assert body["enabled_mcp_tools"] is None
+        sidecar = json.loads(tools_path.read_text())
+        assert sidecar["enabled_mcp_tools"] is None
+
+        # Unknown tool name → 400; sidecar unchanged
+        resp = await client.patch(
+            f"/api/queen/{queen_id}/tools",
+            json={"enabled_mcp_tools": ["nope_not_a_tool"]},
+        )
+        assert resp.status == 400
+        detail = await resp.json()
+        assert "nope_not_a_tool" in detail.get("unknown", [])
+        sidecar = json.loads(tools_path.read_text())
+        assert sidecar["enabled_mcp_tools"] is None
+
+
+@pytest.mark.asyncio
+async def test_patch_hot_reloads_live_session(queen_dir, monkeypatch):
+    monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
+    _, queen_id = queen_dir
+
+    # Build a fake live session whose phase state carries a tool list the
+    # filter can gate. We also need a fake registry so
+    # _catalog_from_live_session can enumerate tools.
+    class _FakeRegistry:
+        def __init__(self, server_map, tools_by_name):
+            self._mcp_server_tools = server_map
+            self._tools_by_name = tools_by_name
+
+        def get_tools(self):
+            return {n: MagicMock(name=n) for n in self._tools_by_name}
+
+    tools_by_name = {"read_file": _tool("read_file"), "write_file": _tool("write_file")}
+    registry = _FakeRegistry(
+        server_map={"coder-tools": {"read_file", "write_file"}},
+        tools_by_name=tools_by_name,
+    )
+    # Patch get_tools to return real Tool objects for name/description plumbing.
+    registry.get_tools = lambda: tools_by_name  # type: ignore[method-assign]
+
+    phase_state = QueenPhaseState(phase="independent")
+    phase_state.independent_tools = [tools_by_name["read_file"], tools_by_name["write_file"]]
+    phase_state.mcp_tool_names_all = {"read_file", "write_file"}
+    phase_state.enabled_mcp_tools = None
+    phase_state.rebuild_independent_filter()
+
+    session = _FakeSession(queen_name=queen_id, phase_state=phase_state)
+    session._queen_tool_registry = registry
+    manager = _FakeManager(_sessions={"sess-1": session})
+
+    app = await _make_app(manager=manager)
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.patch(
+            f"/api/queen/{queen_id}/tools",
+            json={"enabled_mcp_tools": ["read_file"]},
+        )
+        assert resp.status == 200
+        body = await resp.json()
+        assert body["refreshed_sessions"] == 1
+
+    # Session's phase state reflects the new allowlist without a restart
+    current = phase_state.get_current_tools()
+    assert [t.name for t in current] == ["read_file"]
+
+
+@pytest.mark.asyncio
+async def test_missing_queen_returns_404(queen_dir, monkeypatch):
+    monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
+    manager = _FakeManager()
+
+    app = await _make_app(manager=manager)
+    async with TestClient(TestServer(app)) as client:
+        resp = await client.get("/api/queen/queen_nonexistent/tools")
+        assert resp.status == 404
+
+        resp = await client.patch(
+            "/api/queen/queen_nonexistent/tools",
+            json={"enabled_mcp_tools": None},
+        )
+        assert resp.status == 404
+
+
+@pytest.mark.asyncio
+async def test_delete_restores_role_default(queen_dir, monkeypatch):
+    """DELETE removes tools.json so the queen falls back to the role default."""
+    monkeypatch.setattr(routes_queen_tools, "ensure_default_queens", lambda: None)
+    queens_dir, queen_id = queen_dir
+    tools_path = queens_dir / queen_id / "tools.json"
+
+    manager = _FakeManager()
+    manager._mcp_tool_catalog = {
+        "coder-tools": [
+            {"name": "read_file", "description": "", "input_schema": {}},
+            {"name": "port_scan", "description": "", "input_schema": {}},
+        ],
+    }
+
+    app = await _make_app(manager=manager)
+    async with TestClient(TestServer(app)) as client:
+        # Seed a custom allowlist first so we have a sidecar to delete.
+        resp = await client.patch(
+            f"/api/queen/{queen_id}/tools",
+            json={"enabled_mcp_tools": ["read_file"]},
+        )
+        assert resp.status == 200
+        assert tools_path.exists()
+
+        resp = await client.delete(f"/api/queen/{queen_id}/tools")
+        assert resp.status == 200
+        body = await resp.json()
+        assert body["removed"] is True
+        assert body["is_role_default"] is True
+        assert not tools_path.exists()
+
+        # The new effective list is the role default for queen_technology,
+        # which includes both read_file (file_read) and port_scan (security).
+        enabled = set(body["enabled_mcp_tools"] or [])
+        assert "read_file" in enabled
+        assert "port_scan" in enabled
+
+        # GET confirms.
+        resp = await client.get(f"/api/queen/{queen_id}/tools")
+        body = await resp.json()
+        assert body["is_role_default"] is True
+
+        # Deleting again is a no-op.
+        resp = await client.delete(f"/api/queen/{queen_id}/tools")
+        assert resp.status == 200
+        assert (await resp.json())["removed"] is False
+
+
+def test_legacy_profile_field_migrates_to_sidecar(queen_dir):
+    """A legacy enabled_mcp_tools field in profile.yaml is hoisted to tools.json."""
+    queens_dir, queen_id = queen_dir
+    profile_path = queens_dir / queen_id / "profile.yaml"
+    tools_path = queens_dir / queen_id / "tools.json"
+
+    # Seed legacy field in profile.yaml.
+    profile = yaml.safe_load(profile_path.read_text()) or {}
+    profile["enabled_mcp_tools"] = ["read_file", "write_file"]
+    profile_path.write_text(yaml.safe_dump(profile, sort_keys=False))
+
+    from framework.agents.queen.queen_tools_config import load_queen_tools_config
+
+    # First load migrates.
+    assert load_queen_tools_config(queen_id) == ["read_file", "write_file"]
+    assert tools_path.exists()
+    sidecar = json.loads(tools_path.read_text())
+    assert sidecar["enabled_mcp_tools"] == ["read_file", "write_file"]
+
+    # profile.yaml no longer contains the field; other fields preserved.
+    migrated_profile = yaml.safe_load(profile_path.read_text())
+    assert "enabled_mcp_tools" not in migrated_profile
+    assert migrated_profile["name"] == "Alexandra"
+
+    # Second load is a direct read — no migration work to do.
+    assert load_queen_tools_config(queen_id) == ["read_file", "write_file"]
@@ -1,24 +0,0 @@
---
-name: hive.batch-ledger
-description: Track per-item status when processing collections to prevent skipped or duplicated items.
-metadata:
-  author: hive
-  type: default-skill
---
-
-## Operational Protocol: Batch Progress Ledger
-
-When processing a collection of items, maintain a batch ledger in `_batch_ledger`.
-
-Initialize when you identify the batch:
- `_batch_total`: total item count
- `_batch_ledger`: JSON with per-item status
-
-Per-item statuses: pending → in_progress → completed|failed|skipped
-
- Set `in_progress` BEFORE processing
- Set final status AFTER processing with 1-line result_summary
- Include error reason for failed/skipped items
- Update aggregate counts after each item
- NEVER remove items from the ledger
- If resuming, skip items already marked completed
@@ -0,0 +1,112 @@
+---
+name: hive.colony-progress-tracker
+description: Claim tasks, record step progress, and verify SOP gates in the colony SQLite queue. Applies when your spawn message includes a db_path field.
+metadata:
+  author: hive
+  type: default-skill
+  visibility: [worker]
+---
+
+## Operational Protocol: Colony Progress Tracker
+
+**Applies when** your spawn message has `db_path:` and `colony_id:` fields. The DB is your durable working memory: it tells you what's done, what to skip, and which SOP gates you owe.
+
+Access via `execute_command_tool` running `sqlite3 "<db_path>" "..."`. Tables: `tasks` (queue), `steps` (per-task decomposition), `sop_checklist` (hard gates).
+
+### Claim: assigned task (check this FIRST)
+
+If your spawn message includes a `task_id:` field, the queen pre-assigned a specific row to you. Claim that row by id — **do not** use the generic next-pending pattern below:
+
+```bash
+sqlite3 "<db_path>" <<'SQL'
+UPDATE tasks SET status='claimed', worker_id='<worker-id>',
+  claim_token=lower(hex(randomblob(8))),
+  claimed_at=datetime('now'), updated_at=datetime('now')
+WHERE id='<task_id>' AND status='pending'
+RETURNING id, goal, payload;
+SQL
+```
+
+Empty output → another worker raced you or the row is already done. Stop and report. Non-empty → that row is yours; proceed to "Load the plan".
+
+### Claim: next pending (fallback when no task_id is assigned)
+
+If your spawn message did NOT include `task_id:` — you are a generic fan-out worker racing on a shared queue. Use the generic next-pending claim:
+
+```bash
+sqlite3 "<db_path>" <<'SQL'
+UPDATE tasks SET status='claimed', worker_id='<worker-id>',
+  claim_token=lower(hex(randomblob(8))),
+  claimed_at=datetime('now'), updated_at=datetime('now')
+WHERE id=(SELECT id FROM tasks WHERE status='pending'
+  ORDER BY priority DESC, seq, created_at LIMIT 1)
+RETURNING id, goal, payload;
+SQL
+```
+
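+Empty output → queue drained, exit. Otherwise the returned `id` is yours. **Never SELECT-then-UPDATE**: two workers can both read the same pending row before either writes, so both would believe they claimed it.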
+
+### Load the plan
+
+```bash
+sqlite3 "<db_path>" "SELECT seq, id, title, status FROM steps WHERE task_id='<task-id>' ORDER BY seq;"
+sqlite3 "<db_path>" "SELECT key, description, required, done_at FROM sop_checklist WHERE task_id='<task-id>';"
+```
+
+**Skip any step where status='done'.** That's the point — don't redo completed work.
+
+### Execute a step
+
+Before tool calls:
+```bash
+sqlite3 "<db_path>" "UPDATE steps SET status='in_progress', worker_id='<worker-id>', started_at=datetime('now') WHERE id='<step-id>';"
+```
+After success (one-line evidence: path, URL, key result):
+```bash
+sqlite3 "<db_path>" "UPDATE steps SET status='done', evidence='<what you did>', completed_at=datetime('now') WHERE id='<step-id>';"
+```
+
+### MANDATORY: SOP gate check before marking task done
+
+```bash
+sqlite3 "<db_path>" "SELECT key, description FROM sop_checklist WHERE task_id='<task-id>' AND required=1 AND done_at IS NULL;"
+```
+
+- Empty → proceed to "Mark task done".
+- Non-empty → each row is work you still owe. Do it, then check it off:
+
+```bash
+sqlite3 "<db_path>" "UPDATE sop_checklist SET done_at=datetime('now'), done_by='<worker-id>', note='<why>' WHERE task_id='<task-id>' AND key='<key>';"
+```
+
+**Never mark a task done while this SELECT returns rows.** This gate exists specifically to stop you from declaring success while skipping required steps.
+
+### Mark task done / failed
+
+```bash
+# Success:
+sqlite3 "<db_path>" "UPDATE tasks SET status='done', completed_at=datetime('now'), updated_at=datetime('now') WHERE id='<task-id>' AND worker_id='<worker-id>';"
+
+# Unrecoverable failure:
+sqlite3 "<db_path>" "UPDATE tasks SET status='failed', last_error='<one sentence>', completed_at=datetime('now'), updated_at=datetime('now') WHERE id='<task-id>' AND worker_id='<worker-id>';"
+```
+
+The `AND worker_id=?` guard means a reclaimed row won't accept your write — treat zero rows affected as "your claim was revoked, stop."
+
+### Loop
+
+After done/failed → claim the next task. Exit only when claim returns empty.
+
+### Errors + debug
+
+- **"database is locked"**: retry with 100ms → 1s backoff, max 5 attempts. `busy_timeout=5000` handles most contention silently.
+- **Queue health**: `SELECT status, count(*) FROM tasks GROUP BY status;`
+- **Your in-flight work**: `SELECT id, goal, status FROM tasks WHERE worker_id='<worker-id>';`
+
+### Anti-patterns (will break the queue)
+
+- Don't DDL (CREATE/ALTER/DROP).
+- Don't DELETE — failed tasks stay as `failed` for audit.
+- Don't skip the SOP gate check ("MANDATORY: SOP gate check before marking task done") before marking done.
+- Don't hold a task >15min without updates — the stale-claim reclaimer revokes your claim.
+- Don't invent task IDs. Workers update existing rows; only the queen enqueues new ones.
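The claim and guarded-completion statements in the skill above can also be exercised from Python's standard `sqlite3` module. This is a minimal sketch, not the colony's actual implementation: the `tasks` schema is a reduced guess at the columns the skill text mentions, and `claim_next` / `mark_done` are hypothetical helper names. `UPDATE ... RETURNING` requires SQLite 3.35 or newer.

```python
import sqlite3

# Assumed, reduced schema: only the columns the skill's statements touch.
SCHEMA = """
CREATE TABLE tasks (
    id TEXT PRIMARY KEY, goal TEXT, payload TEXT,
    status TEXT DEFAULT 'pending', priority INTEGER DEFAULT 0,
    seq INTEGER, created_at TEXT DEFAULT (datetime('now')),
    worker_id TEXT, claim_token TEXT, claimed_at TEXT,
    completed_at TEXT, updated_at TEXT
);
"""


def claim_next(conn, worker_id):
    """Atomically claim the next pending task; return (id, goal, payload) or None."""
    rows = conn.execute(
        """
        UPDATE tasks SET status='claimed', worker_id=?,
          claim_token=lower(hex(randomblob(8))),
          claimed_at=datetime('now'), updated_at=datetime('now')
        WHERE id=(SELECT id FROM tasks WHERE status='pending'
          ORDER BY priority DESC, seq, created_at LIMIT 1)
        RETURNING id, goal, payload
        """,
        (worker_id,),
    ).fetchall()  # exhaust the cursor before committing
    conn.commit()
    return rows[0] if rows else None


def mark_done(conn, task_id, worker_id):
    """Mark a task done; False means zero rows matched (claim was revoked)."""
    cur = conn.execute(
        "UPDATE tasks SET status='done', completed_at=datetime('now'), "
        "updated_at=datetime('now') WHERE id=? AND worker_id=?",
        (task_id, worker_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because selection and update happen in one statement, a second worker calling `claim_next` can only ever receive a different row or `None`. A `False` from `mark_done` corresponds to the "your claim was revoked, stop" case.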
@@ -1,24 +1,24 @@
 ---
 name: hive.context-preservation
-description: Proactively preserve critical information before automatic context pruning destroys it.
+description: Proactively extract critical values from tool results into working notes before automatic context pruning destroys them.
 metadata:
  author: hive
  type: default-skill
+  visibility: [worker]
 ---

 ## Operational Protocol: Context Preservation

-You operate under a finite context window. Important information WILL be pruned.
+You operate under a finite context window. Older tool results WILL be pruned. Extract what you need while it's still in context.

-Save-As-You-Go: After any tool call producing information you'll need later,
-immediately extract key data into `_working_notes` or `_preserved_data`.
-Do NOT rely on referring back to old tool results.
+**Save-as-you-go.** After any tool call producing information you'll need later, immediately extract the key data into `_working_notes` or `_preserved_data`. Do not rely on referring back to old tool results — once they're pruned they're gone.

-What to extract: URLs and key snippets (not full pages), relevant API fields
-(not raw JSON), specific lines/values (not entire files), analysis results
-(not raw data).
+**What to extract:**
+- URLs and key snippets (not full pages)
+- Relevant API fields (not raw JSON blobs)
+- Specific lines, values, or IDs (not entire files)
+- Analysis conclusions (not raw data)

-Before transitioning to the next phase/node, write a handoff summary to
-`_handoff_context` with everything the next phase needs to know.
+**Handoffs between tasks** happen through `progress.db`, not through shared-buffer handoff blobs. When you finish a task, any state the next worker needs goes into the task row itself (`steps.evidence`, `tasks.last_error`, `sop_checklist.note`) — see `hive.colony-progress-tracker`. Use `_working_notes` for things the DB schema doesn't cover.

 You will receive an alert when context reaches {{warn_at_usage_ratio_pct}}% — preserve immediately.
@@ -1,18 +1,30 @@
 ---
 name: hive.error-recovery
-description: Follow a structured recovery protocol when tool calls fail instead of blindly retrying or giving up.
+description: Follow a structured recovery decision tree when tool calls fail instead of blindly retrying or giving up.
 metadata:
  author: hive
  type: default-skill
+  visibility: [worker]
 ---

 ## Operational Protocol: Error Recovery

 When a tool call fails:

-1. Diagnose — record error in notes, classify as transient or structural
-2. Decide — transient: retry once. Structural fixable: fix and retry.
-   Structural unfixable: record as failed, move to next item.
-   Blocking all progress: record escalation note.
-3. Adapt — if same tool failed {{max_retries_per_tool}}+ times, stop using it and find alternative.
-   Update plan in notes. Never silently drop the failed item.
+1. **Diagnose** — classify the failure as *transient* (network blip, rate limit, timeout) or *structural* (wrong selector, missing auth, invalid schema, permission denied).
+
+2. **Decide:**
+   - Transient → retry once.
+   - Structural + fixable → fix the input and retry.
+   - Structural + unfixable → record the failure and move to the next item.
+   - Blocking all progress → escalate.
+
+3. **Adapt** — if the same tool has failed {{max_retries_per_tool}}+ times in a row, stop using it and find an alternative approach.
+
+**Never silently drop a failed item.** If the item is a task in the colony queue, write the failure to the DB instead of an in-memory buffer:
+
+```bash
+sqlite3 "$DB_PATH" "UPDATE tasks SET status='failed', last_error='<one-sentence reason>', completed_at=datetime('now'), updated_at=datetime('now') WHERE id='<task-id>' AND worker_id='<your-worker-id>';"
+```
+
+The `tasks.retry_count` column and the stale-claim reclaimer handle auto-retry for crashes; your job is the within-run decision tree above. See `hive.colony-progress-tracker` for the full queue protocol.
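The decision tree in the error-recovery skill above can be sketched as a small pure function. This is an illustration under stated assumptions rather than framework code: the `Action` enum and `decide` are hypothetical names, the default of 3 stands in for the templated `{{max_retries_per_tool}}` value, and the precedence (adapt first, then escalate, then transient vs. structural) is one plausible reading of the three steps.

```python
from enum import Enum


class Action(Enum):
    RETRY_ONCE = "retry_once"
    FIX_AND_RETRY = "fix_and_retry"
    RECORD_AND_SKIP = "record_and_skip"
    ESCALATE = "escalate"
    SWITCH_TOOL = "switch_tool"


def decide(transient, fixable=False, blocks_all_progress=False,
           consecutive_failures=1, max_retries_per_tool=3):
    """Map a classified tool failure to the next action (assumed precedence)."""
    # Adapt: the same tool keeps failing, so stop using it.
    if consecutive_failures >= max_retries_per_tool:
        return Action.SWITCH_TOOL
    # Nothing else can proceed: escalate instead of retrying.
    if blocks_all_progress:
        return Action.ESCALATE
    # Transient failures (network blip, rate limit, timeout) get one retry.
    if transient:
        return Action.RETRY_ONCE
    # Structural failures: fix the input if possible, else record and move on.
    return Action.FIX_AND_RETRY if fixable else Action.RECORD_AND_SKIP
```

`Action.RECORD_AND_SKIP` is the branch where a colony worker would write `status='failed'` and `last_error` to the task row rather than dropping the item.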
@@ -1,27 +1,29 @@
 ---
 name: hive.note-taking
-description: Maintain structured working notes throughout execution to prevent information loss during context pruning.
+description: Maintain a free-form scratchpad of decisions, extracted values, and open questions so context pruning doesn't lose anything you still need.
 metadata:
  author: hive
  type: default-skill
+  visibility: [worker]
 ---

 ## Operational Protocol: Structured Note-Taking

-Maintain structured working notes in shared buffer key `_working_notes`.
+Maintain free-form working notes in shared buffer key `_working_notes` for data that *you* need to remember but that isn't captured by the colony task queue.
+
+**Do not duplicate the queue in here.** Per-task goal, ordered steps, and SOP gates live in `progress.db` — use `hive.colony-progress-tracker` for those. These notes are for things the DB schema doesn't cover.
+
 Update at these checkpoints:

- After completing each discrete subtask or batch item
- After receiving new information that changes your plan
- Before any tool call that will produce substantial output
+- After receiving new information that changes how you plan to approach the current step
+- Before any tool call that will produce substantial output you'll need to reference later
+- When you make a non-obvious decision whose *why* would be lost if the tool call history gets pruned

 Structure:

-### Objective — restate the goal
-### Current Plan — numbered steps, mark completed with ✓
 ### Key Decisions — decisions made and WHY
-### Working Data — intermediate results, extracted values
-### Open Questions — uncertainties to verify
-### Blockers — anything preventing progress
+### Working Data — intermediate results, extracted values (URLs, IDs, key snippets — not full pages)
+### Open Questions — uncertainties you plan to verify
+### Blockers — anything preventing progress that isn't already captured in `tasks.last_error`

 Update incrementally — do not rewrite from scratch each time.
@@ -4,6 +4,7 @@ description: Periodically self-assess output quality to catch degradation before
 metadata:
  author: hive
  type: default-skill
+  visibility: [worker]
 ---

 ## Operational Protocol: Quality Self-Assessment
@@ -1,17 +0,0 @@
---
-name: hive.task-decomposition
-description: Decompose complex tasks into explicit subtasks before diving in.
-metadata:
-  author: hive
-  type: default-skill
---
-
-## Operational Protocol: Task Decomposition
-
-Before starting a complex task:
-
-1. Decompose — break into numbered subtasks in `_working_notes` Current Plan
-2. Estimate — relative effort per subtask (small/medium/large)
-3. Execute — work through in order, mark ✓ when complete
-4. Budget — if running low on iterations, prioritize by impact
-5. Verify — before declaring done, every subtask must be ✓, skipped (with reason), or blocked
@@ -21,10 +21,32 @@ Each skill is a directory containing a `SKILL.md`. At startup, only the frontmat

 ### Choosing where to put a new skill

- **Project-scoped**: put under `<project>/.hive/skills/` when the skill is tied to that codebase's APIs, conventions, or infra.
- **User-scoped**: put under `~/.hive/skills/` when the skill is reusable across projects for this machine/user.
+- **Colony-scoped (via `create_colony`)**: when the skill is the operational protocol a single colony needs — its API auth, DOM selectors, DB schema, task-queue conventions — do NOT place it under `~/.hive/skills/` or `<project>/.hive/skills/` yourself. Those roots are SHARED and every colony on the machine will see it. Instead, pass the skill content INLINE to the `create_colony` tool (`skill_name`, `skill_description`, `skill_body`, optional `skill_files`). The tool materializes the folder under `~/.hive/colonies/<colony_name>/.hive/skills/<skill-name>/` where it is discovered as **project scope** by only that colony's workers. See the subsection below.
+- **Project-scoped**: put under `<project>/.hive/skills/` when the skill is tied to that codebase's APIs, conventions, or infra and multiple agents in the project should share it.
+- **User-scoped**: put under `~/.hive/skills/` when the skill is reusable across projects for this machine/user and all agents should see it.
 - **Framework default**: add under `core/framework/skills/_default_skills/` AND register in `framework/skills/defaults.py::SKILL_REGISTRY` only when the skill is a universal operational protocol shipped with Hive. Default skills use the `hive.<name>` naming convention and include `type: default-skill` in metadata.

+### Colony-scoped skills via `create_colony`
+
+A colony-scoped skill is one that belongs to exactly ONE colony — e.g. it encodes the HoneyComb staging API the `honeycomb_research` colony polls, or the LinkedIn outbound flow the `linkedin_outbound_campaign` colony runs. Writing such a skill at `~/.hive/skills/` or `<project>/.hive/skills/` leaks it to every other colony, which will then see it at selection time.
+
+**Do not reach for `write_file` to create the folder.** The `create_colony` tool takes the skill content INLINE and places it for you:
+
+```
+create_colony(
+    colony_name="honeycomb_research",
+    task="Build a daily honeycomb market report…",
+    skill_name="honeycomb-api-protocol",
+    skill_description="How to query the HoneyComb staging API…",
+    skill_body="## Operational Protocol\n\nAuth: …",
+    skill_files=[{"path": "scripts/fetch_tickers.py", "content": "…"}],  # optional
+)
+```
+
+The tool writes `~/.hive/colonies/honeycomb_research/.hive/skills/honeycomb-api-protocol/SKILL.md` (plus any `skill_files`), which `SkillDiscovery` picks up as project scope when that colony's workers start — and ONLY that colony's workers. No cross-colony leakage.
+
+Do not write colony-bound skill folders by hand under `~/.hive/skills/`. A skill placed there is user-scoped and becomes visible to every colony on the machine — defeating the isolation you wanted.
+
 ### Directory layout

 ```
@@ -124,8 +146,8 @@ For Python scripts in a Hive project, prefer `uv run scripts/foo.py ...`.
 ### Creating a new skill — workflow

 1. Pick a `<skill-name>` (lowercase-hyphenated).
-2. Decide scope: project (`<project>/.hive/skills/`), user (`~/.hive/skills/`), or framework default (`core/framework/skills/_default_skills/` + registry entry).
-3. Create the directory and write `SKILL.md` with frontmatter + body.
+2. Decide scope: **colony** (pass content INLINE to `create_colony` — STOP here, do not hand-author the folder), project (`<project>/.hive/skills/`), user (`~/.hive/skills/`), or framework default (`core/framework/skills/_default_skills/` + registry entry).
+3. For the non-colony scopes: create the directory and write `SKILL.md` with frontmatter + body.
 4. Add `scripts/`, `references/`, `assets/` only if needed.
 5. Validate the frontmatter: name matches dir, description is specific, no forbidden characters.
 6. Validate using the Hive CLI:
@@ -1,6 +1,6 @@
 ---
 name: hive.browser-automation
-description: Required before any browser_* tool call. Teaches the screenshot + browser_click_coordinate workflow that reaches shadow-DOM inputs selectors can't see, rich-text editor quirks ("send button stays disabled" failures), and CSP gotchas. Covers Chrome via CDP through the GCU Beeline extension. Skipping this causes repeated failures on LinkedIn / Reddit / X. Verified against real production sites 2026-04-11.
+description: Required before any browser_* tool call. Teaches the screenshot + browser_click_coordinate workflow that reaches shadow-DOM inputs selectors can't see, the fractional-coordinate rule (viewport fractions 0..1, not pixels), rich-text editor quirks ("send button stays disabled" failures), and CSP gotchas. Covers Chrome via CDP through the GCU Beeline extension. Skipping this causes repeated failures on LinkedIn / Reddit / X. Verified against real production sites 2026-04-11.
 metadata:
  author: hive
  type: default-skill
@@ -14,72 +14,60 @@ All GCU browser tools drive a real Chrome instance through the Beeline extension

 ## Coordinates

-Screenshots are delivered at the CSS viewport's own dimensions. A pixel you see in the screenshot is the same coordinate `browser_click_coordinate` expects — no conversion, no scale factors.
+Every browser tool that takes or returns coordinates operates in **fractions of the viewport (0..1 for both axes)**. Read a target's proportional position off `browser_screenshot` — "this button is about 35% from the left and 20% from the top" → pass `(0.35, 0.20)`. Rect-returning tools (`browser_get_rect`, `browser_shadow_query`, and the `rect` inside `focused_element`) also return fractions. The tools convert to CSS pixels internally before dispatching to Chrome.

 ```
-browser_screenshot()                  → image at CSS-viewport size (JPEG)
-browser_click_coordinate(x, y)        → same (x, y)
-browser_hover_coordinate(x, y)        → same (x, y)
-browser_press_at(x, y, key)           → same (x, y)
-browser_get_rect(selector) → rect.css → pass rect.css.cx, rect.css.cy to any of the above
-browser_shadow_query(...)  → sq.css   → same
+browser_screenshot()                  → image + cssWidth/cssHeight in meta
+browser_click_coordinate(x, y)        → x, y are fractions 0..1
+browser_hover_coordinate(x, y)        → fractions
+browser_press_at(x, y, key)           → fractions
+browser_get_rect(selector) → rect     → rect.cx / rect.cy are fractions
+browser_shadow_query(...)  → rect     → same
 ```

-**Exception for zoomed elements:** pages that use `zoom` or `transform: scale()` on a container (LinkedIn's `#interop-outlet`, some embedded iframes) render in a scaled local coordinate space. `getBoundingClientRect` there may not match CDP's hit space. Use `browser_shadow_query` which handles the math, or visually pick coordinates from a screenshot.
+**Why fractions:** every vision model (Claude ~1.15 MP target, GPT-4o 512-px tiles, Gemini, local VLMs) resizes or tiles images differently before the model sees the pixels. Proportions survive every such transform; pixel coordinates only "work" per-model and silently break when you swap backends. Four-decimal precision (`0.0001` ≈ 0.17 CSS px on a 1717-wide viewport) is more than enough for the tightest targets.
+
+**Exception for zoomed elements:** pages that use `zoom` or `transform: scale()` on a container (LinkedIn's `#interop-outlet`, some embedded iframes) render in a scaled local coordinate space. `getBoundingClientRect` there may not match CDP's hit space. Prefer `browser_shadow_query` (which handles the math and returns fractions) or visually pick coordinates from a screenshot. Avoid raw `browser_evaluate` + `getBoundingClientRect()` for coord lookup — that returns CSS px and will be wrong when fed to click tools.
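The fraction-to-pixel relationship described above can be illustrated with a short sketch. The tools perform this conversion internally, and their actual logic may differ; `fraction_to_css_px` and `rect_center_fraction` are hypothetical helpers that only show the arithmetic, using the `cssWidth`/`cssHeight` values a screenshot reports.

```python
def fraction_to_css_px(fx, fy, css_width, css_height):
    """Convert viewport fractions (0..1) to CSS-pixel coordinates."""
    if not (0.0 <= fx <= 1.0 and 0.0 <= fy <= 1.0):
        raise ValueError("coordinates must be fractions in 0..1")
    return fx * css_width, fy * css_height


def rect_center_fraction(left, top, width, height, css_width, css_height):
    """Convert a CSS-pixel rect to the fractional position of its center."""
    cx = (left + width / 2) / css_width
    cy = (top + height / 2) / css_height
    # Four decimals matches the precision the skill text recommends.
    return round(cx, 4), round(cy, 4)
```

For a button "about 35% from the left and 20% from the top", `fraction_to_css_px(0.35, 0.20, css_width, css_height)` gives the point the click would land on in CSS pixels, whatever size the vision model resized the screenshot to.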

 ## Screenshot + coordinates is shadow-agnostic — prefer it on shadow-heavy sites

-On sites that use Shadow DOM heavily (Reddit's faceplate Web Components, LinkedIn's `#interop-outlet` messaging overlay, some X custom elements), **coordinate-based operations reach elements that selector-based tools can't see.**
+Start with `browser_snapshot` when you need to inspect the page structure or find ordinary controls. If the snapshot does not show the thing you need, shows stale or misleading refs, or cannot prove where a visible target is, take `browser_screenshot` and use the screenshot + coordinate path. This is especially useful on sites that use Shadow DOM heavily.

 Why:

- **CDP hit testing walks shadow roots natively.** `browser_click_coordinate(css_x, css_y)` routes through Chrome's native hit tester, which traverses open shadow roots automatically. You don't need to know the shadow structure.
+- **CDP hit testing walks shadow roots natively.** `browser_click_coordinate(x, y)` routes through Chrome's native hit tester, which traverses open shadow roots automatically. You don't need to know the shadow structure.
 - **Keyboard dispatch follows focus** into shadow roots. After a click focuses an input (even one three shadow levels deep), `browser_press(...)` with no selector dispatches keys to `document.activeElement`'s computed focus target.
 - **Screenshots render the real layout** regardless of DOM implementation.

-Whereas `wait_for_selector`, `browser_click(selector=...)`, `browser_type(selector=...)` all use `document.querySelector` under the hood, which **stops at shadow boundaries**. They cannot see elements inside shadow roots.
+By contrast, `wait_for_selector`, `browser_click(selector=...)`, and `browser_type(selector=...)` all use `document.querySelector` under the hood, which **stops at shadow boundaries**. They cannot see elements inside shadow roots. For shadow-DOM inputs, use `browser_type_focused` after focusing via click-coordinate.

 ### Recommended workflow on shadow-heavy sites

-1. `browser_screenshot()` → visual image (delivered at the CSS-viewport's own dimensions).
-2. Identify the target visually → pixel `(x, y)` read straight off the image.
-3. `browser_click_coordinate(x, y)` → clicks there. **The response includes `focused_element: {tag, id, role, contenteditable, rect, ...}`** — use it to verify you actually focused what you intended.
-4. `browser_type_focused(text="...")` → dispatches CDP `Input.insertText` to `document.activeElement`. Shadow roots, iframes, Lexical, Draft.js, ProseMirror all just work. Use `browser_type(selector, text)` only when you want to target a different element than the one you just focused.
+1. `browser_screenshot()` → JPEG; meta includes `cssWidth`/`cssHeight` for reference.
+2. Identify the target visually → estimate its proportional position `(fx, fy)` where each is in `0..1`.
+3. `browser_click_coordinate(fx, fy)` → tool converts to CSS px and dispatches; CDP native hit testing focuses the element. **The response includes `focused_element: {tag, id, role, contenteditable, rect, inFrame?, ...}`** — use it to verify you actually focused what you intended. `rect` is in fractions (same space as your input). When focus is inside a same-origin iframe, the descriptor reports the inner element and adds `inFrame: [...]` breadcrumbs.
+4. `browser_type_focused(text="...")` → inserts text into `document.activeElement` (traverses into same-origin iframes automatically). Shadow roots, iframes, Lexical, Draft.js, ProseMirror all just work. Use `browser_type(selector, text)` instead when you have a reliable CSS selector for a light-DOM element.
 5. Verify via `browser_screenshot` OR `browser_get_attribute` on a known-reachable marker (e.g. check that the Send button's `aria-disabled` flipped to `false`).
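
The fraction convention in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the tool's implementation: the helper names `image_px_to_fraction` and `fraction_to_css_px` are hypothetical, and the viewport numbers in the example are made up. It assumes the screenshot covers the whole viewport, so a pixel position in the image divided by the image size gives the fraction the click tool expects.

```python
def image_px_to_fraction(px, py, image_width, image_height):
    """Turn a pixel position read off a viewport screenshot into (fx, fy) in 0..1."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return px / image_width, py / image_height


def fraction_to_css_px(fx, fy, css_width, css_height):
    """Illustrate the fraction -> CSS px step the click tool is described as doing internally."""
    return fx * css_width, fy * css_height


# Hypothetical numbers: an 800x500 screenshot of a 1600x1000 CSS viewport.
fx, fy = image_px_to_fraction(400, 125, 800, 500)
css_x, css_y = fraction_to_css_px(fx, fy, 1600, 1000)
print(fx, fy, css_x, css_y)
```

Because the fraction is independent of image size, the same `(fx, fy)` stays valid if the screenshot is delivered at a different width.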

 ### The click→type loop (canonical pattern)

-```
-resp = browser_click_coordinate(x, y)       # x, y read straight off the screenshot
-fe = resp.get("focused_element")
-if fe and (fe.get("contenteditable") or fe["tag"] in ("textarea", "input")):
-    browser_type_focused(text="...")        # insertText to activeElement
-else:
-    # you clicked something that isn't editable — refine the pixel and retry.
-    # Do NOT reach for browser_evaluate + execCommand('insertText', ...)
-    # or a walk(root) shadow traversal. The problem is your click, not
-    # the typing method.
-    ...
-```
+1. Call `browser_click_coordinate(x, y)` to click the target element.
+2. Check the `focused_element` field in the response — it tells you what actually received focus (tag, id, role, contenteditable, rect).
+3. If the focused element is editable, call `browser_type_focused(text="...")` to insert text. Use tools to verify the text took effect — prefer checking the underlying `.value` / `innerText` via `browser_evaluate` or confirming the submit button enabled. A screenshot alone can mislead: narrow input boxes visually clip long text, so only a portion may appear on screen even though the full string was accepted.
+4. If it is NOT editable, your click landed on the wrong thing — refine coordinates and retry. Do NOT reach for `browser_evaluate` + `execCommand('insertText')` or shadow-root traversals. The problem is the click target, not the typing method.

-`browser_click` (selector-based) also returns `focused_element`, so the same check works whether you clicked by selector or by coordinate.
+`browser_click` (selector-based) also returns `focused_element`, so the same check works whether you clicked by selector or coordinate.
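
The loop above can be sketched in Python as follows. This is a hedged sketch under stated assumptions: `click` and `type_focused` are stand-in callables for `browser_click_coordinate` and `browser_type_focused` (the real tools are invoked by the agent, not imported), and the exact value types inside `focused_element` are assumed here, in particular that `contenteditable` is either a boolean or the string `"true"`.

```python
from typing import Any, Callable, Mapping, Optional

EDITABLE_TAGS = {"input", "textarea"}


def is_editable(focused: Optional[Mapping[str, Any]]) -> bool:
    """Decide whether a focused_element descriptor looks like a text target."""
    if not focused:
        return False
    tag = str(focused.get("tag", "")).lower()  # tags may be reported upper-case
    ce = focused.get("contenteditable")
    # Assumption: contenteditable arrives as True or the string "true".
    content_editable = ce is True or str(ce).lower() == "true"
    return content_editable or tag in EDITABLE_TAGS


def click_then_type(
    click: Callable[[float, float], Mapping[str, Any]],
    type_focused: Callable[[str], Any],
    fx: float,
    fy: float,
    text: str,
) -> bool:
    """Click, check what got focus, and type only if the target is editable."""
    resp = click(fx, fy)
    if not is_editable(resp.get("focused_element")):
        return False  # wrong click target: refine the coordinates and retry
    type_focused(text)
    return True
```

A `False` return means the click needs adjusting; it is not a signal to switch typing methods.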

 ### Empirically verified (2026-04-11)

 Tested against `https://www.reddit.com/r/programming/` whose search input lives at:
+
 ```
 document > reddit-search-large [shadow]
         > faceplate-search-input#search-input [shadow]
         > input[name="q"]
 ```

- `document.querySelector('input')` → **0 visible inputs** on the page (all in shadow)
- `browser_type('faceplate-search-input input', 'python')` → "Element not found"
- `browser_click_coordinate(617, 28)` → focus trail: `REDDIT-SEARCH-LARGE > FACEPLATE-SEARCH-INPUT > INPUT` ✓
- Char-by-char key dispatch after the click → `input.value === 'python'` ✓
-
-Coordinate pipeline: works perfectly. Selector pipeline: unusable without shadow-piercing syntax.
-
 ### Shadow-piercing selectors

 When you DO want a selector-based approach and know the shadow structure, `browser_shadow_query` and `browser_get_rect` support `>>>` shadow-piercing syntax:
@@ -89,7 +77,7 @@ browser_shadow_query("reddit-search-large >>> #search-input")
 browser_get_rect("#interop-outlet >>> #ember37 >>> p")
 ```

-Returns the element's rect in **CSS pixels** (feed directly to click tools). Remember: `browser_type` and `wait_for_selector` do **not** support `>>>` — only shadow_query and get_rect do.
+Returns the element's rect as **fractions of the viewport** (feed `rect.cx` / `rect.cy` directly to click tools). Remember: `browser_type` and `wait_for_selector` do **not** support `>>>` — only shadow_query and get_rect do.

 ## Navigation and waiting

@@ -97,8 +85,8 @@ Returns the element's rect in **CSS pixels** (feed directly to click tools). Rem

 ```
 browser_navigate(url, wait_until="load")   # "load" | "domcontentloaded" | "networkidle"
-browser_wait_for_selector("h1", timeout_ms=5000)
-browser_wait_for_text("Some text", timeout_ms=5000)
+browser_wait_for_selector("h1", timeout_ms=2000)
+browser_wait_for_text("Some text", timeout_ms=2000)
 browser_go_back()
 browser_go_forward()
 browser_reload()
@@ -108,15 +96,15 @@ All return real URLs and titles. On a fast page `navigate(wait_until="load")` re

 ### Timing expectations (measured against real sites)

-| Site | Navigate load time |
-|---|---|
-| example.com | 100–400 ms |
-| wikipedia.org | 200–500 ms |
-| reddit.com | 1.5–2 s |
-| x.com/twitter | 1.2–1.6 s |
-| linkedin.com (logged in) | 4–5 s |
+| Site                     | Navigate load time |
+| ------------------------ | ------------------ |
+| example.com              | 100–400 ms         |
+| wikipedia.org            | 200–500 ms         |
+| reddit.com               | 1.5–2 s            |
+| x.com/twitter            | 1.2–1.6 s          |
+| linkedin.com (logged in) | 4–5 s              |

-Use `timeout_ms=20000` for LinkedIn and other heavy SPAs to give them margin.
+For LinkedIn and other heavy SPAs, rely on `sleep()` after navigation to let the page hydrate.

 ### After navigate, always let SPA hydrate

@@ -125,8 +113,8 @@ Even after `wait_until="load"`, React/Vue SPAs often render their real chrome in
 ### Reading pages efficiently

 - **Prefer `browser_snapshot` over `browser_get_text("body")`** — returns a compact ~1–5 KB accessibility tree vs 100+ KB of raw HTML.
- Interaction tools (`browser_click`, `browser_type`, `browser_fill`, `browser_scroll`, etc.) return a page snapshot automatically in their result. Use it to decide your next action — do NOT call `browser_snapshot` separately after every action. Only call `browser_snapshot` when you need a fresh view without performing an action, or after setting `auto_snapshot=false`.
- Complex pages (LinkedIn, Twitter/X, SPAs with virtual scrolling) have DOMs that don't match what's visually rendered — snapshot refs may be stale, missing, or misaligned with visible layout. On these pages, `browser_screenshot` is the only reliable way to orient yourself.
+- Interaction tools `browser_click`, `browser_type`, `browser_type_focused`, `browser_fill`, and `browser_scroll` wait 0.5 s for the page to settle after a successful action, then attach a fresh accessibility snapshot under the `snapshot` key of their result. Use it to decide your next action — do NOT call `browser_snapshot` separately after every action. Tune the capture via `auto_snapshot_mode`: `"default"` (full tree, the default), `"simple"` (trims unnamed structural nodes), `"interactive"` (only controls — tightest token footprint), or `"off"` to skip the capture entirely (useful when batching several interactions and you don't need the intermediate trees). Call `browser_snapshot` explicitly only when you need a newer view or a different mode than what was auto-captured.
+- Complex pages (LinkedIn, Twitter/X, SPAs with virtual scrolling) can have DOMs that don't match what's visually rendered — snapshot refs may be stale, missing, or misaligned with visible layout. Try the available snapshot first; when the target is not present in that snapshot or visual position matters, switch to `browser_screenshot` to orient yourself.
 - Only fall back to `browser_get_text` for extracting specific small elements by CSS selector.

 ## Typing and keyboard input
@@ -137,7 +125,7 @@ Even after `wait_until="load"`, React/Vue SPAs often render their real chrome in

 Why this is necessary:

- **React / Vue controlled components** don't trust JS-sourced `.focus()`. React uses event delegation and watches for *native* pointer/focus events — a `click` dispatched via CDP fires the real `pointerdown`/`pointerup`/`click`/`focus` sequence that React listens to, and updates its internal state. A JS-only `.focus()` sets `document.activeElement` but the framework's controlled state doesn't see it.
+- **React / Vue controlled components** don't trust JS-sourced `.focus()`. React uses event delegation and watches for _native_ pointer/focus events — a `click` dispatched via CDP fires the real `pointerdown`/`pointerup`/`click`/`focus` sequence that React listens to, and updates its internal state. A JS-only `.focus()` sets `document.activeElement` but the framework's controlled state doesn't see it.
 - **Draft.js** (X/Twitter compose) and **Lexical** (Gmail, LinkedIn DMs) use contenteditable divs with immutable editor state. They only enter "edit mode" after a real click on the editor surface. Typing at them without clicking routes keys to `document.body` or gets silently discarded.
 - **Send/submit buttons are bound to framework state**, not DOM state. They're typically `disabled={!hasRealContent}` where `hasRealContent` is computed from React/Vue/Svelte state. The input field can have characters in the DOM but the button stays disabled because the framework never saw a real input event.

@@ -145,44 +133,15 @@ The symptom is always the same: **you type, the characters appear visually, and

 ### Safe "click-then-type-then-verify" pattern

-```
-# 1. Focus the real element via a real click (not JS .focus()).
-rect = browser_get_rect(selector)             # or browser_shadow_query for shadow sites
-browser_click_coordinate(rect.css.cx, rect.css.cy)   # rect.css.cx/cy — matched pair
-sleep(0.5)                                     # let the editor open / focus settle
+1. **Focus** the real element via a real click (not JS `.focus()`). Use `browser_get_rect(selector)` (or `browser_shadow_query` for shadow sites) to get coordinates, then `browser_click_coordinate(cx, cy)`. Wait ~0.5 s for the editor to open and focus to settle.

-# 2. Type. browser_type now uses CDP Input.insertText by default, which is
-#    the most reliable way to insert text into rich editors (Lexical,
-#    Draft.js, ProseMirror, any React-controlled contenteditable).
-browser_type(selector, text)
-sleep(1.0)                                     # let framework state commit
+2. **Type** the text. Use `browser_type(selector, text)` for light-DOM inputs, or `browser_type_focused(text=...)` for shadow-DOM / already-focused inputs. Both use CDP `Input.insertText` by default, which is the most reliable method for rich editors (Lexical, Draft.js, ProseMirror). Wait ~0.5 s for framework state to commit.

-# 3. BEFORE clicking send, verify the submit button is actually enabled.
-#    Don't trust that typing worked — check state.
-state = browser_evaluate("""
-    (function(){
-      const btn = document.querySelector('[data-testid="tweetButton"]');
-      if (!btn) return {exists: false};
-      return {
-        exists: true,
-        disabled: btn.disabled || btn.getAttribute('aria-disabled') === 'true',
-        text: btn.textContent.trim(),
-      };
-    })()
-""")
+3. **Verify** the submit button is enabled before clicking it. Use `browser_evaluate` to check the button's `disabled` or `aria-disabled` attribute. Do NOT trust that typing worked — always check state.

-# 4. Only click send if the button is enabled.
-if not state['disabled']:
-    browser_click(submit_selector)
-else:
-    # Recovery: sometimes a click-again + one extra keystroke nudges
-    # React into recomputing hasRealContent.
-    browser_click_coordinate(rect.css.cx, rect.css.cy)   # rect.css.cx/cy — matched pair
-    browser_press("End")
-    browser_press(" ")
-    browser_press("Backspace")
-    # re-check state
-```
+   **Partial visibility is fine.** Small single-line inputs, chat boxes with fixed width, and search fields commonly clip or truncate long text visually — only the tail or head may be shown on screen. Don't treat that as failure. What matters is that the framework accepted the input: the submit button enabled, or `element.value` / `innerText` read via `browser_evaluate` contains the full string. If the visible pixels don't match what you typed but the button is enabled and the underlying value is correct, typing succeeded — proceed.
+
+4. **Only click send if the button is enabled.** If the button is still disabled, try the recovery dance: click the textarea again, press `End`, press a space, then press `Backspace`. This sometimes nudges React into recomputing `hasRealContent`. Then re-check the button state.
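
Steps 3 and 4 can be sketched as a small gate around the send click. This is an illustrative sketch, not part of the toolset: `get_button_state`, `refocus_input`, `press`, and `click_send` are hypothetical callables standing in for `browser_evaluate`, `browser_click_coordinate`, `browser_press`, and `browser_click`, and the `{"exists", "disabled"}` state shape is an assumption about what your own `browser_evaluate` snippet returns.

```python
from typing import Any, Callable, Mapping


def send_when_enabled(
    get_button_state: Callable[[], Mapping[str, Any]],
    refocus_input: Callable[[], Any],
    press: Callable[[str], Any],
    click_send: Callable[[], Any],
    max_recoveries: int = 1,
) -> bool:
    """Click send only once the button reports enabled; otherwise try the recovery dance."""
    for attempt in range(max_recoveries + 1):
        state = get_button_state()
        if state.get("exists") and not state.get("disabled"):
            click_send()
            return True
        if attempt < max_recoveries:
            # Recovery dance: re-click the input, then End, space, Backspace.
            refocus_input()
            for key in ("End", " ", "Backspace"):
                press(key)
    return False  # still disabled: do not click send
```

Returning `False` without clicking keeps a disabled button from being treated as a successful send.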

 ### Why `browser_type` uses `Input.insertText` by default

@@ -215,16 +174,16 @@ Always include an equivalent cleanup block in any script that types into a compo

 ### Verified site-specific quirks

-| Site | Editor | Workaround |
-|---|---|---|
-| **X / Twitter** compose | Draft.js | Click `[data-testid='tweetTextarea_0']` first, then type with `delay_ms=20`. First 1-2 chars may be eaten — accept truncation or prepend a throwaway char. Verify `[data-testid='tweetButton']` has `disabled: false` before clicking. |
-| **LinkedIn** messaging | contenteditable (inside `#interop-outlet` shadow root) | Use `browser_shadow_query` to find the rect, click-coordinate to focus, then type via focus-based key dispatch (selector-based type can't reach shadow). Send button is `.msg-form__send-button`. |
-| **LinkedIn** feed post composer | Quill/LinkedIn custom | Click the "Start a post" trigger first, wait 1s for modal, click the textarea, type. |
-| **Reddit** comment/post box | ProseMirror | Click the textarea, wait 0.5s for the toolbar to mount, then type. Submit is `button[slot="submit-button"]` inside a shreddit-composer. |
-| **Gmail** compose | Lexical | Click the body first. Gmail has a visible `div[contenteditable=true][aria-label*='Message Body']` after opening a compose window. |
-| **Slack** message box | contenteditable | Click first, then type. Send is a paper-plane button with `data-qa='texty_send_button'`. |
-| **Discord** | Slate | Click first. Discord's send is implicit on Enter (no button), so just press Enter after typing. |
-| **Monaco** editors (GitHub code review, CodeSandbox) | Monaco | Click first, type with `delay_ms=10`. Monaco listens for `textarea` input events on a hidden textarea — requires focus to be on that textarea. |
+| Site                                                 | Editor                                                 | Workaround                                                                                                                                                                                                                             |
+| ---------------------------------------------------- | ------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **X / Twitter** compose                              | Draft.js                                               | Click `[data-testid='tweetTextarea_0']` first, then type with `delay_ms=20`. First 1-2 chars may be eaten — accept truncation or prepend a throwaway char. Verify `[data-testid='tweetButton']` has `disabled: false` before clicking. |
+| **LinkedIn** messaging                               | contenteditable (inside `#interop-outlet` shadow root) | Use `browser_shadow_query` to find the rect, click-coordinate to focus, then `browser_type_focused(text=...)` (selector-based `browser_type` can't reach shadow). Send button is `.msg-form__send-button`.                             |
+| **LinkedIn** feed post composer                      | Quill/LinkedIn custom                                  | Click the "Start a post" trigger first, wait 1s for modal, click the textarea, type.                                                                                                                                                   |
+| **Reddit** comment/post box                          | ProseMirror                                            | Click the textarea, wait 0.5s for the toolbar to mount, then type. Submit is `button[slot="submit-button"]` inside a shreddit-composer.                                                                                                |
+| **Gmail** compose                                    | Lexical                                                | Click the body first. Gmail has a visible `div[contenteditable=true][aria-label*='Message Body']` after opening a compose window.                                                                                                      |
+| **Slack** message box                                | contenteditable                                        | Click first, then type. Send is a paper-plane button with `data-qa='texty_send_button'`.                                                                                                                                               |
+| **Discord**                                          | Slate                                                  | Click first. Discord's send is implicit on Enter (no button), so just press Enter after typing.                                                                                                                                        |
+| **Monaco** editors (GitHub code review, CodeSandbox) | Monaco                                                 | Click first, type with `delay_ms=10`. Monaco listens for `textarea` input events on a hidden textarea — requires focus to be on that textarea.                                                                                         |

 ### Plain text into a real input

@@ -238,7 +197,7 @@ browser_type(selector, text)
 - Fires real `keydown` / `keypress` / `input` / `keyup` events — frameworks that branch on `event.key` or `event.code` see the right values
 - Matches what Playwright and Puppeteer send

-Works on real `<input>`, `<textarea>`, and `contenteditable` elements. For shadow-DOM inputs, see the "shadow-heavy sites" section above — `type_text(selector=)` can't see past shadow boundaries.
+Works on real `<input>`, `<textarea>`, and `contenteditable` elements. For shadow-DOM inputs, see the "shadow-heavy sites" section above — `browser_type(selector=)` can't see past shadow boundaries; use `browser_type_focused` after click-coordinate focus.

 ### Keyboard shortcuts (Ctrl+A, Shift+Tab, Cmd+Enter)

@@ -260,12 +219,12 @@ Recognized without modifiers: `Enter`, `Tab`, `Escape`, `Backspace`, `Delete`, `
 ## Screenshots

 ```
-browser_screenshot()                    # viewport, CSS-sized JPEG
-browser_screenshot(full_page=True)      # full scrollable page
+browser_screenshot()                    # viewport, 800 px wide JPEG
+browser_screenshot(full_page=True)      # full scrollable page (overview only — don't click off a full-page shot)
 browser_screenshot(selector="#header")  # clip to element's rect
 ```

-Returns a JPEG (quality 75, ~150–250 KB for a typical UI) at the CSS viewport's own dimensions, plus a JSON metadata block containing `cssWidth`, `devicePixelRatio`, `imageWidth` (= `cssWidth`), and a `scaleHint` confirming image-px == CSS-px. The image is annotated with a highlight rectangle/dot showing the last interaction (click, hover, type) if one happened on this tab.
+Returns a JPEG (quality 75, ~50–120 KB) at 800 px wide. The pixel width is purely a bandwidth choice; all tool coordinates are fractions of the viewport and are invariant to image size. Metadata includes `imageWidth` (800), `cssWidth`, `cssHeight` (for reference), and `physicalScale`. The image is annotated with a highlight rectangle/dot showing the last interaction (click, hover, type) if one happened on this tab.

 The highlight overlay stays visible on the page for **10 seconds** after each interaction, then fades. Before a screenshot is likely, make sure your click / hover / type happens <10 s before the screenshot.

@@ -291,6 +250,7 @@ The highlight overlay stays visible on the page for **10 seconds** after each in
 - Popup appeared that you didn't need? Close it immediately

 `browser_tabs` returns an `origin` field for each tab:
+
 - `"agent"` — you opened it; you own it; close it when done
 - `"popup"` — opened by a link or script; close after extracting what you need
 - `"startup"` or `"user"` — leave these alone unless the task requires it
@@ -303,41 +263,41 @@ The bridge automatically evicts per-tab state (`_cdp_attached`, `_interaction_hi

 ### LinkedIn

-| Target | Selector |
-|---|---|
-| Global search input | `input[data-testid='typeahead-input']` |
-| Own profile link | `a[href*='linkedin.com/in/']` |
-| Messaging overlay | `#interop-outlet >>> [aria-label]` (use shadow_query) |
+| Target              | Selector                                              |
+| ------------------- | ----------------------------------------------------- |
+| Global search input | `input[data-testid='typeahead-input']`                |
+| Own profile link    | `a[href*='linkedin.com/in/']`                         |
+| Messaging overlay   | `#interop-outlet >>> [aria-label]` (use shadow_query) |

 LinkedIn enforces **strict Trusted Types CSP**. Any script you inject via `browser_evaluate` that uses `innerHTML = "<...>"` will be **silently dropped** — the wrapper element gets added but its content is empty, no console error. Always use `createElement` + `appendChild` + `setAttribute` for DOM injection on LinkedIn. `style.cssText`, `textContent`, and `.value` assignments are fine (they don't go through the Trusted Types sink).

 ### Reddit (new reddit / shreddit)

-| Target | Selector |
-|---|---|
+| Target                | Selector                                                                     |
+| --------------------- | ---------------------------------------------------------------------------- |
 | Search input (shadow) | `reddit-search-large >>> #search-input` (rect only; type via click-to-focus) |
-| Reddit logo (home) | `#reddit-logo` |
-| Subreddit posts | `shreddit-post` custom elements |
-| Create post button | `a[href*='/submit']` |
+| Reddit logo (home)    | `#reddit-logo`                                                               |
+| Subreddit posts       | `shreddit-post` custom elements                                              |
+| Create post button    | `a[href*='/submit']`                                                         |

 Reddit's search input lives **two shadow levels deep** inside `reddit-search-large > faceplate-search-input`. You cannot reach it with `browser_type(selector=)`. The working pattern:

 1. `browser_shadow_query("reddit-search-large >>> #search-input")` → rect
-2. `browser_click_coordinate(rect.css.cx, rect.css.cy)   # rect.css.cx/cy — matched pair` → click lands on the real shadow input via native hit testing; input becomes focused
-3. `browser_press(c)` for each character → dispatches to focused element
+2. `browser_click_coordinate(rect.cx, rect.cy)` → click lands on the real shadow input via native hit testing; input becomes focused
+3. `browser_type_focused(text="query")` → dispatches to focused element via `Input.insertText`
 4. Verify by reading `.value` via `browser_evaluate` walking the shadow path
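
For step 4, one way to build the JavaScript expression that walks the shadow path is sketched below. The helper `shadow_value_expression` is hypothetical and only reuses the `>>>` notation as a convenient way to write the path; it is not how `browser_shadow_query` parses selectors. It assumes every shadow root on the path is open, since closed roots do not expose `.shadowRoot`.

```python
import json


def shadow_value_expression(piercing_selector: str, prop: str = "value") -> str:
    """Build a JS expression that reads `prop` from an element behind open shadow roots."""
    parts = [p.strip() for p in piercing_selector.split(">>>")]
    if any(not p for p in parts):
        raise ValueError("empty selector segment")
    expr = "document"
    for i, part in enumerate(parts):
        if i > 0:
            expr += "?.shadowRoot"  # step into the previous element's shadow root
            expr += "?"
        # json.dumps yields a valid, safely quoted JS string literal
        expr += f".querySelector({json.dumps(part)})"
    return f"{expr}?.{prop}"


# Path taken from the Reddit search structure described in this guide.
print(shadow_value_expression('reddit-search-large >>> #search-input >>> input[name="q"]'))
```

The optional chaining makes the expression evaluate to `undefined` instead of throwing when any hop is missing, which is easier to handle in the `browser_evaluate` result.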

 ### X / Twitter

-| Target | Selector |
-|---|---|
-| Main search input | `input[data-testid='SearchBox_Search_Input']` |
-| Home nav link | `a[data-testid='AppTabBar_Home_Link']` |
-| Post text area (compose) | `[data-testid='tweetTextarea_0']` |
-| Reply buttons on feed | `[data-testid='reply']` |
-| Post / Tweet submit button | `[data-testid='tweetButton']` |
-| Caret (⋯) menu on a post | `[data-testid='caret']` |
-| Confirmation sheet button | `[data-testid='confirmationSheetConfirm']` |
+| Target                     | Selector                                      |
+| -------------------------- | --------------------------------------------- |
+| Main search input          | `input[data-testid='SearchBox_Search_Input']` |
+| Home nav link              | `a[data-testid='AppTabBar_Home_Link']`        |
+| Post text area (compose)   | `[data-testid='tweetTextarea_0']`             |
+| Reply buttons on feed      | `[data-testid='reply']`                       |
+| Post / Tweet submit button | `[data-testid='tweetButton']`                 |
+| Caret (⋯) menu on a post   | `[data-testid='caret']`                       |
+| Confirmation sheet button  | `[data-testid='confirmationSheetConfirm']`    |

 **X uses Draft.js for the compose text editor**, which does NOT accept synthetic input reliably. Working workaround: `browser_type(selector='[data-testid="tweetTextarea_0"]', text="...", delay_ms=20)`. The delay gives Draft.js time to process each keystroke. The first 1–2 characters may still get eaten — accept minor truncation or prepend a throwaway character. After typing, check `[data-testid="tweetButton"]` has `disabled: false` before clicking submit.

@@ -393,11 +353,12 @@ Then pass the most specific selector that uniquely identifies the right input (e
 - **Typing into a rich-text editor without clicking first → send button stays disabled.** Draft.js (X), Lexical (Gmail, LinkedIn DMs), ProseMirror (Reddit), and React-controlled `contenteditable` elements only register input as "real" when the element received a native focus event — JS-sourced `.focus()` is not enough. `browser_type` now does this automatically via a real CDP pointer click before inserting text, but always verify the submit button's `disabled` state before clicking send. See the "ALWAYS click before typing" section above.
 - **Using per-character `keyDown` on Lexical / Draft.js editors → keys dispatch but text never appears.** Those editors intercept `beforeinput` and route insertion through their own state machine; raw keyDown events are silently dropped. `browser_type` now uses `Input.insertText` by default (the CDP IME-commit method) which these editors accept cleanly. Only set `use_insert_text=False` when you explicitly need per-keystroke dispatch.
 - **Leaving a composer with text then trying to navigate → `beforeunload` dialog hangs the bridge.** LinkedIn and several other sites pop a native "unsent message" confirm. `browser_navigate` and `close_tab` both time out against this. Always strip `window.onbeforeunload = null` via `browser_evaluate` before any navigation after typing in a composer, or wrap your logic in a `try/finally` that runs the cleanup block.
- **Click landed in the wrong region (sidebar / header instead of target).** The `focused_element` in the click response shows what actually got focused (e.g. `className: "msg-conversation-listitem__link"` means you hit the messaging sidebar). Treat it as ground truth — if it isn't the target, adjust the pixel and retry. Screenshot pixels equal CSS pixels, so the number you passed is the number CDP clicked; a wrong result means you picked the wrong pixel, not that any conversion went sideways.
+- **Click landed in the wrong region (sidebar / header instead of target).** Check `focused_element` in the click response — it's ground truth for what actually got focused, including the `inFrame` breadcrumb when focus ends up inside a same-origin iframe. If it isn't the target (e.g. `className: "msg-conversation-listitem__link"` when you meant to hit a composer), adjust the fraction and retry. Coordinates you pass are fractions of the viewport; the tool multiplies by `cssWidth` / `cssHeight` internally, so a wrong result means your estimated proportion was off — not that any scale went sideways.
+- **Accidentally passing pixels to click / hover / press_at.** The tools reject any coord outside `[-0.1, 1.5]` with a clear error. If you see that error, you passed a pixel (like 815) instead of a fraction (like 0.475). Use `browser_get_rect` to get exact fractional cx/cy, or read proportions off `browser_screenshot`.
 - **Calling `wait_for_selector` on a shadow element.** It'll always time out. Use `browser_shadow_query` or the screenshot + coordinate strategy.
 - **Relying on `innerHTML` in injected scripts on LinkedIn.** Silently discarded. Use `createElement` + `appendChild`.
 - **Not waiting for SPA hydration.** `wait_until="load"` fires before React/Vue rendering on many sites. Add a 2–3 s sleep before querying for chrome elements.
- **Using `browser_type(selector)` on LinkedIn DMs or any shadow-DOM input.** Won't find the element. Fall back to click-to-focus + `browser_press` per character.
+- **Using `browser_type(selector)` on LinkedIn DMs or any shadow-DOM input.** Won't find the element. Use `browser_click_coordinate` to focus, then `browser_type_focused(text=...)` to type.
 - **Clicking a "Photo" / "Attach" / "Upload" button to pick a file.** This opens Chrome's NATIVE OS file picker, which is rendered outside the web page and cannot be interacted with via CDP. Your automation will hang staring at an unreachable dialog. ALWAYS use `browser_upload(selector, file_paths)` against the underlying `<input type='file'>` element — see the "File uploads" section above for the full pattern. This is the single most common way to wedge a browser session on compose-with-media flows (X/LinkedIn/Gmail).
 - **Keyboard shortcuts without the `code` field.** Chrome's shortcut dispatcher ignores keyboard events that lack a `code` or `windowsVirtualKeyCode`. `browser_press(..., modifiers=[...])` populates these automatically; raw `Input.dispatchKeyEvent` calls from `browser_evaluate` may not.
 - **Taking a screenshot more than 10s after the last interaction** and expecting the highlight to still be visible. The overlay fades after 10s. Take the screenshot sooner, or re-trigger the interaction.
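
The pixel-versus-fraction check described in the list above can be mirrored on the caller's side before a coordinate is ever sent. A minimal sketch, assuming the `[-0.1, 1.5]` range stated above; the helper name `check_fraction` and its error wording are hypothetical, not the tools' actual error.

```python
def check_fraction(value: float, name: str = "coord") -> float:
    """Reject values that look like pixels rather than viewport fractions."""
    if not -0.1 <= value <= 1.5:
        raise ValueError(
            f"{name}={value!r} is outside [-0.1, 1.5]; "
            "this looks like a pixel, pass a fraction of the viewport instead"
        )
    return value


fx = check_fraction(0.475, "x")  # fine: a fraction
# check_fraction(815, "x") would raise ValueError: a pixel was passed
```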
@@ -409,17 +370,35 @@ If Chrome detaches the debugger for its own reasons (tab closed, user opened Dev

 If reattach also fails, you'll get the underlying CDP error string — that's a real problem, usually the tab is gone.

-## When to reach for `browser_evaluate`
+## `browser_evaluate` is a last-resort escape hatch

-Use it when:
- You need to read state from inside a shadow root that `browser_get_rect` doesn't handle
- You need a one-shot JS snippet to trigger a site-specific action (scroll a specific container, open a menu, set a form field value directly)
- You need to walk an AX tree or measure layout that the standard tools don't expose
+**Before using `browser_evaluate`, try these first — in this order:**

-Avoid it when:
- A standard tool (`browser_click_coordinate`, `browser_type`, `browser_press`) already does what you need. Those go through CDP's native event pipeline, which real sites trust more than synthetic JS dispatch.
- You're on a strict-CSP site and want to inject DOM — stick to `createElement` + `appendChild`, never `innerHTML`.
- You need to trigger React / Vue / framework state changes — those frameworks watch for real browser events (`input`, `change`, `click`), not scripted `dispatchEvent` calls. Native-event tools are more reliable.
+1. **`browser_screenshot` + `browser_click_coordinate`** — works on every site regardless of shadow DOM, iframes, obfuscated classes. This is the default path for "click a thing you can see."
+2. **`browser_type(use_insert_text=True, text=...)`** — for typing into ANY input/contenteditable, including Lexical and Draft.js. Handles click-focus-insert with built-in retries. Do **not** call `document.execCommand('insertText')` via evaluate; this tool already does it correctly.
+3. **`browser_shadow_query`** or **`browser_get_rect(selector)`** with the `>>>` shadow-piercing syntax — for selector-based lookups across shadow roots.
+4. **`browser_get_text` / `browser_get_attribute`** — for reading element state by selector.
+5. **`browser_snapshot`** — for dumping the accessibility tree of the page.
+
+If any of those fits your goal, **do not use `browser_evaluate`.** Each evaluate call is a small LLM round-trip of ~30-100 tokens of JS plus a JSON response; five of them burn more context than a single screenshot-and-coordinate does, with less reliability.
+
+### Anti-patterns — stop immediately if you catch yourself doing these
+
+- **Trying multiple `querySelectorAll` variants when the first returned `[]`.** Different selectors on the same page rarely work if the first guess failed — modern SPAs obfuscate class names at build time. After one empty result, switch to `browser_screenshot` + `browser_click_coordinate`. Do not write `.artdeco-list__item`, then `[data-test-incoming-invitation-card]`, then `[class*="invitation"]` — you are already on the wrong path.
+- **Writing `walk(root)` recursive shadow-DOM traversal functions.** Use `browser_shadow_query` — it traverses at the CDP level (native C++), not by re-running a recursive JS function every call.
+- **Calling `document.execCommand('insertText', ...)` to type into a contenteditable.** Use `browser_type(use_insert_text=True, text='...')`. The high-level tool handles the exact same Lexical/Draft.js case but with click-focus-retry logic built in.
+- **Accessing `iframe.contentDocument`.** Rarely works (cross-origin, late hydration) and when it does, the code is brittle. Use `browser_screenshot` to see the iframe, then `browser_click_coordinate` to interact.
+- **Using `innerHTML = "<...>"` on a Trusted Types site (LinkedIn, GitHub).** The assignment is silently dropped. Use `createElement` + `appendChild` if you must inject DOM — but first, ask whether you really need to.
+- **Triggering React/Vue state via synthetic `dispatchEvent`.** Frameworks watch for real browser events. Use `browser_click_coordinate`, `browser_press`, or `browser_type` — all go through CDP's native event pipeline.
+
+### Legitimate uses (when nothing semantic fits)
+
+- Reading a computed style, `window.innerWidth/Height`, `document.scrollingElement.scrollTop`, or other layout values the tools don't expose.
+- Firing a one-shot site-specific API call (analytics beacon, feature-flag toggle).
+- Stripping `onbeforeunload` before navigating away from a page with an unsent draft (LinkedIn, Gmail).
+- Detecting whether a specific shadow-root host exists before a follow-up screenshot.
+
+In all of these cases the script is SHORT (< 10 lines) and the result is CONSUMED (read, then acted on), not further probed.

 ## Login & auth walls

@@ -445,7 +424,7 @@ browser_navigate("https://x.com/explore", wait_until="load")
 sleep(3)
 browser_wait_for_selector("input[data-testid='SearchBox_Search_Input']", timeout_ms=5000)
 rect = browser_get_rect("input[data-testid='SearchBox_Search_Input']")
-browser_click_coordinate(rect.css.cx, rect.css.cy)   # rect.css.cx/cy — matched pair
+browser_click_coordinate(rect.cx, rect.cy)
 browser_type("input[data-testid='SearchBox_Search_Input']", "openai", clear_first=True)
 # Screenshot now shows live search suggestions
 browser_screenshot()
@@ -459,10 +438,9 @@ browser_navigate("https://www.reddit.com/r/programming/", wait_until="load")
 sleep(2)
 # Shadow-pierce the nested search input
 sq = browser_shadow_query("reddit-search-large >>> #search-input")
-browser_click_coordinate(sq.css.cx, sq.css.cy)   # sq.css.cx/cy — matched pair
-# Typing can't use selector (shadow); focused input receives raw key presses
-for c in "python":
-    browser_press(c)
+browser_click_coordinate(sq.rect.cx, sq.rect.cy)
+# Typing can't use selector (shadow); use browser_type_focused on the focused input
+browser_type_focused(text="python")
 browser_screenshot()
 browser_press("Escape")
 ```
@@ -470,11 +448,11 @@ browser_press("Escape")
 ### Search LinkedIn and dismiss without submitting

 ```
-browser_navigate("https://www.linkedin.com/feed/", wait_until="load", timeout_ms=20000)
+browser_navigate("https://www.linkedin.com/feed/", wait_until="load")
 sleep(3)
 browser_wait_for_selector("input[data-testid='typeahead-input']", timeout_ms=5000)
 rect = browser_get_rect("input[data-testid='typeahead-input']")
-browser_click_coordinate(rect.css.cx, rect.css.cy)   # rect.css.cx/cy — matched pair
+browser_click_coordinate(rect.cx, rect.cy)
 browser_type("input[data-testid='typeahead-input']", "anthropic", clear_first=True)
 # Dropdown shows real live suggestions
 browser_screenshot()
@@ -13,15 +13,37 @@ metadata:

 LinkedIn is the hardest mainstream site to automate because it combines **shadow DOM** (`#interop-outlet` for messaging), **strict Trusted Types CSP** (silently drops `innerHTML`), **heavy React reconciliation** (injected nodes get stripped on re-render), **native `beforeunload` draft dialogs** (hang the bridge), and **aggressive spam filters**. Every one of those has bit us at least once. This skill documents what actually works.

-**Always activate `browser-automation` first.** This skill assumes you already know about CSS-px coordinates, `browser_type`'s click-first behavior, and `browser_shadow_query`. The guidance below is LinkedIn-specific; general browser rules are there.
+**Always activate `browser-automation` first.** This skill assumes you already know about CSS-px coordinates, `browser_type`/`browser_type_focused`, and `browser_shadow_query`. The guidance below is LinkedIn-specific; general browser rules are there.
+
+## Rule #0: screenshot + coordinates, not selectors
+
+LinkedIn changes class names aggressively and hides composers inside shadow roots AND iframes. **Selectors break constantly.** Your default strategy on every LinkedIn page should be:
+
+1. `browser_screenshot()` — see the page visually
+2. Pick the target's position from the image
+3. `browser_coords(image_x, image_y)` → get CSS pixels
+4. `browser_click_coordinate(css_x, css_y)` — reaches shadow DOM, iframes, and React elements indifferently
+5. `browser_type_focused(text=...)` — types into whatever is focused, including Lexical composers
+
+**If `browser_evaluate(...querySelectorAll...)` returns `[]` even once, do not try a different selector.** Stop, screenshot, and click. The "what if I try `.artdeco-list__item` next" instinct has burned ~50 tool calls in real sessions before the agent pivoted. Don't fall into that loop.
+
+The selectors in the table below are **only** for when you already know the target is in the light DOM and you want a faster path than screenshot+coord. **When in doubt, default to coordinates.**
+
+## Invitation manager — inline message button path is BROKEN
+
+If the user asks to message a connection request **from the invitation manager page without accepting first**, the inline "Message" button opens a composer inside a nested **iframe overlay** (not a shadow root). The iframe's `contentDocument` is either cross-origin-blocked or not hydrated at access time. This path is **not reliably automatable today.**
+
+**Redirect:** click the person's name/profile link on the card, go to the profile page, and use the standard Profile Message flow below. The profile flow is battle-tested; the inline-iframe flow isn't.
+
+If you end up writing `document.activeElement.tagName === 'IFRAME'` inside a `browser_evaluate`, you've hit this trap. Stop and go to the profile page.

 ## Timing expectations

- `browser_navigate(wait_until="load", timeout_ms=20000)` — LinkedIn takes **4–5 seconds** to load the feed cold. Default 30s timeout is fine; use 20s as a floor.
+- `browser_navigate(wait_until="load")` — LinkedIn takes **4–5 seconds** to load the feed cold.
 - After navigation, **always `sleep(3)`** to let React hydrate the profile/feed chrome before querying selectors. Without the sleep `wait_for_selector` will flake on elements that exist moments later.
 - Composer modal slide-in takes **~2 seconds** after you click the Message button.

-## Verified selectors (2026-04-11)
+## Verified selectors

 | Target | Selector | Notes |
 |---|---|---|
@@ -34,14 +56,14 @@ LinkedIn is the hardest mainstream site to automate because it combines **shadow
 | Pending connection card | `.invitation-card, .invitations-card, [data-test-incoming-invitation-card]` | Filter out "invited you to follow" / "subscribe" cards |
 | Accept button | `button[aria-label*="Accept"]` within the card scope | Per-card scoping is critical — there are many Accept buttons on the page |

-LinkedIn changes class names aggressively. If a class-based selector breaks, fall back to **`browser_screenshot` → visual identification → `browser_click_coordinate`** with the pixel you read straight off the image (screenshots are CSS-sized, so no conversion). The screenshot + coord path works regardless of class-name churn and regardless of shadow DOM.
+LinkedIn changes class names aggressively. If a class-based selector breaks, fall back to **`browser_screenshot` → visual identification → `browser_click_coordinate`** with the pixel you read straight off the image (screenshots are CSS-sized, no conversion). The screenshot + coord path works regardless of class-name churn and regardless of shadow DOM.

 ## Profile Message flow (verified end-to-end 2026-04-11)

 ```
 # 1. Load the profile
-browser_navigate("https://www.linkedin.com/in/<username>/", wait_until="load", timeout_ms=20000)
-sleep(4)
+browser_navigate("https://www.linkedin.com/in/<username>/", wait_until="load")
+sleep(3)

 # 2. Strip onbeforeunload before any state-mutating work — prevents draft-dialog deadlock later
 browser_evaluate("""
@@ -98,17 +120,18 @@ textarea = browser_evaluate("""
 browser_click_coordinate(textarea['cx'], textarea['cy'])
 sleep(0.6)

-# 6. Insert text via browser_type WITHOUT a selector. This dispatches
-#    CDP Input.insertText to document.activeElement — the same underlying
+# 6. Insert text via browser_type_focused. This dispatches CDP
+#    Input.insertText to document.activeElement — the same underlying
 #    mechanism as execCommand('insertText') but with no JSON escaping,
 #    no browser_evaluate round trip, and built-in retry. The click in
 #    step 5 already focused Lexical, so insertText lands in the editor
 #    regardless of the shadow wrapping around #interop-outlet.
 #
-#    Do NOT pass a selector here. Selector-based browser_type cannot see
-#    past the #interop-outlet shadow root. No-selector mode sidesteps
-#    that entirely by routing to activeElement.
-browser_type_focused(text=message_text)   # targets document.activeElement
+#    Use browser_type_focused (not browser_type) here — browser_type
+#    requires a selector, which cannot see past the #interop-outlet
+#    shadow root. browser_type_focused targets document.activeElement
+#    directly, sidestepping shadow boundaries entirely.
+browser_type_focused(text=message_text)
 sleep(1.0)   # let Lexical commit state + enable Send button

 # 7. Find the modal Send button (filter by in-viewport, reject pinned bar)
@@ -170,7 +193,7 @@ Daily outbound pattern — accept pending connection requests and send a templat

 ```
 browser_navigate("https://www.linkedin.com/mynetwork/invitation-manager/received/",
-                 wait_until="load", timeout_ms=20000)
+                 wait_until="load")
 sleep(4)
 browser_evaluate("(function(){window.onbeforeunload=null;})()")

@@ -214,7 +237,7 @@ for card in cards[:25]:
 ## Feed post composer flow

 ```
-browser_navigate("https://www.linkedin.com/feed/", wait_until="load", timeout_ms=20000)
+browser_navigate("https://www.linkedin.com/feed/", wait_until="load")
 sleep(4)
 browser_evaluate("(function(){window.onbeforeunload=null;})()")

@@ -301,7 +324,7 @@ If the image isn't already on disk, write it first with `write_file(absolute_pat

 ## Rate limits and safety

-LinkedIn's abuse detection is aggressive. Respect these limits:
+LinkedIn's abuse detection is aggressive. Be aware of these limits and tell the user about them; exceed them only if the user explicitly confirms:

 | Action | Limit |
 |---|---|
@@ -309,8 +332,7 @@ LinkedIn's abuse detection is aggressive. Respect these limits:
 | Outbound messages to new 1st-degree connections | **25/day max**, 5–10s randomized delays |
 | Connection request sends | **100/week max**, spread across days, warm intros preferred |
 | Profile views | Several hundred/day is usually fine but varies by account age |
-| Post publications | 1–3/day, no URL-only posts |
-| Feed reactions | Dozens/day is fine; vary your activity mix |
+| Post publications | 1–5/day, no URL-only posts |

 Signals you're being throttled:
 - "Message failed to send" with no error detail
@@ -323,9 +345,8 @@ If any of those show up, **stop the run, screenshot the state, and surface the i
 ## Common pitfalls

 - **`innerHTML` injection is silently dropped** — LinkedIn's Trusted Types CSP discards any `innerHTML = "<...>"` from injected scripts, no console error. Always use `createElement` + `appendChild` + `setAttribute` for DOM injection. `textContent`, `style.cssText`, and `.value` assignments are fine.
- **Do NOT use selector-based `browser_type` on the message composer — use `browser_type_focused(text=...)`.** The Lexical contenteditable lives inside the `#interop-outlet` iframe/shadow wrapper which `document.querySelector` cannot see. `browser_shadow_query` can find it but selector-based `browser_type` doesn't support the `>>>` shadow-pierce syntax. The reliable insert path is: (1) `browser_click_coordinate` on the composer rect — the response's `focused_element` (which recurses into same-origin iframes) confirms what actually received focus → (2) `browser_type_focused(text=message_text)` — CDP `Input.insertText` dispatches to `document.activeElement` regardless of shadow wrapping.
+- **Use `browser_type_focused` (not `browser_type`) on the message composer.** The Lexical contenteditable lives inside the `#interop-outlet` shadow root which `document.querySelector` (what `browser_type`'s selector path uses under the hood) cannot see. `browser_type` requires a selector and will fail with "Element not found". The reliable insert path is: (1) `browser_click_coordinate` on the composer rect — the response's `focused_element` confirms Lexical received focus → (2) `browser_type_focused(text=message_text)` — CDP `Input.insertText` dispatches to `document.activeElement` regardless of shadow wrapping.
 - **Per-char keyDown on the message composer produces empty text** — Lexical intercepts `beforeinput` and drops raw keys. Use `browser_type_focused(text=..., use_insert_text=True)` after click-coordinate focused the composer. The CDP `Input.insertText` method commits as if IME fired, which Lexical accepts cleanly.
- **ANTI-PATTERN: "inject a dummy `<div id='dummy-target'>` and pass it as the `selector` arg to `browser_type`".** This fails compoundingly: `browser_type` clicks the **dummy div's** rect (not the editor's), the click lands on the Lexical wrapper's non-editable chrome, the contenteditable never receives focus, and `Input.insertText` fires against nothing. The bridge will still return `{"ok": true, "action": "type", "length": N}` because it has no way to verify the text actually landed. Symptom: Send button stays `disabled: true` forever. Fix: `browser_click_coordinate` on the real composer rect, then `browser_type_focused(text=message_text)` — CDP `Input.insertText` dispatches to `document.activeElement`.
 - **Multiple Send buttons on the page** — the pinned bottom-right messaging bar has its own `msg-form__send-button` that's usually below `innerHeight`. Filter by in-viewport before clicking.
 - **`window.onbeforeunload` hangs navigation/close** — after typing in a composer, any `browser_navigate` or `close_tab` can pop a native "unsent message, leave?" confirm dialog that deadlocks the bridge. Always strip `onbeforeunload` before any navigation, and wrap composer flows in a `try/finally` that runs the cleanup block:

@@ -346,7 +367,7 @@ browser_evaluate("""

 ## Auth wall detection

-If you see a "Log in" / "Join LinkedIn" prompt instead of the logged-in feed, **stop immediately** and surface the issue. Do NOT attempt to log in via automation — LinkedIn's bot detection will flag the account.
+If you see a "Log in" / "Join LinkedIn" prompt instead of the logged-in feed, **stop immediately** and surface the issue to the user. Do NOT attempt to log in via automation — LinkedIn's bot detection will flag the account.

 Check via:
 ```
@@ -360,24 +381,15 @@ is_logged_in = browser_evaluate("""

 ## Deduplication pattern

-For any daily loop (connection acceptance, profile visits, DMs), maintain a ledger file:
+Dedup is handled by the colony progress queue, not a separate JSON file. For any daily loop (connection acceptance, profile visits, DMs), the queen enqueues one row in the `tasks` table per `(profile_url, action)` pair; workers claim, act, and mark done. Already-`done` rows are skipped on the next claim — that's your crash-resume and cross-day dedup. See `hive.colony-progress-tracker` for the full claim/update protocol.

-```
-# data/linkedin_contacts.json
-{
-  "contacts": [
-    {
-      "profile_url": "https://www.linkedin.com/in/username/",
-      "name": "First Last",
-      "action": "connection_accepted+message_sent",
-      "timestamp": "2026-04-13T09:30:00Z",
-      "message_preview": "first 50 chars of message sent"
-    }
-  ]
-}
+If you need to check whether a given `(profile_url, action)` has already been handled in a prior run before enqueuing a new row, query the queue directly:
+
+```bash
+sqlite3 "<db_path>" "SELECT status FROM tasks WHERE payload LIKE '%\"profile_url\":\"<url>\"%' AND payload LIKE '%\"action\":\"<action>\"%';"
 ```

-Before any action, check if the profile URL already has a recent entry for the same action. Skip if yes. Atomic-write the ledger after each success so crash-resume works.
+Empty → not yet enqueued, safe to add. Otherwise honor the existing row's status.
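
The `LIKE` lookup above depends on exactly how the payload was serialized (for instance, whether a space follows each colon). A minimal sketch of the same check done by parsing the JSON instead, assuming a `tasks` table with text columns `status` and `payload`; the helper name `find_task_status` and the in-memory demo schema are hypothetical, not part of the colony progress tracker:

```python
import json
import sqlite3


def find_task_status(conn, profile_url, action):
    """Return the status of the first matching task row, or None if absent."""
    # Full scan: fine for a small queue, and independent of JSON spacing.
    for status, payload in conn.execute("SELECT status, payload FROM tasks"):
        try:
            data = json.loads(payload)
        except (TypeError, ValueError):
            continue  # skip rows whose payload is not valid JSON
        if not isinstance(data, dict):
            continue
        if data.get("profile_url") == profile_url and data.get("action") == action:
            return status
    return None


# Demo against an in-memory database with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (status TEXT, payload TEXT)")
conn.execute(
    "INSERT INTO tasks VALUES (?, ?)",
    (
        "done",
        json.dumps({"profile_url": "https://www.linkedin.com/in/example/", "action": "message"}),
    ),
)
print(find_task_status(conn, "https://www.linkedin.com/in/example/", "message"))
print(find_task_status(conn, "https://www.linkedin.com/in/other/", "message"))
```

A `None` result corresponds to the "empty" case: nothing has been enqueued for that pair yet.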

 ## See also

@@ -203,7 +203,7 @@ for c in candidates:
    else:
        browser_click("[data-testid='tweetButton']")
        sleep(2)
-        record_sent(c['preview'], reply_text)  # append to ledger
+        # Mark the task done in progress.db — see hive.colony-progress-tracker

    # Close the composer (press Escape or click the Close button)
    browser_press("Escape")
@@ -307,24 +307,9 @@ If any of these appear, **stop the run, screenshot the state, and surface the is

 ## Deduplication pattern

-Every daily loop should maintain a ledger file. Append after each successful reply/post, atomic-write to survive crashes.
+Dedup is handled by the colony progress queue, not a separate JSON file. The queen enqueues one row in the `tasks` table per reply target (keyed by tweet URL); workers claim, reply, and mark done. Already-`done` rows are skipped on the next claim — that's your crash-resume and cross-day dedup, for free. See `hive.colony-progress-tracker` for the full claim/update protocol.

-```
-# data/x_replies_ledger.json
-{
-  "replies": [
-    {
-      "tweet_url": "https://x.com/<author>/status/<id>",
-      "author": "username",
-      "original_preview": "first 100 chars of the tweet",
-      "reply_text": "what you sent",
-      "timestamp": "2026-04-13T09:30:00Z"
-    }
-  ]
-}
-```
-
-Extract the tweet URL via `browser_evaluate`:
+Extract the tweet URL via `browser_evaluate` so the queen can use it as the task key:

 ```
 url = browser_evaluate("""
@@ -337,7 +322,13 @@ url = browser_evaluate("""
 """, article_index)
 ```

-Before each reply, check if the URL already has a ledger entry. If yes, skip. This survives across runs and across days.
+If you need to check whether a given tweet URL has already been replied to in a prior run (e.g., scanning live search results before enqueuing), query the queue directly:
+
+```bash
+sqlite3 "<db_path>" "SELECT status FROM tasks WHERE payload LIKE '%\"tweet_url\":\"<url>\"%';"
+```
+
+Empty → not yet enqueued, safe to add. Otherwise honor the existing row's status.

 ## Reply style guidelines

@@ -0,0 +1,203 @@
+"""Shared skill authoring primitives.
+
+Validates and materializes a skill folder. Used by three callers:
+
+1. Queen's ``create_colony`` tool (``queen_lifecycle_tools.py``) — inline
+   content passed by the queen during colony creation.
+2. HTTP POST / PUT routes under ``/api/**/skills`` — UI-driven creation.
+3. Future ``create_learned_skill`` tool — runtime learning.
+
+Keeping the validators and writer here ensures the three paths share one
+authority; changes to the name regex or frontmatter layout happen in one
+place.
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+import shutil
+from dataclasses import dataclass, field
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# Framework skill names include dots (``hive.note-taking``), so the
+# validator needs to allow them even though the queen's ``create_colony``
+# tool historically forbade dots. User-created skills without dots still
+# pass; allowing dots just keeps us from rejecting existing framework
+# names when the UI toggles them via ``validate_skill_name``.
+_SKILL_NAME_RE = re.compile(r"^[a-z0-9.-]+$")
+_MAX_NAME_LEN = 64
+_MAX_DESC_LEN = 1024
+
+
+@dataclass
+class SkillFile:
+    """Supporting file bundled with a skill (relative path + content)."""
+
+    rel_path: Path
+    content: str
+
+
+@dataclass
+class SkillDraft:
+    """Validated skill content ready to be written to disk."""
+
+    name: str
+    description: str
+    body: str
+    files: list[SkillFile] = field(default_factory=list)
+
+    @property
+    def skill_md_text(self) -> str:
+        """Assemble the final SKILL.md text (frontmatter + body)."""
+        body_norm = self.body.rstrip() + "\n"
+        return f"---\nname: {self.name}\ndescription: {self.description}\n---\n\n{body_norm}"
+
+
+def validate_skill_name(raw: str) -> tuple[str | None, str | None]:
+    """Return ``(normalized_name, error)``. Either side may be None."""
+    name = (raw or "").strip() if isinstance(raw, str) else ""
+    if not name:
+        return None, "skill_name is required"
+    if not _SKILL_NAME_RE.match(name):
+        return None, f"skill_name '{name}' must match [a-z0-9.-] pattern"
+    if name[0] in "-." or name[-1] in "-." or "--" in name or ".." in name:
+        return None, f"skill_name '{name}' has leading/trailing/consecutive hyphens or dots"
+    if len(name) > _MAX_NAME_LEN:
+        return None, f"skill_name '{name}' exceeds {_MAX_NAME_LEN} chars"
+    return name, None
+
+
+def validate_description(raw: str) -> tuple[str | None, str | None]:
+    desc = (raw or "").strip() if isinstance(raw, str) else ""
+    if not desc:
+        return None, "skill_description is required"
+    if len(desc) > _MAX_DESC_LEN:
+        return None, f"skill_description must be 1–{_MAX_DESC_LEN} chars"
+    # Frontmatter descriptions are line-oriented — the parser reads one value.
+    if "\n" in desc or "\r" in desc:
+        return None, "skill_description must be a single line (no newlines)"
+    return desc, None
+
+
+def validate_files(raw: list[dict] | None) -> tuple[list[SkillFile] | None, str | None]:
+    if not raw:
+        return [], None
+    if not isinstance(raw, list):
+        return None, "skill_files must be an array"
+    out: list[SkillFile] = []
+    for entry in raw:
+        if not isinstance(entry, dict):
+            return None, "each skill_files entry must be an object with 'path' and 'content'"
+        rel_raw = entry.get("path")
+        content = entry.get("content")
+        if not isinstance(rel_raw, str) or not rel_raw.strip():
+            return None, "skill_files entry missing non-empty 'path'"
+        if not isinstance(content, str):
+            return None, f"skill_files entry '{rel_raw}' missing string 'content'"
+        rel_stripped = rel_raw.strip()
+        # Allow './foo' but reject '/foo' — relativizing absolute paths silently
+        # has bitten other tools; make the intent loud instead.
+        if rel_stripped.startswith("./"):
+            rel_stripped = rel_stripped[2:]
+        rel_path = Path(rel_stripped)
+        if rel_stripped.startswith("/") or rel_path.is_absolute() or ".." in rel_path.parts:
+            return None, f"skill_files path '{rel_raw}' must be relative and inside the skill folder"
+        if rel_path.as_posix() == "SKILL.md":
+            return None, "skill_files must not contain SKILL.md — pass skill_body instead"
+        out.append(SkillFile(rel_path=rel_path, content=content))
+    return out, None
+
+
+def build_draft(
+    *,
+    skill_name: str,
+    skill_description: str,
+    skill_body: str,
+    skill_files: list[dict] | None = None,
+) -> tuple[SkillDraft | None, str | None]:
+    """Validate all inputs and return a validated draft ready for writing."""
+    name, err = validate_skill_name(skill_name)
+    if err or name is None:
+        return None, err
+    desc, err = validate_description(skill_description)
+    if err or desc is None:
+        return None, err
+    body = skill_body if isinstance(skill_body, str) else ""
+    if not body.strip():
+        return None, (
+            "skill_body is required — the operational procedure the colony worker needs to run this job unattended"
+        )
+    files, err = validate_files(skill_files)
+    if err or files is None:
+        return None, err
+    return SkillDraft(name=name, description=desc, body=body, files=list(files)), None
+
+
+def write_skill(
+    draft: SkillDraft,
+    *,
+    target_root: Path,
+    replace_existing: bool = True,
+) -> tuple[Path | None, str | None, bool]:
+    """Write the draft under ``target_root/{draft.name}/``.
+
+    ``target_root`` is the parent scope dir (e.g.
+    ``~/.hive/agents/queens/{id}/skills`` or
+    ``{colony_dir}/.hive/skills``). The function creates it if needed.
+
+    Returns ``(installed_path, error, replaced)``. On success ``error`` is
+    ``None``; on failure ``installed_path`` is ``None``. A failed replace
+    may leave the old folder already removed (``replaced`` is ``True``).
+
+    When ``replace_existing=False`` and the target dir already exists,
+    the write is refused with a non-fatal error (caller decides whether
+    to surface it as a 409 or a warning).
+    """
+    try:
+        target_root.mkdir(parents=True, exist_ok=True)
+    except OSError as e:
+        return None, f"failed to create skills root: {e}", False
+
+    target = target_root / draft.name
+    replaced = False
+    try:
+        if target.exists():
+            if not replace_existing:
+                return None, f"skill '{draft.name}' already exists", False
+            # Remove the old dir outright so stale files from a prior
+            # version don't linger alongside the new ones.
+            replaced = True
+            shutil.rmtree(target)
+        target.mkdir(parents=True, exist_ok=False)
+        (target / "SKILL.md").write_text(draft.skill_md_text, encoding="utf-8")
+        for sf in draft.files:
+            full_path = target / sf.rel_path
+            full_path.parent.mkdir(parents=True, exist_ok=True)
+            full_path.write_text(sf.content, encoding="utf-8")
+    except OSError as e:
+        return None, f"failed to write skill folder {target}: {e}", replaced
+    return target, None, replaced
+
+
+def remove_skill(target_root: Path, skill_name: str) -> tuple[bool, str | None]:
+    """Rm-tree the skill directory under ``target_root/{skill_name}/``.
+
+    Returns ``(removed, error)``. ``removed=False, error=None`` means
+    the directory didn't exist (idempotent). Name is validated on the
+    way in so an attacker with UI access can't traverse out of the
+    scope root.
+    """
+    name, err = validate_skill_name(skill_name)
+    if err or name is None:
+        return False, err
+    target = target_root / name
+    if not target.exists():
+        return False, None
+    try:
+        shutil.rmtree(target)
+    except OSError as e:
+        return False, f"failed to remove skill folder {target}: {e}"
+    return True, None
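
A small standalone sketch of how a dotted-name regex interacts with path joining. It re-declares the pattern from this module; `is_safe_skill_name` is a hypothetical helper written for illustration, not a function the module is known to expose:

```python
import re
from pathlib import PurePosixPath

# Same pattern as _SKILL_NAME_RE in the module above.
SKILL_NAME_RE = re.compile(r"^[a-z0-9.-]+$")


def is_safe_skill_name(name: str) -> bool:
    """Hypothetical helper: regex match plus edge checks for '-' and '.'."""
    if not SKILL_NAME_RE.match(name):
        return False
    if name[0] in "-." or name[-1] in "-.":
        return False
    return "--" not in name and ".." not in name


root = PurePosixPath("/scope/skills")
for candidate in ["hive.note-taking", "my-skill", "..", "Bad_Name", "-x"]:
    regex_ok = SKILL_NAME_RE.match(candidate) is not None
    print(candidate, regex_ok, is_safe_skill_name(candidate))

# '..' passes the bare regex; joined onto the root it refers to the parent dir.
print((root / "..").parts)
```

The point of the sketch is that a character-class regex alone does not rule out `.` or `..` as a whole name, so a caller that joins the name onto a scope root benefits from an explicit dot check.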
@@ -20,6 +20,11 @@ logger = logging.getLogger(__name__)
 # visible. Preserving awareness of every skill beats truncating entries.
 _COMPACT_THRESHOLD_CHARS = 5000

+# Per-skill description cap. Descriptions often run 300–500 chars of
+# context that's only useful once — the first sentence is enough to
+# decide whether a skill applies. Truncated entries get a trailing "…".
+_DESCRIPTION_CAP_CHARS = 140
+
 _MANDATORY_HEADER_FULL = """## Skills (mandatory)
 Before replying: scan <available_skills> <description> entries.
 - If exactly one skill clearly applies: read its SKILL.md at <location> with `read_file`, then follow it.
@@ -29,12 +34,8 @@ Constraints: never read more than one skill up front; only read after selecting.
 - When a skill drives external API writes (Gmail, Calendar, GitHub, etc.),
  assume rate limits: prefer fewer larger writes, avoid tight one-item loops,
  serialize bursts when possible, and respect 429/Retry-After.
-
-
-The following skills provide specialized instructions for specific tasks.
-Use `read_file` to load a skill's SKILL.md when the task matches its description.
-When a skill file references a relative path, resolve it against the
-skill directory (parent of SKILL.md) and use that absolute path in tool commands."""
+- When a selected skill references a relative path, resolve it against the
+  skill directory (parent of SKILL.md) and use that absolute path in tool commands."""

 _MANDATORY_HEADER_COMPACT = """## Skills (mandatory)
 Before replying: scan <available_skills> <name> entries.
@@ -45,12 +46,8 @@ Constraints: never read more than one skill up front; only read after selecting.
 - When a skill drives external API writes (Gmail, Calendar, GitHub, etc.),
  assume rate limits: prefer fewer larger writes, avoid tight one-item loops,
  serialize bursts when possible, and respect 429/Retry-After.
-
-
-The following skills provide specialized instructions for specific tasks.
-Use `read_file` to load a skill's SKILL.md when the task matches its name.
-When a skill file references a relative path, resolve it against the
-skill directory (parent of SKILL.md) and use that absolute path in tool commands."""
+- When a selected skill references a relative path, resolve it against the
+  skill directory (parent of SKILL.md) and use that absolute path in tool commands."""


 class SkillCatalog:
@@ -88,18 +85,27 @@ class SkillCatalog:
        """All skill base directories for file access allowlisting."""
        return [skill.base_dir for skill in self._skills.values()]

-    def to_prompt(self) -> str:
+    def to_prompt(self, *, phase: str | None = None) -> str:
        """Generate the catalog prompt for system prompt injection.

        Returns empty string when no skills are present. Otherwise returns
        a mandatory pre-reply checklist + decision rules + rate-limit note,
        followed by the <available_skills> XML body.

-        When the full XML body exceeds ``_COMPACT_THRESHOLD_CHARS``, the
-        compact variant is emitted instead: <description> elements are
-        dropped so every skill stays visible before any gets truncated.
+        When ``phase`` is set, skills whose ``visibility`` list is present
+        and does not include that phase are filtered out. Skills with
+        ``visibility=None`` always appear.
+
+        Descriptions are capped to the first sentence or
+        ``_DESCRIPTION_CAP_CHARS`` (whichever is shorter) with a trailing
+        "…" on truncation. When the full XML body still exceeds
+        ``_COMPACT_THRESHOLD_CHARS`` the compact variant is emitted:
+        <description> elements are dropped so every skill stays visible
+        before any gets truncated.
        """
        all_skills = sorted(self._skills.values(), key=lambda s: s.name)
+        if phase is not None:
+            all_skills = [s for s in all_skills if s.visibility is None or phase in s.visibility]
        if not all_skills:
            return ""

@@ -111,7 +117,25 @@ class SkillCatalog:
        return f"{_MANDATORY_HEADER_COMPACT}\n\n{compact_xml}"

    @staticmethod
-    def _render_xml(skills: list[ParsedSkill], *, compact: bool) -> str:
+    def _cap_description(description: str) -> str:
+        """Return the first sentence or first ``_DESCRIPTION_CAP_CHARS`` chars."""
+        text = description.strip()
+        if not text:
+            return text
+        # First sentence boundary: '.', '!' or '?' followed by whitespace or end
+        # of text. This skips decimals ("3.11") but not abbreviations ("e.g. ").
+        for i, ch in enumerate(text):
+            if ch in ".!?" and (i + 1 == len(text) or text[i + 1].isspace()):
+                sentence = text[: i + 1]
+                if len(sentence) <= _DESCRIPTION_CAP_CHARS:
+                    return sentence
+                break
+        if len(text) <= _DESCRIPTION_CAP_CHARS:
+            return text
+        return text[: _DESCRIPTION_CAP_CHARS - 1].rstrip() + "…"
+
+    @classmethod
+    def _render_xml(cls, skills: list[ParsedSkill], *, compact: bool) -> str:
        """Render the `<available_skills>` block.

        ``compact=True`` drops `<description>` to preserve skill awareness
@@ -122,7 +146,8 @@ class SkillCatalog:
            lines.append("  <skill>")
            lines.append(f"    <name>{escape(skill.name)}</name>")
            if not compact:
-                lines.append(f"    <description>{escape(skill.description)}</description>")
+                capped = cls._cap_description(skill.description)
+                lines.append(f"    <description>{escape(capped)}</description>")
            lines.append(f"    <location>{escape(skill.location)}</location>")
            lines.append("  </skill>")
        lines.append("</available_skills>")
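The capping rule added in `_cap_description` can be exercised in isolation. The following is a minimal standalone sketch of the same logic, not the framework's own module: the value of `DESCRIPTION_CAP_CHARS` is a placeholder (the real `_DESCRIPTION_CAP_CHARS` is not shown in this diff), and the free function `cap_description` is a hypothetical name.

```python
DESCRIPTION_CAP_CHARS = 40  # placeholder value; the real constant is not shown in the diff


def cap_description(description: str, cap: int = DESCRIPTION_CAP_CHARS) -> str:
    """Return the first sentence, or the first ``cap`` chars ending in an ellipsis."""
    text = description.strip()
    if not text:
        return text
    for i, ch in enumerate(text):
        # A sentence ends at '.', '!' or '?' followed by whitespace or end of text.
        if ch in ".!?" and (i + 1 == len(text) or text[i + 1].isspace()):
            sentence = text[: i + 1]
            if len(sentence) <= cap:
                return sentence
            break  # first sentence is too long; fall through to the hard cap
    if len(text) <= cap:
        return text
    # Reserve one character for the ellipsis so the result never exceeds ``cap``.
    return text[: cap - 1].rstrip() + "…"


print(cap_description("Track progress. More detail here."))
print(cap_description("Version 3.11 is required. Extra."))
print(cap_description("Use e.g. foo."))
```

The last call illustrates a limitation of the whitespace heuristic: an abbreviation followed by a space is treated as a sentence end.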
@@ -36,8 +36,8 @@ class SkillsConfig:
        # Default skill configuration
        default_skills = {
            "hive.note-taking": {"enabled": True},
-            "hive.batch-ledger": {"enabled": True, "checkpoint_every_n": 10},
-            "hive.quality-monitor": {"enabled": False},
+            "hive.quality-monitor": {"enabled": False, "assessment_interval": 10},
+            "hive.error-recovery": {"max_retries_per_tool": 5},
        }
    """

@@ -24,34 +24,21 @@ _SKILL_DEFAULTS: dict[str, dict[str, Any]] = {
    "hive.quality-monitor": {"assessment_interval": 5},
    "hive.error-recovery": {"max_retries_per_tool": 3},
    "hive.context-preservation": {"warn_at_usage_ratio_pct": 45},
-    "hive.batch-ledger": {"checkpoint_every_n": 5},
 }

-# Keywords that indicate a batch processing scenario (DS-12)
-_BATCH_KEYWORDS: tuple[str, ...] = (
-    "list of",
-    "collection of",
-    "set of",
-    "batch of",
-    "each item",
-    "for each",
-    "process all",
-    "records",
-    "entries",
-    "rows",
-    "items",
-)
-
-_BATCH_INIT_NUDGE = (
-    "Note: your input appears to describe a batch operation. "
-    "Initialize `_batch_ledger` with the total item count before processing."
-)
-

 def is_batch_scenario(text: str) -> bool:
-    """Return True if *text* contains batch-processing indicators (DS-12)."""
-    lower = text.lower()
-    return any(kw in lower for kw in _BATCH_KEYWORDS)
+    """Deprecated: batch auto-detection is no longer used.
+
+    Kept as a no-op so the agent_loop call site (which wraps it in an
+    ``if ctx.default_skill_batch_nudge:`` guard whose value is also now
+    always empty) can stay unchanged until a broader cleanup. The old
+    ``_batch_ledger`` shared-buffer feature was replaced by the
+    per-colony SQLite task queue (``hive.colony-progress-tracker``),
+    which lives in ``progress.db`` and is authoritative for batch
+    state across workers and runs.
+    """
+    return False


 def _apply_overrides(skill_name: str, body: str, overrides: dict[str, Any]) -> str:
@@ -67,40 +54,37 @@ def _apply_overrides(skill_name: str, body: str, overrides: dict[str, Any]) -> s
    return body


-# Ordered list of default skills (name → directory)
+# Ordered list of default skills (name → directory).
+#
+# Removed on 2026-04-15 as part of the colony-progress-tracker rollout:
+#   - hive.task-decomposition — steps table in progress.db supersedes
+#     in-memory ``_working_notes → Current Plan`` decomposition.
+#   - hive.batch-ledger       — tasks table in progress.db supersedes
+#     the ``_batch_ledger`` dict-shaped queue with its pending →
+#     in_progress → completed/failed/skipped state machine.
+# Both were duplicating state that belongs in SQLite.
 SKILL_REGISTRY: dict[str, str] = {
    "hive.note-taking": "note-taking",
-    "hive.batch-ledger": "batch-ledger",
    "hive.context-preservation": "context-preservation",
    "hive.quality-monitor": "quality-monitor",
    "hive.error-recovery": "error-recovery",
-    "hive.task-decomposition": "task-decomposition",
+    "hive.colony-progress-tracker": "colony-progress-tracker",
    "hive.writing-hive-skills": "writing-hive-skills",
 }

-# All shared buffer keys used by default skills (for permission auto-inclusion)
+# Shared buffer keys referenced by the remaining default skills (used
+# for permission auto-inclusion). The dead keys for batch-ledger,
+# task-decomposition, the handoff buffer, and the error-log buffers
+# were removed when those features migrated to progress.db.
 DATA_BUFFER_KEYS: list[str] = [
    # note-taking
    "_working_notes",
    "_notes_updated_at",
-    # batch-ledger
-    "_batch_ledger",
-    "_batch_total",
-    "_batch_completed",
-    "_batch_failed",
    # context-preservation
-    "_handoff_context",
    "_preserved_data",
    # quality-monitor
    "_quality_log",
    "_quality_degradation_count",
-    # error-recovery
-    "_error_log",
-    "_failed_tools",
-    "_escalation_needed",
-    # task-decomposition
-    "_subtasks",
-    "_iteration_budget_remaining",
 ]


@@ -252,16 +236,15 @@ class DefaultSkillManager:

    @property
    def batch_init_nudge(self) -> str | None:
-        """Nudge text to prepend to system prompt when batch input detected (DS-12).
+        """Deprecated: always returns None.

-        Returns None if ``hive.batch-ledger`` is disabled or auto_detect_batch is False.
+        The ``hive.batch-ledger`` default skill was removed when batch
+        tracking moved into ``progress.db``
+        (``hive.colony-progress-tracker``). Callers in agent_host,
+        colony_runtime, and orchestrator still read this property;
+        returning None keeps them functional with no system-prompt nudge.
        """
-        if "hive.batch-ledger" not in self._skills:
-            return None
-        overrides = self._config.get_default_overrides("hive.batch-ledger")
-        if overrides.get("auto_detect_batch") is False:
-            return None
-        return _BATCH_INIT_NUDGE
+        return None

    @property
    def context_warn_ratio(self) -> float | None:
@@ -7,7 +7,7 @@ locations. Resolves name collisions deterministically.
 from __future__ import annotations

 import logging
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from pathlib import Path

 from framework.skills.parser import ParsedSkill, parse_skill_md
@@ -30,16 +30,40 @@ _SKIP_DIRS = frozenset(
 )

 # Scope priority (higher = takes precedence)
+# ``preset`` sits between framework and user: bundled alongside the
+# framework distribution, but off by default — capability packs the user
+# opts into per queen/colony rather than globally-enabled infra.
 _SCOPE_PRIORITY = {
    "framework": 0,
-    "user": 1,
-    "project": 2,
+    "preset": 1,
+    "user": 2,
+    "queen_ui": 3,
+    "colony_ui": 4,
+    "project": 5,
 }

 # Within the same scope, Hive-specific paths override cross-client paths.
 # We encode this by scanning cross-client first, then Hive-specific (later wins).


+@dataclass
+class ExtraScope:
+    """Additional scope dir to scan beyond the standard five.
+
+    Used by :class:`framework.skills.manager.SkillsManager` to surface
+    per-queen (``queen_ui``) and per-colony (``colony_ui``) skill
+    directories created through the UI. The ``label`` feeds
+    :attr:`ParsedSkill.source_scope` so downstream consumers (trust
+    gate, UI provenance resolver) can distinguish scope origins.
+    """
+
+    directory: Path
+    label: str
+    # Kept for forward-compat with the priority table; discovery itself
+    # relies on scan order for last-wins resolution.
+    priority: int = 0
+
+
 @dataclass
 class DiscoveryConfig:
    """Configuration for skill discovery."""
@@ -49,6 +73,10 @@ class DiscoveryConfig:
    skip_framework_scope: bool = False
    max_depth: int = 4
    max_dirs: int = 2000
+    # Additional scope dirs scanned between user and project scopes,
+    # in the order they are provided. Use ``ExtraScope`` to tag each
+    # with its logical label (``queen_ui`` / ``colony_ui``).
+    extra_scopes: list[ExtraScope] = field(default_factory=list)


 class SkillDiscovery:
@@ -82,13 +110,22 @@ class SkillDiscovery:
        all_skills: list[ParsedSkill] = []
        self._scanned_dirs = []

-        # Framework scope (lowest precedence)
+        # Framework scope (lowest precedence) — always-on infra skills.
        if not self._config.skip_framework_scope:
            framework_dir = Path(__file__).parent / "_default_skills"
            if framework_dir.is_dir():
                self._scanned_dirs.append(framework_dir)
                all_skills.extend(self._scan_scope(framework_dir, "framework"))

+            # Preset scope — bundled capability packs that ship with the
+            # framework but default to OFF. User opts in per queen/colony
+            # via the Skills Library. ``skip_framework_scope`` covers both
+            # bundled directories since they live side-by-side on disk.
+            preset_dir = Path(__file__).parent / "_preset_skills"
+            if preset_dir.is_dir():
+                self._scanned_dirs.append(preset_dir)
+                all_skills.extend(self._scan_scope(preset_dir, "preset"))
+
        # User scope
        if not self._config.skip_user_scope:
            home = Path.home()
@@ -105,6 +142,13 @@ class SkillDiscovery:
                self._scanned_dirs.append(user_hive)
                all_skills.extend(self._scan_scope(user_hive, "user"))

+        # Extra scopes (queen_ui / colony_ui), scanned between user and project
+        # so colony overrides beat queen overrides, and both beat user-scope.
+        for extra in self._config.extra_scopes:
+            if extra.directory.is_dir():
+                self._scanned_dirs.append(extra.directory)
+                all_skills.extend(self._scan_scope(extra.directory, extra.label))
+
        # Project scope (highest precedence)
        if self._config.project_root:
            root = self._config.project_root
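The scan order above (framework, preset, user, extra scopes, project) is what gives later scopes precedence. The collision-resolution code itself is not part of this hunk, so the following is only a sketch of last-wins resolution as the comments describe it. `FoundSkill` and `resolve_last_wins` are hypothetical names, not framework APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoundSkill:
    """Hypothetical stand-in for ParsedSkill with only the fields used here."""

    name: str
    source_scope: str


def resolve_last_wins(scanned: list[FoundSkill]) -> dict[str, FoundSkill]:
    """Keep the last skill seen per name, so later-scanned scopes override earlier ones."""
    resolved: dict[str, FoundSkill] = {}
    for skill in scanned:
        resolved[skill.name] = skill
    return resolved


# Skills listed in the order the scopes are scanned.
scanned = [
    FoundSkill("note-taking", "framework"),
    FoundSkill("browser", "preset"),
    FoundSkill("note-taking", "user"),
    FoundSkill("note-taking", "colony_ui"),
]
winners = resolve_last_wins(scanned)
print({name: skill.source_scope for name, skill in winners.items()})
```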
@@ -23,6 +23,7 @@ Typical usage — **bare** (exported agents, SDK users)::

 from __future__ import annotations

+import asyncio
 import logging
 from dataclasses import dataclass, field
 from pathlib import Path
@@ -44,6 +45,18 @@ class SkillsManagerConfig:
            even when ``project_root`` is set.
        interactive: Whether trust gating can prompt the user interactively.
            When ``False``, untrusted project skills are silently skipped.
+        queen_id: Optional queen identifier. When set, enables the
+            ``queen_ui`` scope and per-queen override file.
+        queen_overrides_path: Path to
+            ``~/.hive/agents/queens/{queen_id}/skills_overrides.json``.
+            When set, the store is loaded and its entries override
+            discovery results (disable skills, record provenance).
+        colony_name: Optional colony identifier; mirrors ``queen_id`` for
+            the ``colony_ui`` scope.
+        colony_overrides_path: Per-colony override file path.
+        extra_scope_dirs: Extra scope dirs scanned between user and
+            project scopes. Typically populated by the caller with the
+            queen/colony UI skill directories.
    """

    skills_config: SkillsConfig = field(default_factory=SkillsConfig)
@@ -51,6 +64,15 @@ class SkillsManagerConfig:
    skip_community_discovery: bool = False
    interactive: bool = True

+    # Override support
+    queen_id: str | None = None
+    queen_overrides_path: Path | None = None
+    colony_name: str | None = None
+    colony_overrides_path: Path | None = None
+    # Typed at the call site as ``list[ExtraScope]`` — not imported here
+    # to keep this module free of discovery-layer dependencies.
+    extra_scope_dirs: list = field(default_factory=list)
+

 class SkillsManager:
    """Unified skill lifecycle: discovery → loading → prompt renderation.
@@ -64,13 +86,22 @@ class SkillsManager:
    def __init__(self, config: SkillsManagerConfig | None = None) -> None:
        self._config = config or SkillsManagerConfig()
        self._loaded = False
+        self._catalog: object = None  # SkillCatalog, set after load()
+        self._all_skills: list = []  # list[ParsedSkill], pre-override-filter
        self._catalog_prompt: str = ""
        self._protocols_prompt: str = ""
        self._allowlisted_dirs: list[str] = []
        self._default_mgr: object = None  # DefaultSkillManager, set after load()
+        # Override stores (loaded lazily in _do_load). Queen-scope and
+        # colony-scope are read together; colony entries win on collision.
+        self._queen_overrides: object = None  # SkillOverrideStore | None
+        self._colony_overrides: object = None  # SkillOverrideStore | None
        # Hot-reload state
        self._watched_dirs: list[str] = []
+        self._watched_files: list[str] = []
        self._watcher_task: object = None  # asyncio.Task, set by start_watching()
+        # Serializes in-process mutations (HTTP handlers + create_colony).
+        self._mutation_lock = asyncio.Lock()

    # ------------------------------------------------------------------
    # Factory for backwards-compat bridge
@@ -91,6 +122,7 @@ class SkillsManager:
        mgr = cls.__new__(cls)
        mgr._config = SkillsManagerConfig()
        mgr._loaded = True  # skip load()
+        mgr._catalog = None
        mgr._catalog_prompt = skills_catalog_prompt
        mgr._protocols_prompt = protocols_prompt
        mgr._allowlisted_dirs = []
@@ -117,6 +149,7 @@ class SkillsManager:
        from framework.skills.catalog import SkillCatalog
        from framework.skills.defaults import DefaultSkillManager
        from framework.skills.discovery import DiscoveryConfig, SkillDiscovery
+        from framework.skills.overrides import SkillOverrideStore

        skills_config = self._config.skills_config

@@ -126,12 +159,13 @@ class SkillsManager:
            DiscoveryConfig(
                project_root=self._config.project_root,
                skip_framework_scope=False,
+                extra_scopes=list(self._config.extra_scope_dirs or []),
            )
        )
        discovered = discovery.discover()
        self._watched_dirs = discovery.scanned_directories

-        # Trust-gate project-scope skills (AS-13)
+        # Trust-gate project-scope skills (AS-13). UI scopes bypass.
        if self._config.project_root is not None and not self._config.skip_community_discovery:
            from framework.skills.trust import TrustGate

@@ -139,7 +173,33 @@ class SkillsManager:
                discovered, project_dir=self._config.project_root
            )

+        # 1b. Load per-scope override stores. Missing files → empty stores.
+        queen_store = None
+        if self._config.queen_overrides_path is not None:
+            queen_store = SkillOverrideStore.load(
+                self._config.queen_overrides_path,
+                scope_label=f"queen:{self._config.queen_id or ''}",
+            )
+        colony_store = None
+        if self._config.colony_overrides_path is not None:
+            colony_store = SkillOverrideStore.load(
+                self._config.colony_overrides_path,
+                scope_label=f"colony:{self._config.colony_name or ''}",
+            )
+        self._queen_overrides = queen_store
+        self._colony_overrides = colony_store
+        self._watched_files = [
+            str(p) for p in (self._config.queen_overrides_path, self._config.colony_overrides_path) if p is not None
+        ]
+
+        # 1c. Apply override filtering. Colony entries take precedence over
+        # queen entries on name collision; ``_apply_overrides`` keeps the
+        # resolution rule in one place.
+        self._all_skills = list(discovered)
+        discovered = self._apply_overrides(discovered, skills_config, queen_store, colony_store)
+
        catalog = SkillCatalog(discovered)
+        self._catalog = catalog
        self._allowlisted_dirs = catalog.allowlisted_dirs
        catalog_prompt = catalog.to_prompt()

@@ -171,6 +231,101 @@ class SkillsManager:
                len(catalog_prompt),
            )

+    # ------------------------------------------------------------------
+    # Override application
+    # ------------------------------------------------------------------
+
+    @staticmethod
+    def _apply_overrides(
+        discovered: list,
+        skills_config: SkillsConfig,
+        queen_store: object,
+        colony_store: object,
+    ) -> list:
+        """Filter ``discovered`` per the queen + colony override stores.
+
+        Resolution rule:
+          1. Tombstoned names (``deleted_ui_skills``) drop out.
+          2. An explicit ``enabled=False`` override drops the skill.
+          3. An explicit ``enabled=True`` override keeps it (wins over
+             ``all_defaults_disabled`` for framework defaults AND over the
+             preset-scope default-off rule).
+          4. Otherwise: preset skills are off, framework skills are off under
+             ``all_defaults_disabled``, else :meth:`SkillsConfig.is_default_enabled`.
+        """
+        from framework.skills.overrides import SkillOverrideStore
+
+        stores: list[SkillOverrideStore] = [s for s in (queen_store, colony_store) if s is not None]
+
+        tombstones: set[str] = set()
+        for store in stores:
+            tombstones |= set(store.deleted_ui_skills)
+
+        out = []
+        for skill in discovered:
+            if skill.name in tombstones:
+                continue
+            # Check colony first so colony overrides win over queen's.
+            explicit: bool | None = None
+            master_disabled = False
+            for store in reversed(stores):  # colony, then queen
+                entry = store.get(skill.name)
+                if entry is not None and entry.enabled is not None:
+                    explicit = entry.enabled
+                    break
+                if store.all_defaults_disabled:
+                    master_disabled = True
+            if explicit is False:
+                continue
+            if explicit is True:
+                out.append(skill)
+                continue
+            # Preset-scope capability packs are bundled but ship OFF; the
+            # user must explicitly enable them per queen or colony. This
+            # runs even when no store is present so bare agents don't
+            # silently load x-automation etc.
+            if skill.source_scope == "preset":
+                continue
+            # No explicit entry — master switch takes effect against framework defaults.
+            default_enabled = skills_config.is_default_enabled(skill.name)
+            if master_disabled and default_enabled and skill.source_scope == "framework":
+                continue
+            if default_enabled:
+                out.append(skill)
+        return out
+
+    # ------------------------------------------------------------------
+    # Override accessors
+    # ------------------------------------------------------------------
+
+    @property
+    def queen_overrides(self) -> object:
+        """The queen-scope :class:`SkillOverrideStore` or ``None``."""
+        return self._queen_overrides
+
+    @property
+    def colony_overrides(self) -> object:
+        """The colony-scope :class:`SkillOverrideStore` or ``None``."""
+        return self._colony_overrides
+
+    @property
+    def mutation_lock(self) -> asyncio.Lock:
+        """Serializes in-process override mutations (routes + queen tools)."""
+        return self._mutation_lock
+
+    def reload(self) -> None:
+        """Re-run discovery and rebuild cached prompts. Public wrapper for ``_reload``."""
+        self._reload()
+
+    def enumerate_skills_with_source(self) -> list:
+        """Return every discovered skill, including ones disabled by overrides.
+
+        The UI relies on this: a disabled framework skill needs to render
+        in the list so the user can toggle it back on. The post-filter
+        catalog omits those entries.
+        """
+        return list(self._all_skills)
+
    # ------------------------------------------------------------------
    # Hot-reload: watch skill directories for SKILL.md changes.
    # ------------------------------------------------------------------
@@ -178,14 +333,14 @@ class SkillsManager:
    async def start_watching(self) -> None:
        """Start a background task watching skill directories for changes.

-        When a ``SKILL.md`` file is added/modified/removed, the cached
-        ``skills_catalog_prompt`` is rebuilt.  The next node iteration picks
-        up the new prompt automatically via the ``dynamic_prompt_provider``.
+        Triggers a reload when any ``SKILL.md`` changes or an override
+        JSON file is modified. The next node iteration picks up the new
+        prompt via the ``dynamic_prompt_provider`` / per-worker
+        ``dynamic_skills_catalog_provider``.

-        Silently no-ops when ``watchfiles`` is not installed or when no
-        directories are being watched (e.g. bare mode, no project_root).
+        Silently no-ops when ``watchfiles`` is not installed or there
+        are no paths to watch.
        """
-        import asyncio

        try:
            import watchfiles  # noqa: F401 -- optional dep check
@@ -193,7 +348,7 @@ class SkillsManager:
            logger.debug("watchfiles not installed; skill hot-reload disabled")
            return

-        if not self._watched_dirs:
+        if not self._watched_dirs and not self._watched_files:
            logger.debug("No skill directories to watch; hot-reload skipped")
            return

@@ -205,14 +360,13 @@ class SkillsManager:
            name="skills-hot-reload",
        )
        logger.info(
-            "Skill hot-reload enabled (watching %d directories)",
+            "Skill hot-reload enabled (watching %d dirs, %d override files)",
            len(self._watched_dirs),
+            len(self._watched_files),
        )

    async def stop_watching(self) -> None:
        """Cancel the background watcher task (if running)."""
-        import asyncio
-
        task = self._watcher_task
        if task is None:
            return
@@ -225,22 +379,35 @@ class SkillsManager:
                pass

    async def _watch_loop(self) -> None:
-        """Background coroutine that watches SKILL.md files and triggers reload."""
-        import asyncio
-
+        """Watch SKILL.md + override JSON files and trigger reload on change."""
        import watchfiles

        def _filter(_change: object, path: str) -> bool:
-            return path.endswith("SKILL.md")
+            return path.endswith("SKILL.md") or path.endswith("skills_overrides.json")
+
+        # watchfiles accepts a mix of dirs and files, but override files are
+        # watched via their parent dir so a tmp+rename still fires an event.
+        watch_targets = list(self._watched_dirs)
+        for f in self._watched_files:
+            # watchfiles needs the parent dir for file-level events to fire
+            # reliably through atomic replace; adding the file path directly
+            # works on Linux/macOS inotify/FSEvents but a dir watch is
+            # belt-and-braces.
+            parent = str(Path(f).parent)
+            if parent not in watch_targets:
+                watch_targets.append(parent)
+
+        if not watch_targets:
+            return

        try:
            async for changes in watchfiles.awatch(
-                *self._watched_dirs,
+                *watch_targets,
                watch_filter=_filter,
                debounce=1000,
            ):
                paths = [p for _, p in changes]
-                logger.info("SKILL.md changes detected: %s", paths)
+                logger.info("Skill state changes detected: %s", paths)
                try:
                    self._reload()
                except Exception:
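The target-building and filtering steps in `_watch_loop` do not depend on `watchfiles` and can be tested on their own. A minimal sketch, assuming POSIX-style paths; `build_watch_targets` and `is_relevant` are hypothetical helper names that mirror the inline logic in the diff.

```python
from pathlib import Path


def build_watch_targets(watched_dirs: list[str], watched_files: list[str]) -> list[str]:
    """Return the skill dirs plus the parent dir of each override file, without duplicates."""
    targets = list(watched_dirs)
    for f in watched_files:
        parent = str(Path(f).parent)
        if parent not in targets:
            targets.append(parent)
    return targets


def is_relevant(path: str) -> bool:
    """Mirror of the watch filter: only SKILL.md and override JSON changes trigger a reload."""
    return path.endswith(("SKILL.md", "skills_overrides.json"))


targets = build_watch_targets(
    ["/skills"],
    ["/queens/q1/skills_overrides.json", "/skills/skills_overrides.json"],
)
print(targets)
```

Because the parent directory is watched, the filter is what keeps unrelated files in that directory (including the temporary file used for an atomic write) from triggering a reload.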
@@ -271,6 +438,18 @@ class SkillsManager:
        """Community skills XML catalog for system prompt injection."""
        return self._catalog_prompt

+    def skills_catalog_prompt_for_phase(self, phase: str | None) -> str:
+        """Render the catalog filtered for the given queen phase.
+
+        Skills whose frontmatter ``visibility`` list is present and
+        excludes ``phase`` are dropped. Falls back to the cached
+        phase-agnostic prompt when no live catalog is available
+        (e.g. ``from_precomputed``).
+        """
+        if self._catalog is None or phase is None:
+            return self._catalog_prompt
+        return self._catalog.to_prompt(phase=phase)  # type: ignore[attr-defined]
+
    @property
    def protocols_prompt(self) -> str:
        """Default skill operational protocols for system prompt injection."""
@@ -0,0 +1,254 @@
+"""Per-scope skill override store.
+
+Sits between :mod:`framework.skills.discovery` and
+:class:`framework.skills.catalog.SkillCatalog`: records the user's
+per-queen and per-colony decisions about which skills are enabled,
+who created them (provenance), and any parameter tweaks.
+
+Two well-known paths back this module:
+
+* Queen scope:   ``~/.hive/agents/queens/{queen_id}/skills_overrides.json``
+* Colony scope:  ``~/.hive/colonies/{colony_name}/skills_overrides.json``
+
+The schema is intentionally small; see :class:`SkillOverrideStore` for
+the JSON shape. Atomic writes mirror
+:class:`framework.skills.trust.TrustedRepoStore` (tmp + rename).
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from dataclasses import dataclass, field
+from datetime import UTC, datetime
+from enum import StrEnum
+from pathlib import Path
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+_SCHEMA_VERSION = 1
+
+
+class Provenance(StrEnum):
+    """Where a skill came from.
+
+    The override store is the authoritative provenance ledger for anything
+    the UI or the queen tools touched. Framework / user-dropped /
+    project-dropped skills don't need an entry unless they've been
+    explicitly configured.
+    """
+
+    FRAMEWORK = "framework"
+    PRESET = "preset"
+    USER_DROPPED = "user_dropped"
+    USER_UI_CREATED = "user_ui_created"
+    QUEEN_CREATED = "queen_created"
+    LEARNED_RUNTIME = "learned_runtime"
+    PROJECT_DROPPED = "project_dropped"
+    # Catch-all for skills with no recorded authorship: legacy rows from
+    # before the override store existed, PATCHes that precede any CREATE,
+    # etc. Keeps the ledger honest rather than forcing a guess.
+    OTHER = "other"
+
+
+@dataclass
+class OverrideEntry:
+    """Per-skill override record inside a scope's store."""
+
+    enabled: bool | None = None
+    provenance: Provenance = Provenance.FRAMEWORK
+    trust: str | None = None
+    param_overrides: dict[str, Any] = field(default_factory=dict)
+    notes: str | None = None
+    created_at: datetime | None = None
+    created_by: str | None = None
+
+    def clone(self) -> OverrideEntry:
+        """Return a deep-enough copy (dict fields are re-allocated)."""
+        return OverrideEntry(
+            enabled=self.enabled,
+            provenance=self.provenance,
+            trust=self.trust,
+            param_overrides=dict(self.param_overrides),
+            notes=self.notes,
+            created_at=self.created_at,
+            created_by=self.created_by,
+        )
+
+    def to_dict(self) -> dict[str, Any]:
+        out: dict[str, Any] = {"provenance": str(self.provenance)}
+        if self.enabled is not None:
+            out["enabled"] = bool(self.enabled)
+        if self.trust is not None:
+            out["trust"] = self.trust
+        if self.param_overrides:
+            out["param_overrides"] = dict(self.param_overrides)
+        if self.notes is not None:
+            out["notes"] = self.notes
+        if self.created_at is not None:
+            out["created_at"] = self.created_at.isoformat()
+        if self.created_by is not None:
+            out["created_by"] = self.created_by
+        return out
+
+    @classmethod
+    def from_dict(cls, raw: dict[str, Any]) -> OverrideEntry:
+        created_at_raw = raw.get("created_at")
+        created_at: datetime | None = None
+        if isinstance(created_at_raw, str):
+            try:
+                created_at = datetime.fromisoformat(created_at_raw)
+            except ValueError:
+                created_at = None
+        provenance_raw = raw.get("provenance") or Provenance.FRAMEWORK
+        try:
+            provenance = Provenance(provenance_raw)
+        except ValueError:
+            logger.warning("override: unknown provenance %r; defaulting to framework", provenance_raw)
+            provenance = Provenance.FRAMEWORK
+        enabled = raw.get("enabled")
+        return cls(
+            enabled=enabled if isinstance(enabled, bool) else None,
+            provenance=provenance,
+            trust=raw.get("trust") if isinstance(raw.get("trust"), str) else None,
+            param_overrides=dict(raw.get("param_overrides") or {}),
+            notes=raw.get("notes") if isinstance(raw.get("notes"), str) else None,
+            created_at=created_at,
+            created_by=raw.get("created_by") if isinstance(raw.get("created_by"), str) else None,
+        )
+
+
+@dataclass
+class SkillOverrideStore:
+    """Persistent per-scope override file.
+
+    The file is created lazily on first save; a missing file behaves like
+    an empty store (all skills inherit defaults, no metadata recorded).
+    """
+
+    path: Path
+    scope_label: str = ""
+    version: int = _SCHEMA_VERSION
+    all_defaults_disabled: bool = False
+    overrides: dict[str, OverrideEntry] = field(default_factory=dict)
+    deleted_ui_skills: set[str] = field(default_factory=set)
+
+    # ------------------------------------------------------------------
+    # Factory
+    # ------------------------------------------------------------------
+
+    @classmethod
+    def load(cls, path: Path, scope_label: str = "") -> SkillOverrideStore:
+        """Load the store from disk; return an empty store if the file is absent.
+
+        Permissive on parse errors: logs and returns an empty store rather
+        than raising, so a corrupted file never takes down skill loading.
+        """
+        store = cls(path=path, scope_label=scope_label)
+        try:
+            raw = json.loads(path.read_text(encoding="utf-8"))
+        except FileNotFoundError:
+            return store
+        except Exception as exc:
+            logger.warning("override: failed to read %s (%s); starting empty", path, exc)
+            return store
+        if not isinstance(raw, dict):
+            logger.warning("override: %s is not an object; starting empty", path)
+            return store
+
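+        # Stay permissive, as the docstring promises: a non-integer
+        # "version" keeps the schema default instead of raising.
+        try:
+            store.version = int(raw.get("version", _SCHEMA_VERSION))
+        except (TypeError, ValueError):
+            logger.warning("override: %s has a non-integer version; using schema default", path)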
+        store.all_defaults_disabled = bool(raw.get("all_defaults_disabled", False))
+        raw_overrides = raw.get("overrides") or {}
+        if isinstance(raw_overrides, dict):
+            for name, entry_raw in raw_overrides.items():
+                if not isinstance(name, str) or not isinstance(entry_raw, dict):
+                    continue
+                store.overrides[name] = OverrideEntry.from_dict(entry_raw)
+        deleted = raw.get("deleted_ui_skills") or []
+        if isinstance(deleted, list):
+            store.deleted_ui_skills = {s for s in deleted if isinstance(s, str)}
+        return store
+
+    # ------------------------------------------------------------------
+    # Mutations
+    # ------------------------------------------------------------------
+
+    def upsert(self, skill_name: str, entry: OverrideEntry) -> None:
+        """Insert or replace a skill's override entry."""
+        self.overrides[skill_name] = entry
+        # If we're explicitly managing this skill again, lift any tombstone.
+        self.deleted_ui_skills.discard(skill_name)
+
+    def set_enabled(self, skill_name: str, enabled: bool, *, provenance: Provenance | None = None) -> None:
+        """Convenience: toggle enabled without rewriting other fields."""
+        existing = self.overrides.get(skill_name)
+        if existing is None:
+            existing = OverrideEntry(
+                enabled=enabled,
+                provenance=provenance or Provenance.FRAMEWORK,
+            )
+        else:
+            existing.enabled = enabled
+            if provenance is not None:
+                existing.provenance = provenance
+        self.overrides[skill_name] = existing
+
+    def remove(self, skill_name: str, *, tombstone: bool = True) -> None:
+        """Drop a skill's override entry; optionally leave a tombstone.
+
+        Tombstones matter for UI-created skills: if the user deletes a
+        queen-scope skill via the UI, we rm-tree its directory, but the
+        file watcher might lag or a background process might have an
+        open handle. A tombstone ensures the loader treats the skill as
+        gone even if a stale SKILL.md lingers.
+        """
+        self.overrides.pop(skill_name, None)
+        if tombstone:
+            self.deleted_ui_skills.add(skill_name)
+
+    def is_disabled(self, skill_name: str, *, default_enabled: bool) -> bool:
+        """Return True when this scope's override force-disables the skill."""
+        if self.all_defaults_disabled and default_enabled:
+            # Caller says "default enabled"; master switch flips it off unless
+            # an explicit enabled=True override re-enables.
+            entry = self.overrides.get(skill_name)
+            if entry is not None and entry.enabled is True:
+                return False
+            return True
+        entry = self.overrides.get(skill_name)
+        if entry is None:
+            return not default_enabled
+        if entry.enabled is None:
+            return not default_enabled
+        return not entry.enabled
+
+    def effective_enabled(self, skill_name: str, *, default_enabled: bool) -> bool:
+        """The inverse of :meth:`is_disabled`, for readability at call sites."""
+        return not self.is_disabled(skill_name, default_enabled=default_enabled)
+
+    def get(self, skill_name: str) -> OverrideEntry | None:
+        return self.overrides.get(skill_name)
+
+    # ------------------------------------------------------------------
+    # Persistence
+    # ------------------------------------------------------------------
+
+    def save(self) -> None:
+        """Atomic write: tmp + rename. Creates the parent dir if needed."""
+        self.path.parent.mkdir(parents=True, exist_ok=True)
+        payload: dict[str, Any] = {
+            "version": self.version,
+            "all_defaults_disabled": self.all_defaults_disabled,
+            "overrides": {name: entry.to_dict() for name, entry in sorted(self.overrides.items())},
+        }
+        if self.deleted_ui_skills:
+            payload["deleted_ui_skills"] = sorted(self.deleted_ui_skills)
+        tmp = self.path.with_suffix(self.path.suffix + ".tmp")
+        tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
+        tmp.replace(self.path)
+
+
+def utc_now() -> datetime:
+    """Single source of truth for override timestamps."""
+    return datetime.now(tz=UTC)
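
The enable/disable resolution in `SkillOverrideStore.is_disabled` above can be restated as a small free function. This is a minimal sketch for illustration only: the function name and its flattened parameters are hypothetical (the real method reads `self.overrides` and `self.all_defaults_disabled`), and `override_enabled=None` stands for both "no entry" and "entry with `enabled` unset".

```python
from typing import Optional


def is_disabled(
    override_enabled: Optional[bool],
    *,
    default_enabled: bool,
    all_defaults_disabled: bool = False,
) -> bool:
    """Hypothetical free-function mirror of SkillOverrideStore.is_disabled."""
    if all_defaults_disabled and default_enabled:
        # Master switch is on: only an explicit enabled=True re-enables.
        return override_enabled is not True
    if override_enabled is None:
        # Nothing recorded at this scope: inherit the caller's default.
        return not default_enabled
    # An explicit True/False override always wins over the default.
    return not override_enabled


if __name__ == "__main__":
    # Print the full truth table: (master, default, override) -> disabled
    for master in (False, True):
        for default in (True, False):
            for override in (None, True, False):
                disabled = is_disabled(
                    override,
                    default_enabled=default,
                    all_defaults_disabled=master,
                )
                print(master, default, override, disabled)
```

Note that the master switch only affects skills the caller reports as default-enabled; a default-disabled skill still follows its own override.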
@@ -37,6 +37,10 @@ class ParsedSkill:
    compatibility: list[str] | None = None
    metadata: dict[str, Any] | None = None
    allowed_tools: list[str] | None = None
+    # List of queen phases in which this skill appears in the catalog.
+    # None = visible in all phases. Example: ["planning", "building"]
+    # hides a framework-authoring skill from the INDEPENDENT/DM prompt.
+    visibility: list[str] | None = None


 def _try_fix_yaml(raw: str) -> str:
@@ -219,6 +223,19 @@ def parse_skill_md(path: Path, source_scope: str = "project") -> ParsedSkill | N
    raw_tools = frontmatter.get("allowed-tools")
    if isinstance(raw_tools, str):
        raw_tools = [raw_tools]
+    # `visibility` lives under `metadata.visibility` so it stays inside
+    # the open `metadata` map (the skill-file schema used by the IDE
+    # and other tooling only allows a fixed set of top-level keys).
+    raw_metadata = frontmatter.get("metadata")
+    raw_visibility: Any = None
+    if isinstance(raw_metadata, dict):
+        raw_visibility = raw_metadata.get("visibility")
+    if isinstance(raw_visibility, str):
+        raw_visibility = [raw_visibility]
+    if isinstance(raw_visibility, list):
+        raw_visibility = [str(v).strip() for v in raw_visibility if str(v).strip()] or None
+    else:
+        raw_visibility = None

    return ParsedSkill(
        name=name,
@@ -231,4 +248,5 @@ def parse_skill_md(path: Path, source_scope: str = "project") -> ParsedSkill | N
        compatibility=raw_compat,
        metadata=frontmatter.get("metadata"),
        allowed_tools=raw_tools,
+        visibility=raw_visibility,
    )
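
The `metadata.visibility` normalization added in this hunk can be exercised in isolation. The sketch below assumes a standalone helper named `normalize_visibility` (hypothetical; the diff inlines this logic in `parse_skill_md`) that takes the already-parsed frontmatter dict.

```python
from typing import Any, Optional


def normalize_visibility(frontmatter: dict[str, Any]) -> Optional[list[str]]:
    """Hypothetical helper mirroring the inline logic in parse_skill_md."""
    metadata = frontmatter.get("metadata")
    # Only metadata.visibility is honoured; a top-level key is ignored.
    raw = metadata.get("visibility") if isinstance(metadata, dict) else None
    if isinstance(raw, str):
        raw = [raw]  # a bare string is shorthand for a one-item list
    if not isinstance(raw, list):
        return None  # missing or malformed: visible in all phases
    cleaned = [str(v).strip() for v in raw if str(v).strip()]
    return cleaned or None  # an empty list collapses to None


if __name__ == "__main__":
    print(normalize_visibility({"metadata": {"visibility": "planning"}}))
    print(normalize_visibility({"metadata": {"visibility": [" planning ", "", "building"]}}))
    print(normalize_visibility({"visibility": ["planning"]}))
```

An empty or all-blank list falls back to `None`, so a skill cannot accidentally hide itself from every phase by declaring `visibility: []`.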
@@ -20,9 +20,17 @@ from pathlib import Path

 logger = logging.getLogger(__name__)

-_DEFAULT_SKILLS_DIR = Path(__file__).parent / "_default_skills"
+# Bundled skills live in two sibling dirs: ``_default_skills`` (always-on
+# infra) and ``_preset_skills`` (capability packs, off by default but
+# still bundled). Tool-gated pre-activation walks both so ``browser_*``
+# tools still pull in the browser-automation preset even though it isn't
+# default-enabled in the catalog.
+_BUNDLED_DIRS: tuple[Path, ...] = (
+    Path(__file__).parent / "_default_skills",
+    Path(__file__).parent / "_preset_skills",
+)

-# (tool-name prefix, default skill directory name, display name)
+# (tool-name prefix, skill directory name, display name)
 _TOOL_GATED_SKILLS: list[tuple[str, str, str]] = [
    ("browser_", "browser-automation", "hive.browser-automation"),
 ]
@@ -31,12 +39,23 @@ _BODY_CACHE: dict[str, str] = {}


 def _load_body(dir_name: str) -> str:
-    """Load the markdown body of a framework default skill, cached."""
+    """Load the markdown body of a bundled skill, cached. Searches every
+    bundled directory (default + preset) so the mapping table doesn't
+    need to know which dir a skill lives in.
+    """
    if dir_name in _BODY_CACHE:
        return _BODY_CACHE[dir_name]

-    path = _DEFAULT_SKILLS_DIR / dir_name / "SKILL.md"
+    path: Path | None = None
+    for parent in _BUNDLED_DIRS:
+        candidate = parent / dir_name / "SKILL.md"
+        if candidate.exists():
+            path = candidate
+            break
    body = ""
+    if path is None:
+        _BODY_CACHE[dir_name] = body
+        return body
    try:
        raw = path.read_text(encoding="utf-8")
        # Strip YAML frontmatter (between the first two '---' fences)
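
The first-match lookup across bundled directories in `_load_body` can be sketched as follows. `find_skill_file` and `demo` are hypothetical names, and the directory layout is built in a temporary folder purely for illustration; only the `_default_skills` / `_preset_skills` / `SKILL.md` names come from the diff.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def find_skill_file(dir_name: str, bundled_dirs: Sequence[Path]) -> Optional[Path]:
    """Return the first <parent>/<dir_name>/SKILL.md that exists, else None."""
    for parent in bundled_dirs:
        candidate = parent / dir_name / "SKILL.md"
        if candidate.exists():
            return candidate
    return None


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        defaults = root / "_default_skills"
        presets = root / "_preset_skills"
        defaults.mkdir()
        # The skill exists only in the preset directory.
        skill_dir = presets / "browser-automation"
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text("---\nname: demo\n---\nbody\n", encoding="utf-8")

        found = find_skill_file("browser-automation", (defaults, presets))
        missing = find_skill_file("no-such-skill", (defaults, presets))
        # Report which bundled directory the hit came from.
        return (found.parent.parent.name if found else None, missing)


if __name__ == "__main__":
    print(demo())
```

Because the search order follows the tuple, a skill present in both directories would resolve to the `_default_skills` copy.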
@@ -318,13 +318,19 @@ class TrustGate:
    ) -> list[ParsedSkill]:
        """Return the subset of skills that are trusted for loading.

-        - Framework and user-scope skills: always included.
+        - Framework, preset, user, queen_ui, and colony_ui scopes: always
+          included. (UI-created skills are trusted because the user authored
+          them through the authenticated UI; they do not go through the
+          trusted_repos.json flow.)
        - Project-scope skills: classified; consent prompt shown if untrusted.
        """
        import os

-        # Separate project skills from always-trusted scopes
-        always_trusted = [s for s in skills if s.source_scope != "project"]
+        # Scopes that bypass the trust gate. UI-authored scopes are implicitly
+        # trusted because the user authored them through the UI; ``preset``
+        # ships with the framework distribution, so it's trusted too.
+        _bypass_scopes = {"framework", "preset", "user", "queen_ui", "colony_ui"}
+        always_trusted = [s for s in skills if s.source_scope in _bypass_scopes]
        project_skills = [s for s in skills if s.source_scope == "project"]

        if not project_skills:
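
One consequence of switching from `!= "project"` to an explicit allowlist is that a skill with an unrecognized scope now lands in neither list. The sketch below illustrates the partition; `Skill` is a hypothetical stand-in for `ParsedSkill`, and the third `unknown` bucket is an addition for illustration that does not appear in the diff.

```python
from dataclasses import dataclass

# Scope names copied from the diff's _bypass_scopes set.
BYPASS_SCOPES = frozenset({"framework", "preset", "user", "queen_ui", "colony_ui"})


@dataclass
class Skill:
    """Hypothetical stand-in for ParsedSkill with only the fields used here."""

    name: str
    source_scope: str


def partition(skills: list) -> tuple:
    always_trusted = [s for s in skills if s.source_scope in BYPASS_SCOPES]
    project = [s for s in skills if s.source_scope == "project"]
    # Illustration only: scopes that match neither rule.
    unknown = [
        s for s in skills
        if s.source_scope not in BYPASS_SCOPES and s.source_scope != "project"
    ]
    return always_trusted, project, unknown


if __name__ == "__main__":
    skills = [
        Skill("browser-automation", "preset"),
        Skill("my-ui-skill", "queen_ui"),
        Skill("repo-skill", "project"),
        Skill("mystery", "plugin"),  # hypothetical unrecognized scope
    ]
    trusted, project, unknown = partition(skills)
    print([s.name for s in trusted], [s.name for s in project], [s.name for s in unknown])
```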
@@ -43,6 +43,10 @@ class FileConversationStore:
    def __init__(self, base_path: str | Path) -> None:
        self._base = Path(base_path)
        self._parts_dir = self._base / "parts"
+        # Partial checkpoints for in-flight assistant turns. Written on every
+        # stream event, deleted atomically when the final part lands. Kept
+        # in a sibling dir so the parts/ glob doesn't pick them up.
+        self._partials_dir = self._base / "partials"

    # --- sync helpers --------------------------------------------------------

@@ -99,6 +103,44 @@ class FileConversationStore:
    async def read_cursor(self) -> dict[str, Any] | None:
        return await self._run(self._read_json, self._base / "cursor.json")

+    async def write_partial(self, seq: int, data: dict[str, Any]) -> None:
+        """Checkpoint an in-flight assistant turn. Overwrites any prior partial
+        for the same seq. Caller is expected to clear_partial() once the real
+        part is written via write_part().
+        """
+        path = self._partials_dir / f"{seq:010d}.json"
+        await self._run(self._write_json, path, data)
+
+    async def read_partial(self, seq: int) -> dict[str, Any] | None:
+        path = self._partials_dir / f"{seq:010d}.json"
+        return await self._run(self._read_json, path)
+
+    async def read_all_partials(self) -> list[dict[str, Any]]:
+        """Return all partial checkpoints, sorted by seq. Used during restore
+        to surface any in-flight turn that the last process didn't finish.
+        """
+
+        def _read_all() -> list[dict[str, Any]]:
+            if not self._partials_dir.exists():
+                return []
+            files = sorted(self._partials_dir.glob("*.json"))
+            partials: list[dict[str, Any]] = []
+            for f in files:
+                data = self._read_json(f)
+                if data is not None:
+                    partials.append(data)
+            return partials
+
+        return await self._run(_read_all)
+
+    async def clear_partial(self, seq: int) -> None:
+        def _clear() -> None:
+            path = self._partials_dir / f"{seq:010d}.json"
+            if path.exists():
+                path.unlink()
+
+        await self._run(_clear)
+
    async def delete_parts_before(self, seq: int, run_id: str | None = None) -> None:
        def _delete() -> None:
            if not self._parts_dir.exists():
@@ -125,6 +167,10 @@ class FileConversationStore:
            if self._parts_dir.exists():
                for f in self._parts_dir.glob("*.json"):
                    f.unlink()
+            # Clear partial checkpoints
+            if self._partials_dir.exists():
+                for f in self._partials_dir.glob("*.json"):
+                    f.unlink()
            # Clear cursor
            cursor_path = self._base / "cursor.json"
            if cursor_path.exists():
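
The partial-checkpoint lifecycle (overwrite per stream event, read back in seq order on restore, clear once the final part lands) can be sketched synchronously. `PartialCheckpointSketch` is a hypothetical stand-in: the real store is async and routes through `_run`, `_write_json`, and `_read_json`, whose exact behaviour is not shown in this hunk.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


class PartialCheckpointSketch:
    """Hypothetical synchronous stand-in for the partial-checkpoint methods."""

    def __init__(self, base_path: Path) -> None:
        self._partials_dir = Path(base_path) / "partials"

    def _path(self, seq: int) -> Path:
        # Zero-padded names make lexicographic order equal numeric order.
        return self._partials_dir / f"{seq:010d}.json"

    def write_partial(self, seq: int, data: dict[str, Any]) -> None:
        self._partials_dir.mkdir(parents=True, exist_ok=True)
        self._path(seq).write_text(json.dumps(data), encoding="utf-8")

    def read_all_partials(self) -> list[dict[str, Any]]:
        if not self._partials_dir.exists():
            return []
        files = sorted(self._partials_dir.glob("*.json"))
        return [json.loads(f.read_text(encoding="utf-8")) for f in files]

    def clear_partial(self, seq: int) -> None:
        self._path(seq).unlink(missing_ok=True)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        store = PartialCheckpointSketch(Path(tmp))
        store.write_partial(12, {"seq": 12, "text": "Hel"})
        store.write_partial(12, {"seq": 12, "text": "Hello"})  # overwrites seq 12
        store.write_partial(3, {"seq": 3, "text": "older turn"})
        before = [p["text"] for p in store.read_all_partials()]
        store.clear_partial(12)  # the final part for seq 12 has landed
        after = [p["text"] for p in store.read_all_partials()]
        return before, after


if __name__ == "__main__":
    print(demo())
```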
@@ -5,6 +5,8 @@ import ColonyChat from "./pages/colony-chat";
 import QueenDM from "./pages/queen-dm";
 import OrgChart from "./pages/org-chart";
 import PromptLibrary from "./pages/prompt-library";
+import SkillsLibrary from "./pages/skills-library";
+import ToolLibrary from "./pages/tool-library";
 import CredentialsPage from "./pages/credentials";
 import NotFound from "./pages/not-found";

@@ -16,7 +18,9 @@ function App() {
        <Route path="/colony/:colonyId" element={<ColonyChat />} />
        <Route path="/queen/:queenId" element={<QueenDM />} />
        <Route path="/org-chart" element={<OrgChart />} />
+        <Route path="/skills-library" element={<SkillsLibrary />} />
        <Route path="/prompt-library" element={<PromptLibrary />} />
+        <Route path="/tool-library" element={<ToolLibrary />} />
        <Route path="/credentials" element={<CredentialsPage />} />
        <Route path="*" element={<NotFound />} />
      </Route>
@@ -7,4 +7,8 @@ export const agentsApi = {
  /** Permanently delete an agent and all its sessions/files. */
  deleteAgent: (agentPath: string) =>
    api.delete<{ deleted: string }>("/agents", { agent_path: agentPath }),
+
+  /** Update colony metadata (e.g. icon). */
+  updateMetadata: (agentPath: string, updates: { icon?: string }) =>
+    api.patch<{ ok: boolean }>("/agents/metadata", { agent_path: agentPath, ...updates }),
 };
@@ -12,12 +12,13 @@ export class ApiError extends Error {

 async function request<T>(path: string, options: RequestInit = {}): Promise<T> {
  const url = `${API_BASE}${path}`;
+  const isFormData = options.body instanceof FormData;
+  const headers: Record<string, string> = isFormData
+    ? {}  // Let browser set Content-Type with boundary for multipart
+    : { "Content-Type": "application/json", ...options.headers as Record<string, string> };
  const response = await fetch(url, {
    ...options,
-    headers: {
-      "Content-Type": "application/json",
-      ...options.headers,
-    },
+    headers,
  });

  if (!response.ok) {
@@ -52,4 +53,6 @@ export const api = {
      method: "PATCH",
      body: body ? JSON.stringify(body) : undefined,
    }),
+  upload: <T>(path: string, formData: FormData) =>
+    request<T>(path, { method: "POST", body: formData }),
 };
@@ -0,0 +1,50 @@
+import { api } from "./client";
+import type { ToolMeta, McpServerTools } from "./queens";
+
+export interface ColonySummary {
+  name: string;
+  queen_name: string | null;
+  created_at: string | null;
+  has_allowlist: boolean;
+  enabled_count: number | null;
+}
+
+export interface ColonyToolsResponse {
+  colony_name: string;
+  enabled_mcp_tools: string[] | null;
+  stale: boolean;
+  lifecycle: ToolMeta[];
+  synthetic: ToolMeta[];
+  mcp_servers: McpServerTools[];
+}
+
+export interface ColonyToolsUpdateResult {
+  colony_name: string;
+  enabled_mcp_tools: string[] | null;
+  refreshed_runtimes: number;
+  note?: string;
+}
+
+export const coloniesApi = {
+  /** List every colony on disk with a summary of its tool allowlist. */
+  list: () =>
+    api.get<{ colonies: ColonySummary[] }>(`/colonies/tools-index`),
+
+  /** Enumerate a colony's tool surface (lifecycle + synthetic + MCP). */
+  getTools: (colonyName: string) =>
+    api.get<ColonyToolsResponse>(
+      `/colony/${encodeURIComponent(colonyName)}/tools`,
+    ),
+
+  /** Persist a colony's MCP tool allowlist.
+   *
+   * ``null`` resets to "allow every MCP tool". A list of names enables
+   * only those MCP tools. Changes take effect on the next worker spawn;
+   * in-flight workers keep their booted tool list.
+   */
+  updateTools: (colonyName: string, enabled: string[] | null) =>
+    api.patch<ColonyToolsUpdateResult>(
+      `/colony/${encodeURIComponent(colonyName)}/tools`,
+      { enabled_mcp_tools: enabled },
+    ),
+};
@@ -0,0 +1,80 @@
+import { api } from "./client";
+
+/** A SQLite cell value, constrained to JSON-serialisable types that
+ *  Python maps into sqlite3 param placeholders without surprises. */
+export type CellValue = string | number | boolean | null;
+
+export interface ColumnInfo {
+  name: string;
+  /** SQLite declared type (e.g. "TEXT", "INTEGER"). May be empty string. */
+  type: string;
+  notnull: boolean;
+  /** >0 means part of the primary key (ordinal position). 0 = not PK. */
+  pk: number;
+  dflt_value: string | null;
+}
+
+export interface TableOverview {
+  name: string;
+  columns: ColumnInfo[];
+  row_count: number;
+  primary_key: string[];
+}
+
+export interface TableRowsResponse {
+  table: string;
+  columns: ColumnInfo[];
+  primary_key: string[];
+  rows: Record<string, CellValue>[];
+  total: number;
+  limit: number;
+  offset: number;
+}
+
+export interface UpdateRowRequest {
+  /** Primary key column(s) → value(s). All PK columns must be present. */
+  pk: Record<string, CellValue>;
+  /** Column(s) → new value(s). Cannot include PK columns. */
+  updates: Record<string, CellValue>;
+}
+
+export const colonyDataApi = {
+  /** List user tables in the colony's progress.db with row counts.
+   *
+   *  Routed by colony directory name (not session) because progress.db
+   *  is per-colony — one DB serves every session for that colony, and
+   *  the data is reachable even when no session is live. */
+  listTables: (colonyName: string) =>
+    api.get<{ tables: TableOverview[] }>(
+      `/colonies/${encodeURIComponent(colonyName)}/data/tables`,
+    ),
+
+  /** Paginated rows for a table. Server enforces limit ≤ 500. */
+  listRows: (
+    colonyName: string,
+    table: string,
+    opts: {
+      limit?: number;
+      offset?: number;
+      orderBy?: string | null;
+      orderDir?: "asc" | "desc";
+    } = {},
+  ) => {
+    const params = new URLSearchParams();
+    if (opts.limit != null) params.set("limit", String(opts.limit));
+    if (opts.offset != null) params.set("offset", String(opts.offset));
+    if (opts.orderBy) params.set("order_by", opts.orderBy);
+    if (opts.orderDir) params.set("order_dir", opts.orderDir);
+    const qs = params.toString();
+    return api.get<TableRowsResponse>(
+      `/colonies/${encodeURIComponent(colonyName)}/data/tables/${encodeURIComponent(table)}/rows${qs ? `?${qs}` : ""}`,
+    );
+  },
+
+  /** Update a single row by primary key. Returns {updated: 0|1}. */
+  updateRow: (colonyName: string, table: string, body: UpdateRowRequest) =>
+    api.patch<{ updated: number }>(
+      `/colonies/${encodeURIComponent(colonyName)}/data/tables/${encodeURIComponent(table)}/rows`,
+      body,
+    ),
+};
@@ -0,0 +1,105 @@
+import { api } from "./client";
+
+export interface WorkerResult {
+  status: string;
+  summary: string;
+  error: string | null;
+  tokens_used: number;
+  duration_seconds: number;
+}
+
+export interface WorkerSummary {
+  worker_id: string;
+  task: string;
+  status: string;
+  started_at: number;
+  result: WorkerResult | null;
+}
+
+export interface ColonySkill {
+  name: string;
+  description: string;
+  location: string;
+  base_dir: string;
+  source_scope: string;
+}
+
+export interface ColonyTool {
+  name: string;
+  description: string;
+  /** Canonical credential/provider key (e.g. "hubspot", "gmail") for
+   *  tools bound to an Aden credential. ``null`` for framework/core
+   *  tools that don't require a provider credential. */
+  provider: string | null;
+}
+
+export interface ProgressTask {
+  id: string;
+  seq: number | null;
+  priority: number;
+  goal: string;
+  payload: string | null;
+  status: string;
+  worker_id: string | null;
+  claim_token: string | null;
+  claimed_at: string | null;
+  started_at: string | null;
+  completed_at: string | null;
+  created_at: string;
+  updated_at: string;
+  retry_count: number;
+  max_retries: number;
+  last_error: string | null;
+  parent_task_id: string | null;
+  source: string | null;
+}
+
+export interface ProgressStep {
+  id: string;
+  task_id: string;
+  seq: number;
+  title: string;
+  detail: string | null;
+  status: string;
+  evidence: string | null;
+  worker_id: string | null;
+  started_at: string | null;
+  completed_at: string | null;
+  /** Present only on upsert events; not on snapshot rows. */
+  _ts?: string | null;
+}
+
+export interface ProgressSnapshot {
+  tasks: ProgressTask[];
+  steps: ProgressStep[];
+}
+
+export const colonyWorkersApi = {
+  /** List spawned workers (live + completed) for a colony session. */
+  list: (sessionId: string) =>
+    api.get<{ workers: WorkerSummary[] }>(`/sessions/${sessionId}/workers`),
+
+  /** List the colony's shared skills catalog. */
+  listSkills: (sessionId: string) =>
+    api.get<{ skills: ColonySkill[] }>(`/sessions/${sessionId}/colony/skills`),
+
+  /** List the colony's default tools. */
+  listTools: (sessionId: string) =>
+    api.get<{ tools: ColonyTool[] }>(`/sessions/${sessionId}/colony/tools`),
+
+  /** Snapshot of progress.db tasks + steps, optionally filtered by
+   *  worker_id. Routed by colony directory name (not session) because
+   *  progress.db is per-colony. */
+  progressSnapshot: (colonyName: string, workerId?: string) => {
+    const qs = workerId ? `?worker_id=${encodeURIComponent(workerId)}` : "";
+    return api.get<ProgressSnapshot>(
+      `/colonies/${encodeURIComponent(colonyName)}/progress/snapshot${qs}`,
+    );
+  },
+
+  /** Build the URL for the live progress SSE stream. */
+  progressStreamUrl: (colonyName: string, workerId?: string): string => {
+    const qs = workerId ? `?worker_id=${encodeURIComponent(workerId)}` : "";
+    return `/api/colonies/${encodeURIComponent(colonyName)}/progress/stream${qs}`;
+  },
+};
@@ -64,4 +64,10 @@ export const configApi = {
      about,
      ...(theme ? { theme } : {}),
    }),
+
+  uploadAvatar: (file: File) => {
+    const fd = new FormData();
+    fd.append("avatar", file);
+    return api.upload<{ avatar_url: string }>("/config/profile/avatar", fd);
+  },
 };
@@ -74,4 +74,28 @@ export const executionApi = {
      `/sessions/${sessionId}/colony-spawn`,
      { colony_name: colonyName, task },
    ),
+
+  /** Lock a queen DM session because the user opened a spawned colony.
+   *  After this call /chat returns 409 until compactAndFork creates a new session.
+   */
+  markColonySpawned: (sessionId: string, colonyName: string) =>
+    api.post<{
+      session_id: string;
+      colony_spawned: boolean;
+      spawned_colony_name: string;
+    }>(`/sessions/${sessionId}/mark-colony-spawned`, {
+      colony_name: colonyName,
+    }),
+
+  /** Compact the locked session and fork into a fresh session under the same queen.
+   *  Returns the new session ID; the frontend should navigate the user to it.
+   */
+  compactAndFork: (sessionId: string) =>
+    api.post<{
+      new_session_id: string;
+      queen_id: string;
+      compacted_from: string;
+      summary_chars: number;
+      messages_compacted: number;
+    }>(`/sessions/${sessionId}/compact-and-fork`),
 };
@@ -0,0 +1,66 @@
+import { api } from "./client";
+
+export type McpTransport = "stdio" | "http" | "sse" | "unix";
+
+export interface McpServer {
+  name: string;
+  /** "local": added via UI/CLI (user-editable). "registry": installed from
+   * the remote MCP registry. "built-in": baked into the queen package —
+   * visible but not removable from the UI. */
+  source: "local" | "registry" | "built-in";
+  transport: McpTransport | string;
+  description: string;
+  enabled: boolean;
+  last_health_status: "healthy" | "unhealthy" | null;
+  last_error: string | null;
+  last_health_check_at: string | null;
+  tool_count: number | null;
+  /** Servers flagged removable:false cannot be deleted from the UI. */
+  removable?: boolean;
+}
+
+export interface AddMcpServerBody {
+  name: string;
+  transport: McpTransport;
+  /** stdio */
+  command?: string;
+  args?: string[];
+  env?: Record<string, string>;
+  cwd?: string;
+  /** http / sse */
+  url?: string;
+  headers?: Record<string, string>;
+  /** unix */
+  socket_path?: string;
+  description?: string;
+}
+
+export interface McpHealthResult {
+  name: string;
+  status: "healthy" | "unhealthy" | "unknown";
+  tools: number;
+  error: string | null;
+}
+
+/** Backend MCPError shape when an operation fails. */
+export interface McpErrorBody {
+  error: string;
+  code?: string;
+  what?: string;
+  why?: string;
+  fix?: string;
+}
+
+export const mcpApi = {
+  listServers: () => api.get<{ servers: McpServer[] }>("/mcp/servers"),
+  addServer: (body: AddMcpServerBody) =>
+    api.post<{ server: McpServer; hint: string }>("/mcp/servers", body),
+  removeServer: (name: string) =>
+    api.delete<{ removed: string }>(`/mcp/servers/${encodeURIComponent(name)}`),
+  setEnabled: (name: string, enabled: boolean) =>
+    api.post<{ name: string; enabled: boolean }>(
+      `/mcp/servers/${encodeURIComponent(name)}/${enabled ? "enable" : "disable"}`,
+    ),
+  checkHealth: (name: string) =>
+    api.post<McpHealthResult>(`/mcp/servers/${encodeURIComponent(name)}/health`),
+};
@@ -0,0 +1,19 @@
+import { api } from "./client";
+
+export interface CustomPrompt {
+  id: string;
+  title: string;
+  category: string;
+  content: string;
+  custom: true;
+}
+
+export const promptsApi = {
+  list: () => api.get<{ prompts: CustomPrompt[] }>("/prompts"),
+
+  create: (title: string, category: string, content: string) =>
+    api.post<CustomPrompt>("/prompts", { title, category, content }),
+
+  delete: (promptId: string) =>
+    api.delete<{ deleted: string }>(`/prompts/${promptId}`),
+};
Author	SHA1	Message	Date
Richard Tang	77cc169606	feat: cost tracking	2026-04-23 15:34:07 -07:00
Richard Tang	8c6428f445	feat: token comsumption usage	2026-04-23 15:05:30 -07:00
Richard Tang	44cb0c0f4c	feat: hybrid compaction buffer (fixed tokens + ratio of context) The compaction trigger now reserves headroom equal to compaction_buffer_tokens + compaction_buffer_ratio * max_context_tokens. The fixed component (default 8k, sized for one max-sized tool result) gives a floor on small windows; the ratio (default 0.15) keeps the trigger meaningful on large windows where any constant buffer becomes a rounding error (8k buffer is 75% on a 32k window but 96% on a 200k window). Result: ~80% pre-turn trigger on 200k+ windows so the inner tool loop has room to grow without firing the mid-turn pre-send guard.	2026-04-23 15:04:19 -07:00
Richard Tang	2621fb88b1	fix: drain bg fork tasks before colony-spawn artifact asserts Compaction + worker-storage copy moved to a background task in f39c1c87; the test checked the worker-storage file before the task ran, which flaked under CI load. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 21:38:21 -07:00
Richard Tang	a70f92edbe	chore: lint format	2026-04-22 21:33:33 -07:00
Richard Tang	b2efa179ea	docs: note cache fix in v0.10.4 release notes Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 21:27:24 -07:00
Richard Tang	8c6e76d052	fix: no cache for queen config	2026-04-22 21:24:00 -07:00
Richard Tang	c7f1fbf19f	chore: release v0.10.4 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 21:12:28 -07:00
Richard Tang	7047ecbf46	chore: fixed ci	2026-04-22 20:14:36 -07:00
Richard Tang	b96ee5aaab	fix: create new session and switch branch	2026-04-22 20:05:21 -07:00
Richard Tang	6744bea01a	feat: move date time inject from system prompt	2026-04-22 17:11:34 -07:00
Richard Tang	390038225b	feat static system prompt	2026-04-22 16:54:47 -07:00
Richard Tang	b55c8fdf86	fix: validate session creation inputs and tighten skill/reflection edges	2026-04-22 15:08:50 -07:00
Richard Tang	e9aea0bbc4	fix: tools and skills registration	2026-04-22 13:54:10 -07:00
Richard Tang	0ba1fa8262	feat: created colony inherit skills and tools	2026-04-21 19:23:33 -07:00
Richard Tang	0fd96d410e	feat: configurable default tools and skills	2026-04-21 19:15:40 -07:00
Richard Tang	c658a7c50b	feat: default skills and tools	2026-04-21 19:15:28 -07:00
Richard Tang	56c3659bda	feat: refactor tool config and library menu	2026-04-21 18:57:11 -07:00
Richard Tang	14f927996c	feat: skill library	2026-04-21 18:48:22 -07:00
Richard Tang	8a0ec070b8	feat: tool library	2026-04-21 17:20:54 -07:00
Richard Tang	80cd77ac30	chore: release v0.10.3	2026-04-20 19:49:28 -07:00
Richard Tang	c67521a09c	chore: ruff lint	2026-04-20 19:14:14 -07:00
Richard Tang	8da06f4f90	Merge remote-tracking branch 'origin/feat/queue-message' into feat/colony-merge-candidate # Conflicts: # core/frontend/src/components/ChatPanel.tsx # core/frontend/src/pages/colony-chat.tsx # core/frontend/src/pages/queen-dm.tsx	2026-04-20 19:11:58 -07:00
Richard Tang	46e0413eb8	chore: create colony popup	2026-04-20 19:01:43 -07:00
Richard Tang	81731587ff	feat(tool call): add format _coerce before execution	2026-04-20 18:58:12 -07:00
Richard Tang	4e9d9bf1ea	feat: group tools by sessions	2026-04-20 18:20:10 -07:00
Richard Tang	2644ab953d	fix: tool calls in chat	2026-04-20 18:10:53 -07:00
Richard Tang	e7daa59573	feat: queen ask_user tool prompt	2026-04-20 16:48:43 -07:00
Richard Tang	1bec43afad	feat: ask_user tool prompt	2026-04-20 16:38:29 -07:00
Richard Tang	3d1357595d	refactor: ask_user	2026-04-20 16:34:18 -07:00
bryan	59ccbba810	fix: suppress typing flicker on queue auto-flush and dedup user bubble on bootstrap race	2026-04-20 15:30:01 -07:00
Richard Tang	8b2ae369ac	fix:remove deuplicate parts in indenpendent prompt	2026-04-20 14:52:32 -07:00
Richard Tang	96a667cbd9	feat: better identity prompt structure	2026-04-20 14:41:20 -07:00
Richard Tang	17150a53bd	chore: lint	2026-04-20 13:09:02 -07:00
Richard Tang	c1d7b0ee69	feat: fix reply message bubble and improve code reuse	2026-04-20 13:07:26 -07:00
bryan	16ea9b52d3	feat: queue messages during queen turns in colony/queen chats	2026-04-20 12:45:38 -07:00
bryan	dcbfd4ab01	feat: add pending-queue hook and Steer/Cancel UI in ChatPanel	2026-04-20 12:45:14 -07:00
bryan	b762020793	refactor: carry executionId on user SSE events	2026-04-20 12:44:56 -07:00
Richard Tang	4ffddc53e6	fix: trigger message	2026-04-20 11:54:11 -07:00
Richard Tang	24bcc5aea7	feat: update trigger ui	2026-04-20 11:19:57 -07:00
Richard Tang	3c91119f67	feat: improvements for scheduler	2026-04-20 10:49:37 -07:00
Richard Tang	923e773c14	feat: improve the tab switching tool	2026-04-20 10:21:32 -07:00
Naresh Chandanbatve	199c3a235e	feat(tool): add Prometheus tool support (#7047) Adds prometheus_query (instant PromQL) and prometheus_query_range (time-range) tools. Includes credential spec, /-/ready health check, unit tests, and docs. Optional Bearer token and Basic auth via env vars (PROMETHEUS_TOKEN, PROMETHEUS_USERNAME/PASSWORD). Fixes #6945.	2026-04-20 18:13:49 +08:00
Kavin	a881fe68da	fix(llm): ensure store=False is passed to Codex Responses API (#7089) Forces store: false into the extra_body payload for Codex-style models so that LiteLLM successfully passes it down to the ChatGPT Responses API backend, fixing the BadRequestError. Fixes #7056. Original investigation and first PR by @Darshan174 (#7065). Co-authored-by: Darshan174 <Darshan002321@gmail.com>	2026-04-20 17:54:41 +08:00
Hundao	6b9040477f	fix(ci): unbreak main, ruff format browser and refresh test_model_catalog (#7095) * chore: ruff format browser bridge and tools * fix(tests): refresh test_model_catalog expectations after catalog drift	2026-04-20 17:23:26 +08:00
Richard Tang	c7cc031060	fix: handling broken Aden API Key	2026-04-19 20:05:14 -07:00
Richard Tang	93c0ef672a	fix: queen badge	2026-04-19 19:37:49 -07:00
Richard Tang	67d55e6cce	feat: scheduler tools for incubating	2026-04-19 19:30:31 -07:00
Richard Tang	0907ff9cec	Merge branch 'pr-7093-vincent' into feat/colony-session-transfer	2026-04-19 19:01:19 -07:00
Vincent Jiang	ed2e7125ac	feat: colony creation, queen identity in colonies, and org chart improvements - Colony creation: add "Create a Colony" button in queen DM (conversation header), queen profile panel, and sidebar with queen picker + goal input - Queen identity in colonies: resolve queen profile name for colony chat messages, fix duplicate messages on refresh via SSE replay deduplication with restore cutoff - Colony header: show colony name with Component icon, queen profile link preserved - Org chart: colony detail drawer with metadata (start date, goal, status, stats), icon picker for colonies (16 icons, persisted to metadata.json), fixed queen card heights, fixed queen display order via shared sortQueenProfiles() - Chat: add headerAction slot for inline buttons next to "Conversation" header - Backend: PATCH /api/agents/metadata for colony icon, created_at in discover API with filesystem fallback, chat-helpers queen name passthrough for cold restore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-19 18:55:01 -07:00
Richard Tang	f39c1c87af	feat: compact the queen session when creating colony	2026-04-19 18:51:51 -07:00
Richard Tang	1229b4ad4d	feat: incubating phase	2026-04-19 18:07:09 -07:00
Richard Tang	0d11a946a5	feat: mark-colony-spawned for a session that created colony	2026-04-19 17:21:06 -07:00
Richard Tang	b007ed753b	chore: xiaomi model context limit	2026-04-19 15:28:32 -07:00
Richard Tang	bb39424e99	chore: update model context config	2026-04-19 15:19:26 -07:00
Richard Tang	b27c7a029e	chore: update openrouter model selections	2026-04-19 15:10:36 -07:00
Timothy	a3433f2c9e	Merge branch 'main' into fix/image-coordinate-precision	2026-04-19 13:25:41 -07:00
Richard Tang	24ef2c247d	chore: tidy editorconfig and gitattributes, drop unused reference	2026-04-19 13:24:34 -07:00
Richard Tang	a8f9661626	chore: remove unused files	2026-04-19 13:19:01 -07:00
Timothy	3005bcaa96	chore: bump extension version to 1.0.1	2026-04-19 13:06:51 -07:00
Timothy	40c4591d65	fix: extension icons	2026-04-19 13:06:13 -07:00
Timothy	e2bfb9d3af	fix: frame resize	2026-04-19 13:02:12 -07:00
Timothy	e55cea97ef	fix: diagnostics	2026-04-19 12:52:04 -07:00
Timothy	ddaafe0307	Merge remote-tracking branch 'origin/main' into fix/image-coordinate-precision	2026-04-18 23:32:28 -07:00
Richard Tang	c17205a453	test: align stale tests with current behavior	2026-04-18 22:02:03 -07:00
Richard Tang	8e4468851c	chore: ruff format	2026-04-18 21:45:34 -07:00
Richard Tang	ccf4216841	fix: resolve merge conflict markers and ruff issues	2026-04-18 21:45:11 -07:00
Richard Tang	82ffcb17ac	Merge remote-tracking branch 'origin/main' into fix/colony-skill-leak	2026-04-18 21:36:23 -07:00
Richard Tang	4da5bcc1e4	feat: queen bar in colony	2026-04-18 21:30:19 -07:00
Richard Tang	3df7194003	feat: worker tab by clicking on the worker	2026-04-18 21:21:22 -07:00
Richard Tang	6f1f27b6e9	feat: load table by colony	2026-04-18 20:55:20 -07:00
Richard Tang	7b52ed9fa7	fix: outdated jsonledger	2026-04-18 20:35:05 -07:00
Richard Tang	4d32526a29	feat: real available parallel size	2026-04-18 20:18:54 -07:00
Richard Tang	656401e199	feat: real snapshot after interaction	2026-04-18 19:51:52 -07:00
Richard Tang	f2e51157dc	feat: snapshot related prompts	2026-04-18 19:39:00 -07:00
Timothy	0d13c805b1	fix: colony skill leakage	2026-04-18 15:34:31 -07:00
Kowshik Mente	b1ec64438c	fix(runtime): prevent session restart until cancelled execution fully terminates (#7001) * fix(runtime): prevent dual execution after forced cancel - keep bookkeeping until task termination - block restart while any execution task is still alive - make execution registration atomic under lock - avoid premature cleanup on cancel timeout - add regression tests for forced-cancel restart scenarios * chore: ruff format and import order --------- Co-authored-by: kowshikmente <kowshikmente@kowshikmentes-MacBook-Pro.local> Co-authored-by: hundao <alchemy_wimp@hotmail.com>	2026-04-18 19:36:50 +08:00
Hundao	90aadf247a	fix(ci): unbreak main, ruff format, test_refs, test_model_catalog (#7084) * fix(ci): apply ruff format to browser tool files Refs #7083 * fix(ci): unbreak test_refs (img regression) and test_model_catalog test_refs: - Add `img` back to CONTENT_ROLES so named images get refs again. The recent `cc6ec97a feat: multiple modes browser snapshot tool` refactor renamed NAMED_CONTENT_ROLES → CONTENT_ROLES and accidentally dropped `img`, breaking `test_named_content_roles_get_refs`. - Drop the `navigation` assertion from `test_skips_structural_roles`. That same refactor intentionally added landmark roles (navigation, main, listitem) to CONTENT_ROLES so AI agents can ref them by name, and the test was not updated to reflect that. test_model_catalog: - Add 5 openrouter models that were added to model_catalog.json by #7081 (UI/UX improvements) but not reflected in the test. Refs #7083 * fix(ci): wait for event propagation in subagent report test on Windows `test_worker_report_emits_subagent_report_event` waited only for `worker.is_active` to flip to False, then immediately asserted on the collected events. On Windows the event loop scheduling differs enough that the SUBAGENT_REPORT subscriber callback can run a few ticks after the worker is marked inactive, so the assertion fires against an empty list. Wait for both conditions. Refs #7083	2026-04-18 19:09:15 +08:00
RichardTang-Aden	49317ac5f5	Merge pull request #7081 from vincentjiang777/feat/ui-ux-improvements feat: UI/UX improvements across BYOK, org chart, profiles, and prompt…	2026-04-17 21:03:01 -07:00
Richard Tang	7216e9d9f0	chore: ruff lint and format Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 21:01:18 -07:00
Richard Tang	91b1070d80	Merge remote-tracking branch 'origin/main' into feat/ui-ux-improvements # Conflicts: # core/frontend/src/components/SidebarQueenItem.tsx	2026-04-17 20:58:20 -07:00
Richard Tang	08aeffd977	chore: more create colony logs	2026-04-17 20:27:22 -07:00
Richard Tang	651b57b928	feat: hive open performance issue	2026-04-17 20:16:01 -07:00
Richard Tang	8c10fc2e1c	fix: queen dm session loading	2026-04-17 20:11:48 -07:00
Richard Tang	e3154ca0ee	fix: colony session loading	2026-04-17 19:45:31 -07:00
Richard Tang	84a92af41b	fix: patch the correct db path	2026-04-17 19:40:59 -07:00
Richard Tang	78fc62210a	feat: table tab improvements	2026-04-17 19:25:15 -07:00
Timothy	2fd7e9172a	fix: y-offset inspection	2026-04-17 19:24:41 -07:00
Richard Tang	ca63fd9ee9	feat: create skill along with colony	2026-04-17 19:03:28 -07:00
Richard Tang	b99f25c8d7	feat: DataGrid for colony side bar	2026-04-17 18:47:19 -07:00
Timothy	e972112074	feature: merge sidebars with functionalities	2026-04-17 18:12:18 -07:00
Vincent Jiang	6e97191f21	feat: UI/UX improvements across BYOK, org chart, profiles, and prompt library - BYOK: unified styling (remove purple, consistent grey headers), model selector opens settings modal directly, backend validates API keys before activation - Org chart: queen profiles are now editable (name, title, about, skills, achievement) with changes persisted to YAML - Avatars: upload profile pictures for queens and user with client-side compression, displayed across org chart, sidebar, chat, and header - Colony deletion: await backend delete and re-fetch to prevent ghost colonies - Prompt library: add pagination (24/page), custom prompt upload/delete with backend persistence - Settings modal: performance cleanup (remove backdrop-blur, reduce transitions) - Fix ensure_default_queens() overwriting user edits on every API call Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 14:21:18 -07:00
Richard Tang	023fb9b8d0	refactor: use SSE for worker and browser status	2026-04-17 14:11:19 -07:00
Richard Tang	b7924b1ad0	feat: colony tab by group	2026-04-17 14:05:55 -07:00
Timothy	b6640b8592	fix: prevent watcher from being GC'd	2026-04-17 13:13:39 -07:00
Timothy	43a1d5797c	Merge branch 'fix/worker-tab-groups' into feature/clean-context	2026-04-17 12:35:09 -07:00
Timothy	5cb814f2dc	fix: worker tab groups	2026-04-17 12:34:38 -07:00
Richard Tang	f52c44821a	feat: partial validation after typing	2026-04-17 12:16:13 -07:00
Richard Tang	97432ea08c	feat: colony side bar	2026-04-17 11:52:49 -07:00
Timothy	0abd1125b7	fix: parallel execution	2026-04-17 11:20:06 -07:00
Timothy	803337ec74	feat: new queen phases	2026-04-17 06:19:15 -07:00
Timothy	2b055d4d42	fix: simplify system prompt	2026-04-17 04:47:51 -07:00
Timothy	dde4dfaec9	Merge branch 'feature/colony-sqlite' into feature/clean-context	2026-04-17 04:12:35 -07:00
Timothy	6be026fcb1	fix: partial parts and system nudge	2026-04-17 04:06:59 -07:00
Richard Tang	3c2161aad5	chore: release v0.10.2 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 23:43:20 -07:00
Richard Tang	e74ebe6835	feat: reduce gemini context window to improve reliability	2026-04-16 23:41:24 -07:00
Richard Tang	d788e5b2f7	chore: ruff lint	2026-04-16 23:33:48 -07:00
Richard Tang	583a5b41b4	fix: unused reference	2026-04-16 23:23:38 -07:00
Richard Tang	83cc44bdef	Merge branch 'feature/full-image-size'	2026-04-16 23:15:59 -07:00
Timothy	558813e7fa	feat: fraction-based visual clicks	2026-04-16 22:36:41 -07:00
Timothy	aba0ff07ba	fix: model invariant screenshot	2026-04-16 20:29:05 -07:00
Timothy	4303a36df0	fix: namespaced browser tab groups	2026-04-16 20:07:05 -07:00
Timothy	e68d8ef10b	fix: do not kill queen when switching	2026-04-16 19:29:00 -07:00
Richard Tang	c6b6a5a2f7	feat: GCP skills and prompts improvements	2026-04-16 17:43:52 -07:00
Richard Tang	18f5f078fc	feat: dashed highlighter for browser type focus	2026-04-16 17:26:09 -07:00
Richard Tang	cc6ec97a75	feat: multiple modes browser snapshot tool	2026-04-16 17:22:44 -07:00
Timothy	b50f237506	fix: screenshot skill diction	2026-04-16 15:16:22 -07:00
Timothy	59b1bc9338	fix: tool grouping logic	2026-04-16 12:55:10 -07:00
Timothy	37672c5581	fix: remove worker tool from dm	2026-04-16 12:23:19 -07:00
Timothy	7b0948cd62	Merge branch 'refactor/worker-message' into feature/colony-sqlite	2026-04-16 11:26:46 -07:00
Timothy	4aa5fd7a90	refactor: align worker display	2026-04-16 11:26:32 -07:00
Richard Tang	d20b617008	feat: queen profile in message bubbles	2026-04-16 11:21:02 -07:00
Timothy	c4ee12532f	fix: worker message display	2026-04-16 11:20:17 -07:00
Richard Tang	36ebf27e3e	feat: make side bar size adjustable	2026-04-16 11:15:47 -07:00
Richard Tang	ae1599c66a	feat: queen profile side bar	2026-04-16 11:15:30 -07:00
Richard Tang	810cf5a6d3	Merge remote-tracking branch 'origin/main' into feature/colony-sqlite	2026-04-16 11:10:34 -07:00
Timothy	1ee0d5a2e8	feat: worker bubble display	2026-04-16 10:48:44 -07:00
Richard Tang	be94c611bd	fix: queen fail when no worker is running	2026-04-15 22:14:36 -07:00
Timothy	45df68c146	feat: ensure sqlite3 installation	2026-04-15 18:34:33 -07:00
Timothy	2231dc5742	fix: delete spilled skill	2026-04-15 18:14:10 -07:00
Timothy	446844b2ad	fix: tighten worker with sqlite skills	2026-04-15 18:11:15 -07:00
Timothy	e719523434	fix: remove conflicting tools	2026-04-15 17:38:05 -07:00
Timothy	79c5d43006	feat: colony sqlite and skills	2026-04-15 15:28:37 -07:00