Merge remote-tracking branch 'origin/feat/question-widget' into queen-mode-separation

Richard Tang
2026-03-03 20:09:10 -08:00
18 changed files with 2097 additions and 461 deletions
@@ -651,21 +651,36 @@ stop_worker() to return to STAGING mode.
_queen_behavior = """
# Behavior
## CRITICAL RULE — ask_user tool
Every response that ends with a question, a prompt, or expects user \
input MUST finish with a call to ask_user(prompt, options). This is \
NON-NEGOTIABLE. The system CANNOT detect that you are waiting for \
input unless you call ask_user. You MUST call ask_user as the LAST \
action in your response.
NEVER end a response with a question in text without calling ask_user. \
NEVER rely on the user seeing your text and replying; call ask_user.
Always provide 2-4 short options that cover the most likely answers. \
The user can always type a custom response.
Examples:
- ask_user("What do you need?",
["Build a new agent", "Run the loaded worker", "Help with code"])
- ask_user("Which pattern?",
["Simple 2-node", "Rich with feedback", "Custom"])
- ask_user("Ready to proceed?",
["Yes, go ahead", "Let me change something"])
## Greeting and identity
When the user greets you ("hi", "hello") or asks what you can do / \
what you are, respond concisely. DO NOT list internal processes \
(validation steps, AgentRunner.load, tool discovery). Focus on \
user-facing capabilities:
1. Direct capabilities: file operations, shell commands, coding, \
agent building & debugging.
2. Delegation: describe what the loaded worker does in one sentence \
(read the Worker Profile at the end of this prompt). If no worker \
is loaded, say so.
3. End with a short prompt: "What do you need?"
Keep it under 10 lines. No bullet-point dumps of every tool you have.
When the user greets you or asks what you can do, respond concisely \
(under 10 lines). DO NOT list internal processes. Focus on:
1. Direct capabilities: coding, agent building & debugging.
2. What the loaded worker does (one sentence from Worker Profile). \
If no worker is loaded, say so.
3. THEN call ask_user to prompt them; do NOT just write text.
## Direct coding
You can do any coding task directly: reading files, writing code, running
@@ -715,24 +730,37 @@ explain the problem clearly and help fix it. For credential errors, \
guide the user to set up the missing credentials. For structural \
issues, offer to fix the agent graph directly.
## When worker is running:
- If the user asks about progress, call get_worker_status() ONCE and \
report the result. Do NOT poll in a loop.
- NEVER call get_worker_status() repeatedly without user input in between. \
The worker will surface results through client-facing nodes. You do not \
need to monitor it. One check per user request is enough.
- If the user has a concern or instruction for the worker, call \
inject_worker_message(content) to relay it.
- You can still do coding tasks directly while the worker runs.
- If an escalation ticket arrives from the judge, assess severity:
- Low/transient: acknowledge silently, do not disturb the user.
- High/critical: notify the user with a brief analysis and suggested action.
- After starting the worker or checking its status, WAIT for the user's \
next message. Do not take autonomous actions unless the user asks.
## When worker is running — GO SILENT
## When worker asks user a question:
- The system will route the user's response directly to the worker. \
You do not need to relay it. The user will come back to you after responding.
Once you call start_worker(), your job is DONE. Do NOT call ask_user, \
do NOT call get_worker_status(), do NOT emit any text. Just stop. \
The worker owns the conversation now; it has its own client-facing \
nodes that talk to the user directly.
**After start_worker, your ENTIRE response should be ONE short \
confirmation sentence with NO tool calls.** Example: \
"Started the vulnerability assessment." That's it. No ask_user, \
no get_worker_status, no follow-up questions.
You only wake up again when:
- The user explicitly addresses you (not answering a worker question)
- A worker question is forwarded to you for relay
- An escalation ticket arrives from the judge
- The worker finishes
If the user explicitly asks about progress, call get_worker_status() \
ONCE and report. Do NOT poll or check proactively.
For escalation tickets: if low/transient, acknowledge silently. \
If high/critical, notify the user with a brief analysis.
## When the worker asks the user a question:
- The user's answer is routed to you with context: \
[Worker asked: "...", Options: ...] User answered: "...".
- If the user is answering the worker's question normally, relay it \
using inject_worker_message(answer_text). Then go silent again.
- If the user is rejecting the approach, asking to stop, or giving \
you an instruction, handle it yourself; do NOT relay.
## Showing or describing the loaded worker
+208 -38
@@ -152,6 +152,74 @@ def _compact_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]
return compact
def extract_tool_call_history(messages: list[Message], max_entries: int = 30) -> str:
"""Build a compact tool call history from a list of messages.
Used in compaction summaries to prevent the LLM from re-calling
tools it already called. Extracts tool call details, files saved,
outputs set, and errors encountered.
"""
tool_calls_detail: dict[str, list[str]] = {}
files_saved: list[str] = []
outputs_set: list[str] = []
errors: list[str] = []
def _summarize_input(name: str, args: dict) -> str:
if name == "web_search":
return args.get("query", "")
if name == "web_scrape":
return args.get("url", "")
if name in ("load_data", "save_data"):
return args.get("filename", "")
return ""
for msg in messages:
if msg.role == "assistant" and msg.tool_calls:
for tc in msg.tool_calls:
func = tc.get("function", {})
name = func.get("name", "unknown")
try:
args = json.loads(func.get("arguments", "{}"))
except (json.JSONDecodeError, TypeError):
args = {}
summary = _summarize_input(name, args)
tool_calls_detail.setdefault(name, []).append(summary)
if name == "save_data" and args.get("filename"):
files_saved.append(args["filename"])
if name == "set_output" and args.get("key"):
outputs_set.append(args["key"])
if msg.role == "tool" and msg.is_error:
preview = msg.content[:120].replace("\n", " ")
errors.append(preview)
parts: list[str] = []
if tool_calls_detail:
lines: list[str] = []
for name, inputs in list(tool_calls_detail.items())[:max_entries]:
count = len(inputs)
non_empty = [s for s in inputs if s]
if non_empty:
detail_lines = [f" - {s[:120]}" for s in non_empty[:8]]
lines.append(f" {name} ({count}x):\n" + "\n".join(detail_lines))
else:
lines.append(f" {name} ({count}x)")
parts.append("TOOLS ALREADY CALLED:\n" + "\n".join(lines))
if files_saved:
unique = list(dict.fromkeys(files_saved))
parts.append("FILES SAVED: " + ", ".join(unique))
if outputs_set:
unique = list(dict.fromkeys(outputs_set))
parts.append("OUTPUTS SET: " + ", ".join(unique))
if errors:
parts.append(
"ERRORS (do NOT retry these):\n" + "\n".join(f" - {e}" for e in errors[:10])
)
return "\n\n".join(parts)
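As a rough sketch of how this helper behaves, the snippet below builds two invented tool calls and prints the resulting summary. The tool names and arguments are hypothetical; the Message kwargs mirror those used elsewhere in this diff and assume the remaining fields take their defaults.

# Hypothetical usage sketch (invented tool names and arguments).
msgs = [
    Message(seq=1, role="assistant", content="", tool_calls=[{
        "id": "tc_1",
        "function": {"name": "web_search",
                     "arguments": json.dumps({"query": "agent context compaction"})},
    }]),
    Message(seq=2, role="tool", content="10 results found", is_error=False),
    Message(seq=3, role="assistant", content="", tool_calls=[{
        "id": "tc_2",
        "function": {"name": "save_data",
                     "arguments": json.dumps({"filename": "findings.json"})},
    }]),
]
print(extract_tool_call_history(msgs))
# Expected shape (abridged):
#   TOOLS ALREADY CALLED:
#     web_search (1x):
#       - agent context compaction
#     save_data (1x):
#       - findings.json
#
#   FILES SAVED: findings.json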
# ---------------------------------------------------------------------------
# ConversationStore protocol (Phase 2)
# ---------------------------------------------------------------------------
@@ -373,9 +441,36 @@ class NodeConversation:
def _repair_orphaned_tool_calls(
msgs: list[dict[str, Any]],
) -> list[dict[str, Any]]:
"""Ensure every tool_call has a matching tool-result message."""
"""Ensure tool_call / tool_result pairs are consistent.
1. **Orphaned tool results** (tool_result with no preceding tool_use)
are dropped. This happens when compaction removes an assistant
message but leaves its tool-result messages behind.
2. **Orphaned tool calls** (tool_use with no following tool_result)
get a synthetic error result appended. This happens when a loop
is cancelled mid-tool-execution.
"""
# Pass 1: collect all tool_call IDs from assistant messages so we
# can identify orphaned tool-result messages.
all_tool_call_ids: set[str] = set()
for m in msgs:
if m.get("role") == "assistant":
for tc in m.get("tool_calls") or []:
tc_id = tc.get("id")
if tc_id:
all_tool_call_ids.add(tc_id)
# Pass 2: build repaired list — drop orphaned tool results, patch
# missing tool results.
repaired: list[dict[str, Any]] = []
for i, m in enumerate(msgs):
# Drop tool-result messages whose tool_call_id has no matching
# tool_use in any assistant message (orphaned by compaction).
if m.get("role") == "tool":
tid = m.get("tool_call_id")
if tid and tid not in all_tool_call_ids:
continue # skip orphaned result
repaired.append(m)
tool_calls = m.get("tool_calls")
if m.get("role") != "assistant" or not tool_calls:
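The rest of this hunk is truncated, but the first repair case can be illustrated with a tiny invented example. The dict shapes follow the pass-1/pass-2 code above, and the helper is assumed to be callable as shown.

# Hypothetical illustration of case 1: the assistant message that issued
# "tc_old" was removed by compaction, leaving its result orphaned.
msgs = [
    {"role": "tool", "tool_call_id": "tc_old", "content": "stale result"},
    {"role": "user", "content": "continue"},
]
repaired = _repair_orphaned_tool_calls(msgs)
# The orphaned tool result is dropped; the user message is kept as-is.
assert repaired == [{"role": "user", "content": "continue"}]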
@@ -653,6 +748,7 @@ class NodeConversation:
spillover_dir: str,
keep_recent: int = 4,
phase_graduated: bool = False,
aggressive: bool = False,
) -> None:
"""Structure-preserving compaction: save freeform text to file, keep tool messages.
@@ -662,6 +758,11 @@ class NodeConversation:
after pruning. Only freeform text exchanges (user messages,
text-only assistant messages) are saved to a file and removed.
When *aggressive* is True, non-essential tool call pairs are also
collapsed into a compact summary instead of being kept individually.
Only ``set_output`` calls and error results are preserved; all other
old tool pairs are replaced by a tool-call history summary.
The result: the agent retains exact knowledge of what tools it called,
where each result is stored, and can load the conversation text if
needed. No LLM summary call. No heuristics. Nothing lost.
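A minimal sketch of the escalation pattern the executor uses with this flag later in this diff; `conversation` stands in for a NodeConversation instance, names and thresholds mirror that call site, and an async context is assumed.

pre = conversation.usage_ratio()
await conversation.compact_preserving_structure(
    spillover_dir="data", keep_recent=4, phase_graduated=True,
)
# If the structural pass barely helped, retry aggressively: collapse
# non-essential tool pairs into a tool-call history summary.
if conversation.usage_ratio() >= 0.9 * pre:
    await conversation.compact_preserving_structure(
        spillover_dir="data", keep_recent=4, phase_graduated=True, aggressive=True,
    )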
@@ -693,35 +794,92 @@ class NodeConversation:
# Classify old messages: structural (keep) vs freeform (save to file)
kept_structural: list[Message] = []
freeform_lines: list[str] = []
collapsed_msgs: list[Message] = []
for msg in old_messages:
if msg.role == "tool":
# Tool results — already pruned to ~30 tokens (file reference).
# Keep in conversation.
kept_structural.append(msg)
elif msg.role == "assistant" and msg.tool_calls:
# Assistant message with tool_calls — keep the tool_calls
# with truncated arguments, clear the freeform text content.
compact_tcs = _compact_tool_calls(msg.tool_calls)
kept_structural.append(
Message(
seq=msg.seq,
role=msg.role,
content="",
tool_calls=compact_tcs,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
)
if aggressive:
# Aggressive: only keep set_output tool pairs and error results.
# Everything else is collapsed into a tool-call history summary.
# We need to track tool_call IDs to pair assistant messages with
# their tool results.
protected_tc_ids: set[str] = set()
collapsible_tc_ids: set[str] = set()
# First pass: classify assistant messages
for msg in old_messages:
if msg.role != "assistant" or not msg.tool_calls:
continue
has_protected = any(
tc.get("function", {}).get("name") == "set_output"
for tc in msg.tool_calls
)
else:
# Freeform text (user messages, text-only assistant messages)
# — save to file and remove from conversation.
role_label = msg.role
text = msg.content
if len(text) > 2000:
text = text[:2000] + "…"
freeform_lines.append(f"[{role_label}] (seq={msg.seq}): {text}")
tc_ids = {tc.get("id", "") for tc in msg.tool_calls}
if has_protected:
protected_tc_ids |= tc_ids
else:
collapsible_tc_ids |= tc_ids
# Second pass: classify all messages
for msg in old_messages:
if msg.role == "tool":
tc_id = msg.tool_use_id or ""
if tc_id in protected_tc_ids:
kept_structural.append(msg)
elif msg.is_error:
# Error results are always protected
kept_structural.append(msg)
# Protect the parent assistant message too
protected_tc_ids.add(tc_id)
else:
collapsed_msgs.append(msg)
elif msg.role == "assistant" and msg.tool_calls:
tc_ids = {tc.get("id", "") for tc in msg.tool_calls}
if tc_ids & protected_tc_ids:
# Has at least one protected tool call — keep entire msg
compact_tcs = _compact_tool_calls(msg.tool_calls)
kept_structural.append(
Message(
seq=msg.seq,
role=msg.role,
content="",
tool_calls=compact_tcs,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
)
)
else:
collapsed_msgs.append(msg)
else:
# Freeform text — save to file
role_label = msg.role
text = msg.content
if len(text) > 2000:
text = text[:2000] + "…"
freeform_lines.append(f"[{role_label}] (seq={msg.seq}): {text}")
else:
# Standard mode: keep all tool call pairs as structural
for msg in old_messages:
if msg.role == "tool":
kept_structural.append(msg)
elif msg.role == "assistant" and msg.tool_calls:
compact_tcs = _compact_tool_calls(msg.tool_calls)
kept_structural.append(
Message(
seq=msg.seq,
role=msg.role,
content="",
tool_calls=compact_tcs,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
)
)
else:
role_label = msg.role
text = msg.content
if len(text) > 2000:
text = text[:2000] + "…"
freeform_lines.append(f"[{role_label}] (seq={msg.seq}): {text}")
# Write freeform text to a numbered conversation file
spill_path = Path(spillover_dir)
@@ -741,13 +899,25 @@ class NodeConversation:
conv_filename = ""
# Build reference message
ref_parts: list[str] = []
if conv_filename:
ref_content = (
ref_parts.append(
f"[Previous conversation saved to '{conv_filename}'. "
f"Use load_data('{conv_filename}') to review if needed.]"
)
else:
ref_content = "[Previous freeform messages compacted.]"
elif not collapsed_msgs:
ref_parts.append("[Previous freeform messages compacted.]")
# Aggressive: add collapsed tool-call history to the reference
if collapsed_msgs:
tool_history = extract_tool_call_history(collapsed_msgs)
if tool_history:
ref_parts.append(tool_history)
elif not ref_parts:
ref_parts.append("[Previous tool calls compacted.]")
ref_content = "\n\n".join(ref_parts)
# Use a seq just before the first kept message
recent_messages = list(self._messages[split:])
if kept_structural:
@@ -760,15 +930,15 @@ class NodeConversation:
ref_msg = Message(seq=ref_seq, role="user", content=ref_content)
# Persist: delete old messages from store, write reference + kept structural
# Persist: delete old messages from store, write reference + kept structural.
# In aggressive mode, collapsed messages may be interspersed with kept
# messages, so we delete everything before the recent boundary and
# rewrite only what we want to keep.
if self._store:
first_kept_seq = (
kept_structural[0].seq
if kept_structural
else (recent_messages[0].seq if recent_messages else self._next_seq)
recent_boundary = (
recent_messages[0].seq if recent_messages else self._next_seq
)
# Delete everything before the first structural message we're keeping
await self._store.delete_parts_before(first_kept_seq)
await self._store.delete_parts_before(recent_boundary)
# Write the reference message
await self._store.write_part(ref_msg.seq, ref_msg.to_storage_dict())
# Write kept structural messages (they may have been modified)
File diff suppressed because it is too large.
+154 -22
@@ -289,6 +289,114 @@ class GraphExecutor:
return errors
# Max chars of formatted messages before proactively splitting for LLM.
_PHASE_LLM_CHAR_LIMIT = 240_000
_PHASE_LLM_MAX_DEPTH = 10
async def _phase_llm_compact(
self,
conversation: Any,
next_spec: NodeSpec,
messages: list,
_depth: int = 0,
) -> str:
"""Summarise messages for phase-boundary compaction.
Uses the same recursive binary-search splitting as EventLoopNode.
"""
from framework.graph.conversation import extract_tool_call_history
from framework.graph.event_loop_node import _is_context_too_large_error
if _depth > self._PHASE_LLM_MAX_DEPTH:
raise RuntimeError("Phase LLM compaction recursion limit")
# Format messages
lines: list[str] = []
for m in messages:
if m.role == "tool":
c = m.content[:500] + ("..." if len(m.content) > 500 else "")
lines.append(f"[tool result]: {c}")
elif m.role == "assistant" and m.tool_calls:
names = [
tc.get("function", {}).get("name", "?")
for tc in m.tool_calls
]
lines.append(f"[assistant (calls: {', '.join(names)})]: "
f"{m.content[:200] if m.content else ''}")
else:
lines.append(f"[{m.role}]: {m.content}")
formatted = "\n\n".join(lines)
# Proactive split
if len(formatted) > self._PHASE_LLM_CHAR_LIMIT and len(messages) > 1:
summary = await self._phase_llm_compact_split(
conversation, next_spec, messages, _depth,
)
else:
max_tokens = getattr(conversation, "_max_history_tokens", 32000)
target_tokens = max_tokens // 2
target_chars = target_tokens * 4
prompt = (
"You are compacting an AI agent's conversation history "
"at a phase boundary.\n\n"
f"NEXT PHASE: {next_spec.name}\n"
)
if next_spec.description:
prompt += f"NEXT PHASE PURPOSE: {next_spec.description}\n"
prompt += (
f"\nCONVERSATION MESSAGES:\n{formatted}\n\n"
"INSTRUCTIONS:\n"
f"Write a summary of approximately {target_chars} characters "
f"(~{target_tokens} tokens).\n"
"Preserve user-stated rules, constraints, and preferences "
"verbatim. Preserve key decisions and results from earlier "
"phases. Preserve context needed for the next phase.\n"
)
summary_budget = max(1024, max_tokens // 2)
try:
response = await self._llm.acomplete(
messages=[{"role": "user", "content": prompt}],
system=(
"You are a conversation compactor. Write a detailed "
"summary preserving context for the next phase."
),
max_tokens=summary_budget,
)
summary = response.content
except Exception as e:
if _is_context_too_large_error(e) and len(messages) > 1:
summary = await self._phase_llm_compact_split(
conversation, next_spec, messages, _depth,
)
else:
raise
# Append tool history at top level only
if _depth == 0:
tool_history = extract_tool_call_history(messages)
if tool_history and "TOOLS ALREADY CALLED" not in summary:
summary += "\n\n" + tool_history
return summary
async def _phase_llm_compact_split(
self,
conversation: Any,
next_spec: NodeSpec,
messages: list,
_depth: int,
) -> str:
"""Split messages in half and summarise each half."""
mid = max(1, len(messages) // 2)
s1 = await self._phase_llm_compact(
conversation, next_spec, messages[:mid], _depth + 1,
)
s2 = await self._phase_llm_compact(
conversation, next_spec, messages[mid:], _depth + 1,
)
return s1 + "\n\n" + s2
async def execute(
self,
graph: GraphSpec,
@@ -1294,9 +1402,7 @@ class GraphExecutor:
# Set current phase for phase-aware compaction
continuous_conversation.set_current_phase(next_spec.id)
# Opportunistic compaction at transition:
# 1. Prune old tool results (free, no LLM call)
# 2. If still over 80%, do a phase-graduated compact
# Phase-boundary compaction (same flow as EventLoopNode._compact)
if continuous_conversation.usage_ratio() > 0.5:
await continuous_conversation.prune_old_tool_results(
protect_tokens=2000,
@@ -1308,40 +1414,66 @@ class GraphExecutor:
_phase_ratio * 100,
)
_data_dir = (
str(self._storage_path / "data") if self._storage_path else None
str(self._storage_path / "data")
if self._storage_path
else None
)
# Step 1: Structural compaction (>=80%)
if _data_dir:
_pre = continuous_conversation.usage_ratio()
await continuous_conversation.compact_preserving_structure(
spillover_dir=_data_dir,
keep_recent=4,
phase_graduated=True,
)
# Circuit breaker: if still over budget, fall back
_post_ratio = continuous_conversation.usage_ratio()
if _post_ratio >= 0.9 * _phase_ratio:
self.logger.warning(
" Structure-preserving compaction ineffective "
"(%.0f%% -> %.0f%%), falling back to summary",
_phase_ratio * 100,
_post_ratio * 100,
)
summary = (
f"Summary of earlier phases (before {next_spec.name}). "
"See transition markers for phase details."
)
await continuous_conversation.compact(
summary,
if continuous_conversation.usage_ratio() >= 0.9 * _pre:
await continuous_conversation.compact_preserving_structure(
spillover_dir=_data_dir,
keep_recent=4,
phase_graduated=True,
aggressive=True,
)
else:
# Step 2: LLM compaction (>95%)
if (
continuous_conversation.usage_ratio() > 0.95
and self._llm is not None
):
self.logger.info(
" LLM phase-boundary compaction "
"(%.0f%% usage)",
continuous_conversation.usage_ratio() * 100,
)
try:
_llm_summary = await self._phase_llm_compact(
continuous_conversation,
next_spec,
list(continuous_conversation.messages),
)
await continuous_conversation.compact(
_llm_summary,
keep_recent=2,
phase_graduated=True,
)
except Exception as e:
self.logger.warning(
" Phase LLM compaction failed: %s", e,
)
# Step 3: Emergency (only if still over budget)
if continuous_conversation.needs_compaction():
self.logger.warning(
" Emergency phase compaction (%.0f%%)",
continuous_conversation.usage_ratio() * 100,
)
summary = (
f"Summary of earlier phases (before {next_spec.name}). "
f"Summary of earlier phases "
f"(before {next_spec.name}). "
"See transition markers for phase details."
)
await continuous_conversation.compact(
summary,
keep_recent=4,
keep_recent=1,
phase_graduated=True,
)
+11 -2
@@ -718,15 +718,24 @@ class EventBus:
node_id: str,
prompt: str = "",
execution_id: str | None = None,
options: list[str] | None = None,
) -> None:
"""Emit client input requested event (client_facing=True nodes)."""
"""Emit client input requested event (client_facing=True nodes).
Args:
options: Optional predefined choices for the user (1-3 items).
The frontend appends an "Other" free-text option automatically.
"""
data: dict[str, Any] = {"prompt": prompt}
if options:
data["options"] = options
await self.publish(
AgentEvent(
type=EventType.CLIENT_INPUT_REQUESTED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"prompt": prompt},
data=data,
)
)
+9 -2
@@ -511,9 +511,11 @@ class ExecutionStream:
logger.debug(f"Queued execution {execution_id} for stream {self.stream_id}")
return execution_id
# Errors that indicate a fundamental configuration or environment problem.
# Resurrecting after these is pointless — the same error will recur.
# Errors that indicate resurrection won't help — the same error will recur.
# Includes both configuration/environment errors and deterministic node
# failures where the conversation/state hasn't changed.
_FATAL_ERROR_PATTERNS: tuple[str, ...] = (
# Configuration / environment
"credential",
"authentication",
"unauthorized",
@@ -525,6 +527,11 @@ class ExecutionStream:
"permission denied",
"invalid api",
"configuration error",
# Deterministic node failures — resurrecting at the same node with
# the same conversation produces the same result.
"node stalled",
"ghost empty stream",
"max iterations",
)
@classmethod
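The classmethod that consults this tuple sits outside the hunk; the sketch below is only an assumption about how such pattern lists are typically matched (a case-insensitive substring check), not the repository's actual implementation.

def _looks_fatal(error_text: str) -> bool:
    # Hypothetical helper: treat an error as fatal (skip resurrection)
    # if any known pattern occurs in the lowercased message.
    lowered = error_text.lower()
    return any(p in lowered for p in ExecutionStream._FATAL_ERROR_PATTERNS)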
+23
@@ -132,6 +132,29 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
"SSE connected: session='%s', sub_id='%s', types=%d", session.id, sub_id, len(event_types)
)
# Replay buffered events that were published before this SSE connected.
# The EventBus keeps a history ring-buffer; we replay the subset that
# produces visible chat messages so the frontend never misses early
# queen output. Lifecycle events are NOT replayed to avoid duplicate
# state transitions (turn counter increments, etc.).
_REPLAY_TYPES = {
EventType.CLIENT_OUTPUT_DELTA.value,
EventType.EXECUTION_STARTED.value,
EventType.CLIENT_INPUT_REQUESTED.value,
}
event_type_values = {et.value for et in event_types}
replay_types = _REPLAY_TYPES & event_type_values
replayed = 0
for past_event in event_bus._event_history:
if past_event.type.value in replay_types:
try:
queue.put_nowait(past_event.to_dict())
replayed += 1
except asyncio.QueueFull:
break
if replayed:
logger.info("SSE replayed %d buffered events for session='%s'", replayed, session.id)
event_count = 0
close_reason = "unknown"
try:
+30
@@ -134,6 +134,35 @@ async def handle_chat(request: web.Request) -> web.Response:
return web.json_response({"error": "Queen not available"}, status=503)
async def handle_queen_context(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/queen-context — queue context for the queen.
Unlike /chat, this does NOT trigger an LLM response. The message is
queued in the queen's injection queue and will be drained on her next
natural iteration (prefixed with [External event]:).
Body: {"message": "..."}
"""
session, err = resolve_session(request)
if err:
return err
body = await request.json()
message = body.get("message", "")
if not message:
return web.json_response({"error": "message is required"}, status=400)
queen_executor = session.queen_executor
if queen_executor is not None:
node = queen_executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(message, is_client_input=False)
return web.json_response({"status": "queued", "delivered": True})
return web.json_response({"error": "Queen not available"}, status=503)
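For reference, a minimal client sketch against this endpoint; the host, port, and session id are placeholders, and the expected response mirrors the handler above.

import requests

session_id = "sess-123"  # placeholder
resp = requests.post(
    f"http://localhost:8080/api/sessions/{session_id}/queen-context",
    json={"message": "User selected 'Simple 2-node' for the worker's question"},
)
print(resp.json())  # e.g. {"status": "queued", "delivered": True}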
async def handle_worker_input(request: web.Request) -> web.Response:
"""POST /api/sessions/{session_id}/worker-input — send input to waiting worker node.
@@ -385,6 +414,7 @@ def register_routes(app: web.Application) -> None:
app.router.add_post("/api/sessions/{session_id}/trigger", handle_trigger)
app.router.add_post("/api/sessions/{session_id}/inject", handle_inject)
app.router.add_post("/api/sessions/{session_id}/chat", handle_chat)
app.router.add_post("/api/sessions/{session_id}/queen-context", handle_queen_context)
app.router.add_post("/api/sessions/{session_id}/worker-input", handle_worker_input)
app.router.add_post("/api/sessions/{session_id}/pause", handle_stop)
app.router.add_post("/api/sessions/{session_id}/resume", handle_resume)
@@ -838,7 +838,7 @@ def register_queen_lifecycle_tools(
injectable = stream.get_injectable_nodes()
if injectable:
target_node_id = injectable[0]["node_id"]
ok = await stream.inject_input(target_node_id, content)
ok = await stream.inject_input(target_node_id, content, is_client_input=True)
if ok:
return json.dumps(
{
+4
@@ -37,6 +37,10 @@ export const executionApi = {
chat: (sessionId: string, message: string) =>
api.post<ChatResult>(`/sessions/${sessionId}/chat`, { message }),
/** Queue context for the queen without triggering an LLM response. */
queenContext: (sessionId: string, message: string) =>
api.post<ChatResult>(`/sessions/${sessionId}/queen-context`, { message }),
workerInput: (sessionId: string, message: string) =>
api.post<ChatResult>(`/sessions/${sessionId}/worker-input`, { message }),
+116 -66
@@ -1,6 +1,7 @@
import { memo, useState, useRef, useEffect } from "react";
import { Send, Square, Crown, Cpu, Check, Loader2, Reply } from "lucide-react";
import { Send, Square, Crown, Cpu, Check, Loader2 } from "lucide-react";
import MarkdownContent from "@/components/MarkdownContent";
import QuestionWidget from "@/components/QuestionWidget";
export interface ChatMessage {
id: string;
@@ -20,15 +21,23 @@ interface ChatPanelProps {
messages: ChatMessage[];
onSend: (message: string, thread: string) => void;
isWaiting?: boolean;
/** When true, a worker is thinking (not yet streaming) */
isWorkerWaiting?: boolean;
/** When true, the queen is busy (typing or streaming) — shows the stop button */
isBusy?: boolean;
activeThread: string;
/** When true, the worker is waiting for user input — shows inline reply box */
workerAwaitingInput?: boolean;
/** When true, the input is disabled (e.g. during loading) */
disabled?: boolean;
/** Called when user clicks the stop button to cancel the queen's current turn */
onCancel?: () => void;
/** Called when user submits a reply to the worker's input request */
onWorkerReply?: (message: string) => void;
/** Pending question from ask_user — replaces textarea when present */
pendingQuestion?: string | null;
/** Options for the pending question */
pendingOptions?: string[] | null;
/** Called when user submits an answer to the pending question */
onQuestionSubmit?: (answer: string, isOther: boolean) => void;
/** Called when user dismisses the pending question without answering */
onQuestionDismiss?: () => void;
/** Queen operating mode — shown as a tag on queen messages */
queenMode?: "building" | "staging" | "running";
}
@@ -287,10 +296,12 @@ const MessageBubble = memo(function MessageBubble({ msg, queenMode }: { msg: Cha
);
}, (prev, next) => prev.msg.id === next.msg.id && prev.msg.content === next.msg.content && prev.queenMode === next.queenMode);
export default function ChatPanel({ messages, onSend, isWaiting, activeThread, workerAwaitingInput, disabled, onCancel, onWorkerReply, queenMode }: ChatPanelProps) {
export default function ChatPanel({ messages, onSend, isWaiting, isWorkerWaiting, isBusy, activeThread, disabled, onCancel, pendingQuestion, pendingOptions, onQuestionSubmit, onQuestionDismiss, queenMode }: ChatPanelProps) {
const [input, setInput] = useState("");
const [readMap, setReadMap] = useState<Record<string, number>>({});
const bottomRef = useRef<HTMLDivElement>(null);
const scrollRef = useRef<HTMLDivElement>(null);
const stickToBottom = useRef(true);
const textareaRef = useRef<HTMLTextAreaElement>(null);
const threadMessages = messages.filter((m) => {
@@ -307,10 +318,24 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, w
// Suppress unused var
void readMap;
const lastMsg = threadMessages[threadMessages.length - 1];
// Autoscroll: only when user is already near the bottom
const handleScroll = () => {
const el = scrollRef.current;
if (!el) return;
const distFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight;
stickToBottom.current = distFromBottom < 80;
};
useEffect(() => {
bottomRef.current?.scrollIntoView({ behavior: "smooth" });
}, [threadMessages.length, lastMsg?.content, workerAwaitingInput]);
if (stickToBottom.current) {
bottomRef.current?.scrollIntoView({ behavior: "smooth" });
}
}, [threadMessages, pendingQuestion, isWaiting, isWorkerWaiting]);
// Always start pinned to bottom when switching threads
useEffect(() => {
stickToBottom.current = true;
}, [activeThread]);
const handleSubmit = (e: React.FormEvent) => {
e.preventDefault();
@@ -320,17 +345,6 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, w
if (textareaRef.current) textareaRef.current.style.height = "auto";
};
// Find the last worker message to attach the inline reply box below.
// For explicit ask_user, this will be the worker_input_request message.
// For auto-block, this will be the last client_output_delta streamed message.
const lastWorkerMsgIdx = workerAwaitingInput
? threadMessages.reduce(
(last, m, i) =>
m.role === "worker" && m.type !== "tool_status" && m.type !== "system" ? i : last,
-1,
)
: -1;
return (
<div className="flex flex-col h-full min-w-0">
{/* Compact sub-header */}
@@ -339,8 +353,8 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, w
</div>
{/* Messages */}
<div className="flex-1 overflow-auto px-5 py-4 space-y-3">
{threadMessages.map((msg, idx) => (
<div ref={scrollRef} onScroll={handleScroll} className="flex-1 overflow-auto px-5 py-4 space-y-3">
{threadMessages.map((msg) => (
<div key={msg.id}>
<MessageBubble msg={msg} queenMode={queenMode} />
{idx === lastWorkerMsgIdx && onWorkerReply && (
@@ -351,8 +365,35 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, w
{isWaiting && (
<div className="flex gap-3">
<div className="w-7 h-7 rounded-xl bg-muted flex items-center justify-center">
<Cpu className="w-3.5 h-3.5 text-muted-foreground" />
<div
className="flex-shrink-0 w-9 h-9 rounded-xl flex items-center justify-center"
style={{
backgroundColor: `${queenColor}18`,
border: `1.5px solid ${queenColor}35`,
boxShadow: `0 0 12px ${queenColor}20`,
}}
>
<Crown className="w-4 h-4" style={{ color: queenColor }} />
</div>
<div className="border border-primary/20 bg-primary/5 rounded-2xl rounded-tl-md px-4 py-3">
<div className="flex gap-1.5">
<span className="w-1.5 h-1.5 rounded-full bg-muted-foreground animate-bounce" style={{ animationDelay: "0ms" }} />
<span className="w-1.5 h-1.5 rounded-full bg-muted-foreground animate-bounce" style={{ animationDelay: "150ms" }} />
<span className="w-1.5 h-1.5 rounded-full bg-muted-foreground animate-bounce" style={{ animationDelay: "300ms" }} />
</div>
</div>
</div>
)}
{isWorkerWaiting && !isWaiting && (
<div className="flex gap-3">
<div
className="flex-shrink-0 w-7 h-7 rounded-xl flex items-center justify-center"
style={{
backgroundColor: `${workerColor}18`,
border: `1.5px solid ${workerColor}35`,
}}
>
<Cpu className="w-3.5 h-3.5" style={{ color: workerColor }} />
</div>
<div className="bg-muted/60 rounded-2xl rounded-tl-md px-4 py-3">
<div className="flex gap-1.5">
@@ -366,48 +407,57 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, w
<div ref={bottomRef} />
</div>
{/* Input — always connected to Queen */}
<form onSubmit={handleSubmit} className="p-4 border-t border-border">
<div className="flex items-center gap-3 bg-muted/40 rounded-xl px-4 py-2.5 border border-border focus-within:border-primary/40 transition-colors">
<textarea
ref={textareaRef}
rows={1}
value={input}
onChange={(e) => {
setInput(e.target.value);
const ta = e.target;
ta.style.height = "auto";
ta.style.height = `${Math.min(ta.scrollHeight, 160)}px`;
}}
onKeyDown={(e) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
handleSubmit(e);
}
}}
placeholder={disabled ? "Connecting to agent..." : "Message Queen Bee..."}
disabled={disabled}
className="flex-1 bg-transparent text-sm text-foreground outline-none placeholder:text-muted-foreground disabled:opacity-50 disabled:cursor-not-allowed resize-none overflow-y-auto"
/>
{isWaiting && onCancel ? (
<button
type="button"
onClick={onCancel}
className="p-2 rounded-lg bg-destructive text-destructive-foreground hover:opacity-90 transition-opacity"
>
<Square className="w-4 h-4" />
</button>
) : (
<button
type="submit"
disabled={!input.trim() || disabled}
className="p-2 rounded-lg bg-primary text-primary-foreground disabled:opacity-30 hover:opacity-90 transition-opacity"
>
<Send className="w-4 h-4" />
</button>
)}
</div>
</form>
{/* Input area — question widget replaces textarea when a question is pending */}
{pendingQuestion && pendingOptions && onQuestionSubmit ? (
<QuestionWidget
question={pendingQuestion}
options={pendingOptions}
onSubmit={onQuestionSubmit}
onDismiss={onQuestionDismiss}
/>
) : (
<form onSubmit={handleSubmit} className="p-4">
<div className="flex items-center gap-3 bg-muted/40 rounded-xl px-4 py-2.5 border border-border focus-within:border-primary/40 transition-colors">
<textarea
ref={textareaRef}
rows={1}
value={input}
onChange={(e) => {
setInput(e.target.value);
const ta = e.target;
ta.style.height = "auto";
ta.style.height = `${Math.min(ta.scrollHeight, 160)}px`;
}}
onKeyDown={(e) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
handleSubmit(e);
}
}}
placeholder={disabled ? "Connecting to agent..." : "Message Queen Bee..."}
disabled={disabled}
className="flex-1 bg-transparent text-sm text-foreground outline-none placeholder:text-muted-foreground disabled:opacity-50 disabled:cursor-not-allowed resize-none overflow-y-auto"
/>
{isBusy && onCancel ? (
<button
type="button"
onClick={onCancel}
className="p-2 rounded-lg bg-amber-500/15 text-amber-400 border border-amber-500/40 hover:bg-amber-500/25 transition-colors"
>
<Square className="w-4 h-4" />
</button>
) : (
<button
type="submit"
disabled={!input.trim() || disabled}
className="p-2 rounded-lg bg-primary text-primary-foreground disabled:opacity-30 hover:opacity-90 transition-opacity"
>
<Send className="w-4 h-4" />
</button>
)}
</div>
</form>
)}
</div>
);
}
@@ -0,0 +1,142 @@
import { useState, useRef, useEffect, useCallback } from "react";
import { Send, MessageCircleQuestion, X } from "lucide-react";
export interface QuestionWidgetProps {
/** The question text shown to the user */
question: string;
/** 1-3 predefined options. The UI appends an "Other" free-text option. */
options: string[];
/** Called with the selected option label or custom text, and whether "Other" was chosen */
onSubmit: (answer: string, isOther: boolean) => void;
/** Called when user dismisses the question without answering */
onDismiss?: () => void;
}
export default function QuestionWidget({ question, options, onSubmit, onDismiss }: QuestionWidgetProps) {
const [selected, setSelected] = useState<number | null>(null);
const [customText, setCustomText] = useState("");
const [submitted, setSubmitted] = useState(false);
const inputRef = useRef<HTMLInputElement>(null);
const containerRef = useRef<HTMLDivElement>(null);
// "Other" is always the last option index
const otherIndex = options.length;
const isOtherSelected = selected === otherIndex;
// Focus the text input when "Other" is selected
useEffect(() => {
if (isOtherSelected) {
inputRef.current?.focus();
}
}, [isOtherSelected]);
const canSubmit = selected !== null && (!isOtherSelected || customText.trim().length > 0);
const handleSubmit = useCallback(() => {
if (!canSubmit || submitted) return;
setSubmitted(true);
if (isOtherSelected) {
onSubmit(customText.trim(), true);
} else {
onSubmit(options[selected!], false);
}
}, [canSubmit, submitted, isOtherSelected, customText, options, selected, onSubmit]);
// Keyboard: Enter to submit, number keys to select (only when text input is not focused)
useEffect(() => {
const handleKeyDown = (e: KeyboardEvent) => {
if (submitted) return;
const inTextInput = e.target === inputRef.current;
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
handleSubmit();
return;
}
// Number keys 1-4 select options — skip when typing in the "Other" field
if (!inTextInput) {
const num = parseInt(e.key, 10);
if (num >= 1 && num <= options.length + 1) {
e.preventDefault();
setSelected(num - 1);
}
}
};
window.addEventListener("keydown", handleKeyDown);
return () => window.removeEventListener("keydown", handleKeyDown);
}, [handleSubmit, submitted, options.length]);
if (submitted) return null;
return (
<div ref={containerRef} className="p-4">
<div className="bg-card border border-border rounded-xl shadow-sm overflow-hidden">
{/* Header / Question */}
<div className="px-5 pt-4 pb-3 flex items-start gap-3">
<div className="w-7 h-7 rounded-lg bg-primary/10 border border-primary/20 flex items-center justify-center flex-shrink-0 mt-0.5">
<MessageCircleQuestion className="w-3.5 h-3.5 text-primary" />
</div>
<p className="text-sm font-medium text-foreground leading-relaxed flex-1">{question}</p>
{onDismiss && (
<button
onClick={onDismiss}
className="p-1 rounded-md text-muted-foreground hover:text-foreground hover:bg-muted/60 transition-colors flex-shrink-0"
>
<X className="w-4 h-4" />
</button>
)}
</div>
{/* Options */}
<div className="px-5 pb-3 space-y-1.5">
{options.map((option, idx) => (
<button
key={idx}
onClick={() => setSelected(idx)}
className={`w-full text-left px-4 py-2.5 rounded-lg border text-sm transition-colors ${
selected === idx
? "border-primary bg-primary/10 text-foreground"
: "border-border/60 bg-muted/20 text-foreground hover:border-primary/40 hover:bg-muted/40"
}`}
>
<span className="text-xs text-muted-foreground mr-2">{idx + 1}.</span>
{option}
</button>
))}
{/* "Other" — inline text input that auto-selects on focus */}
<input
ref={inputRef}
type="text"
value={customText}
onFocus={() => setSelected(otherIndex)}
onChange={(e) => {
setSelected(otherIndex);
setCustomText(e.target.value);
}}
placeholder="Type a custom response..."
className={`w-full px-4 py-2.5 rounded-lg border border-dashed text-sm transition-colors bg-transparent placeholder:text-muted-foreground focus:outline-none ${
isOtherSelected
? "border-primary bg-primary/10 text-foreground"
: "border-border text-muted-foreground hover:border-primary/40"
}`}
/>
</div>
{/* Submit */}
<div className="px-5 pb-4">
<button
onClick={handleSubmit}
disabled={!canSubmit}
className="w-full flex items-center justify-center gap-2 py-2.5 rounded-lg text-sm font-medium bg-primary text-primary-foreground hover:bg-primary/90 disabled:opacity-30 disabled:cursor-not-allowed transition-colors"
>
<Send className="w-3.5 h-3.5" />
Submit
</button>
</div>
</div>
</div>
);
}
+9
@@ -167,3 +167,12 @@
.animate-in.slide-in-from-right {
animation: slide-in-from-right 0.2s ease-out;
}
/* Slide-up animation for question widget */
@keyframes slide-in-from-bottom {
from { transform: translateY(16px); opacity: 0; }
to { transform: translateY(0); opacity: 1; }
}
.animate-in.slide-in-from-bottom {
animation: slide-in-from-bottom 0.25s ease-out;
}
+189 -19
@@ -8,6 +8,7 @@ import TopBar from "@/components/TopBar";
import { TAB_STORAGE_KEY, loadPersistedTabs, savePersistedTabs, type PersistedTabState } from "@/lib/tab-persistence";
import NodeDetailPanel from "@/components/NodeDetailPanel";
import CredentialsModal, { type Credential, createFreshCredentials, cloneCredentials, allRequiredCredentialsMet, clearCredentialCache } from "@/components/CredentialsModal";
import { agentsApi } from "@/api/agents";
import { executionApi } from "@/api/execution";
import { graphsApi } from "@/api/graphs";
@@ -249,8 +250,18 @@ interface AgentBackendState {
subagentReports: { subagent_id: string; message: string; data?: Record<string, unknown>; timestamp: string }[];
isTyping: boolean;
isStreaming: boolean;
/** True only when the queen's LLM is actively processing (not worker) */
queenIsTyping: boolean;
/** True only when a worker's LLM is actively processing (not queen) */
workerIsTyping: boolean;
llmSnapshots: Record<string, string>;
activeToolCalls: Record<string, { name: string; done: boolean; streamId: string }>;
/** Structured question text from ask_user with options */
pendingQuestion: string | null;
/** Predefined choices from ask_user (1-3 items); UI appends "Other" */
pendingOptions: string[] | null;
/** Whether the pending question came from queen or worker */
pendingQuestionSource: "queen" | "worker" | null;
}
function defaultAgentState(): AgentBackendState {
@@ -274,8 +285,13 @@ function defaultAgentState(): AgentBackendState {
subagentReports: [],
isTyping: false,
isStreaming: false,
queenIsTyping: false,
workerIsTyping: false,
llmSnapshots: {},
activeToolCalls: {},
pendingQuestion: null,
pendingOptions: null,
pendingQuestionSource: null,
};
}
@@ -355,8 +371,14 @@ export default function Workspace() {
if (persisted) {
const restored = { ...persisted.activeSessionByAgent };
const urlSessions = sessionsByAgent[initialAgent];
if (urlSessions?.length && !restored[initialAgent]) {
restored[initialAgent] = urlSessions[0].id;
if (urlSessions?.length) {
// When a prompt was submitted from home, activate the newly created
// session (last in array) instead of the previously active one.
if (initialPrompt && hasExplicitAgent) {
restored[initialAgent] = urlSessions[urlSessions.length - 1].id;
} else if (!restored[initialAgent]) {
restored[initialAgent] = urlSessions[0].id;
}
}
return restored;
}
@@ -635,7 +657,11 @@ export default function Workspace() {
const result = await sessionsApi.get(existingSessionId);
if (result.loading) continue;
return result as LiveSession;
} catch {
} catch (pollErr) {
// 404 = agent failed to load and was cleaned up — stop immediately
if (pollErr instanceof ApiError && pollErr.status === 404) {
throw new Error("Agent failed to load");
}
if (i === maxAttempts - 1) throw loadErr;
}
}
@@ -930,7 +956,7 @@ export default function Workspace() {
} catch {
// Best-effort — queen may have already finished
}
updateAgentState(activeWorker, { isTyping: false, isStreaming: false });
updateAgentState(activeWorker, { isTyping: false, isStreaming: false, queenIsTyping: false, workerIsTyping: false });
}, [agentStates, activeWorker, updateAgentState]);
// --- Node log helper (writes into agentStates) ---
@@ -1013,7 +1039,7 @@ export default function Workspace() {
case "execution_started":
if (isQueen) {
turnCounterRef.current[turnKey] = currentTurn + 1;
updateAgentState(agentType, { isTyping: true });
updateAgentState(agentType, { isTyping: true, queenIsTyping: true });
} else {
// Warn if prior LLM snapshots are being dropped (edge case: execution_completed never arrived)
const priorSnapshots = agentStates[agentType]?.llmSnapshots || {};
@@ -1024,6 +1050,7 @@ export default function Workspace() {
updateAgentState(agentType, {
isTyping: true,
isStreaming: false,
workerIsTyping: true,
awaitingInput: false,
workerRunState: "running",
currentExecutionId: event.execution_id || agentStates[agentType]?.currentExecutionId || null,
@@ -1031,6 +1058,9 @@ export default function Workspace() {
subagentReports: [],
llmSnapshots: {},
activeToolCalls: {},
pendingQuestion: null,
pendingOptions: null,
pendingQuestionSource: null,
});
markAllNodesAs(agentType, ["running", "looping", "complete", "error"], "pending");
}
@@ -1038,7 +1068,7 @@ export default function Workspace() {
case "execution_completed":
if (isQueen) {
updateAgentState(agentType, { isTyping: false });
updateAgentState(agentType, { isTyping: false, queenIsTyping: false });
} else {
// Flush any remaining LLM snapshots before clearing state
const completedSnapshots = agentStates[agentType]?.llmSnapshots || {};
@@ -1050,11 +1080,15 @@ export default function Workspace() {
updateAgentState(agentType, {
isTyping: false,
isStreaming: false,
workerIsTyping: false,
awaitingInput: false,
workerInputMessageId: null,
workerRunState: "idle",
currentExecutionId: null,
llmSnapshots: {},
pendingQuestion: null,
pendingOptions: null,
pendingQuestionSource: null,
});
markAllNodesAs(agentType, ["running", "looping"], "complete");
@@ -1079,7 +1113,7 @@ export default function Workspace() {
// Mark streaming when LLM text is actively arriving
if (event.type === "llm_text_delta" || event.type === "client_output_delta") {
updateAgentState(agentType, { isStreaming: true });
updateAgentState(agentType, { isStreaming: true, ...(isQueen ? {} : { workerIsTyping: false }) });
}
if (event.type === "llm_text_delta" && !isQueen && event.node_id) {
@@ -1101,8 +1135,41 @@ export default function Workspace() {
if (event.type === "client_input_requested") {
console.log('[CLIENT_INPUT_REQ] stream_id:', streamId, 'isQueen:', isQueen, 'node_id:', event.node_id, 'prompt:', (event.data?.prompt as string)?.slice(0, 80), 'agentType:', agentType);
const rawOptions = event.data?.options;
const options = Array.isArray(rawOptions) ? (rawOptions as string[]) : null;
if (isQueen) {
updateAgentState(agentType, { awaitingInput: true, isTyping: false, isStreaming: false, queenBuilding: false });
const prompt = (event.data?.prompt as string) || "";
const isAutoBlock = !prompt && !options;
// Queen auto-block (empty prompt, no options) should not
// overwrite a pending worker question — the worker's
// QuestionWidget must stay visible. Use the updater form
// to read the latest state and avoid stale-closure races
// when worker and queen events arrive in the same batch.
setAgentStates(prev => {
const cur = prev[agentType] || defaultAgentState();
const workerQuestionActive = cur.pendingQuestionSource === "worker";
if (isAutoBlock && workerQuestionActive) {
return { ...prev, [agentType]: {
...cur,
awaitingInput: true,
isTyping: false,
isStreaming: false,
queenIsTyping: false,
queenBuilding: false,
}};
}
return { ...prev, [agentType]: {
...cur,
awaitingInput: true,
isTyping: false,
isStreaming: false,
queenIsTyping: false,
queenBuilding: false,
pendingQuestion: prompt || null,
pendingOptions: options,
pendingQuestionSource: "queen",
}};
});
} else {
// Worker input request.
// If the prompt is non-empty (explicit ask_user), create a visible
@@ -1130,18 +1197,22 @@ export default function Workspace() {
awaitingInput: true,
isTyping: false,
isStreaming: false,
queenIsTyping: false,
pendingQuestion: prompt || null,
pendingOptions: options,
pendingQuestionSource: options ? "worker" : null,
});
}
}
if (event.type === "execution_paused") {
updateAgentState(agentType, { isTyping: false, isStreaming: false, awaitingInput: false, workerInputMessageId: null });
updateAgentState(agentType, { isTyping: false, isStreaming: false, queenIsTyping: false, workerIsTyping: false, awaitingInput: false, workerInputMessageId: null, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
if (!isQueen) {
updateAgentState(agentType, { workerRunState: "idle", currentExecutionId: null });
markAllNodesAs(agentType, ["running", "looping"], "pending");
}
}
if (event.type === "execution_failed") {
updateAgentState(agentType, { isTyping: false, isStreaming: false, awaitingInput: false, workerInputMessageId: null });
updateAgentState(agentType, { isTyping: false, isStreaming: false, queenIsTyping: false, workerIsTyping: false, awaitingInput: false, workerInputMessageId: null, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
if (!isQueen) {
updateAgentState(agentType, { workerRunState: "idle", currentExecutionId: null });
if (event.node_id) {
@@ -1173,7 +1244,11 @@ export default function Workspace() {
case "node_loop_iteration":
turnCounterRef.current[turnKey] = currentTurn + 1;
updateAgentState(agentType, { isStreaming: false, activeToolCalls: {}, awaitingInput: false });
if (isQueen) {
updateAgentState(agentType, { isStreaming: false, activeToolCalls: {}, awaitingInput: false, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
} else {
updateAgentState(agentType, { isStreaming: false, workerIsTyping: true, activeToolCalls: {}, awaitingInput: false, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
}
if (!isQueen && event.node_id) {
const pendingText = agentStates[agentType]?.llmSnapshots[event.node_id];
if (pendingText?.trim()) {
@@ -1577,6 +1652,11 @@ export default function Workspace() {
return;
}
// If queen has a pending question widget, dismiss it when user types directly
if (agentStates[activeWorker]?.pendingQuestionSource === "queen") {
updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
}
const userMsg: ChatMessage = {
id: makeId(), agent: "You", agentColor: "",
content: text, timestamp: "", type: "user", thread, createdAt: Date.now(),
@@ -1587,7 +1667,7 @@ export default function Workspace() {
s.id === activeSession.id ? { ...s, messages: [...s.messages, userMsg] } : s
),
}));
updateAgentState(activeWorker, { isTyping: true });
updateAgentState(activeWorker, { isTyping: true, queenIsTyping: true });
if (state?.sessionId && state?.ready) {
executionApi.chat(state.sessionId, text).catch((err: unknown) => {
@@ -1603,7 +1683,7 @@ export default function Workspace() {
s.id === activeSession.id ? { ...s, messages: [...s.messages, errorChatMsg] } : s
),
}));
updateAgentState(activeWorker, { isTyping: false, isStreaming: false });
updateAgentState(activeWorker, { isTyping: false, isStreaming: false, queenIsTyping: false });
});
} else {
const errorMsg: ChatMessage = {
@@ -1640,7 +1720,7 @@ export default function Workspace() {
}));
// Clear awaiting state optimistically
updateAgentState(activeWorker, { awaitingInput: false, workerInputMessageId: null, isTyping: true });
updateAgentState(activeWorker, { awaitingInput: false, workerInputMessageId: null, isTyping: true, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
executionApi.workerInput(state.sessionId, text).catch((err: unknown) => {
const errMsg = err instanceof Error ? err.message : String(err);
@@ -1659,6 +1739,90 @@ export default function Workspace() {
});
}, [activeWorker, activeSession, agentStates, updateAgentState]);
// --- handleWorkerQuestionAnswer: route predefined answers directly to the worker, "Other" through the queen ---
const handleWorkerQuestionAnswer = useCallback((answer: string, isOther: boolean) => {
if (!activeSession) return;
const state = agentStates[activeWorker];
const question = state?.pendingQuestion || "";
const opts = state?.pendingOptions;
if (isOther) {
// "Other" free-text → route through queen for evaluation
updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
if (question && opts && state?.sessionId && state?.ready) {
const formatted = `[Worker asked: "${question}" | Options: ${opts.join(", ")}]\nUser answered: "${answer}"`;
const userMsg: ChatMessage = {
id: makeId(), agent: "You", agentColor: "",
content: answer, timestamp: "", type: "user", thread: activeWorker, createdAt: Date.now(),
};
setSessionsByAgent(prev => ({
...prev,
[activeWorker]: prev[activeWorker].map(s =>
s.id === activeSession.id ? { ...s, messages: [...s.messages, userMsg] } : s
),
}));
updateAgentState(activeWorker, { isTyping: true, queenIsTyping: true });
executionApi.chat(state.sessionId, formatted).catch((err: unknown) => {
const errMsg = err instanceof Error ? err.message : String(err);
const errorChatMsg: ChatMessage = {
id: makeId(), agent: "System", agentColor: "",
content: `Failed to send message: ${errMsg}`,
timestamp: "", type: "system", thread: activeWorker, createdAt: Date.now(),
};
setSessionsByAgent(prev => ({
...prev,
[activeWorker]: prev[activeWorker].map(s =>
s.id === activeSession.id ? { ...s, messages: [...s.messages, errorChatMsg] } : s
),
}));
updateAgentState(activeWorker, { isTyping: false, isStreaming: false, queenIsTyping: false });
});
} else {
handleSend(answer, activeWorker);
}
} else {
// Predefined option → send directly to worker
handleWorkerReply(answer);
// Queue context for queen (fire-and-forget, no LLM response triggered)
if (question && state?.sessionId && state?.ready) {
const notification = `[Worker asked: "${question}" | User selected: "${answer}"]`;
executionApi.queenContext(state.sessionId, notification).catch(() => {});
}
}
}, [activeWorker, activeSession, agentStates, handleWorkerReply, handleSend, updateAgentState, setSessionsByAgent]);
// --- handleQueenQuestionAnswer: submit queen's own question answer via /chat ---
// The queen asked the question herself, so she already has context — just send the raw answer.
const handleQueenQuestionAnswer = useCallback((answer: string, _isOther: boolean) => {
updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
handleSend(answer, activeWorker);
}, [activeWorker, handleSend, updateAgentState]);
// --- handleQuestionDismiss: user closed the question widget without answering ---
// Injects a dismiss signal so the blocked node can continue.
const handleQuestionDismiss = useCallback(() => {
const state = agentStates[activeWorker];
if (!state?.sessionId) return;
const source = state.pendingQuestionSource;
const question = state.pendingQuestion || "";
// Clear UI state immediately
updateAgentState(activeWorker, {
pendingQuestion: null,
pendingOptions: null,
pendingQuestionSource: null,
awaitingInput: false,
});
// Unblock the waiting node with a dismiss signal
const dismissMsg = `[User dismissed the question: "${question}"]`;
if (source === "worker") {
executionApi.workerInput(state.sessionId, dismissMsg).catch(() => {});
} else {
executionApi.chat(state.sessionId, dismissMsg).catch(() => {});
}
}, [agentStates, activeWorker, updateAgentState]);
const handleLoadAgent = useCallback(async (agentPath: string) => {
const state = agentStates[activeWorker];
if (!state?.sessionId) return;
@@ -1873,17 +2037,23 @@ export default function Workspace() {
messages={activeSession.messages}
onSend={handleSend}
onCancel={handleCancelQueen}
onWorkerReply={handleWorkerReply}
activeThread={activeWorker}
isWaiting={(activeAgentState?.isTyping && !activeAgentState?.isStreaming) ?? false}
workerAwaitingInput={
(activeAgentState?.awaitingInput && activeAgentState?.workerRunState === "running") ?? false
}
isWaiting={(activeAgentState?.queenIsTyping && !activeAgentState?.isStreaming) ?? false}
isWorkerWaiting={(activeAgentState?.workerIsTyping && !activeAgentState?.isStreaming) ?? false}
isBusy={activeAgentState?.queenIsTyping ?? false}
disabled={
(activeAgentState?.loading ?? true) ||
!(activeAgentState?.queenReady)
}
queenMode={activeAgentState?.queenMode ?? "building"}
pendingQuestion={activeAgentState?.awaitingInput ? activeAgentState.pendingQuestion : null}
pendingOptions={activeAgentState?.awaitingInput ? activeAgentState.pendingOptions : null}
onQuestionSubmit={
activeAgentState?.pendingQuestionSource === "queen"
? handleQueenQuestionAnswer
: handleWorkerQuestionAnswer
}
onQuestionDismiss={handleQuestionDismiss}
/>
)}
</div>
+73 -3
@@ -578,7 +578,11 @@ class TestClientFacingBlocking:
"""signal_shutdown should unblock a waiting client_facing node."""
llm = MockStreamingLLM(
scenarios=[
tool_call_scenario("ask_user", {"question": "Waiting..."}, tool_use_id="ask_1"),
tool_call_scenario(
"ask_user",
{"question": "Waiting...", "options": ["Continue", "Stop"]},
tool_use_id="ask_1",
),
]
)
bus = EventBus()
@@ -600,7 +604,11 @@ class TestClientFacingBlocking:
"""CLIENT_INPUT_REQUESTED should be published when ask_user blocks."""
llm = MockStreamingLLM(
scenarios=[
tool_call_scenario("ask_user", {"question": "Hello!"}, tool_use_id="ask_1"),
tool_call_scenario(
"ask_user",
{"question": "Hello!", "options": ["Yes", "No"]},
tool_use_id="ask_1",
),
]
)
bus = EventBus()
@@ -796,7 +804,7 @@ class TestClientFacingExpectingWork:
async def user_then_shutdown():
await asyncio.sleep(0.05)
await node.inject_event("furwise.app")
await node.inject_event("furwise.app", is_client_input=True)
# Node should auto-block on "Monitoring..." text.
# Give it time to reach the block, then shut down.
await asyncio.sleep(0.1)
@@ -2027,3 +2035,65 @@ class TestExecutionId:
node_spec=node_spec, memory=SharedMemory(), goal=goal, input_data={}
)
assert ctx.execution_id == ""
# ---------------------------------------------------------------------------
# Subagent memory snapshot includes accumulator outputs
# ---------------------------------------------------------------------------
class TestSubagentAccumulatorMemory:
"""Verify that subagent memory construction merges accumulator outputs
and includes the subagent's input_keys in read permissions."""
def test_accumulator_values_merged_into_parent_data(self):
"""Keys from OutputAccumulator should appear in subagent memory."""
# Simulate what _execute_subagent does internally:
# parent shared memory has user_request but NOT tweet_content
parent_memory = SharedMemory()
parent_memory.write("user_request", "post a joke")
parent_data = parent_memory.read_all() # {"user_request": "post a joke"}
# Accumulator has tweet_content (set via set_output before delegation)
acc = OutputAccumulator(values={"tweet_content": "Hello world!"})
# Merge accumulator outputs (the fix)
for key, value in acc.to_dict().items():
if key not in parent_data:
parent_data[key] = value
# Build subagent memory
subagent_memory = SharedMemory()
for key, value in parent_data.items():
subagent_memory.write(key, value, validate=False)
subagent_input_keys = ["tweet_content"]
read_keys = set(parent_data.keys()) | set(subagent_input_keys)
scoped = subagent_memory.with_permissions(
read_keys=list(read_keys), write_keys=[]
)
# This would have raised PermissionError before the fix
assert scoped.read("tweet_content") == "Hello world!"
assert scoped.read("user_request") == "post a joke"
def test_input_keys_allowed_even_if_not_in_data(self):
"""Subagent input_keys should be in read permissions even if the
key doesn't exist in memory (returns None instead of PermissionError)."""
parent_memory = SharedMemory()
parent_memory.write("user_request", "hi")
parent_data = parent_memory.read_all()
subagent_memory = SharedMemory()
for key, value in parent_data.items():
subagent_memory.write(key, value, validate=False)
# input_keys includes "tweet_content" which isn't in parent_data
read_keys = set(parent_data.keys()) | {"tweet_content"}
scoped = subagent_memory.with_permissions(
read_keys=list(read_keys), write_keys=[]
)
# Should return None (not raise PermissionError)
assert scoped.read("tweet_content") is None
assert scoped.read("user_request") == "hi"
+564 -1
@@ -2,11 +2,12 @@
from __future__ import annotations
import json
from typing import Any
import pytest
from framework.graph.conversation import Message, NodeConversation
from framework.graph.conversation import Message, NodeConversation, extract_tool_call_history
from framework.storage.conversation_store import FileConversationStore
# ---------------------------------------------------------------------------
@@ -930,3 +931,565 @@ class TestConversationIntegration:
assert restored.next_seq == 4
assert restored.messages[0].content == "new msg"
assert restored.messages[0].seq == 2
# ---------------------------------------------------------------------------
# Helpers for aggressive compaction tests
# ---------------------------------------------------------------------------
def _make_tool_call(call_id: str, name: str, args: dict) -> dict:
return {
"id": call_id,
"type": "function",
"function": {"name": name, "arguments": json.dumps(args)},
}
async def _build_tool_heavy_conversation(
store: MockConversationStore | None = None,
) -> NodeConversation:
"""Build a conversation with many tool call pairs.
Layout: user msg, then 5x (assistant with append_data tool_call + tool result),
then 1x (assistant with set_output tool_call + tool result), then user msg + assistant msg.
"""
conv = NodeConversation(store=store)
await conv.add_user_message("Process the data") # seq 0
for i in range(5):
args = {"filename": "output.html", "content": "x" * 500}
tc = [_make_tool_call(f"call_{i}", "append_data", args)]
conv._messages.append(Message(
seq=conv._next_seq, role="assistant",
content=f"Appending part {i}", tool_calls=tc,
))
if store:
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
conv._next_seq += 1
conv._messages.append(Message(
seq=conv._next_seq, role="tool",
content='{"success": true}', tool_use_id=f"call_{i}",
))
if store:
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
conv._next_seq += 1
# set_output call — must be protected
so_tc = [_make_tool_call("call_so", "set_output", {"key": "result", "value": "done"})]
conv._messages.append(
Message(seq=conv._next_seq, role="assistant", content="Setting output", tool_calls=so_tc)
)
if store:
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
conv._next_seq += 1
conv._messages.append(Message(
seq=conv._next_seq, role="tool",
content="Output 'result' set successfully.",
tool_use_id="call_so",
))
if store:
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
conv._next_seq += 1
# Recent messages
await conv.add_user_message("Continue")
await conv.add_assistant_message("Working on it")
return conv
# ---------------------------------------------------------------------------
# Tests: aggressive structural compaction
# ---------------------------------------------------------------------------
class TestAggressiveStructuralCompaction:
@pytest.mark.asyncio
async def test_aggressive_collapses_tool_pairs(self, tmp_path):
"""Aggressive mode should collapse non-essential tool pairs into a summary."""
conv = await _build_tool_heavy_conversation()
spill = str(tmp_path)
await conv.compact_preserving_structure(
spillover_dir=spill, keep_recent=2, aggressive=True,
)
# The 5 append_data pairs (10 msgs) + 1 user msg should be collapsed.
# Remaining: ref_msg + set_output pair (2 msgs) + 2 recent = 5
assert conv.message_count == 5
assert conv.messages[0].role == "user" # ref message
assert "TOOLS ALREADY CALLED" in conv.messages[0].content
assert "append_data (5x)" in conv.messages[0].content
# set_output pair should be preserved
assert conv.messages[1].role == "assistant"
assert conv.messages[1].tool_calls is not None
assert conv.messages[1].tool_calls[0]["function"]["name"] == "set_output"
assert conv.messages[2].role == "tool"
# Recent messages intact
assert conv.messages[3].content == "Continue"
assert conv.messages[4].content == "Working on it"
@pytest.mark.asyncio
async def test_aggressive_preserves_set_output(self, tmp_path):
"""set_output tool calls are always protected in aggressive mode."""
conv = await _build_tool_heavy_conversation()
spill = str(tmp_path)
await conv.compact_preserving_structure(
spillover_dir=spill, keep_recent=2, aggressive=True,
)
# Find all tool calls in remaining messages
tool_names = []
for msg in conv.messages:
if msg.tool_calls:
for tc in msg.tool_calls:
tool_names.append(tc["function"]["name"])
assert "set_output" in tool_names
# append_data should NOT be in remaining messages (collapsed)
assert "append_data" not in tool_names
@pytest.mark.asyncio
async def test_aggressive_preserves_errors(self, tmp_path):
"""Error tool results are always protected in aggressive mode."""
conv = NodeConversation()
await conv.add_user_message("Start")
# Regular tool call
tc1 = [_make_tool_call("call_ok", "web_search", {"query": "test"})]
conv._messages.append(
Message(seq=conv._next_seq, role="assistant", content="", tool_calls=tc1)
)
conv._next_seq += 1
conv._messages.append(
Message(seq=conv._next_seq, role="tool", content="results", tool_use_id="call_ok")
)
conv._next_seq += 1
# Error tool call
tc2 = [_make_tool_call("call_err", "web_scrape", {"url": "http://broken.com"})]
conv._messages.append(
Message(seq=conv._next_seq, role="assistant", content="", tool_calls=tc2)
)
conv._next_seq += 1
conv._messages.append(
Message(
seq=conv._next_seq, role="tool", content="Connection timeout",
tool_use_id="call_err", is_error=True,
)
)
conv._next_seq += 1
await conv.add_user_message("Next")
await conv.add_assistant_message("OK")
spill = str(tmp_path)
await conv.compact_preserving_structure(
spillover_dir=spill, keep_recent=2, aggressive=True,
)
# Error pair should be preserved
error_msgs = [m for m in conv.messages if m.role == "tool" and m.is_error]
assert len(error_msgs) == 1
assert error_msgs[0].content == "Connection timeout"
@pytest.mark.asyncio
async def test_standard_mode_keeps_all_tool_pairs(self, tmp_path):
"""Non-aggressive mode should keep all tool pairs (existing behavior)."""
conv = await _build_tool_heavy_conversation()
spill = str(tmp_path)
await conv.compact_preserving_structure(
spillover_dir=spill, keep_recent=2, aggressive=False,
)
# All 6 tool pairs (12 msgs) should be kept as structural.
# Removed: 1 user msg (freeform). Remaining: ref + 12 structural + 2 recent = 15
assert conv.message_count == 15
@pytest.mark.asyncio
async def test_two_pass_sequence(self, tmp_path):
"""Standard pass then aggressive pass produces valid result."""
conv = await _build_tool_heavy_conversation()
spill = str(tmp_path)
# Pass 1: standard
await conv.compact_preserving_structure(
spillover_dir=spill, keep_recent=2,
)
after_standard = conv.message_count
assert after_standard == 15 # all structural kept
# Pass 2: aggressive
await conv.compact_preserving_structure(
spillover_dir=spill, keep_recent=2, aggressive=True,
)
after_aggressive = conv.message_count
assert after_aggressive < after_standard
# ref + set_output pair + 2 recent = 5
assert after_aggressive == 5
@pytest.mark.asyncio
async def test_aggressive_persists_correctly(self, tmp_path):
"""Aggressive compaction correctly updates the store."""
store = MockConversationStore()
conv = await _build_tool_heavy_conversation(store=store)
spill = str(tmp_path)
await conv.compact_preserving_structure(
spillover_dir=spill, keep_recent=2, aggressive=True,
)
# Verify store state matches in-memory state
parts = await store.read_parts()
assert len(parts) == conv.message_count
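# The class above pins down what aggressive compaction must keep. A rough sketch of
# the selection pass, assuming compact_preserving_structure partitions the older
# messages roughly like this (the helper name and PROTECTED_TOOLS set are illustrative,
# not the real code):
PROTECTED_TOOLS = {"set_output"}

def _select_aggressive_sketch(messages, keep_recent):
    recent = messages[-keep_recent:]
    older = messages[:-keep_recent]
    # Pass 1: find call ids that must survive (protected tools, or calls whose result errored).
    protected_ids = set()
    for msg in older:
        for tc in (msg.tool_calls or []):
            if tc["function"]["name"] in PROTECTED_TOOLS:
                protected_ids.add(tc["id"])
        if msg.role == "tool" and msg.is_error:
            protected_ids.add(msg.tool_use_id)
    # Pass 2: keep both halves of protected pairs, collapse everything else.
    kept, collapsed = [], []
    for msg in older:
        call_ids = {tc["id"] for tc in (msg.tool_calls or [])}
        if call_ids & protected_ids or (msg.role == "tool" and msg.tool_use_id in protected_ids):
            kept.append(msg)
        else:
            collapsed.append(msg)
    # Collapsed messages are replaced by one "TOOLS ALREADY CALLED ..." reference
    # message (built with extract_tool_call_history) prepended as role="user".
    return kept, collapsed, recent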
class TestExtractToolCallHistory:
def test_basic_extraction(self):
msgs = [
Message(seq=0, role="assistant", content="", tool_calls=[
_make_tool_call("c1", "web_search", {"query": "python async"}),
]),
Message(seq=1, role="tool", content="results", tool_use_id="c1"),
Message(seq=2, role="assistant", content="", tool_calls=[
_make_tool_call("c2", "save_data", {"filename": "output.txt", "content": "data"}),
]),
Message(seq=3, role="tool", content="saved", tool_use_id="c2"),
]
result = extract_tool_call_history(msgs)
assert "web_search (1x)" in result
assert "save_data (1x)" in result
assert "FILES SAVED: output.txt" in result
def test_errors_included(self):
msgs = [
Message(
seq=0, role="tool", content="Connection refused",
is_error=True, tool_use_id="c1",
),
]
result = extract_tool_call_history(msgs)
assert "ERRORS" in result
assert "Connection refused" in result
def test_empty_messages(self):
assert extract_tool_call_history([]) == ""
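# A minimal sketch of the summary string these assertions describe; the exact
# formatting produced by extract_tool_call_history in framework.graph.conversation
# may differ (header text, separators, and filename detection are assumptions here).
def extract_tool_call_history_sketch(messages):
    counts, files, errors = {}, [], []
    for msg in messages:
        for tc in (msg.tool_calls or []):
            name = tc["function"]["name"]
            counts[name] = counts.get(name, 0) + 1
            args = json.loads(tc["function"].get("arguments") or "{}")
            if "filename" in args:
                files.append(args["filename"])
        if msg.role == "tool" and msg.is_error:
            errors.append(msg.content)
    if not counts and not errors:
        return ""
    lines = ["TOOLS ALREADY CALLED:"]
    lines += [f"- {name} ({count}x)" for name, count in counts.items()]
    if files:
        lines.append("FILES SAVED: " + ", ".join(files))
    if errors:
        lines.append("ERRORS: " + "; ".join(errors))
    return "\n".join(lines)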
# ---------------------------------------------------------------------------
# Tests for _is_context_too_large_error
# ---------------------------------------------------------------------------
class TestIsContextTooLargeError:
def test_context_window_class_name(self):
from framework.graph.event_loop_node import _is_context_too_large_error
class ContextWindowExceededError(Exception):
pass
assert _is_context_too_large_error(ContextWindowExceededError("x"))
def test_openai_context_length(self):
from framework.graph.event_loop_node import _is_context_too_large_error
err = RuntimeError(
"This model's maximum context length is 128000 tokens"
)
assert _is_context_too_large_error(err)
def test_anthropic_too_long(self):
from framework.graph.event_loop_node import _is_context_too_large_error
err = RuntimeError("prompt is too long: 150000 tokens > 100000")
assert _is_context_too_large_error(err)
def test_generic_exceeds_limit(self):
from framework.graph.event_loop_node import _is_context_too_large_error
err = ValueError("Request exceeds token limit")
assert _is_context_too_large_error(err)
def test_unrelated_error(self):
from framework.graph.event_loop_node import _is_context_too_large_error
assert not _is_context_too_large_error(ValueError("connection refused"))
assert not _is_context_too_large_error(RuntimeError("timeout"))
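# A plausible shape for _is_context_too_large_error, inferred only from the cases
# above; the real helper in framework.graph.event_loop_node may match different
# exception types or phrases.
def _is_context_too_large_error_sketch(err: Exception) -> bool:
    # Provider-specific exception classes (e.g. ContextWindowExceededError).
    if "contextwindow" in type(err).__name__.lower():
        return True
    # Provider-agnostic message patterns (OpenAI, Anthropic, generic).
    text = str(err).lower()
    return any(
        phrase in text
        for phrase in ("maximum context length", "prompt is too long", "exceeds token limit")
    )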
# ---------------------------------------------------------------------------
# Tests for _format_messages_for_summary
# ---------------------------------------------------------------------------
class TestFormatMessagesForSummary:
def test_user_assistant_messages(self):
from framework.graph.event_loop_node import EventLoopNode
msgs = [
Message(seq=0, role="user", content="Hello world"),
Message(seq=1, role="assistant", content="Hi there"),
]
result = EventLoopNode._format_messages_for_summary(msgs)
assert "[user]: Hello world" in result
assert "[assistant]: Hi there" in result
def test_tool_result_truncated(self):
from framework.graph.event_loop_node import EventLoopNode
msgs = [
Message(seq=0, role="tool", content="x" * 1000, tool_use_id="c1"),
]
result = EventLoopNode._format_messages_for_summary(msgs)
assert "[tool result]:" in result
assert "..." in result
# Should be truncated to 500 + "..."
assert len(result) < 600
def test_assistant_with_tool_calls(self):
from framework.graph.event_loop_node import EventLoopNode
tc = [_make_tool_call("c1", "web_search", {"query": "test"})]
msgs = [
Message(seq=0, role="assistant", content="Searching", tool_calls=tc),
]
result = EventLoopNode._format_messages_for_summary(msgs)
assert "web_search" in result
assert "[assistant (calls:" in result
# ---------------------------------------------------------------------------
# Tests for _llm_compact (recursive binary-search)
# ---------------------------------------------------------------------------
class TestLlmCompact:
"""Test the recursive LLM compaction with mock LLM."""
def _make_node(self):
"""Create a minimal EventLoopNode for testing."""
from framework.graph.event_loop_node import EventLoopNode, LoopConfig
config = LoopConfig(max_history_tokens=32000)
node = EventLoopNode.__new__(EventLoopNode)
node._config = config
node._event_bus = None
node._judge = None
node._approval_callback = None
node._tool_executor = None
node._adaptive_learner = None
# Class-level constants such as _LLM_COMPACT_CHAR_LIMIT are inherited from the class.
return node
def _make_ctx(self, llm_responses=None, llm_error=None):
"""Create a mock NodeContext with controllable LLM."""
from unittest.mock import AsyncMock, MagicMock
from framework.graph.node import NodeSpec
spec = NodeSpec(
id="test",
name="Test Node",
description="A test node",
node_type="event_loop",
input_keys=[],
output_keys=["result"],
)
ctx = MagicMock()
ctx.node_spec = spec
ctx.node_id = "test"
ctx.stream_id = "test"
ctx.continuous_mode = False
ctx.runtime_logger = None
mock_llm = AsyncMock()
if llm_error:
mock_llm.acomplete.side_effect = llm_error
elif llm_responses:
responses = []
for text in llm_responses:
resp = MagicMock()
resp.content = text
responses.append(resp)
mock_llm.acomplete.side_effect = responses
else:
resp = MagicMock()
resp.content = "Summary of conversation."
mock_llm.acomplete.return_value = resp
ctx.llm = mock_llm
return ctx
@pytest.mark.asyncio
async def test_single_call_success(self):
node = self._make_node()
ctx = self._make_ctx()
msgs = [
Message(seq=0, role="user", content="Do something"),
Message(seq=1, role="assistant", content="Done"),
]
result = await node._llm_compact(ctx, msgs, None)
assert "Summary of conversation." in result
ctx.llm.acomplete.assert_called_once()
@pytest.mark.asyncio
async def test_context_too_large_triggers_split(self):
"""When LLM raises context error, should split and retry."""
from unittest.mock import MagicMock
node = self._make_node()
call_count = 0
async def mock_acomplete(**kwargs):
nonlocal call_count
call_count += 1
# First call with full messages → fail
# Subsequent calls with smaller chunks → succeed
if call_count == 1:
raise RuntimeError(
"This model's maximum context length is 128000 tokens"
)
resp = MagicMock()
resp.content = f"Summary part {call_count}"
return resp
ctx = self._make_ctx()
ctx.llm.acomplete = mock_acomplete
msgs = [
Message(seq=i, role="user", content=f"Message {i}")
for i in range(10)
]
result = await node._llm_compact(ctx, msgs, None)
# Should have split and produced two summaries
assert "Summary part" in result
assert call_count >= 3 # 1 failure + 2 successful halves
@pytest.mark.asyncio
async def test_non_context_error_propagates(self):
"""Non-context errors should propagate, not trigger splitting."""
node = self._make_node()
ctx = self._make_ctx(llm_error=ValueError("API key invalid"))
msgs = [
Message(seq=0, role="user", content="Hello"),
Message(seq=1, role="assistant", content="Hi"),
]
with pytest.raises(ValueError, match="API key invalid"):
await node._llm_compact(ctx, msgs, None)
@pytest.mark.asyncio
async def test_proactive_split_for_large_input(self):
"""Messages exceeding char limit should be split proactively."""
node = self._make_node()
# Lower the limit for testing
node._LLM_COMPACT_CHAR_LIMIT = 100
ctx = self._make_ctx(
llm_responses=["Part 1 summary", "Part 2 summary"],
)
msgs = [
Message(seq=0, role="user", content="x" * 80),
Message(seq=1, role="user", content="y" * 80),
]
result = await node._llm_compact(ctx, msgs, None)
assert "Part 1 summary" in result
assert "Part 2 summary" in result
# LLM should have been called twice (no failure, proactive split)
assert ctx.llm.acomplete.call_count == 2
@pytest.mark.asyncio
async def test_tool_history_appended_at_top_level(self):
"""Tool history should only be appended at depth 0."""
node = self._make_node()
ctx = self._make_ctx()
tc = [_make_tool_call("c1", "web_search", {"query": "test"})]
msgs = [
Message(seq=0, role="assistant", content="", tool_calls=tc),
Message(seq=1, role="tool", content="results", tool_use_id="c1"),
]
result = await node._llm_compact(ctx, msgs, None)
assert "TOOLS ALREADY CALLED" in result
assert "web_search" in result
# ---------------------------------------------------------------------------
# Orphaned tool result repair
# ---------------------------------------------------------------------------
class TestRepairOrphanedToolCalls:
"""Test _repair_orphaned_tool_calls handles both directions."""
def test_orphaned_tool_result_dropped(self):
"""Tool result with no matching tool_use should be dropped."""
msgs = [
# tool result with no preceding assistant tool_use
{"role": "tool", "tool_call_id": "orphan_1", "content": "stale result"},
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "hi"},
]
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
assert len(repaired) == 2
assert repaired[0]["role"] == "user"
assert repaired[1]["role"] == "assistant"
def test_valid_tool_pair_preserved(self):
"""Tool result with matching tool_use should be kept."""
msgs = [
{"role": "user", "content": "search"},
{
"role": "assistant",
"content": "",
"tool_calls": [{"id": "tc_1", "function": {"name": "search", "arguments": "{}"}}],
},
{"role": "tool", "tool_call_id": "tc_1", "content": "results"},
]
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
assert len(repaired) == 3
assert repaired[2]["tool_call_id"] == "tc_1"
def test_orphaned_tool_use_gets_stub(self):
"""Tool use with no following tool result gets a synthetic error stub."""
msgs = [
{"role": "user", "content": "search"},
{
"role": "assistant",
"content": "",
"tool_calls": [{"id": "tc_1", "function": {"name": "search", "arguments": "{}"}}],
},
# No tool result follows
{"role": "user", "content": "what happened?"},
]
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
# Should insert a synthetic tool result between assistant and user
assert len(repaired) == 4
assert repaired[2]["role"] == "tool"
assert repaired[2]["tool_call_id"] == "tc_1"
assert "interrupted" in repaired[2]["content"].lower()
def test_mixed_orphans(self):
"""Both orphaned results and orphaned calls handled together."""
msgs = [
# Orphaned result (no matching tool_use)
{"role": "tool", "tool_call_id": "gone_1", "content": "old result"},
{"role": "user", "content": "try again"},
{
"role": "assistant",
"content": "",
"tool_calls": [{"id": "tc_2", "function": {"name": "fetch", "arguments": "{}"}}],
},
# Missing result for tc_2
{"role": "user", "content": "done?"},
]
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
# orphaned result dropped, stub added for tc_2
roles = [m["role"] for m in repaired]
assert roles == ["user", "assistant", "tool", "user"]
assert repaired[2]["tool_call_id"] == "tc_2"
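# A sketch of the repair pass these four tests describe: drop tool results whose call
# id was never issued, and insert a synthetic result after any tool call that never
# received one. The stub wording is an assumption beyond the "interrupted" substring
# the tests require; in the framework this is a staticmethod on NodeConversation.
def _repair_orphaned_tool_calls_sketch(messages):
    issued = {tc["id"] for m in messages for tc in (m.get("tool_calls") or [])}
    repaired, pending = [], []

    def _flush_stubs():
        for call_id in pending:
            repaired.append({
                "role": "tool",
                "tool_call_id": call_id,
                "content": "Tool call was interrupted before a result was recorded.",
            })
        pending.clear()

    for msg in messages:
        if msg.get("role") == "tool":
            if msg.get("tool_call_id") not in issued:
                continue  # orphaned result: no matching tool_use, drop it
            if msg.get("tool_call_id") in pending:
                pending.remove(msg["tool_call_id"])
            repaired.append(msg)
            continue
        _flush_stubs()  # close out any unanswered tool calls before other roles
        pending.extend(tc["id"] for tc in (msg.get("tool_calls") or []))
        repaired.append(msg)
    _flush_stubs()
    return repaired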
@@ -90,7 +90,7 @@ edges = [
source="confirm-draft",
target="intake",
condition=EdgeCondition.CONDITIONAL,
condition_expr="batch_complete == True and send_started == True and send_count >= 1 and sent_message_ids is not None and len(sent_message_ids) >= 1",
condition_expr="batch_complete == True",
priority=1,
),
]
@@ -83,8 +83,8 @@ confirm_draft_node = NodeSpec(
client_facing=True,
max_node_visits=0,
input_keys=["email_list", "filter_criteria"],
output_keys=["batch_complete", "restart", "send_started", "send_count", "sent_message_ids", "send_failures"],
nullable_output_keys=["batch_complete", "restart", "send_started", "send_count", "sent_message_ids", "send_failures"],
output_keys=["batch_complete", "restart"],
nullable_output_keys=["batch_complete", "restart"],
success_criteria="User confirmed recipients and personalized replies sent for each.",
system_prompt="""\
You are a Gmail reply assistant. Present emails for confirmation, then send personalized replies.
@@ -99,22 +99,14 @@ You are a Gmail reply assistant. Present emails for confirmation, then send pers
**STEP 2 Handle user response:**
If user CONFIRMS (says yes, go ahead, sounds good, etc.):
1. Immediately call set_output("send_started", True) before any send tools.
2. For EACH email in email_list, call gmail_reply_email with:
For EACH email in email_list:
1. Read the subject and snippet
2. Use tone_guidance from filter_criteria + any user-specified preferences
3. Call gmail_reply_email with:
- message_id: the email's message_id
- html: personalized 2-4 sentence reply based on email context, using tone_guidance from filter_criteria and any new user preferences.
3. Track send results during this run:
- send_count: number of successful gmail_reply_email calls
- sent_message_ids: list of message_ids successfully replied to
- send_failures: list of {"message_id": "...", "error": "..."} for failed sends
4. REQUIRED completion gate:
- You MUST NOT set batch_complete=True unless send_started is True AND send_count >= 1 AND sent_message_ids is non-empty.
- If no sends succeeded, do NOT set batch_complete=True. Instead explain what failed and ask user whether to retry or restart.
5. After successful sends, call set_output in a separate turn:
- set_output("send_count", <int>)
- set_output("sent_message_ids", <list>)
- set_output("send_failures", <list>)
- set_output("batch_complete", True)
- html: personalized 2-4 sentence reply based on email context
(The tool automatically handles recipient, subject, and threading)
4. After all replies sent, call: set_output("batch_complete", True)
If user wants to CHANGE LOGIC/FILTER (says change filter, different criteria, not these emails, wrong emails, etc.):
1. Acknowledge their request