Compare commits
203 Commits
| SHA1 | Author | Date | |
|---|---|---|---|
| 2564f1b948 | |||
| bc194ee4e9 | |||
| 2bac100c03 | |||
| 425d37f868 | |||
| 99b127e2da | |||
| 43b759bf61 | |||
| 20d8d52f12 | |||
| 7e09588e4e | |||
| 7bf69d2263 | |||
| 99d2b0c003 | |||
| 8868416baa | |||
| 405b120674 | |||
| 66a7b43199 | |||
| a8f9d83723 | |||
| d95d5804ca | |||
| 86349c78d0 | |||
| 2232f49191 | |||
| 1ac9ba69d6 | |||
| 9e16be8f03 | |||
| 8f55170c1e | |||
| 31a98a5f95 | |||
| 7667b773f2 | |||
| 49560260de | |||
| 1cc75f89bd | |||
| bb3c69cff1 | |||
| 70d11f537e | |||
| b15dd2f623 | |||
| ce308312ae | |||
| f757c724cc | |||
| a4c758403e | |||
| a67563850b | |||
| b48465b778 | |||
| d3baaaab24 | |||
| c764b4dc3b | |||
| ad6077bd7b | |||
| ce2a91b1c0 | |||
| c2e7afeb5e | |||
| 0c9680ca89 | |||
| 8011b72673 | |||
| d87dfca1ab | |||
| b0fd4bc356 | |||
| a79d7de482 | |||
| e5e57302fa | |||
| c69cf1aea5 | |||
| 2f4cd8c36f | |||
| 6f571e6d00 | |||
| 31bc84106f | |||
| bdd6194203 | |||
| fd79dceb0f | |||
| ad50139d67 | |||
| 12fb40c110 | |||
| 738e469d96 | |||
| 80ccbcc827 | |||
| 08fac31a9d | |||
| 89ccd66fb9 | |||
| 7c47e367de | |||
| b8741bf94c | |||
| c90dcbb32f | |||
| ac3a5f5e93 | |||
| 1ccfdbbf7d | |||
| 1de37d2747 | |||
| 2aefdf5b5f | |||
| 4caaa79900 | |||
| 296089d4cd | |||
| cae5f971cf | |||
| bac716eea3 | |||
| 14daf672e8 | |||
| e352ae5145 | |||
| a58ffc2669 | |||
| 3fefea52be | |||
| 06fd045b3e | |||
| 2e43d2af46 | |||
| 2c9790c65d | |||
| 9700ac71bb | |||
| 61ed67b068 | |||
| c3bea8685a | |||
| 98c57b795a | |||
| 9be1d03b5c | |||
| 0d09510539 | |||
| 639c37ba17 | |||
| 2258c23254 | |||
| 9714ea106d | |||
| f4ad500177 | |||
| 9154a4d9f8 | |||
| add6efe6f1 | |||
| 7ceb1efd02 | |||
| a29ecf8435 | |||
| d0ba5ef4f4 | |||
| 860f637491 | |||
| acb2cab317 | |||
| b453806918 | |||
| 7ba8a0f51b | |||
| f6f398b6b1 | |||
| c4b22fa5c4 | |||
| 0e64f977cd | |||
| f24c9708fc | |||
| bb4436e277 | |||
| 795f66c90b | |||
| 9ef6d51573 | |||
| 3fed4e3409 | |||
| 670e69f2ce | |||
| f6c4747905 | |||
| 7b78f6c12f | |||
| 1c75100f59 | |||
| b325e103c6 | |||
| aef2d2d474 | |||
| 95a2b6711e | |||
| 7fb5e8145c | |||
| 8e45d0df83 | |||
| 8d4657c13e | |||
| 3d175a6d54 | |||
| b9debaf957 | |||
| bdcbcff6f3 | |||
| d2d7bdc374 | |||
| 40e494b15d | |||
| b5e840c0cb | |||
| f3d74c9ae4 | |||
| a22b321692 | |||
| 2e7dbad118 | |||
| 6183d1b65b | |||
| 09931e6d98 | |||
| cb394127d1 | |||
| 588fa1f9ea | |||
| 73325c280c | |||
| 8c5ae8ffa8 | |||
| 7389423c70 | |||
| 20c15446a7 | |||
| c05c30dd9a | |||
| bcd2fb76bd | |||
| 5fb97ab6df | |||
| 0224ebc800 | |||
| af88f7299a | |||
| 81729706ae | |||
| bbb1b43ebe | |||
| 70ed5fa8df | |||
| 312db6620d | |||
| 93c1fc5488 | |||
| 90762f275b | |||
| 801443027d | |||
| ca2ead76cd | |||
| d562144a6d | |||
| af7fb7da27 | |||
| c17dd63b4a | |||
| 866db289e2 | |||
| b4ac5e9607 | |||
| 3ca7af4242 | |||
| 2b12a9c91a | |||
| 9a94595a42 | |||
| e1540dfaa6 | |||
| 4f5ac6d1b1 | |||
| c87d7b13da | |||
| c4acf0b659 | |||
| 5e1ab3ca37 | |||
| 79c32c9f47 | |||
| 35ee29a843 | |||
| 573aea1d9c | |||
| 6ecbc30293 | |||
| 843b1f2e1d | |||
| 89f6c8e4ef | |||
| 304ac07bd8 | |||
| 82f0684b83 | |||
| 963c37dc31 | |||
| c02da3ba5a | |||
| 7f34e95ec6 | |||
| f2998fe098 | |||
| 323a2489b8 | |||
| f6d1cd640e | |||
| ddf89a04fe | |||
| c5dc89f5ee | |||
| 6ade34b759 | |||
| 09d5f0a9df | |||
| a60d63cca2 | |||
| 8616975fc5 | |||
| e5ae919d8f | |||
| 8e7f5eaaba | |||
| 4d1ff8b054 | |||
| 9fa81e8599 | |||
| cf8e19b059 | |||
| dfa3f60fcf | |||
| b795f1b253 | |||
| 73423c0dd2 | |||
| 3d844e1539 | |||
| b619119eb5 | |||
| b00ed4fc70 | |||
| 5ec5fbe998 | |||
| 2ed814455a | |||
| ad1a4ef0c3 | |||
| 2111c808a9 | |||
| 402bb38267 | |||
| 0a55928872 | |||
| cdf76ae3b9 | |||
| 42d0592941 | |||
| 1de7cf821d | |||
| 4ea8540e25 | |||
| bfa3b8e0f6 | |||
| 55eccfd75f | |||
| 1e994a77b5 | |||
| d12afeb35d | |||
| e84fefd319 | |||
| d2b510014d | |||
| 3ed5fda448 | |||
| 7a467ef9b8 | |||
| 41cd11d5c9 | |||
@@ -2,10 +2,6 @@

Shared agent instructions for this workspace.

## Deprecations

- **TUI is deprecated.** The terminal UI (`hive tui`) is no longer maintained. Use the browser-based interface (`hive open`) instead.

## Coding Agent Notes

-

+1020 -17
File diff suppressed because it is too large
@@ -111,7 +111,7 @@ This sets up:

- **LLM provider** - Interactive default model configuration
- All required Python dependencies with `uv`

-- At last, it will initiate the open hive interface in your browser
+- Finally, it will open the Hive interface in your browser

> **Tip:** To reopen the dashboard later, run `hive open` from the project directory.
@@ -125,18 +125,18 @@ Type the agent you want to build in the home input box

### Use Template Agents

-Click "Try a sample agent" and check the templates. You can run a templates directly or choose to build your version on top of the existing template.
+Click "Try a sample agent" and check the templates. You can run a template directly or choose to build your version on top of the existing template.

### Run Agents

-Now you can run an agent by selectiing the agent (either an existing agent or example agent). You can click the Run button on the top left, or talk to the queen agent and it can run the agent for you.
+Now you can run an agent by selecting the agent (either an existing agent or example agent). You can click the Run button on the top left, or talk to the queen agent and it can run the agent for you.

<img width="2500" height="1214" alt="Image" src="https://github.com/user-attachments/assets/71c38206-2ad5-49aa-bde8-6698d0bc55f5" />

## Features

- **Browser-Use** - Control the browser on your computer to achieve hard tasks
-- **Parallel Execution** - Execute the generated graph in parallel. This way you can have multiple agent compelteing the jobs for you
+- **Parallel Execution** - Execute the generated graph in parallel. This way you can have multiple agents completing the jobs for you
- **[Goal-Driven Generation](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals

+2 -2
@@ -39,8 +39,8 @@ We consider security research conducted in accordance with this policy to be:

## Security Best Practices for Users

1. **Keep Updated**: Always run the latest version
-2. **Secure Configuration**: Review `config.yaml` settings, especially in production
-3. **Environment Variables**: Never commit `.env` files or `config.yaml` with secrets
+2. **Secure Configuration**: Review your `~/.hive/configuration.json`, `.mcp.json`, and environment variable settings, especially in production
+3. **Environment Variables**: Never commit `.env` files or any configuration files that contain secrets
4. **Network Security**: Use HTTPS in production, configure firewalls appropriately
5. **Database Security**: Use strong passwords, limit network access
@@ -1,6 +1,6 @@

# MCP Server Guide - Agent Building Tools

-> **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_agent_package` tool, with underlying logic in `framework.builder.package_generator`.
+> **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_and_build_agent` tool, with underlying logic in `tools/coder_tools_server.py`.

This guide covers the MCP tools available for building goal-driven agents.

+1 -1
@@ -19,7 +19,7 @@ uv pip install -e .

## Agent Building

-Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_agent_package` tool and related utilities. The underlying package generation logic lives in `framework.builder.package_generator`.
+Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_and_build_agent` tool and related utilities. The package generation logic lives directly in `tools/coder_tools_server.py`.

See the [Getting Started Guide](../docs/getting-started.md) for building agents.
@@ -601,7 +601,7 @@ async def handle_ws(websocket):

    )
    node = EventLoopNode(
        event_bus=bus,
-        config=LoopConfig(max_iterations=10_000, max_history_tokens=32_000),
+        config=LoopConfig(max_iterations=10_000, max_context_tokens=32_000),
        conversation_store=STORE,
        tool_executor=tool_executor,
    )
@@ -1769,7 +1769,7 @@ async def _run_pipeline(websocket, initial_message: str):

        config=LoopConfig(
            max_iterations=30,
            max_tool_calls_per_turn=30,
-            max_history_tokens=64000,
+            max_context_tokens=64000,
            max_tool_result_chars=8_000,
            spillover_dir=str(_DATA_DIR),
        ),
@@ -752,7 +752,7 @@ async def _run_pipeline(websocket, topic: str):

        config=LoopConfig(
            max_iterations=20,
            max_tool_calls_per_turn=30,
-            max_history_tokens=32_000,
+            max_context_tokens=32_000,
        ),
        conversation_store=store_a,
        tool_executor=tool_executor,

@@ -850,7 +850,7 @@ async def _run_pipeline(websocket, topic: str):

        config=LoopConfig(
            max_iterations=10,
            max_tool_calls_per_turn=30,
-            max_history_tokens=32_000,
+            max_context_tokens=32_000,
        ),
        conversation_store=store_b,
    )
@@ -1258,7 +1258,7 @@ async def _run_org_pipeline(websocket, topic: str):

        config=LoopConfig(
            max_iterations=30,
            max_tool_calls_per_turn=30,
-            max_history_tokens=32_000,
+            max_context_tokens=32_000,
        ),
        conversation_store=store,
        tool_executor=executor,
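The hunks above rename `max_history_tokens` to `max_context_tokens` at every `LoopConfig` call site. A minimal sketch of how such a rename can stay backwards compatible during migration — the field names come from the diff, but the `from_kwargs` alias shim is hypothetical, not part of the framework:

```python
from dataclasses import dataclass


@dataclass
class LoopConfig:
    """Sketch of the renamed config; field names taken from the diff."""

    max_iterations: int = 100
    max_tool_calls_per_turn: int = 30
    max_context_tokens: int = 32_000

    @classmethod
    def from_kwargs(cls, **kwargs) -> "LoopConfig":
        # Hypothetical shim: accept the deprecated max_history_tokens
        # keyword and map it onto max_context_tokens.
        if "max_history_tokens" in kwargs:
            kwargs["max_context_tokens"] = kwargs.pop("max_history_tokens")
        return cls(**kwargs)


# Old-style and new-style call sites produce the same config.
old_style = LoopConfig.from_kwargs(max_iterations=20, max_history_tokens=64_000)
new_style = LoopConfig(max_iterations=20, max_context_tokens=64_000)
```

Without a shim like this, every call site must be updated in the same change — which is exactly what the hunks above do.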
@@ -22,7 +22,6 @@ The framework includes a Goal-Based Testing system (Goal → Agent → Eval):

See `framework.testing` for details.
"""

-from framework.builder.query import BuilderQuery
from framework.llm import AnthropicProvider, LLMProvider
from framework.runner import AgentOrchestrator, AgentRunner
from framework.runtime.core import Runtime

@@ -51,8 +50,6 @@ __all__ = [

    "Problem",
    # Runtime
    "Runtime",
-    # Builder
-    "BuilderQuery",
    # LLM
    "LLMProvider",
    "AnthropicProvider",
@@ -10,13 +10,14 @@ from .agent import CredentialTesterAgent


def setup_logging(verbose=False, debug=False):
+    from framework.observability import configure_logging
+
    if debug:
-        level, fmt = logging.DEBUG, "%(asctime)s %(name)s: %(message)s"
+        configure_logging(level="DEBUG")
    elif verbose:
-        level, fmt = logging.INFO, "%(message)s"
+        configure_logging(level="INFO")
    else:
-        level, fmt = logging.WARNING, "%(levelname)s: %(message)s"
-    logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
+        configure_logging(level="WARNING")


def pick_account(agent: CredentialTesterAgent) -> dict | None:

@@ -51,42 +52,6 @@ def cli():

    pass


-@cli.command()
-@click.option("--verbose", "-v", is_flag=True)
-@click.option("--debug", is_flag=True)
-def tui(verbose, debug):
-    """Launch TUI to test a credential interactively."""
-    setup_logging(verbose=verbose, debug=debug)
-
-    try:
-        from framework.tui.app import AdenTUI
-    except ImportError:
-        click.echo("TUI requires 'textual'. Install with: pip install textual")
-        sys.exit(1)
-
-    agent = CredentialTesterAgent()
-    account = pick_account(agent)
-    if account is None:
-        sys.exit(1)
-
-    agent.select_account(account)
-    provider = account.get("provider", "?")
-    alias = account.get("alias", "?")
-    click.echo(f"\nTesting {provider}/{alias}...\n")
-
-    async def run_tui():
-        agent._setup()
-        runtime = agent._agent_runtime
-        await runtime.start()
-        try:
-            app = AdenTUI(runtime)
-            await app.run_async()
-        finally:
-            await runtime.stop()
-
-    asyncio.run(run_tui())
-
-
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
@click.option("--debug", is_flag=True)
@@ -19,6 +19,7 @@ from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

+from framework.config import get_max_context_tokens
from framework.graph import Goal, NodeSpec, SuccessCriterion
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.edge import GraphSpec

@@ -455,7 +456,6 @@ identity_prompt = (

loop_config = {
    "max_iterations": 50,
    "max_tool_calls_per_turn": 30,
-    "max_history_tokens": 32000,
}

# ---------------------------------------------------------------------------

@@ -541,7 +541,7 @@ class CredentialTesterAgent:

        loop_config={
            "max_iterations": 50,
            "max_tool_calls_per_turn": 30,
-            "max_history_tokens": 32000,
+            "max_context_tokens": get_max_context_tokens(),
        },
        conversation_mode="continuous",
        identity_prompt=(
@@ -0,0 +1,151 @@

"""Agent discovery — scan known directories and return categorised AgentEntry lists."""

from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class AgentEntry:
    """Lightweight agent metadata for the picker / API discover endpoint."""

    path: Path
    name: str
    description: str
    category: str
    session_count: int = 0
    node_count: int = 0
    tool_count: int = 0
    tags: list[str] = field(default_factory=list)
    last_active: str | None = None


def _get_last_active(agent_name: str) -> str | None:
    """Return the most recent updated_at timestamp across all sessions."""
    sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
    if not sessions_dir.exists():
        return None
    latest: str | None = None
    for session_dir in sessions_dir.iterdir():
        if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
            continue
        state_file = session_dir / "state.json"
        if not state_file.exists():
            continue
        try:
            data = json.loads(state_file.read_text(encoding="utf-8"))
            ts = data.get("timestamps", {}).get("updated_at")
            if ts and (latest is None or ts > latest):
                latest = ts
        except Exception:
            continue
    return latest


def _count_sessions(agent_name: str) -> int:
    """Count session directories under ~/.hive/agents/{agent_name}/sessions/."""
    sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
    if not sessions_dir.exists():
        return 0
    return sum(1 for d in sessions_dir.iterdir() if d.is_dir() and d.name.startswith("session_"))


def _extract_agent_stats(agent_path: Path) -> tuple[int, int, list[str]]:
    """Extract node count, tool count, and tags from an agent directory.

    Prefers agent.py (AST-parsed) over agent.json for node/tool counts
    since agent.json may be stale. Tags are only available from agent.json.
    """
    import ast

    node_count, tool_count, tags = 0, 0, []

    agent_py = agent_path / "agent.py"
    if agent_py.exists():
        try:
            tree = ast.parse(agent_py.read_text(encoding="utf-8"))
            for node in ast.walk(tree):
                if isinstance(node, ast.Assign):
                    for target in node.targets:
                        if isinstance(target, ast.Name) and target.id == "nodes":
                            if isinstance(node.value, ast.List):
                                node_count = len(node.value.elts)
        except Exception:
            pass

    agent_json = agent_path / "agent.json"
    if agent_json.exists():
        try:
            data = json.loads(agent_json.read_text(encoding="utf-8"))
            json_nodes = data.get("graph", {}).get("nodes", []) or data.get("nodes", [])
            if node_count == 0:
                node_count = len(json_nodes)
            tools: set[str] = set()
            for n in json_nodes:
                tools.update(n.get("tools", []))
            tool_count = len(tools)
            tags = data.get("agent", {}).get("tags", [])
        except Exception:
            pass

    return node_count, tool_count, tags


def discover_agents() -> dict[str, list[AgentEntry]]:
    """Discover agents from all known sources grouped by category."""
    from framework.runner.cli import (
        _extract_python_agent_metadata,
        _get_framework_agents_dir,
        _is_valid_agent_dir,
    )

    groups: dict[str, list[AgentEntry]] = {}
    sources = [
        ("Your Agents", Path("exports")),
        ("Framework", _get_framework_agents_dir()),
        ("Examples", Path("examples/templates")),
    ]

    for category, base_dir in sources:
        if not base_dir.exists():
            continue
        entries: list[AgentEntry] = []
        for path in sorted(base_dir.iterdir(), key=lambda p: p.name):
            if not _is_valid_agent_dir(path):
                continue

            name, desc = _extract_python_agent_metadata(path)
            config_fallback_name = path.name.replace("_", " ").title()
            used_config = name != config_fallback_name

            node_count, tool_count, tags = _extract_agent_stats(path)
            if not used_config:
                agent_json = path / "agent.json"
                if agent_json.exists():
                    try:
                        data = json.loads(agent_json.read_text(encoding="utf-8"))
                        meta = data.get("agent", {})
                        name = meta.get("name", name)
                        desc = meta.get("description", desc)
                    except Exception:
                        pass

            entries.append(
                AgentEntry(
                    path=path,
                    name=name,
                    description=desc,
                    category=category,
                    session_count=_count_sessions(path.name),
                    node_count=node_count,
                    tool_count=tool_count,
                    tags=tags,
                    last_active=_get_last_active(path.name),
                )
            )
        if entries:
            groups[category] = entries

    return groups
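The AST pass in `_extract_agent_stats` only needs to find a module-level `nodes = [...]` assignment and count its elements. A self-contained reduction of that logic (the sample source string is invented for illustration):

```python
import ast


def count_nodes(source: str) -> int:
    # Walk the module AST looking for a top-level `nodes = [...]`
    # assignment and return the number of list elements.
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "nodes":
                    if isinstance(node.value, ast.List):
                        return len(node.value.elts)
    return 0


sample = "nodes = [coder_node, queen_node]\nedges = []\n"
print(count_nodes(sample))  # → 2
```

Parsing the source instead of importing it means the count works even when `agent.py` has unresolvable imports, which is why the diff prefers the AST over a stale `agent.json`.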
@@ -1,40 +0,0 @@

"""
Hive Coder — Native coding agent that builds Hive agent packages.

Deeply understands the agent framework and produces complete Python packages
with goals, nodes, edges, system prompts, MCP configuration, and tests
from natural language specifications.
"""

from .agent import (
    conversation_mode,
    edges,
    entry_node,
    entry_points,
    goal,
    identity_prompt,
    loop_config,
    nodes,
    pause_nodes,
    terminal_nodes,
)
from .config import AgentMetadata, RuntimeConfig, default_config, metadata

__version__ = "1.0.0"

__all__ = [
    "goal",
    "nodes",
    "edges",
    "entry_node",
    "entry_points",
    "pause_nodes",
    "terminal_nodes",
    "conversation_mode",
    "identity_prompt",
    "loop_config",
    "RuntimeConfig",
    "AgentMetadata",
    "default_config",
    "metadata",
]
@@ -1,60 +0,0 @@

"""CLI entry point for Hive Coder agent."""

import json
import logging
import sys

import click

from .agent import entry_node, goal, nodes
from .config import metadata


def setup_logging(verbose=False, debug=False):
    """Configure logging for execution visibility."""
    if debug:
        level, fmt = logging.DEBUG, "%(asctime)s %(name)s: %(message)s"
    elif verbose:
        level, fmt = logging.INFO, "%(message)s"
    else:
        level, fmt = logging.WARNING, "%(levelname)s: %(message)s"
    logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
    logging.getLogger("framework").setLevel(level)


@click.group()
@click.version_option(version="1.0.0")
def cli():
    """Hive Coder — Build Hive agent packages from natural language."""
    pass


@cli.command()
@click.option("--json", "output_json", is_flag=True)
def info(output_json):
    """Show agent information."""
    info_data = {
        "name": metadata.name,
        "version": metadata.version,
        "description": metadata.description,
        "goal": {
            "name": goal.name,
            "description": goal.description,
        },
        "nodes": [n.id for n in nodes],
        "entry_node": entry_node,
        "client_facing_nodes": [n.id for n in nodes if n.client_facing],
    }
    if output_json:
        click.echo(json.dumps(info_data, indent=2))
    else:
        click.echo(f"Agent: {info_data['name']}")
        click.echo(f"Version: {info_data['version']}")
        click.echo(f"Description: {info_data['description']}")
        click.echo(f"\nNodes: {', '.join(info_data['nodes'])}")
        click.echo(f"Client-facing: {', '.join(info_data['client_facing_nodes'])}")
        click.echo(f"Entry: {info_data['entry_node']}")


if __name__ == "__main__":
    cli()
@@ -1,153 +0,0 @@
|
||||
"""Agent graph construction for Hive Coder."""
|
||||
|
||||
from framework.graph import Constraint, Goal, SuccessCriterion
|
||||
from framework.graph.edge import GraphSpec
|
||||
|
||||
from .nodes import coder_node, queen_node
|
||||
|
||||
# Goal definition
|
||||
goal = Goal(
|
||||
id="hive-coder",
|
||||
name="Hive Agent Builder",
|
||||
description=(
|
||||
"Build complete, validated Hive agent packages from natural language "
|
||||
"specifications. Produces production-ready Python packages with goals, "
|
||||
"nodes, edges, system prompts, MCP configuration, and tests."
|
||||
),
|
||||
success_criteria=[
|
||||
SuccessCriterion(
|
||||
id="valid-package",
|
||||
description="Generated agent package passes structural validation",
|
||||
metric="validation_pass",
|
||||
target="true",
|
||||
weight=0.30,
|
||||
),
|
||||
SuccessCriterion(
|
||||
id="complete-files",
|
||||
description=(
|
||||
"All required files generated: agent.py, config.py, "
|
||||
"nodes/__init__.py, __init__.py, __main__.py, mcp_servers.json"
|
||||
),
|
||||
metric="file_count",
|
||||
target=">=6",
|
||||
weight=0.25,
|
||||
),
|
||||
SuccessCriterion(
|
||||
id="user-satisfaction",
|
||||
description="User reviews and approves the generated agent",
|
||||
metric="user_approval",
|
||||
target="true",
|
||||
weight=0.25,
|
||||
),
|
||||
SuccessCriterion(
|
||||
id="framework-compliance",
|
||||
description=(
|
||||
"Generated code follows framework patterns: STEP 1/STEP 2 "
|
||||
"for client-facing and correct imports"
|
||||
),
|
||||
metric="pattern_compliance",
|
||||
target="100%",
|
||||
weight=0.20,
|
||||
),
|
||||
],
|
||||
constraints=[
|
||||
Constraint(
|
||||
id="dynamic-tool-discovery",
|
||||
description=(
|
||||
"Always discover available tools dynamically via "
|
||||
"list_agent_tools before referencing tools in agent designs"
|
||||
),
|
||||
constraint_type="hard",
|
||||
category="correctness",
|
||||
),
|
||||
Constraint(
|
||||
id="no-fabricated-tools",
|
||||
description="Only reference tools that exist in hive-tools MCP",
|
||||
constraint_type="hard",
|
||||
category="correctness",
|
||||
),
|
||||
Constraint(
|
||||
id="valid-python",
|
||||
description="All generated Python files must be syntactically correct",
|
||||
constraint_type="hard",
|
||||
category="correctness",
|
||||
),
|
||||
Constraint(
|
||||
id="self-verification",
|
||||
description="Run validation after writing code; fix errors before presenting",
|
||||
constraint_type="hard",
|
||||
category="quality",
|
||||
),
|
||||
],
|
||||
)
|
||||
|
||||
# Nodes: primary coder node only. The queen runs as an independent
|
||||
# GraphExecutor with queen_node — not as part of this graph.
|
||||
nodes = [coder_node]
|
||||
|
||||
# No edges needed — single event_loop node
|
||||
edges = []
|
||||
|
||||
# Graph configuration
|
||||
entry_node = "coder"
|
||||
entry_points = {"start": "coder"}
|
||||
pause_nodes = []
|
||||
terminal_nodes = [] # Coder node has output_keys and can terminate
|
||||
|
||||
# No async entry points needed — the queen is now an independent executor,
|
||||
# not a secondary graph receiving events via add_graph().
|
||||
async_entry_points = []
|
||||
|
||||
# Module-level variables read by AgentRunner.load()
|
||||
conversation_mode = "continuous"
|
||||
identity_prompt = (
|
||||
"You are Hive Coder, the best agent-building coding agent on the planet. "
|
||||
"You deeply understand the Hive agent framework at the source code level "
|
||||
"and produce production-ready agent packages from natural language. "
|
||||
"You can dynamically discover available framework tools, inspect runtime "
|
||||
"sessions and checkpoints from agents you build, and run their test suites. "
|
||||
"You follow coding agent discipline: read before writing, verify "
|
||||
"assumptions by reading actual code, adhere to project conventions, "
|
||||
"self-verify with validation, and fix your own errors. You are concise, "
|
||||
"direct, and technically rigorous. No emojis. No fluff."
|
||||
)
|
||||
loop_config = {
|
||||
"max_iterations": 100,
|
||||
"max_tool_calls_per_turn": 30,
|
||||
"max_history_tokens": 32000,
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Queen graph — runs as an independent persistent conversation in the TUI.
|
||||
# Loaded by _load_judge_and_queen() in app.py, NOT by AgentRunner.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
queen_goal = Goal(
|
||||
id="queen-manager",
|
||||
name="Queen Manager",
|
||||
description=(
|
||||
"Manage the worker agent lifecycle and serve as the user's primary "
|
||||
"interactive interface. Triage health escalations from the judge."
|
||||
),
|
||||
success_criteria=[],
|
||||
constraints=[],
|
||||
)
|
||||
|
||||
queen_graph = GraphSpec(
|
||||
id="queen-graph",
|
||||
goal_id=queen_goal.id,
|
||||
version="1.0.0",
|
||||
entry_node="queen",
|
||||
entry_points={"start": "queen"},
|
||||
terminal_nodes=[],
|
||||
pause_nodes=[],
|
||||
nodes=[queen_node],
|
||||
edges=[],
|
||||
conversation_mode="continuous",
|
||||
loop_config={
|
||||
"max_iterations": 999_999,
|
||||
"max_tool_calls_per_turn": 30,
|
||||
"max_history_tokens": 32000,
|
||||
},
|
||||
)
|
||||
@@ -0,0 +1,21 @@

"""
Queen — Native agent builder for the Hive framework.

Deeply understands the agent framework and produces complete Python packages
with goals, nodes, edges, system prompts, MCP configuration, and tests
from natural language specifications.
"""

from .agent import queen_goal, queen_graph
from .config import AgentMetadata, RuntimeConfig, default_config, metadata

__version__ = "1.0.0"

__all__ = [
    "queen_goal",
    "queen_graph",
    "RuntimeConfig",
    "AgentMetadata",
    "default_config",
    "metadata",
]
@@ -0,0 +1,39 @@

"""Queen graph definition."""

from framework.graph import Goal
from framework.graph.edge import GraphSpec

from .nodes import queen_node

# ---------------------------------------------------------------------------
# Queen graph — the primary persistent conversation.
# Loaded by queen_orchestrator.create_queen(), NOT by AgentRunner.
# ---------------------------------------------------------------------------

queen_goal = Goal(
    id="queen-manager",
    name="Queen Manager",
    description=(
        "Manage the worker agent lifecycle and serve as the user's primary "
        "interactive interface. Triage health escalations from the judge."
    ),
    success_criteria=[],
    constraints=[],
)

queen_graph = GraphSpec(
    id="queen-graph",
    goal_id=queen_goal.id,
    version="1.0.0",
    entry_node="queen",
    entry_points={"start": "queen"},
    terminal_nodes=[],
    pause_nodes=[],
    nodes=[queen_node],
    edges=[],
    conversation_mode="continuous",
    loop_config={
        "max_iterations": 999_999,
        "max_tool_calls_per_turn": 30,
    },
)
@@ -1,4 +1,4 @@

-"""Runtime configuration for Hive Coder agent."""
+"""Runtime configuration for Queen agent."""

import json
from dataclasses import dataclass, field

@@ -34,7 +34,7 @@ default_config = RuntimeConfig()

@dataclass
class AgentMetadata:
-    name: str = "Hive Coder"
+    name: str = "Queen"
    version: str = "1.0.0"
    description: str = (
        "Native coding agent that builds production-ready Hive agent packages "

@@ -43,7 +43,7 @@ class AgentMetadata:

        "MCP configuration, and tests."
    )
    intro_message: str = (
-        "I'm Hive Coder — I build Hive agents. Describe what kind of agent "
+        "I'm Queen — I build Hive agents. Describe what kind of agent "
        "you want to create and I'll design, implement, and validate it for you."
    )

+587 -178
File diff suppressed because it is too large
@@ -0,0 +1,371 @@
"""Queen global cross-session memory.

Three-tier memory architecture:

    ~/.hive/queen/MEMORY.md                      — semantic (who, what, why)
    ~/.hive/queen/memories/MEMORY-YYYY-MM-DD.md  — episodic (daily journals)
    ~/.hive/queen/session/{id}/data/adapt.md     — working (session-scoped)

Semantic and episodic files are injected at queen session start.

Semantic memory (MEMORY.md) is updated automatically at session end via
consolidate_queen_memory() — the queen never rewrites this herself.

Episodic memory (MEMORY-date.md) can be written by the queen during a session
via the write_to_diary tool, and is also appended to at session end by
consolidate_queen_memory().
"""

from __future__ import annotations

import asyncio
import json
import logging
import traceback
from datetime import date, datetime
from pathlib import Path

logger = logging.getLogger(__name__)


def _queen_dir() -> Path:
    return Path.home() / ".hive" / "queen"


def semantic_memory_path() -> Path:
    return _queen_dir() / "MEMORY.md"


def episodic_memory_path(d: date | None = None) -> Path:
    d = d or date.today()
    return _queen_dir() / "memories" / f"MEMORY-{d.strftime('%Y-%m-%d')}.md"


def read_semantic_memory() -> str:
    path = semantic_memory_path()
    return path.read_text(encoding="utf-8").strip() if path.exists() else ""


def read_episodic_memory(d: date | None = None) -> str:
    path = episodic_memory_path(d)
    return path.read_text(encoding="utf-8").strip() if path.exists() else ""


def format_for_injection() -> str:
    """Format cross-session memory for system prompt injection.

    Returns an empty string if no meaningful content exists yet (e.g. first
    session with only the seed template).
    """
    semantic = read_semantic_memory()
    episodic = read_episodic_memory()

    # Suppress injection if semantic is still just the seed template
    if semantic and semantic.startswith("# My Understanding of the User\n\n*No sessions"):
        semantic = ""

    parts: list[str] = []
    if semantic:
        parts.append(semantic)
    if episodic:
        today_str = date.today().strftime("%B %-d, %Y")
        parts.append(f"## Today — {today_str}\n\n{episodic}")

    if not parts:
        return ""

    body = "\n\n---\n\n".join(parts)
    return "--- Your Cross-Session Memory ---\n\n" + body + "\n\n--- End Cross-Session Memory ---"


_SEED_TEMPLATE = """\
# My Understanding of the User

*No sessions recorded yet.*

## Who They Are

## What They're Trying to Achieve

## What's Working

## What I've Learned
"""


def append_episodic_entry(content: str) -> None:
    """Append a timestamped prose entry to today's episodic memory file.

    Creates the file (with a date heading) if it doesn't exist yet.
    Used both by the queen's diary tool and by the consolidation hook.
    """
    ep_path = episodic_memory_path()
    ep_path.parent.mkdir(parents=True, exist_ok=True)
    today_str = date.today().strftime("%B %-d, %Y")
    timestamp = datetime.now().strftime("%H:%M")
    if not ep_path.exists():
        header = f"# {today_str}\n\n"
        block = f"{header}### {timestamp}\n\n{content.strip()}\n"
    else:
        block = f"\n\n### {timestamp}\n\n{content.strip()}\n"
    with ep_path.open("a", encoding="utf-8") as f:
        f.write(block)


def seed_if_missing() -> None:
    """Create MEMORY.md with a blank template if it doesn't exist yet."""
    path = semantic_memory_path()
    if path.exists():
        return
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(_SEED_TEMPLATE, encoding="utf-8")


# ---------------------------------------------------------------------------
# Consolidation prompts
# ---------------------------------------------------------------------------

_SEMANTIC_SYSTEM = """\
You maintain the persistent cross-session memory of an AI assistant called the Queen.
Review the session notes and rewrite MEMORY.md — the Queen's durable understanding of the
person she works with across all sessions.

Write entirely in the Queen's voice — first person, reflective, honest.
Not a log of events, but genuine understanding of who this person is over time.

Rules:
- Update and synthesise: incorporate new understanding, update facts that have changed, remove
  details that are stale, superseded, or no longer say anything meaningful about the person.
- Keep it as structured markdown with named sections about the PERSON, not about today.
- Do NOT include diary sections, daily logs, or session summaries. Those belong elsewhere.
  MEMORY.md is about who they are, what they want, what works — not what happened today.
- Reference dates only when noting a lasting milestone (e.g. "since March 8th they prefer X").
- If the session had no meaningful new information about the person,
  return the existing text unchanged.
- Do not add fictional details. Only reflect what is evidenced in the notes.
- Stay concise. Prune rather than accumulate. A lean, accurate file is more useful than a
  dense one. If something was true once but has been resolved or superseded, remove it.
- Output only the raw markdown content of MEMORY.md. No preamble, no code fences.
"""

_DIARY_SYSTEM = """\
You maintain the daily episodic diary of an AI assistant called the Queen.
You receive: (1) today's existing diary so far, and (2) notes from the latest session.

Rewrite the complete diary for today as a single unified narrative —
first person, reflective, honest.
Merge and deduplicate: if the same story (e.g. a research agent stalling) recurred several times,
describe it once with appropriate weight rather than retelling it. Weave in new developments from
the session notes. Preserve important milestones, emotional texture, and session path references.

If today's diary is empty, write the initial entry based on the session notes alone.

Output only the full diary prose — no date heading, no timestamp headers,
no preamble, no code fences.
"""


def read_session_context(session_dir: Path, max_messages: int = 80) -> str:
    """Extract a readable transcript from conversation parts + adapt.md.

    Reads the last ``max_messages`` conversation parts and the session's
    adapt.md (working memory). Tool results are omitted — only user and
    assistant turns (with tool-call names noted) are included.
    """
    parts: list[str] = []

    # Working notes
    adapt_path = session_dir / "data" / "adapt.md"
    if adapt_path.exists():
        text = adapt_path.read_text(encoding="utf-8").strip()
        if text:
            parts.append(f"## Session Working Notes (adapt.md)\n\n{text}")

    # Conversation transcript
    parts_dir = session_dir / "conversations" / "parts"
    if parts_dir.exists():
        part_files = sorted(parts_dir.glob("*.json"))[-max_messages:]
        lines: list[str] = []
        for pf in part_files:
            try:
                data = json.loads(pf.read_text(encoding="utf-8"))
                role = data.get("role", "")
                content = str(data.get("content", "")).strip()
                tool_calls = data.get("tool_calls") or []
                if role == "tool":
                    continue  # skip verbose tool results
                if role == "assistant" and tool_calls and not content:
                    names = [tc.get("function", {}).get("name", "?") for tc in tool_calls]
                    lines.append(f"[queen calls: {', '.join(names)}]")
                elif content:
                    label = "user" if role == "user" else "queen"
                    lines.append(f"[{label}]: {content[:600]}")
            except Exception:
                continue
        if lines:
            parts.append("## Conversation\n\n" + "\n".join(lines))

    return "\n\n".join(parts)


# ---------------------------------------------------------------------------
# Context compaction (binary-split LLM summarisation)
# ---------------------------------------------------------------------------

# If the raw session context exceeds this many characters, compact it first
# before sending to the consolidation LLM. ~200 k chars ≈ 50 k tokens.
_CTX_COMPACT_CHAR_LIMIT = 200_000
_CTX_COMPACT_MAX_DEPTH = 8

_COMPACT_SYSTEM = (
    "Summarise this conversation segment. Preserve: user goals, key decisions, "
    "what was built or changed, emotional tone, and important outcomes. "
    "Write concisely in third person past tense. Omit routine tool invocations "
    "unless the result matters."
)


async def _compact_context(text: str, llm: object, *, _depth: int = 0) -> str:
    """Binary-split and LLM-summarise *text* until it fits within the char limit.

    Mirrors the recursive binary-splitting strategy used by the main agent
    compaction pipeline (EventLoopNode._llm_compact).
    """
    if len(text) <= _CTX_COMPACT_CHAR_LIMIT or _depth >= _CTX_COMPACT_MAX_DEPTH:
        return text

    # Split near the midpoint on a line boundary so we don't cut mid-message
    mid = len(text) // 2
    split_at = text.rfind("\n", 0, mid) + 1
    if split_at <= 0:
        split_at = mid

    half1, half2 = text[:split_at], text[split_at:]

    async def _summarise(chunk: str) -> str:
        try:
            resp = await llm.acomplete(
                messages=[{"role": "user", "content": chunk}],
                system=_COMPACT_SYSTEM,
                max_tokens=2048,
            )
            return resp.content.strip()
        except Exception:
            logger.warning(
                "queen_memory: context compaction LLM call failed (depth=%d), truncating",
                _depth,
            )
            return chunk[: _CTX_COMPACT_CHAR_LIMIT // 4]

    s1, s2 = await asyncio.gather(_summarise(half1), _summarise(half2))
    combined = s1 + "\n\n" + s2
    if len(combined) > _CTX_COMPACT_CHAR_LIMIT:
        return await _compact_context(combined, llm, _depth=_depth + 1)
    return combined


async def consolidate_queen_memory(
    session_id: str,
    session_dir: Path,
    llm: object,
) -> None:
    """Update MEMORY.md and append a diary entry based on the current session.

    Reads conversation parts and adapt.md from session_dir. Called
    periodically in the background and once at session end. Failures are
    logged and silently swallowed so they never block teardown.

    Args:
        session_id: The session ID (used for the adapt.md path reference).
        session_dir: Path to the session directory (~/.hive/queen/session/{id}).
        llm: LLMProvider instance (must support acomplete()).
    """
    try:
        session_context = read_session_context(session_dir)
        if not session_context:
            logger.debug("queen_memory: no session context, skipping consolidation")
            return

        logger.info("queen_memory: consolidating memory for session %s ...", session_id)

        # If the transcript is very large, compact it with recursive binary LLM
        # summarisation before sending to the consolidation model.
        if len(session_context) > _CTX_COMPACT_CHAR_LIMIT:
            logger.info(
                "queen_memory: session context is %d chars — compacting first",
                len(session_context),
            )
            session_context = await _compact_context(session_context, llm)
            logger.info("queen_memory: compacted to %d chars", len(session_context))

        existing_semantic = read_semantic_memory()
        today_journal = read_episodic_memory()
        today_str = date.today().strftime("%B %-d, %Y")
        adapt_path = session_dir / "data" / "adapt.md"

        user_msg = (
            f"## Existing Semantic Memory (MEMORY.md)\n\n"
            f"{existing_semantic or '(none yet)'}\n\n"
            f"## Today's Diary So Far ({today_str})\n\n"
            f"{today_journal or '(none yet)'}\n\n"
            f"{session_context}\n\n"
            f"## Session Reference\n\n"
            f"Session ID: {session_id}\n"
            f"Session path: {adapt_path}\n"
        )

        logger.debug(
            "queen_memory: calling LLM (%d chars of context, ~%d tokens est.)",
            len(user_msg),
            len(user_msg) // 4,
        )

        from framework.agents.queen.config import default_config

        semantic_resp, diary_resp = await asyncio.gather(
            llm.acomplete(
                messages=[{"role": "user", "content": user_msg}],
                system=_SEMANTIC_SYSTEM,
                max_tokens=default_config.max_tokens,
            ),
            llm.acomplete(
                messages=[{"role": "user", "content": user_msg}],
                system=_DIARY_SYSTEM,
                max_tokens=default_config.max_tokens,
            ),
        )

        new_semantic = semantic_resp.content.strip()
        diary_entry = diary_resp.content.strip()

        if new_semantic:
            path = semantic_memory_path()
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(new_semantic, encoding="utf-8")
            logger.info("queen_memory: semantic memory updated (%d chars)", len(new_semantic))

        if diary_entry:
            # Rewrite today's episodic file in-place — the LLM has merged and
            # deduplicated the full day's content, so we replace rather than append.
            ep_path = episodic_memory_path()
            ep_path.parent.mkdir(parents=True, exist_ok=True)
            heading = f"# {today_str}"
            ep_path.write_text(f"{heading}\n\n{diary_entry}\n", encoding="utf-8")
            logger.info(
                "queen_memory: episodic diary rewritten for %s (%d chars)",
                today_str,
                len(diary_entry),
            )

    except Exception:
        tb = traceback.format_exc()
        logger.exception("queen_memory: consolidation failed")
        # Write to file so the cause is findable regardless of log verbosity.
        error_path = _queen_dir() / "consolidation_error.txt"
        try:
            error_path.parent.mkdir(parents=True, exist_ok=True)
            error_path.write_text(
                f"session: {session_id}\ntime: {datetime.now().isoformat()}\n\n{tb}",
                encoding="utf-8",
            )
        except Exception:
            pass
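The episodic append logic above can be exercised in isolation. A minimal sketch that reimplements `append_episodic_entry` with an injectable base directory (the `base` parameter and `append_entry` name are illustrative, not framework API; `%d` is used instead of the platform-specific `%-d`):

```python
import tempfile
from datetime import date, datetime
from pathlib import Path

def append_entry(base: Path, content: str) -> Path:
    """Simplified stand-in for append_episodic_entry, writing under *base*."""
    ep_path = base / "memories" / f"MEMORY-{date.today():%Y-%m-%d}.md"
    ep_path.parent.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now().strftime("%H:%M")
    if not ep_path.exists():
        # First entry of the day gets the date heading
        block = f"# {date.today():%B %d, %Y}\n\n### {timestamp}\n\n{content.strip()}\n"
    else:
        block = f"\n\n### {timestamp}\n\n{content.strip()}\n"
    with ep_path.open("a", encoding="utf-8") as f:
        f.write(block)
    return ep_path

base = Path(tempfile.mkdtemp())
p = append_entry(base, "First note.")
append_entry(base, "Second note.")
print(p.read_text(encoding="utf-8"))
```

The date heading is written exactly once; each subsequent call only adds a `### HH:MM` entry, which is why the real function is safe to call from both the diary tool and the consolidation hook.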
+2 -2
@@ -180,7 +180,7 @@ terminal_nodes = []  # Forever-alive
# Module-level vars read by AgentRunner.load()
conversation_mode = "continuous"
identity_prompt = "You are a helpful agent."
-loop_config = {"max_iterations": 100, "max_tool_calls_per_turn": 20, "max_history_tokens": 32000}
+loop_config = {"max_iterations": 100, "max_tool_calls_per_turn": 20, "max_context_tokens": 32000}


class MyAgent:
@@ -559,7 +559,7 @@ if __name__ == "__main__":

## mcp_servers.json

-> **Auto-generated.** `initialize_agent_package` creates this file with hive-tools
+> **Auto-generated.** `initialize_and_build_agent` creates this file with hive-tools
> as the default. Only edit manually to add additional MCP servers.

```json
+1 -1
@@ -226,7 +226,7 @@ Only three valid keys:
loop_config = {
    "max_iterations": 100,          # Max LLM turns per node visit
    "max_tool_calls_per_turn": 20,  # Max tool calls per LLM response
-   "max_history_tokens": 32000,    # Triggers conversation compaction
+   "max_context_tokens": 32000,    # Triggers conversation compaction
}
```
**INVALID keys** (do NOT use): `"strategy"`, `"mode"`, `"timeout"`,
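Invalid keys can be caught before the config ever reaches the runner. A sketch; `ALLOWED_LOOP_KEYS` and `check_loop_config` are illustrative names, not part of the framework:

```python
# The three valid keys, per the documentation above.
ALLOWED_LOOP_KEYS = {"max_iterations", "max_tool_calls_per_turn", "max_context_tokens"}

def check_loop_config(cfg: dict) -> None:
    """Raise if loop_config contains any key outside the valid set."""
    bad = set(cfg) - ALLOWED_LOOP_KEYS
    if bad:
        raise ValueError(f"invalid loop_config keys: {sorted(bad)}")

check_loop_config({"max_iterations": 100, "max_context_tokens": 32000})  # passes silently
try:
    check_loop_config({"max_iterations": 100, "timeout": 30})
except ValueError as exc:
    print(exc)  # → invalid loop_config keys: ['timeout']
```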
@@ -0,0 +1,63 @@
# Queen Memory — File System Structure

```
~/.hive/
├── queen/
│   ├── MEMORY.md                          ← Semantic memory
│   ├── memories/
│   │   ├── MEMORY-2026-03-09.md           ← Episodic memory (today)
│   │   ├── MEMORY-2026-03-08.md
│   │   └── ...
│   └── session/
│       └── {session_id}/                  ← One dir per session (or resumed-from session)
│           ├── conversations/
│           │   ├── parts/
│           │   │   ├── 00001.json         ← One file per message (role, content, tool_calls)
│           │   │   ├── 00002.json
│           │   │   └── ...
│           │   └── spillover/
│           │       ├── conversation_1.md  ← Compacted old conversation segments
│           │       ├── conversation_2.md
│           │       └── ...
│           └── data/
│               ├── adapt.md               ← Working memory (session-scoped)
│               ├── web_search_1.txt       ← Spillover: large tool results
│               ├── web_search_2.txt
│               └── ...
```

---

## The three memory tiers

| File | Tier | Written by | Read at |
|---|---|---|---|
| `MEMORY.md` | Semantic | Consolidation LLM (auto, post-session) | Session start (injected into system prompt) |
| `memories/MEMORY-YYYY-MM-DD.md` | Episodic | Queen via `write_to_diary` tool + consolidation LLM | Session start (today's file injected) |
| `data/adapt.md` | Working | Queen via `update_session_notes` tool | Every turn (inlined in system prompt) |

---

## Session directory naming

The session directory name is **`queen_resume_from`** when a cold-restore resumes an existing
session, otherwise the new **`session_id`**. This means resumed sessions accumulate all messages
in the original directory rather than fragmenting across multiple folders.

---

## Consolidation

`consolidate_queen_memory()` runs every **5 minutes** in the background and once more at session
end. It reads:

1. `conversations/parts/*.json` — full message history (user + assistant turns; tool results skipped)
2. `data/adapt.md` — current working notes

It then makes two LLM writes:

- Rewrites `MEMORY.md` in place (semantic memory — the queen never touches this herself)
- Rewrites today's `memories/MEMORY-YYYY-MM-DD.md` as a single merged diary narrative

If the combined transcript exceeds ~200 K characters it is recursively binary-compacted via the
LLM before being sent to the consolidation model (mirrors `EventLoopNode._llm_compact`).
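The recursive binary compaction described above can be illustrated without an LLM. A synchronous sketch with a stub summariser standing in for the model call (`compact`, `stub`, and the tiny `CHAR_LIMIT` are illustrative; the real implementation is async and uses a ~200 K limit):

```python
CHAR_LIMIT = 50  # tiny limit so the split is exercised
MAX_DEPTH = 8

def compact(text: str, summarise, depth: int = 0) -> str:
    """Halve on a line boundary, summarise each half, recurse if still too big."""
    if len(text) <= CHAR_LIMIT or depth >= MAX_DEPTH:
        return text
    mid = len(text) // 2
    split_at = text.rfind("\n", 0, mid) + 1  # prefer a line boundary
    if split_at <= 0:
        split_at = mid
    halves = (text[:split_at], text[split_at:])
    combined = "\n\n".join(summarise(h) for h in halves)
    if len(combined) > CHAR_LIMIT:
        return compact(combined, summarise, depth + 1)
    return combined

# Stub summariser standing in for the LLM call.
stub = lambda chunk: f"[summary of {len(chunk)} chars]"

long_text = "\n".join(f"message {i}: lots of detail here" for i in range(10))
out = compact(long_text, stub)
print(out)
```

The depth cap plays the same role as `_CTX_COMPACT_MAX_DEPTH`: if the summariser fails to shrink the text, recursion stops rather than looping forever.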
+1 -1
@@ -1,4 +1,4 @@
-"""Test fixtures for Hive Coder agent."""
+"""Test fixtures for Queen agent."""

import sys
from pathlib import Path
@@ -1,7 +0,0 @@
"""Builder interface for analyzing and building agents."""

from framework.builder.query import BuilderQuery

__all__ = [
    "BuilderQuery",
]
@@ -1,501 +0,0 @@
"""
Builder Query Interface - How I (Builder) analyze agent runs.

This is designed around the questions I need to answer:
1. What happened? (summaries, narratives)
2. Why did it fail? (failure analysis, decision traces)
3. What patterns emerge? (across runs, across nodes)
4. What should we change? (suggestions)
"""

from collections import defaultdict
from pathlib import Path
from typing import Any

from framework.schemas.decision import Decision
from framework.schemas.run import Run, RunStatus, RunSummary
from framework.storage.backend import FileStorage


class FailureAnalysis:
    """Structured analysis of why a run failed."""

    def __init__(
        self,
        run_id: str,
        failure_point: str,
        root_cause: str,
        decision_chain: list[str],
        problems: list[str],
        suggestions: list[str],
    ):
        self.run_id = run_id
        self.failure_point = failure_point
        self.root_cause = root_cause
        self.decision_chain = decision_chain
        self.problems = problems
        self.suggestions = suggestions

    def to_dict(self) -> dict[str, Any]:
        return {
            "run_id": self.run_id,
            "failure_point": self.failure_point,
            "root_cause": self.root_cause,
            "decision_chain": self.decision_chain,
            "problems": self.problems,
            "suggestions": self.suggestions,
        }

    def __str__(self) -> str:
        lines = [
            f"=== Failure Analysis for {self.run_id} ===",
            "",
            f"Failure Point: {self.failure_point}",
            f"Root Cause: {self.root_cause}",
            "",
            "Decision Chain Leading to Failure:",
        ]
        for i, dec in enumerate(self.decision_chain, 1):
            lines.append(f"  {i}. {dec}")

        if self.problems:
            lines.append("")
            lines.append("Reported Problems:")
            for prob in self.problems:
                lines.append(f"  - {prob}")

        if self.suggestions:
            lines.append("")
            lines.append("Suggestions:")
            for sug in self.suggestions:
                lines.append(f"  → {sug}")

        return "\n".join(lines)


class PatternAnalysis:
    """Patterns detected across multiple runs."""

    def __init__(
        self,
        goal_id: str,
        run_count: int,
        success_rate: float,
        common_failures: list[tuple[str, int]],
        problematic_nodes: list[tuple[str, float]],
        decision_patterns: dict[str, Any],
    ):
        self.goal_id = goal_id
        self.run_count = run_count
        self.success_rate = success_rate
        self.common_failures = common_failures
        self.problematic_nodes = problematic_nodes
        self.decision_patterns = decision_patterns

    def to_dict(self) -> dict[str, Any]:
        return {
            "goal_id": self.goal_id,
            "run_count": self.run_count,
            "success_rate": self.success_rate,
            "common_failures": self.common_failures,
            "problematic_nodes": self.problematic_nodes,
            "decision_patterns": self.decision_patterns,
        }

    def __str__(self) -> str:
        lines = [
            f"=== Pattern Analysis for Goal {self.goal_id} ===",
            "",
            f"Runs Analyzed: {self.run_count}",
            f"Success Rate: {self.success_rate:.1%}",
        ]

        if self.common_failures:
            lines.append("")
            lines.append("Common Failures:")
            for failure, count in self.common_failures:
                lines.append(f"  - {failure} ({count} occurrences)")

        if self.problematic_nodes:
            lines.append("")
            lines.append("Problematic Nodes (failure rate):")
            for node, rate in self.problematic_nodes:
                lines.append(f"  - {node}: {rate:.1%} failure rate")

        return "\n".join(lines)


class BuilderQuery:
    """
    The interface I (Builder) use to understand what agents are doing.

    This is optimized for the questions I need to answer when analyzing
    agent behavior and deciding what to improve.
    """

    def __init__(self, storage_path: str | Path):
        self.storage = FileStorage(storage_path)

    # === WHAT HAPPENED? ===

    def get_run_summary(self, run_id: str) -> RunSummary | None:
        """Get a quick summary of a run."""
        return self.storage.load_summary(run_id)

    def get_full_run(self, run_id: str) -> Run | None:
        """Get the complete run with all decisions."""
        return self.storage.load_run(run_id)

    def list_runs_for_goal(self, goal_id: str) -> list[RunSummary]:
        """Get summaries of all runs for a goal."""
        run_ids = self.storage.get_runs_by_goal(goal_id)
        summaries = []
        for run_id in run_ids:
            summary = self.storage.load_summary(run_id)
            if summary:
                summaries.append(summary)
        return summaries

    def get_recent_failures(self, limit: int = 10) -> list[RunSummary]:
        """Get recent failed runs."""
        run_ids = self.storage.get_runs_by_status(RunStatus.FAILED)
        summaries = []
        for run_id in run_ids[:limit]:
            summary = self.storage.load_summary(run_id)
            if summary:
                summaries.append(summary)
        return summaries

    # === WHY DID IT FAIL? ===

    def analyze_failure(self, run_id: str) -> FailureAnalysis | None:
        """
        Deep analysis of why a run failed.

        This is my primary tool for understanding what went wrong.
        """
        run = self.storage.load_run(run_id)
        if run is None or run.status != RunStatus.FAILED:
            return None

        # Find the first failed decision
        failed_decisions = [d for d in run.decisions if not d.was_successful]
        if not failed_decisions:
            failure_point = "Unknown - no decision marked as failed"
            root_cause = "Run failed but all decisions succeeded (external cause?)"
        else:
            first_failure = failed_decisions[0]
            failure_point = first_failure.summary_for_builder()
            root_cause = first_failure.outcome.error if first_failure.outcome else "Unknown"

        # Build the decision chain leading to failure
        decision_chain = []
        for d in run.decisions:
            decision_chain.append(d.summary_for_builder())
            if not d.was_successful:
                break

        # Extract problems
        problems = [f"[{p.severity}] {p.description}" for p in run.problems]

        # Generate suggestions based on the failure
        suggestions = self._generate_suggestions(run, failed_decisions)

        return FailureAnalysis(
            run_id=run_id,
            failure_point=failure_point,
            root_cause=root_cause,
            decision_chain=decision_chain,
            problems=problems,
            suggestions=suggestions,
        )

    def get_decision_trace(self, run_id: str) -> list[str]:
        """Get a readable trace of all decisions in a run."""
        run = self.storage.load_run(run_id)
        if run is None:
            return []
        return [d.summary_for_builder() for d in run.decisions]

    # === WHAT PATTERNS EMERGE? ===

    def find_patterns(self, goal_id: str) -> PatternAnalysis | None:
        """
        Find patterns across runs for a goal.

        This helps me understand systemic issues vs one-off failures.
        """
        run_ids = self.storage.get_runs_by_goal(goal_id)
        if not run_ids:
            return None

        runs = []
        for run_id in run_ids:
            run = self.storage.load_run(run_id)
            if run:
                runs.append(run)

        if not runs:
            return None

        # Calculate success rate
        completed = [r for r in runs if r.status == RunStatus.COMPLETED]
        success_rate = len(completed) / len(runs) if runs else 0.0

        # Find common failures
        failure_counts: dict[str, int] = defaultdict(int)
        for run in runs:
            for decision in run.decisions:
                if not decision.was_successful and decision.outcome:
                    error = decision.outcome.error or "Unknown error"
                    failure_counts[error] += 1

        common_failures = sorted(failure_counts.items(), key=lambda x: x[1], reverse=True)[:5]

        # Find problematic nodes
        node_stats: dict[str, dict[str, int]] = defaultdict(lambda: {"total": 0, "failed": 0})
        for run in runs:
            for decision in run.decisions:
                node_stats[decision.node_id]["total"] += 1
                if not decision.was_successful:
                    node_stats[decision.node_id]["failed"] += 1

        problematic_nodes = []
        for node_id, stats in node_stats.items():
            if stats["total"] > 0:
                failure_rate = stats["failed"] / stats["total"]
                if failure_rate > 0.1:  # More than 10% failure rate
                    problematic_nodes.append((node_id, failure_rate))

        problematic_nodes.sort(key=lambda x: x[1], reverse=True)

        # Decision patterns
        decision_patterns = self._analyze_decision_patterns(runs)

        return PatternAnalysis(
            goal_id=goal_id,
            run_count=len(runs),
            success_rate=success_rate,
            common_failures=common_failures,
            problematic_nodes=problematic_nodes,
            decision_patterns=decision_patterns,
        )

    def compare_runs(self, run_id_1: str, run_id_2: str) -> dict[str, Any]:
        """Compare two runs to understand what differed."""
        run1 = self.storage.load_run(run_id_1)
        run2 = self.storage.load_run(run_id_2)

        if run1 is None or run2 is None:
            return {"error": "One or both runs not found"}

        return {
            "run_1": {
                "id": run1.id,
                "status": run1.status.value,
                "decisions": len(run1.decisions),
                "success_rate": run1.metrics.success_rate,
            },
            "run_2": {
                "id": run2.id,
                "status": run2.status.value,
                "decisions": len(run2.decisions),
                "success_rate": run2.metrics.success_rate,
            },
            "differences": self._find_differences(run1, run2),
        }

    # === WHAT SHOULD WE CHANGE? ===

    def suggest_improvements(self, goal_id: str) -> list[dict[str, Any]]:
        """
        Generate improvement suggestions based on run analysis.

        This is what I use to propose changes to the human engineer.
        """
        patterns = self.find_patterns(goal_id)
        if patterns is None:
            return []

        suggestions = []

        # Suggestion: Fix problematic nodes
        for node_id, failure_rate in patterns.problematic_nodes:
            suggestions.append(
                {
                    "type": "node_improvement",
                    "target": node_id,
                    "reason": f"Node has {failure_rate:.1%} failure rate",
                    "recommendation": (
                        f"Review and improve node '{node_id}' - "
                        "high failure rate suggests prompt or tool issues"
                    ),
                    "priority": "high" if failure_rate > 0.3 else "medium",
                }
            )

        # Suggestion: Address common failures
        for failure, count in patterns.common_failures:
            if count >= 2:
                suggestions.append(
                    {
                        "type": "error_handling",
                        "target": failure,
                        "reason": f"Error occurred {count} times",
                        "recommendation": f"Add handling for: {failure}",
                        "priority": "high" if count >= 5 else "medium",
                    }
                )

        # Suggestion: Overall success rate
        if patterns.success_rate < 0.8:
            suggestions.append(
                {
                    "type": "architecture",
                    "target": goal_id,
                    "reason": f"Goal success rate is only {patterns.success_rate:.1%}",
                    "recommendation": (
                        "Consider restructuring the agent graph or improving goal definition"
                    ),
                    "priority": "high",
                }
            )

        return suggestions

    def get_node_performance(self, node_id: str) -> dict[str, Any]:
        """Get performance metrics for a specific node across all runs."""
        run_ids = self.storage.get_runs_by_node(node_id)

        total_decisions = 0
        successful_decisions = 0
        total_latency = 0
        total_tokens = 0
        decision_types: dict[str, int] = defaultdict(int)

        for run_id in run_ids:
            run = self.storage.load_run(run_id)
            if run:
                for decision in run.decisions:
                    if decision.node_id == node_id:
                        total_decisions += 1
                        if decision.was_successful:
                            successful_decisions += 1
                        if decision.outcome:
                            total_latency += decision.outcome.latency_ms
                            total_tokens += decision.outcome.tokens_used
                        decision_types[decision.decision_type.value] += 1

        return {
            "node_id": node_id,
            "total_decisions": total_decisions,
            "success_rate": successful_decisions / total_decisions if total_decisions > 0 else 0,
            "avg_latency_ms": total_latency / total_decisions if total_decisions > 0 else 0,
            "total_tokens": total_tokens,
            "decision_type_distribution": dict(decision_types),
        }

    # === PRIVATE HELPERS ===

    def _generate_suggestions(
        self,
        run: Run,
        failed_decisions: list[Decision],
    ) -> list[str]:
        """Generate suggestions based on failure analysis."""
        suggestions = []

        for decision in failed_decisions:
            # Check if there were alternatives
            if len(decision.options) > 1:
                chosen = decision.chosen_option
                alternatives = [o for o in decision.options if o.id != decision.chosen_option_id]
                if alternatives:
                    alt_desc = alternatives[0].description
                    chosen_desc = chosen.description if chosen else "unknown"
                    suggestions.append(
                        f"Consider alternative: '{alt_desc}' instead of '{chosen_desc}'"
                    )

            # Check for missing context
            if not decision.input_context:
                suggestions.append(
                    f"Decision '{decision.intent}' had no input context - "
                    "ensure relevant data is passed"
                )

            # Check for constraint issues
            if decision.active_constraints:
                constraints = ", ".join(decision.active_constraints)
                suggestions.append(f"Review constraints: {constraints} - may be too restrictive")

        # Check for reported problems with suggestions
        for problem in run.problems:
            if problem.suggested_fix:
                suggestions.append(problem.suggested_fix)

        return suggestions

    def _analyze_decision_patterns(self, runs: list[Run]) -> dict[str, Any]:
        """Analyze decision patterns across runs."""
        type_counts: dict[str, int] = defaultdict(int)
        option_counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

        for run in runs:
            for decision in run.decisions:
                type_counts[decision.decision_type.value] += 1

                # Track which options are chosen for similar intents
                intent_key = decision.intent[:50]  # Truncate for grouping
                if decision.chosen_option:
                    option_counts[intent_key][decision.chosen_option.description] += 1

        # Find most common choices per intent
        common_choices = {}
        for intent, choices in option_counts.items():
            if choices:
                most_common = max(choices.items(), key=lambda x: x[1])
                common_choices[intent] = {
                    "choice": most_common[0],
                    "count": most_common[1],
                    "alternatives": len(choices) - 1,
                }

        return {
            "decision_type_distribution": dict(type_counts),
            "common_choices": common_choices,
        }

    def _find_differences(self, run1: Run, run2: Run) -> list[str]:
        """Find key differences between two runs."""
        differences = []

        # Status difference
        if run1.status != run2.status:
            differences.append(f"Status: {run1.status.value} vs {run2.status.value}")

        # Decision count difference
        if len(run1.decisions) != len(run2.decisions):
            differences.append(f"Decision count: {len(run1.decisions)} vs {len(run2.decisions)}")

        # Find first divergence point
        for i, (d1, d2) in enumerate(zip(run1.decisions, run2.decisions, strict=False)):
            if d1.chosen_option_id != d2.chosen_option_id:
|
||||
differences.append(
|
||||
f"Diverged at decision {i}: "
|
||||
f"chose '{d1.chosen_option_id}' vs '{d2.chosen_option_id}'"
|
||||
)
|
||||
break
|
||||
|
||||
# Node differences
|
||||
nodes1 = set(run1.metrics.nodes_executed)
|
||||
nodes2 = set(run2.metrics.nodes_executed)
|
||||
if nodes1 != nodes2:
|
||||
only_1 = nodes1 - nodes2
|
||||
only_2 = nodes2 - nodes1
|
||||
if only_1:
|
||||
differences.append(f"Nodes only in run 1: {only_1}")
|
||||
if only_2:
|
||||
differences.append(f"Nodes only in run 2: {only_2}")
|
||||
|
||||
return differences
|
||||
@@ -56,6 +56,14 @@ def get_max_tokens() -> int:
    return get_hive_config().get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)


+DEFAULT_MAX_CONTEXT_TOKENS = 32_000
+
+
+def get_max_context_tokens() -> int:
+    """Return the configured max_context_tokens, falling back to DEFAULT_MAX_CONTEXT_TOKENS."""
+    return get_hive_config().get("llm", {}).get("max_context_tokens", DEFAULT_MAX_CONTEXT_TOKENS)
+
+
def get_api_key() -> str | None:
    """Return the API key, supporting env var, Claude Code subscription, Codex, and ZAI Code.

@@ -90,6 +98,17 @@ def get_api_key() -> str | None:
        except ImportError:
            pass

+    # Kimi Code subscription: read API key from ~/.kimi/config.toml
+    if llm.get("use_kimi_code_subscription"):
+        try:
+            from framework.runner.runner import get_kimi_code_token
+
+            token = get_kimi_code_token()
+            if token:
+                return token
+        except ImportError:
+            pass
+
    # Standard env-var path (covers ZAI Code and all API-key providers)
    api_key_env_var = llm.get("api_key_env_var")
    if api_key_env_var:
@@ -108,6 +127,9 @@ def get_api_base() -> str | None:
    if llm.get("use_codex_subscription"):
        # Codex subscription routes through the ChatGPT backend, not api.openai.com.
        return "https://chatgpt.com/backend-api/codex"
+    if llm.get("use_kimi_code_subscription"):
+        # Kimi Code uses an Anthropic-compatible endpoint (no /v1 suffix).
+        return "https://api.kimi.com/coding"
    return llm.get("api_base")


@@ -164,6 +186,7 @@ class RuntimeConfig:
    model: str = field(default_factory=get_preferred_model)
    temperature: float = 0.7
    max_tokens: int = field(default_factory=get_max_tokens)
+    max_context_tokens: int = field(default_factory=get_max_context_tokens)
    api_key: str | None = field(default_factory=get_api_key)
    api_base: str | None = field(default_factory=get_api_base)
    extra_kwargs: dict[str, Any] = field(default_factory=get_llm_extra_kwargs)
@@ -6,7 +6,7 @@ This module provides secure credential storage with:
- Template-based usage: {{cred.key}} patterns for injection
- Bipartisan model: Store stores values, tools define usage
- Provider system: Extensible lifecycle management (refresh, validate)
-- Multiple backends: Encrypted files, env vars, HashiCorp Vault
+- Multiple backends: Encrypted files, env vars

Quick Start:
    from core.framework.credentials import CredentialStore, CredentialObject
@@ -38,8 +38,6 @@ For Aden server sync:
        AdenSyncProvider,
    )

-For Vault integration:
-    from core.framework.credentials.vault import HashiCorpVaultStorage
"""

from .key_storage import (
@@ -149,8 +149,14 @@ def delete_aden_api_key() -> None:

        storage = EncryptedFileStorage()
        storage.delete(ADEN_CREDENTIAL_ID)
+    except (FileNotFoundError, PermissionError) as e:
+        logger.debug("Could not delete %s from encrypted store: %s", ADEN_CREDENTIAL_ID, e)
    except Exception:
-        logger.debug("Could not delete %s from encrypted store", ADEN_CREDENTIAL_ID)
+        logger.warning(
+            "Unexpected error deleting %s from encrypted store",
+            ADEN_CREDENTIAL_ID,
+            exc_info=True,
+        )

    os.environ.pop(ADEN_ENV_VAR, None)

@@ -167,8 +173,10 @@ def _read_credential_key_file() -> str | None:
        value = CREDENTIAL_KEY_PATH.read_text(encoding="utf-8").strip()
        if value:
            return value
+    except (FileNotFoundError, PermissionError) as e:
+        logger.debug("Could not read %s: %s", CREDENTIAL_KEY_PATH, e)
    except Exception:
-        logger.debug("Could not read %s", CREDENTIAL_KEY_PATH)
+        logger.warning("Unexpected error reading %s", CREDENTIAL_KEY_PATH, exc_info=True)
    return None


@@ -196,6 +204,12 @@ def _read_aden_from_encrypted_store() -> str | None:
        cred = storage.load(ADEN_CREDENTIAL_ID)
        if cred:
            return cred.get_key("api_key")
+    except (FileNotFoundError, PermissionError, KeyError) as e:
+        logger.debug("Could not load %s from encrypted store: %s", ADEN_CREDENTIAL_ID, e)
    except Exception:
-        logger.debug("Could not load %s from encrypted store", ADEN_CREDENTIAL_ID)
+        logger.warning(
+            "Unexpected error loading %s from encrypted store",
+            ADEN_CREDENTIAL_ID,
+            exc_info=True,
+        )
    return None
@@ -1,55 +0,0 @@
-"""
-HashiCorp Vault integration for the credential store.
-
-This module provides enterprise-grade secret management through
-HashiCorp Vault integration.
-
-Quick Start:
-    from core.framework.credentials import CredentialStore
-    from core.framework.credentials.vault import HashiCorpVaultStorage
-
-    # Configure Vault storage
-    storage = HashiCorpVaultStorage(
-        url="https://vault.example.com:8200",
-        # token read from VAULT_TOKEN env var
-        mount_point="secret",
-        path_prefix="hive/agents/prod"
-    )
-
-    # Create credential store with Vault backend
-    store = CredentialStore(storage=storage)
-
-    # Use normally - credentials are stored in Vault
-    credential = store.get_credential("my_api")
-
-Requirements:
-    pip install hvac
-
-Authentication:
-    Set the VAULT_TOKEN environment variable or pass the token directly:
-
-    export VAULT_TOKEN="hvs.xxxxxxxxxxxxx"
-
-    For production, consider using Vault auth methods:
-    - Kubernetes auth
-    - AppRole auth
-    - AWS IAM auth
-
-Vault Configuration:
-    Ensure KV v2 secrets engine is enabled:
-
-    vault secrets enable -path=secret kv-v2
-
-    Grant appropriate policies:
-
-    path "secret/data/hive/credentials/*" {
-        capabilities = ["create", "read", "update", "delete", "list"]
-    }
-    path "secret/metadata/hive/credentials/*" {
-        capabilities = ["list", "delete"]
-    }
-"""
-
-from .hashicorp import HashiCorpVaultStorage
-
-__all__ = ["HashiCorpVaultStorage"]
@@ -1,394 +0,0 @@
-"""
-HashiCorp Vault storage adapter.
-
-Provides integration with HashiCorp Vault for enterprise secret management.
-Requires the 'hvac' package: uv pip install hvac
-"""
-
-from __future__ import annotations
-
-import logging
-import os
-from datetime import datetime
-from typing import Any
-
-from pydantic import SecretStr
-
-from ..models import CredentialKey, CredentialObject, CredentialType
-from ..storage import CredentialStorage
-
-logger = logging.getLogger(__name__)
-
-
-class HashiCorpVaultStorage(CredentialStorage):
-    """
-    HashiCorp Vault storage adapter.
-
-    Features:
-    - KV v2 secrets engine support
-    - Namespace support (Enterprise)
-    - Automatic secret versioning
-    - Audit logging via Vault
-
-    The adapter stores credentials in Vault's KV v2 secrets engine with
-    the following structure:
-
-        {mount_point}/data/{path_prefix}/{credential_id}
-        └── data:
-            ├── _type: "oauth2"
-            ├── access_token: "xxx"
-            ├── refresh_token: "yyy"
-            ├── _expires_access_token: "2024-01-26T12:00:00"
-            └── _provider_id: "oauth2"
-
-    Example:
-        storage = HashiCorpVaultStorage(
-            url="https://vault.example.com:8200",
-            token="hvs.xxx",  # Or use VAULT_TOKEN env var
-            mount_point="secret",
-            path_prefix="hive/credentials"
-        )
-
-        store = CredentialStore(storage=storage)
-
-        # Credentials are now stored in Vault
-        store.save_credential(credential)
-        credential = store.get_credential("my_api")
-
-    Authentication:
-        The adapter uses token-based authentication. The token can be provided:
-        1. Directly via the 'token' parameter
-        2. Via the VAULT_TOKEN environment variable
-
-        For production, consider using:
-        - Kubernetes auth method
-        - AppRole auth method
-        - AWS IAM auth method
-
-    Requirements:
-        uv pip install hvac
-    """
-
-    def __init__(
-        self,
-        url: str,
-        token: str | None = None,
-        mount_point: str = "secret",
-        path_prefix: str = "hive/credentials",
-        namespace: str | None = None,
-        verify_ssl: bool = True,
-    ):
-        """
-        Initialize Vault storage.
-
-        Args:
-            url: Vault server URL (e.g., https://vault.example.com:8200)
-            token: Vault token. If None, reads from VAULT_TOKEN env var
-            mount_point: KV secrets engine mount point (default: "secret")
-            path_prefix: Path prefix for all credentials
-            namespace: Vault namespace (Enterprise feature)
-            verify_ssl: Whether to verify SSL certificates
-
-        Raises:
-            ImportError: If hvac is not installed
-            ValueError: If authentication fails
-        """
-        try:
-            import hvac
-        except ImportError as e:
-            raise ImportError(
-                "HashiCorp Vault support requires 'hvac'. Install with: uv pip install hvac"
-            ) from e
-
-        self._url = url
-        self._token = token or os.environ.get("VAULT_TOKEN")
-        self._mount = mount_point
-        self._prefix = path_prefix
-        self._namespace = namespace
-
-        if not self._token:
-            raise ValueError(
-                "Vault token required. Set VAULT_TOKEN env var or pass token parameter."
-            )
-
-        self._client = hvac.Client(
-            url=url,
-            token=self._token,
-            namespace=namespace,
-            verify=verify_ssl,
-        )
-
-        if not self._client.is_authenticated():
-            raise ValueError("Vault authentication failed. Check token and server URL.")
-
-        logger.info(f"Connected to HashiCorp Vault at {url}")
-
-    def _path(self, credential_id: str) -> str:
-        """Build Vault path for credential."""
-        # Sanitize credential_id
-        safe_id = credential_id.replace("/", "_").replace("\\", "_")
-        return f"{self._prefix}/{safe_id}"
-
-    def save(self, credential: CredentialObject) -> None:
-        """Save credential to Vault KV v2."""
-        path = self._path(credential.id)
-        data = self._serialize_for_vault(credential)
-
-        try:
-            self._client.secrets.kv.v2.create_or_update_secret(
-                path=path,
-                secret=data,
-                mount_point=self._mount,
-            )
-            logger.debug(f"Saved credential '{credential.id}' to Vault at {path}")
-        except Exception as e:
-            logger.error(f"Failed to save credential '{credential.id}' to Vault: {e}")
-            raise
-
-    def load(self, credential_id: str) -> CredentialObject | None:
-        """Load credential from Vault."""
-        path = self._path(credential_id)
-
-        try:
-            response = self._client.secrets.kv.v2.read_secret_version(
-                path=path,
-                mount_point=self._mount,
-            )
-            data = response["data"]["data"]
-            return self._deserialize_from_vault(credential_id, data)
-        except Exception as e:
-            # Check if it's a "not found" error
-            error_str = str(e).lower()
-            if "not found" in error_str or "404" in error_str:
-                logger.debug(f"Credential '{credential_id}' not found in Vault")
-                return None
-            logger.error(f"Failed to load credential '{credential_id}' from Vault: {e}")
-            raise
-
-    def delete(self, credential_id: str) -> bool:
-        """Delete credential from Vault (all versions)."""
-        path = self._path(credential_id)
-
-        try:
-            self._client.secrets.kv.v2.delete_metadata_and_all_versions(
-                path=path,
-                mount_point=self._mount,
-            )
-            logger.debug(f"Deleted credential '{credential_id}' from Vault")
-            return True
-        except Exception as e:
-            error_str = str(e).lower()
-            if "not found" in error_str or "404" in error_str:
-                return False
-            logger.error(f"Failed to delete credential '{credential_id}' from Vault: {e}")
-            raise
-
-    def list_all(self) -> list[str]:
-        """List all credentials under the prefix."""
-        try:
-            response = self._client.secrets.kv.v2.list_secrets(
-                path=self._prefix,
-                mount_point=self._mount,
-            )
-            keys = response.get("data", {}).get("keys", [])
-            # Remove trailing slashes from folder names
-            return [k.rstrip("/") for k in keys]
-        except Exception as e:
-            error_str = str(e).lower()
-            if "not found" in error_str or "404" in error_str:
-                return []
-            logger.error(f"Failed to list credentials from Vault: {e}")
-            raise
-
-    def exists(self, credential_id: str) -> bool:
-        """Check if credential exists in Vault."""
-        try:
-            path = self._path(credential_id)
-            self._client.secrets.kv.v2.read_secret_version(
-                path=path,
-                mount_point=self._mount,
-            )
-            return True
-        except Exception:
-            return False
-
-    def _serialize_for_vault(self, credential: CredentialObject) -> dict[str, Any]:
-        """Convert credential to Vault secret format."""
-        data: dict[str, Any] = {
-            "_type": credential.credential_type.value,
-        }
-
-        if credential.provider_id:
-            data["_provider_id"] = credential.provider_id
-
-        if credential.description:
-            data["_description"] = credential.description
-
-        if credential.auto_refresh:
-            data["_auto_refresh"] = "true"
-
-        # Store each key
-        for key_name, key in credential.keys.items():
-            data[key_name] = key.get_secret_value()
-
-            if key.expires_at:
-                data[f"_expires_{key_name}"] = key.expires_at.isoformat()
-
-            if key.metadata:
-                data[f"_metadata_{key_name}"] = str(key.metadata)
-
-        return data
-
-    def _deserialize_from_vault(self, credential_id: str, data: dict[str, Any]) -> CredentialObject:
-        """Reconstruct credential from Vault secret."""
-        # Extract metadata fields
-        cred_type = CredentialType(data.pop("_type", "api_key"))
-        provider_id = data.pop("_provider_id", None)
-        description = data.pop("_description", "")
-        auto_refresh = data.pop("_auto_refresh", "") == "true"
-
-        # Build keys dict
-        keys: dict[str, CredentialKey] = {}
-
-        # Find all non-metadata keys
-        key_names = [k for k in data.keys() if not k.startswith("_")]
-
-        for key_name in key_names:
-            value = data[key_name]
-
-            # Check for expiration
-            expires_at = None
-            expires_key = f"_expires_{key_name}"
-            if expires_key in data:
-                try:
-                    expires_at = datetime.fromisoformat(data[expires_key])
-                except (ValueError, TypeError):
-                    pass
-
-            # Check for metadata
-            metadata: dict[str, Any] = {}
-            metadata_key = f"_metadata_{key_name}"
-            if metadata_key in data:
-                try:
-                    import ast
-
-                    metadata = ast.literal_eval(data[metadata_key])
-                except (ValueError, SyntaxError):
-                    pass
-
-            keys[key_name] = CredentialKey(
-                name=key_name,
-                value=SecretStr(value),
-                expires_at=expires_at,
-                metadata=metadata,
-            )
-
-        return CredentialObject(
-            id=credential_id,
-            credential_type=cred_type,
-            keys=keys,
-            provider_id=provider_id,
-            description=description,
-            auto_refresh=auto_refresh,
-        )
-
-    # --- Vault-Specific Operations ---
-
-    def get_secret_metadata(self, credential_id: str) -> dict[str, Any] | None:
-        """
-        Get Vault metadata for a secret (version info, timestamps, etc.).
-
-        Args:
-            credential_id: The credential identifier
-
-        Returns:
-            Metadata dict or None if not found
-        """
-        path = self._path(credential_id)
-
-        try:
-            response = self._client.secrets.kv.v2.read_secret_metadata(
-                path=path,
-                mount_point=self._mount,
-            )
-            return response.get("data", {})
-        except Exception:
-            return None
-
-    def soft_delete(self, credential_id: str, versions: list[int] | None = None) -> bool:
-        """
-        Soft delete specific versions (can be recovered).
-
-        Args:
-            credential_id: The credential identifier
-            versions: Version numbers to delete. If None, deletes latest.
-
-        Returns:
-            True if successful
-        """
-        path = self._path(credential_id)
-
-        try:
-            if versions:
-                self._client.secrets.kv.v2.delete_secret_versions(
-                    path=path,
-                    versions=versions,
-                    mount_point=self._mount,
-                )
-            else:
-                self._client.secrets.kv.v2.delete_latest_version_of_secret(
-                    path=path,
-                    mount_point=self._mount,
-                )
-            return True
-        except Exception as e:
-            logger.error(f"Soft delete failed for '{credential_id}': {e}")
-            return False
-
-    def undelete(self, credential_id: str, versions: list[int]) -> bool:
-        """
-        Recover soft-deleted versions.
-
-        Args:
-            credential_id: The credential identifier
-            versions: Version numbers to recover
-
-        Returns:
-            True if successful
-        """
-        path = self._path(credential_id)
-
-        try:
-            self._client.secrets.kv.v2.undelete_secret_versions(
-                path=path,
-                versions=versions,
-                mount_point=self._mount,
-            )
-            return True
-        except Exception as e:
-            logger.error(f"Undelete failed for '{credential_id}': {e}")
-            return False
-
-    def load_version(self, credential_id: str, version: int) -> CredentialObject | None:
-        """
-        Load a specific version of a credential.
-
-        Args:
-            credential_id: The credential identifier
-            version: Version number to load
-
-        Returns:
-            CredentialObject or None
-        """
-        path = self._path(credential_id)
-
-        try:
-            response = self._client.secrets.kv.v2.read_secret_version(
-                path=path,
-                version=version,
-                mount_point=self._mount,
-            )
-            data = response["data"]["data"]
-            return self._deserialize_from_vault(credential_id, data)
-        except Exception:
-            return None
@@ -307,13 +307,13 @@ class NodeConversation:
|
||||
def __init__(
|
||||
self,
|
||||
system_prompt: str = "",
|
||||
max_history_tokens: int = 32000,
|
||||
max_context_tokens: int = 32000,
|
||||
compaction_threshold: float = 0.8,
|
||||
output_keys: list[str] | None = None,
|
||||
store: ConversationStore | None = None,
|
||||
) -> None:
|
||||
self._system_prompt = system_prompt
|
||||
self._max_history_tokens = max_history_tokens
|
||||
self._max_context_tokens = max_context_tokens
|
||||
self._compaction_threshold = compaction_threshold
|
||||
self._output_keys = output_keys
|
||||
self._store = store
|
||||
@@ -525,16 +525,16 @@ class NodeConversation:
|
||||
self._last_api_input_tokens = actual_input_tokens
|
||||
|
||||
def usage_ratio(self) -> float:
|
||||
"""Current token usage as a fraction of *max_history_tokens*.
|
||||
"""Current token usage as a fraction of *max_context_tokens*.
|
||||
|
||||
Returns 0.0 when ``max_history_tokens`` is zero (unlimited).
|
||||
Returns 0.0 when ``max_context_tokens`` is zero (unlimited).
|
||||
"""
|
||||
if self._max_history_tokens <= 0:
|
||||
if self._max_context_tokens <= 0:
|
||||
return 0.0
|
||||
return self.estimate_tokens() / self._max_history_tokens
|
||||
return self.estimate_tokens() / self._max_context_tokens
|
||||
|
||||
def needs_compaction(self) -> bool:
|
||||
return self.estimate_tokens() >= self._max_history_tokens * self._compaction_threshold
|
||||
return self.estimate_tokens() >= self._max_context_tokens * self._compaction_threshold
|
||||
|
||||
# --- Output-key extraction ---------------------------------------------
|
||||
|
||||
@@ -1029,7 +1029,7 @@ class NodeConversation:
|
||||
await self._store.write_meta(
|
||||
{
|
||||
"system_prompt": self._system_prompt,
|
||||
"max_history_tokens": self._max_history_tokens,
|
||||
"max_context_tokens": self._max_context_tokens,
|
||||
"compaction_threshold": self._compaction_threshold,
|
||||
"output_keys": self._output_keys,
|
||||
}
|
||||
@@ -1062,7 +1062,7 @@ class NodeConversation:
|
||||
|
||||
conv = cls(
|
||||
system_prompt=meta.get("system_prompt", ""),
|
||||
max_history_tokens=meta.get("max_history_tokens", 32000),
|
||||
max_context_tokens=meta.get("max_context_tokens", 32000),
|
||||
compaction_threshold=meta.get("compaction_threshold", 0.8),
|
||||
output_keys=meta.get("output_keys"),
|
||||
store=store,
|
||||
|
||||
@@ -37,7 +37,7 @@ async def evaluate_phase_completion(
|
||||
phase_description: str,
|
||||
success_criteria: str,
|
||||
accumulator_state: dict[str, Any],
|
||||
max_history_tokens: int = 8_196,
|
||||
max_context_tokens: int = 8_196,
|
||||
) -> PhaseVerdict:
|
||||
"""Level 2 judge: read the conversation and evaluate quality.
|
||||
|
||||
@@ -50,7 +50,7 @@ async def evaluate_phase_completion(
|
||||
phase_description: Description of the phase
|
||||
success_criteria: Natural-language criteria for phase completion
|
||||
accumulator_state: Current output key values
|
||||
max_history_tokens: Main conversation token budget (judge gets 20%)
|
||||
max_context_tokens: Main conversation token budget (judge gets 20%)
|
||||
|
||||
Returns:
|
||||
PhaseVerdict with action and optional feedback
|
||||
@@ -89,7 +89,7 @@ FEEDBACK: (reason if RETRY, empty if ACCEPT)"""
|
||||
response = await llm.acomplete(
|
||||
messages=[{"role": "user", "content": user_prompt}],
|
||||
system=system_prompt,
|
||||
max_tokens=max(1024, max_history_tokens // 5),
|
||||
max_tokens=max(1024, max_context_tokens // 5),
|
||||
max_retries=1,
|
||||
)
|
||||
if not response.content or not response.content.strip():
|
||||
|
||||
@@ -73,6 +73,7 @@ class _EscalationReceiver:
|
||||
def __init__(self) -> None:
|
||||
self._event = asyncio.Event()
|
||||
self._response: str | None = None
|
||||
self._awaiting_input = True # So inject_worker_message() can prefer us
|
||||
|
||||
async def inject_event(self, content: str, *, is_client_input: bool = False) -> None:
|
||||
"""Called by ExecutionStream.inject_input() when the user responds."""
|
||||
@@ -101,7 +102,10 @@ class JudgeVerdict:
|
||||
"""Result of judge evaluation for the event loop."""
|
||||
|
||||
action: Literal["ACCEPT", "RETRY", "ESCALATE"]
|
||||
feedback: str = ""
|
||||
# None = no evaluation happened (skip_judge, tool-continue); not logged.
|
||||
# "" = evaluated but no feedback; logged with default text.
|
||||
# "..." = evaluated with feedback; logged as-is.
|
||||
feedback: str | None = None
|
||||
|
||||
|
||||
@runtime_checkable
|
||||
@@ -131,7 +135,7 @@ class SubagentJudge:
|
||||
async def evaluate(self, context: dict[str, Any]) -> JudgeVerdict:
|
||||
missing = context.get("missing_keys", [])
|
||||
if not missing:
|
||||
return JudgeVerdict(action="ACCEPT")
|
||||
return JudgeVerdict(action="ACCEPT", feedback="")
|
||||
|
||||
iteration = context.get("iteration", 0)
|
||||
remaining = self._max_iterations - iteration - 1
|
||||
@@ -165,8 +169,8 @@ class LoopConfig:
|
||||
max_tool_calls_per_turn: int = 30
|
||||
judge_every_n_turns: int = 1
|
||||
stall_detection_threshold: int = 3
|
||||
stall_similarity_threshold: float = 0.7
|
||||
max_history_tokens: int = 32_000
|
||||
stall_similarity_threshold: float = 0.85
|
||||
max_context_tokens: int = 32_000
|
||||
store_prefix: str = ""
|
||||
|
||||
# Overflow margin for max_tool_calls_per_turn. Tool calls are only
|
||||
@@ -347,6 +351,7 @@ class EventLoopNode(NodeProtocol):
|
||||
self._awaiting_input = False
|
||||
self._shutdown = False
|
||||
self._stream_task: asyncio.Task | None = None
|
||||
self._tool_task: asyncio.Task | None = None # gather task while tools run
|
||||
# Track which nodes already have an action plan emitted (skip on revisit)
|
||||
self._action_plan_emitted: set[str] = set()
|
||||
# Monotonic counter for spillover file naming (web_search_1.txt, etc.)
|
||||
@@ -477,28 +482,37 @@ class EventLoopNode(NodeProtocol):
|
||||
# If it doesn't exist yet, seed it with available context.
|
||||
if self._config.spillover_dir:
|
||||
_adapt_path = Path(self._config.spillover_dir) / "adapt.md"
|
||||
if not _adapt_path.exists() and ctx.accounts_prompt:
|
||||
if not _adapt_path.exists():
|
||||
_adapt_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
_adapt_path.write_text(
|
||||
f"## Identity\n{ctx.accounts_prompt}\n",
|
||||
encoding="utf-8",
|
||||
seed = (
|
||||
f"## Identity\n{ctx.accounts_prompt}\n"
|
||||
if ctx.accounts_prompt
|
||||
else "# Session Working Memory\n"
|
||||
)
|
||||
_adapt_path.write_text(seed, encoding="utf-8")
|
||||
if _adapt_path.exists():
|
||||
_adapt_text = _adapt_path.read_text(encoding="utf-8").strip()
|
||||
if _adapt_text:
|
||||
system_prompt = (
|
||||
f"{system_prompt}\n\n"
|
||||
f"--- Your Memory ---\n{_adapt_text}\n--- End Memory ---\n\n"
|
||||
'Maintain your memory by calling save_data("adapt.md", ...) '
|
||||
'or edit_data("adapt.md", ...) as you work.\n'
|
||||
"IMMEDIATELY save: user rules about which account/identity to use, "
|
||||
"behavioral constraints, and preferences. "
|
||||
"Also record session history, decisions, and working notes."
|
||||
"--- Session Working Memory ---\n"
|
||||
f"{_adapt_text}\n"
|
||||
"--- End Session Working Memory ---\n\n"
|
||||
"Maintain your session working memory by calling "
|
||||
'save_data("adapt.md", ...) or edit_data("adapt.md", ...)'
|
||||
" as you work.\n"
|
||||
"This is session-scoped scratch space. "
|
||||
"IMMEDIATELY save: account/identity rules, "
|
||||
"behavioral constraints, and preferences specific to "
|
||||
"this session. Also record current task state, "
|
||||
"decisions, and working notes. "
|
||||
"For lasting knowledge about the user, use "
|
||||
"update_queen_memory() and append_queen_journal() instead."
|
||||
)
|
||||
|
||||
conversation = NodeConversation(
|
||||
system_prompt=system_prompt,
|
||||
max_history_tokens=self._config.max_history_tokens,
|
||||
max_context_tokens=self._config.max_context_tokens,
|
||||
output_keys=ctx.node_spec.output_keys or None,
|
||||
store=self._conversation_store,
|
||||
)
|
||||
@@ -535,6 +549,8 @@ class EventLoopNode(NodeProtocol):
|
||||
tools.append(set_output_tool)
|
||||
if ctx.node_spec.client_facing and not ctx.event_triggered:
|
||||
tools.append(self._build_ask_user_tool())
|
||||
if stream_id == "queen":
|
||||
tools.append(self._build_ask_user_multiple_tool())
|
||||
# Workers/subagents can escalate blockers to the queen.
|
||||
if stream_id not in ("queen", "judge"):
|
||||
tools.append(self._build_escalate_tool())
|
||||
@@ -621,6 +637,7 @@ class EventLoopNode(NodeProtocol):
|
||||
_synthetic_names = {
|
||||
"set_output",
|
||||
"ask_user",
|
||||
"ask_user_multiple",
|
||||
"escalate",
|
||||
"delegate_to_sub_agent",
|
||||
"report_to_parent",
|
||||
@@ -671,6 +688,7 @@ class EventLoopNode(NodeProtocol):
|
||||
queen_input_requested,
|
||||
request_system_prompt,
|
||||
request_messages,
|
||||
reported_to_parent,
|
||||
) = await self._run_single_turn(
|
||||
ctx, conversation, tools, iteration, accumulator
|
||||
)
|
||||
@@ -697,6 +715,7 @@ class EventLoopNode(NodeProtocol):
|
||||
model=turn_tokens.get("model", ""),
|
||||
input_tokens=turn_tokens.get("input", 0),
|
||||
output_tokens=turn_tokens.get("output", 0),
|
||||
cached_tokens=turn_tokens.get("cached", 0),
|
||||
execution_id=execution_id,
|
||||
iteration=iteration,
|
||||
)
|
||||
@@ -872,6 +891,7 @@ class EventLoopNode(NodeProtocol):
|
||||
and not outputs_set
|
||||
and not user_input_requested
|
||||
and not queen_input_requested
|
||||
and not reported_to_parent
|
||||
)
|
||||
if truly_empty and accumulator is not None:
|
||||
missing = self._get_missing_output_keys(
|
||||
@@ -1042,7 +1062,9 @@ class EventLoopNode(NodeProtocol):
|
||||
mcp_tool_calls = [
|
||||
tc
|
||||
for tc in logged_tool_calls
|
||||
if tc.get("tool_name") not in ("set_output", "ask_user", "escalate")
|
||||
if tc.get("tool_name") not in (
|
||||
"set_output", "ask_user", "ask_user_multiple", "escalate",
|
||||
)
|
||||
]
|
||||
if mcp_tool_calls:
|
||||
fps = self._fingerprint_tool_calls(mcp_tool_calls)
|
||||
@@ -1236,8 +1258,12 @@ class EventLoopNode(NodeProtocol):
|
||||
iteration,
|
||||
_cf_auto,
|
||||
)
|
||||
# Check for multi-question batch from ask_user_multiple
|
||||
multi_qs = getattr(self, "_pending_multi_questions", None)
|
||||
self._pending_multi_questions = None
|
||||
got_input = await self._await_user_input(
|
||||
ctx, prompt=_cf_prompt, options=ask_user_options
|
||||
ctx, prompt=_cf_prompt, options=ask_user_options,
|
||||
questions=multi_qs,
|
||||
)
|
||||
logger.info("[%s] iter=%d: unblocked, got_input=%s", node_id, iteration, got_input)
|
||||
if not got_input:
|
||||
@@ -1322,8 +1348,8 @@ class EventLoopNode(NodeProtocol):
             # Auto-block beyond grace -- fall through to judge (6i)

             # 6h''. Worker wait for queen guidance
-            # If a worker escalates with wait_for_response=true, pause here and
-            # skip judge evaluation until queen injects guidance.
+            # When a worker escalates, pause here and skip judge evaluation
+            # until the queen injects guidance.
             if queen_input_requested:
                 if self._shutdown:
                     await self._publish_loop_completed(
@@ -1465,7 +1491,7 @@ class EventLoopNode(NodeProtocol):
                 continue

             # Judge evaluation (should_judge is always True here)
-            verdict = await self._evaluate(
+            verdict = await self._judge_turn(
                 ctx,
                 conversation,
                 accumulator,
@@ -1544,7 +1570,7 @@ class EventLoopNode(NodeProtocol):
                     node_type="event_loop",
                     step_index=iteration,
                     verdict="ACCEPT",
-                    verdict_feedback=verdict.feedback,
+                    verdict_feedback=verdict.feedback or "",
                     tool_calls=logged_tool_calls,
                     llm_text=assistant_text,
                     input_tokens=turn_tokens.get("input", 0),
@@ -1587,7 +1613,7 @@ class EventLoopNode(NodeProtocol):
                     node_type="event_loop",
                     step_index=iteration,
                     verdict="ESCALATE",
-                    verdict_feedback=verdict.feedback,
+                    verdict_feedback=verdict.feedback or "",
                     tool_calls=logged_tool_calls,
                     llm_text=assistant_text,
                     input_tokens=turn_tokens.get("input", 0),
@@ -1599,7 +1625,7 @@ class EventLoopNode(NodeProtocol):
                     node_name=ctx.node_spec.name,
                     node_type="event_loop",
                     success=False,
-                    error=f"Judge escalated: {verdict.feedback}",
+                    error=f"Judge escalated: {verdict.feedback or 'no feedback'}",
                     total_steps=iteration + 1,
                     tokens_used=total_input_tokens + total_output_tokens,
                     input_tokens=total_input_tokens,
@@ -1613,7 +1639,7 @@ class EventLoopNode(NodeProtocol):
                 )
                 return NodeResult(
                     success=False,
-                    error=f"Judge escalated: {verdict.feedback}",
+                    error=f"Judge escalated: {verdict.feedback or 'no feedback'}",
                     output=accumulator.to_dict(),
                     tokens_used=total_input_tokens + total_output_tokens,
                     latency_ms=latency_ms,
@@ -1629,15 +1655,16 @@ class EventLoopNode(NodeProtocol):
                 node_type="event_loop",
                 step_index=iteration,
                 verdict="RETRY",
-                verdict_feedback=verdict.feedback,
+                verdict_feedback=verdict.feedback or "",
                 tool_calls=logged_tool_calls,
                 llm_text=assistant_text,
                 input_tokens=turn_tokens.get("input", 0),
                 output_tokens=turn_tokens.get("output", 0),
                 latency_ms=iter_latency_ms,
             )
-            if verdict.feedback:
-                await conversation.add_user_message(f"[Judge feedback]: {verdict.feedback}")
+            if verdict.feedback is not None:
+                fb = verdict.feedback or "[Judge returned RETRY without feedback]"
+                await conversation.add_user_message(f"[Judge feedback]: {fb}")
             continue

         # 7. Max iterations exhausted
@@ -1702,14 +1729,16 @@ class EventLoopNode(NodeProtocol):
         self._input_ready.set()

     def cancel_current_turn(self) -> None:
-        """Cancel the current LLM streaming turn instantly.
+        """Cancel the current LLM streaming turn or in-progress tool calls instantly.

         Unlike signal_shutdown() which permanently stops the event loop,
-        this only kills the in-progress HTTP stream via task.cancel().
+        this only kills the in-progress HTTP stream or tool gather task.
         The queen stays alive for the next user message.
         """
         if self._stream_task and not self._stream_task.done():
             self._stream_task.cancel()
+        if self._tool_task and not self._tool_task.done():
+            self._tool_task.cancel()

     async def _await_user_input(
         self,
@@ -1717,6 +1746,7 @@ class EventLoopNode(NodeProtocol):
         prompt: str = "",
         *,
         options: list[str] | None = None,
+        questions: list[dict] | None = None,
         emit_client_request: bool = True,
     ) -> bool:
         """Block until user input arrives or shutdown is signaled.
@@ -1731,6 +1761,8 @@ class EventLoopNode(NodeProtocol):
             options: Optional predefined choices for the user (from ask_user).
                 Passed through to the CLIENT_INPUT_REQUESTED event so the
                 frontend can render a QuestionWidget with buttons.
+            questions: Optional list of question dicts for ask_user_multiple.
+                Each dict has id, prompt, and optional options.
             emit_client_request: When False, wait silently without publishing
                 CLIENT_INPUT_REQUESTED. Used for worker waits where input is
                 expected from the queen via inject_worker_message().
@@ -1755,6 +1787,7 @@ class EventLoopNode(NodeProtocol):
                 prompt=prompt,
                 execution_id=ctx.execution_id or "",
                 options=options,
+                questions=questions,
             )

         self._awaiting_input = True
@@ -1787,12 +1820,13 @@ class EventLoopNode(NodeProtocol):
         bool,
         str,
         list[dict[str, Any]],
+        bool,
     ]:
         """Run a single LLM turn with streaming and tool execution.

         Returns (assistant_text, real_tool_results, outputs_set, token_counts, logged_tool_calls,
         user_input_requested, ask_user_prompt, ask_user_options, queen_input_requested,
-        system_prompt, messages).
+        system_prompt, messages, reported_to_parent).

         ``real_tool_results`` contains only results from actual tools (web_search,
         etc.), NOT from synthetic framework tools such as ``set_output``,
@@ -1802,8 +1836,8 @@ class EventLoopNode(NodeProtocol):
         ``ask_user`` during this turn. This separation lets the caller treat
         synthetic tools as framework concerns rather than tool-execution concerns.
         ``queen_input_requested`` is True when the worker called
-        ``escalate(wait_for_response=true)`` and should wait for
-        queen guidance before judge evaluation.
+        ``escalate`` and should wait for queen guidance before judge
+        evaluation.

         ``logged_tool_calls`` accumulates ALL tool calls across inner iterations
         (real tools, set_output, and discarded calls) for L3 logging. Unlike
@@ -1813,7 +1847,7 @@ class EventLoopNode(NodeProtocol):
         stream_id = ctx.stream_id or ctx.node_id
         node_id = ctx.node_id
         execution_id = ctx.execution_id or ""
-        token_counts: dict[str, int] = {"input": 0, "output": 0}
+        token_counts: dict[str, int] = {"input": 0, "output": 0, "cached": 0}
         tool_call_count = 0
         final_text = ""
         final_system_prompt = conversation.system_prompt
@@ -1824,6 +1858,7 @@ class EventLoopNode(NodeProtocol):
         ask_user_prompt = ""
         ask_user_options: list[str] | None = None
         queen_input_requested = False
+        reported_to_parent = False
         # Accumulate ALL tool calls across inner iterations for L3 logging.
         # Unlike real_tool_results (reset each inner iteration), this persists.
         logged_tool_calls: list[dict] = []
@@ -1893,6 +1928,7 @@ class EventLoopNode(NodeProtocol):
                 elif isinstance(event, FinishEvent):
                     token_counts["input"] += event.input_tokens
                     token_counts["output"] += event.output_tokens
+                    token_counts["cached"] += event.cached_tokens
                     token_counts["stop_reason"] = event.stop_reason
                     token_counts["model"] = event.model

@@ -1977,6 +2013,7 @@ class EventLoopNode(NodeProtocol):
                     queen_input_requested,
                     final_system_prompt,
                     final_messages,
+                    reported_to_parent,
                 )

             # Execute tool calls — framework tools (set_output, ask_user)
@@ -2120,11 +2157,63 @@ class EventLoopNode(NodeProtocol):
                     )
                     results_by_id[tc.tool_use_id] = result

+                elif tc.tool_name == "ask_user_multiple":
+                    # --- Framework-level ask_user_multiple ---
+                    user_input_requested = True
+                    raw_questions = tc.tool_input.get("questions", [])
+                    if not isinstance(raw_questions, list) or len(raw_questions) < 2:
+                        result = ToolResult(
+                            tool_use_id=tc.tool_use_id,
+                            content=(
+                                "ERROR: questions must be an array of at "
+                                "least 2 question objects. Use ask_user "
+                                "for single questions."
+                            ),
+                            is_error=True,
+                        )
+                        results_by_id[tc.tool_use_id] = result
+                        user_input_requested = False
+                        continue
+
+                    # Normalize each question entry
+                    questions: list[dict] = []
+                    for i, q in enumerate(raw_questions):
+                        if not isinstance(q, dict):
+                            continue
+                        qid = str(q.get("id", f"q{i+1}"))
+                        prompt = str(q.get("prompt", ""))
+                        opts = q.get("options", None)
+                        if isinstance(opts, list):
+                            opts = [str(o) for o in opts if o]
+                            if len(opts) < 2:
+                                opts = None
+                        else:
+                            opts = None
+                        questions.append({
+                            "id": qid,
+                            "prompt": prompt,
+                            **({"options": opts} if opts else {}),
+                        })
+
+                    # Store as multi-question prompt/options for
+                    # the event emission path
+                    ask_user_prompt = ""
+                    ask_user_options = None
+                    # Pass the full questions list via a special
+                    # key that the event emitter picks up
+                    self._pending_multi_questions = questions
+
+                    result = ToolResult(
+                        tool_use_id=tc.tool_use_id,
+                        content="Waiting for user input...",
+                        is_error=False,
+                    )
+                    results_by_id[tc.tool_use_id] = result
+
                 elif tc.tool_name == "escalate":
                     # --- Framework-level escalate handling ---
                     reason = str(tc.tool_input.get("reason", "")).strip()
                     context = str(tc.tool_input.get("context", "")).strip()
+                    # Always wait for queen guidance

                     if stream_id in ("queen", "judge"):
                         result = ToolResult(
@@ -2160,7 +2249,7 @@ class EventLoopNode(NodeProtocol):

                     result = ToolResult(
                         tool_use_id=tc.tool_use_id,
-                        content="Escalation requested to hive_coder (queen); waiting for guidance.",
+                        content="Escalation requested to queen; waiting for guidance.",
                         is_error=False,
                     )
                     results_by_id[tc.tool_use_id] = result
@@ -2179,6 +2268,7 @@ class EventLoopNode(NodeProtocol):

                 elif tc.tool_name == "report_to_parent":
                     # --- Report from sub-agent to parent (optionally blocking) ---
+                    reported_to_parent = True
                     msg = tc.tool_input.get("message", "")
                     data = tc.tool_input.get("data")
                     wait = tc.tool_input.get("wait_for_response", False)
@@ -2250,10 +2340,16 @@ class EventLoopNode(NodeProtocol):
                 _dur = round(time.time() - _s, 3)
                 return _r, _iso, _dur

-            timed_results = await asyncio.gather(
-                *(_timed_execute(tc) for tc in pending_real),
-                return_exceptions=True,
+            self._tool_task = asyncio.ensure_future(
+                asyncio.gather(
+                    *(_timed_execute(tc) for tc in pending_real),
+                    return_exceptions=True,
+                )
             )
+            try:
+                timed_results = await self._tool_task
+            finally:
+                self._tool_task = None
             # gather(return_exceptions=True) captures CancelledError
             # as a return value instead of propagating it. Re-raise
             # so stop_worker actually stops the execution.
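The change above keeps the `asyncio.gather` in a named task so `cancel_current_turn()` can abort in-flight tool calls. A minimal standalone sketch of that pattern, with illustrative class and method names rather than the framework's:

```python
import asyncio


class Runner:
    """Cancellable-gather pattern: hold the gather in a named task so an
    outside caller can cancel in-flight work via cancel()."""

    def __init__(self) -> None:
        self._tool_task: asyncio.Task | None = None

    async def run_tools(self, coros):
        self._tool_task = asyncio.ensure_future(
            asyncio.gather(*coros, return_exceptions=True)
        )
        try:
            results = await self._tool_task
        finally:
            self._tool_task = None
        # return_exceptions=True can capture a child's CancelledError as a
        # result value; re-raise so cancellation actually stops the caller.
        for r in results:
            if isinstance(r, asyncio.CancelledError):
                raise r
        return results

    def cancel(self) -> None:
        if self._tool_task and not self._tool_task.done():
            self._tool_task.cancel()
```

Cancelling the gather future cancels its children and makes the `await` raise `CancelledError`, which is exactly the behavior the diff's re-raise comment relies on.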
@@ -2360,6 +2456,7 @@ class EventLoopNode(NodeProtocol):
                 if tc.tool_name not in (
                     "set_output",
                     "ask_user",
+                    "ask_user_multiple",
                     "escalate",
                     "delegate_to_sub_agent",
                     "report_to_parent",
@@ -2429,7 +2526,7 @@ class EventLoopNode(NodeProtocol):
                 # next turn. The char-based token estimator underestimates
                 # actual API tokens, so the standard compaction check in the
                 # outer loop may not trigger in time.
-                protect = max(2000, self._config.max_history_tokens // 12)
+                protect = max(2000, self._config.max_context_tokens // 12)
                 pruned = await conversation.prune_old_tool_results(
                     protect_tokens=protect,
                     min_prune_tokens=max(1000, protect // 3),
@@ -2438,7 +2535,7 @@ class EventLoopNode(NodeProtocol):
                     logger.info(
                         "Post-limit pruning: cleared %d old tool results (budget: %d)",
                         pruned,
-                        self._config.max_history_tokens,
+                        self._config.max_context_tokens,
                     )
                 # Limit hit — return from this turn so the judge can
                 # evaluate instead of looping back for another stream.
@@ -2454,11 +2551,12 @@ class EventLoopNode(NodeProtocol):
                     queen_input_requested,
                     final_system_prompt,
                     final_messages,
+                    reported_to_parent,
                 )

             # --- Mid-turn pruning: prevent context blowup within a single turn ---
             if conversation.usage_ratio() >= 0.6:
-                protect = max(2000, self._config.max_history_tokens // 12)
+                protect = max(2000, self._config.max_context_tokens // 12)
                 pruned = await conversation.prune_old_tool_results(
                     protect_tokens=protect,
                     min_prune_tokens=max(1000, protect // 3),
@@ -2485,6 +2583,7 @@ class EventLoopNode(NodeProtocol):
                     queen_input_requested,
                     final_system_prompt,
                     final_messages,
+                    reported_to_parent,
                 )

             # Tool calls processed -- loop back to stream with updated conversation
@@ -2550,6 +2649,73 @@ class EventLoopNode(NodeProtocol):
             },
         )

+    def _build_ask_user_multiple_tool(self) -> Tool:
+        """Build the synthetic ask_user_multiple tool for batched questions.
+
+        Queen-only tool that presents multiple questions at once so the user
+        can answer them all in a single interaction rather than one at a time.
+        """
+        return Tool(
+            name="ask_user_multiple",
+            description=(
+                "Ask the user multiple questions at once. Use this instead of "
+                "ask_user when you have 2 or more questions to ask in the same "
+                "turn — it lets the user answer everything in one go rather than "
+                "going back and forth. Each question can have its own predefined "
+                "options (2-3 choices) or be free-form. The UI renders all "
+                "questions together with a single Submit button. "
+                "ALWAYS prefer this over ask_user when you have multiple things "
+                "to clarify. "
+                "IMPORTANT: Do NOT repeat the questions in your text response — "
+                "the widget renders them. Keep your text to a brief intro only. "
+                'Example: {"questions": ['
+                ' {"id": "scope", "prompt": "What scope?", "options": ["Full", "Partial"]},'
+                ' {"id": "format", "prompt": "Output format?", "options": ["PDF", "CSV", "JSON"]},'
+                ' {"id": "details", "prompt": "Any special requirements?"}'
+                "]}"
+            ),
+            parameters={
+                "type": "object",
+                "properties": {
+                    "questions": {
+                        "type": "array",
+                        "items": {
+                            "type": "object",
+                            "properties": {
+                                "id": {
+                                    "type": "string",
+                                    "description": (
+                                        "Short identifier for this question "
+                                        "(used in the response)."
+                                    ),
+                                },
+                                "prompt": {
+                                    "type": "string",
+                                    "description": "The question text shown to the user.",
+                                },
+                                "options": {
+                                    "type": "array",
+                                    "items": {"type": "string"},
+                                    "description": (
+                                        "2-3 predefined choices. The UI appends an "
+                                        "'Other' free-text input automatically. "
+                                        "Omit only when the user must type a free-form answer."
+                                    ),
+                                    "minItems": 2,
+                                    "maxItems": 3,
+                                },
+                            },
+                            "required": ["id", "prompt"],
+                        },
+                        "minItems": 2,
+                        "maxItems": 8,
+                        "description": "List of questions to present to the user.",
+                    },
+                },
+                "required": ["questions"],
+            },
+        )
+
     def _build_set_output_tool(self, output_keys: list[str] | None) -> Tool | None:
         """Build the synthetic set_output tool for explicit output declaration."""
         if not output_keys:
@@ -2582,7 +2748,7 @@ class EventLoopNode(NodeProtocol):
         return Tool(
             name="escalate",
             description=(
-                "Escalate to the Hive Coder queen when requesting user input, "
+                "Escalate to the queen when requesting user input, "
                 "blocked by errors, missing "
                 "credentials, or ambiguous constraints that require supervisor "
                 "guidance. Include a concise reason and optional context. "
@@ -2771,7 +2937,7 @@ class EventLoopNode(NodeProtocol):
     # Judge evaluation
     # -------------------------------------------------------------------

-    async def _evaluate(
+    async def _judge_turn(
         self,
         ctx: NodeContext,
         conversation: NodeConversation,
|
||||
tool_results: list[dict],
|
||||
iteration: int,
|
||||
) -> JudgeVerdict:
|
||||
"""Evaluate the current state using judge or implicit logic."""
|
||||
# Short-circuit: subagent called report_to_parent(mark_complete=True)
|
||||
"""Evaluate the current state using judge or implicit logic.
|
||||
|
||||
Evaluation levels (in order):
|
||||
0. Short-circuits: mark_complete, skip_judge, tool-continue.
|
||||
1. Custom judge (JudgeProtocol) — full authority when set.
|
||||
2. Implicit judge — output-key check + optional conversation-aware
|
||||
quality gate (when ``success_criteria`` is defined).
|
||||
|
||||
Returns a JudgeVerdict. ``feedback=None`` means no real evaluation
|
||||
happened (skip_judge, tool-continue); the caller must not inject a
|
||||
feedback message. Any non-None feedback (including ``""``) means a
|
||||
real evaluation occurred and will be logged into the conversation.
|
||||
"""
|
||||
|
||||
# --- Level 0: short-circuits (no evaluation) -----------------------
|
||||
|
||||
if self._mark_complete_flag:
|
||||
return JudgeVerdict(action="ACCEPT")
|
||||
|
||||
# Opt-out: node explicitly disables judge (e.g. conversational queen)
|
||||
if ctx.node_spec.skip_judge:
|
||||
return JudgeVerdict(action="RETRY", feedback="")
|
||||
return JudgeVerdict(action="RETRY") # feedback=None → not logged
|
||||
|
||||
# --- Level 1: custom judge -----------------------------------------
|
||||
|
||||
if self._judge is not None:
|
||||
context = {
|
||||
@@ -2802,81 +2983,82 @@ class EventLoopNode(NodeProtocol):
                     accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
                 ),
             }
-            return await self._judge.evaluate(context)
+            verdict = await self._judge.evaluate(context)
+            # Ensure evaluated RETRY always carries feedback for logging.
+            if verdict.action == "RETRY" and not verdict.feedback:
+                return JudgeVerdict(action="RETRY", feedback="Custom judge returned RETRY.")
+            return verdict

-        # Implicit judge: accept when no tool calls and all output keys present
-        if not tool_results:
-            missing = self._get_missing_output_keys(
-                accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
-            )
-
-            if not missing:
-                # Safety check: when ALL output keys are nullable and NONE
-                # have been set, the node produced nothing useful. Retry
-                # instead of accepting an empty result — this prevents
-                # client-facing nodes from terminating before the user
-                # ever interacts, and non-client-facing nodes from
-                # short-circuiting without doing their work.
-                output_keys = ctx.node_spec.output_keys or []
-                nullable_keys = set(ctx.node_spec.nullable_output_keys or [])
-                all_nullable = output_keys and nullable_keys >= set(output_keys)
-                none_set = not any(accumulator.get(k) is not None for k in output_keys)
-                if all_nullable and none_set:
-                    return JudgeVerdict(
-                        action="RETRY",
-                        feedback=(
-                            f"No output keys have been set yet. "
-                            f"Use set_output to set at least one of: {output_keys}"
-                        ),
-                    )
-
-                # Client-facing nodes with no output keys are meant for
-                # continuous interaction — they should not auto-accept.
-                # Only exit via shutdown, max_iterations, or max_node_visits.
-                # Inject tool-use pressure so models stuck in a
-                # "narrate-instead-of-act" loop get corrective feedback.
-                if not output_keys and ctx.node_spec.client_facing:
-                    return JudgeVerdict(
-                        action="RETRY",
-                        feedback=(
-                            "STOP describing what you will do. "
-                            "You have FULL access to all tools — file creation, "
-                            "shell commands, MCP tools — and you CAN call them "
-                            "directly in your response. Respond ONLY with tool "
-                            "calls, no prose. Execute the task now."
-                        ),
-                    )
-
-                # Level 2: conversation-aware quality check (if success_criteria set)
-                if ctx.node_spec.success_criteria and ctx.llm:
-                    from framework.graph.conversation_judge import evaluate_phase_completion
-
-                    verdict = await evaluate_phase_completion(
-                        llm=ctx.llm,
-                        conversation=conversation,
-                        phase_name=ctx.node_spec.name,
-                        phase_description=ctx.node_spec.description,
-                        success_criteria=ctx.node_spec.success_criteria,
-                        accumulator_state=accumulator.to_dict(),
-                        max_history_tokens=self._config.max_history_tokens,
-                    )
-                    if verdict.action != "ACCEPT":
-                        return JudgeVerdict(
-                            action=verdict.action,
-                            feedback=verdict.feedback or "Phase criteria not met.",
-                        )
-
-                return JudgeVerdict(action="ACCEPT")
-            else:
-                return JudgeVerdict(
-                    action="RETRY",
-                    feedback=(
-                        f"Task incomplete. Required outputs not yet produced: {missing}. "
-                        f"Follow your system prompt instructions to complete the work."
-                    ),
-                )
-
-        # Tool calls were made -- continue loop
-        return JudgeVerdict(action="RETRY", feedback="")
+        # --- Level 2: implicit judge ---------------------------------------
+
+        # Real tool calls were made — let the agent keep working.
+        if tool_results:
+            return JudgeVerdict(action="RETRY")  # feedback=None → not logged
+
+        missing = self._get_missing_output_keys(
+            accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
+        )
+
+        if missing:
+            return JudgeVerdict(
+                action="RETRY",
+                feedback=(
+                    f"Task incomplete. Required outputs not yet produced: {missing}. "
+                    f"Follow your system prompt instructions to complete the work."
+                ),
+            )
+
+        # All output keys present — run safety checks before accepting.
+
+        output_keys = ctx.node_spec.output_keys or []
+        nullable_keys = set(ctx.node_spec.nullable_output_keys or [])
+
+        # All-nullable with nothing set → node produced nothing useful.
+        all_nullable = output_keys and nullable_keys >= set(output_keys)
+        none_set = not any(accumulator.get(k) is not None for k in output_keys)
+        if all_nullable and none_set:
+            return JudgeVerdict(
+                action="RETRY",
+                feedback=(
+                    f"No output keys have been set yet. "
+                    f"Use set_output to set at least one of: {output_keys}"
+                ),
+            )
+
+        # Client-facing with no output keys → continuous interaction node.
+        # Inject tool-use pressure instead of auto-accepting.
+        if not output_keys and ctx.node_spec.client_facing:
+            return JudgeVerdict(
+                action="RETRY",
+                feedback=(
+                    "STOP describing what you will do. "
+                    "You have FULL access to all tools — file creation, "
+                    "shell commands, MCP tools — and you CAN call them "
+                    "directly in your response. Respond ONLY with tool "
+                    "calls, no prose. Execute the task now."
+                ),
+            )
+
+        # Level 2b: conversation-aware quality check (if success_criteria set)
+        if ctx.node_spec.success_criteria and ctx.llm:
+            from framework.graph.conversation_judge import evaluate_phase_completion
+
+            verdict = await evaluate_phase_completion(
+                llm=ctx.llm,
+                conversation=conversation,
+                phase_name=ctx.node_spec.name,
+                phase_description=ctx.node_spec.description,
+                success_criteria=ctx.node_spec.success_criteria,
+                accumulator_state=accumulator.to_dict(),
+                max_context_tokens=self._config.max_context_tokens,
+            )
+            if verdict.action != "ACCEPT":
+                return JudgeVerdict(
+                    action=verdict.action,
+                    feedback=verdict.feedback or "Phase criteria not met.",
+                )

+        return JudgeVerdict(action="ACCEPT", feedback="")

     # -------------------------------------------------------------------
     # Helpers
@@ -2956,8 +3138,10 @@ class EventLoopNode(NodeProtocol):
     def _is_stalled(self, recent_responses: list[str]) -> bool:
         """Detect stall using n-gram similarity.

-        Detects when N consecutive responses have similarity >= threshold.
-        This catches phrases like "I'm still stuck" vs "I'm stuck".
+        Detects when ALL N consecutive responses are mutually similar
+        (>= threshold). A single dissimilar response resets the signal.
+        This catches phrases like "I'm still stuck" vs "I'm stuck"
+        without false-positives on "attempt 1" vs "attempt 2".
         """
         if len(recent_responses) < self._config.stall_detection_threshold:
             return False
@@ -2965,13 +3149,11 @@ class EventLoopNode(NodeProtocol):
             return False

         threshold = self._config.stall_similarity_threshold
-        # Check similarity against all recent responses (excluding self)
-        for i, resp in enumerate(recent_responses):
-            # Compare against all previous responses
-            for prev in recent_responses[:i]:
-                if self._ngram_similarity(resp, prev) >= threshold:
-                    return True
-        return False
+        # Every consecutive pair must be similar
+        for i in range(1, len(recent_responses)):
+            if self._ngram_similarity(recent_responses[i], recent_responses[i - 1]) < threshold:
+                return False
+        return True

     @staticmethod
     def _is_transient_error(exc: BaseException) -> bool:
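The tightened stall check requires every consecutive pair in the window to be similar, so one dissimilar response resets the signal. A self-contained sketch, with a character-trigram Jaccard similarity standing in for the framework's `_ngram_similarity` (the real metric and the 0.7 threshold here are assumptions):

```python
def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity over character n-grams (stand-in metric)."""
    grams = lambda s: {s[i:i + n] for i in range(max(1, len(s) - n + 1))}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0


def is_stalled(recent: list[str], window: int = 3, threshold: float = 0.7) -> bool:
    """ALL responses in the window must be mutually similar: every
    consecutive pair >= threshold; one dissimilar response resets it."""
    if len(recent) < window:
        return False
    return all(
        ngram_similarity(recent[i], recent[i - 1]) >= threshold
        for i in range(1, len(recent))
    )
```

Under this metric "I'm stuck on the API" vs "I'm still stuck on the API" score high, while "attempt 1 failed" vs "attempt 2 failed" fall below the threshold, which is exactly the false-positive the docstring change calls out.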
@@ -3050,10 +3232,11 @@ class EventLoopNode(NodeProtocol):
         self,
         recent_tool_fingerprints: list[list[tuple[str, str]]],
     ) -> tuple[bool, str]:
-        """Detect doom loop using n-gram similarity on tool inputs.
+        """Detect doom loop via exact fingerprint match.

-        Detects when N consecutive turns have similar tool calls.
-        Similarity applies to the canonicalized tool input strings.
+        Detects when N consecutive turns invoke the same tools with
+        identical (canonicalized) arguments. Different arguments mean
+        different work, so only exact matches count.

         Returns (is_doom_loop, description).
         """
@@ -3066,23 +3249,12 @@ class EventLoopNode(NodeProtocol):
         if not first:
             return False, ""

-        # Convert a turn's list of (name, args) pairs to a single comparable string.
-        def _turn_sig(fp: list[tuple[str, str]]) -> str:
-            return "|".join(f"{name}:{args}" for name, args in fp)
-
-        first_sig = _turn_sig(first)
-        similarity_threshold = self._config.stall_similarity_threshold
-        similar_count = sum(
-            1
-            for fp in recent_tool_fingerprints
-            if self._ngram_similarity(_turn_sig(fp), first_sig) >= similarity_threshold
-        )
-
-        if similar_count >= threshold:
-            tool_names = [name for fp in recent_tool_fingerprints for name, _ in fp]
+        # All turns in the window must match the first exactly
+        if all(fp == first for fp in recent_tool_fingerprints[1:]):
+            tool_names = [name for name, _ in first]
             desc = (
-                f"Doom loop detected: {similar_count}/{len(recent_tool_fingerprints)} "
-                f"consecutive similar tool calls ({', '.join(tool_names)})"
+                f"Doom loop detected: {len(recent_tool_fingerprints)} "
+                f"identical consecutive tool calls ({', '.join(tool_names)})"
             )
             return True, desc
         return False, ""
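The doom-loop detector now counts only exact fingerprint matches: identical tool names with identical canonicalized arguments across the whole window. A minimal sketch of that check (the function name and window default are illustrative):

```python
def is_doom_loop(
    fingerprints: list[list[tuple[str, str]]], window: int = 3
) -> tuple[bool, str]:
    """Exact-match doom-loop check: True only when every turn in the
    window issued identical (tool_name, canonical_args) calls."""
    recent = fingerprints[-window:]
    if len(recent) < window or not recent[0]:
        return False, ""
    first = recent[0]
    # All turns in the window must match the first exactly.
    if all(fp == first for fp in recent[1:]):
        tool_names = [name for name, _ in first]
        return True, (
            f"Doom loop detected: {len(recent)} identical consecutive "
            f"tool calls ({', '.join(tool_names)})"
        )
    return False, ""
```

Searching for `"python"` three times trips the detector; searching for `"python"` then `"rust"` does not, since different arguments mean different work.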
@@ -3318,7 +3490,7 @@ class EventLoopNode(NodeProtocol):
         phase_grad = getattr(ctx, "continuous_mode", False)

         # --- Step 1: Prune old tool results (free, no LLM) ---
-        protect = max(2000, self._config.max_history_tokens // 12)
+        protect = max(2000, self._config.max_context_tokens // 12)
         pruned = await conversation.prune_old_tool_results(
             protect_tokens=protect,
             min_prune_tokens=max(1000, protect // 3),
@@ -3424,7 +3596,7 @@ class EventLoopNode(NodeProtocol):
             accumulator,
             formatted,
         )
-        summary_budget = max(1024, self._config.max_history_tokens // 2)
+        summary_budget = max(1024, self._config.max_context_tokens // 2)
         try:
             response = await ctx.llm.acomplete(
                 messages=[{"role": "user", "content": prompt}],
@@ -3527,7 +3699,7 @@ class EventLoopNode(NodeProtocol):
         elif spec.output_keys:
             ctx_lines.append(f"OUTPUTS STILL NEEDED: {', '.join(spec.output_keys)}")

-        target_tokens = self._config.max_history_tokens // 2
+        target_tokens = self._config.max_context_tokens // 2
         target_chars = target_tokens * 4
         node_ctx = "\n".join(ctx_lines)

@@ -3995,6 +4167,7 @@ class EventLoopNode(NodeProtocol):
         model: str,
         input_tokens: int,
         output_tokens: int,
+        cached_tokens: int = 0,
         execution_id: str = "",
         iteration: int | None = None,
     ) -> None:
@@ -4006,6 +4179,7 @@ class EventLoopNode(NodeProtocol):
             model=model,
             input_tokens=input_tokens,
             output_tokens=output_tokens,
+            cached_tokens=cached_tokens,
             execution_id=execution_id,
             iteration=iteration,
         )
@@ -4288,22 +4462,18 @@ class EventLoopNode(NodeProtocol):

         registry[escalation_id] = receiver
         try:
-            # Stream message to user (parent's node_id so TUI shows parent talking)
-            await self._event_bus.emit_client_output_delta(
-                stream_id=ctx.node_id,
-                node_id=ctx.node_id,
-                content=message,
-                snapshot=message,
-                execution_id=ctx.execution_id,
-            )
-            # Request input (escalation_id for routing response back)
-            await self._event_bus.emit_client_input_requested(
-                stream_id=ctx.node_id,
+            # Escalate to the queen instead of asking the user directly.
+            # The queen handles the request and injects the response via
+            # inject_worker_message(), which finds this receiver through
+            # its _awaiting_input flag.
+            await self._event_bus.emit_escalation_requested(
+                stream_id=ctx.stream_id or ctx.node_id,
                 node_id=escalation_id,
-                prompt=message,
+                reason=f"Subagent report (wait_for_response) from {agent_id}",
+                context=message,
                 execution_id=ctx.execution_id,
             )
-            # Block until user responds
+            # Block until queen responds
             return await receiver.wait()
         finally:
             registry.pop(escalation_id, None)
@@ -4410,7 +4580,7 @@ class EventLoopNode(NodeProtocol):
             max_iterations=max_iter,  # Tighter budget
             max_tool_calls_per_turn=self._config.max_tool_calls_per_turn,
             tool_call_overflow_margin=self._config.tool_call_overflow_margin,
-            max_history_tokens=self._config.max_history_tokens,
+            max_context_tokens=self._config.max_context_tokens,
             stall_detection_threshold=self._config.stall_detection_threshold,
             max_tool_result_chars=self._config.max_tool_result_chars,
             spillover_dir=subagent_spillover,

@@ -330,7 +330,7 @@ class GraphExecutor:
                 _depth,
             )
         else:
-            max_tokens = getattr(conversation, "_max_history_tokens", 32000)
+            max_tokens = getattr(conversation, "_max_context_tokens", 32000)
             target_tokens = max_tokens // 2
             target_chars = target_tokens * 4

@@ -1604,7 +1604,7 @@ class GraphExecutor:
         # Return with paused status
         return ExecutionResult(
             success=False,
-            error="Execution paused by user",
+            error="Execution cancelled",
             output=saved_memory,
             steps_executed=steps,
             total_tokens=total_tokens,
@@ -1872,7 +1872,7 @@ class GraphExecutor:
             max_tool_calls_per_turn=lc.get("max_tool_calls_per_turn", 30),
             tool_call_overflow_margin=lc.get("tool_call_overflow_margin", 0.5),
             stall_detection_threshold=lc.get("stall_detection_threshold", 3),
-            max_history_tokens=lc.get("max_history_tokens", 32000),
+            max_context_tokens=lc.get("max_context_tokens", 32000),
             max_tool_result_chars=lc.get("max_tool_result_chars", 30_000),
             spillover_dir=spillover,
             hooks=lc.get("hooks", {}),

@@ -1,203 +0,0 @@
|
||||
"""
|
||||
Standardized HITL (Human-In-The-Loop) Protocol
|
||||
|
||||
This module defines the formal structure for pause/resume interactions
|
||||
where agents need to gather input from humans.
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from enum import StrEnum
|
||||
from typing import Any
|
||||
|
||||
|
||||
class HITLInputType(StrEnum):
|
||||
"""Type of input expected from human."""
|
||||
|
||||
FREE_TEXT = "free_text" # Open-ended text response
|
||||
STRUCTURED = "structured" # Specific fields to fill
|
||||
SELECTION = "selection" # Choose from options
|
||||
APPROVAL = "approval" # Yes/no/modify decision
|
||||
MULTI_FIELD = "multi_field" # Multiple related inputs
|
||||
|
||||
|
||||
@dataclass
|
||||
class HITLQuestion:
|
||||
"""A single question to ask the human."""
|
||||
|
||||
id: str
|
||||
question: str
|
||||
input_type: HITLInputType = HITLInputType.FREE_TEXT
|
||||
|
||||
# For SELECTION type
|
||||
options: list[str] = field(default_factory=list)
|
||||
|
||||
# For STRUCTURED type
|
||||
fields: dict[str, str] = field(default_factory=dict) # {field_name: description}
|
||||
|
||||
# Metadata
|
||||
required: bool = True
|
||||
help_text: str = ""
|
||||
|
||||
|
||||
@dataclass
|
||||
class HITLRequest:
|
||||
"""
|
||||
Formal request for human input at a pause node.
|
||||
|
||||
This is what the agent produces when it needs human input.
|
||||
"""
|
||||
|
||||
# Context
|
||||
objective: str # What we're trying to accomplish
|
||||
current_state: str # Where we are in the process
|
||||
|
||||
# What we need
|
||||
questions: list[HITLQuestion] = field(default_factory=list)
|
||||
missing_info: list[str] = field(default_factory=list)
|
||||
|
||||
# Guidance
|
||||
instructions: str = ""
|
||||
examples: list[str] = field(default_factory=list)
|
||||
|
||||
# Metadata
|
||||
request_id: str = ""
|
||||
node_id: str = ""
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
"""Convert to dictionary for serialization."""
|
||||
return {
|
||||
"objective": self.objective,
|
||||
"current_state": self.current_state,
|
||||
"questions": [
|
||||
{
|
||||
"id": q.id,
|
||||
"question": q.question,
|
||||
"input_type": q.input_type.value,
|
||||
"options": q.options,
|
||||
"fields": q.fields,
|
||||
"required": q.required,
|
||||
"help_text": q.help_text,
|
||||
}
|
||||
for q in self.questions
|
||||
],
|
||||
"missing_info": self.missing_info,
|
||||
"instructions": self.instructions,
|
||||
"examples": self.examples,
|
||||
"request_id": self.request_id,
|
||||
"node_id": self.node_id,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class HITLResponse:
|
||||
"""
|
||||
Human's response to a HITL request.
|
||||
|
||||
This is what gets passed back when resuming from a pause.
|
||||
"""
|
||||
|
||||
# Original request reference
|
||||
request_id: str
|
||||
|
||||
# Human's answers
|
||||
answers: dict[str, Any] = field(default_factory=dict) # {question_id: answer}
|
||||
raw_input: str = "" # Raw text if provided
|
||||
|
||||
# Metadata
|
||||
response_time_ms: int = 0
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
"""Convert to dictionary for serialization."""
|
||||
return {
|
||||
"request_id": self.request_id,
|
||||
"answers": self.answers,
|
||||
"raw_input": self.raw_input,
|
||||
"response_time_ms": self.response_time_ms,
|
||||
}
|
||||
|
||||
|
||||
class HITLProtocol:
|
||||
"""
|
||||
Standardized protocol for HITL interactions.
|
||||
|
||||
Usage in pause nodes:
|
||||
|
||||
1. Pause Node: Generates HITLRequest with questions
|
||||
2. Executor: Saves state and returns request to user
|
||||
3. User: Provides HITLResponse with answers
|
||||
4. Resume Node: Processes response and merges into context
|
||||
"""
|
||||
|
||||
@staticmethod
|
||||
def create_request(
|
||||
objective: str,
|
||||
questions: list[HITLQuestion],
|
||||
missing_info: list[str] | None = None,
|
||||
node_id: str = "",
|
||||
) -> HITLRequest:
|
||||
"""Create a standardized HITL request."""
|
||||
return HITLRequest(
|
||||
objective=objective,
|
||||
current_state="Awaiting clarification",
|
||||
questions=questions,
|
||||
missing_info=missing_info or [],
|
||||
request_id=f"{node_id}_{hash(objective) % 10000}",
|
||||
node_id=node_id,
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def parse_response(
|
||||
raw_input: str,
|
||||
request: HITLRequest,
|
||||
use_haiku: bool = True,
|
||||
) -> HITLResponse:
|
||||
"""
|
||||
Parse human's raw input into structured response.
|
||||
|
||||
Maps the raw input to the first question. For multi-question HITL,
|
||||
the caller should present one question at a time.
|
||||
"""
|
||||
response = HITLResponse(request_id=request.request_id, raw_input=raw_input)
|
||||
|
||||
# If no questions, just return raw input
|
||||
if not request.questions:
|
||||
return response
|
||||
|
||||
# Map raw input to first question
|
||||
response.answers[request.questions[0].id] = raw_input
|
||||
return response
|
||||
|
||||
@staticmethod
|
||||
def format_for_display(request: HITLRequest) -> str:
|
||||
"""Format HITL request for user-friendly display."""
|
||||
parts = []
|
||||
|
||||
if request.objective:
|
||||
parts.append(f"📋 Objective: {request.objective}")
|
||||
|
||||
if request.current_state:
|
||||
parts.append(f"📍 Current State: {request.current_state}")
|
||||
|
||||
if request.instructions:
|
||||
parts.append(f"\n{request.instructions}")
|
||||
|
||||
if request.questions:
|
||||
parts.append(f"\n❓ Questions ({len(request.questions)}):")
|
||||
for i, q in enumerate(request.questions, 1):
|
||||
parts.append(f"{i}. {q.question}")
|
||||
if q.help_text:
|
||||
parts.append(f" 💡 {q.help_text}")
|
||||
if q.options:
|
||||
parts.append(f" Options: {', '.join(q.options)}")
|
||||
|
||||
if request.missing_info:
|
||||
parts.append("\n📝 Missing Information:")
|
||||
for info in request.missing_info:
|
||||
parts.append(f" • {info}")
|
||||
|
||||
if request.examples:
|
||||
parts.append("\n📚 Examples:")
|
||||
for example in request.examples:
|
||||
parts.append(f" • {example}")
|
||||
|
||||
return "\n".join(parts)
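The request/response round trip of the module removed above can be sketched in isolation. This is a condensed, self-contained re-statement of the deleted classes (only the fields the example touches; `create_request` and `parse_response` mirror the `HITLProtocol` staticmethods shown in the hunk), not an import of the framework module — the objective string and question are made up for illustration:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class HITLQuestion:
    id: str
    question: str
    options: list[str] = field(default_factory=list)


@dataclass
class HITLRequest:
    objective: str
    current_state: str
    questions: list[HITLQuestion] = field(default_factory=list)
    request_id: str = ""
    node_id: str = ""


@dataclass
class HITLResponse:
    request_id: str
    answers: dict[str, Any] = field(default_factory=dict)
    raw_input: str = ""


def create_request(objective: str, questions: list[HITLQuestion], node_id: str = "") -> HITLRequest:
    # Same request_id derivation as HITLProtocol.create_request above.
    return HITLRequest(
        objective=objective,
        current_state="Awaiting clarification",
        questions=questions,
        request_id=f"{node_id}_{hash(objective) % 10000}",
        node_id=node_id,
    )


def parse_response(raw_input: str, request: HITLRequest) -> HITLResponse:
    # Raw input maps onto the first question, as in HITLProtocol.parse_response.
    response = HITLResponse(request_id=request.request_id, raw_input=raw_input)
    if request.questions:
        response.answers[request.questions[0].id] = raw_input
    return response


req = create_request(
    "Book a flight",
    [HITLQuestion(id="dest", question="Which destination?", options=["SFO", "JFK"])],
    node_id="pause_1",
)
resp = parse_response("SFO", req)
print(resp.answers)  # {'dest': 'SFO'}
```

The pause node emits the request, the executor checkpoints, and the resume node merges `resp.answers` back into context — which is why the commit can delete the module wholesale once escalation events replace this flow.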
@@ -119,6 +119,19 @@ RATE_LIMIT_BACKOFF_BASE = 2  # seconds
RATE_LIMIT_MAX_DELAY = 120  # seconds - cap to prevent absurd waits
MINIMAX_API_BASE = "https://api.minimax.io/v1"

# Providers that accept cache_control on message content blocks.
# Anthropic: native ephemeral caching. MiniMax & Z-AI/GLM: pass-through to their APIs.
# (OpenAI caches automatically server-side; Groq/Gemini/etc. strip the header.)
_CACHE_CONTROL_PREFIXES = ("anthropic/", "claude-", "minimax/", "minimax-", "MiniMax-", "zai-glm", "glm-")


def _model_supports_cache_control(model: str) -> bool:
    return any(model.startswith(p) for p in _CACHE_CONTROL_PREFIXES)
# Kimi For Coding uses an Anthropic-compatible endpoint (no /v1 suffix).
# Claude Code integration uses this format; the /v1 OpenAI-compatible endpoint
# enforces a coding-agent whitelist that blocks unknown User-Agents.
KIMI_API_BASE = "https://api.kimi.com/coding"

# Empty-stream retries use a short fixed delay, not the rate-limit backoff.
# Conversation-structure issues are deterministic — long waits don't help.
EMPTY_STREAM_MAX_RETRIES = 3

@@ -323,9 +336,21 @@ class LiteLLMProvider(LLMProvider):
api_base: Custom API base URL (for proxies or local deployments)
**kwargs: Additional arguments passed to litellm.completion()
"""
# Kimi For Coding exposes an Anthropic-compatible endpoint at
# https://api.kimi.com/coding (the same format Claude Code uses natively).
# Translate kimi/ prefix to anthropic/ so litellm uses the Anthropic
# Messages API handler and routes to that endpoint — no special headers needed.
_original_model = model
if model.lower().startswith("kimi/"):
model = "anthropic/" + model[len("kimi/") :]
# Normalise api_base: litellm's Anthropic handler appends /v1/messages,
# so the base must be https://api.kimi.com/coding (no /v1 suffix).
# Strip a trailing /v1 in case the user's saved config has the old value.
if api_base and api_base.rstrip("/").endswith("/v1"):
api_base = api_base.rstrip("/")[:-3]
self.model = model
self.api_key = api_key
self.api_base = api_base or self._default_api_base_for_model(model)
self.api_base = api_base or self._default_api_base_for_model(_original_model)
self.extra_kwargs = kwargs
# The Codex ChatGPT backend (chatgpt.com/backend-api/codex) rejects
# several standard OpenAI params: max_output_tokens, stream_options.

@@ -350,6 +375,8 @@ class LiteLLMProvider(LLMProvider):
model_lower = model.lower()
if model_lower.startswith("minimax/") or model_lower.startswith("minimax-"):
return MINIMAX_API_BASE
if model_lower.startswith("kimi/"):
return KIMI_API_BASE
return None

def _completion_with_rate_limit_retry(

@@ -689,7 +716,10 @@ class LiteLLMProvider(LLMProvider):

full_messages: list[dict[str, Any]] = []
if system:
full_messages.append({"role": "system", "content": system})
sys_msg: dict[str, Any] = {"role": "system", "content": system}
if _model_supports_cache_control(self.model):
sys_msg["cache_control"] = {"type": "ephemeral"}
full_messages.append(sys_msg)
full_messages.extend(messages)

if json_mode:

@@ -860,7 +890,10 @@ class LiteLLMProvider(LLMProvider):

full_messages: list[dict[str, Any]] = []
if system:
full_messages.append({"role": "system", "content": system})
sys_msg: dict[str, Any] = {"role": "system", "content": system}
if _model_supports_cache_control(self.model):
sys_msg["cache_control"] = {"type": "ephemeral"}
full_messages.append(sys_msg)
full_messages.extend(messages)

# Codex Responses API requires an `instructions` field (system prompt).

@@ -925,9 +958,26 @@ class LiteLLMProvider(LLMProvider):
response = await litellm.acompletion(**kwargs)  # type: ignore[union-attr]

async for chunk in response:
choice = chunk.choices[0] if chunk.choices else None
if not choice:
# Capture usage from the trailing usage-only chunk that
# stream_options={"include_usage": True} sends with empty choices.
if not chunk.choices:
usage = getattr(chunk, "usage", None)
if usage:
input_tokens = getattr(usage, "prompt_tokens", 0) or 0
output_tokens = getattr(usage, "completion_tokens", 0) or 0
logger.debug(
"[tokens] trailing usage chunk: input=%d output=%d model=%s",
input_tokens,
output_tokens,
self.model,
)
else:
logger.debug(
"[tokens] empty-choices chunk with no usage (model=%s)",
self.model,
)
continue
choice = chunk.choices[0]

delta = choice.delta

@@ -1000,19 +1050,90 @@ class LiteLLMProvider(LLMProvider):
tail_events.append(TextEndEvent(full_text=accumulated_text))

usage = getattr(chunk, "usage", None)
logger.debug(
"[tokens] finish-chunk raw usage: %r (type=%s)",
usage,
type(usage).__name__,
)
cached_tokens = 0
if usage:
input_tokens = getattr(usage, "prompt_tokens", 0) or 0
output_tokens = getattr(usage, "completion_tokens", 0) or 0
_details = getattr(usage, "prompt_tokens_details", None)
cached_tokens = (
getattr(_details, "cached_tokens", 0) or 0
if _details is not None
else getattr(usage, "cache_read_input_tokens", 0) or 0
)
logger.debug(
"[tokens] finish-chunk usage: input=%d output=%d cached=%d model=%s",
input_tokens,
output_tokens,
cached_tokens,
self.model,
)

logger.debug(
"[tokens] finish event: input=%d output=%d cached=%d stop=%s model=%s",
input_tokens,
output_tokens,
cached_tokens,
choice.finish_reason,
self.model,
)
tail_events.append(
FinishEvent(
stop_reason=choice.finish_reason,
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
model=self.model,
)
)

# Fallback: LiteLLM strips usage from yielded chunks before
# returning them to us, but appends the original chunk (with
# usage intact) to response.chunks first. Use LiteLLM's own
# calculate_total_usage() on that accumulated list.
if input_tokens == 0 and output_tokens == 0:
try:
from litellm.litellm_core_utils.streaming_handler import (
calculate_total_usage,
)

_chunks = getattr(response, "chunks", None)
if _chunks:
_usage = calculate_total_usage(chunks=_chunks)
input_tokens = _usage.prompt_tokens or 0
output_tokens = _usage.completion_tokens or 0
_details = getattr(_usage, "prompt_tokens_details", None)
cached_tokens = (
getattr(_details, "cached_tokens", 0) or 0
if _details is not None
else getattr(_usage, "cache_read_input_tokens", 0) or 0
)
logger.debug(
"[tokens] post-loop chunks fallback:"
" input=%d output=%d cached=%d model=%s",
input_tokens,
output_tokens,
cached_tokens,
self.model,
)
# Patch the FinishEvent already queued with 0 tokens
for _i, _ev in enumerate(tail_events):
if isinstance(_ev, FinishEvent) and _ev.input_tokens == 0:
tail_events[_i] = FinishEvent(
stop_reason=_ev.stop_reason,
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
model=_ev.model,
)
break
except Exception as _e:
logger.debug("[tokens] chunks fallback failed: %s", _e)

# Check whether the stream produced any real content.
# (If text deltas were yielded above, has_content is True
# and we skip the retry path — nothing was yielded in vain.)

@@ -71,6 +71,7 @@ class FinishEvent:
stop_reason: str = ""
input_tokens: int = 0
output_tokens: int = 0
cached_tokens: int = 0
model: str = ""
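The dual-path lookup that feeds the new `cached_tokens` field — OpenAI-style `prompt_tokens_details.cached_tokens` first, Anthropic-style `cache_read_input_tokens` as the fallback — can be restated as a standalone helper. `extract_cached_tokens` is a hypothetical name (the diff inlines the expression), and `SimpleNamespace` stands in for litellm's usage objects:

```python
from types import SimpleNamespace


def extract_cached_tokens(usage) -> int:
    # OpenAI-style usage: usage.prompt_tokens_details.cached_tokens.
    # Anthropic-style fallback: usage.cache_read_input_tokens.
    details = getattr(usage, "prompt_tokens_details", None)
    if details is not None:
        return getattr(details, "cached_tokens", 0) or 0
    return getattr(usage, "cache_read_input_tokens", 0) or 0


openai_style = SimpleNamespace(prompt_tokens_details=SimpleNamespace(cached_tokens=128))
anthropic_style = SimpleNamespace(cache_read_input_tokens=256)
print(extract_cached_tokens(openai_style), extract_cached_tokens(anthropic_style))  # 128 256
```

The `or 0` guards normalize `None` values to zero, which keeps the `== 0` check in the post-loop fallback path reliable.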
|
||||
|
||||
|
||||
|
||||
@@ -1,4 +0,0 @@
|
||||
"""MCP servers for worker-bee."""
|
||||
|
||||
# Don't auto-import servers to avoid double-import issues when running with -m
|
||||
__all__ = []
|
||||
@@ -253,6 +253,6 @@ judge_graph = GraphSpec(
|
||||
loop_config={
|
||||
"max_iterations": 10, # One check shouldn't take many turns
|
||||
"max_tool_calls_per_turn": 3, # get_summary + optionally emit_ticket
|
||||
"max_history_tokens": 16000, # Compact — judge only needs recent context
|
||||
"max_context_tokens": 16000, # Compact — judge only needs recent context
|
||||
},
|
||||
)
|
||||
|
||||
@@ -148,8 +148,9 @@ class HumanReadableFormatter(logging.Formatter):
|
||||
if record_event is not None:
|
||||
event = f" [{record_event}]"
|
||||
|
||||
# Format message: [LEVEL] [trace context] message
|
||||
return f"{color}[{level}]{reset} {context_prefix}{record.getMessage()}{event}"
|
||||
timestamp = self.formatTime(record, "%Y-%m-%d %H:%M:%S")
|
||||
# Format message: TIMESTAMP [LEVEL] [trace context] message
|
||||
return f"{timestamp} {color}[{level}]{reset} {context_prefix}{record.getMessage()}{event}"
|
||||
|
||||
|
||||
def configure_logging(
|
||||
|
||||
+92
-482
@@ -51,11 +51,7 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
|
||||
action="store_true",
|
||||
help="Show detailed execution logs (steps, LLM calls, etc.)",
|
||||
)
|
||||
run_parser.add_argument(
|
||||
"--tui",
|
||||
action="store_true",
|
||||
help="Launch interactive terminal dashboard",
|
||||
)
|
||||
|
||||
run_parser.add_argument(
|
||||
"--model",
|
||||
"-m",
|
||||
@@ -194,158 +190,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
|
||||
shell_parser.set_defaults(func=cmd_shell)
|
||||
|
||||
# tui command (interactive agent dashboard)
|
||||
tui_parser = subparsers.add_parser(
|
||||
"tui",
|
||||
help="Launch interactive TUI dashboard",
|
||||
description="Browse available agents and launch the terminal dashboard.",
|
||||
)
|
||||
tui_parser.add_argument(
|
||||
"--model",
|
||||
"-m",
|
||||
type=str,
|
||||
default=None,
|
||||
help="LLM model to use (any LiteLLM-compatible name)",
|
||||
)
|
||||
tui_parser.set_defaults(func=cmd_tui)
|
||||
|
||||
# code command (Hive Coder — framework agent builder)
|
||||
code_parser = subparsers.add_parser(
|
||||
"code",
|
||||
help="Launch Hive Coder to build agents",
|
||||
description="Interactive agent builder. Describe what you want and Hive Coder builds it.",
|
||||
)
|
||||
code_parser.add_argument(
|
||||
"--model",
|
||||
"-m",
|
||||
type=str,
|
||||
default=None,
|
||||
help="LLM model to use (any LiteLLM-compatible name)",
|
||||
)
|
||||
code_parser.set_defaults(func=cmd_code)
|
||||
|
||||
# sessions command group (checkpoint/resume management)
|
||||
sessions_parser = subparsers.add_parser(
|
||||
"sessions",
|
||||
help="Manage agent sessions",
|
||||
description="List, inspect, and manage agent execution sessions.",
|
||||
)
|
||||
sessions_subparsers = sessions_parser.add_subparsers(
|
||||
dest="sessions_cmd",
|
||||
help="Session management commands",
|
||||
)
|
||||
|
||||
# sessions list
|
||||
sessions_list_parser = sessions_subparsers.add_parser(
|
||||
"list",
|
||||
help="List agent sessions",
|
||||
description="List all sessions for an agent.",
|
||||
)
|
||||
sessions_list_parser.add_argument(
|
||||
"agent_path",
|
||||
type=str,
|
||||
help="Path to agent folder",
|
||||
)
|
||||
sessions_list_parser.add_argument(
|
||||
"--status",
|
||||
choices=["all", "active", "failed", "completed", "paused"],
|
||||
default="all",
|
||||
help="Filter by session status (default: all)",
|
||||
)
|
||||
sessions_list_parser.add_argument(
|
||||
"--has-checkpoints",
|
||||
action="store_true",
|
||||
help="Show only sessions with checkpoints",
|
||||
)
|
||||
sessions_list_parser.set_defaults(func=cmd_sessions_list)
|
||||
|
||||
# sessions show
|
||||
sessions_show_parser = sessions_subparsers.add_parser(
|
||||
"show",
|
||||
help="Show session details",
|
||||
description="Display detailed information about a specific session.",
|
||||
)
|
||||
sessions_show_parser.add_argument(
|
||||
"agent_path",
|
||||
type=str,
|
||||
help="Path to agent folder",
|
||||
)
|
||||
sessions_show_parser.add_argument(
|
||||
"session_id",
|
||||
type=str,
|
||||
help="Session ID to inspect",
|
||||
)
|
||||
sessions_show_parser.add_argument(
|
||||
"--json",
|
||||
action="store_true",
|
||||
help="Output as JSON",
|
||||
)
|
||||
sessions_show_parser.set_defaults(func=cmd_sessions_show)
|
||||
|
||||
# sessions checkpoints
|
||||
sessions_checkpoints_parser = sessions_subparsers.add_parser(
|
||||
"checkpoints",
|
||||
help="List session checkpoints",
|
||||
description="List all checkpoints for a session.",
|
||||
)
|
||||
sessions_checkpoints_parser.add_argument(
|
||||
"agent_path",
|
||||
type=str,
|
||||
help="Path to agent folder",
|
||||
)
|
||||
sessions_checkpoints_parser.add_argument(
|
||||
"session_id",
|
||||
type=str,
|
||||
help="Session ID",
|
||||
)
|
||||
sessions_checkpoints_parser.set_defaults(func=cmd_sessions_checkpoints)
|
||||
|
||||
# pause command
|
||||
pause_parser = subparsers.add_parser(
|
||||
"pause",
|
||||
help="Pause running session",
|
||||
description="Request graceful pause of a running agent session.",
|
||||
)
|
||||
pause_parser.add_argument(
|
||||
"agent_path",
|
||||
type=str,
|
||||
help="Path to agent folder",
|
||||
)
|
||||
pause_parser.add_argument(
|
||||
"session_id",
|
||||
type=str,
|
||||
help="Session ID to pause",
|
||||
)
|
||||
pause_parser.set_defaults(func=cmd_pause)
|
||||
|
||||
# resume command
|
||||
resume_parser = subparsers.add_parser(
|
||||
"resume",
|
||||
help="Resume session from checkpoint",
|
||||
description="Resume a paused or failed session from a checkpoint.",
|
||||
)
|
||||
resume_parser.add_argument(
|
||||
"agent_path",
|
||||
type=str,
|
||||
help="Path to agent folder",
|
||||
)
|
||||
resume_parser.add_argument(
|
||||
"session_id",
|
||||
type=str,
|
||||
help="Session ID to resume",
|
||||
)
|
||||
resume_parser.add_argument(
|
||||
"--checkpoint",
|
||||
"-c",
|
||||
type=str,
|
||||
help="Specific checkpoint ID to resume from (default: latest)",
|
||||
)
|
||||
resume_parser.add_argument(
|
||||
"--tui",
|
||||
action="store_true",
|
||||
help="Resume in TUI dashboard mode",
|
||||
)
|
||||
resume_parser.set_defaults(func=cmd_resume)
|
||||
|
||||
# setup-credentials command
|
||||
setup_creds_parser = subparsers.add_parser(
|
||||
"setup-credentials",
|
||||
@@ -399,6 +243,12 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
|
||||
action="store_true",
|
||||
help="Open dashboard in browser after server starts",
|
||||
)
|
||||
serve_parser.add_argument(
|
||||
"--verbose", "-v", action="store_true", help="Enable INFO log level"
|
||||
)
|
||||
serve_parser.add_argument(
|
||||
"--debug", action="store_true", help="Enable DEBUG log level"
|
||||
)
|
||||
serve_parser.set_defaults(func=cmd_serve)
|
||||
|
||||
# open command (serve + auto-open browser)
|
||||
@@ -436,6 +286,12 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
|
||||
default=None,
|
||||
help="LLM model for preloaded agents",
|
||||
)
|
||||
open_parser.add_argument(
|
||||
"--verbose", "-v", action="store_true", help="Enable INFO log level"
|
||||
)
|
||||
open_parser.add_argument(
|
||||
"--debug", action="store_true", help="Enable DEBUG log level"
|
||||
)
|
||||
open_parser.set_defaults(func=cmd_open)
|
||||
|
||||
|
||||
@@ -536,13 +392,15 @@ def cmd_run(args: argparse.Namespace) -> int:
|
||||
from framework.credentials.models import CredentialError
|
||||
from framework.runner import AgentRunner
|
||||
|
||||
from framework.observability import configure_logging
|
||||
|
||||
# Set logging level (quiet by default for cleaner output)
|
||||
if args.quiet:
|
||||
logging.basicConfig(level=logging.ERROR, format="%(message)s")
|
||||
configure_logging(level="ERROR")
|
||||
elif getattr(args, "verbose", False):
|
||||
logging.basicConfig(level=logging.INFO, format="%(message)s")
|
||||
configure_logging(level="INFO")
|
||||
else:
|
||||
logging.basicConfig(level=logging.WARNING, format="%(message)s")
|
||||
configure_logging(level="WARNING")
|
||||
|
||||
# Load input context
|
||||
context = {}
|
||||
@@ -577,128 +435,67 @@ def cmd_run(args: argparse.Namespace) -> int:
|
||||
)
|
||||
return 1
|
||||
|
||||
# Run the agent (with TUI or standard)
|
||||
if getattr(args, "tui", False):
|
||||
from framework.tui.app import AdenTUI
|
||||
# Standard execution
|
||||
# AgentRunner handles credential setup interactively when stdin is a TTY.
|
||||
try:
|
||||
runner = AgentRunner.load(
|
||||
args.agent_path,
|
||||
model=args.model,
|
||||
)
|
||||
except CredentialError as e:
|
||||
print(f"\n{e}", file=sys.stderr)
|
||||
return 1
|
||||
except FileNotFoundError as e:
|
||||
print(f"Error: {e}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
async def run_with_tui():
|
||||
try:
|
||||
# Load runner inside the async loop to ensure strict loop affinity
|
||||
# (only one load — avoids spawning duplicate MCP subprocesses)
|
||||
# AgentRunner handles credential setup interactively when stdin is a TTY.
|
||||
try:
|
||||
runner = AgentRunner.load(
|
||||
args.agent_path,
|
||||
model=args.model,
|
||||
)
|
||||
except CredentialError as e:
|
||||
print(f"\n{e}", file=sys.stderr)
|
||||
return
|
||||
except Exception as e:
|
||||
print(f"Error loading agent: {e}")
|
||||
return
|
||||
# Prompt before starting (allows credential updates)
|
||||
if sys.stdin.isatty() and not args.quiet:
|
||||
runner = _prompt_before_start(args.agent_path, runner, args.model)
|
||||
if runner is None:
|
||||
return 1
|
||||
|
||||
# Prompt before starting (allows credential updates)
|
||||
if sys.stdin.isatty():
|
||||
runner = _prompt_before_start(args.agent_path, runner, args.model)
|
||||
if runner is None:
|
||||
return
|
||||
|
||||
# Force setup inside the loop
|
||||
if runner._agent_runtime is None:
|
||||
try:
|
||||
runner._setup()
|
||||
except CredentialError as e:
|
||||
print(f"\n{e}", file=sys.stderr)
|
||||
return
|
||||
|
||||
# Start runtime before TUI so it's ready for user input
|
||||
if runner._agent_runtime and not runner._agent_runtime.is_running:
|
||||
await runner._agent_runtime.start()
|
||||
|
||||
app = AdenTUI(
|
||||
runner._agent_runtime,
|
||||
resume_session=getattr(args, "resume_session", None),
|
||||
resume_checkpoint=getattr(args, "checkpoint", None),
|
||||
)
|
||||
|
||||
# TUI handles execution via ChatRepl — user submits input,
|
||||
# ChatRepl calls runtime.trigger_and_wait(). No auto-launch.
|
||||
await app.run_async()
|
||||
except Exception as e:
|
||||
import traceback
|
||||
|
||||
traceback.print_exc()
|
||||
print(f"TUI error: {e}")
|
||||
|
||||
await runner.cleanup_async()
|
||||
return None
|
||||
|
||||
asyncio.run(run_with_tui())
|
||||
print("TUI session ended.")
|
||||
return 0
|
||||
else:
|
||||
# Standard execution — load runner here (not shared with TUI path)
|
||||
# AgentRunner handles credential setup interactively when stdin is a TTY.
|
||||
try:
|
||||
runner = AgentRunner.load(
|
||||
args.agent_path,
|
||||
model=args.model,
|
||||
# Load session/checkpoint state for resume (headless mode)
|
||||
session_state = None
|
||||
resume_session = getattr(args, "resume_session", None)
|
||||
checkpoint = getattr(args, "checkpoint", None)
|
||||
if resume_session:
|
||||
session_state = _load_resume_state(args.agent_path, resume_session, checkpoint)
|
||||
if session_state is None:
|
||||
print(
|
||||
f"Error: Could not load session state for {resume_session}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
except CredentialError as e:
|
||||
print(f"\n{e}", file=sys.stderr)
|
||||
return 1
|
||||
except FileNotFoundError as e:
|
||||
print(f"Error: {e}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
# Prompt before starting (allows credential updates)
|
||||
if sys.stdin.isatty() and not args.quiet:
|
||||
runner = _prompt_before_start(args.agent_path, runner, args.model)
|
||||
if runner is None:
|
||||
return 1
|
||||
|
||||
# Load session/checkpoint state for resume (headless mode)
|
||||
session_state = None
|
||||
resume_session = getattr(args, "resume_session", None)
|
||||
checkpoint = getattr(args, "checkpoint", None)
|
||||
if resume_session:
|
||||
session_state = _load_resume_state(args.agent_path, resume_session, checkpoint)
|
||||
if session_state is None:
|
||||
print(
|
||||
f"Error: Could not load session state for {resume_session}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
return 1
|
||||
if not args.quiet:
|
||||
resume_node = session_state.get("paused_at", "unknown")
|
||||
if checkpoint:
|
||||
print(f"Resuming from checkpoint: {checkpoint}")
|
||||
else:
|
||||
print(f"Resuming session: {resume_session}")
|
||||
print(f"Resume point: {resume_node}")
|
||||
print()
|
||||
|
||||
# Auto-inject user_id if the agent expects it but it's not provided
|
||||
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
|
||||
if "user_id" in entry_input_keys and context.get("user_id") is None:
|
||||
import os
|
||||
|
||||
context["user_id"] = os.environ.get("USER", "default_user")
|
||||
|
||||
if not args.quiet:
|
||||
info = runner.info()
|
||||
print(f"Agent: {info.name}")
|
||||
print(f"Goal: {info.goal_name}")
|
||||
print(f"Steps: {info.node_count}")
|
||||
print(f"Input: {json.dumps(context)}")
|
||||
print()
|
||||
print("=" * 60)
|
||||
print("Executing agent...")
|
||||
print("=" * 60)
|
||||
resume_node = session_state.get("paused_at", "unknown")
|
||||
if checkpoint:
|
||||
print(f"Resuming from checkpoint: {checkpoint}")
|
||||
else:
|
||||
print(f"Resuming session: {resume_session}")
|
||||
print(f"Resume point: {resume_node}")
|
||||
print()
|
||||
|
||||
result = asyncio.run(runner.run(context, session_state=session_state))
|
||||
# Auto-inject user_id if the agent expects it but it's not provided
|
||||
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
|
||||
if "user_id" in entry_input_keys and context.get("user_id") is None:
|
||||
import os
|
||||
|
||||
context["user_id"] = os.environ.get("USER", "default_user")
|
||||
|
||||
if not args.quiet:
|
||||
info = runner.info()
|
||||
print(f"Agent: {info.name}")
|
||||
print(f"Goal: {info.goal_name}")
|
||||
print(f"Steps: {info.node_count}")
|
||||
print(f"Input: {json.dumps(context)}")
|
||||
print()
|
||||
print("=" * 60)
|
||||
print("Executing agent...")
|
||||
print("=" * 60)
|
||||
print()
|
||||
|
||||
result = asyncio.run(runner.run(context, session_state=session_state))
|
||||
|
||||
# Format output
|
||||
output = {
|
||||
@@ -959,6 +756,17 @@ def cmd_dispatch(args: argparse.Namespace) -> int:
|
||||
if args.agents:
|
||||
# Use specific agents
|
||||
for agent_name in args.agents:
|
||||
# Guard against full paths: if the name contains path separators
|
||||
# (e.g. "exports/my_agent"), it will be doubled with agents_dir
|
||||
agent_name_path = Path(agent_name)
|
||||
if len(agent_name_path.parts) > 1:
|
||||
print(
|
||||
f"Error: --agents expects agent names, not paths. "
|
||||
f"Use: --agents {agent_name_path.name} "
|
||||
f"instead of --agents {agent_name}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
return 1
|
||||
agent_path = agents_dir / agent_name
|
||||
if not _is_valid_agent_dir(agent_path):
|
||||
print(f"Agent not found: {agent_path}", file=sys.stderr)
|
||||
@@ -1129,11 +937,9 @@ def cmd_shell(args: argparse.Namespace) -> int:
|
||||
from framework.credentials.models import CredentialError
|
||||
from framework.runner import AgentRunner
|
||||
|
||||
# Configure logging to show runtime visibility
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(message)s", # Simple format for clean output
|
||||
)
|
||||
from framework.observability import configure_logging
|
||||
|
||||
configure_logging(level="INFO")
|
||||
|
||||
agents_dir = Path(args.agents_dir)
|
||||
|
||||
@@ -1364,154 +1170,6 @@ def _get_framework_agents_dir() -> Path:
    return Path(__file__).resolve().parent.parent / "agents"


def _launch_agent_tui(
    agent_path: str | Path,
    model: str | None = None,
) -> int:
    """Load an agent and launch the TUI. Shared by cmd_tui and cmd_code."""
    from framework.credentials.models import CredentialError
    from framework.runner import AgentRunner
    from framework.tui.app import AdenTUI

    async def run_with_tui():
        # AgentRunner handles credential setup interactively when stdin is a TTY.
        try:
            runner = AgentRunner.load(
                agent_path,
                model=model,
            )
        except CredentialError as e:
            print(f"\n{e}", file=sys.stderr)
            return
        except Exception as e:
            print(f"Error loading agent: {e}")
            return

        if runner._agent_runtime is None:
            try:
                runner._setup()
            except CredentialError as e:
                print(f"\n{e}", file=sys.stderr)
                return

        if runner._agent_runtime and not runner._agent_runtime.is_running:
            await runner._agent_runtime.start()

        app = AdenTUI(runner._agent_runtime)
        try:
            await app.run_async()
        except Exception as e:
            import traceback

            traceback.print_exc()
            print(f"TUI error: {e}")

        await runner.cleanup_async()

    asyncio.run(run_with_tui())
    print("TUI session ended.")
    return 0


def cmd_tui(args: argparse.Namespace) -> int:
    """Launch the interactive TUI dashboard with in-app agent picker."""
    import logging

    logging.basicConfig(level=logging.WARNING, format="%(message)s")

    from framework.tui.app import AdenTUI

    async def run_tui():
        app = AdenTUI(
            model=args.model,
        )
        await app.run_async()

    asyncio.run(run_tui())
    print("TUI session ended.")
    return 0


def cmd_code(args: argparse.Namespace) -> int:
    """Launch Hive Coder with multi-graph support.

    Unlike ``_launch_agent_tui``, this sets up graph lifecycle tools and
    assigns ``graph_id="hive_coder"`` so the coder can load, supervise,
    and restart secondary agent graphs within the same session.
    """
    import logging

    logging.basicConfig(level=logging.WARNING, format="%(message)s")

    framework_agents_dir = _get_framework_agents_dir()
    hive_coder_path = framework_agents_dir / "hive_coder"

    if not (hive_coder_path / "agent.py").exists():
        print("Error: Hive Coder agent not found.", file=sys.stderr)
        return 1

    # Ensure framework agents dir is on sys.path for import
    fa_str = str(framework_agents_dir)
    if fa_str not in sys.path:
        sys.path.insert(0, fa_str)

    from framework.credentials.models import CredentialError
    from framework.runner import AgentRunner
    from framework.tools.session_graph_tools import register_graph_tools
    from framework.tui.app import AdenTUI

    async def run_with_tui():
        try:
            runner = AgentRunner.load(hive_coder_path, model=args.model)
        except CredentialError as e:
            print(f"\n{e}", file=sys.stderr)
            return
        except Exception as e:
            print(f"Error loading agent: {e}")
            return

        if runner._agent_runtime is None:
            try:
                runner._setup()
            except CredentialError as e:
                print(f"\n{e}", file=sys.stderr)
                return

        runtime = runner._agent_runtime

        # -- Multi-graph setup --
        # Tag the primary graph so events carry graph_id="hive_coder"
        runtime._graph_id = "hive_coder"
        runtime._active_graph_id = "hive_coder"

        # Register graph lifecycle tools (load_agent, unload_agent, etc.)
        register_graph_tools(runner._tool_registry, runtime)

        # Refresh tool schemas AND executor so streams see the new tools.
        # The executor closure references the registry dict by ref, but
        # refreshing both is robust against any copy-on-read behavior.
        runtime._tools = list(runner._tool_registry.get_tools().values())
        runtime._tool_executor = runner._tool_registry.get_executor()

        if not runtime.is_running:
            await runtime.start()

        app = AdenTUI(runtime)
        try:
            await app.run_async()
        except Exception as e:
            import traceback

            traceback.print_exc()
            print(f"TUI error: {e}")

        await runner.cleanup_async()

    asyncio.run(run_with_tui())
    print("TUI session ended.")
    return 0


def _extract_python_agent_metadata(agent_path: Path) -> tuple[str, str]:
    """Extract name and description from a Python-based agent's config.py.

@@ -1864,56 +1522,6 @@ def _interactive_multi(agents_dir: Path) -> int:
    return 0


def cmd_sessions_list(args: argparse.Namespace) -> int:
    """List agent sessions."""
    print("⚠ Sessions list command not yet implemented")
    print("This will be available once checkpoint infrastructure is complete.")
    print(f"\nAgent: {args.agent_path}")
    print(f"Status filter: {args.status}")
    print(f"Has checkpoints: {args.has_checkpoints}")
    return 1


def cmd_sessions_show(args: argparse.Namespace) -> int:
    """Show detailed session information."""
    print("⚠ Session show command not yet implemented")
    print("This will be available once checkpoint infrastructure is complete.")
    print(f"\nAgent: {args.agent_path}")
    print(f"Session: {args.session_id}")
    return 1


def cmd_sessions_checkpoints(args: argparse.Namespace) -> int:
    """List checkpoints for a session."""
    print("⚠ Session checkpoints command not yet implemented")
    print("This will be available once checkpoint infrastructure is complete.")
    print(f"\nAgent: {args.agent_path}")
    print(f"Session: {args.session_id}")
    return 1


def cmd_pause(args: argparse.Namespace) -> int:
    """Pause a running session."""
    print("⚠ Pause command not yet implemented")
    print("This will be available once executor pause integration is complete.")
    print(f"\nAgent: {args.agent_path}")
    print(f"Session: {args.session_id}")
    return 1


def cmd_resume(args: argparse.Namespace) -> int:
    """Resume a session from checkpoint."""
    print("⚠ Resume command not yet implemented")
    print("This will be available once checkpoint resume integration is complete.")
    print(f"\nAgent: {args.agent_path}")
    print(f"Session: {args.session_id}")
    if args.checkpoint:
        print(f"Checkpoint: {args.checkpoint}")
    if args.tui:
        print("Mode: TUI")
    return 1


def cmd_setup_credentials(args: argparse.Namespace) -> int:
    """Interactive credential setup for an agent."""
    from framework.credentials.setup import CredentialSetupSession
@@ -2037,10 +1645,12 @@ def cmd_serve(args: argparse.Namespace) -> int:

    from framework.server.app import create_app

    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    )
    from framework.observability import configure_logging

    if getattr(args, "debug", False):
        configure_logging(level="DEBUG")
    else:
        configure_logging(level="INFO")

    model = getattr(args, "model", None)
    app = create_app(model=model)

@@ -68,6 +68,7 @@ class MCPClient:
        self._read_stream = None
        self._write_stream = None
        self._stdio_context = None  # Context manager for stdio_client
        self._errlog_handle = None  # Track errlog file handle for cleanup
        self._http_client: httpx.Client | None = None
        self._tools: dict[str, MCPTool] = {}
        self._connected = False
@@ -200,7 +201,8 @@ class MCPClient:
        if os.name == "nt":
            errlog = sys.stderr
        else:
            errlog = open(os.devnull, "w")  # noqa: SIM115
            self._errlog_handle = open(os.devnull, "w")
            errlog = self._errlog_handle
        self._stdio_context = stdio_client(server_params, errlog=errlog)
        (
            self._read_stream,
@@ -475,6 +477,15 @@ class MCPClient:
        finally:
            self._stdio_context = None

        # Third: close errlog file handle if we opened one
        if self._errlog_handle is not None:
            try:
                self._errlog_handle.close()
            except Exception as e:
                logger.debug(f"Error closing errlog handle: {e}")
            finally:
                self._errlog_handle = None

    def disconnect(self) -> None:
        """Disconnect from the MCP server."""
        # Clean up persistent STDIO connection
@@ -545,6 +556,7 @@ class MCPClient:
        self._write_stream = None
        self._loop = None
        self._loop_thread = None
        self._errlog_handle = None

        # Clean up HTTP client
        if self._http_client:

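The MCPClient hunks above replace the fire-and-forget `open(os.devnull, "w")` with a handle tracked on the instance so `disconnect()` can close it instead of leaking it. A minimal standalone sketch of that track-and-close pattern (the `StdioTransport` name is hypothetical, not this codebase's API):

```python
import os


class StdioTransport:
    """Sketch: keep the opened errlog handle so cleanup can close it."""

    def __init__(self):
        self._errlog_handle = None

    def connect(self):
        # On POSIX, silence the child's stderr by pointing it at /dev/null;
        # stash the handle instead of dropping the reference.
        self._errlog_handle = open(os.devnull, "w")
        return self._errlog_handle

    def disconnect(self):
        # Close exactly once, and always clear the slot even if close() raises.
        if self._errlog_handle is not None:
            try:
                self._errlog_handle.close()
            except Exception:
                pass
            finally:
                self._errlog_handle = None


t = StdioTransport()
h = t.connect()
t.disconnect()
assert h.closed and t._errlog_handle is None
```

The `try/finally` mirrors the diff: the slot is reset to `None` even when `close()` fails, so a second `disconnect()` is a no-op.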
@@ -9,7 +9,7 @@ from datetime import UTC
from pathlib import Path
from typing import TYPE_CHECKING, Any

from framework.config import get_hive_config, get_preferred_model
from framework.config import get_hive_config, get_max_context_tokens, get_preferred_model
from framework.credentials.validation import (
    ensure_credential_key_env as _ensure_credential_key_env,
)
@@ -517,6 +517,41 @@ def get_codex_account_id() -> str | None:
    return None


# ---------------------------------------------------------------------------
# Kimi Code subscription token helpers
# ---------------------------------------------------------------------------


def get_kimi_code_token() -> str | None:
    """Get the API key from a Kimi Code CLI installation.

    Reads the API key from ``~/.kimi/config.toml``, which is created when
    the user runs ``kimi /login`` in the Kimi Code CLI.

    Returns:
        The API key if available, None otherwise.
    """
    import tomllib

    config_path = Path.home() / ".kimi" / "config.toml"
    if not config_path.exists():
        return None

    try:
        with open(config_path, "rb") as f:
            config = tomllib.load(f)
        providers = config.get("providers", {})
        # kimi-cli stores credentials under providers.kimi-for-coding
        for provider_cfg in providers.values():
            if isinstance(provider_cfg, dict):
                key = provider_cfg.get("api_key")
                if key:
                    return key
    except Exception:
        pass
    return None


@dataclass
class AgentInfo:
    """Information about an exported agent."""
@@ -891,10 +926,31 @@ class AgentRunner:

        if agent_config and hasattr(agent_config, "max_tokens"):
            max_tokens = agent_config.max_tokens
            logger.info(
                "Agent default_config overrides max_tokens: %d (configuration.json value ignored)",
                max_tokens,
            )
        else:
            hive_config = get_hive_config()
            max_tokens = hive_config.get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)

        # Resolve max_context_tokens with priority:
        # 1. agent loop_config["max_context_tokens"] (explicit, wins silently)
        # 2. agent default_config.max_context_tokens (logged)
        # 3. configuration.json llm.max_context_tokens
        # 4. hardcoded default (32_000)
        agent_loop_config: dict = dict(getattr(agent_module, "loop_config", {}))
        if "max_context_tokens" not in agent_loop_config:
            if agent_config and hasattr(agent_config, "max_context_tokens"):
                agent_loop_config["max_context_tokens"] = agent_config.max_context_tokens
                logger.info(
                    "Agent default_config overrides max_context_tokens: %d"
                    " (configuration.json value ignored)",
                    agent_config.max_context_tokens,
                )
            else:
                agent_loop_config["max_context_tokens"] = get_max_context_tokens()

        # Read intro_message from agent metadata (shown on TUI load)
        agent_metadata = getattr(agent_module, "metadata", None)
        intro_message = ""
@@ -914,7 +970,7 @@ class AgentRunner:
            "nodes": nodes,
            "edges": edges,
            "max_tokens": max_tokens,
            "loop_config": getattr(agent_module, "loop_config", {}),
            "loop_config": agent_loop_config,
        }
        # Only pass optional fields if explicitly defined by the agent module
        conversation_mode = getattr(agent_module, "conversation_mode", None)
@@ -1104,6 +1160,7 @@ class AgentRunner:
        llm_config = config.get("llm", {})
        use_claude_code = llm_config.get("use_claude_code_subscription", False)
        use_codex = llm_config.get("use_codex_subscription", False)
        use_kimi_code = llm_config.get("use_kimi_code_subscription", False)
        api_base = llm_config.get("api_base")

        api_key = None
@@ -1119,6 +1176,12 @@ class AgentRunner:
            if not api_key:
                print("Warning: Codex subscription configured but no token found.")
                print("Run 'codex' to authenticate, then try again.")
        elif use_kimi_code:
            # Get API key from Kimi Code CLI config (~/.kimi/config.toml)
            api_key = get_kimi_code_token()
            if not api_key:
                print("Warning: Kimi Code subscription configured but no key found.")
                print("Run 'kimi /login' to authenticate, then try again.")

        if api_key and use_claude_code:
            # Use litellm's built-in Anthropic OAuth support.
@@ -1149,6 +1212,14 @@ class AgentRunner:
                store=False,
                allowed_openai_params=["store"],
            )
        elif api_key and use_kimi_code:
            # Kimi Code subscription uses the Kimi coding API (OpenAI-compatible).
            # The api_base is set automatically by LiteLLMProvider for kimi/ models.
            self._llm = LiteLLMProvider(
                model=self.model,
                api_key=api_key,
                api_base=api_base,
            )
        else:
            # Local models (e.g. Ollama) don't need an API key
            if self._is_local_model(self.model):
@@ -1314,6 +1385,8 @@ class AgentRunner:
            return "TOGETHER_API_KEY"
        elif model_lower.startswith("minimax/") or model_lower.startswith("minimax-"):
            return "MINIMAX_API_KEY"
        elif model_lower.startswith("kimi/"):
            return "KIMI_API_KEY"
        else:
            # Default: assume OpenAI-compatible
            return "OPENAI_API_KEY"
@@ -1334,6 +1407,8 @@ class AgentRunner:
            cred_id = "anthropic"
        elif model_lower.startswith("minimax/") or model_lower.startswith("minimax-"):
            cred_id = "minimax"
        elif model_lower.startswith("kimi/"):
            cred_id = "kimi"
        # Add more mappings as providers are added to LLM_CREDENTIALS

        if cred_id is None:

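The two new `kimi/` branches above follow the same prefix-dispatch shape as the existing minimax mapping: lowercase the model string, test prefixes in order, fall through to an OpenAI-compatible default. A reduced sketch of the env-var side (model strings here are illustrative only):

```python
def credential_env_var(model: str) -> str:
    """Sketch of the prefix-based env-var mapping from the hunk above."""
    model_lower = model.lower()
    if model_lower.startswith("minimax/") or model_lower.startswith("minimax-"):
        return "MINIMAX_API_KEY"
    if model_lower.startswith("kimi/"):
        return "KIMI_API_KEY"
    # Default: assume OpenAI-compatible
    return "OPENAI_API_KEY"


assert credential_env_var("kimi/some-model") == "KIMI_API_KEY"
assert credential_env_var("MiniMax-Text") == "MINIMAX_API_KEY"
assert credential_env_var("gpt-4o") == "OPENAI_API_KEY"
```

Note the diff adds `kimi/` (with slash) only, unlike minimax which also matches a `minimax-` dash prefix; the sketch preserves that asymmetry.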
@@ -349,7 +349,7 @@ class AgentRuntime:
                return
            # Skip events originating from this graph's own
            # executions (e.g. guardian should not fire on
            # hive_coder failures — only secondary graphs).
            # queen failures — only secondary graphs).
            if _exclude_own and event.graph_id == self._graph_id:
                return
            ep_spec = self._entry_points.get(entry_point_id)
@@ -1531,6 +1531,11 @@ class AgentRuntime:
        for executor in stream._active_executors.values():
            for node_id, node in executor.node_registry.items():
                if getattr(node, "_awaiting_input", False):
                    # Skip escalation receivers — those are handled
                    # by the queen via inject_worker_message(), not
                    # by the user directly.
                    if ":escalation:" in node_id:
                        continue
                    return node_id, graph_id
        return None, None


@@ -123,7 +123,7 @@ class EventType(StrEnum):
    # Custom events
    CUSTOM = "custom"

    # Escalation (agent requests handoff to hive_coder)
    # Escalation (agent requests handoff to queen)
    ESCALATION_REQUESTED = "escalation_requested"

    # Worker health monitoring (judge → queen → operator)
@@ -137,6 +137,12 @@ class EventType(StrEnum):
    WORKER_LOADED = "worker_loaded"
    CREDENTIALS_REQUIRED = "credentials_required"

    # Draft graph (planning phase — lightweight graph preview)
    DRAFT_GRAPH_UPDATED = "draft_graph_updated"

    # Flowchart map updated (after reconciliation with runtime graph)
    FLOWCHART_MAP_UPDATED = "flowchart_map_updated"

    # Queen phase changes (building <-> staging <-> running)
    QUEEN_PHASE_CHANGED = "queen_phase_changed"

@@ -616,6 +622,7 @@ class EventBus:
        model: str,
        input_tokens: int,
        output_tokens: int,
        cached_tokens: int = 0,
        execution_id: str | None = None,
        iteration: int | None = None,
    ) -> None:
@@ -625,6 +632,7 @@ class EventBus:
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cached_tokens": cached_tokens,
        }
        if iteration is not None:
            data["iteration"] = iteration
@@ -722,16 +730,23 @@ class EventBus:
        prompt: str = "",
        execution_id: str | None = None,
        options: list[str] | None = None,
        questions: list[dict] | None = None,
    ) -> None:
        """Emit client input requested event (client_facing=True nodes).

        Args:
            options: Optional predefined choices for the user (1-3 items).
                The frontend appends an "Other" free-text option automatically.
                The frontend appends an "Other" free-text option
                automatically.
            questions: Optional list of question dicts for multi-question
                batches (from ask_user_multiple). Each dict has id,
                prompt, and optional options.
        """
        data: dict[str, Any] = {"prompt": prompt}
        if options:
            data["options"] = options
        if questions:
            data["questions"] = questions
        await self.publish(
            AgentEvent(
                type=EventType.CLIENT_INPUT_REQUESTED,
@@ -976,7 +991,7 @@ class EventBus:
        context: str = "",
        execution_id: str | None = None,
    ) -> None:
        """Emit escalation requested event (agent wants hive_coder)."""
        """Emit escalation requested event (agent wants queen)."""
        await self.publish(
            AgentEvent(
                type=EventType.ESCALATION_REQUESTED,

@@ -9,6 +9,7 @@ Each stream has:

import asyncio
import logging
import os
import time
import uuid
from collections import OrderedDict
@@ -240,6 +241,7 @@ class ExecutionStream:
        self._active_executions: dict[str, ExecutionContext] = {}
        self._execution_tasks: dict[str, asyncio.Task] = {}
        self._active_executors: dict[str, GraphExecutor] = {}
        self._cancel_reasons: dict[str, str] = {}
        self._execution_results: OrderedDict[str, ExecutionResult] = OrderedDict()
        self._execution_result_times: dict[str, float] = {}
        self._completion_events: dict[str, asyncio.Event] = {}
@@ -464,7 +466,7 @@ class ExecutionStream:
                node.signal_shutdown()
                if hasattr(node, "cancel_current_turn"):
                    node.cancel_current_turn()
            await self.cancel_execution(eid)
            await self.cancel_execution(eid, reason="Restarted with new execution")

        # When resuming, reuse the original session ID so the execution
        # continues in the same session directory instead of creating a new one.
@@ -801,19 +803,20 @@ class ExecutionStream:
            # Emit SSE event so the frontend knows the execution stopped.
            # The executor does NOT emit on CancelledError, so there is no
            # risk of double-emitting.
            cancel_reason = self._cancel_reasons.pop(execution_id, "Execution cancelled")
            if self._scoped_event_bus:
                if has_result and result.paused_at:
                    await self._scoped_event_bus.emit_execution_paused(
                        stream_id=self.stream_id,
                        node_id=result.paused_at,
                        reason="Execution cancelled",
                        reason=cancel_reason,
                        execution_id=execution_id,
                    )
                else:
                    await self._scoped_event_bus.emit_execution_failed(
                        stream_id=self.stream_id,
                        execution_id=execution_id,
                        error="Execution cancelled",
                        error=cancel_reason,
                        correlation_id=ctx.correlation_id,
                    )

@@ -961,6 +964,9 @@ class ExecutionStream:
        if error:
            state.result.error = error

        # Stamp the owning process ID for cross-process stale detection
        state.pid = os.getpid()

        # Write state.json
        await self._session_store.write_state(execution_id, state)
        logger.debug(f"Wrote state.json for session {execution_id} (status={status})")
@@ -1054,18 +1060,24 @@ class ExecutionStream:
        """Get execution context."""
        return self._active_executions.get(execution_id)

    async def cancel_execution(self, execution_id: str) -> bool:
    async def cancel_execution(self, execution_id: str, *, reason: str | None = None) -> bool:
        """
        Cancel a running execution.

        Args:
            execution_id: Execution to cancel
            reason: Human-readable reason for the cancellation (e.g.
                "Stopped by queen", "User requested pause"). If not
                provided, defaults to "Execution cancelled".

        Returns:
            True if cancelled, False if not found
        """
        task = self._execution_tasks.get(execution_id)
        if task and not task.done():
            # Store the reason so the CancelledError handler can use it
            # when emitting the pause/fail event.
            self._cancel_reasons[execution_id] = reason or "Execution cancelled"
            task.cancel()
            # Wait briefly for the task to finish. Don't block indefinitely —
            # the task may be stuck in a long LLM API call that doesn't

@@ -134,6 +134,9 @@ class SessionState(BaseModel):
    # Input data (for debugging/replay)
    input_data: dict[str, Any] = Field(default_factory=dict)

    # Process ID of the owning process (for cross-process stale session detection)
    pid: int | None = None

    # Isolation level (from ExecutionContext)
    isolation_level: str = "shared"


@@ -1,36 +0,0 @@
"""Backward-compatibility shim.

The primary implementation is now in ``session_manager.py``.
This module re-exports ``SessionManager`` as ``AgentManager`` and
keeps ``AgentSlot`` for test compatibility.
"""

import asyncio
from dataclasses import dataclass
from pathlib import Path
from typing import Any

from framework.server.session_manager import Session, SessionManager  # noqa: F401


@dataclass
class AgentSlot:
    """Legacy data class — kept for test compatibility only.

    New code should use ``Session`` from ``session_manager``.
    """

    id: str
    agent_path: Path
    runner: Any
    runtime: Any
    info: Any
    loaded_at: float
    queen_executor: Any = None
    queen_task: asyncio.Task | None = None
    judge_task: asyncio.Task | None = None
    escalation_sub: str | None = None


# Backward compat alias
AgentManager = SessionManager
@@ -0,0 +1,331 @@
"""Queen orchestrator — builds and runs the queen executor.

Extracted from SessionManager._start_queen() to keep session management
and queen orchestration concerns separate.
"""

from __future__ import annotations

import asyncio
import logging
from pathlib import Path
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from framework.server.session_manager import Session

logger = logging.getLogger(__name__)


async def create_queen(
    session: Session,
    session_manager: Any,
    worker_identity: str | None,
    queen_dir: Path,
    initial_prompt: str | None = None,
) -> asyncio.Task:
    """Build the queen executor and return the running asyncio task.

    Handles tool registration, phase-state initialization, prompt
    composition, persona hook setup, graph preparation, and the queen
    event loop.
    """
    from framework.agents.queen.agent import (
        queen_goal,
        queen_graph as _queen_graph,
    )
    from framework.agents.queen.nodes import (
        _QUEEN_BUILDING_TOOLS,
        _QUEEN_PLANNING_TOOLS,
        _QUEEN_RUNNING_TOOLS,
        _QUEEN_STAGING_TOOLS,
        _appendices,
        _building_knowledge,
        _planning_knowledge,
        _queen_behavior_always,
        _queen_behavior_building,
        _queen_behavior_planning,
        _queen_behavior_running,
        _queen_behavior_staging,
        _queen_identity_building,
        _queen_identity_planning,
        _queen_identity_running,
        _queen_identity_staging,
        _queen_phase_7,
        _queen_style,
        _queen_tools_building,
        _queen_tools_planning,
        _queen_tools_running,
        _queen_tools_staging,
        _shared_building_knowledge,
    )
    from framework.agents.queen.nodes.thinking_hook import select_expert_persona
    from framework.graph.event_loop_node import HookContext, HookResult
    from framework.graph.executor import GraphExecutor
    from framework.runner.tool_registry import ToolRegistry
    from framework.runtime.core import Runtime
    from framework.runtime.event_bus import AgentEvent, EventType
    from framework.tools.queen_lifecycle_tools import (
        QueenPhaseState,
        register_queen_lifecycle_tools,
    )

    hive_home = Path.home() / ".hive"

    # ---- Tool registry ------------------------------------------------
    queen_registry = ToolRegistry()
    import framework.agents.queen as _queen_pkg

    queen_pkg_dir = Path(_queen_pkg.__file__).parent
    mcp_config = queen_pkg_dir / "mcp_servers.json"
    if mcp_config.exists():
        try:
            queen_registry.load_mcp_config(mcp_config)
            logger.info("Queen: loaded MCP tools from %s", mcp_config)
        except Exception:
            logger.warning("Queen: MCP config failed to load", exc_info=True)

    # ---- Phase state --------------------------------------------------
    initial_phase = "staging" if worker_identity else "planning"
    phase_state = QueenPhaseState(phase=initial_phase, event_bus=session.event_bus)
    session.phase_state = phase_state

    # ---- Lifecycle tools (always registered) --------------------------
    register_queen_lifecycle_tools(
        queen_registry,
        session=session,
        session_id=session.id,
        session_manager=session_manager,
        manager_session_id=session.id,
        phase_state=phase_state,
    )

    # ---- Monitoring tools (only when worker is loaded) ----------------
    if session.worker_runtime:
        from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools

        register_worker_monitoring_tools(
            queen_registry,
            session.event_bus,
            session.worker_path,
            stream_id="queen",
            worker_graph_id=session.worker_runtime._graph_id,
        )

    queen_tools = list(queen_registry.get_tools().values())
    queen_tool_executor = queen_registry.get_executor()

    # ---- Partition tools by phase ------------------------------------
    planning_names = set(_QUEEN_PLANNING_TOOLS)
    building_names = set(_QUEEN_BUILDING_TOOLS)
    staging_names = set(_QUEEN_STAGING_TOOLS)
    running_names = set(_QUEEN_RUNNING_TOOLS)

    registered_names = {t.name for t in queen_tools}
    missing_building = building_names - registered_names
    if missing_building:
        logger.warning(
            "Queen: %d/%d building tools NOT registered: %s",
            len(missing_building),
            len(building_names),
            sorted(missing_building),
        )
    logger.info("Queen: registered tools: %s", sorted(registered_names))

    phase_state.planning_tools = [t for t in queen_tools if t.name in planning_names]
    phase_state.building_tools = [t for t in queen_tools if t.name in building_names]
    phase_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
    phase_state.running_tools = [t for t in queen_tools if t.name in running_names]

    # ---- Cross-session memory ----------------------------------------
    from framework.agents.queen.queen_memory import seed_if_missing

    seed_if_missing()

    # ---- Compose phase-specific prompts ------------------------------
    _orig_node = _queen_graph.nodes[0]

    if worker_identity is None:
        worker_identity = (
            "\n\n# Worker Profile\n"
            "No worker agent loaded. You are operating independently.\n"
            "Handle all tasks directly using your coding tools."
        )

    _planning_body = (
        _queen_style
        + _shared_building_knowledge
        + _queen_tools_planning
        + _queen_behavior_always
        + _queen_behavior_planning
        + _planning_knowledge
        + worker_identity
    )
    phase_state.prompt_planning = _queen_identity_planning + _planning_body

    _building_body = (
        _queen_style
        + _shared_building_knowledge
        + _queen_tools_building
        + _queen_behavior_always
        + _queen_behavior_building
        + _building_knowledge
        + _queen_phase_7
        + _appendices
        + worker_identity
    )
    phase_state.prompt_building = _queen_identity_building + _building_body
    phase_state.prompt_staging = (
        _queen_identity_staging
        + _queen_style
        + _queen_tools_staging
        + _queen_behavior_always
        + _queen_behavior_staging
        + worker_identity
    )
    phase_state.prompt_running = (
        _queen_identity_running
        + _queen_style
        + _queen_tools_running
        + _queen_behavior_always
        + _queen_behavior_running
        + worker_identity
    )

    # ---- Persona hook ------------------------------------------------
    _session_llm = session.llm
    _session_event_bus = session.event_bus

    async def _persona_hook(ctx: HookContext) -> HookResult | None:
        persona = await select_expert_persona(ctx.trigger or "", _session_llm)
        if not persona:
            return None
        if _session_event_bus is not None:
            await _session_event_bus.publish(
                AgentEvent(
                    type=EventType.QUEEN_PERSONA_SELECTED,
                    stream_id="queen",
                    data={"persona": persona},
                )
            )
        return HookResult(system_prompt=persona + "\n\n" + phase_state.get_current_prompt())

    # ---- Graph preparation -------------------------------------------
    initial_prompt_text = phase_state.get_current_prompt()

    registered_tool_names = set(queen_registry.get_tools().keys())
    declared_tools = _orig_node.tools or []
    available_tools = [t for t in declared_tools if t in registered_tool_names]

    node_updates: dict = {
        "system_prompt": initial_prompt_text,
    }
    if set(available_tools) != set(declared_tools):
        missing = sorted(set(declared_tools) - registered_tool_names)
        if missing:
            logger.warning("Queen: tools not available: %s", missing)
        node_updates["tools"] = available_tools

    adjusted_node = _orig_node.model_copy(update=node_updates)
    _queen_loop_config = {
        **(_queen_graph.loop_config or {}),
        "hooks": {"session_start": [_persona_hook]},
    }
    queen_graph = _queen_graph.model_copy(
        update={"nodes": [adjusted_node], "loop_config": _queen_loop_config}
    )

    # ---- Queen event loop --------------------------------------------
    queen_runtime = Runtime(hive_home / "queen")

    async def _queen_loop():
        try:
            executor = GraphExecutor(
                runtime=queen_runtime,
                llm=session.llm,
                tools=queen_tools,
                tool_executor=queen_tool_executor,
                event_bus=session.event_bus,
                stream_id="queen",
                storage_path=queen_dir,
                loop_config=_queen_loop_config,
                execution_id=session.id,
                dynamic_tools_provider=phase_state.get_current_tools,
                dynamic_prompt_provider=phase_state.get_current_prompt,
            )
            session.queen_executor = executor

            # Wire inject_notification so phase switches notify the queen LLM
            async def _inject_phase_notification(content: str) -> None:
                node = executor.node_registry.get("queen")
                if node is not None and hasattr(node, "inject_event"):
                    await node.inject_event(content)

            phase_state.inject_notification = _inject_phase_notification

            # Auto-switch to staging when worker execution finishes
            async def _on_worker_done(event):
                if event.stream_id == "queen":
                    return
                if phase_state.phase == "running":
                    if event.type == EventType.EXECUTION_COMPLETED:
                        output = event.data.get("output", {})
                        output_summary = ""
                        if output:
                            for key, value in output.items():
                                val_str = str(value)
                                if len(val_str) > 200:
                                    val_str = val_str[:200] + "..."
                                output_summary += f"\n {key}: {val_str}"
                        _out = output_summary or " (no output keys set)"
                        notification = (
                            "[WORKER_TERMINAL] Worker finished successfully.\n"
                            f"Output:{_out}\n"
                            "Report this to the user. "
                            "Ask if they want to continue with another run."
                        )
                    else:  # EXECUTION_FAILED
                        error = event.data.get("error", "Unknown error")
                        notification = (
                            "[WORKER_TERMINAL] Worker failed.\n"
                            f"Error: {error}\n"
                            "Report this to the user and help them troubleshoot."
                        )

                    node = executor.node_registry.get("queen")
                    if node is not None and hasattr(node, "inject_event"):
                        await node.inject_event(notification)
|
||||
|
||||
await phase_state.switch_to_staging(source="auto")
|
||||
|
||||
session.event_bus.subscribe(
|
||||
event_types=[EventType.EXECUTION_COMPLETED, EventType.EXECUTION_FAILED],
|
||||
handler=_on_worker_done,
|
||||
)
|
||||
session_manager._subscribe_worker_handoffs(session, executor)
|
||||
|
||||
logger.info(
|
||||
"Queen starting in %s phase with %d tools: %s",
|
||||
phase_state.phase,
|
||||
len(phase_state.get_current_tools()),
|
||||
[t.name for t in phase_state.get_current_tools()],
|
||||
)
|
||||
result = await executor.execute(
|
||||
graph=queen_graph,
|
||||
goal=queen_goal,
|
||||
input_data={"greeting": initial_prompt or "Session started."},
|
||||
session_state={"resume_session_id": session.id},
|
||||
)
|
||||
if result.success:
|
||||
logger.warning("Queen executor returned (should be forever-alive)")
|
||||
else:
|
||||
logger.error(
|
||||
"Queen executor failed: %s",
|
||||
result.error or "(no error message)",
|
||||
)
|
||||
except Exception:
|
||||
logger.error("Queen conversation crashed", exc_info=True)
|
||||
finally:
|
||||
session.queen_executor = None
|
||||
|
||||
return asyncio.create_task(_queen_loop())
|
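The worker-done handler above clips each output value to 200 characters before injecting the notification. A stdlib-only sketch of that summarization step (the helper name `summarize_output` is ours, not the framework's):

```python
def summarize_output(output: dict, limit: int = 200) -> str:
    """Mirror the truncation logic in _on_worker_done: stringify each
    value and clip it to `limit` chars so the notification stays small."""
    summary = ""
    for key, value in output.items():
        val_str = str(value)
        if len(val_str) > limit:
            val_str = val_str[:limit] + "..."
        summary += f"\n  {key}: {val_str}"
    return summary or " (no output keys set)"
```

The fallback string keeps the notification well-formed even when the worker set no output keys at all.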
@@ -40,6 +40,7 @@ DEFAULT_EVENT_TYPES = [
     EventType.CREDENTIALS_REQUIRED,
     EventType.SUBAGENT_REPORT,
     EventType.QUEEN_PHASE_CHANGED,
+    EventType.DRAFT_GRAPH_UPDATED,
 ]

 # Keepalive interval in seconds
@@ -347,7 +347,7 @@ async def handle_pause(request: web.Request) -> web.Response:

     for exec_id in list(stream.active_execution_ids):
         try:
-            ok = await stream.cancel_execution(exec_id)
+            ok = await stream.cancel_execution(exec_id, reason="Execution paused by user")
             if ok:
                 cancelled.append(exec_id)
         except Exception:
@@ -357,8 +357,8 @@ async def handle_pause(request: web.Request) -> web.Response:
     runtime.pause_timers()

     # Switch to staging (agent still loaded, ready to re-run)
-    if session.mode_state is not None:
-        await session.mode_state.switch_to_staging(source="frontend")
+    if session.phase_state is not None:
+        await session.phase_state.switch_to_staging(source="frontend")

     return web.json_response(
         {
@@ -400,7 +400,9 @@ async def handle_stop(request: web.Request) -> web.Response:
         if hasattr(node, "cancel_current_turn"):
             node.cancel_current_turn()

-    cancelled = await stream.cancel_execution(execution_id)
+    cancelled = await stream.cancel_execution(
+        execution_id, reason="Execution stopped by user"
+    )
     if cancelled:
         # Cancel queen's in-progress LLM turn
         if session.queen_executor:
@@ -234,8 +234,69 @@ async def handle_node_tools(request: web.Request) -> web.Response:
     return web.json_response({"tools": tools_out})


+async def handle_draft_graph(request: web.Request) -> web.Response:
+    """Return the current draft graph from the planning phase (if any)."""
+    session, err = resolve_session(request)
+    if err:
+        return err
+
+    phase_state = getattr(session, "phase_state", None)
+    if phase_state is None or phase_state.draft_graph is None:
+        return web.json_response({"draft": None})
+
+    return web.json_response({"draft": phase_state.draft_graph})
+
+
+async def handle_flowchart_map(request: web.Request) -> web.Response:
+    """Return the flowchart→runtime node mapping and the original (pre-dissolution) draft.
+
+    Available after confirm_and_build() dissolves decision nodes, or loaded
+    from the agent's flowchart.json file, or synthesized from the runtime graph.
+    """
+    session, err = resolve_session(request)
+    if err:
+        return err
+
+    phase_state = getattr(session, "phase_state", None)
+
+    # Fast path: already in memory
+    if phase_state is not None and phase_state.original_draft_graph is not None:
+        return web.json_response({
+            "map": phase_state.flowchart_map,
+            "original_draft": phase_state.original_draft_graph,
+        })
+
+    # Try loading from flowchart.json in the agent folder
+    worker_path = getattr(session, "worker_path", None)
+    if worker_path is not None:
+        from pathlib import Path
+
+        target = Path(worker_path) / "flowchart.json"
+        if target.is_file():
+            try:
+                data = json.loads(target.read_text(encoding="utf-8"))
+                original_draft = data.get("original_draft")
+                fmap = data.get("flowchart_map")
+                # Cache in phase_state for future requests
+                if phase_state is not None and original_draft:
+                    phase_state.original_draft_graph = original_draft
+                    phase_state.flowchart_map = fmap
+                return web.json_response({
+                    "map": fmap,
+                    "original_draft": original_draft,
+                })
+            except Exception:
+                logger.warning("Failed to read flowchart.json from %s", worker_path)
+
+    return web.json_response({"map": None, "original_draft": None})
+
+
 def register_routes(app: web.Application) -> None:
     """Register graph/node inspection routes."""
+    # Draft graph (planning phase — visual only, no loaded worker required)
+    app.router.add_get("/api/sessions/{session_id}/draft-graph", handle_draft_graph)
+    # Flowchart map (post-dissolution — maps runtime nodes to original draft nodes)
+    app.router.add_get("/api/sessions/{session_id}/flowchart-map", handle_flowchart_map)
     # Session-primary routes
     app.router.add_get("/api/sessions/{session_id}/graphs/{graph_id}/nodes", handle_list_nodes)
     app.router.add_get(
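The fallback in `handle_flowchart_map` above is a read-through cache over a JSON file: memory first, then disk, else an empty response. A stdlib-only sketch of that pattern, with hypothetical names (`FlowchartCache`, `load_map`) standing in for the session/phase-state plumbing:

```python
import json
from pathlib import Path


class FlowchartCache:
    """Read-through cache mirroring handle_flowchart_map's fallback:
    in-memory copy first, then flowchart.json on disk, else None/None."""

    def __init__(self) -> None:
        self.original_draft = None
        self.flowchart_map = None

    def load_map(self, agent_dir: Path) -> dict:
        # Fast path: already in memory
        if self.original_draft is not None:
            return {"map": self.flowchart_map, "original_draft": self.original_draft}
        target = agent_dir / "flowchart.json"
        if target.is_file():
            data = json.loads(target.read_text(encoding="utf-8"))
            if data.get("original_draft"):
                # Cache for future requests
                self.original_draft = data["original_draft"]
                self.flowchart_map = data.get("flowchart_map")
            return {
                "map": data.get("flowchart_map"),
                "original_draft": data.get("original_draft"),
            }
        return {"map": None, "original_draft": None}
```

After the first successful disk read, later calls are served from memory even if the file is removed, which matches the caching branch in the handler.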
@@ -61,7 +61,7 @@ def _session_to_live_dict(session) -> dict:
         "loaded_at": session.loaded_at,
         "uptime_seconds": round(time.time() - session.loaded_at, 1),
         "intro_message": getattr(session.runner, "intro_message", "") or "",
-        "queen_phase": phase_state.phase if phase_state else "building",
+        "queen_phase": phase_state.phase if phase_state else "planning",
     }

@@ -731,7 +731,7 @@ async def handle_delete_history_session(request: web.Request) -> web.Response:

 async def handle_discover(request: web.Request) -> web.Response:
     """GET /api/discover — discover agents from filesystem."""
-    from framework.tui.screens.agent_picker import discover_agents
+    from framework.agents.discovery import discover_agents

     manager = _get_manager(request)
     loaded_paths = {str(s.worker_path) for s in manager.list_sessions() if s.worker_path}

@@ -46,6 +46,8 @@ class Session:
     judge_task: asyncio.Task | None = None
     escalation_sub: str | None = None
     worker_handoff_sub: str | None = None
+    # Memory consolidation subscription (fires on CONTEXT_COMPACTED)
+    memory_consolidation_sub: str | None = None
     # Session directory resumption:
     # When set, _start_queen writes queen conversations to this existing session's
     # directory instead of creating a new one. This lets cold-restores accumulate
@@ -276,11 +278,20 @@ class SessionManager:
         When a new runtime starts, any on-disk session still marked 'active'
         is from a process that no longer exists. 'Paused' sessions are left
         intact so they remain resumable.
+
+        Two-layer protection against corrupting live sessions:
+        1. In-memory: skip any session ID currently tracked in self._sessions
+           (guaranteed alive in this process).
+        2. PID validation: if state.json contains a ``pid`` field, check whether
+           that process is still running on the host. If it is, the session is
+           owned by another healthy worker process, so leave it alone.
         """
         sessions_path = Path.home() / ".hive" / "agents" / agent_path.name / "sessions"
         if not sessions_path.exists():
             return

+        live_session_ids = set(self._sessions.keys())
+
         for d in sessions_path.iterdir():
             if not d.is_dir() or not d.name.startswith("session_"):
                 continue
@@ -291,6 +302,26 @@ class SessionManager:
                 state = json.loads(state_path.read_text(encoding="utf-8"))
                 if state.get("status") != "active":
                     continue
+
+                # Layer 1: skip sessions that are alive in this process
+                session_id = state.get("session_id", d.name)
+                if session_id in live_session_ids or d.name in live_session_ids:
+                    logger.debug(
+                        "Skipping live in-memory session '%s' during stale cleanup",
+                        d.name,
+                    )
+                    continue
+
+                # Layer 2: skip sessions whose owning process is still alive
+                recorded_pid = state.get("pid")
+                if recorded_pid is not None and self._is_pid_alive(recorded_pid):
+                    logger.debug(
+                        "Skipping session '%s' — owning process %d is still running",
+                        d.name,
+                        recorded_pid,
+                    )
+                    continue
+
                 state["status"] = "cancelled"
                 state.setdefault("result", {})["error"] = "Stale session: runtime restarted"
                 state.setdefault("timestamps", {})["updated_at"] = datetime.now().isoformat()
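The cancellation write-back above can be exercised in isolation. A minimal stdlib sketch (the helper name `mark_stale` is ours) applying the same state.json mutation:

```python
import json
from datetime import datetime
from pathlib import Path


def mark_stale(state_path: Path) -> bool:
    """Apply the same mutation as the cleanup loop: flip an 'active'
    session to 'cancelled' and record why. Returns True if changed."""
    state = json.loads(state_path.read_text(encoding="utf-8"))
    if state.get("status") != "active":
        # paused/cancelled sessions are left untouched, as in the diff
        return False
    state["status"] = "cancelled"
    state.setdefault("result", {})["error"] = "Stale session: runtime restarted"
    state.setdefault("timestamps", {})["updated_at"] = datetime.now().isoformat()
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return True
```

`setdefault` keeps any existing `result`/`timestamps` content intact while adding the error and timestamp keys.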
@@ -301,6 +332,34 @@ class SessionManager:
             except (json.JSONDecodeError, OSError) as e:
                 logger.warning("Failed to clean up stale session %s: %s", d.name, e)

+    @staticmethod
+    def _is_pid_alive(pid: int) -> bool:
+        """Check whether a process with the given PID is still running."""
+        import os
+        import platform
+
+        if platform.system() == "Windows":
+            import ctypes
+
+            # PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
+            kernel32 = ctypes.windll.kernel32
+            handle = kernel32.OpenProcess(0x1000, False, pid)
+            if not handle:
+                # 5 is ERROR_ACCESS_DENIED, meaning the process exists but is protected
+                return kernel32.GetLastError() == 5
+
+            exit_code = ctypes.c_ulong()
+            kernel32.GetExitCodeProcess(handle, ctypes.byref(exit_code))
+            kernel32.CloseHandle(handle)
+            # 259 is STILL_ACTIVE
+            return exit_code.value == 259
+        else:
+            try:
+                os.kill(pid, 0)
+            except OSError:
+                return False
+            return True
+
     async def load_worker(
         self,
         session_id: str,
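On POSIX, the `os.kill(pid, 0)` probe in the branch above sends no signal at all; signal number 0 only performs the existence and permission checks. A hedged sketch of just that branch (note that `PermissionError` is an `OSError` subclass, so a live process owned by another user reads as dead here, exactly as in the method above):

```python
import os


def is_pid_alive_posix(pid: int) -> bool:
    """POSIX-only liveness probe: signal 0 checks deliverability without
    delivering anything; any OSError (ESRCH, EPERM) is treated as not alive."""
    try:
        os.kill(pid, 0)
    except OSError:
        return False
    return True
```

For the stale-session use case that coarseness is acceptable: a PID the runtime cannot even signal is not a worker it owns.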
@@ -325,9 +384,9 @@ class SessionManager:
             model=model,
         )

-        # Notify queen about the loaded worker (skip for hive_coder itself).
+        # Notify queen about the loaded worker (skip for queen itself).
         # Health judge disabled for simplicity.
-        if agent_path.name != "hive_coder" and session.worker_runtime:
+        if agent_path.name != "queen" and session.worker_runtime:
             # await self._start_judge(session, session.runner._storage_path)
             await self._notify_queen_worker_loaded(session)

@@ -379,6 +438,11 @@ class SessionManager:
         if session is None:
             return False

+        # Capture session data for memory consolidation before teardown
+        _llm = getattr(session, "llm", None)
+        _storage_id = getattr(session, "queen_resume_from", None) or session_id
+        _session_dir = Path.home() / ".hive" / "queen" / "session" / _storage_id
+
         # Stop judge
         self._stop_judge(session)
         if session.worker_handoff_sub is not None:
@@ -388,7 +452,13 @@ class SessionManager:
                 pass
             session.worker_handoff_sub = None

-        # Stop queen
+        # Stop queen and memory consolidation subscription
+        if session.memory_consolidation_sub is not None:
+            try:
+                session.event_bus.unsubscribe(session.memory_consolidation_sub)
+            except Exception:
+                pass
+            session.memory_consolidation_sub = None
         if session.queen_task is not None:
             session.queen_task.cancel()
             session.queen_task = None
@@ -401,6 +471,17 @@ class SessionManager:
             except Exception as e:
                 logger.error("Error cleaning up worker: %s", e)

+        # Final memory consolidation — fire-and-forget so teardown isn't blocked.
+        if _llm is not None and _session_dir.exists():
+            import asyncio
+
+            from framework.agents.queen.queen_memory import consolidate_queen_memory
+
+            asyncio.create_task(
+                consolidate_queen_memory(session_id, _session_dir, _llm),
+                name=f"queen-memory-consolidation-{session_id}",
+            )
+
         logger.info("Session '%s' stopped", session_id)
         return True
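The teardown above launches consolidation as a named fire-and-forget task. A minimal sketch of the pattern with a hypothetical stand-in coroutine (`consolidate`); note that production code should usually keep a reference to the task, since the event loop holds only a weak one and an unreferenced task can be garbage-collected mid-flight:

```python
import asyncio


async def consolidate(session_id: str) -> str:
    # Stand-in for the real consolidation coroutine.
    await asyncio.sleep(0)
    return f"consolidated {session_id}"


async def main() -> str:
    task = asyncio.create_task(
        consolidate("session_001"),
        name="queen-memory-consolidation-session_001",
    )
    # The diff fires and forgets; here we await so the result is observable.
    assert task.get_name() == "queen-memory-consolidation-session_001"
    return await task


result = asyncio.run(main())
```

The `name=` argument makes the task identifiable in `asyncio.all_tasks()` dumps and debug logs, which is the point of the f-string name in the diff.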
@@ -461,13 +542,7 @@ class SessionManager:
         are written to the ORIGINAL session's directory so the full conversation
         history accumulates in one place across server restarts.
         """
-        from framework.agents.hive_coder.agent import (
-            queen_goal,
-            queen_graph as _queen_graph,
-        )
-        from framework.graph.executor import GraphExecutor
-        from framework.runner.tool_registry import ToolRegistry
-        from framework.runtime.core import Runtime
+        from framework.server.queen_orchestrator import create_queen

         hive_home = Path.home() / ".hive"

@@ -505,284 +580,33 @@ class SessionManager:
        except OSError:
            pass

        # Register MCP coding tools
        queen_registry = ToolRegistry()
        import framework.agents.hive_coder as _hive_coder_pkg

        hive_coder_dir = Path(_hive_coder_pkg.__file__).parent
        mcp_config = hive_coder_dir / "mcp_servers.json"
        if mcp_config.exists():
            try:
                queen_registry.load_mcp_config(mcp_config)
                logger.info("Queen: loaded MCP tools from %s", mcp_config)
            except Exception:
                logger.warning("Queen: MCP config failed to load", exc_info=True)

        # Phase state for building/running phase switching
        from framework.tools.queen_lifecycle_tools import (
            QueenPhaseState,
            register_queen_lifecycle_tools,
        )

        # Start in staging when the caller provided an agent, building otherwise.
        initial_phase = "staging" if worker_identity else "building"
        phase_state = QueenPhaseState(phase=initial_phase, event_bus=session.event_bus)
        session.phase_state = phase_state

        # Always register lifecycle tools — they check session.worker_runtime
        # at call time, so they work even if no worker is loaded yet.
        register_queen_lifecycle_tools(
            queen_registry,
        session.queen_task = await create_queen(
            session=session,
            session_id=session.id,
            session_manager=self,
            manager_session_id=session.id,
            phase_state=phase_state,
            worker_identity=worker_identity,
            queen_dir=queen_dir,
            initial_prompt=initial_prompt,
        )

        # Monitoring tools need concrete worker paths — only register when present
        if session.worker_runtime:
            from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
        # Memory consolidation — triggered by context compaction events.
        # Compaction is a natural signal that "enough has happened to be worth remembering".
        _consolidation_llm = session.llm
        _consolidation_session_dir = queen_dir

            register_worker_monitoring_tools(
                queen_registry,
                session.event_bus,
                session.worker_path,
                stream_id="queen",
                worker_graph_id=session.worker_runtime._graph_id,
        async def _on_compaction(_event) -> None:
            from framework.agents.queen.queen_memory import consolidate_queen_memory

            await consolidate_queen_memory(
                session.id, _consolidation_session_dir, _consolidation_llm
            )

        queen_tools = list(queen_registry.get_tools().values())
        queen_tool_executor = queen_registry.get_executor()
        from framework.runtime.event_bus import EventType as _ET

        # Partition tools into phase-specific sets and import prompt segments
        from framework.agents.hive_coder.nodes import (
            _QUEEN_BUILDING_TOOLS,
            _QUEEN_RUNNING_TOOLS,
            _QUEEN_STAGING_TOOLS,
            _appendices,
            _gcu_building_section,
            _package_builder_knowledge,
            _queen_behavior_always,
            _queen_behavior_building,
            _queen_behavior_running,
            _queen_behavior_staging,
            _queen_identity_building,
            _queen_identity_running,
            _queen_identity_staging,
            _queen_phase_7,
            _queen_style,
            _queen_tools_building,
            _queen_tools_running,
            _queen_tools_staging,
        session.memory_consolidation_sub = session.event_bus.subscribe(
            event_types=[_ET.CONTEXT_COMPACTED],
            handler=_on_compaction,
        )

        building_names = set(_QUEEN_BUILDING_TOOLS)
        staging_names = set(_QUEEN_STAGING_TOOLS)
        running_names = set(_QUEEN_RUNNING_TOOLS)

        registered_names = {t.name for t in queen_tools}
        missing_building = building_names - registered_names
        if missing_building:
            logger.warning(
                "Queen: %d/%d building tools NOT registered: %s",
                len(missing_building),
                len(building_names),
                sorted(missing_building),
            )
        logger.info("Queen: registered tools: %s", sorted(registered_names))

        phase_state.building_tools = [t for t in queen_tools if t.name in building_names]
        phase_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
        phase_state.running_tools = [t for t in queen_tools if t.name in running_names]

        # Build queen graph with adjusted prompt + tools
        _orig_node = _queen_graph.nodes[0]

        if worker_identity is None:
            worker_identity = (
                "\n\n# Worker Profile\n"
                "No worker agent loaded. You are operating independently.\n"
                "Handle all tasks directly using your coding tools."
            )

        # Compose phase-specific prompts.
        _building_body = (
            _queen_style
            + _queen_tools_building
            + _queen_behavior_always
            + _queen_behavior_building
            + _package_builder_knowledge
            + _gcu_building_section
            + _queen_phase_7
            + _appendices
            + worker_identity
        )
        phase_state.prompt_building = _queen_identity_building + _building_body
        phase_state.prompt_staging = (
            _queen_identity_staging
            + _queen_style
            + _queen_tools_staging
            + _queen_behavior_always
            + _queen_behavior_staging
            + worker_identity
        )
        phase_state.prompt_running = (
            _queen_identity_running
            + _queen_style
            + _queen_tools_running
            + _queen_behavior_always
            + _queen_behavior_running
            + worker_identity
        )

        # Build the session_start hook: selects the best-fit expert persona
        # from the user's opening message and replaces the identity prefix.
        from framework.agents.hive_coder.nodes.thinking_hook import select_expert_persona
        from framework.graph.event_loop_node import HookContext, HookResult
        from framework.runtime.event_bus import AgentEvent, EventType

        _session_llm = session.llm
        _session_event_bus = session.event_bus

        async def _persona_hook(ctx: HookContext) -> HookResult | None:
            persona = await select_expert_persona(ctx.trigger or "", _session_llm)
            if not persona:
                return None
            if _session_event_bus is not None:
                await _session_event_bus.publish(
                    AgentEvent(
                        type=EventType.QUEEN_PERSONA_SELECTED,
                        stream_id="queen",
                        data={"persona": persona},
                    )
                )
            return HookResult(system_prompt=persona + "\n\n" + _building_body)

        initial_prompt_text = phase_state.get_current_prompt()

        registered_tool_names = set(queen_registry.get_tools().keys())
        declared_tools = _orig_node.tools or []
        available_tools = [t for t in declared_tools if t in registered_tool_names]

        node_updates: dict = {
            "system_prompt": initial_prompt_text,
        }
        if set(available_tools) != set(declared_tools):
            missing = sorted(set(declared_tools) - registered_tool_names)
            if missing:
                logger.warning("Queen: tools not available: %s", missing)
            node_updates["tools"] = available_tools

        adjusted_node = _orig_node.model_copy(update=node_updates)
        _queen_loop_config = {
            **(_queen_graph.loop_config or {}),
            "hooks": {"session_start": [_persona_hook]},
        }
        queen_graph = _queen_graph.model_copy(
            update={"nodes": [adjusted_node], "loop_config": _queen_loop_config}
        )

        queen_runtime = Runtime(hive_home / "queen")

        async def _queen_loop():
            try:
                executor = GraphExecutor(
                    runtime=queen_runtime,
                    llm=session.llm,
                    tools=queen_tools,
                    tool_executor=queen_tool_executor,
                    event_bus=session.event_bus,
                    stream_id="queen",
                    storage_path=queen_dir,
                    loop_config=_queen_loop_config,
                    execution_id=session.id,
                    dynamic_tools_provider=phase_state.get_current_tools,
                    dynamic_prompt_provider=phase_state.get_current_prompt,
                )
                session.queen_executor = executor

                # Wire inject_notification so phase switches notify the queen LLM
                async def _inject_phase_notification(content: str) -> None:
                    node = executor.node_registry.get("queen")
                    if node is not None and hasattr(node, "inject_event"):
                        await node.inject_event(content)

                phase_state.inject_notification = _inject_phase_notification

                # Auto-switch to staging when worker execution finishes naturally
                # and notify the queen about the termination
                from framework.runtime.event_bus import EventType as _ET

                async def _on_worker_done(event):
                    if event.stream_id == "queen":
                        return
                    if phase_state.phase == "running":
                        # Build termination notification for the queen
                        if event.type == _ET.EXECUTION_COMPLETED:
                            output = event.data.get("output", {})
                            output_summary = ""
                            if output:
                                # Summarize key outputs for the queen
                                for key, value in output.items():
                                    val_str = str(value)
                                    if len(val_str) > 200:
                                        val_str = val_str[:200] + "..."
                                    output_summary += f"\n  {key}: {val_str}"
                            _out = output_summary or " (no output keys set)"
                            notification = (
                                "[WORKER_TERMINAL] Worker finished successfully.\n"
                                f"Output:{_out}\n"
                                "Report this to the user. "
                                "Ask if they want to continue with another run."
                            )
                        else:  # EXECUTION_FAILED
                            error = event.data.get("error", "Unknown error")
                            notification = (
                                "[WORKER_TERMINAL] Worker failed.\n"
                                f"Error: {error}\n"
                                "Report this to the user and help them troubleshoot."
                            )

                        # Inject notification to queen before phase switch
                        node = executor.node_registry.get("queen")
                        if node is not None and hasattr(node, "inject_event"):
                            await node.inject_event(notification)

                        await phase_state.switch_to_staging(source="auto")

                session.event_bus.subscribe(
                    event_types=[_ET.EXECUTION_COMPLETED, _ET.EXECUTION_FAILED],
                    handler=_on_worker_done,
                )
                self._subscribe_worker_handoffs(session, executor)

                logger.info(
                    "Queen starting in %s phase with %d tools: %s",
                    phase_state.phase,
                    len(phase_state.get_current_tools()),
                    [t.name for t in phase_state.get_current_tools()],
                )
                result = await executor.execute(
                    graph=queen_graph,
                    goal=queen_goal,
                    input_data={"greeting": initial_prompt or "Session started."},
                    session_state={"resume_session_id": session.id},
                )
                if result.success:
                    logger.warning("Queen executor returned (should be forever-alive)")
                else:
                    logger.error(
                        "Queen executor failed: %s",
                        result.error or "(no error message)",
                    )
            except Exception:
                logger.error("Queen conversation crashed", exc_info=True)
            finally:
                session.queen_executor = None

        session.queen_task = asyncio.create_task(_queen_loop())

    # ------------------------------------------------------------------
    # Judge startup / teardown
    # ------------------------------------------------------------------

@@ -37,6 +37,7 @@ class MockNodeSpec:
     client_facing: bool = False
     success_criteria: str | None = None
     system_prompt: str | None = None
+    sub_agents: list = field(default_factory=list)


 @dataclass
@@ -67,6 +68,7 @@ class MockEntryPoint:
     name: str = "Default"
     entry_node: str = "start"
     trigger_type: str = "manual"
+    trigger_config: dict = field(default_factory=dict)


 @dataclass
@@ -130,6 +132,9 @@ class MockRuntime:
     def get_stats(self):
         return {"running": True, "executions": 1}

+    def get_timer_next_fire_in(self, ep_id):
+        return None
+

 class MockAgentInfo:
     name: str = "test_agent"
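The two added dataclass fields use `field(default_factory=...)` because a mutable literal default (`= []` or `= {}`) is rejected by `dataclass` precisely to avoid sharing one object across instances. A minimal sketch (class name `MockNodeSpecSketch` is ours) showing why the factory form is needed:

```python
from dataclasses import dataclass, field


@dataclass
class MockNodeSpecSketch:
    # Simplified stand-in for MockNodeSpec: only the new field is shown.
    # default_factory builds a fresh list per instance.
    sub_agents: list = field(default_factory=list)


a = MockNodeSpecSketch()
b = MockNodeSpecSketch()
a.sub_agents.append("helper")
```

Mutating `a.sub_agents` leaves `b.sub_agents` empty, which is the behavior the mocks rely on.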
@@ -1556,3 +1561,106 @@ class TestErrorMiddleware:
         async with TestClient(TestServer(app)) as client:
             resp = await client.get("/api/nonexistent")
             assert resp.status == 404
+
+
+class TestCleanupStaleActiveSessions:
+    """Tests for _cleanup_stale_active_sessions with two-layer protection."""
+
+    def _make_manager(self):
+        from framework.server.session_manager import SessionManager
+
+        return SessionManager()
+
+    def _write_state(self, session_dir: Path, status: str, pid: int | None = None) -> None:
+        session_dir.mkdir(parents=True, exist_ok=True)
+        state: dict = {"status": status, "session_id": session_dir.name}
+        if pid is not None:
+            state["pid"] = pid
+        (session_dir / "state.json").write_text(json.dumps(state))
+
+    def _read_state(self, session_dir: Path) -> dict:
+        return json.loads((session_dir / "state.json").read_text())
+
+    def test_stale_session_is_cancelled(self, tmp_path, monkeypatch):
+        """Truly stale active sessions (no live tracking, no PID) get cancelled."""
+        monkeypatch.setattr(Path, "home", lambda: tmp_path)
+        agent_path = Path("my_agent")
+        sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
+        session_dir = sessions_dir / "session_stale_001"
+
+        self._write_state(session_dir, "active")
+
+        mgr = self._make_manager()
+        mgr._cleanup_stale_active_sessions(agent_path)
+
+        state = self._read_state(session_dir)
+        assert state["status"] == "cancelled"
+        assert "Stale session" in state["result"]["error"]
+
+    def test_live_in_memory_session_is_skipped(self, tmp_path, monkeypatch):
+        """Sessions tracked in self._sessions must NOT be cancelled (Layer 1)."""
+        monkeypatch.setattr(Path, "home", lambda: tmp_path)
+        agent_path = Path("my_agent")
+        sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
+        session_dir = sessions_dir / "session_live_002"
+
+        self._write_state(session_dir, "active")
+
+        mgr = self._make_manager()
+        # Simulate a live session in the manager's in-memory map
+        mgr._sessions["session_live_002"] = MagicMock()
+
+        mgr._cleanup_stale_active_sessions(agent_path)
+
+        state = self._read_state(session_dir)
+        assert state["status"] == "active", "Live in-memory session should NOT be cancelled"
+
+    def test_session_with_live_pid_is_skipped(self, tmp_path, monkeypatch):
+        """Sessions whose owning PID is still alive must NOT be cancelled (Layer 2)."""
+        import os
+
+        monkeypatch.setattr(Path, "home", lambda: tmp_path)
+        agent_path = Path("my_agent")
+        sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
+        session_dir = sessions_dir / "session_pid_003"
+
+        # Use the current process PID — guaranteed to be alive
+        self._write_state(session_dir, "active", pid=os.getpid())
+
+        mgr = self._make_manager()
+        mgr._cleanup_stale_active_sessions(agent_path)
+
+        state = self._read_state(session_dir)
+        assert state["status"] == "active", "Session with live PID should NOT be cancelled"
+
+    def test_session_with_dead_pid_is_cancelled(self, tmp_path, monkeypatch):
+        """Sessions whose owning PID is dead should be cancelled."""
+        monkeypatch.setattr(Path, "home", lambda: tmp_path)
+        agent_path = Path("my_agent")
+        sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
+        session_dir = sessions_dir / "session_dead_004"
+
+        # Use a PID that is almost certainly not running
+        self._write_state(session_dir, "active", pid=999999999)
+
+        mgr = self._make_manager()
+        mgr._cleanup_stale_active_sessions(agent_path)
+
+        state = self._read_state(session_dir)
+        assert state["status"] == "cancelled"
+        assert "Stale session" in state["result"]["error"]
+
+    def test_paused_session_is_never_touched(self, tmp_path, monkeypatch):
+        """Paused sessions should remain intact regardless of PID or tracking."""
+        monkeypatch.setattr(Path, "home", lambda: tmp_path)
+        agent_path = Path("my_agent")
+        sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
+        session_dir = sessions_dir / "session_paused_005"
+
+        self._write_state(session_dir, "paused")
+
+        mgr = self._make_manager()
+        mgr._cleanup_stale_active_sessions(agent_path)
+
+        state = self._read_state(session_dir)
+        assert state["status"] == "paused", "Paused sessions must remain untouched"
||||
@@ -1,179 +0,0 @@
"""
State Writer - Dual-write adapter for the migration period.

Writes execution state to both old (Run/RunSummary) and new (state.json) formats
to maintain backward compatibility during the transition period.
"""

import logging
import os
from datetime import datetime

from framework.schemas.run import Problem, Run, RunMetrics, RunStatus
from framework.schemas.session_state import SessionState, SessionStatus
from framework.storage.concurrent import ConcurrentStorage
from framework.storage.session_store import SessionStore

logger = logging.getLogger(__name__)


class StateWriter:
    """
    Writes execution state to both old and new formats during migration.

    During the dual-write phase:
    - New format (state.json) is written when USE_UNIFIED_SESSIONS=true
    - Old format (Run/RunSummary) is always written for backward compatibility
    """

    def __init__(self, old_storage: ConcurrentStorage, session_store: SessionStore):
        """
        Initialize the state writer.

        Args:
            old_storage: ConcurrentStorage for the old format (runs/, summaries/)
            session_store: SessionStore for the new format (sessions/*/state.json)
        """
        self.old = old_storage
        self.new = session_store
        self.dual_write_enabled = os.getenv("USE_UNIFIED_SESSIONS", "false").lower() == "true"

    async def write_execution_state(
        self,
        session_id: str,
        state: SessionState,
    ) -> None:
        """
        Write execution state to both old and new formats.

        Args:
            session_id: Session ID
            state: SessionState to write
        """
        # Write to the new format if enabled
        if self.dual_write_enabled:
            try:
                await self.new.write_state(session_id, state)
                logger.debug(f"Wrote state.json for session {session_id}")
            except Exception as e:
                logger.error(f"Failed to write state.json for {session_id}: {e}")
                # Don't fail - the old format is still written

        # Always write to the old format for backward compatibility
        try:
            run = self._convert_to_run(state)
            await self.old.save_run(run)
            logger.debug(f"Wrote Run object for session {session_id}")
        except Exception as e:
            logger.error(f"Failed to write Run object for {session_id}: {e}")
            # This is more critical - re-raise if the old format fails
            raise

    def _convert_to_run(self, state: SessionState) -> Run:
        """
        Convert a SessionState to a legacy Run object.

        Args:
            state: SessionState to convert

        Returns:
            Run object
        """
        # Map SessionStatus to RunStatus
        status_mapping = {
            SessionStatus.ACTIVE: RunStatus.RUNNING,
            SessionStatus.PAUSED: RunStatus.RUNNING,  # Paused is still "running" in the old format
            SessionStatus.COMPLETED: RunStatus.COMPLETED,
            SessionStatus.FAILED: RunStatus.FAILED,
            SessionStatus.CANCELLED: RunStatus.CANCELLED,
        }
        run_status = status_mapping.get(state.status, RunStatus.FAILED)

        # Convert timestamps
        started_at = datetime.fromisoformat(state.timestamps.started_at)
        completed_at = (
            datetime.fromisoformat(state.timestamps.completed_at)
            if state.timestamps.completed_at
            else None
        )

        # Build RunMetrics
        metrics = RunMetrics(
            total_decisions=state.metrics.decision_count,
            successful_decisions=state.metrics.decision_count
            - len(state.progress.nodes_with_failures),  # Approximate
            failed_decisions=len(state.progress.nodes_with_failures),
            total_tokens=state.metrics.total_input_tokens + state.metrics.total_output_tokens,
            total_latency_ms=state.progress.total_latency_ms,
            nodes_executed=state.metrics.nodes_executed,
            edges_traversed=state.metrics.edges_traversed,
        )

        # Convert problems (SessionState stores dicts; Run expects Problem objects)
        problems = []
        for p_dict in state.problems:
            # Handle both old Problem objects and the new dict format
            if isinstance(p_dict, dict):
                problems.append(Problem(**p_dict))
            else:
                problems.append(p_dict)

        # Convert decisions (SessionState stores dicts; Run expects Decision objects)
        from framework.schemas.decision import Decision

        decisions = []
        for d_dict in state.decisions:
            # Handle both old Decision objects and the new dict format
            if isinstance(d_dict, dict):
                try:
                    decisions.append(Decision(**d_dict))
                except Exception:
                    # Skip invalid decisions
                    continue
            else:
                decisions.append(d_dict)

        # Create the Run object
        run = Run(
            id=state.session_id,  # Use session_id as run_id
            goal_id=state.goal_id,
            started_at=started_at,
            status=run_status,
            completed_at=completed_at,
            decisions=decisions,
            problems=problems,
            metrics=metrics,
            goal_description="",  # Not stored in SessionState
            input_data=state.input_data,
            output_data=state.result.output,
        )

        return run

    async def read_state(
        self,
        session_id: str,
        prefer_new: bool = True,
    ) -> SessionState | None:
        """
        Read execution state from either format.

        Args:
            session_id: Session ID
            prefer_new: If True, try the new format first (default)

        Returns:
            SessionState, or None if not found
        """
        if prefer_new:
            # Try the new format first
            state = await self.new.read_state(session_id)
            if state:
                return state

        # Fall back to the old format
        run = await self.old.load_run(session_id)
        if run:
            return SessionState.from_legacy_run(run, session_id)

        return None
File diff suppressed because it is too large
@@ -0,0 +1,38 @@
"""Tool for the queen to write to her episodic memory.

The queen can consciously record significant moments during a session — like
writing in a diary. Semantic memory (MEMORY.md) is updated automatically at
session end and is never written by the queen directly.
"""

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from framework.runner.tool_registry import ToolRegistry


def write_to_diary(entry: str) -> str:
    """Write a prose entry to today's episodic memory.

    Use this when something significant just happened: a pipeline went live, the
    user shared an important preference, a goal was achieved or abandoned, or
    you want to record something that should be remembered across sessions.

    Write in the first person, as you would in a private diary. Be specific — what
    happened, how the user responded, what it means going forward. One or two
    paragraphs is enough.

    You do not need to include a timestamp or date heading; those are added
    automatically.
    """
    from framework.agents.queen.queen_memory import append_episodic_entry

    append_episodic_entry(entry)
    return "Diary entry recorded."


def register_queen_memory_tools(registry: ToolRegistry) -> None:
    """Register the episodic memory tool into the queen's tool registry."""
    registry.register_function(write_to_diary)
@@ -1,6 +1,6 @@
 """Graph lifecycle tools for multi-graph sessions.
 
-These tools allow an agent (e.g. hive_coder) to load, unload, start,
+These tools allow an agent (e.g. queen) to load, unload, start,
 restart, and query other agent graphs within the same runtime session.
 
 Usage::
File diff suppressed because it is too large
@@ -1,13 +0,0 @@
"""TUI screens package."""

from .account_selection import AccountSelectionScreen
from .add_local_credential import AddLocalCredentialScreen
from .agent_picker import AgentPickerScreen
from .credential_setup import CredentialSetupScreen

__all__ = [
    "AccountSelectionScreen",
    "AddLocalCredentialScreen",
    "AgentPickerScreen",
    "CredentialSetupScreen",
]
@@ -1,111 +0,0 @@
"""Account selection ModalScreen for picking a connected account before agent start."""

from __future__ import annotations

from rich.text import Text
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical
from textual.screen import ModalScreen
from textual.widgets import Label, OptionList
from textual.widgets._option_list import Option


class AccountSelectionScreen(ModalScreen[dict | None]):
    """Modal screen showing connected accounts for pre-run selection.

    Returns the selected account dict, or None if dismissed.
    """

    SCOPED_CSS = False

    BINDINGS = [
        Binding("escape", "dismiss_picker", "Cancel"),
    ]

    DEFAULT_CSS = """
    AccountSelectionScreen {
        align: center middle;
    }
    #acct-container {
        width: 70%;
        max-width: 80;
        height: 60%;
        background: $surface;
        border: heavy $primary;
        padding: 1 2;
    }
    #acct-title {
        text-align: center;
        text-style: bold;
        width: 100%;
        color: $text;
    }
    #acct-subtitle {
        text-align: center;
        width: 100%;
        margin-bottom: 1;
    }
    #acct-footer {
        text-align: center;
        width: 100%;
        margin-top: 1;
    }
    """

    def __init__(self, accounts: list[dict]) -> None:
        super().__init__()
        self._accounts = accounts

    def compose(self) -> ComposeResult:
        n = len(self._accounts)
        with Vertical(id="acct-container"):
            yield Label("Select Account to Test", id="acct-title")
            yield Label(
                f"[dim]{n} connected account{'s' if n != 1 else ''}[/dim]",
                id="acct-subtitle",
            )
            option_list = OptionList(id="acct-list")
            # Group: Aden accounts first, then local
            aden = [a for a in self._accounts if a.get("source") != "local"]
            local = [a for a in self._accounts if a.get("source") == "local"]
            ordered = aden + local
            for i, acct in enumerate(ordered):
                provider = acct.get("provider", "unknown")
                alias = acct.get("alias", "unknown")
                identity = acct.get("identity", {})
                source = acct.get("source", "aden")
                # Build the identity label: prefer email, then username/workspace
                identity_label = (
                    identity.get("email")
                    or identity.get("username")
                    or identity.get("workspace")
                    or ""
                )
                label = Text()
                label.append(f"{provider}/", style="bold")
                label.append(alias, style="bold cyan")
                if source == "local":
                    label.append(" [local]", style="dim yellow")
                if identity_label:
                    label.append(f" ({identity_label})", style="dim")
                option_list.add_option(Option(label, id=f"acct-{i}"))
            # Keep the ordered list for index lookups
            self._accounts = ordered
            yield option_list
            yield Label(
                "[dim]Enter[/dim] Select [dim]Esc[/dim] Cancel",
                id="acct-footer",
            )

    def on_mount(self) -> None:
        ol = self.query_one("#acct-list", OptionList)
        ol.styles.height = "1fr"

    def on_option_list_option_selected(self, event: OptionList.OptionSelected) -> None:
        idx = event.option_index
        if 0 <= idx < len(self._accounts):
            self.dismiss(self._accounts[idx])

    def action_dismiss_picker(self) -> None:
        self.dismiss(None)
@@ -1,244 +0,0 @@
"""Add Local Credential ModalScreen for storing named local API key accounts."""

from __future__ import annotations

from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical, VerticalScroll
from textual.screen import ModalScreen
from textual.widgets import Button, Input, Label, OptionList
from textual.widgets._option_list import Option


class AddLocalCredentialScreen(ModalScreen[dict | None]):
    """Modal screen for adding a named local API key credential.

    Phase 1: Pick the credential type from a list.
    Phase 2: Enter alias + API key, run a health check, save.

    Returns a dict with credential_id, alias, and identity on success, or None on cancel.
    """

    BINDINGS = [
        Binding("escape", "dismiss_screen", "Cancel"),
    ]

    DEFAULT_CSS = """
    AddLocalCredentialScreen {
        align: center middle;
    }
    #alc-container {
        width: 80%;
        max-width: 90;
        height: 80%;
        background: $surface;
        border: heavy $primary;
        padding: 1 2;
    }
    #alc-title {
        text-align: center;
        text-style: bold;
        width: 100%;
        color: $text;
    }
    #alc-subtitle {
        text-align: center;
        width: 100%;
        margin-bottom: 1;
    }
    #alc-type-list {
        height: 1fr;
    }
    #alc-form {
        height: 1fr;
    }
    .alc-field {
        margin-bottom: 1;
        height: auto;
    }
    .alc-field Label {
        margin-bottom: 0;
    }
    #alc-status {
        width: 100%;
        height: auto;
        margin-top: 1;
        padding: 1;
        background: $panel;
    }
    .alc-buttons {
        height: auto;
        margin-top: 1;
        align: center middle;
    }
    .alc-buttons Button {
        margin: 0 1;
    }
    #alc-footer {
        text-align: center;
        width: 100%;
        margin-top: 1;
    }
    """

    def __init__(self) -> None:
        super().__init__()
        # Load credential specs that support direct API keys
        self._specs: list[tuple[str, object]] = self._load_specs()
        # Selected credential spec (set in phase 2)
        self._selected_id: str = ""
        self._selected_spec: object = None
        self._phase: int = 1  # 1 = type selection, 2 = form

    @staticmethod
    def _load_specs() -> list[tuple[str, object]]:
        """Return (credential_id, spec) pairs for direct-API-key credentials."""
        try:
            from aden_tools.credentials import CREDENTIAL_SPECS

            return [
                (cid, spec)
                for cid, spec in CREDENTIAL_SPECS.items()
                if getattr(spec, "direct_api_key_supported", False)
            ]
        except Exception:
            return []

    # ------------------------------------------------------------------
    # Compose
    # ------------------------------------------------------------------

    def compose(self) -> ComposeResult:
        with Vertical(id="alc-container"):
            yield Label("Add Local Credential", id="alc-title")
            yield Label("[dim]Store a named API key account[/dim]", id="alc-subtitle")
            # Phase 1: type selection
            option_list = OptionList(id="alc-type-list")
            for cid, spec in self._specs:
                description = getattr(spec, "description", cid)
                option_list.add_option(Option(f"{cid} [dim]{description}[/dim]", id=f"type-{cid}"))
            yield option_list
            # Phase 2: form (hidden initially)
            with VerticalScroll(id="alc-form"):
                with Vertical(classes="alc-field"):
                    yield Label("[bold]Alias[/bold] [dim](e.g. work, personal)[/dim]")
                    yield Input(value="default", id="alc-alias")
                with Vertical(classes="alc-field"):
                    yield Label("[bold]API Key[/bold]")
                    yield Input(placeholder="Paste API key...", password=True, id="alc-key")
                yield Label("", id="alc-status")
                with Vertical(classes="alc-buttons"):
                    yield Button("Test & Save", variant="primary", id="btn-save")
                    yield Button("Back", variant="default", id="btn-back")
            yield Label(
                "[dim]Enter[/dim] Select [dim]Esc[/dim] Cancel",
                id="alc-footer",
            )

    def on_mount(self) -> None:
        self._show_phase(1)

    # ------------------------------------------------------------------
    # Phase switching
    # ------------------------------------------------------------------

    def _show_phase(self, phase: int) -> None:
        self._phase = phase
        type_list = self.query_one("#alc-type-list", OptionList)
        form = self.query_one("#alc-form", VerticalScroll)
        if phase == 1:
            type_list.display = True
            form.display = False
            subtitle = self.query_one("#alc-subtitle", Label)
            subtitle.update("[dim]Select the credential type to add[/dim]")
        else:
            type_list.display = False
            form.display = True
            spec = self._selected_spec
            description = (
                getattr(spec, "description", self._selected_id) if spec else self._selected_id
            )
            subtitle = self.query_one("#alc-subtitle", Label)
            subtitle.update(f"[dim]{self._selected_id}[/dim] {description}")
            self._clear_status()
            # Focus the alias input
            self.query_one("#alc-alias", Input).focus()

    # ------------------------------------------------------------------
    # Event handlers
    # ------------------------------------------------------------------

    def on_option_list_option_selected(self, event: OptionList.OptionSelected) -> None:
        if self._phase != 1:
            return
        option_id = event.option.id or ""
        if option_id.startswith("type-"):
            cid = option_id[5:]  # strip the "type-" prefix
            self._selected_id = cid
            self._selected_spec = next(
                (spec for spec_id, spec in self._specs if spec_id == cid), None
            )
            self._show_phase(2)

    def on_button_pressed(self, event: Button.Pressed) -> None:
        if event.button.id == "btn-save":
            self._do_save()
        elif event.button.id == "btn-back":
            self._show_phase(1)

    # ------------------------------------------------------------------
    # Save logic
    # ------------------------------------------------------------------

    def _do_save(self) -> None:
        alias = self.query_one("#alc-alias", Input).value.strip() or "default"
        api_key = self.query_one("#alc-key", Input).value.strip()

        if not api_key:
            self._set_status("[red]API key cannot be empty.[/red]")
            return

        self._set_status("[dim]Running health check...[/dim]")
        # Disable the save button while the check runs
        btn = self.query_one("#btn-save", Button)
        btn.disabled = True

        try:
            from framework.credentials.local.registry import LocalCredentialRegistry

            registry = LocalCredentialRegistry.default()
            info, health_result = registry.save_account(
                credential_id=self._selected_id,
                alias=alias,
                api_key=api_key,
                run_health_check=True,
            )

            if health_result is not None and not health_result.valid:
                self._set_status(
                    f"[yellow]Saved with failed health check:[/yellow] {health_result.message}\n"
                    "[dim]You can re-validate later via validate_credential().[/dim]"
                )
            else:
                identity = info.identity.to_dict()
                identity_str = ""
                if identity:
                    parts = [f"{k}: {v}" for k, v in identity.items() if v]
                    identity_str = " " + ", ".join(parts) if parts else ""
                self._set_status(f"[green]Saved:[/green] {info.storage_id}{identity_str}")
                # Dismiss with the result so callers can react
                self.set_timer(1.0, lambda: self.dismiss(info.to_account_dict()))
                return
        except Exception as e:
            self._set_status(f"[red]Error:[/red] {e}")
        finally:
            btn.disabled = False

    def _set_status(self, markup: str) -> None:
        self.query_one("#alc-status", Label).update(markup)

    def _clear_status(self) -> None:
        self.query_one("#alc-status", Label).update("")

    def action_dismiss_screen(self) -> None:
        self.dismiss(None)
@@ -1,362 +0,0 @@
"""Agent picker ModalScreen for selecting agents within the TUI."""

from __future__ import annotations

import json
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path

from rich.console import Group
from rich.text import Text
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical
from textual.screen import ModalScreen
from textual.widgets import Label, OptionList, TabbedContent, TabPane
from textual.widgets._option_list import Option


class GetStartedAction(Enum):
    """Actions available in the Get Started tab."""

    RUN_EXAMPLES = "run_examples"
    RUN_EXISTING = "run_existing"
    BUILD_EDIT = "build_edit"


@dataclass
class AgentEntry:
    """Lightweight agent metadata for the picker."""

    path: Path
    name: str
    description: str
    category: str
    session_count: int = 0
    node_count: int = 0
    tool_count: int = 0
    tags: list[str] = field(default_factory=list)
    last_active: str | None = None


def _get_last_active(agent_name: str) -> str | None:
    """Return the most recent updated_at timestamp across all sessions."""
    sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
    if not sessions_dir.exists():
        return None
    latest: str | None = None
    for session_dir in sessions_dir.iterdir():
        if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
            continue
        state_file = session_dir / "state.json"
        if not state_file.exists():
            continue
        try:
            data = json.loads(state_file.read_text(encoding="utf-8"))
            ts = data.get("timestamps", {}).get("updated_at")
            if ts and (latest is None or ts > latest):
                latest = ts
        except Exception:
            continue
    return latest


def _count_sessions(agent_name: str) -> int:
    """Count session directories under ~/.hive/agents/{agent_name}/sessions/."""
    sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
    if not sessions_dir.exists():
        return 0
    return sum(1 for d in sessions_dir.iterdir() if d.is_dir() and d.name.startswith("session_"))


def _extract_agent_stats(agent_path: Path) -> tuple[int, int, list[str]]:
    """Extract node count, tool count, and tags from an agent directory.

    Prefers agent.py (AST-parsed) over agent.json for node/tool counts,
    since agent.json may be stale. Tags are only available from agent.json.
    """
    import ast

    node_count, tool_count, tags = 0, 0, []

    # Try agent.py first — the source of truth for nodes
    agent_py = agent_path / "agent.py"
    if agent_py.exists():
        try:
            tree = ast.parse(agent_py.read_text(encoding="utf-8"))
            for node in ast.walk(tree):
                # Find the `nodes = [...]` assignment
                if isinstance(node, ast.Assign):
                    for target in node.targets:
                        if isinstance(target, ast.Name) and target.id == "nodes":
                            if isinstance(node.value, ast.List):
                                node_count = len(node.value.elts)
        except Exception:
            pass

    # Fall back to / supplement from agent.json
    agent_json = agent_path / "agent.json"
    if agent_json.exists():
        try:
            data = json.loads(agent_json.read_text(encoding="utf-8"))
            json_nodes = data.get("nodes", [])
            if node_count == 0:
                node_count = len(json_nodes)
            # Tool count: use whichever source gave us nodes, but agent.json
            # has the structured tool lists, so prefer it for tool counting
            tools: set[str] = set()
            for n in json_nodes:
                tools.update(n.get("tools", []))
            tool_count = len(tools)
            tags = data.get("agent", {}).get("tags", [])
        except Exception:
            pass

    return node_count, tool_count, tags


def discover_agents() -> dict[str, list[AgentEntry]]:
    """Discover agents from all known sources, grouped by category."""
    from framework.runner.cli import (
        _extract_python_agent_metadata,
        _get_framework_agents_dir,
        _is_valid_agent_dir,
    )

    groups: dict[str, list[AgentEntry]] = {}
    sources = [
        ("Your Agents", Path("exports")),
        ("Framework", _get_framework_agents_dir()),
        ("Examples", Path("examples/templates")),
    ]

    for category, base_dir in sources:
        if not base_dir.exists():
            continue
        entries: list[AgentEntry] = []
        for path in sorted(base_dir.iterdir(), key=lambda p: p.name):
            if not _is_valid_agent_dir(path):
                continue

            # config.py is the source of truth for name/description
            name, desc = _extract_python_agent_metadata(path)
            config_fallback_name = path.name.replace("_", " ").title()
            used_config = name != config_fallback_name

            node_count, tool_count, tags = _extract_agent_stats(path)
            if not used_config:
                # config.py didn't provide values; fall back to agent.json
                agent_json = path / "agent.json"
                if agent_json.exists():
                    try:
                        data = json.loads(agent_json.read_text(encoding="utf-8"))
                        meta = data.get("agent", {})
                        name = meta.get("name", name)
                        desc = meta.get("description", desc)
                    except Exception:
                        pass

            entries.append(
                AgentEntry(
                    path=path,
                    name=name,
                    description=desc,
                    category=category,
                    session_count=_count_sessions(path.name),
                    node_count=node_count,
                    tool_count=tool_count,
                    tags=tags,
                    last_active=_get_last_active(path.name),
                )
            )
        if entries:
            groups[category] = entries

    return groups


def _render_agent_option(agent: AgentEntry) -> Group:
    """Build a Rich renderable for a single agent option."""
    # Line 1: name + session badge
    line1 = Text()
    line1.append(agent.name, style="bold")
    if agent.session_count:
        line1.append(f" {agent.session_count} sessions", style="dim cyan")

    # Line 2: description (word-wrapped by the widget)
    desc = agent.description if agent.description else "No description"
    line2 = Text(desc, style="dim")

    # Line 3: stats chips
    chips = Text()
    if agent.node_count:
        chips.append(f" {agent.node_count} nodes ", style="on dark_green white")
        chips.append(" ")
    if agent.tool_count:
        chips.append(f" {agent.tool_count} tools ", style="on dark_blue white")
        chips.append(" ")
    for tag in agent.tags[:3]:
        chips.append(f" {tag} ", style="on grey37 white")
        chips.append(" ")

    parts = [line1, line2]
    if chips.plain.strip():
        parts.append(chips)
    return Group(*parts)


def _render_get_started_option(title: str, description: str, icon: str = "→") -> Group:
    """Build a Rich renderable for a Get Started option."""
    line1 = Text()
    line1.append(f"{icon} ", style="bold cyan")
    line1.append(title, style="bold")
    line2 = Text(description, style="dim")
    return Group(line1, line2)


class AgentPickerScreen(ModalScreen[str | None]):
    """Modal screen showing available agents organized by tabbed categories.

    Returns the selected agent path as a string, or None if dismissed.
    For Get Started actions, returns a special prefix like "action:run_examples".
    """

    BINDINGS = [
        Binding("escape", "dismiss_picker", "Cancel"),
    ]

    DEFAULT_CSS = """
    AgentPickerScreen {
        align: center middle;
    }
    #picker-container {
        width: 90%;
        max-width: 120;
        height: 85%;
        background: $surface;
        border: heavy $primary;
        padding: 1 2;
    }
    #picker-title {
        text-align: center;
        text-style: bold;
        width: 100%;
        color: $text;
    }
    #picker-subtitle {
        text-align: center;
        width: 100%;
        margin-bottom: 1;
    }
    #picker-footer {
        text-align: center;
        width: 100%;
        margin-top: 1;
    }
    TabPane {
        padding: 0;
    }
    OptionList {
        height: 1fr;
    }
    OptionList > .option-list--option {
        padding: 1 2;
    }
    """

    def __init__(
        self,
        agent_groups: dict[str, list[AgentEntry]],
        show_get_started: bool = False,
    ) -> None:
        super().__init__()
        self._groups = agent_groups
        self._show_get_started = show_get_started
        # Map (tab_id, option_index) -> AgentEntry
        self._option_map: dict[str, dict[int, AgentEntry]] = {}

    def compose(self) -> ComposeResult:
        total = sum(len(v) for v in self._groups.values())
        with Vertical(id="picker-container"):
            yield Label("Hive Agent Launcher", id="picker-title")
            yield Label(
                f"[dim]{total} agents available[/dim]",
                id="picker-subtitle",
            )
            with TabbedContent():
                # Get Started tab (only on initial launch)
                if self._show_get_started:
                    with TabPane("Get Started", id="get-started"):
                        option_list = OptionList(id="list-get-started")
                        option_list.add_option(
                            Option(
                                _render_get_started_option(
                                    "Test and run example agents",
                                    "Try pre-built example agents to learn how Hive works",
                                    "📚",
                                ),
                                id="action:run_examples",
                            )
                        )
                        option_list.add_option(
                            Option(
                                _render_get_started_option(
                                    "Test and run existing agent",
                                    "Load and run an agent you've already built (from exports/)",
                                    "🚀",
                                ),
                                id="action:run_existing",
                            )
                        )
                        option_list.add_option(
                            Option(
                                _render_get_started_option(
                                    "Build or edit agent",
                                    "Create a new agent or modify an existing one",
                                    "🛠️ ",
                                ),
                                id="action:build_edit",
                            )
                        )
                        yield option_list

                # Agent category tabs
                for category, agents in self._groups.items():
                    tab_id = category.lower().replace(" ", "-")
                    with TabPane(f"{category} ({len(agents)})", id=tab_id):
                        option_list = OptionList(id=f"list-{tab_id}")
                        self._option_map[f"list-{tab_id}"] = {}
                        for i, agent in enumerate(agents):
                            option_list.add_option(
                                Option(
                                    _render_agent_option(agent),
                                    id=str(agent.path),
                                )
                            )
                            self._option_map[f"list-{tab_id}"][i] = agent
                        yield option_list
            yield Label(
                "[dim]Enter[/dim] Select [dim]Tab[/dim] Switch category [dim]Esc[/dim] Cancel",
                id="picker-footer",
            )

    def on_option_list_option_selected(self, event: OptionList.OptionSelected) -> None:
        list_id = event.option_list.id or ""

        # Handle Get Started tab options
        if list_id == "list-get-started":
            option = event.option
            if option and option.id:
                self.dismiss(option.id)  # Returns "action:run_examples", etc.
            return

        # Handle agent selection from the other tabs
        idx = event.option_index
        agent_map = self._option_map.get(list_id, {})
        agent = agent_map.get(idx)
        if agent:
            self.dismiss(str(agent.path))

    def action_dismiss_picker(self) -> None:
        self.dismiss(None)
@@ -1,304 +0,0 @@
"""Credential setup ModalScreen for configuring missing agent credentials."""

from __future__ import annotations

import os

from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical, VerticalScroll
from textual.screen import ModalScreen
from textual.widgets import Button, Input, Label

from framework.credentials.setup import CredentialSetupSession, MissingCredential


class CredentialSetupScreen(ModalScreen[bool | None]):
    """Modal screen for configuring missing agent credentials.

    Shows a form with one password Input per missing credential.
    For Aden-backed credentials (``aden_supported=True``), prompts for
    ``ADEN_API_KEY`` and runs the Aden sync flow instead of storing a
    raw value.

    Returns True on successful save, or None on cancel/skip.
    """

    BINDINGS = [
        Binding("escape", "dismiss_setup", "Cancel"),
    ]

    DEFAULT_CSS = """
    CredentialSetupScreen {
        align: center middle;
    }
    #cred-container {
        width: 80%;
        max-width: 100;
        height: 80%;
        background: $surface;
        border: heavy $primary;
        padding: 1 2;
    }
    #cred-title {
        text-align: center;
        text-style: bold;
        width: 100%;
        color: $text;
    }
    #cred-subtitle {
        text-align: center;
        width: 100%;
        margin-bottom: 1;
    }
    #cred-scroll {
        height: 1fr;
    }
    .cred-entry {
        margin-bottom: 1;
        padding: 1;
        background: $panel;
        height: auto;
    }
    .cred-entry Input {
        margin-top: 1;
    }
    .cred-buttons {
        height: auto;
        margin-top: 1;
        align: center middle;
    }
    .cred-buttons Button {
        margin: 0 1;
    }
    #cred-footer {
        text-align: center;
        width: 100%;
        margin-top: 1;
    }
    """

    def __init__(self, session: CredentialSetupSession) -> None:
        super().__init__()
        self._session = session
        self._missing: list[MissingCredential] = session.missing
        # Track which credentials need Aden sync vs direct API key
        self._aden_creds: set[int] = set()
        self._needs_aden_key = False
        for i, cred in enumerate(self._missing):
            if cred.aden_supported and not cred.direct_api_key_supported:
                self._aden_creds.add(i)
                self._needs_aden_key = True

    def compose(self) -> ComposeResult:
        n = len(self._missing)
        with Vertical(id="cred-container"):
            yield Label("Credential Setup", id="cred-title")
            yield Label(
                f"[dim]{n} credential{'s' if n != 1 else ''} needed to run this agent[/dim]",
                id="cred-subtitle",
            )
            with VerticalScroll(id="cred-scroll"):
                # If any credential needs Aden, show ADEN_API_KEY input first
                if self._needs_aden_key:
                    aden_key = os.environ.get("ADEN_API_KEY", "")
                    with Vertical(classes="cred-entry"):
                        yield Label("[bold]ADEN_API_KEY[/bold]")
                        aden_names = [
                            self._missing[i].credential_name for i in sorted(self._aden_creds)
                        ]
                        yield Label(f"[dim]Required for OAuth sync: {', '.join(aden_names)}[/dim]")
                        yield Label("[cyan]Get key:[/cyan] https://hive.adenhq.com")
                        yield Input(
                            placeholder="Paste ADEN_API_KEY..."
                            if not aden_key
                            else "Already set (leave blank to keep)",
                            password=True,
                            id="key-aden",
                        )

                # Show direct API key inputs for non-Aden credentials
                for i, cred in enumerate(self._missing):
                    if i in self._aden_creds:
                        continue  # Handled via Aden sync above
                    with Vertical(classes="cred-entry"):
                        yield Label(f"[bold]{cred.env_var}[/bold]")
                        affected = cred.tools or cred.node_types
                        if affected:
                            yield Label(f"[dim]Required by: {', '.join(affected)}[/dim]")
                        if cred.description:
                            yield Label(f"[dim]{cred.description}[/dim]")
                        if cred.help_url:
                            yield Label(f"[cyan]Get key:[/cyan] {cred.help_url}")
                        yield Input(
                            placeholder="Paste API key...",
                            password=True,
                            id=f"key-{i}",
                        )
            with Vertical(classes="cred-buttons"):
                yield Button("Save & Continue", variant="primary", id="btn-save")
                yield Button("Skip", variant="default", id="btn-skip")
            yield Label(
                "[dim]Enter[/dim] Submit  [dim]Esc[/dim] Cancel",
                id="cred-footer",
            )

    def on_button_pressed(self, event: Button.Pressed) -> None:
        if event.button.id == "btn-save":
            self._save_credentials()
        elif event.button.id == "btn-skip":
            self.dismiss(None)

    def _save_credentials(self) -> None:
        """Collect inputs, store credentials, and dismiss."""
        self._session._ensure_credential_key()

        configured = 0

        # Handle Aden-backed credentials
        if self._needs_aden_key:
            aden_input = self.query_one("#key-aden", Input)
            aden_key = aden_input.value.strip()
            if aden_key:
                from framework.credentials.key_storage import save_aden_api_key

                save_aden_api_key(aden_key)
                configured += 1  # ADEN_API_KEY itself counts as configured

            # Run Aden sync for all Aden-backed creds (best-effort)
            if aden_key or os.environ.get("ADEN_API_KEY"):
                self._sync_aden_credentials()

        # Handle direct API key credentials
        for i, cred in enumerate(self._missing):
            if i in self._aden_creds:
                continue
            input_widget = self.query_one(f"#key-{i}", Input)
            value = input_widget.value.strip()
            if not value:
                continue
            try:
                self._session._store_credential(cred, value)
                configured += 1
            except Exception as e:
                self.notify(f"Error storing {cred.env_var}: {e}", severity="error")

        if configured > 0:
            self.dismiss(True)
        else:
            self.notify("No credentials configured", severity="warning", timeout=3)

    def _sync_aden_credentials(self) -> int:
        """Sync Aden-backed credentials and return count of successfully synced."""
        # Build the Aden sync components directly so we get real errors
        # instead of CredentialStore.with_aden_sync() silently falling back.
        try:
            from framework.credentials.aden import (
                AdenCachedStorage,
                AdenClientConfig,
                AdenCredentialClient,
                AdenSyncProvider,
            )
            from framework.credentials.storage import EncryptedFileStorage

            client = AdenCredentialClient(AdenClientConfig(base_url="https://api.adenhq.com"))
            provider = AdenSyncProvider(client=client)
            local_storage = EncryptedFileStorage()
            cached_storage = AdenCachedStorage(
                local_storage=local_storage,
                aden_provider=provider,
            )
        except Exception as e:
            self.notify(
                f"Aden setup error: {e}",
                severity="error",
                timeout=8,
            )
            return 0

        # Sync all integrations from Aden to get the provider index populated
        try:
            from framework.credentials import CredentialStore

            store = CredentialStore(
                storage=cached_storage,
                providers=[provider],
                auto_refresh=True,
            )
            num_synced = provider.sync_all(store)
            if num_synced == 0:
                self.notify(
                    "No active integrations found in Aden. "
                    "Connect integrations at hive.adenhq.com.",
                    severity="warning",
                    timeout=8,
                )
        except Exception as e:
            self.notify(
                f"Aden sync error: {e}",
                severity="error",
                timeout=8,
            )
            return 0

        synced = 0
        for i in sorted(self._aden_creds):
            cred = self._missing[i]
            cred_id = cred.credential_id or cred.credential_name
            if store.is_available(cred_id):
                try:
                    value = store.get_key(cred_id, cred.credential_key)
                    if value:
                        os.environ[cred.env_var] = value
                        self._persist_to_local_store(cred_id, cred.credential_key, value)
                        synced += 1
                    else:
                        self.notify(
                            f"{cred.credential_name}: key "
                            f"'{cred.credential_key}' not found "
                            f"in credential '{cred_id}'",
                            severity="warning",
                            timeout=8,
                        )
                except Exception as e:
                    self.notify(
                        f"{cred.credential_name} extraction failed: {e}",
                        severity="error",
                        timeout=8,
                    )
            else:
                self.notify(
                    f"{cred.credential_name} (id='{cred_id}') "
                    f"not found in Aden. Connect this "
                    f"integration at hive.adenhq.com first.",
                    severity="warning",
                    timeout=8,
                )
        return synced

    @staticmethod
    def _persist_to_local_store(cred_id: str, key_name: str, value: str) -> None:
        """Save a synced token to the local encrypted store under the canonical ID."""
        try:
            from pydantic import SecretStr

            from framework.credentials.models import CredentialKey, CredentialObject, CredentialType
            from framework.credentials.storage import EncryptedFileStorage

            cred_obj = CredentialObject(
                id=cred_id,
                credential_type=CredentialType.OAUTH2,
                keys={
                    key_name: CredentialKey(
                        name=key_name,
                        value=SecretStr(value),
                    ),
                },
                auto_refresh=True,
            )
            EncryptedFileStorage().save(cred_obj)
        except Exception:
            pass  # Best-effort; env var is the primary delivery mechanism

    def action_dismiss_setup(self) -> None:
        self.dismiss(None)
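A note on the delivery pattern in `_sync_aden_credentials` above: each synced secret is exported through the environment first, and the encrypted local store is only a best-effort secondary copy. A minimal standalone sketch of that dual delivery, assuming illustrative names (`deliver`, `EXAMPLE_TOKEN`, and the dict-backed `store` are not part of the framework):

```python
import os

def deliver(env_var: str, value: str, store: dict[str, str]) -> None:
    # Primary delivery: the env var is what downstream code reads.
    os.environ[env_var] = value
    # Secondary delivery: persist locally, ignoring failures (best-effort).
    try:
        store[env_var] = value
    except Exception:
        pass

cache: dict[str, str] = {}
deliver("EXAMPLE_TOKEN", "tok-123", cache)
print(os.environ["EXAMPLE_TOKEN"])  # tok-123
```

This ordering means a failed local write never blocks the agent run; the env var alone is enough for the current process.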
File diff suppressed because it is too large
@@ -1,139 +0,0 @@
"""
Native OS file dialog for PDF selection.

Launches the platform's native file picker (macOS: NSOpenPanel via osascript,
Linux: zenity/kdialog, Windows: PowerShell OpenFileDialog) in a background
thread so Textual's event loop stays responsive.

Falls back to None when no GUI is available (SSH, headless).
"""

import asyncio
import os
import subprocess
import sys
from pathlib import Path


def _has_gui() -> bool:
    """Detect whether a GUI display is available."""
    if sys.platform == "darwin":
        # macOS: GUI is available unless running over SSH without display forwarding.
        return "SSH_CONNECTION" not in os.environ or "DISPLAY" in os.environ
    elif sys.platform == "win32":
        return True
    else:
        # Linux/BSD: Need X11 or Wayland.
        return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))


def _linux_file_dialog() -> subprocess.CompletedProcess | None:
    """Try zenity, then kdialog, on Linux. Returns CompletedProcess or None."""
    # Try zenity (GTK)
    try:
        return subprocess.run(
            [
                "zenity",
                "--file-selection",
                "--title=Select a PDF file",
                "--file-filter=PDF files (*.pdf)|*.pdf",
            ],
            encoding="utf-8",
            capture_output=True,
            text=True,
            timeout=300,
        )
    except FileNotFoundError:
        pass

    # Try kdialog (KDE)
    try:
        return subprocess.run(
            [
                "kdialog",
                "--getopenfilename",
                ".",
                "PDF files (*.pdf)",
            ],
            encoding="utf-8",
            capture_output=True,
            text=True,
            timeout=300,
        )
    except FileNotFoundError:
        pass

    return None


def _pick_pdf_subprocess() -> Path | None:
    """Run the native file dialog. BLOCKS until user picks or cancels.

    Returns a Path on success, None on cancel or error.
    Must be called from a non-main thread (via asyncio.to_thread).
    """
    try:
        if sys.platform == "darwin":
            result = subprocess.run(
                [
                    "osascript",
                    "-e",
                    'POSIX path of (choose file of type {"com.adobe.pdf"} '
                    'with prompt "Select a PDF file")',
                ],
                encoding="utf-8",
                capture_output=True,
                text=True,
                timeout=300,
            )
        elif sys.platform == "win32":
            ps_script = (
                "Add-Type -AssemblyName System.Windows.Forms; "
                "$f = New-Object System.Windows.Forms.OpenFileDialog; "
                "$f.Filter = 'PDF files (*.pdf)|*.pdf'; "
                "$f.Title = 'Select a PDF file'; "
                "if ($f.ShowDialog() -eq 'OK') { $f.FileName }"
            )
            result = subprocess.run(
                ["powershell", "-NoProfile", "-Command", ps_script],
                encoding="utf-8",
                capture_output=True,
                text=True,
                timeout=300,
            )
        else:
            result = _linux_file_dialog()
            if result is None:
                return None

        if result.returncode != 0:
            return None

        path_str = result.stdout.strip()
        if not path_str:
            return None

        path = Path(path_str)
        if path.is_file() and path.suffix.lower() == ".pdf":
            return path

        return None

    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None


async def pick_pdf_file() -> Path | None:
    """Open a native OS file dialog to pick a PDF file.

    Non-blocking: runs the dialog subprocess in a background thread via
    asyncio.to_thread(), so the calling event loop stays responsive.

    Returns:
        Path to the selected PDF, or None if the user cancelled,
        no GUI is available, or the dialog command was not found.
    """
    if not _has_gui():
        return None

    return await asyncio.to_thread(_pick_pdf_subprocess)
@@ -1,585 +0,0 @@
"""
Graph/Tree Overview Widget - Displays real agent graph structure.

Supports rendering loops (back-edges) via right-side return channels:
arrows drawn on the right margin that visually point back up to earlier nodes.
"""

from __future__ import annotations

import re
import time

from textual.app import ComposeResult
from textual.containers import Vertical

from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import EventType
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog

# Width of each return-channel column (padding + │ + gap)
_CHANNEL_WIDTH = 5

# Regex to strip Rich markup tags for measuring visible width
_MARKUP_RE = re.compile(r"\[/?[^\]]*\]")


def _plain_len(s: str) -> int:
    """Return the visible character length of a Rich-markup string."""
    return len(_MARKUP_RE.sub("", s))


class GraphOverview(Vertical):
    """Widget to display Agent execution graph/tree with real data."""

    DEFAULT_CSS = """
    GraphOverview {
        width: 100%;
        height: 100%;
        background: $panel;
    }

    GraphOverview > RichLog {
        width: 100%;
        height: 100%;
        background: $panel;
        border: none;
        scrollbar-background: $surface;
        scrollbar-color: $primary;
    }
    """

    def __init__(self, runtime: AgentRuntime):
        super().__init__()
        self.runtime = runtime
        self._override_graph = None  # Set by switch_graph() for secondary graphs
        self.active_node: str | None = None
        self.execution_path: list[str] = []
        # Per-node status strings shown next to the node in the graph display.
        # e.g. {"planner": "thinking...", "searcher": "web_search..."}
        self._node_status: dict[str, str] = {}

    @property
    def _graph(self):
        """The graph currently being displayed (may be a secondary graph)."""
        return self._override_graph or self.runtime.graph

    def switch_graph(self, graph) -> None:
        """Switch to displaying a different graph and refresh."""
        self._override_graph = graph
        self.active_node = None
        self.execution_path = []
        self._node_status = {}
        self._display_graph()

    def compose(self) -> ComposeResult:
        # Use RichLog for formatted output
        yield RichLog(id="graph-display", highlight=True, markup=True)

    def on_mount(self) -> None:
        """Display initial graph structure."""
        self._display_graph()
        # Refresh every 1s so timer countdowns stay current
        if self.runtime._timer_next_fire is not None:
            self.set_interval(1.0, self._display_graph)

    # ------------------------------------------------------------------
    # Graph analysis helpers
    # ------------------------------------------------------------------

    def _topo_order(self) -> list[str]:
        """BFS from entry_node following edges."""
        graph = self._graph
        visited: list[str] = []
        seen: set[str] = set()
        queue = [graph.entry_node]
        while queue:
            nid = queue.pop(0)
            if nid in seen:
                continue
            seen.add(nid)
            visited.append(nid)
            for edge in graph.get_outgoing_edges(nid):
                if edge.target not in seen:
                    queue.append(edge.target)
        # Append orphan nodes not reachable from entry
        for node in graph.nodes:
            if node.id not in seen:
                visited.append(node.id)
        return visited

    def _detect_back_edges(self, ordered: list[str]) -> list[dict]:
        """Find edges where target appears before (or equal to) source in topo order.

        Returns a list of dicts with keys: edge, source, target, source_idx, target_idx.
        """
        order_idx = {nid: i for i, nid in enumerate(ordered)}
        back_edges: list[dict] = []
        for node_id in ordered:
            for edge in self._graph.get_outgoing_edges(node_id):
                target_idx = order_idx.get(edge.target, -1)
                source_idx = order_idx.get(node_id, -1)
                if target_idx != -1 and target_idx <= source_idx:
                    back_edges.append(
                        {
                            "edge": edge,
                            "source": node_id,
                            "target": edge.target,
                            "source_idx": source_idx,
                            "target_idx": target_idx,
                        }
                    )
        return back_edges

    def _is_back_edge(self, source: str, target: str, order_idx: dict[str, int]) -> bool:
        """Check whether an edge from *source* to *target* is a back-edge."""
        si = order_idx.get(source, -1)
        ti = order_idx.get(target, -1)
        return ti != -1 and ti <= si

    # ------------------------------------------------------------------
    # Line rendering (Pass 1)
    # ------------------------------------------------------------------

    def _render_node_line(self, node_id: str) -> str:
        """Render a single node with status symbol and optional status text."""
        graph = self._graph
        is_terminal = node_id in (graph.terminal_nodes or [])
        is_active = node_id == self.active_node
        is_done = node_id in self.execution_path and not is_active
        status = self._node_status.get(node_id, "")

        if is_active:
            sym = "[bold green]●[/bold green]"
        elif is_done:
            sym = "[dim]✓[/dim]"
        elif is_terminal:
            sym = "[yellow]■[/yellow]"
        else:
            sym = "○"

        if is_active:
            name = f"[bold green]{node_id}[/bold green]"
        elif is_done:
            name = f"[dim]{node_id}[/dim]"
        else:
            name = node_id

        suffix = f" [italic]{status}[/italic]" if status else ""
        return f"  {sym} {name}{suffix}"

    def _render_edges(self, node_id: str, order_idx: dict[str, int]) -> list[str]:
        """Render forward-edge connectors from *node_id*.

        Back-edges are excluded here — they are drawn by the return-channel
        overlay in Pass 2.
        """
        all_edges = self._graph.get_outgoing_edges(node_id)
        if not all_edges:
            return []

        # Split into forward and back
        forward = [e for e in all_edges if not self._is_back_edge(node_id, e.target, order_idx)]

        if not forward:
            # All edges are back-edges — nothing to render here
            return []

        if len(forward) == 1:
            return ["  │", "  ▼"]

        # Fan-out: show branches
        lines: list[str] = []
        for i, edge in enumerate(forward):
            connector = "└" if i == len(forward) - 1 else "├"
            cond = ""
            if edge.condition.value not in ("always", "on_success"):
                cond = f" [dim]({edge.condition.value})[/dim]"
            lines.append(f"  {connector}──▶ {edge.target}{cond}")
        return lines

    # ------------------------------------------------------------------
    # Return-channel overlay (Pass 2)
    # ------------------------------------------------------------------

    def _overlay_return_channels(
        self,
        lines: list[str],
        node_line_map: dict[str, int],
        back_edges: list[dict],
        available_width: int,
    ) -> list[str]:
        """Overlay right-side return channels onto the line buffer.

        Each back-edge gets a vertical channel on the right margin. Channels
        are allocated left-to-right by increasing span length so that shorter
        (inner) loops are closer to the graph body and longer (outer) loops are
        further right.

        If the terminal is too narrow to fit even one channel, we fall back to
        simple inline ``↺`` annotations instead.
        """
        if not back_edges:
            return lines

        num_channels = len(back_edges)

        # Sort by span length ascending → inner loops get nearest channel
        sorted_be = sorted(back_edges, key=lambda b: b["source_idx"] - b["target_idx"])

        # --- Insert dedicated connector lines for back-edge sources ---
        # Each back-edge source gets a blank line inserted after its node
        # section (after any forward-edge lines). We process insertions in
        # reverse order so that earlier indices remain valid.
        all_node_lines_set = set(node_line_map.values())

        insertions: list[tuple[int, int]] = []  # (insert_after_line, be_index)
        for be_idx, be in enumerate(sorted_be):
            source_node_line = node_line_map.get(be["source"])
            if source_node_line is None:
                continue
            # Walk forward to find the last line in this node's section
            last_section_line = source_node_line
            for li in range(source_node_line + 1, len(lines)):
                if li in all_node_lines_set:
                    break
                last_section_line = li
            insertions.append((last_section_line, be_idx))

        source_line_for_be: dict[int, int] = {}
        for insert_after, be_idx in sorted(insertions, reverse=True):
            insert_at = insert_after + 1
            lines.insert(insert_at, "")  # placeholder for connector
            source_line_for_be[be_idx] = insert_at
            # Shift node_line_map entries that come after the insertion point
            for nid in node_line_map:
                if node_line_map[nid] > insert_after:
                    node_line_map[nid] += 1
            # Also shift already-assigned source lines
            for prev_idx in source_line_for_be:
                if prev_idx != be_idx and source_line_for_be[prev_idx] > insert_after:
                    source_line_for_be[prev_idx] += 1

        # Recompute max content width after insertions
        max_content_w = max(_plain_len(ln) for ln in lines) if lines else 0

        # Check if we have room for channels
        channels_total_w = num_channels * _CHANNEL_WIDTH
        if max_content_w + channels_total_w + 2 > available_width:
            return self._inline_back_edge_fallback(lines, node_line_map, back_edges)

        content_pad = max_content_w + 3  # gap between content and first channel

        # Build channel info with final line positions
        channel_info: list[dict] = []
        for ch_idx, be in enumerate(sorted_be):
            target_line = node_line_map.get(be["target"])
            source_line = source_line_for_be.get(ch_idx)
            if target_line is None or source_line is None:
                continue
            col = content_pad + ch_idx * _CHANNEL_WIDTH
            channel_info.append(
                {
                    "target_line": target_line,
                    "source_line": source_line,
                    "col": col,
                }
            )

        if not channel_info:
            return lines

        # Build overlay grid — one row per line, columns for channel area
        total_width = content_pad + num_channels * _CHANNEL_WIDTH + 1
        overlay_width = total_width - max_content_w
        overlays: list[list[str]] = [[" "] * overlay_width for _ in range(len(lines))]

        for ci in channel_info:
            tl = ci["target_line"]
            sl = ci["source_line"]
            col_offset = ci["col"] - max_content_w

            if col_offset < 0 or col_offset >= overlay_width:
                continue

            # Target line: ◄──...──┐
            if 0 <= tl < len(overlays):
                for c in range(col_offset):
                    if overlays[tl][c] == " ":
                        overlays[tl][c] = "─"
                overlays[tl][col_offset] = "┐"

            # Source line: ──...──┘
            if 0 <= sl < len(overlays):
                for c in range(col_offset):
                    if overlays[sl][c] == " ":
                        overlays[sl][c] = "─"
                overlays[sl][col_offset] = "┘"

            # Vertical lines between target+1 and source-1
            for li in range(tl + 1, sl):
                if 0 <= li < len(overlays) and overlays[li][col_offset] == " ":
                    overlays[li][col_offset] = "│"

        # Merge overlays into the line strings
        result: list[str] = []
        for i, line in enumerate(lines):
            pw = _plain_len(line)
            pad = max_content_w - pw
            overlay_chars = overlays[i] if i < len(overlays) else []
            overlay_str = "".join(overlay_chars)
            overlay_trimmed = overlay_str.rstrip()
            if overlay_trimmed:
                is_target_line = any(ci["target_line"] == i for ci in channel_info)
                if is_target_line:
                    overlay_trimmed = "◄" + overlay_trimmed[1:]

                is_source_line = any(ci["source_line"] == i for ci in channel_info)
                if is_source_line and not line.strip():
                    # Inserted blank line → build └───┘ connector.
                    # "  └" = 3 chars of content prefix, so remaining pad = max_content_w - 3
                    remaining_pad = max_content_w - 3
                    full = list(" " * remaining_pad + overlay_trimmed)
                    # Find the ┘ corner for this source connector
                    corner_pos = -1
                    for ci_s in channel_info:
                        if ci_s["source_line"] == i:
                            corner_pos = remaining_pad + (ci_s["col"] - max_content_w)
                            break
                    # Fill everything up to the corner with ─
                    if corner_pos >= 0:
                        for c in range(corner_pos):
                            if full[c] not in ("│", "┘", "┐"):
                                full[c] = "─"
                    connector = "  └" + "".join(full).rstrip()
                    result.append(f"[dim]{connector}[/dim]")
                    continue

                colored_overlay = f"[dim]{' ' * pad}{overlay_trimmed}[/dim]"
                result.append(f"{line}{colored_overlay}")
            else:
                result.append(line)

        return result

    def _inline_back_edge_fallback(
        self,
        lines: list[str],
        node_line_map: dict[str, int],
        back_edges: list[dict],
    ) -> list[str]:
        """Fallback: add inline ↺ annotations when terminal is too narrow for channels."""
        # Group back-edges by source node
        source_to_be: dict[str, list[dict]] = {}
        for be in back_edges:
            source_to_be.setdefault(be["source"], []).append(be)

        result = list(lines)
        # Insert annotation lines after each source node's section
        offset = 0
        all_node_lines = sorted(node_line_map.values())
        for source, bes in source_to_be.items():
            source_line = node_line_map.get(source)
            if source_line is None:
                continue
            # Find end of source node section
            end_line = source_line
            for nl in all_node_lines:
                if nl > source_line:
                    end_line = nl - 1
                    break
            else:
                end_line = len(lines) - 1
            # Insert after last content line of this node's section
            insert_at = end_line + offset + 1
            for be in bes:
                cond = ""
                edge = be["edge"]
                if edge.condition.value not in ("always", "on_success"):
                    cond = f" [dim]({edge.condition.value})[/dim]"
                annotation = f"  [yellow]↺[/yellow] {be['target']}{cond}"
                result.insert(insert_at, annotation)
                insert_at += 1
            offset += 1

        return result

    # ------------------------------------------------------------------
    # Main display
    # ------------------------------------------------------------------

    def _display_graph(self) -> None:
        """Display the graph as an ASCII DAG with edge connectors and loop channels."""
        display = self.query_one("#graph-display", RichLog)
        display.clear()

        graph = self._graph
        display.write(f"[bold cyan]Agent Graph:[/bold cyan] {graph.id}\n")

        ordered = self._topo_order()
        order_idx = {nid: i for i, nid in enumerate(ordered)}

        # --- Pass 1: Build line buffer ---
        lines: list[str] = []
        node_line_map: dict[str, int] = {}

        for node_id in ordered:
            node_line_map[node_id] = len(lines)
            lines.append(self._render_node_line(node_id))
            for edge_line in self._render_edges(node_id, order_idx):
                lines.append(edge_line)

        # --- Pass 2: Overlay return channels for back-edges ---
        back_edges = self._detect_back_edges(ordered)
        if back_edges:
            # Try to get actual widget width; default to a reasonable value
            try:
                available_width = self.size.width or 60
            except Exception:
                available_width = 60
            lines = self._overlay_return_channels(lines, node_line_map, back_edges, available_width)

        # Write all lines
        for line in lines:
            display.write(line)

        # Execution path footer
        if self.execution_path:
            display.write("")
            display.write(f"[dim]Path:[/dim] {' → '.join(self.execution_path[-5:])}")

        # Event sources section
        self._render_event_sources(display)

    # ------------------------------------------------------------------
    # Event sources display
    # ------------------------------------------------------------------

    def _render_event_sources(self, display: RichLog) -> None:
        """Render event source info (webhooks, timers) below the graph."""
        entry_points = self.runtime.get_entry_points()

        # Filter to non-manual entry points (webhooks, timers, events)
        event_sources = [ep for ep in entry_points if ep.trigger_type not in ("manual",)]
        if not event_sources:
            return

        display.write("")
        display.write("[bold cyan]Event Sources[/bold cyan]")

        config = self.runtime._config

        for ep in event_sources:
            if ep.trigger_type == "timer":
                cron_expr = ep.trigger_config.get("cron")
                interval = ep.trigger_config.get("interval_minutes", "?")
                schedule_label = f"cron: {cron_expr}" if cron_expr else f"every {interval} min"
                display.write(f"  [green]⏱[/green] {ep.name} [dim]→ {ep.entry_node}[/dim]")
                # Show schedule + next fire countdown
                next_fire = self.runtime._timer_next_fire.get(ep.id)
                if next_fire is not None:
                    remaining = max(0, next_fire - time.monotonic())
                    hours, rem = divmod(int(remaining), 3600)
                    mins, secs = divmod(rem, 60)
                    if hours > 0:
                        countdown = f"{hours}h {mins:02d}m {secs:02d}s"
                    else:
                        countdown = f"{mins}m {secs:02d}s"
                    display.write(f"    [dim]{schedule_label} — next in {countdown}[/dim]")
                else:
                    display.write(f"    [dim]{schedule_label}[/dim]")

            elif ep.trigger_type in ("event", "webhook"):
                display.write(f"  [yellow]⚡[/yellow] {ep.name} [dim]→ {ep.entry_node}[/dim]")
                # Show webhook endpoint if configured
                route = None
                for r in config.webhook_routes:
                    src = r.get("source_id", "")
                    if src and src in ep.id:
                        route = r
                        break
                if not route and config.webhook_routes:
                    # Fall back to first route
                    route = config.webhook_routes[0]

                if route:
                    host = config.webhook_host
                    port = config.webhook_port
                    path = route.get("path", "/webhook")
                    display.write(f"    [dim]{host}:{port}{path}[/dim]")
                else:
                    event_types = ep.trigger_config.get("event_types", [])
                    if event_types:
                        display.write(f"    [dim]events: {', '.join(event_types)}[/dim]")

    # ------------------------------------------------------------------
    # Public API (called by app.py)
    # ------------------------------------------------------------------

    def update_active_node(self, node_id: str) -> None:
        """Update the currently active node."""
        self.active_node = node_id
        if node_id not in self.execution_path:
            self.execution_path.append(node_id)
        self._display_graph()
|
||||
|
||||
def update_execution(self, event) -> None:
|
||||
"""Update the displayed node status based on execution lifecycle events."""
|
||||
if event.type == EventType.EXECUTION_STARTED:
|
||||
self._node_status.clear()
|
||||
self.execution_path.clear()
|
||||
entry_node = event.data.get("entry_node") or (
|
||||
self._graph.entry_node if self.runtime else None
|
||||
)
|
||||
if entry_node:
|
||||
self.update_active_node(entry_node)
|
||||
|
||||
elif event.type == EventType.EXECUTION_COMPLETED:
|
||||
self.active_node = None
|
||||
self._node_status.clear()
|
||||
self._display_graph()
|
||||
|
||||
elif event.type == EventType.EXECUTION_FAILED:
|
||||
error = event.data.get("error", "Unknown error")
|
||||
if self.active_node:
|
||||
self._node_status[self.active_node] = f"[red]FAILED: {error}[/red]"
|
||||
self.active_node = None
|
||||
self._display_graph()
|
||||
|
||||
# -- Event handlers called by app.py _handle_event --
|
||||
|
||||
def handle_node_loop_started(self, node_id: str) -> None:
|
||||
"""A node's event loop has started."""
|
||||
self._node_status[node_id] = "thinking..."
|
||||
self.update_active_node(node_id)
|
||||
|
||||
def handle_node_loop_iteration(self, node_id: str, iteration: int) -> None:
|
||||
"""A node advanced to a new loop iteration."""
|
||||
self._node_status[node_id] = f"step {iteration}"
|
||||
self._display_graph()
|
||||
|
||||
def handle_node_loop_completed(self, node_id: str) -> None:
|
||||
"""A node's event loop completed."""
|
||||
self._node_status.pop(node_id, None)
|
||||
if self.active_node == node_id:
|
||||
self.active_node = None
|
||||
self._display_graph()
|
||||
|
||||
def handle_tool_call(self, node_id: str, tool_name: str, *, started: bool) -> None:
|
||||
"""Show tool activity next to the active node."""
|
||||
if started:
|
||||
self._node_status[node_id] = f"{tool_name}..."
|
||||
else:
|
||||
# Restore to generic thinking status after tool completes
|
||||
self._node_status[node_id] = "thinking..."
|
||||
self._display_graph()
|
||||
|
||||
def handle_stalled(self, node_id: str, reason: str) -> None:
|
||||
"""Highlight a stalled node."""
|
||||
self._node_status[node_id] = f"[red]stalled: {reason}[/red]"
|
||||
self._display_graph()
|
||||
|
||||
def handle_edge_traversed(self, source_node: str, target_node: str) -> None:
|
||||
"""Highlight an edge being traversed."""
|
||||
self._node_status[source_node] = f"[dim]→ {target_node}[/dim]"
|
||||
self._display_graph()
|
||||
@@ -1,172 +0,0 @@
"""
Log formatting utilities and LogPane widget.

The module-level functions (format_event, extract_event_text, format_python_log)
can be used by any widget that needs to render log lines without instantiating LogPane.
"""

import logging
from datetime import datetime

from textual.app import ComposeResult
from textual.containers import Container

from framework.runtime.event_bus import AgentEvent, EventType
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog

# --- Module-level formatting constants ---

EVENT_FORMAT: dict[EventType, tuple[str, str]] = {
    EventType.EXECUTION_STARTED: (">>", "bold cyan"),
    EventType.EXECUTION_COMPLETED: ("<<", "bold green"),
    EventType.EXECUTION_FAILED: ("!!", "bold red"),
    EventType.TOOL_CALL_STARTED: ("->", "yellow"),
    EventType.TOOL_CALL_COMPLETED: ("<-", "green"),
    EventType.NODE_LOOP_STARTED: ("@@", "cyan"),
    EventType.NODE_LOOP_ITERATION: ("..", "dim"),
    EventType.NODE_LOOP_COMPLETED: ("@@", "dim"),
    EventType.LLM_TURN_COMPLETE: ("◆", "green"),
    EventType.NODE_STALLED: ("!!", "bold yellow"),
    EventType.NODE_INPUT_BLOCKED: ("!!", "yellow"),
    EventType.GOAL_PROGRESS: ("%%", "blue"),
    EventType.GOAL_ACHIEVED: ("**", "bold green"),
    EventType.CONSTRAINT_VIOLATION: ("!!", "bold red"),
    EventType.STATE_CHANGED: ("~~", "dim"),
    EventType.CLIENT_INPUT_REQUESTED: ("??", "magenta"),
}

LOG_LEVEL_COLORS: dict[int, str] = {
    logging.DEBUG: "dim",
    logging.INFO: "",
    logging.WARNING: "yellow",
    logging.ERROR: "red",
    logging.CRITICAL: "bold red",
}


# --- Module-level formatting functions ---


def extract_event_text(event: AgentEvent) -> str:
    """Extract human-readable text from an event's data dict."""
    et = event.type
    data = event.data

    if et == EventType.EXECUTION_STARTED:
        return "Execution started"
    elif et == EventType.EXECUTION_COMPLETED:
        return "Execution completed"
    elif et == EventType.EXECUTION_FAILED:
        return f"Execution FAILED: {data.get('error', 'unknown')}"
    elif et == EventType.TOOL_CALL_STARTED:
        return f"Tool call: {data.get('tool_name', 'unknown')}"
    elif et == EventType.TOOL_CALL_COMPLETED:
        name = data.get("tool_name", "unknown")
        if data.get("is_error"):
            preview = str(data.get("result", ""))[:80]
            return f"Tool error: {name} - {preview}"
        return f"Tool done: {name}"
    elif et == EventType.NODE_LOOP_STARTED:
        return f"Node started: {event.node_id or 'unknown'}"
    elif et == EventType.NODE_LOOP_ITERATION:
        return f"{event.node_id or 'unknown'} iteration {data.get('iteration', '?')}"
    elif et == EventType.NODE_LOOP_COMPLETED:
        return f"Node done: {event.node_id or 'unknown'}"
    elif et == EventType.NODE_STALLED:
        reason = data.get("reason", "")
        node = event.node_id or "unknown"
        return f"Node stalled: {node} - {reason}" if reason else f"Node stalled: {node}"
    elif et == EventType.NODE_INPUT_BLOCKED:
        return f"Node input blocked: {event.node_id or 'unknown'}"
    elif et == EventType.GOAL_PROGRESS:
        return f"Goal progress: {data.get('progress', '?')}"
    elif et == EventType.GOAL_ACHIEVED:
        return "Goal achieved"
    elif et == EventType.CONSTRAINT_VIOLATION:
        return f"Constraint violated: {data.get('description', 'unknown')}"
    elif et == EventType.STATE_CHANGED:
        return f"State changed: {data.get('key', 'unknown')}"
    elif et == EventType.CLIENT_INPUT_REQUESTED:
        return "Waiting for user input"
    elif et == EventType.LLM_TURN_COMPLETE:
        stop = data.get("stop_reason", "?")
        model = data.get("model", "?")
        inp = data.get("input_tokens", 0)
        out = data.get("output_tokens", 0)
        return f"{model} → {stop} ({inp}+{out} tokens)"
    else:
        return f"{et.value}: {data}"


def format_event(event: AgentEvent) -> str:
    """Format an AgentEvent as a Rich markup string with timestamp + symbol."""
    ts = event.timestamp.strftime("%H:%M:%S")
    symbol, color = EVENT_FORMAT.get(event.type, ("--", "dim"))
    text = extract_event_text(event)
    return f"[dim]{ts}[/dim] [{color}]{symbol} {text}[/{color}]"


def format_python_log(record: logging.LogRecord) -> str:
    """Format a Python log record as a Rich markup string with timestamp and severity color."""
    ts = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
    color = LOG_LEVEL_COLORS.get(record.levelno, "")
    msg = record.getMessage()
    if color:
        return f"[dim]{ts}[/dim] [{color}]{record.levelname}[/{color}] {msg}"
    else:
        return f"[dim]{ts}[/dim] {record.levelname} {msg}"


# --- LogPane widget (kept for backward compatibility) ---


class LogPane(Container):
    """Widget to display logs with reliable rendering."""

    DEFAULT_CSS = """
    LogPane {
        width: 100%;
        height: 100%;
    }

    LogPane > RichLog {
        width: 100%;
        height: 100%;
        background: $surface;
        border: none;
        scrollbar-background: $panel;
        scrollbar-color: $primary;
    }
    """

    def compose(self) -> ComposeResult:
        yield RichLog(id="main-log", highlight=True, markup=True, auto_scroll=False)

    def write_event(self, event: AgentEvent) -> None:
        """Format an AgentEvent with timestamp + symbol and write to the log."""
        self.write_log(format_event(event))

    def write_python_log(self, record: logging.LogRecord) -> None:
        """Format a Python log record with timestamp and severity color."""
        self.write_log(format_python_log(record))

    def write_log(self, message: str) -> None:
        """Write a log message to the log pane."""
        try:
            if not self.is_mounted:
                return

            log = self.query_one("#main-log", RichLog)

            if not log.is_mounted:
                return

            was_at_bottom = log.is_vertical_scroll_end

            log.write(message)

            if was_at_bottom:
                log.scroll_end(animate=False)

        except Exception:
            pass
@@ -1,229 +0,0 @@
"""
SelectableRichLog - RichLog with mouse-driven text selection and clipboard copy.

Drop-in replacement for RichLog. Click-and-drag to select text, which is
visually highlighted. Press Ctrl+C to copy selection to clipboard (handled
by app.py). Press Escape or single-click to clear selection.
"""

from __future__ import annotations

import subprocess
import sys

from rich.segment import Segment as RichSegment
from rich.style import Style
from textual.geometry import Offset
from textual.selection import Selection
from textual.strip import Strip
from textual.widgets import RichLog

# Highlight style for selected text
_HIGHLIGHT_STYLE = Style(bgcolor="blue", color="white")


class SelectableRichLog(RichLog):
    """RichLog with mouse-driven text selection."""

    DEFAULT_CSS = """
    SelectableRichLog {
        pointer: text;
    }
    """

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._sel_anchor: Offset | None = None
        self._sel_end: Offset | None = None
        self._selecting: bool = False

    # -- Internal helpers --

    def _apply_highlight(self, strip: Strip) -> Strip:
        """Apply highlight with correct precedence (highlight wins over base style)."""
        segments = []
        for text, style, control in strip._segments:
            if control:
                segments.append(RichSegment(text, style, control))
            else:
                new_style = (style + _HIGHLIGHT_STYLE) if style else _HIGHLIGHT_STYLE
                segments.append(RichSegment(text, new_style, control))
        return Strip(segments, strip.cell_length)

    # -- Selection helpers --

    @property
    def selection(self) -> Selection | None:
        """Build a Selection from current anchor/end, or None if no selection."""
        if self._sel_anchor is None or self._sel_end is None:
            return None
        if self._sel_anchor == self._sel_end:
            return None
        return Selection.from_offsets(self._sel_anchor, self._sel_end)

    def _mouse_to_content(self, event_x: int, event_y: int) -> Offset:
        """Convert viewport mouse coords to content (line, col) coords."""
        scroll_x, scroll_y = self.scroll_offset
        return Offset(scroll_x + event_x, scroll_y + event_y)

    def clear_selection(self) -> None:
        """Clear any active selection."""
        had_selection = self._sel_anchor is not None
        self._sel_anchor = None
        self._sel_end = None
        self._selecting = False
        if had_selection:
            self.refresh()

    # -- Mouse handlers (left button only) --

    def on_mouse_down(self, event) -> None:
        """Start selection on left mouse button."""
        if event.button != 1:
            return
        self._sel_anchor = self._mouse_to_content(event.x, event.y)
        self._sel_end = self._sel_anchor
        self._selecting = True
        self.capture_mouse()
        self.refresh()

    def on_mouse_move(self, event) -> None:
        """Extend selection while dragging."""
        if not self._selecting:
            return
        self._sel_end = self._mouse_to_content(event.x, event.y)
        self.refresh()

    def on_mouse_up(self, event) -> None:
        """End selection on mouse release."""
        if not self._selecting:
            return
        self._selecting = False
        self.release_mouse()

        # Single-click (no drag) clears selection
        if self._sel_anchor == self._sel_end:
            self.clear_selection()

    # -- Keyboard handlers --

    def on_key(self, event) -> None:
        """Clear selection on Escape."""
        if event.key == "escape":
            self.clear_selection()

    # -- Rendering with highlight --

    def render_line(self, y: int) -> Strip:
        """Override to apply selection highlight on top of the base strip."""
        strip = super().render_line(y)

        sel = self.selection
        if sel is None:
            return strip

        # Determine which content line this viewport row corresponds to
        _, scroll_y = self.scroll_offset
        content_y = scroll_y + y

        span = sel.get_span(content_y)
        if span is None:
            return strip

        start_x, end_x = span
        cell_len = strip.cell_length
        if cell_len == 0:
            return strip

        scroll_x, _ = self.scroll_offset

        # -1 means "to end of content line" — use viewport end
        if end_x == -1:
            end_x = cell_len
        else:
            # Convert content-space x to viewport-space x
            end_x = end_x - scroll_x

        # Convert content-space x to viewport-space x
        start_x = start_x - scroll_x

        # Clamp to viewport strip bounds
        start_x = max(0, start_x)
        end_x = min(end_x, cell_len)

        if start_x >= end_x:
            return strip

        # Divide strip into [before, selected, after] and highlight the middle
        parts = strip.divide([start_x, end_x])
        if len(parts) < 2:
            return strip

        highlighted_parts: list[Strip] = []
        for i, part in enumerate(parts):
            if i == 1:
                highlighted_parts.append(self._apply_highlight(part))
            else:
                highlighted_parts.append(part)

        return Strip.join(highlighted_parts)

    # -- Text extraction & clipboard --

    def get_selected_text(self) -> str | None:
        """Extract the plain text of the current selection, or None."""
        sel = self.selection
        if sel is None:
            return None

        # Build full text from all lines
        all_text = "\n".join(strip.text for strip in self.lines)
        try:
            extracted = sel.extract(all_text)
        except (IndexError, ValueError):
            # Selection coordinates can exceed line count when the virtual
            # canvas is larger than the actual content (e.g. after scroll).
            return None
        return extracted if extracted else None

    def copy_selection(self) -> str | None:
        """Copy selected text to system clipboard. Returns text or None."""
        text = self.get_selected_text()
        if not text:
            return None
        _copy_to_clipboard(text)
        return text


def _copy_to_clipboard(text: str) -> None:
    """Copy text to system clipboard using platform-native tools."""
    # With `encoding` set, subprocess.run expects `input` as str, not bytes.
    # clip.exe on Windows expects UTF-16LE bytes, so pass raw bytes there.
    try:
        if sys.platform == "darwin":
            subprocess.run(["pbcopy"], encoding="utf-8", input=text, check=True, timeout=5)
        elif sys.platform == "win32":
            subprocess.run(
                ["clip.exe"],
                input=text.encode("utf-16le"),
                check=True,
                timeout=5,
            )
        elif sys.platform.startswith("linux"):
            try:
                subprocess.run(
                    ["xclip", "-selection", "clipboard"],
                    encoding="utf-8",
                    input=text,
                    check=True,
                    timeout=5,
                )
            except (subprocess.SubprocessError, FileNotFoundError):
                subprocess.run(
                    ["xsel", "--clipboard", "--input"],
                    encoding="utf-8",
                    input=text,
                    check=True,
                    timeout=5,
                )
    except (subprocess.SubprocessError, FileNotFoundError):
        pass
@@ -1,5 +1,5 @@
 import { api } from "./client";
-import type { GraphTopology, NodeDetail, NodeCriteria, ToolInfo } from "./types";
+import type { GraphTopology, NodeDetail, NodeCriteria, ToolInfo, DraftGraph, FlowchartMap } from "./types";
 
 export const graphsApi = {
   nodes: (sessionId: string, graphId: string, workerSessionId?: string) =>
@@ -26,4 +26,14 @@ export const graphsApi = {
     api.get<{ tools: ToolInfo[] }>(
       `/sessions/${sessionId}/graphs/${graphId}/nodes/${nodeId}/tools`,
     ),
+
+  draftGraph: (sessionId: string) =>
+    api.get<{ draft: DraftGraph | null }>(
+      `/sessions/${sessionId}/draft-graph`,
+    ),
+
+  flowchartMap: (sessionId: string) =>
+    api.get<FlowchartMap>(
+      `/sessions/${sessionId}/flowchart-map`,
+    ),
 };
@@ -12,8 +12,8 @@ export interface LiveSession {
   loaded_at: number;
   uptime_seconds: number;
   intro_message?: string;
-  /** Queen operating phase — "building", "staging", or "running" */
-  queen_phase?: "building" | "staging" | "running";
+  /** Queen operating phase — "planning", "building", "staging", or "running" */
+  queen_phase?: "planning" | "building" | "staging" | "running";
   /** Present in 409 conflict responses when worker is still loading */
   loading?: boolean;
 }
@@ -191,6 +191,56 @@ export interface GraphTopology {
   entry_points?: EntryPoint[];
 }
 
+// --- Draft graph types (planning phase) ---
+
+export interface DraftNode {
+  id: string;
+  name: string;
+  description: string;
+  node_type: string;
+  tools: string[];
+  input_keys: string[];
+  output_keys: string[];
+  success_criteria: string;
+  sub_agents: string[];
+  /** For decision nodes: the yes/no question evaluated during dissolution. */
+  decision_clause?: string;
+  flowchart_type: string;
+  flowchart_shape: string;
+  flowchart_color: string;
+}
+
+export interface DraftEdge {
+  id: string;
+  source: string;
+  target: string;
+  condition: string;
+  description: string;
+  /** Short label shown on the flowchart edge (e.g. "Yes", "No"). */
+  label?: string;
+}
+
+export interface DraftGraph {
+  agent_name: string;
+  goal: string;
+  description: string;
+  success_criteria: string[];
+  constraints: string[];
+  nodes: DraftNode[];
+  edges: DraftEdge[];
+  entry_node: string;
+  terminal_nodes: string[];
+  flowchart_legend: Record<string, { shape: string; color: string }>;
+}
+
+/** Mapping from runtime graph nodes → original flowchart draft nodes. */
+export interface FlowchartMap {
+  /** runtime_node_id → list of original draft node IDs it absorbed. */
+  map: Record<string, string[]> | null;
+  /** Original draft graph preserved before planning-node dissolution (decision + subagent). */
+  original_draft: DraftGraph | null;
+}
+
 export interface NodeCriteria {
   node_id: string;
   success_criteria: string | null;
@@ -276,7 +326,9 @@ export type EventTypeName =
   | "worker_loaded"
   | "credentials_required"
   | "queen_phase_changed"
-  | "subagent_report";
+  | "subagent_report"
+  | "draft_graph_updated"
+  | "flowchart_map_updated";
 
 export interface AgentEvent {
   type: EventTypeName;
@@ -31,7 +31,7 @@ interface AgentGraphProps {
   version?: string;
   runState?: RunState;
   building?: boolean;
-  queenPhase?: "building" | "staging" | "running";
+  queenPhase?: "planning" | "building" | "staging" | "running";
 }
 
 // --- Extracted RunButton so hover state survives parent re-renders ---
@@ -278,7 +278,7 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
           </span>
         )}
       </div>
-      <RunButton runState={runState} disabled={nodes.length === 0 || queenPhase === "building"} onRun={handleRun} onPause={onPause ?? (() => {})} btnRef={runBtnRef} />
+      <RunButton runState={runState} disabled={nodes.length === 0 || queenPhase === "building" || queenPhase === "planning"} onRun={handleRun} onPause={onPause ?? (() => {})} btnRef={runBtnRef} />
     </div>
     <div className="flex-1 flex items-center justify-center px-5">
       {building ? (
@@ -2,6 +2,7 @@ import { memo, useState, useRef, useEffect } from "react";
 import { Send, Square, Crown, Cpu, Check, Loader2 } from "lucide-react";
 import MarkdownContent from "@/components/MarkdownContent";
 import QuestionWidget from "@/components/QuestionWidget";
+import MultiQuestionWidget from "@/components/MultiQuestionWidget";
 
 export interface ChatMessage {
   id: string;
@@ -34,12 +35,16 @@ interface ChatPanelProps {
   pendingQuestion?: string | null;
   /** Options for the pending question */
   pendingOptions?: string[] | null;
+  /** Multiple questions from ask_user_multiple */
+  pendingQuestions?: { id: string; prompt: string; options?: string[] }[] | null;
   /** Called when user submits an answer to the pending question */
   onQuestionSubmit?: (answer: string, isOther: boolean) => void;
+  /** Called when user submits answers to multiple questions */
+  onMultiQuestionSubmit?: (answers: Record<string, string>) => void;
   /** Called when user dismisses the pending question without answering */
   onQuestionDismiss?: () => void;
   /** Queen operating phase — shown as a tag on queen messages */
-  queenPhase?: "building" | "staging" | "running";
+  queenPhase?: "planning" | "building" | "staging" | "running";
 }
 
 const queenColor = "hsl(45,95%,58%)";
@@ -144,7 +149,7 @@ function ToolActivityRow({ content }: { content: string }) {
   );
 }
 
-const MessageBubble = memo(function MessageBubble({ msg, queenPhase }: { msg: ChatMessage; queenPhase?: "building" | "staging" | "running" }) {
+const MessageBubble = memo(function MessageBubble({ msg, queenPhase }: { msg: ChatMessage; queenPhase?: "planning" | "building" | "staging" | "running" }) {
   const isUser = msg.type === "user";
   const isQueen = msg.role === "queen";
   const color = getColor(msg.agent, msg.role);
@@ -204,7 +209,9 @@ const MessageBubble = memo(function MessageBubble({ msg, queenPhase }: { msg: Ch
             ? "running phase"
             : queenPhase === "staging"
               ? "staging phase"
-              : "building phase"
+              : queenPhase === "planning"
+                ? "planning phase"
+                : "building phase"
           : "Worker"}
       </span>
     </div>
@@ -220,7 +227,7 @@ const MessageBubble = memo(function MessageBubble({ msg, queenPhase }: { msg: Ch
   );
 }, (prev, next) => prev.msg.id === next.msg.id && prev.msg.content === next.msg.content && prev.queenPhase === next.queenPhase);
 
-export default function ChatPanel({ messages, onSend, isWaiting, isWorkerWaiting, isBusy, activeThread, disabled, onCancel, pendingQuestion, pendingOptions, onQuestionSubmit, onQuestionDismiss, queenPhase }: ChatPanelProps) {
+export default function ChatPanel({ messages, onSend, isWaiting, isWorkerWaiting, isBusy, activeThread, disabled, onCancel, pendingQuestion, pendingOptions, pendingQuestions, onQuestionSubmit, onMultiQuestionSubmit, onQuestionDismiss, queenPhase }: ChatPanelProps) {
   const [input, setInput] = useState("");
   const [readMap, setReadMap] = useState<Record<string, number>>({});
   const bottomRef = useRef<HTMLDivElement>(null);
@@ -330,7 +337,13 @@ export default function ChatPanel({ messages, onSend, isWaiting, isWorkerWaiting
       </div>
 
       {/* Input area — question widget replaces textarea when a question is pending */}
-      {pendingQuestion && pendingOptions && onQuestionSubmit ? (
+      {pendingQuestions && pendingQuestions.length >= 2 && onMultiQuestionSubmit ? (
+        <MultiQuestionWidget
+          questions={pendingQuestions}
+          onSubmit={onMultiQuestionSubmit}
+          onDismiss={onQuestionDismiss}
+        />
+      ) : pendingQuestion && pendingOptions && onQuestionSubmit ? (
         <QuestionWidget
           question={pendingQuestion}
           options={pendingOptions}
@@ -0,0 +1,848 @@
import { useEffect, useMemo, useRef, useState } from "react";
import type { DraftGraph as DraftGraphData, DraftNode } from "@/api/types";
import type { GraphNode } from "./AgentGraph";

type DraftNodeStatus = "pending" | "running" | "complete" | "error";

interface DraftGraphProps {
  draft: DraftGraphData;
  onNodeClick?: (node: DraftNode) => void;
  /** Runtime node ID → list of original draft node IDs (post-dissolution mapping). */
  flowchartMap?: Record<string, string[]>;
  /** Current runtime graph nodes with live status (for overlay during execution). */
  runtimeNodes?: GraphNode[];
  /** Called when a draft node is clicked in overlay mode — receives the runtime node ID. */
  onRuntimeNodeClick?: (runtimeNodeId: string) => void;
}

// Layout constants — tuned for a ~500px panel (484px after px-2 padding)
const NODE_H = 52;
const GAP_Y = 48;
const TOP_Y = 28;
const MARGIN_X = 16;
const GAP_X = 16;

function truncateLabel(label: string, availablePx: number, fontSize: number): string {
  const avgCharW = fontSize * 0.58;
  const maxChars = Math.floor(availablePx / avgCharW);
  if (label.length <= maxChars) return label;
  return label.slice(0, Math.max(maxChars - 1, 1)) + "\u2026";
}

/**
 * Render an ISO 5807 flowchart shape as an SVG element.
 */
function FlowchartShape({
  shape,
  x,
  y,
  w,
  h,
  color,
  selected,
}: {
  shape: string;
  x: number;
  y: number;
  w: number;
  h: number;
  color: string;
  selected: boolean;
}) {
  const fill = selected ? `${color}28` : `${color}18`;
  const stroke = selected ? color : `${color}80`;
  const common = { fill, stroke, strokeWidth: 1.2 };

  switch (shape) {
    case "stadium":
      return <rect x={x} y={y} width={w} height={h} rx={h / 2} {...common} />;

    case "rectangle":
      return <rect x={x} y={y} width={w} height={h} rx={4} {...common} />;

    case "rounded_rect":
      return <rect x={x} y={y} width={w} height={h} rx={12} {...common} />;

    case "diamond": {
      const cx = x + w / 2;
      const cy = y + h / 2;
      // Keep diamond within bounding box
      return (
        <polygon
          points={`${cx},${y} ${x + w},${cy} ${cx},${y + h} ${x},${cy}`}
          {...common}
        />
      );
    }

    case "parallelogram": {
      const skew = 12;
      return (
        <polygon
          points={`${x + skew},${y} ${x + w},${y} ${x + w - skew},${y + h} ${x},${y + h}`}
          {...common}
        />
      );
    }

    case "document": {
      const d = `M ${x} ${y + 4} Q ${x} ${y}, ${x + 8} ${y} L ${x + w - 8} ${y} Q ${x + w} ${y}, ${x + w} ${y + 4} L ${x + w} ${y + h - 8} C ${x + w * 0.75} ${y + h + 2}, ${x + w * 0.25} ${y + h - 10}, ${x} ${y + h - 4} Z`;
      return <path d={d} {...common} />;
    }

    case "multi_document": {
      const off = 3;
      const d = `M ${x} ${y + 4 + off} Q ${x} ${y + off}, ${x + 8} ${y + off} L ${x + w - 8 - off} ${y + off} Q ${x + w - off} ${y + off}, ${x + w - off} ${y + 4 + off} L ${x + w - off} ${y + h - 8} C ${x + (w - off) * 0.75} ${y + h + 2}, ${x + (w - off) * 0.25} ${y + h - 10}, ${x} ${y + h - 4} Z`;
      return (
        <g>
          <rect x={x + off * 2} y={y} width={w - off * 2} height={h - off} rx={4} fill={fill} stroke={stroke} strokeWidth={1.2} opacity={0.4} />
          <rect x={x + off} y={y + off / 2} width={w - off} height={h - off} rx={4} fill={fill} stroke={stroke} strokeWidth={1.2} opacity={0.6} />
          <path d={d} {...common} />
        </g>
      );
    }

    case "subroutine": {
      const inset = 7;
      return (
        <g>
          <rect x={x} y={y} width={w} height={h} rx={4} {...common} />
          <line x1={x + inset} y1={y} x2={x + inset} y2={y + h} stroke={stroke} strokeWidth={1.2} />
          <line x1={x + w - inset} y1={y} x2={x + w - inset} y2={y + h} stroke={stroke} strokeWidth={1.2} />
        </g>
      );
    }

    case "hexagon": {
      const inset = 14;
      return (
        <polygon
          points={`${x + inset},${y} ${x + w - inset},${y} ${x + w},${y + h / 2} ${x + w - inset},${y + h} ${x + inset},${y + h} ${x},${y + h / 2}`}
          {...common}
        />
      );
    }

    case "manual_input":
      return (
        <polygon
          points={`${x},${y + 10} ${x + w},${y} ${x + w},${y + h} ${x},${y + h}`}
          {...common}
        />
      );

    case "trapezoid": {
      const inset = 12;
      return (
        <polygon
          points={`${x},${y} ${x + w},${y} ${x + w - inset},${y + h} ${x + inset},${y + h}`}
          {...common}
        />
      );
    }

    case "delay": {
      const d = `M ${x} ${y + 4} Q ${x} ${y}, ${x + 4} ${y} L ${x + w * 0.65} ${y} A ${w * 0.35} ${h / 2} 0 0 1 ${x + w * 0.65} ${y + h} L ${x + 4} ${y + h} Q ${x} ${y + h}, ${x} ${y + h - 4} Z`;
      return <path d={d} {...common} />;
    }

    case "display": {
      const d = `M ${x + 16} ${y} L ${x + w * 0.65} ${y} A ${w * 0.35} ${h / 2} 0 0 1 ${x + w * 0.65} ${y + h} L ${x + 16} ${y + h} L ${x} ${y + h / 2} Z`;
      return <path d={d} {...common} />;
    }

    case "cylinder": {
      const ry = 7;
      return (
        <g>
          <path
            d={`M ${x} ${y + ry} L ${x} ${y + h - ry} A ${w / 2} ${ry} 0 0 0 ${x + w} ${y + h - ry} L ${x + w} ${y + ry}`}
            {...common}
          />
          <ellipse cx={x + w / 2} cy={y + ry} rx={w / 2} ry={ry} {...common} />
          <ellipse cx={x + w / 2} cy={y + h - ry} rx={w / 2} ry={ry} fill={fill} stroke={stroke} strokeWidth={1.2} />
        </g>
      );
    }

    case "stored_data": {
      const d = `M ${x + 14} ${y} L ${x + w} ${y} A 10 ${h / 2} 0 0 0 ${x + w} ${y + h} L ${x + 14} ${y + h} A 10 ${h / 2} 0 0 1 ${x + 14} ${y} Z`;
      return <path d={d} {...common} />;
    }

    case "internal_storage":
      return (
        <g>
          <rect x={x} y={y} width={w} height={h} rx={4} {...common} />
          <line x1={x + 10} y1={y} x2={x + 10} y2={y + h} stroke={stroke} strokeWidth={0.8} opacity={0.5} />
          <line x1={x} y1={y + 10} x2={x + w} y2={y + 10} stroke={stroke} strokeWidth={0.8} opacity={0.5} />
        </g>
      );

    case "circle": {
      const r = Math.min(w, h) / 2 - 2;
      return <circle cx={x + w / 2} cy={y + h / 2} r={r} {...common} />;
    }

    case "pentagon":
      return (
        <polygon
          points={`${x},${y} ${x + w},${y} ${x + w},${y + h * 0.6} ${x + w / 2},${y + h} ${x},${y + h * 0.6}`}
          {...common}
        />
      );
||||
|
||||
case "triangle_inv":
|
||||
return (
|
||||
<polygon
|
||||
points={`${x},${y} ${x + w},${y} ${x + w / 2},${y + h}`}
|
||||
{...common}
|
||||
/>
|
||||
);
|
||||
|
||||
case "triangle":
|
||||
return (
|
||||
<polygon
|
||||
points={`${x + w / 2},${y} ${x + w},${y + h} ${x},${y + h}`}
|
||||
{...common}
|
||||
/>
|
||||
);
|
||||
|
||||
case "hourglass":
|
||||
return (
|
||||
<polygon
|
||||
points={`${x},${y} ${x + w},${y} ${x + w / 2},${y + h / 2} ${x + w},${y + h} ${x},${y + h} ${x + w / 2},${y + h / 2}`}
|
||||
{...common}
|
||||
/>
|
||||
);
|
||||
|
||||
case "circle_cross": {
|
||||
const r = Math.min(w, h) / 2 - 2;
|
||||
const cx = x + w / 2;
|
||||
const cy = y + h / 2;
|
||||
return (
|
||||
<g>
|
||||
<circle cx={cx} cy={cy} r={r} {...common} />
|
||||
<line x1={cx - r * 0.7} y1={cy - r * 0.7} x2={cx + r * 0.7} y2={cy + r * 0.7} stroke={stroke} strokeWidth={1} />
|
||||
<line x1={cx + r * 0.7} y1={cy - r * 0.7} x2={cx - r * 0.7} y2={cy + r * 0.7} stroke={stroke} strokeWidth={1} />
|
||||
</g>
|
||||
);
|
||||
}
|
||||
|
||||
case "circle_bar": {
|
||||
const r = Math.min(w, h) / 2 - 2;
|
||||
const cx = x + w / 2;
|
||||
const cy = y + h / 2;
|
||||
return (
|
||||
<g>
|
||||
<circle cx={cx} cy={cy} r={r} {...common} />
|
||||
<line x1={cx} y1={cy - r} x2={cx} y2={cy + r} stroke={stroke} strokeWidth={1} />
|
||||
<line x1={cx - r} y1={cy} x2={cx + r} y2={cy} stroke={stroke} strokeWidth={1} />
|
||||
</g>
|
||||
);
|
||||
}
|
||||
|
||||
case "flag": {
|
||||
const d = `M ${x} ${y} L ${x + w} ${y} L ${x + w - 8} ${y + h / 2} L ${x + w} ${y + h} L ${x} ${y + h} Z`;
|
||||
return <path d={d} {...common} />;
|
||||
}
|
||||
|
||||
default:
|
||||
return <rect x={x} y={y} width={w} height={h} rx={8} {...common} />;
|
||||
}
|
||||
}
|
||||
|
||||
/** HTML tooltip positioned over the graph container */
|
||||
function Tooltip({ node, style }: { node: DraftNode; style: React.CSSProperties }) {
|
||||
const lines: string[] = [];
|
||||
if (node.description) lines.push(node.description);
|
||||
if (node.tools.length > 0) lines.push(`Tools: ${node.tools.join(", ")}`);
|
||||
if (node.success_criteria) lines.push(`Criteria: ${node.success_criteria}`);
|
||||
if (lines.length === 0) return null;
|
||||
|
||||
return (
|
||||
<div
|
||||
className="absolute z-20 pointer-events-none px-2.5 py-2 rounded-md border border-border/40 bg-popover/95 backdrop-blur-sm shadow-lg max-w-[260px]"
|
||||
style={style}
|
||||
>
|
||||
{lines.map((line, i) => (
|
||||
<p key={i} className="text-[10px] text-muted-foreground leading-[1.4] mb-0.5 last:mb-0">
|
||||
{line}
|
||||
</p>
|
||||
))}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
export default function DraftGraph({ draft, onNodeClick, flowchartMap, runtimeNodes, onRuntimeNodeClick }: DraftGraphProps) {
  const [hoveredNode, setHoveredNode] = useState<string | null>(null);
  const containerRef = useRef<HTMLDivElement>(null);
  const [containerW, setContainerW] = useState(484);

  // Measure actual container width so layout fills it exactly
  useEffect(() => {
    const el = containerRef.current;
    if (!el) return;
    const ro = new ResizeObserver((entries) => {
      const w = entries[0]?.contentRect.width;
      if (w && w > 0) setContainerW(w);
    });
    ro.observe(el);
    // Capture initial width
    setContainerW(el.clientWidth || 484);
    return () => ro.disconnect();
  }, []);

  // Invert flowchartMap: draftNodeId → runtimeNodeId
  const draftToRuntime = useMemo<Record<string, string>>(() => {
    if (!flowchartMap) return {};
    const map: Record<string, string> = {};
    for (const [runtimeId, draftIds] of Object.entries(flowchartMap)) {
      for (const did of draftIds) {
        map[did] = runtimeId;
      }
    }
    return map;
  }, [flowchartMap]);

  // Compute draft node statuses from runtime overlay
  const nodeStatuses = useMemo<Record<string, DraftNodeStatus>>(() => {
    if (!runtimeNodes?.length || !Object.keys(draftToRuntime).length) return {};
    // Build runtime status lookup
    const runtimeStatus: Record<string, DraftNodeStatus> = {};
    for (const rn of runtimeNodes) {
      const s = rn.status;
      runtimeStatus[rn.id] =
        s === "running" || s === "looping" ? "running"
        : s === "complete" ? "complete"
        : s === "error" ? "error"
        : "pending";
    }
    // Map to draft nodes
    const result: Record<string, DraftNodeStatus> = {};
    for (const [draftId, runtimeId] of Object.entries(draftToRuntime)) {
      result[draftId] = runtimeStatus[runtimeId] ?? "pending";
    }
    return result;
  }, [draftToRuntime, runtimeNodes]);

  const hasStatusOverlay = Object.keys(nodeStatuses).length > 0;

  const { nodes, edges } = draft;

  const idxMap = useMemo(
    () => Object.fromEntries(nodes.map((n, i) => [n.id, i])),
    [nodes],
  );

  const forwardEdges = useMemo(() => {
    const fwd: { fromIdx: number; toIdx: number; fanCount: number; fanIndex: number; label?: string }[] = [];
    const grouped = new Map<number, { toIdx: number; label?: string }[]>();
    for (const e of edges) {
      const fromIdx = idxMap[e.source];
      const toIdx = idxMap[e.target];
      if (fromIdx === undefined || toIdx === undefined) continue;
      if (toIdx <= fromIdx) continue;
      const list = grouped.get(fromIdx) || [];
      list.push({ toIdx, label: e.label || (e.condition !== "on_success" && e.condition !== "always" ? e.condition : e.description || undefined) });
      grouped.set(fromIdx, list);
    }
    for (const [fromIdx, targets] of grouped) {
      targets.forEach((t, fi) => {
        fwd.push({ fromIdx, toIdx: t.toIdx, fanCount: targets.length, fanIndex: fi, label: t.label });
      });
    }
    return fwd;
  }, [edges, idxMap]);

  const backEdges = useMemo(() => {
    const back: { fromIdx: number; toIdx: number }[] = [];
    for (const e of edges) {
      const fromIdx = idxMap[e.source];
      const toIdx = idxMap[e.target];
      if (fromIdx === undefined || toIdx === undefined) continue;
      if (toIdx <= fromIdx) back.push({ fromIdx, toIdx });
    }
    return back;
  }, [edges, idxMap]);

  // Layer-based layout with parent-aware column placement
  const layout = useMemo(() => {
    if (nodes.length === 0) {
      return { layers: [] as number[], nodeW: 200, firstColX: MARGIN_X, nodeXPositions: [] as number[] };
    }

    // Build parent and children maps
    const parents = new Map<number, number[]>();
    const children = new Map<number, number[]>();
    nodes.forEach((_, i) => { parents.set(i, []); children.set(i, []); });
    forwardEdges.forEach((e) => {
      parents.get(e.toIdx)!.push(e.fromIdx);
      children.get(e.fromIdx)!.push(e.toIdx);
    });

    // Assign layers (longest path from root)
    const layers = new Array(nodes.length).fill(0);
    for (let i = 0; i < nodes.length; i++) {
      const pars = parents.get(i) || [];
      if (pars.length > 0) {
        layers[i] = Math.max(...pars.map((p) => layers[p])) + 1;
      }
    }

    const layerGroups = new Map<number, number[]>();
    layers.forEach((l, i) => {
      const group = layerGroups.get(l) || [];
      group.push(i);
      layerGroups.set(l, group);
    });

    let maxCols = 1;
    layerGroups.forEach((group) => {
      maxCols = Math.max(maxCols, group.length);
    });

    // Compute node width
    const backEdgeMargin = backEdges.length > 0 ? 30 + backEdges.length * 14 : 8;
    const totalMargin = MARGIN_X * 2 + backEdgeMargin;
    const availW = containerW - totalMargin;
    const nodeW = Math.min(360, Math.floor((availW - (maxCols - 1) * GAP_X) / maxCols));

    // Parent-aware column placement using fractional positions.
    // Instead of snapping to a fixed grid, nodes inherit positions from parents
    // and fan-out children spread around the parent's position.
    const colPos = new Array(nodes.length).fill(0); // fractional column positions
    const maxLayer = Math.max(...layers);

    // Process layers top-down
    for (let layer = 0; layer <= maxLayer; layer++) {
      const group = layerGroups.get(layer) || [];
      if (layer === 0) {
        // Root layer: spread evenly across available columns
        if (group.length === 1) {
          colPos[group[0]] = (maxCols - 1) / 2;
        } else {
          const offset = (maxCols - group.length) / 2;
          group.forEach((nodeIdx, i) => { colPos[nodeIdx] = offset + i; });
        }
        continue;
      }

      // For each node, compute ideal position from parents
      const ideals: { idx: number; pos: number }[] = [];
      for (const nodeIdx of group) {
        const pars = parents.get(nodeIdx) || [];
        if (pars.length === 0) {
          ideals.push({ idx: nodeIdx, pos: (maxCols - 1) / 2 });
          continue;
        }
        // Average parent column (weighted center)
        const avgCol = pars.reduce((s, p) => s + colPos[p], 0) / pars.length;

        // If this node is one of multiple children of a parent, offset from center.
        // Find the parent with the most children to determine fan-out
        let bestOffset = 0;
        for (const p of pars) {
          const siblings = (children.get(p) || []).filter(c => layers[c] === layer);
          if (siblings.length > 1) {
            const sibIdx = siblings.indexOf(nodeIdx);
            if (sibIdx >= 0) {
              bestOffset = sibIdx - (siblings.length - 1) / 2;
              // Scale so siblings don't exceed available columns
              bestOffset *= Math.min(1, (maxCols - 1) / Math.max(siblings.length - 1, 1));
            }
          }
        }
        ideals.push({ idx: nodeIdx, pos: avgCol + bestOffset });
      }

      // Sort by ideal position, then assign while preventing overlaps
      ideals.sort((a, b) => a.pos - b.pos);

      // Ensure minimum spacing of 1 column between nodes in the same layer
      const assigned: number[] = [];
      for (const item of ideals) {
        let pos = item.pos;
        // Clamp to valid range
        pos = Math.max(0, Math.min(maxCols - 1, pos));
        // Push right if overlapping previous
        if (assigned.length > 0) {
          const prev = assigned[assigned.length - 1];
          if (pos < prev + 1) pos = prev + 1;
        }
        assigned.push(pos);
        colPos[item.idx] = pos;
      }

      // If we pushed nodes too far right, shift the whole group left
      const maxPos = assigned[assigned.length - 1];
      if (maxPos > maxCols - 1) {
        const shift = maxPos - (maxCols - 1);
        for (const item of ideals) {
          colPos[item.idx] = Math.max(0, colPos[item.idx] - shift);
        }
      }
    }

    // Convert fractional column positions to pixel X positions
    const colSpacing = nodeW + GAP_X;
    const usedMin = Math.min(...colPos);
    const usedMax = Math.max(...colPos);
    const usedSpan = usedMax - usedMin || 1;
    const totalNodesW = usedSpan * colSpacing;
    const firstColX = MARGIN_X + (availW - totalNodesW) / 2;

    const nodeXPositions = colPos.map((c: number) => firstColX + (c - usedMin) * colSpacing);

    return { layers, nodeW, firstColX, nodeXPositions };
  }, [nodes, forwardEdges, backEdges.length, containerW]);

  // Compute group areas for multi-node runtime groups.
  // Kept above the empty-state early return below: hooks must be called
  // unconditionally on every render.
  const groupAreas = useMemo(() => {
    if (!flowchartMap || !runtimeNodes?.length) return [];
    const groups: { runtimeId: string; label: string; draftIds: string[] }[] = [];
    for (const [runtimeId, draftIds] of Object.entries(flowchartMap)) {
      if (draftIds.length < 2) continue;
      const rn = runtimeNodes.find(n => n.id === runtimeId);
      groups.push({ runtimeId, label: rn?.label ?? runtimeId, draftIds });
    }
    return groups;
  }, [flowchartMap, runtimeNodes]);

  if (nodes.length === 0) {
    return (
      <div className="flex flex-col h-full">
        <div className="px-4 pt-4 pb-2">
          <p className="text-[11px] text-muted-foreground font-medium uppercase tracking-wider">
            Draft
          </p>
        </div>
        <div className="flex-1 flex items-center justify-center px-4">
          <p className="text-xs text-muted-foreground/60 text-center italic">
            No draft graph yet.
            <br />
            Describe your workflow to get started.
          </p>
        </div>
      </div>
    );
  }

  const { layers, nodeW, nodeXPositions } = layout;

  const nodePos = (i: number) => ({
    x: nodeXPositions[i],
    y: TOP_Y + layers[i] * (NODE_H + GAP_Y),
  });

  const maxLayer = Math.max(...layers);
  const svgHeight = TOP_Y + (maxLayer + 1) * NODE_H + maxLayer * GAP_Y + 16;

  // Legend
  const usedTypes = (() => {
    const seen = new Map<string, { shape: string; color: string }>();
    for (const n of nodes) {
      if (!seen.has(n.flowchart_type)) {
        seen.set(n.flowchart_type, { shape: n.flowchart_shape, color: n.flowchart_color });
      }
    }
    return [...seen.entries()];
  })();
  const legendH = usedTypes.length * 18 + 20;
  const totalH = svgHeight + legendH;

  // Find hovered node for tooltip positioning
  const hoveredNodeData = hoveredNode ? nodes.find(n => n.id === hoveredNode) : null;
  const hoveredIdx = hoveredNode ? idxMap[hoveredNode] : -1;
  const hoveredPos = hoveredIdx >= 0 ? nodePos(hoveredIdx) : null;

  const renderEdge = (edge: typeof forwardEdges[number], i: number) => {
    const from = nodePos(edge.fromIdx);
    const to = nodePos(edge.toIdx);
    const fromCenterX = from.x + nodeW / 2;
    const toCenterX = to.x + nodeW / 2;
    const y1 = from.y + NODE_H;
    const y2 = to.y;

    let startX = fromCenterX;
    if (edge.fanCount > 1) {
      const spread = nodeW * 0.4;
      const step = spread / (edge.fanCount - 1);
      startX = fromCenterX - spread / 2 + edge.fanIndex * step;
    }

    const midY = (y1 + y2) / 2;
    const d = `M ${startX} ${y1} C ${startX} ${midY}, ${toCenterX} ${midY}, ${toCenterX} ${y2}`;

    return (
      <g key={`fwd-${i}`}>
        <path d={d} fill="none" stroke="hsl(220,10%,30%)" strokeWidth={1.2} />
        <polygon
          points={`${toCenterX - 3},${y2 - 5} ${toCenterX + 3},${y2 - 5} ${toCenterX},${y2 - 1}`}
          fill="hsl(220,10%,35%)"
        />
        {edge.label && (
          <text
            x={(startX + toCenterX) / 2}
            y={midY - 3}
            fill="hsl(220,10%,45%)"
            fontSize={9}
            fontStyle="italic"
            textAnchor="middle"
          >
            {truncateLabel(edge.label, 80, 9)}
          </text>
        )}
      </g>
    );
  };

  const renderBackEdge = (edge: typeof backEdges[number], i: number) => {
    const from = nodePos(edge.fromIdx);
    const to = nodePos(edge.toIdx);
    const rightX = Math.max(from.x, to.x) + nodeW;
    const rightOffset = 20 + i * 14;
    const startX = from.x + nodeW;
    const startY = from.y + NODE_H / 2;
    const endX = to.x + nodeW;
    const endY = to.y + NODE_H / 2;
    const curveX = rightX + rightOffset;
    const r = 10;

    const path = `M ${startX} ${startY} C ${startX + r} ${startY}, ${curveX} ${startY}, ${curveX} ${startY - r} L ${curveX} ${endY + r} C ${curveX} ${endY}, ${endX + r} ${endY}, ${endX + 5} ${endY}`;

    return (
      <g key={`back-${i}`}>
        <path d={path} fill="none" stroke="hsl(220,10%,25%)" strokeWidth={1.2} strokeDasharray="4 3" />
        <polygon
          points={`${endX + 5},${endY - 2.5} ${endX + 5},${endY + 2.5} ${endX},${endY}`}
          fill="hsl(220,10%,30%)"
        />
      </g>
    );
  };

  const STATUS_COLORS: Record<DraftNodeStatus, string> = {
    running: "#F59E0B", // amber
    complete: "#22C55E", // green
    error: "#EF4444", // red
    pending: "", // no overlay
  };

  const renderNode = (node: DraftNode, i: number) => {
    const pos = nodePos(i);
    const isHovered = hoveredNode === node.id;
    const status = nodeStatuses[node.id] as DraftNodeStatus | undefined;
    const statusColor = status ? STATUS_COLORS[status] : "";
    const fontSize = 13;
    const labelAvailW = nodeW - 28;
    const displayLabel = truncateLabel(node.name, labelAvailW, fontSize);
    const descAvailW = nodeW - 24;
    const descLabel = node.description
      ? truncateLabel(node.description, descAvailW, 9.5)
      : node.flowchart_type.replace(/_/g, " ");
    const textX = pos.x + nodeW / 2;
    const textY = pos.y + NODE_H / 2;

    return (
      <g
        key={node.id}
        onClick={() => {
          if (hasStatusOverlay && onRuntimeNodeClick) {
            const runtimeId = draftToRuntime[node.id];
            if (runtimeId) onRuntimeNodeClick(runtimeId);
          } else {
            onNodeClick?.(node);
          }
        }}
        onMouseEnter={() => setHoveredNode(node.id)}
        onMouseLeave={() => setHoveredNode(null)}
        style={{ cursor: "pointer" }}
      >
        <title>{`${node.name}\n${node.flowchart_type}`}</title>

        {/* Status glow ring (runtime overlay) */}
        {hasStatusOverlay && statusColor && (
          <rect
            x={pos.x - 3}
            y={pos.y - 3}
            width={nodeW + 6}
            height={NODE_H + 6}
            rx={8}
            fill="none"
            stroke={statusColor}
            strokeWidth={2}
            opacity={status === "running" ? 0.8 : 0.6}
          >
            {status === "running" && (
              <animate attributeName="opacity" values="0.4;0.9;0.4" dur="1.5s" repeatCount="indefinite" />
            )}
          </rect>
        )}

        <FlowchartShape
          shape={node.flowchart_shape}
          x={pos.x}
          y={pos.y}
          w={nodeW}
          h={NODE_H}
          color={node.flowchart_color}
          selected={isHovered}
        />

        <text
          x={textX}
          y={textY - 5}
          fill={isHovered ? "hsl(0,0%,92%)" : "hsl(0,0%,78%)"}
          fontSize={fontSize}
          fontWeight={500}
          textAnchor="middle"
          dominantBaseline="middle"
        >
          {displayLabel}
        </text>

        <text
          x={textX}
          y={textY + 11}
          fill="hsl(220,10%,50%)"
          fontSize={9.5}
          textAnchor="middle"
          dominantBaseline="middle"
        >
          {descLabel}
        </text>

        {/* Status dot indicator */}
        {hasStatusOverlay && statusColor && (
          <circle
            cx={pos.x + nodeW - 6}
            cy={pos.y + 6}
            r={4}
            fill={statusColor}
          >
            {status === "running" && (
              <animate attributeName="r" values="3;5;3" dur="1s" repeatCount="indefinite" />
            )}
          </circle>
        )}
      </g>
    );
  };

  return (
    <div className="flex flex-col h-full">
      {/* Header */}
      <div className="px-4 pt-3 pb-1.5 flex items-center gap-2">
        <p className="text-[11px] text-muted-foreground font-medium uppercase tracking-wider">
          {hasStatusOverlay ? "Flowchart" : "Draft"}
        </p>
        <span className={`text-[9px] font-mono font-medium rounded px-1 py-0.5 leading-none border ${hasStatusOverlay ? "text-emerald-500/60 border-emerald-500/20" : "text-amber-500/60 border-amber-500/20"}`}>
          {hasStatusOverlay ? "live" : "planning"}
        </span>
      </div>

      {/* Agent name + goal */}
      <div className="px-4 pb-2.5 border-b border-border/20">
        <p className="text-[11px] font-medium text-foreground/80 truncate">
          {draft.agent_name}
        </p>
        {draft.goal && (
          <p className="text-[10px] text-muted-foreground/60 mt-0.5 line-clamp-2 leading-snug">
            {draft.goal}
          </p>
        )}
      </div>

      {/* Graph */}
      <div ref={containerRef} className="flex-1 overflow-y-auto overflow-x-hidden px-2 pb-2 relative">
        <svg
          width="100%"
          viewBox={`0 0 ${containerW} ${totalH}`}
          preserveAspectRatio="xMidYMin meet"
          className="select-none"
          style={{ fontFamily: "'Inter', system-ui, sans-serif" }}
        >
          {/* Group areas: dashed boxes behind multi-node runtime groups */}
          {groupAreas.map((group) => {
            const memberIndices = group.draftIds
              .map(id => idxMap[id])
              .filter((idx): idx is number => idx !== undefined);
            if (memberIndices.length < 2) return null;
            const positions = memberIndices.map(i => nodePos(i));
            const pad = 10;
            const minX = Math.min(...positions.map(p => p.x)) - pad;
            const minY = Math.min(...positions.map(p => p.y)) - pad - 14; // extra space for label
            const maxX = Math.max(...positions.map(p => p.x + nodeW)) + pad;
            const maxY = Math.max(...positions.map(p => p.y + NODE_H)) + pad;
            return (
              <g key={`group-${group.runtimeId}`}>
                <rect
                  x={minX}
                  y={minY}
                  width={maxX - minX}
                  height={maxY - minY}
                  rx={8}
                  fill="hsl(220,15%,18%)"
                  fillOpacity={0.35}
                  stroke="hsl(220,10%,40%)"
                  strokeWidth={1}
                  strokeDasharray="5 3"
                />
                <text
                  x={minX + 8}
                  y={minY + 11}
                  fill="hsl(220,10%,50%)"
                  fontSize={9}
                  fontWeight={500}
                >
                  {truncateLabel(group.label, maxX - minX - 16, 9)}
                </text>
              </g>
            );
          })}

          {forwardEdges.map((e, i) => renderEdge(e, i))}
          {backEdges.map((e, i) => renderBackEdge(e, i))}
          {nodes.map((n, i) => renderNode(n, i))}

          {/* Legend */}
          <g transform={`translate(${MARGIN_X}, ${svgHeight + 4})`}>
            <text fill="hsl(220,10%,40%)" fontSize={9} fontWeight={600} y={4}>
              LEGEND
            </text>
            {usedTypes.map(([type, meta], i) => (
              <g key={type} transform={`translate(0, ${14 + i * 18})`}>
                <FlowchartShape
                  shape={meta.shape}
                  x={0}
                  y={0}
                  w={16}
                  h={12}
                  color={meta.color}
                  selected={false}
                />
                <text x={22} y={9} fill="hsl(220,10%,55%)" fontSize={9.5}>
                  {type.replace(/_/g, " ")}
                </text>
              </g>
            ))}
          </g>
        </svg>

        {/* HTML tooltip, rendered outside the SVG so it's not clipped */}
        {hoveredNodeData && hoveredPos && (
          <Tooltip
            node={hoveredNodeData}
            style={{
              left: 8,
              right: 8,
              // Position below the hovered node, scaled to container width
              top: `calc(${((hoveredPos.y + NODE_H + 4) / totalH) * 100}%)`,
            }}
          />
        )}
      </div>
    </div>
  );
}
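The layer assignment inside DraftGraph's `layout` memo (each node's layer is the longest path from a root, computed in index order, which is valid because forward edges always run from a lower index to a higher one) can be sketched as a standalone function. `assignLayers` and its edge-list input shape are illustrative names for this sketch, not part of the codebase:

```typescript
// Standalone sketch of the longest-path layering used by the layout pass.
// edges are [fromIdx, toIdx] pairs; only forward edges (to > from) count.
function assignLayers(nodeCount: number, edges: [number, number][]): number[] {
  const parents = new Map<number, number[]>();
  for (let i = 0; i < nodeCount; i++) parents.set(i, []);
  for (const [from, to] of edges) {
    if (to > from) parents.get(to)!.push(from); // ignore back edges
  }
  const layers = new Array<number>(nodeCount).fill(0);
  // Index order is a topological order here, so parent layers are final
  // by the time each child is visited.
  for (let i = 0; i < nodeCount; i++) {
    const pars = parents.get(i)!;
    if (pars.length > 0) layers[i] = Math.max(...pars.map((p) => layers[p])) + 1;
  }
  return layers;
}
```

A diamond graph (0→1, 0→2, 1→3, 2→3) yields layers [0, 1, 1, 2], placing the join node one row below both branches.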
@@ -0,0 +1,215 @@
import { useState, useRef, useEffect, useCallback } from "react";
import { Send, MessageCircleQuestion, X } from "lucide-react";

export interface QuestionItem {
  id: string;
  prompt: string;
  options?: string[];
}

export interface MultiQuestionWidgetProps {
  questions: QuestionItem[];
  onSubmit: (answers: Record<string, string>) => void;
  onDismiss?: () => void;
}

export default function MultiQuestionWidget({ questions, onSubmit, onDismiss }: MultiQuestionWidgetProps) {
  // Per-question state: selected index (null = nothing, options.length = "Other")
  const [selections, setSelections] = useState<(number | null)[]>(
    () => questions.map(() => null),
  );
  const [customTexts, setCustomTexts] = useState<string[]>(
    () => questions.map(() => ""),
  );
  const [submitted, setSubmitted] = useState(false);
  const containerRef = useRef<HTMLDivElement>(null);

  // Scroll the question list back to the top when the widget mounts
  useEffect(() => {
    containerRef.current?.scrollTo({ top: 0, behavior: "smooth" });
  }, []);

  const canSubmit = questions.every((q, i) => {
    const sel = selections[i];
    if (sel === null) return false;
    const isOther = q.options ? sel === q.options.length : true;
    if (isOther && !customTexts[i].trim()) return false;
    return true;
  });

  const handleSubmit = useCallback(() => {
    if (!canSubmit || submitted) return;
    setSubmitted(true);
    const answers: Record<string, string> = {};
    for (let i = 0; i < questions.length; i++) {
      const q = questions[i];
      const sel = selections[i]!;
      const isOther = q.options ? sel === q.options.length : true;
      answers[q.id] = isOther ? customTexts[i].trim() : q.options![sel];
    }
    onSubmit(answers);
  }, [canSubmit, submitted, questions, selections, customTexts, onSubmit]);

  // Enter to submit (only when not focused on a text input)
  useEffect(() => {
    const handleKeyDown = (e: KeyboardEvent) => {
      if (submitted) return;
      const target = e.target as HTMLElement;
      const inInput = target.tagName === "INPUT" || target.tagName === "TEXTAREA";
      if (e.key === "Enter" && !e.shiftKey && !inInput) {
        e.preventDefault();
        handleSubmit();
      }
    };
    window.addEventListener("keydown", handleKeyDown);
    return () => window.removeEventListener("keydown", handleKeyDown);
  }, [handleSubmit, submitted]);

  if (submitted) return null;

  const answeredCount = selections.filter((s) => s !== null).length;

  return (
    <div className="p-4">
      <div className="bg-card border border-border rounded-xl shadow-sm overflow-hidden">
        {/* Header */}
        <div className="px-5 pt-4 pb-2 flex items-center gap-3">
          <div className="w-7 h-7 rounded-lg bg-primary/10 border border-primary/20 flex items-center justify-center flex-shrink-0">
            <MessageCircleQuestion className="w-3.5 h-3.5 text-primary" />
          </div>
          <div className="flex-1 min-w-0">
            <p className="text-sm font-medium text-foreground">
              {questions.length} questions
            </p>
            <p className="text-[11px] text-muted-foreground">
              {answeredCount}/{questions.length} answered
            </p>
          </div>
          {onDismiss && (
            <button
              onClick={onDismiss}
              className="p-1 rounded-md text-muted-foreground hover:text-foreground hover:bg-muted/60 transition-colors flex-shrink-0"
            >
              <X className="w-4 h-4" />
            </button>
          )}
        </div>

        {/* Questions */}
        <div
          ref={containerRef}
          className="px-5 pb-3 space-y-4 max-h-[400px] overflow-y-auto"
        >
          {questions.map((q, qi) => {
            const sel = selections[qi];
            const hasOptions = q.options && q.options.length >= 2;
            const otherIndex = hasOptions ? q.options!.length : 0;
            const isOtherSelected = sel === otherIndex;

            return (
              <div key={q.id} className="space-y-1.5">
                <p className="text-sm font-medium text-foreground">
                  <span className="text-xs text-muted-foreground mr-1.5">
                    {qi + 1}.
                  </span>
                  {q.prompt}
                </p>

                {hasOptions ? (
                  <>
                    {q.options!.map((opt, oi) => (
                      <button
                        key={oi}
                        onClick={() => {
                          setSelections((prev) => {
                            const next = [...prev];
                            next[qi] = oi;
                            return next;
                          });
                        }}
                        className={`w-full text-left px-4 py-2 rounded-lg border text-sm transition-colors ${
                          sel === oi
                            ? "border-primary bg-primary/10 text-foreground"
                            : "border-border/60 bg-muted/20 text-foreground hover:border-primary/40 hover:bg-muted/40"
                        }`}
                      >
                        {opt}
                      </button>
                    ))}
                    <input
                      type="text"
                      value={customTexts[qi]}
                      onFocus={() => {
                        setSelections((prev) => {
                          const next = [...prev];
                          next[qi] = otherIndex;
                          return next;
                        });
                      }}
                      onChange={(e) => {
                        setSelections((prev) => {
                          const next = [...prev];
                          next[qi] = otherIndex;
                          return next;
                        });
                        setCustomTexts((prev) => {
                          const next = [...prev];
                          next[qi] = e.target.value;
                          return next;
                        });
                      }}
                      placeholder="Type a custom response..."
                      className={`w-full px-4 py-2 rounded-lg border border-dashed text-sm transition-colors bg-transparent placeholder:text-muted-foreground focus:outline-none ${
                        isOtherSelected
                          ? "border-primary bg-primary/10 text-foreground"
                          : "border-border text-muted-foreground hover:border-primary/40"
                      }`}
                    />
                  </>
                ) : (
                  <input
                    type="text"
                    value={customTexts[qi]}
                    onFocus={() => {
                      setSelections((prev) => {
                        const next = [...prev];
                        next[qi] = 0;
                        return next;
                      });
                    }}
                    onChange={(e) => {
                      setSelections((prev) => {
                        const next = [...prev];
                        next[qi] = 0;
                        return next;
                      });
                      setCustomTexts((prev) => {
                        const next = [...prev];
                        next[qi] = e.target.value;
                        return next;
                      });
                    }}
                    placeholder="Type your answer..."
                    className="w-full px-4 py-2 rounded-lg border text-sm transition-colors bg-transparent placeholder:text-muted-foreground focus:outline-none border-border text-foreground hover:border-primary/40 focus:border-primary"
                  />
                )}
              </div>
            );
          })}
        </div>

        {/* Submit */}
        <div className="px-5 pb-4">
          <button
            onClick={handleSubmit}
            disabled={!canSubmit}
            className="w-full flex items-center justify-center gap-2 py-2.5 rounded-lg text-sm font-medium bg-primary text-primary-foreground hover:bg-primary/90 disabled:opacity-30 disabled:cursor-not-allowed transition-colors"
          >
            <Send className="w-3.5 h-3.5" />
            Submit All
          </button>
        </div>
      </div>
    </div>
  );
}
|
||||
@@ -121,7 +121,8 @@ export function sseEventToChatMessage(
    id: `paused-${event.execution_id}`,
    agent: "System",
    agentColor: "",
    content: "Execution paused by user",
    content:
      (event.data?.reason as string) || "Execution paused",
    timestamp: "",
    type: "system",
    thread,

@@ -3,6 +3,7 @@ import ReactDOM from "react-dom";
import { useSearchParams, useNavigate } from "react-router-dom";
import { Plus, KeyRound, Sparkles, Layers, ChevronLeft, Bot, Loader2, WifiOff, X } from "lucide-react";
import AgentGraph, { type GraphNode, type NodeStatus } from "@/components/AgentGraph";
import DraftGraph from "@/components/DraftGraph";
import ChatPanel, { type ChatMessage } from "@/components/ChatPanel";
import TopBar from "@/components/TopBar";
import { TAB_STORAGE_KEY, loadPersistedTabs, savePersistedTabs, type PersistedTabState } from "@/lib/tab-persistence";
@@ -13,7 +14,7 @@ import { executionApi } from "@/api/execution";
import { graphsApi } from "@/api/graphs";
import { sessionsApi } from "@/api/sessions";
import { useMultiSSE } from "@/hooks/use-sse";
import type { LiveSession, AgentEvent, DiscoverEntry, Message, NodeSpec } from "@/api/types";
import type { LiveSession, AgentEvent, DiscoverEntry, Message, NodeSpec, DraftGraph as DraftGraphData } from "@/api/types";
import { backendMessageToChatMessage, sseEventToChatMessage, formatAgentDisplayName } from "@/lib/chat-helpers";
import { topologyToGraphNodes } from "@/lib/graph-converter";
import { ApiError } from "@/api/client";
@@ -255,8 +256,14 @@ interface AgentBackendState {
  /** The message ID of the current worker input request (for inline reply box) */
  workerInputMessageId: string | null;
  queenBuilding: boolean;
  /** Queen operating phase — "building" (coding), "staging" (loaded), or "running" (executing) */
  queenPhase: "building" | "staging" | "running";
  /** Queen operating phase — "planning" (design), "building" (coding), "staging" (loaded), or "running" (executing) */
  queenPhase: "planning" | "building" | "staging" | "running";
  /** Draft graph from planning phase (before code generation) */
  draftGraph: DraftGraphData | null;
  /** Original draft (pre-dissolution) for flowchart display during runtime */
  originalDraft: DraftGraphData | null;
  /** Runtime node ID → list of original draft node IDs it absorbed */
  flowchartMap: Record<string, string[]> | null;
  workerRunState: "idle" | "deploying" | "running";
  currentExecutionId: string | null;
  nodeLogs: Record<string, string[]>;
@@ -270,10 +277,14 @@ interface AgentBackendState {
  workerIsTyping: boolean;
  llmSnapshots: Record<string, string>;
  activeToolCalls: Record<string, { name: string; done: boolean; streamId: string }>;
  /** Agent folder path — set after scaffolding, used for credential queries */
  agentPath: string | null;
  /** Structured question text from ask_user with options */
  pendingQuestion: string | null;
  /** Predefined choices from ask_user (1-3 items); UI appends "Other" */
  pendingOptions: string[] | null;
  /** Multiple questions from ask_user_multiple */
  pendingQuestions: { id: string; prompt: string; options?: string[] }[] | null;
  /** Whether the pending question came from queen or worker */
  pendingQuestionSource: "queen" | "worker" | null;
}
@@ -291,7 +302,11 @@ function defaultAgentState(): AgentBackendState {
    awaitingInput: false,
    workerInputMessageId: null,
    queenBuilding: false,
    queenPhase: "building",
    queenPhase: "planning",
    draftGraph: null,
    originalDraft: null,
    flowchartMap: null,
    agentPath: null,
    workerRunState: "idle",
    currentExecutionId: null,
    nodeLogs: {},
@@ -305,6 +320,7 @@ function defaultAgentState(): AgentBackendState {
    activeToolCalls: {},
    pendingQuestion: null,
    pendingOptions: null,
    pendingQuestions: null,
    pendingQuestionSource: null,
  };
}
@@ -892,7 +908,7 @@ export default function Workspace() {
    // failed, the throw inside the catch exits the outer try block.
    const session = liveSession!;
    const displayName = formatAgentDisplayName(session.worker_name || agentType);
    const initialPhase = session.queen_phase || (session.has_worker ? "staging" : "building");
    const initialPhase = session.queen_phase || (session.has_worker ? "staging" : "planning");
    updateAgentState(agentType, {
      sessionId: session.session_id,
      displayName,
@@ -1056,6 +1072,39 @@ export default function Workspace() {
    }
  }, [agentStates, fetchGraphForAgent]);

  // --- Fetch draft graph when a session is in planning phase ---
  // Covers initial load, tab switches, reconnects, and cold restores.
  const fetchedDraftSessionsRef = useRef<Set<string>>(new Set());
  const fetchedFlowchartMapSessionsRef = useRef<Set<string>>(new Set());
  useEffect(() => {
    for (const [agentType, state] of Object.entries(agentStates)) {
      if (!state.sessionId || !state.ready) continue;

      if (state.queenPhase === "planning") {
        // Fetch draft graph for planning phase
        if (state.draftGraph) continue;
        if (fetchedDraftSessionsRef.current.has(state.sessionId)) continue;
        fetchedDraftSessionsRef.current.add(state.sessionId);
        graphsApi.draftGraph(state.sessionId).then(({ draft }) => {
          if (draft) updateAgentState(agentType, { draftGraph: draft });
        }).catch(() => {});
      } else {
        // Fetch flowchart map for non-planning phases (staging, running, building)
        if (state.originalDraft) continue; // already have it
        if (fetchedFlowchartMapSessionsRef.current.has(state.sessionId)) continue;
        fetchedFlowchartMapSessionsRef.current.add(state.sessionId);
        graphsApi.flowchartMap(state.sessionId).then(({ map, original_draft }) => {
          if (original_draft) {
            updateAgentState(agentType, {
              flowchartMap: map,
              originalDraft: original_draft,
            });
          }
        }).catch(() => {});
      }
    }
  }, [agentStates, updateAgentState]);

  // Poll entry points every second for agents with timers to keep
  // next_fire_in countdowns fresh without re-fetching the full topology.
  useEffect(() => {
@@ -1310,6 +1359,7 @@ export default function Workspace() {
      activeToolCalls: {},
      pendingQuestion: null,
      pendingOptions: null,
      pendingQuestions: null,
      pendingQuestionSource: null,
    });
    markAllNodesAs(agentType, ["running", "looping", "complete", "error"], "pending");
@@ -1339,6 +1389,7 @@ export default function Workspace() {
      llmSnapshots: {},
      pendingQuestion: null,
      pendingOptions: null,
      pendingQuestions: null,
      pendingQuestionSource: null,
    });
    markAllNodesAs(agentType, ["running", "looping"], "complete");
@@ -1388,9 +1439,13 @@ export default function Workspace() {
      console.log('[CLIENT_INPUT_REQ] stream_id:', streamId, 'isQueen:', isQueen, 'node_id:', event.node_id, 'prompt:', (event.data?.prompt as string)?.slice(0, 80), 'agentType:', agentType);
      const rawOptions = event.data?.options;
      const options = Array.isArray(rawOptions) ? (rawOptions as string[]) : null;
      const rawQuestions = event.data?.questions;
      const questions = Array.isArray(rawQuestions)
        ? (rawQuestions as { id: string; prompt: string; options?: string[] }[])
        : null;
      if (isQueen) {
        const prompt = (event.data?.prompt as string) || "";
        const isAutoBlock = !prompt && !options;
        const isAutoBlock = !prompt && !options && !questions;
        // Queen auto-block (empty prompt, no options) should not
        // overwrite a pending worker question — the worker's
        // QuestionWidget must stay visible. Use the updater form
@@ -1421,6 +1476,7 @@ export default function Workspace() {
          queenBuilding: false,
          pendingQuestion: prompt || null,
          pendingOptions: options,
          pendingQuestions: questions,
          pendingQuestionSource: "queen",
        }
      };
@@ -1460,14 +1516,14 @@ export default function Workspace() {
        }
      }
      if (event.type === "execution_paused") {
        updateAgentState(agentType, { isTyping: false, isStreaming: false, queenIsTyping: false, workerIsTyping: false, awaitingInput: false, workerInputMessageId: null, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
        updateAgentState(agentType, { isTyping: false, isStreaming: false, queenIsTyping: false, workerIsTyping: false, awaitingInput: false, workerInputMessageId: null, pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
        if (!isQueen) {
          updateAgentState(agentType, { workerRunState: "idle", currentExecutionId: null });
          markAllNodesAs(agentType, ["running", "looping"], "pending");
        }
      }
      if (event.type === "execution_failed") {
        updateAgentState(agentType, { isTyping: false, isStreaming: false, queenIsTyping: false, workerIsTyping: false, awaitingInput: false, workerInputMessageId: null, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
        updateAgentState(agentType, { isTyping: false, isStreaming: false, queenIsTyping: false, workerIsTyping: false, awaitingInput: false, workerInputMessageId: null, pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
        if (!isQueen) {
          updateAgentState(agentType, { workerRunState: "idle", currentExecutionId: null });
          if (event.node_id) {
@@ -1500,9 +1556,9 @@ export default function Workspace() {
      case "node_loop_iteration":
        turnCounterRef.current[turnKey] = currentTurn + 1;
        if (isQueen) {
          updateAgentState(agentType, { isStreaming: false, activeToolCalls: {}, awaitingInput: false, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
          updateAgentState(agentType, { isStreaming: false, activeToolCalls: {}, awaitingInput: false, pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
        } else {
          updateAgentState(agentType, { isStreaming: false, workerIsTyping: true, activeToolCalls: {}, awaitingInput: false, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
          updateAgentState(agentType, { isStreaming: false, workerIsTyping: true, activeToolCalls: {}, awaitingInput: false, pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
        }
        if (!isQueen && event.node_id) {
          const pendingText = agentStates[agentType]?.llmSnapshots[event.node_id];
@@ -1788,14 +1844,62 @@ export default function Workspace() {

      case "queen_phase_changed": {
        const rawPhase = event.data?.phase as string;
        const newPhase: "building" | "staging" | "running" =
          rawPhase === "running" ? "running" : rawPhase === "staging" ? "staging" : "building";
        const eventAgentPath = (event.data?.agent_path as string) || null;
        const newPhase: "planning" | "building" | "staging" | "running" =
          rawPhase === "running" ? "running"
          : rawPhase === "staging" ? "staging"
          : rawPhase === "planning" ? "planning"
          : "building";
        updateAgentState(agentType, {
          queenPhase: newPhase,
          queenBuilding: newPhase === "building",
          // Sync workerRunState so the RunButton reflects the phase
          workerRunState: newPhase === "running" ? "running" : "idle",
          // Clear draft graph once we leave planning; also clear dedup refs
          // so re-entering planning or re-fetching flowchart map works
          ...(newPhase !== "planning" ? { draftGraph: null } : { originalDraft: null, flowchartMap: null }),
          // Store agent path for credential queries
          ...(eventAgentPath ? { agentPath: eventAgentPath } : {}),
        });
        {
          const sid = agentStates[agentType]?.sessionId;
          if (sid) {
            if (newPhase !== "planning") {
              fetchedDraftSessionsRef.current.delete(sid);
              fetchedFlowchartMapSessionsRef.current.delete(sid);
              // Fetch the flowchart map (original draft + dissolution mapping)
              graphsApi.flowchartMap(sid).then(({ map, original_draft }) => {
                updateAgentState(agentType, {
                  flowchartMap: map,
                  originalDraft: original_draft,
                });
              }).catch(() => {});
            } else {
              fetchedDraftSessionsRef.current.delete(sid);
              fetchedFlowchartMapSessionsRef.current.delete(sid);
            }
          }
        }
        break;
      }

      case "draft_graph_updated": {
        // The draft dict is published directly as event.data (not nested under a key)
        const draft = event.data as unknown as DraftGraphData | undefined;
        if (draft?.nodes) {
          updateAgentState(agentType, { draftGraph: draft });
        }
        break;
      }

      case "flowchart_map_updated": {
        const mapData = event.data as { map?: Record<string, string[]>; original_draft?: DraftGraphData } | undefined;
        if (mapData) {
          updateAgentState(agentType, {
            flowchartMap: mapData.map ?? null,
            originalDraft: mapData.original_draft ?? null,
          });
        }
        break;
      }

@@ -1926,7 +2030,7 @@ export default function Workspace() {
        s.id === activeSession.id ? { ...s, messages: [...s.messages, userMsg] } : s
      ),
    }));
    updateAgentState(activeWorker, { awaitingInput: false, workerInputMessageId: null, isTyping: true, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
    updateAgentState(activeWorker, { awaitingInput: false, workerInputMessageId: null, isTyping: true, pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
    executionApi.workerInput(state.sessionId, text).catch((err: unknown) => {
      const errMsg = err instanceof Error ? err.message : String(err);
      const errorChatMsg: ChatMessage = {
@@ -1948,7 +2052,7 @@ export default function Workspace() {

    // If queen has a pending question widget, dismiss it when user types directly
    if (agentStates[activeWorker]?.pendingQuestionSource === "queen") {
      updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
      updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
    }

    const userMsg: ChatMessage = {
@@ -2015,7 +2119,7 @@ export default function Workspace() {
    }));

    // Clear awaiting state optimistically
    updateAgentState(activeWorker, { awaitingInput: false, workerInputMessageId: null, isTyping: true, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
    updateAgentState(activeWorker, { awaitingInput: false, workerInputMessageId: null, isTyping: true, pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });

    executionApi.workerInput(state.sessionId, text).catch((err: unknown) => {
      const errMsg = err instanceof Error ? err.message : String(err);
@@ -2043,7 +2147,7 @@ export default function Workspace() {

    if (isOther) {
      // "Other" free-text → route through queen for evaluation
      updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
      updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
      if (question && opts && state?.sessionId && state?.ready) {
        const formatted = `[Worker asked: "${question}" | Options: ${opts.join(", ")}]\nUser answered: "${answer}"`;
        const userMsg: ChatMessage = {
@@ -2089,10 +2193,23 @@ export default function Workspace() {
  // --- handleQueenQuestionAnswer: submit queen's own question answer via /chat ---
  // The queen asked the question herself, so she already has context — just send the raw answer.
  const handleQueenQuestionAnswer = useCallback((answer: string, _isOther: boolean) => {
    updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
    updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestions: null, pendingQuestionSource: null });
    handleSend(answer, activeWorker);
  }, [activeWorker, handleSend, updateAgentState]);

  // --- handleMultiQuestionAnswer: submit answers to ask_user_multiple ---
  const handleMultiQuestionAnswer = useCallback((answers: Record<string, string>) => {
    updateAgentState(activeWorker, {
      pendingQuestion: null, pendingOptions: null,
      pendingQuestions: null, pendingQuestionSource: null,
    });
    // Format as structured text the LLM can parse
    const lines = Object.entries(answers).map(
      ([id, answer]) => `[${id}]: ${answer}`,
    );
    handleSend(lines.join("\n"), activeWorker);
  }, [activeWorker, handleSend, updateAgentState]);

  // --- handleQuestionDismiss: user closed the question widget without answering ---
  // Injects a dismiss signal so the blocked node can continue.
  const handleQuestionDismiss = useCallback(() => {
@@ -2105,6 +2222,7 @@ export default function Workspace() {
    updateAgentState(activeWorker, {
      pendingQuestion: null,
      pendingOptions: null,
      pendingQuestions: null,
      pendingQuestionSource: null,
      awaitingInput: false,
    });
@@ -2368,18 +2486,32 @@ export default function Workspace() {
      <div className="flex flex-1 min-h-0">

        {/* ── Pipeline graph + chat ──────────────────────────────────── */}
        <div className="w-[300px] min-w-[240px] bg-card/30 flex flex-col border-r border-border/30">
        <div className={`${(activeAgentState?.queenPhase === "planning" && activeAgentState?.draftGraph) || activeAgentState?.originalDraft ? "w-[500px] min-w-[400px]" : "w-[300px] min-w-[240px]"} bg-card/30 flex flex-col border-r border-border/30 transition-[width] duration-200`}>
          <div className="flex-1 min-h-0">
            <AgentGraph
              nodes={currentGraph.nodes}
              title={currentGraph.title}
              onNodeClick={(node) => setSelectedNode(prev => prev?.id === node.id ? null : node)}
              onRun={handleRun}
              onPause={handlePause}
              runState={activeAgentState?.workerRunState ?? "idle"}
              building={activeAgentState?.queenBuilding ?? false}
              queenPhase={activeAgentState?.queenPhase ?? "building"}
            />
            {activeAgentState?.queenPhase === "planning" && activeAgentState.draftGraph ? (
              <DraftGraph draft={activeAgentState.draftGraph} />
            ) : activeAgentState?.originalDraft ? (
              <DraftGraph
                draft={activeAgentState.originalDraft}
                flowchartMap={activeAgentState.flowchartMap ?? undefined}
                runtimeNodes={currentGraph.nodes}
                onRuntimeNodeClick={(runtimeNodeId) => {
                  const node = currentGraph.nodes.find(n => n.id === runtimeNodeId);
                  if (node) setSelectedNode(prev => prev?.id === node.id ? null : node);
                }}
              />
            ) : (
              <AgentGraph
                nodes={currentGraph.nodes}
                title={currentGraph.title}
                onNodeClick={(node) => setSelectedNode(prev => prev?.id === node.id ? null : node)}
                onRun={handleRun}
                onPause={handlePause}
                runState={activeAgentState?.workerRunState ?? "idle"}
                building={activeAgentState?.queenBuilding ?? false}
                queenPhase={activeAgentState?.queenPhase ?? "building"}
              />
            )}
          </div>
        </div>
        <div className="flex-1 min-w-0 flex">
@@ -2451,11 +2583,13 @@ export default function Workspace() {
              queenPhase={activeAgentState?.queenPhase ?? "building"}
              pendingQuestion={activeAgentState?.awaitingInput ? activeAgentState.pendingQuestion : null}
              pendingOptions={activeAgentState?.awaitingInput ? activeAgentState.pendingOptions : null}
              pendingQuestions={activeAgentState?.awaitingInput ? activeAgentState.pendingQuestions : null}
              onQuestionSubmit={
                activeAgentState?.pendingQuestionSource === "queen"
                  ? handleQueenQuestionAnswer
                  : handleWorkerQuestionAnswer
              }
              onMultiQuestionSubmit={handleMultiQuestionAnswer}
              onQuestionDismiss={handleQuestionDismiss}
            />
          )}
@@ -2543,7 +2677,7 @@ export default function Workspace() {
      <CredentialsModal
        agentType={activeWorker}
        agentLabel={activeWorkerLabel}
        agentPath={credentialAgentPath || (!activeWorker.startsWith("new-agent") ? activeWorker : undefined)}
        agentPath={credentialAgentPath || activeAgentState?.agentPath || (!activeWorker.startsWith("new-agent") ? activeWorker : undefined)}
        open={credentialsOpen}
        onClose={() => { setCredentialsOpen(false); setCredentialAgentPath(null); setDismissedBanner(null); }}
        credentials={activeSession?.credentials || []}

@@ -11,12 +11,10 @@ dependencies = [
    "litellm>=1.81.0",
    "mcp>=1.0.0",
    "fastmcp>=2.0.0",
    "textual>=1.0.0",
    "tools",
]

[project.optional-dependencies]
tui = ["textual>=0.75.0"]
webhook = ["aiohttp>=3.9.0"]
server = ["aiohttp>=3.9.0"]
testing = [

@@ -1,90 +0,0 @@
"""Tests for ChatTextArea key handling (Enter submits, Shift+Enter / Ctrl+J insert newlines)."""

import pytest
from textual.app import App, ComposeResult

from framework.tui.widgets.chat_repl import ChatTextArea


class ChatTextAreaApp(App):
    """Minimal app that mounts a ChatTextArea for testing."""

    submitted_texts: list[str]

    def compose(self) -> ComposeResult:
        yield ChatTextArea(id="input")

    def on_mount(self) -> None:
        self.submitted_texts = []

    def on_chat_text_area_submitted(self, message: ChatTextArea.Submitted) -> None:
        self.submitted_texts.append(message.text)


@pytest.fixture
def app():
    return ChatTextAreaApp()


@pytest.mark.asyncio
async def test_enter_submits_text(app):
    """Pressing Enter should post a Submitted message and clear the widget."""
    async with app.run_test() as pilot:
        await pilot.press("h", "e", "l", "l", "o")
        await pilot.press("enter")

        assert app.submitted_texts == ["hello"]


@pytest.mark.asyncio
async def test_enter_on_empty_does_not_submit(app):
    """Pressing Enter with no text should not post a Submitted message."""
    async with app.run_test() as pilot:
        await pilot.press("enter")

        assert app.submitted_texts == []


@pytest.mark.asyncio
async def test_shift_enter_inserts_newline(app):
    """Shift+Enter should insert a newline, not submit."""
    async with app.run_test() as pilot:
        widget = app.query_one("#input", ChatTextArea)

        await pilot.press("a")
        await pilot.press("shift+enter")
        await pilot.press("b")

        assert app.submitted_texts == []
        assert "\n" in widget.text
        assert widget.text.startswith("a")
        assert widget.text.endswith("b")


@pytest.mark.asyncio
async def test_ctrl_j_inserts_newline(app):
    """Ctrl+J should insert a newline (fallback for terminals without Shift+Enter)."""
    async with app.run_test() as pilot:
        widget = app.query_one("#input", ChatTextArea)

        await pilot.press("a")
        await pilot.press("ctrl+j")
        await pilot.press("b")

        assert app.submitted_texts == []
        assert "\n" in widget.text
        assert widget.text.startswith("a")
        assert widget.text.endswith("b")


@pytest.mark.asyncio
async def test_multiline_submit(app):
    """Typing multiline text via Ctrl+J then pressing Enter should submit all lines."""
    async with app.run_test() as pilot:
        await pilot.press("a")
        await pilot.press("ctrl+j")
        await pilot.press("b")
        await pilot.press("enter")

        assert len(app.submitted_texts) == 1
        assert app.submitted_texts[0] == "a\nb"
@@ -572,7 +572,7 @@ async def test_event_loop_conversation_compaction():
    judge = CountingJudge(retry_count=3)
    node = EventLoopNode(
        judge=judge,
        config=LoopConfig(max_iterations=10, max_history_tokens=200),
        config=LoopConfig(max_iterations=10, max_context_tokens=200),
    )
    result = await node.execute(ctx)


@@ -763,7 +763,7 @@ class TestClientFacingBlocking:
class TestEscalate:
    @pytest.mark.asyncio
    async def test_escalate_emits_event(self, runtime, node_spec, memory):
        """escalate() should publish ESCALATION_REQUESTED."""
        """escalate() should publish ESCALATION_REQUESTED and block for queen guidance."""
        node_spec.output_keys = []
        llm = MockStreamingLLM(
            scenarios=[
@@ -772,7 +772,6 @@ class TestEscalate:
                {
                    "reason": "tool failure",
                    "context": "HTTP 401 from upstream",
                    "wait_for_response": False,
                },
                tool_use_id="escalate_1",
            ),
@@ -789,7 +788,20 @@ class TestEscalate:

        ctx = build_ctx(runtime, node_spec, memory, llm, stream_id="worker")
        node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=5))

        async def queen_reply():
            await asyncio.sleep(0.05)
            await node.inject_event("Acknowledged, proceed.")

        task = asyncio.create_task(queen_reply())
        result = await node.execute(ctx)
        await task

        assert result.success is True
        assert len(received) == 1
@@ -808,7 +820,6 @@ class TestEscalate:
                {
                    "reason": "blocked",
                    "context": "dependency missing",
                    "wait_for_response": False,
                },
                tool_use_id="escalate_1",
            ),
@@ -827,7 +838,14 @@ class TestEscalate:

        ctx = build_ctx(runtime, node_spec, memory, llm, stream_id="worker")
        node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=5))

        async def queen_reply():
            await asyncio.sleep(0.05)
            await node.inject_event("Queen acknowledges escalation.")

        task = asyncio.create_task(queen_reply())
        result = await node.execute(ctx)
        await task

        assert result.success is True
        queen_node.inject_event.assert_awaited_once()
@@ -842,7 +860,7 @@ class TestEscalate:

    @pytest.mark.asyncio
    async def test_escalate_waits_for_queen_input_and_skips_judge(self, runtime, node_spec, memory):
        """wait_for_response=true should block for queen input before judge evaluation."""
        """escalate() should block for queen input before judge evaluation."""
        node_spec.output_keys = ["result"]
        llm = MockStreamingLLM(
            scenarios=[
@@ -851,7 +869,6 @@ class TestEscalate:
                {
                    "reason": "need direction",
                    "context": "conflicting constraints",
                    "wait_for_response": True,
                },
                tool_use_id="escalate_1",
            ),
@@ -1756,9 +1773,9 @@ class TestIsToolDoomLoop:

    def test_different_args_no_doom(self):
        node = EventLoopNode(config=LoopConfig(tool_doom_loop_threshold=3))
        fp1 = [("search", '{"q": "a"}')]
        fp2 = [("search", '{"q": "b"}')]
        fp3 = [("search", '{"q": "c"}')]
        fp1 = [("search", '{"q": "deploy kubernetes cluster to production"}')]
        fp2 = [("read_file", '{"path": "/etc/nginx/nginx.conf"}')]
        fp3 = [("execute", '{"command": "SELECT * FROM users WHERE active=true"}')]
        is_doom, _ = node._is_tool_doom_loop([fp1, fp2, fp3])
        assert is_doom is False

@@ -1886,6 +1903,7 @@ class TestToolDoomLoopIntegration:
|
||||
config=LoopConfig(
|
||||
max_iterations=10,
|
||||
tool_doom_loop_threshold=3,
|
||||
stall_similarity_threshold=1.0, # disable fuzzy stall detection
|
||||
),
|
||||
)
|
||||
result = await node.execute(ctx)
|
||||
@@ -1941,6 +1959,7 @@ class TestToolDoomLoopIntegration:
|
||||
config=LoopConfig(
|
||||
max_iterations=10,
|
||||
tool_doom_loop_threshold=3,
|
||||
stall_similarity_threshold=1.0, # disable fuzzy stall detection
|
||||
),
|
||||
)
|
||||
result = await node.execute(ctx)
|
||||
@@ -2005,6 +2024,7 @@ class TestToolDoomLoopIntegration:
|
||||
config=LoopConfig(
|
||||
max_iterations=10,
|
||||
tool_doom_loop_threshold=3,
|
||||
stall_similarity_threshold=1.0, # disable fuzzy stall detection
|
||||
),
|
||||
)
|
||||
result = await node.execute(ctx)
|
||||
@@ -2056,6 +2076,7 @@ class TestToolDoomLoopIntegration:
|
||||
config=LoopConfig(
|
||||
max_iterations=10,
|
||||
tool_doom_loop_enabled=False,
|
||||
stall_similarity_threshold=1.0, # disable fuzzy stall detection
|
||||
),
|
||||
)
|
||||
result = await node.execute(ctx)
|
||||
@@ -2144,6 +2165,7 @@ class TestToolDoomLoopIntegration:
|
||||
config=LoopConfig(
|
||||
max_iterations=10,
|
||||
tool_doom_loop_threshold=3,
|
||||
stall_similarity_threshold=1.0, # disable fuzzy stall detection
|
||||
),
|
||||
)
|
||||
result = await node.execute(ctx)
|
||||
@@ -2206,6 +2228,7 @@ class TestToolDoomLoopIntegration:
|
||||
config=LoopConfig(
|
||||
max_iterations=10,
|
||||
tool_doom_loop_threshold=3,
|
||||
stall_similarity_threshold=1.0, # disable fuzzy stall detection
|
||||
),
|
||||
)
|
||||
result = await node.execute(ctx)
|
||||
|
||||
@@ -40,16 +40,3 @@ class TestMCPDependencies:
|
||||
from mcp.server import FastMCP
|
||||
|
||||
assert FastMCP is not None
|
||||
|
||||
|
||||
class TestMCPPackageExports:
|
||||
"""Tests for the framework.mcp package exports."""
|
||||
|
||||
def test_package_importable(self):
|
||||
"""Test that framework.mcp package can be imported."""
|
||||
if not MCP_AVAILABLE:
|
||||
pytest.skip(MCP_SKIP_REASON)
|
||||
|
||||
import framework.mcp
|
||||
|
||||
assert framework.mcp is not None
|
||||
|
||||
@@ -204,8 +204,8 @@ class TestNodeConversation:
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_usage_ratio(self):
|
||||
"""usage_ratio returns estimate / max_history_tokens."""
|
||||
conv = NodeConversation(max_history_tokens=1000)
|
||||
"""usage_ratio returns estimate / max_context_tokens."""
|
||||
conv = NodeConversation(max_context_tokens=1000)
|
||||
await conv.add_user_message("a" * 400)
|
||||
assert conv.usage_ratio() == pytest.approx(0.1) # 100/1000
|
||||
|
||||
@@ -214,15 +214,15 @@ class TestNodeConversation:

     @pytest.mark.asyncio
     async def test_usage_ratio_zero_budget(self):
-        """usage_ratio returns 0 when max_history_tokens is 0 (unlimited)."""
-        conv = NodeConversation(max_history_tokens=0)
+        """usage_ratio returns 0 when max_context_tokens is 0 (unlimited)."""
+        conv = NodeConversation(max_context_tokens=0)
         await conv.add_user_message("a" * 400)
         assert conv.usage_ratio() == 0.0

     @pytest.mark.asyncio
     async def test_needs_compaction_with_actual_tokens(self):
         """needs_compaction uses actual API token count when available."""
-        conv = NodeConversation(max_history_tokens=1000, compaction_threshold=0.8)
+        conv = NodeConversation(max_context_tokens=1000, compaction_threshold=0.8)
         await conv.add_user_message("a" * 100)  # chars/4 = 25, well under 800

         assert conv.needs_compaction() is False

@@ -233,7 +233,7 @@ class TestNodeConversation:

     @pytest.mark.asyncio
     async def test_needs_compaction(self):
-        conv = NodeConversation(max_history_tokens=100, compaction_threshold=0.8)
+        conv = NodeConversation(max_context_tokens=100, compaction_threshold=0.8)
         await conv.add_user_message("x" * 320)
         assert conv.needs_compaction() is True

@@ -457,7 +457,7 @@ class TestPersistence:
         store = MockConversationStore()
         assert await NodeConversation.restore(store) is None

-        conv = NodeConversation(system_prompt="hello", max_history_tokens=500, store=store)
+        conv = NodeConversation(system_prompt="hello", max_context_tokens=500, store=store)
         await conv.add_user_message("u1")
         await conv.add_assistant_message("a1")

@@ -643,7 +643,7 @@ class TestConversationIntegration:
         store = FileConversationStore(base)
         conv = NodeConversation(
             system_prompt="You are a helpful travel agent.",
-            max_history_tokens=16000,
+            max_context_tokens=16000,
             store=store,
         )

@@ -1314,7 +1314,7 @@ class TestLlmCompact:
         """Create a minimal EventLoopNode for testing."""
         from framework.graph.event_loop_node import EventLoopNode, LoopConfig

-        config = LoopConfig(max_history_tokens=32000)
+        config = LoopConfig(max_context_tokens=32000)
         node = EventLoopNode.__new__(EventLoopNode)
         node._config = config
         node._event_bus = None

@@ -970,13 +970,13 @@ class TestEscalationFlow:
         )

     @pytest.mark.asyncio
-    async def test_wait_for_response_emits_client_events(
+    async def test_wait_for_response_emits_escalation_event(
         self,
         runtime,
         parent_node_spec,
         subagent_node_spec,
     ):
-        """Escalation should emit CLIENT_OUTPUT_DELTA and CLIENT_INPUT_REQUESTED events."""
+        """Escalation should emit ESCALATION_REQUESTED to the queen."""
        from framework.graph.event_loop_node import _EscalationReceiver

        bus = EventBus()

@@ -986,7 +986,7 @@ class TestEscalationFlow:
             bus_events.append(event)

         bus.subscribe(
-            event_types=[EventType.CLIENT_OUTPUT_DELTA, EventType.CLIENT_INPUT_REQUESTED],
+            event_types=[EventType.ESCALATION_REQUESTED],
             handler=handler,
         )

@@ -1034,16 +1034,12 @@ class TestEscalationFlow:
         await node._execute_subagent(ctx, "researcher", "Navigate page with CAPTCHA")
         await injector

-        # Should have emitted both events
-        output_deltas = [e for e in bus_events if e.type == EventType.CLIENT_OUTPUT_DELTA]
-        input_requests = [e for e in bus_events if e.type == EventType.CLIENT_INPUT_REQUESTED]
-
-        assert len(output_deltas) >= 1, "Should emit CLIENT_OUTPUT_DELTA with the message"
-        assert output_deltas[0].data["content"] == "CAPTCHA detected on page"
-        assert output_deltas[0].node_id == "parent"  # Shows as parent talking
-
-        assert len(input_requests) >= 1, "Should emit CLIENT_INPUT_REQUESTED for routing"
-        assert ":escalation:" in input_requests[0].node_id  # Escalation ID for routing
+        # Should have emitted ESCALATION_REQUESTED
+        escalation_events = [e for e in bus_events if e.type == EventType.ESCALATION_REQUESTED]
+
+        assert len(escalation_events) >= 1, "Should emit ESCALATION_REQUESTED"
+        assert escalation_events[0].data["context"] == "CAPTCHA detected on page"
+        assert ":escalation:" in escalation_events[0].node_id

     @pytest.mark.asyncio
     async def test_non_blocking_report_still_works(

@@ -3,9 +3,8 @@
 Tests the FULL routing chain:
     ExecutionStream → GraphExecutor → EventLoopNode → _execute_subagent
-    → _report_callback registers _EscalationReceiver in executor.node_registry
-    → emit CLIENT_INPUT_REQUESTED with escalation_id
-    → subscriber calls stream.inject_input(escalation_id, "done")
-    → ExecutionStream finds _EscalationReceiver in executor.node_registry
+    → emit ESCALATION_REQUESTED (queen handles the escalation)
+    → queen inject_worker_message() finds _EscalationReceiver via get_waiting_nodes()
+    → receiver.inject_event("done") unblocks the subagent
     → subagent continues and completes
 """

@@ -227,26 +226,30 @@ async def test_escalation_e2e_through_execution_stream(tmp_path):
     stream_holder: list[ExecutionStream] = []

     async def escalation_handler(event: AgentEvent):
-        """Simulate a TUI/runner: when CLIENT_INPUT_REQUESTED arrives with
-        an escalation node_id, inject the user's response via the stream."""
+        """Simulate the queen: when ESCALATION_REQUESTED arrives,
+        find the waiting receiver and inject the response via the stream."""
         all_events.append(event)
-        if event.type == EventType.CLIENT_INPUT_REQUESTED:
-            node_id = event.node_id
-            if ":escalation:" in node_id:
-                escalation_events.append(event)
-                # Small delay to simulate user typing
-                await asyncio.sleep(0.05)
-                # Route through the REAL inject_input chain
-                stream = stream_holder[0]
-                success = await stream.inject_input(node_id, "done logging in")
-                assert success, (
-                    f"inject_input({node_id!r}) returned False — "
-                    "escalation receiver not found in executor.node_registry"
-                )
-                inject_called.set()
+        if event.type == EventType.ESCALATION_REQUESTED:
+            escalation_events.append(event)
+            # Small delay to simulate queen processing
+            await asyncio.sleep(0.05)
+            # Route through the REAL inject_input chain — find the waiting
+            # escalation receiver via get_waiting_nodes() (mirrors what
+            # inject_worker_message does in the queen lifecycle tools).
+            stream = stream_holder[0]
+            waiting = stream.get_waiting_nodes()
+            assert waiting, "Should have a waiting escalation receiver"
+            target_node_id = waiting[0]["node_id"]
+            assert ":escalation:" in target_node_id
+            success = await stream.inject_input(target_node_id, "done logging in")
+            assert success, (
+                f"inject_input({target_node_id!r}) returned False — "
+                "escalation receiver not found in executor.node_registry"
+            )
+            inject_called.set()

     bus.subscribe(
-        event_types=[EventType.CLIENT_INPUT_REQUESTED, EventType.CLIENT_OUTPUT_DELTA],
+        event_types=[EventType.ESCALATION_REQUESTED],
         handler=escalation_handler,
     )

@@ -297,17 +300,7 @@ async def test_escalation_e2e_through_execution_stream(tmp_path):
     # 3. Escalation event has correct structure
     esc_event = escalation_events[0]
     assert ":escalation:" in esc_event.node_id
-    assert esc_event.data["prompt"] == "Login required for LinkedIn. Please log in manually."
-
-    # 4. CLIENT_OUTPUT_DELTA was emitted for the escalation message
-    output_deltas = [
-        e
-        for e in all_events
-        if e.type == EventType.CLIENT_OUTPUT_DELTA and "Login required" in e.data.get("content", "")
-    ]
-    assert len(output_deltas) >= 1, (
-        "Should have emitted CLIENT_OUTPUT_DELTA with escalation message"
-    )
+    assert esc_event.data["context"] == "Login required for LinkedIn. Please log in manually."

     # 5. The parent node got the subagent's result
     assert "result" in result.output

@@ -444,7 +437,7 @@ async def test_escalation_cleanup_after_completion(tmp_path):
     stream_holder: list[ExecutionStream] = []

     async def auto_respond(event: AgentEvent):
-        if event.type == EventType.CLIENT_INPUT_REQUESTED and ":escalation:" in event.node_id:
+        if event.type == EventType.ESCALATION_REQUESTED:
             stream = stream_holder[0]

             # Snapshot the active executor's node_registry BEFORE responding

@@ -462,10 +455,13 @@ async def test_escalation_cleanup_after_completion(tmp_path):
             )

             await asyncio.sleep(0.02)
-            await stream.inject_input(event.node_id, "ok")
+            # Find the waiting escalation receiver and inject response
+            waiting = stream.get_waiting_nodes()
+            if waiting:
+                await stream.inject_input(waiting[0]["node_id"], "ok")

     bus.subscribe(
-        event_types=[EventType.CLIENT_INPUT_REQUESTED],
+        event_types=[EventType.ESCALATION_REQUESTED],
         handler=auto_respond,
     )

@@ -23,7 +23,7 @@ Done. For details, prerequisites, and troubleshooting, read on.

 ## What you get after setup

-- **coder-tools** – Create and manage agents (scaffolding via `initialize_agent_package`, file I/O, tool discovery).
+- **coder-tools** – Create and manage agents (scaffolding via `initialize_and_build_agent`, file I/O, tool discovery).
 - **tools** – File operations, web search, and other agent tools.
 - **Documentation** – Guided docs for building and testing agents.

@@ -130,7 +130,7 @@ MCP (Model Context Protocol) servers are configured in `.mcp.json` at the projec
     }
 ```

-The `coder-tools` server provides agent scaffolding via `initialize_agent_package` and related tools. The `tools` MCP server exposes tools including web search, PDF reading, CSV processing, and file system operations.
+The `coder-tools` server provides agent scaffolding via `initialize_and_build_agent` and related tools. The `tools` MCP server exposes tools including web search, PDF reading, CSV processing, and file system operations.

 ## Storage

@@ -172,7 +172,7 @@ Add to `.vscode/settings.json`:
 ## Security Best Practices

 1. **Never commit API keys** - Use environment variables or `.env` files
-2. **`.env` is git-ignored** - Copy `.env.example` to `.env` at the project root and fill in your values
+2. **If you use a local `.env` file, keep it private** - This repository does not include a root `.env.example`; use your own local `.env` file or shell environment variables for secrets
 3. **Use real provider keys in non-production environments** - validate configuration with low-risk inputs before production rollout
 4. **Credential isolation** - Each tool validates its own credentials at runtime

@@ -244,7 +244,7 @@ The fastest way to build agents is with the configured MCP workflow:
 ./quickstart.sh

 # Build a new agent
-Use the coder-tools MCP tools from your IDE agent chat (e.g., initialize_agent_package)
+Use the coder-tools MCP tools from your IDE agent chat (e.g., initialize_and_build_agent)
 ```

 ### Agent Development Workflow

@@ -252,7 +252,7 @@ Use the coder-tools MCP tools from your IDE agent chat (e.g., initialize_and_bui
 1. **Define Your Goal**

    ```
-   Use the coder-tools initialize_agent_package tool
+   Use the coder-tools initialize_and_build_agent tool
    Enter goal: "Build an agent that processes customer support tickets"
    ```

@@ -555,7 +555,7 @@ uv add <package>

 ```bash
 # Option 1: Use Claude Code skill (recommended)
-Use the coder-tools initialize_agent_package tool
+Use the coder-tools initialize_and_build_agent tool

 # Option 2: Create manually
 # Note: exports/ is initially empty (gitignored). Create your agent directory:

@@ -0,0 +1,597 @@
# Draft Flowchart System — Complete Reference

The draft flowchart system bridges user-facing workflow design (planning phase) and the runtime agent graph (execution phase). During planning, the queen agent creates an ISO 5807 flowchart that the user reviews. On approval, decision nodes are dissolved into runtime-compatible structures, and the original flowchart is preserved for live status overlay during execution.

---

## Architecture Overview

```
Planning Phase                    Build Gate                     Runtime Phase
─────────────────────────────────────────────────────────────────────────────

Queen LLM                    confirm_and_build()                Graph Executor
    │                                │                                │
    ▼                                ▼                                ▼
save_agent_draft()        ┌──────────────────────────┐        Node execution
    │                     │ dissolve_decision_nodes  │        with status
    ▼                     │                          │              │
DraftGraph (SSE) ────►    │ Decision diamonds        │              ▼
    │                     │ merged into              │        Flowchart Map
    ▼                     │ predecessor criteria     │        inverts to
Frontend renders          │                          │        overlay status
ISO 5807 flowchart        │ Original draft           │        on original
with diamond              │ preserved                │        flowchart
decisions                 │                          │
                          └──────────────────────────┘
```

**Key files:**
- Backend: `core/framework/tools/queen_lifecycle_tools.py` — draft creation, classification, dissolution
- Backend: `core/framework/server/routes_graphs.py` — REST endpoints
- Frontend: `core/frontend/src/components/DraftGraph.tsx` — SVG flowchart renderer
- Frontend: `core/frontend/src/api/types.ts` — TypeScript interfaces
- Frontend: `core/frontend/src/pages/workspace.tsx` — state management and conditional rendering

---

## 1. JSON Schemas

### Tool: `save_agent_draft` — Input Schema

```json
{
  "type": "object",
  "required": ["agent_name", "goal", "nodes"],
  "properties": {
    "agent_name": {
      "type": "string",
      "description": "Snake_case name for the agent (e.g. 'lead_router_agent')"
    },
    "goal": {
      "type": "string",
      "description": "High-level goal description for the agent"
    },
    "description": {
      "type": "string",
      "description": "Brief description of what the agent does"
    },
    "nodes": {
      "type": "array",
      "description": "Graph nodes. Only 'id' is required; all other fields are optional hints.",
      "items": { "$ref": "#/$defs/DraftNode" }
    },
    "edges": {
      "type": "array",
      "description": "Connections between nodes. Auto-generated as linear if omitted.",
      "items": { "$ref": "#/$defs/DraftEdge" }
    },
    "terminal_nodes": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Node IDs that are terminal (end) nodes. Auto-detected from edges if omitted."
    },
    "success_criteria": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Agent-level success criteria"
    },
    "constraints": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Agent-level constraints"
    }
  }
}
```

### Node Schema (`DraftNode`)

```json
{
  "type": "object",
  "required": ["id"],
  "properties": {
    "id": {
      "type": "string",
      "description": "Kebab-case node identifier (e.g. 'enrich-lead')"
    },
    "name": {
      "type": "string",
      "description": "Human-readable display name. Defaults to id if omitted."
    },
    "description": {
      "type": "string",
      "description": "What this node does (business logic). Used for auto-classification."
    },
    "node_type": {
      "type": "string",
      "enum": ["event_loop", "gcu"],
      "default": "event_loop",
      "description": "Runtime node type. 'gcu' maps to browser automation."
    },
    "flowchart_type": {
      "type": "string",
      "enum": [
        "start", "terminal", "process", "decision",
        "io", "document", "multi_document",
        "subprocess", "preparation",
        "manual_input", "manual_operation",
        "delay", "display",
        "database", "stored_data", "internal_storage",
        "connector", "offpage_connector",
        "merge", "extract", "sort", "collate",
        "summing_junction", "or",
        "browser", "comment", "alternate_process"
      ],
      "description": "ISO 5807 flowchart symbol. Auto-detected if omitted."
    },
    "tools": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Planned tool names (hints for scaffolder, not validated)"
    },
    "input_keys": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Expected input memory keys"
    },
    "output_keys": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Expected output memory keys"
    },
    "success_criteria": {
      "type": "string",
      "description": "What success looks like for this node"
    },
    "decision_clause": {
      "type": "string",
      "description": "For decision nodes only: the yes/no question to evaluate (e.g. 'Is amount > $100?'). During dissolution, this becomes the predecessor node's success_criteria."
    }
  }
}
```

### Edge Schema (`DraftEdge`)

```json
{
  "type": "object",
  "required": ["source", "target"],
  "properties": {
    "source": {
      "type": "string",
      "description": "Source node ID"
    },
    "target": {
      "type": "string",
      "description": "Target node ID"
    },
    "condition": {
      "type": "string",
      "enum": ["always", "on_success", "on_failure", "conditional", "llm_decide"],
      "default": "on_success",
      "description": "Edge traversal condition"
    },
    "description": {
      "type": "string",
      "description": "Human-readable description of when this edge is taken"
    },
    "label": {
      "type": "string",
      "description": "Short label shown on the flowchart edge (e.g. 'Yes', 'No', 'Retry')"
    }
  }
}
```

### Output: Enriched Draft Graph Object

After `save_agent_draft` processes the input, it stores and emits an enriched draft with auto-classified flowchart metadata. This is the structure sent via the `draft_graph_updated` SSE event and returned by `GET /api/sessions/{id}/draft-graph`.

```json
{
  "agent_name": "lead_router_agent",
  "goal": "Enrich and route incoming leads",
  "description": "Automated lead enrichment and routing agent",
  "success_criteria": ["Lead score calculated", "Correct tier assigned"],
  "constraints": ["Apollo enrichment required before routing"],
  "entry_node": "intake",
  "terminal_nodes": ["route"],
  "nodes": [
    {
      "id": "intake",
      "name": "Intake",
      "description": "Fetch contact from HubSpot",
      "node_type": "event_loop",
      "tools": ["hubspot_get_contact"],
      "input_keys": ["contact_id"],
      "output_keys": ["contact_data", "domain"],
      "success_criteria": "Contact data retrieved",
      "decision_clause": "",
      "sub_agents": [],
      "flowchart_type": "start",
      "flowchart_shape": "stadium",
      "flowchart_color": "#4CAF50"
    },
    {
      "id": "check-tier",
      "name": "Check Tier",
      "description": "",
      "node_type": "event_loop",
      "decision_clause": "Is lead score > 80?",
      "flowchart_type": "decision",
      "flowchart_shape": "diamond",
      "flowchart_color": "#FF9800"
    }
  ],
  "edges": [
    {
      "id": "edge-0",
      "source": "intake",
      "target": "check-tier",
      "condition": "on_success",
      "description": "",
      "label": ""
    },
    {
      "id": "edge-1",
      "source": "check-tier",
      "target": "enrich",
      "condition": "on_success",
      "description": "",
      "label": "Yes"
    },
    {
      "id": "edge-2",
      "source": "check-tier",
      "target": "route",
      "condition": "on_failure",
      "description": "",
      "label": "No"
    }
  ],
  "flowchart_legend": {
    "start": { "shape": "stadium", "color": "#4CAF50" },
    "terminal": { "shape": "stadium", "color": "#F44336" },
    "process": { "shape": "rectangle", "color": "#2196F3" },
    "decision": { "shape": "diamond", "color": "#FF9800" }
  }
}
```

**Enriched fields** (added by backend to every node during classification):

| Field | Type | Description |
|---|---|---|
| `flowchart_type` | `string` | The resolved ISO 5807 symbol type |
| `flowchart_shape` | `string` | SVG shape identifier for the frontend renderer |
| `flowchart_color` | `string` | Hex color code for the symbol |

### Flowchart Map Object

Returned by `GET /api/sessions/{id}/flowchart-map` after `confirm_and_build()` dissolves decision nodes:

```json
{
  "map": {
    "intake": ["intake", "check-tier"],
    "enrich": ["enrich"],
    "route": ["route"]
  },
  "original_draft": { "...original draft graph before dissolution..." }
}
```

- `map`: Keys are runtime node IDs, values are lists of original draft node IDs that the runtime node absorbed.
- `original_draft`: The complete draft graph as it existed before dissolution, preserved for flowchart display.
- Both fields are `null` if no dissolution has occurred yet.

---

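The status overlay consumes this map by turning it inside out, so each original draft node can be looked up from its absorbing runtime node. A minimal sketch of that inversion in Python, using the example payload above (the helper name is illustrative, not part of the API):

```python
def invert_flowchart_map(flowchart_map: dict[str, list[str]]) -> dict[str, str]:
    """Invert runtime_id -> [draft_ids] into draft_id -> runtime_id."""
    return {
        draft_id: runtime_id
        for runtime_id, draft_ids in flowchart_map.items()
        for draft_id in draft_ids
    }

# Example payload from the flowchart-map endpoint
fmap = {
    "intake": ["intake", "check-tier"],
    "enrich": ["enrich"],
    "route": ["route"],
}
print(invert_flowchart_map(fmap))
# {'intake': 'intake', 'check-tier': 'intake', 'enrich': 'enrich', 'route': 'route'}
```

With the inverted map, a dissolved decision diamond such as `check-tier` inherits the live status of the runtime node (`intake`) that absorbed it.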
## 2. ISO 5807 Flowchart Types

### Core Symbols

| Type | Shape | Color | SVG Primitive | Description |
|---|---|---|---|---|
| `start` | stadium | `#4CAF50` green | `<rect rx={h/2}>` | Entry point / start terminator |
| `terminal` | stadium | `#F44336` red | `<rect rx={h/2}>` | End point / stop terminator |
| `process` | rectangle | `#2196F3` blue | `<rect rx={4}>` | General processing step |
| `decision` | diamond | `#FF9800` amber | `<polygon>` 4-point | Branching / conditional logic |
| `io` | parallelogram | `#9C27B0` purple | `<polygon>` skewed | Data input or output |
| `document` | document | `#607D8B` blue-grey | `<path>` wavy bottom | Single document output |
| `multi_document` | multi_document | `#78909C` blue-grey | stacked `<rect>` + `<path>` | Multiple documents |
| `subprocess` | subroutine | `#009688` teal | `<rect>` + inner `<line>` | Predefined process / sub-agent |
| `preparation` | hexagon | `#795548` brown | `<polygon>` 6-point | Setup / initialization step |
| `manual_input` | manual_input | `#E91E63` pink | `<polygon>` sloped top | Manual data entry |
| `manual_operation` | trapezoid | `#AD1457` dark pink | `<polygon>` tapered bottom | Human-in-the-loop / approval |
| `delay` | delay | `#FF5722` deep orange | `<path>` D-shape | Wait / pause / cooldown |
| `display` | display | `#00BCD4` cyan | `<path>` pointed left | Display / render output |

### Data Storage Symbols

| Type | Shape | Color | SVG Primitive | Description |
|---|---|---|---|---|
| `database` | cylinder | `#8BC34A` light green | `<path>` + `<ellipse>` top/bottom | Database / direct access storage |
| `stored_data` | stored_data | `#CDDC39` lime | `<path>` curved left | Generic data store |
| `internal_storage` | internal_storage | `#FFC107` amber | `<rect>` + internal `<line>` grid | Internal memory / cache |

### Connectors

| Type | Shape | Color | SVG Primitive | Description |
|---|---|---|---|---|
| `connector` | circle | `#9E9E9E` grey | `<circle>` | On-page connector |
| `offpage_connector` | pentagon | `#757575` dark grey | `<polygon>` 5-point | Off-page connector |

### Flow Operations

| Type | Shape | Color | SVG Primitive | Description |
|---|---|---|---|---|
| `merge` | triangle_inv | `#3F51B5` indigo | `<polygon>` inverted | Merge multiple flows |
| `extract` | triangle | `#5C6BC0` indigo light | `<polygon>` upward | Extract / split flow |
| `sort` | hourglass | `#7986CB` indigo lighter | `<polygon>` X-shape | Sort operation |
| `collate` | hourglass_inv | `#9FA8DA` indigo lightest | `<polygon>` X-shape inv | Collate operation |
| `summing_junction` | circle_cross | `#F06292` pink light | `<circle>` + cross `<line>` | Summing junction |
| `or` | circle_bar | `#CE93D8` purple light | `<circle>` + plus `<line>` | Logical OR |

### Domain-Specific (Hive)

| Type | Shape | Color | SVG Primitive | Description |
|---|---|---|---|---|
| `browser` | hexagon | `#1A237E` dark indigo | `<polygon>` 6-point | Browser automation (GCU node) |
| `comment` | flag | `#BDBDBD` light grey | `<path>` notched right | Annotation / comment |
| `alternate_process` | rounded_rect | `#42A5F5` light blue | `<rect rx={12}>` | Alternate process variant |

---

## 3. Auto-Classification Priority

When `flowchart_type` is omitted from a node, the backend classifies it automatically using this priority (function `_classify_flowchart_node` in `queen_lifecycle_tools.py`):

1. **Explicit override** — if `flowchart_type` is set and valid, use it
2. **Node type** — `gcu` nodes become `browser`
3. **Position** — first node becomes `start`
4. **Terminal detection** — nodes in `terminal_nodes` (or with no outgoing edges) become `terminal`
5. **Branching structure** — nodes with 2+ outgoing edges with different conditions become `decision`
6. **Sub-agents** — nodes with `sub_agents` become `subprocess`
7. **Tool heuristics** — tool names match known patterns:
   - DB tools (`query_database`, `sql_query`, `read_table`, etc.) → `database`
   - Doc tools (`generate_report`, `create_document`, etc.) → `document`
   - I/O tools (`send_email`, `post_to_slack`, `fetch_url`, etc.) → `io`
   - Display tools (`serve_file_to_user`, `display_results`) → `display`
8. **Description keyword heuristics**:
   - `"manual"`, `"approval"`, `"human review"` → `manual_operation`
   - `"setup"`, `"prepare"`, `"configure"` → `preparation`
   - `"wait"`, `"delay"`, `"pause"` → `delay`
   - `"merge"`, `"combine"`, `"aggregate"` → `merge`
   - `"display"`, `"show"`, `"render"` → `display`
   - `"database"`, `"data store"`, `"persist"` → `database`
   - `"report"`, `"document"`, `"summary"` → `document`
   - `"deliver"`, `"send"`, `"notify"` → `io`
9. **Default** — `process` (blue rectangle)

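A condensed sketch of this priority chain, assuming the draft-node dicts from section 1 (the function name, signature, and abridged pattern lists here are illustrative; the real `_classify_flowchart_node` covers many more tool and keyword patterns):

```python
def classify_node(node: dict, index: int, terminal_ids: set[str],
                  out_edges: list[dict]) -> str:
    """Resolve a flowchart_type for one draft node, in priority order."""
    valid = {"start", "terminal", "process", "decision", "browser",
             "subprocess", "database", "document", "io", "display"}
    # 1. Explicit override wins when valid
    if node.get("flowchart_type") in valid:
        return node["flowchart_type"]
    # 2. GCU nodes map to browser automation
    if node.get("node_type") == "gcu":
        return "browser"
    # 3. The first node is the start terminator
    if index == 0:
        return "start"
    # 4. Declared terminals (or nodes with no outgoing edges) end the flow
    if node["id"] in terminal_ids or not out_edges:
        return "terminal"
    # 5. Two or more outgoing edges with differing conditions -> decision
    if len({e.get("condition", "on_success") for e in out_edges}) >= 2:
        return "decision"
    # 6. Sub-agents render as a predefined-process (subroutine) symbol
    if node.get("sub_agents"):
        return "subprocess"
    # 7./8. Tool-name and description heuristics (heavily abridged)
    tools = " ".join(node.get("tools", []))
    desc = node.get("description", "").lower()
    if "query_database" in tools or "database" in desc:
        return "database"
    if "generate_report" in tools or "report" in desc:
        return "document"
    if "send_email" in tools or "deliver" in desc:
        return "io"
    # 9. Default
    return "process"
```

Because the checks run in order, an explicit `flowchart_type` always beats the heuristics, and the keyword rules only fire for interior nodes that nothing structural has already claimed.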
---

## 4. Decision Node Dissolution

When `confirm_and_build()` is called, decision nodes (flowchart diamonds) are dissolved into runtime-compatible structures by `_dissolve_decision_nodes()`. Decision nodes are a **planning-only** concept — they don't exist in the runtime graph.

### Algorithm

```
For each decision node D (in topological order):
  1. Find predecessors P via incoming edges
  2. Find yes-target and no-target via outgoing edges
     - Yes: edge with label "Yes"/"True"/"Pass" or condition "on_success"
     - No: edge with label "No"/"False"/"Fail" or condition "on_failure"
     - Fallback: first outgoing = yes, second = no
  3. Get decision clause: D.decision_clause || D.description || D.name
  4. For each predecessor P:
     - Append clause to P.success_criteria
     - Remove edge P → D
     - Add edge P → yes_target (on_success)
     - Add edge P → no_target (on_failure)
  5. Remove D and all its edges from the graph
  6. Record absorption: flowchart_map[P.id] = [P.id, D.id]
```

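The steps above can be sketched for a single decision node as follows (a simplified illustration over the draft dict shapes from section 1; the real `_dissolve_decision_nodes()` also handles topological ordering, chained decisions, start-position diamonds, and condition-based yes/no detection):

```python
def dissolve_decision(draft: dict, decision_id: str) -> dict[str, list[str]]:
    """Dissolve one decision diamond into its predecessors.
    Returns the flowchart-map fragment recording absorbed draft nodes."""
    nodes = {n["id"]: n for n in draft["nodes"]}
    d = nodes[decision_id]
    incoming = [e for e in draft["edges"] if e["target"] == decision_id]
    outgoing = [e for e in draft["edges"] if e["source"] == decision_id]

    # Step 2: classify branches by label, falling back to edge order
    yes = next((e for e in outgoing
                if e.get("label", "").lower() in ("yes", "true", "pass")), outgoing[0])
    no = next((e for e in outgoing
               if e.get("label", "").lower() in ("no", "false", "fail")), outgoing[1])
    # Step 3: the clause the predecessor will be judged against
    clause = d.get("decision_clause") or d.get("description") or d.get("name", "")

    fmap: dict[str, list[str]] = {}
    for edge in incoming:
        p = nodes[edge["source"]]
        # Step 4: the clause becomes (part of) the predecessor's success criteria
        existing = p.get("success_criteria", "")
        p["success_criteria"] = (
            f"{existing}; then evaluate: {clause}" if existing else clause
        )
        draft["edges"].remove(edge)  # drop P -> D
        draft["edges"].append({"source": p["id"], "target": yes["target"],
                               "condition": "on_success"})
        draft["edges"].append({"source": p["id"], "target": no["target"],
                               "condition": "on_failure"})
        fmap[p["id"]] = [p["id"], decision_id]  # step 6: record absorption

    # Step 5: remove the diamond and any edges still touching it
    draft["nodes"] = [n for n in draft["nodes"] if n["id"] != decision_id]
    draft["edges"] = [e for e in draft["edges"]
                      if decision_id not in (e["source"], e["target"])]
    return fmap
```

Run on the billing example below, this produces the `fetch-billing-data → ["fetch-billing-data", "amount-gt-100"]` map entry and rewires the two receipt nodes onto `on_success`/`on_failure` edges.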
### Edge Cases

| Case | Behavior |
|---|---|
| **Decision at start** (no predecessor) | Converted to a process node with `success_criteria` = clause; outgoing edges rewired to `on_success`/`on_failure` |
| **Chained decisions** (A → D1 → D2 → B) | Processed in order. D1 dissolves into A. D2's predecessor is now A, so D2 also dissolves into A. Map: `A → [A, D1, D2]` |
| **Multiple predecessors** | Each predecessor gets its own copy of the yes/no edges |
| **Existing success_criteria on predecessor** | Appended with `"; then evaluate: <clause>"` |
| **Decision with >2 outgoing edges** | First classified yes/no pair is used; remaining edges are preserved |

### Example

**Input (planning flowchart):**
```
[Fetch Billing Data] → <Amount > $100?> → Yes → [Generate PDF Receipt]
                                        → No  → [Draft Email Receipt]
```

**Output (runtime graph):**
```
[Fetch Billing Data] → on_success → [Generate PDF Receipt]
                     → on_failure → [Draft Email Receipt]
  success_criteria: "Amount > $100?"
```

**Flowchart map:**
```json
{
  "fetch-billing-data": ["fetch-billing-data", "amount-gt-100"],
  "generate-pdf-receipt": ["generate-pdf-receipt"],
  "draft-email-receipt": ["draft-email-receipt"]
}
```

The runtime Level 2 judge evaluates the decision clause against the node's conversation. `NodeResult.success = true` routes via `on_success` (yes), `false` routes via `on_failure` (no).

---

## 5. Frontend Rendering

### Component: `DraftGraph.tsx`

An SVG-based flowchart renderer that operates in two modes:

1. **Planning mode** — renders the draft graph with ISO 5807 shapes during the planning phase
2. **Runtime overlay mode** — renders the original (pre-dissolution) draft with live execution status when `flowchartMap` and `runtimeNodes` props are provided

#### Props

```typescript
interface DraftGraphProps {
  draft: DraftGraphData;                    // The draft graph to render
  onNodeClick?: (node: DraftNode) => void;  // Node click handler
  flowchartMap?: Record<string, string[]>;  // Runtime → draft node mapping
  runtimeNodes?: GraphNode[];               // Live runtime graph nodes with status
}
```

#### Layout Engine

The layout algorithm arranges nodes in layers based on graph topology:

1. **Layer assignment**: Each node's layer = max(parent layers) + 1. Root nodes are layer 0.
2. **Column assignment**: Within each layer, nodes are sorted by parent column average and centered.
3. **Node sizing**: `nodeW = min(360, availableWidth / maxColumns)` — nodes fill available space up to 360px.
4. **Container measurement**: A `ResizeObserver` measures the actual container width so SVG viewBox coordinates match CSS pixels 1:1.

```
Constants:
  NODE_H   = 52px  (node height)
  GAP_Y    = 48px  (vertical gap between layers)
  GAP_X    = 16px  (horizontal gap between columns)
  MARGIN_X = 16px  (left/right margin)
  TOP_Y    = 28px  (top padding)
```
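The layer-assignment rule (layer = max of parent layers, plus one) can be sketched as follows; Python is used for brevity, the actual layout code lives in `DraftGraph.tsx`, so all names here are illustrative:

```python
def assign_layers(node_ids: list[str],
                  edges: list[tuple[str, str]]) -> dict[str, int]:
    """layer(n) = max(layer(parent)) + 1; roots sit at layer 0.
    Parents already on the current recursion path are skipped so a
    back edge (cycle) does not recurse forever."""
    parents: dict[str, list[str]] = {n: [] for n in node_ids}
    for src, dst in edges:
        parents[dst].append(src)

    layers: dict[str, int] = {}

    def layer_of(n: str, path: frozenset[str] = frozenset()) -> int:
        if n in layers:
            return layers[n]
        ps = [p for p in parents[n] if p not in path]  # ignore cycle back-edges
        layers[n] = 0 if not ps else max(layer_of(p, path | {n}) for p in ps) + 1
        return layers[n]

    for n in node_ids:
        layer_of(n)
    return layers
```

For a diamond `a → b`, `a → c`, `b → d`, `c → d` this yields layers `a=0`, `b=c=1`, `d=2`, which is what places the merge node one row below both branches.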
||||
|
||||
#### Shape Rendering
|
||||
|
||||
The `FlowchartShape` component renders each ISO 5807 shape as SVG primitives. Each shape receives:
|
||||
- `x, y, w, h` — bounding box in SVG units
|
||||
- `color` — the hex color from the flowchart type
|
||||
- `selected` — hover state (increases fill opacity from 18% to 28%, brightens stroke)
|
||||
|
||||
All shapes use `strokeWidth={1.2}` to prevent overflow on hover.

#### Edge Rendering

**Forward edges** (source layer < target layer):

- Rendered as cubic Bézier curves from the source's bottom-center to the target's top-center
- Fan-out: when a node has multiple outgoing edges, the start points spread across 40% of the node width
- Labels shown at the midpoint (from `edge.label`, falling back to the condition or description)

**Back edges** (source layer >= target layer):

- Rendered as dashed arcs that loop around the right side of the graph
- Each back edge gets a unique offset to prevent overlap
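
A forward edge of this kind maps directly onto an SVG path `C` command. A sketch under assumed coordinates (the control-point placement at the vertical midpoint is an illustrative choice, not the component's exact curve):

```typescript
// Node bounding box in SVG units.
interface Box { x: number; y: number; w: number; h: number }

// Cubic Bézier from source bottom-center to target top-center.
// Control points sit at the vertical midpoint (an assumption).
function forwardEdgePath(source: Box, target: Box): string {
  const x1 = source.x + source.w / 2; // source bottom-center
  const y1 = source.y + source.h;
  const x2 = target.x + target.w / 2; // target top-center
  const y2 = target.y;
  const midY = (y1 + y2) / 2;
  return `M ${x1} ${y1} C ${x1} ${midY}, ${x2} ${midY}, ${x2} ${y2}`;
}
```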

#### Node Labels

Each node displays two lines of text:

- **Primary**: node name (font size 13, truncated to fit `nodeW - 28px`)
- **Secondary**: node description or flowchart type (font size 9.5, truncated to fit `nodeW - 24px`)

Truncation uses `avgCharWidth = fontSize * 0.58` to estimate how many characters fit.
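
A minimal sketch of that character-budget estimate; the ellipsis handling is an assumption about the implementation, only the `fontSize * 0.58` factor comes from the prose:

```typescript
// Estimate how many characters fit in maxWidth pixels, then truncate.
function truncateLabel(text: string, fontSize: number, maxWidth: number): string {
  const avgCharWidth = fontSize * 0.58;            // factor from the docs
  const maxChars = Math.floor(maxWidth / avgCharWidth);
  if (text.length <= maxChars) return text;
  return text.slice(0, Math.max(0, maxChars - 1)) + "…"; // assumed ellipsis style
}
```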

#### Tooltip

An HTML overlay (not SVG) positioned below the hovered node, showing:

- Node description
- Tools list (`Tools: tool_a, tool_b`)
- Success criteria (`Criteria: ...`)

#### Legend

A dynamic legend at the bottom of the SVG lists every flowchart type used in the current draft, with its shape and color.

### Runtime Status Overlay

When `flowchartMap` and `runtimeNodes` are provided, the component computes per-node statuses:

1. **Invert the map**: `flowchartMap` maps `runtime_id → [draft_ids]`; inverting it gives `draft_id → runtime_id`
2. **Map runtime status**: each runtime node is classified as `running` (amber), `complete` (green), `error` (red), or `pending` (no overlay)
3. **Render overlays**:
   - **Glow ring**: a pulsing amber `<rect>` around running nodes, a solid green/red ring for complete/error nodes
   - **Status dot**: a small `<circle>` in the top-right corner, with an animated radius for running nodes
4. **Header**: changes from "Draft / planning" to "Flowchart / live"

```typescript
// Status color mapping
const STATUS_COLORS = {
  running: "#F59E0B",   // amber — pulsing glow
  complete: "#22C55E",  // green — solid ring
  error: "#EF4444",     // red — solid ring
  pending: "",          // no overlay
};
```
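
The map inversion in step 1 is a small transform. A sketch whose names mirror the prose (the real component's internals may differ):

```typescript
// flowchartMap: runtime_id -> [draft_ids]; the overlay needs the reverse.
function invertFlowchartMap(
  flowchartMap: Record<string, string[]>,
): Record<string, string> {
  const inverted: Record<string, string> = {};
  for (const [runtimeId, draftIds] of Object.entries(flowchartMap)) {
    for (const draftId of draftIds) inverted[draftId] = runtimeId;
  }
  return inverted;
}
```

Because dissolution merges several draft nodes into one runtime node, many draft ids can map to the same runtime id, but each draft id maps to exactly one runtime id.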

### Workspace Integration (`workspace.tsx`)

The workspace conditionally renders `DraftGraph` in three scenarios:

| Condition | Renders | Panel Width |
|---|---|---|
| `queenPhase === "planning"` and `draftGraph` exists | `<DraftGraph draft={draftGraph} />` | 500px |
| `originalDraft` exists (post-planning) | `<DraftGraph draft={originalDraft} flowchartMap={...} runtimeNodes={...} />` | 500px |
| Neither | `<AgentGraph ... />` (runtime pipeline view) | 300px |

**State management:**

- `draftGraph`: set by the `draft_graph_updated` SSE event during planning; cleared on phase change
- `originalDraft` + `flowchartMap`: fetched from `GET /api/sessions/{id}/flowchart-map` when the phase transitions away from planning

---

## 6. Events & API

### SSE Event: `draft_graph_updated`

Emitted when `save_agent_draft` completes. The full draft graph object is carried in the event's `data` payload.

```
event: message
data: {"type": "draft_graph_updated", "stream_id": "queen", "data": { ...draft graph object... }, ...}
```
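
On the client, the JSON on the `data:` line is parsed and filtered by `type`. A sketch assuming the envelope fields shown in the example (`extractDraftGraph` is a hypothetical helper):

```typescript
// Envelope fields taken from the example payload above.
interface SseEnvelope {
  type: string;
  stream_id: string;
  data: unknown;
}

// Returns the draft graph payload, or null for any other event type.
function extractDraftGraph(rawData: string): unknown {
  const envelope = JSON.parse(rawData) as SseEnvelope;
  return envelope.type === "draft_graph_updated" ? envelope.data : null;
}
```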

### REST Endpoints

**`GET /api/sessions/{session_id}/draft-graph`**

Returns the current draft graph from the planning phase.

```json
{"draft": <DraftGraph object>}
// or
{"draft": null}
```

**`GET /api/sessions/{session_id}/flowchart-map`**

Returns the flowchart-to-runtime mapping and the original draft (available after `confirm_and_build()`).

```json
{
  "map": { "runtime-node-id": ["draft-node-a", "draft-node-b"], ... },
  "original_draft": { ...original DraftGraph before dissolution... }
}
// or
{"map": null, "original_draft": null}
```
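
Since both fields are null before `confirm_and_build()` and populated after, a client can gate the overlay on a simple shape check. A sketch (the `FlowchartMapResponse` name is an assumption):

```typescript
// Response shape of the flowchart-map endpoint, per the example above.
interface FlowchartMapResponse {
  map: Record<string, string[]> | null;
  original_draft: object | null;
}

// The runtime overlay can only render once both fields are present.
function canRenderOverlay(res: FlowchartMapResponse): boolean {
  return res.map !== null && res.original_draft !== null;
}
```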

---

## 7. Phase Gate

The draft graph is part of a two-step gate controlling the planning → building transition:

1. **`save_agent_draft()`** — creates the draft, classifies nodes, and emits `draft_graph_updated`
2. The user reviews the rendered flowchart (with decision diamonds, edge labels, and color-coded shapes)
3. **`confirm_and_build()`** — dissolves decision nodes, preserves the original draft, builds the flowchart map, and sets `build_confirmed = true`
4. **`initialize_and_build_agent()`** — checks `build_confirmed` before proceeding and passes the dissolved (decision-free) draft to the scaffolder for pre-population

The scaffolder never sees decision nodes — it receives a clean graph with only runtime-compatible node types, where branching is expressed through `success_criteria` plus `on_success`/`on_failure` edges.
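
The gate check in step 4 can be sketched as follows. The session shape and error message are illustrative; only the `build_confirmed` flag and the dissolved-draft handoff come from the prose:

```typescript
// Hypothetical session state for the sketch.
interface SessionState {
  build_confirmed: boolean;
  dissolvedDraft: object | null; // decision-free draft from confirm_and_build()
}

// Returns the draft the scaffolder may use, or throws if the gate is closed.
function draftForScaffolder(session: SessionState): object {
  if (!session.build_confirmed || session.dissolvedDraft === null) {
    throw new Error("confirm_and_build() must run before initialize_and_build_agent()");
  }
  return session.dissolvedDraft;
}
```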