Compare commits
438 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 4b5ec796bc | |||
| 24df4729ca | |||
| 1e6538efac | |||
| f9e53f58af | |||
| 41388efc31 | |||
| fab5ce6fd0 | |||
| 207d6baee5 | |||
| fec72bb2b6 | |||
| c4c4c24c59 | |||
| 2005ba2dca | |||
| 557d5fd6e5 | |||
| 79d2a15f95 | |||
| ab32e44128 | |||
| 047059f85f | |||
| e8364f616d | |||
| 9098c9b6c6 | |||
| 84fd9ebac8 | |||
| 23d5d76d56 | |||
| b0c86588b6 | |||
| 5aff1f9489 | |||
| 199cb3d8cc | |||
| a98a4ca0b6 | |||
| c4f49aadfa | |||
| ca5ac389cf | |||
| 7a658f7953 | |||
| e05fc99da7 | |||
| 787090667e | |||
| 80b36b4052 | |||
| 0b8ed521c0 | |||
| 1c58ccb0c1 | |||
| 79b80fe817 | |||
| c0f3841af7 | |||
| 2b7d9bc471 | |||
| 98dc493a39 | |||
| cfaa57b28d | |||
| 219e603de6 | |||
| 7663a5bce8 | |||
| f2841b945d | |||
| faff64c413 | |||
| 6fbcdc1d87 | |||
| 69a11af949 | |||
| 9ef272020e | |||
| 258cfe7de5 | |||
| 0d53b21133 | |||
| 704a0fd63a | |||
| 0ccb28ffab | |||
| bf4101ac38 | |||
| b30b571b44 | |||
| bc44c3a401 | |||
| 7fbf57cbb7 | |||
| bc349e8fde | |||
| 67d094f51a | |||
| 873af04c6e | |||
| 2f0439dca8 | |||
| 8470c6a980 | |||
| 1920192656 | |||
| f56feaf821 | |||
| 4cbd5a4c6c | |||
| 65aa5629e8 | |||
| 7193d09bed | |||
| 49f8fae0b4 | |||
| e1a490756e | |||
| c313ea7ee2 | |||
| 91bfaf36e3 | |||
| 99d41d8cc6 | |||
| 465adf5b1f | |||
| 132d00d166 | |||
| a604fee3aa | |||
| 8018325923 | |||
| 3f86bd4009 | |||
| b4cf10214b | |||
| c7818c2c33 | |||
| e421bcc326 | |||
| 09e5a4dcc0 | |||
| ce08c44235 | |||
| e743234324 | |||
| 9b76ac48b7 | |||
| 6ae16345a8 | |||
| 8daaf000b1 | |||
| 273f411eee | |||
| 6929cecf8a | |||
| 9221a7ff03 | |||
| a6089c5b3b | |||
| a7ee972b32 | |||
| c817989b99 | |||
| 2272a6854c | |||
| 040fc1ee8d | |||
| f00b8d7b8c | |||
| 6c8c6d7048 | |||
| f27ef52c7a | |||
| 0a2ff1db97 | |||
| 6da48eac6f | |||
| 638ff04e24 | |||
| d7075b459b | |||
| 8ff2e91f2d | |||
| 61afaa4c8b | |||
| 0de47dbc3f | |||
| 676ef56134 | |||
| f0899bb35d | |||
| f490038e36 | |||
| cbf220eb00 | |||
| bf0d80ea20 | |||
| 3ae889a6f8 | |||
| 03ca1067ac | |||
| 3cda30a40a | |||
| 26934527b9 | |||
| 2619acde22 | |||
| b983d3cfd2 | |||
| 87a9dd15fe | |||
| 4ff531dec7 | |||
| 4f8b3d7aff | |||
| 210fa9c474 | |||
| 25361cac8c | |||
| 28defebd6d | |||
| c74381619e | |||
| d58f3103dd | |||
| 5d1ed35660 | |||
| 1f3e305534 | |||
| 7d8fdd279c | |||
| cacae9f290 | |||
| bb061b770f | |||
| a8768b9ed6 | |||
| b437aa5f6c | |||
| 9248182570 | |||
| 511c1a6ed5 | |||
| 7c77c7170f | |||
| 85fcb6516c | |||
| e8e76d85f7 | |||
| 5aaa5ae4d5 | |||
| c3a8ee9c7b | |||
| 5d07a8aba5 | |||
| d18e0594b8 | |||
| 26dcc86a24 | |||
| e928ad19e5 | |||
| 6768aaa575 | |||
| f561aacbfc | |||
| d9edd7adf7 | |||
| bbe8efeba2 | |||
| b4a5323009 | |||
| ade8b5b9a7 | |||
| e4ace3d484 | |||
| f3dd25adc5 | |||
| ec251f8168 | |||
| 1bb9579dc5 | |||
| 7ebf4146ce | |||
| e0e05f3488 | |||
| c92f2510c8 | |||
| ea1fbe9ee1 | |||
| 84a0be0179 | |||
| 1b5780461e | |||
| c8d35b63a4 | |||
| feb1ebae04 | |||
| efe49d0a5b | |||
| e50a5ea22a | |||
| 6382c94d0a | |||
| 58ce84c9cc | |||
| 08fd6ff765 | |||
| a9cb79909c | |||
| 852f8ccd94 | |||
| 9388ef3e99 | |||
| 04afb0c4bb | |||
| a07fd44de3 | |||
| f6c1b13846 | |||
| 654fa3dd1f | |||
| 8183449d27 | |||
| a9acfb86ad | |||
| d7d070ac5f | |||
| 8c01b573ce | |||
| 7744f21b9d | |||
| 9ed23a235f | |||
| e88328321f | |||
| a4c516bea1 | |||
| 1c932a04ef | |||
| 76d34be4c2 | |||
| cb0e9ff9ec | |||
| d6e8afe316 | |||
| a04f2bcf99 | |||
| c138e7c638 | |||
| fc08c7007f | |||
| d559bb3446 | |||
| 55a8c39e4b | |||
| 02d6f10e5f | |||
| 77428a91cc | |||
| 51403dc276 | |||
| 914a07a35d | |||
| 3c70d7b424 | |||
| ce1ee4ff17 | |||
| fca41d9bda | |||
| ff889e02f7 | |||
| 43ab460462 | |||
| caa06e266b | |||
| 3622ca78ee | |||
| 019e3f9659 | |||
| 208cb579a2 | |||
| 17de7e4485 | |||
| 810616eee1 | |||
| 191f583669 | |||
| 1d638cc18e | |||
| 3efa1f3b88 | |||
| 4daa33db09 | |||
| fab2fb0056 | |||
| ce885c120e | |||
| 75b53c47ff | |||
| 2936f73707 | |||
| e26426b138 | |||
| 62cacb8e28 | |||
| f3e37190ce | |||
| 0863bbbd2f | |||
| b23fa1daad | |||
| 05cc1ce599 | |||
| e6939f8d51 | |||
| 801fef12e1 | |||
| 5845629175 | |||
| 11b916301a | |||
| aa5d80b1d2 | |||
| aa5f990acd | |||
| 9764c82c2a | |||
| 543a71eb6c | |||
| 8285593c13 | |||
| 6fbfe773fb | |||
| a8c54b1e5f | |||
| a5323abfca | |||
| ba4df2d2c4 | |||
| 6510633a8c | |||
| 9172e5f46b | |||
| ed3e3848c0 | |||
| ee90185d5c | |||
| 6eb2633677 | |||
| c1f215dcf2 | |||
| 97cc9a1045 | |||
| 5f7b02a4b7 | |||
| e696b41a0e | |||
| 1f9acc6135 | |||
| 7e8699cb4b | |||
| fd4fc657d6 | |||
| 34403648b9 | |||
| 3795d50eb9 | |||
| 80515dde5a | |||
| b59094d35f | |||
| efcd296d83 | |||
| 802cb292b0 | |||
| 8e55f74d73 | |||
| 3d810485a0 | |||
| 94cfd48661 | |||
| 87c8e741f3 | |||
| d0e92ed18d | |||
| 1927045519 | |||
| 68cffb86c9 | |||
| 5bec989647 | |||
| 66f5d2f36c | |||
| 941f815254 | |||
| 42afd10518 | |||
| 3efa285a59 | |||
| 4f2b4172b4 | |||
| 0d7de71b94 | |||
| f0f5b4bede | |||
| bfd27e97d3 | |||
| f2def27390 | |||
| b3f7bd6cc0 | |||
| 0e8e78dc5b | |||
| b259d85776 | |||
| 175d9c3b7c | |||
| a2a810aabf | |||
| 175c7cfd51 | |||
| 5ada973d38 | |||
| 0103276136 | |||
| 1d9e8ec138 | |||
| 83ac2e71bb | |||
| 0b35a729a7 | |||
| 56723a519a | |||
| ebff394c76 | |||
| ceecc97bc8 | |||
| 313154f880 | |||
| 3eb6417cdc | |||
| 1b35d6ca0a | |||
| 1d89f0ba9d | |||
| 864df0e21a | |||
| 3f626decc4 | |||
| bf1760b1a9 | |||
| 8a58ea6344 | |||
| 662ff4c35f | |||
| af02352b49 | |||
| db9f987d46 | |||
| 8490ce1389 | |||
| 55ea9a56a4 | |||
| bd2381b10d | |||
| 443de755bd | |||
| 55ec5f14ee | |||
| 2e019302c9 | |||
| b1e829644b | |||
| 18f773e91b | |||
| 987cfee930 | |||
| 57f6b8498a | |||
| 9f0d35977c | |||
| e5910bbf2f | |||
| 0015bf7b38 | |||
| a6b9234abb | |||
| 086f3942b8 | |||
| 924f4abede | |||
| 02be91cb08 | |||
| c2298393ab | |||
| 4b8c63bf6e | |||
| e089c3b72c | |||
| a93983b5db | |||
| 20f6329004 | |||
| 3c2cf71c47 | |||
| 56288c3137 | |||
| 79188921a5 | |||
| 5ab66008ae | |||
| f38c9ee049 | |||
| 86f5e71ec2 | |||
| 1e15cc8495 | |||
| 077d82ad82 | |||
| e4cf7f3da2 | |||
| e3bdc9e8d7 | |||
| f1c1c9aab3 | |||
| 97cbcf7658 | |||
| 4860739a2f | |||
| 791ee40cd6 | |||
| e0191ac52b | |||
| e0724df196 | |||
| 2a56294638 | |||
| d5cd557013 | |||
| 2a43f23a3d | |||
| 69af8f569a | |||
| dcc11c9ea3 | |||
| 4b4abb47b0 | |||
| 0e86dbcc9b | |||
| 92c75aa6f5 | |||
| be41d848e5 | |||
| f7c299f6f0 | |||
| b6a0f65a09 | |||
| 1e7b0068ed | |||
| 207d2fb911 | |||
| de5105f313 | |||
| c65a99c87d | |||
| b4d7e57250 | |||
| 63845a07aa | |||
| 68ac73aa55 | |||
| 6d32f1bb36 | |||
| 9c316cee28 | |||
| 6af4f2d6e6 | |||
| bc9a43d5a9 | |||
| 57651900f1 | |||
| 46b0617018 | |||
| 87a26db779 | |||
| 7d9bd2e86b | |||
| 20ef5cb14f | |||
| 2c3ec7e74c | |||
| cce073dbdb | |||
| 6a92588264 | |||
| 276aad6f0d | |||
| 10620bda4f | |||
| c214401a00 | |||
| 260ac33324 | |||
| d4cd643860 | |||
| dc16cfda21 | |||
| e1db3a4af9 | |||
| ddd30a950d | |||
| 3ca0e63d54 | |||
| 0f8627f17a | |||
| cd0cf69099 | |||
| 9744363342 | |||
| 6fe8439e94 | |||
| 8e61ffe377 | |||
| 723476f7a7 | |||
| 0f253027ae | |||
| 6053895a82 | |||
| ceffa38717 | |||
| ae205fa3f2 | |||
| 669a05892b | |||
| 4898a9759a | |||
| 2c2fa25580 | |||
| 56496d7dbd | |||
| dd0696e44d | |||
| dcda273e0b | |||
| f3b159c650 | |||
| 06df037e28 | |||
| e814e516d1 | |||
| 0375e068ed | |||
| 34ffc533d3 | |||
| ea2ea1a4ae | |||
| 9e11947687 | |||
| 47117281e1 | |||
| 032dd13f5a | |||
| 13d8ebbeff | |||
| 2efa0e01df | |||
| 6044369fdf | |||
| 97440f9e8a | |||
| 765f7cae58 | |||
| b455c8a2ad | |||
| da25e0ffa5 | |||
| e07703c01f | |||
| a4abf3eb2b | |||
| 269d72d073 | |||
| c8f5dccbd2 | |||
| 8b797ee73f | |||
| de38adb1e4 | |||
| c169bcc5d8 | |||
| 80ea286beb | |||
| 3499be782e | |||
| 16603ae49c | |||
| bf6bd9ce7f | |||
| a54c0f6f46 | |||
| beeed11d48 | |||
| 25331590a7 | |||
| bff9f8976e | |||
| b71628e211 | |||
| 8c1cb1f55b | |||
| 66214384a9 | |||
| 6d6646887c | |||
| 6f8db0ed08 | |||
| 6aaf6836ea | |||
| 4f2348f50e | |||
| deb7f2f72a | |||
| d989d9c65a | |||
| 4173c606ab | |||
| a01430d20f | |||
| 2a8f775732 | |||
| 4a0d9b2855 | |||
| 92c65d69ea | |||
| 910a8968c4 | |||
| cdb4679c5a | |||
| 1a9dce89b4 | |||
| cf1e4d7f88 | |||
| f2f0b4fc61 | |||
| b21dd25181 | |||
| 04a18bcbe5 | |||
| 7f66dd67eb | |||
| cfa03b89c8 | |||
| 9866d7a22b | |||
| 331a6e442f | |||
| 1c2295b2b5 | |||
| fa43ca3785 | |||
| b4a2c3bd14 | |||
| 2d4ec4f462 | |||
| 1e8b933da0 | |||
| 48b1e0e038 |
@@ -62,8 +62,11 @@ jobs:
|
||||
uv run pytest tests/ -v
|
||||
|
||||
test-tools:
|
||||
name: Test Tools
|
||||
runs-on: ubuntu-latest
|
||||
name: Test Tools (${{ matrix.os }})
|
||||
runs-on: ${{ matrix.os }}
|
||||
strategy:
|
||||
matrix:
|
||||
os: [ubuntu-latest, windows-latest]
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
|
||||
@@ -79,3 +79,4 @@ core/tests/*dumps/*
|
||||
|
||||
screenshots/*
|
||||
|
||||
.gemini/*
|
||||
|
||||
@@ -2,6 +2,10 @@
|
||||
|
||||
Shared agent instructions for this workspace.
|
||||
|
||||
## Deprecations
|
||||
|
||||
- **TUI is deprecated.** The terminal UI (`hive tui`) is no longer maintained. Use the browser-based interface (`hive open`) instead.
|
||||
|
||||
## Coding Agent Notes
|
||||
|
||||
-
|
||||
|
||||
@@ -20,8 +20,20 @@ check: ## Run all checks without modifying files (CI-safe)
|
||||
cd core && ruff format --check .
|
||||
cd tools && ruff format --check .
|
||||
|
||||
test: ## Run all tests
|
||||
test: ## Run all tests (core + tools, excludes live)
|
||||
cd core && uv run python -m pytest tests/ -v
|
||||
cd tools && uv run python -m pytest -v
|
||||
|
||||
test-tools: ## Run tool tests only (mocked, no credentials needed)
|
||||
cd tools && uv run python -m pytest -v
|
||||
|
||||
test-live: ## Run live integration tests (requires real API credentials)
|
||||
cd tools && uv run python -m pytest -m live -s -o "addopts=" --log-cli-level=INFO
|
||||
|
||||
test-all: ## Run everything including live tests
|
||||
cd core && uv run python -m pytest tests/ -v
|
||||
cd tools && uv run python -m pytest -v
|
||||
cd tools && uv run python -m pytest -m live -s -o "addopts=" --log-cli-level=INFO
|
||||
|
||||
install-hooks: ## Install pre-commit hooks
|
||||
uv pip install pre-commit
|
||||
|
||||
@@ -82,6 +82,7 @@ Use Hive when you need:
|
||||
|
||||
- Python 3.11+ for agent development
|
||||
- An LLM provider that powers the agents
|
||||
- **ripgrep (optional, recommended on Windows):** The `search_files` tool uses ripgrep for faster file search. If not installed, a Python fallback is used. On Windows: `winget install BurntSushi.ripgrep` or `scoop install ripgrep`
|
||||
|
||||
> **Note for Windows Users:** It is strongly recommended to use **WSL (Windows Subsystem for Linux)** or **Git Bash** to run this framework. Some core automation scripts may not execute correctly in standard Command Prompt or PowerShell.
|
||||
|
||||
@@ -112,6 +113,8 @@ This sets up:
|
||||
|
||||
- At last, it will initiate the open hive interface in your browser
|
||||
|
||||
> **Tip:** To reopen the dashboard later, run `hive open` from the project directory.
|
||||
|
||||
<img width="2500" height="1214" alt="home-screen" src="https://github.com/user-attachments/assets/134d897f-5e75-4874-b00b-e0505f6b45c4" />
|
||||
|
||||
### Build Your First Agent
|
||||
|
||||
@@ -0,0 +1,31 @@
|
||||
perf: reduce subprocess spawning in quickstart scripts (#4427)
|
||||
|
||||
## Problem
|
||||
Windows process creation (CreateProcess) is 10-100x slower than Linux fork/exec.
|
||||
The quickstart scripts were spawning 4+ separate `uv run python -c "import X"`
|
||||
processes to verify imports, adding ~600ms overhead on Windows.
|
||||
|
||||
## Solution
|
||||
Consolidated all import checks into a single batch script that checks multiple
|
||||
modules in one subprocess call, reducing spawn overhead by ~75%.
|
||||
|
||||
## Changes
|
||||
- **New**: `scripts/check_requirements.py` - Batched import checker
|
||||
- **New**: `scripts/test_check_requirements.py` - Test suite
|
||||
- **New**: `scripts/benchmark_quickstart.ps1` - Performance benchmark tool
|
||||
- **Modified**: `quickstart.ps1` - Updated import verification (2 sections)
|
||||
- **Modified**: `quickstart.sh` - Updated import verification
|
||||
|
||||
## Performance Impact
|
||||
**Benchmark results on Windows:**
|
||||
- Before: ~19.8 seconds for import checks
|
||||
- After: ~4.9 seconds for import checks
|
||||
- **Improvement: 14.9 seconds saved (75.2% faster)**
|
||||
|
||||
## Testing
|
||||
- ✅ All functional tests pass (`scripts/test_check_requirements.py`)
|
||||
- ✅ Quickstart scripts work correctly on Windows
|
||||
- ✅ Error handling verified (invalid imports reported correctly)
|
||||
- ✅ Performance benchmark confirms 75%+ improvement
|
||||
|
||||
Fixes #4427
|
||||
@@ -69,7 +69,7 @@ goal = Goal(
|
||||
id="dynamic-tool-discovery",
|
||||
description=(
|
||||
"Always discover available tools dynamically via "
|
||||
"discover_mcp_tools before referencing tools in agent designs"
|
||||
"list_agent_tools before referencing tools in agent designs"
|
||||
),
|
||||
constraint_type="hard",
|
||||
category="correctness",
|
||||
|
||||
@@ -10,7 +10,7 @@ def _load_preferred_model() -> str:
|
||||
config_path = Path.home() / ".hive" / "configuration.json"
|
||||
if config_path.exists():
|
||||
try:
|
||||
with open(config_path) as f:
|
||||
with open(config_path, encoding="utf-8") as f:
|
||||
config = json.load(f)
|
||||
llm = config.get("llm", {})
|
||||
if llm.get("provider") and llm.get("model"):
|
||||
@@ -24,7 +24,7 @@ def _load_preferred_model() -> str:
|
||||
class RuntimeConfig:
|
||||
model: str = field(default_factory=_load_preferred_model)
|
||||
temperature: float = 0.7
|
||||
max_tokens: int = 40000
|
||||
max_tokens: int = 8000
|
||||
api_key: str | None = None
|
||||
api_base: str | None = None
|
||||
|
||||
|
||||
@@ -7,11 +7,11 @@ from framework.graph import NodeSpec
|
||||
# Load reference docs at import time so they're always in the system prompt.
|
||||
# No voluntary read_file() calls needed — the LLM gets everything upfront.
|
||||
_ref_dir = Path(__file__).parent.parent / "reference"
|
||||
_framework_guide = (_ref_dir / "framework_guide.md").read_text()
|
||||
_file_templates = (_ref_dir / "file_templates.md").read_text()
|
||||
_anti_patterns = (_ref_dir / "anti_patterns.md").read_text()
|
||||
_framework_guide = (_ref_dir / "framework_guide.md").read_text(encoding="utf-8")
|
||||
_file_templates = (_ref_dir / "file_templates.md").read_text(encoding="utf-8")
|
||||
_anti_patterns = (_ref_dir / "anti_patterns.md").read_text(encoding="utf-8")
|
||||
_gcu_guide_path = _ref_dir / "gcu_guide.md"
|
||||
_gcu_guide = _gcu_guide_path.read_text() if _gcu_guide_path.exists() else ""
|
||||
_gcu_guide = _gcu_guide_path.read_text(encoding="utf-8") if _gcu_guide_path.exists() else ""
|
||||
|
||||
|
||||
def _is_gcu_enabled() -> bool:
|
||||
@@ -46,23 +46,62 @@ _SHARED_TOOLS = [
|
||||
"read_file",
|
||||
"write_file",
|
||||
"edit_file",
|
||||
"hashline_edit",
|
||||
"list_directory",
|
||||
"search_files",
|
||||
"run_command",
|
||||
"undo_changes",
|
||||
# Meta-agent
|
||||
"list_agent_tools",
|
||||
"discover_mcp_tools",
|
||||
"validate_agent_tools",
|
||||
"list_agents",
|
||||
"list_agent_sessions",
|
||||
"get_agent_session_state",
|
||||
"get_agent_session_memory",
|
||||
"list_agent_checkpoints",
|
||||
"get_agent_checkpoint",
|
||||
"run_agent_tests",
|
||||
]
|
||||
|
||||
# Queen mode-specific tool sets.
|
||||
# Building mode: full coding + agent construction tools.
|
||||
_QUEEN_BUILDING_TOOLS = _SHARED_TOOLS + [
|
||||
"load_built_agent",
|
||||
"list_credentials",
|
||||
]
|
||||
|
||||
# Staging mode: agent loaded but not yet running — inspect, configure, launch.
|
||||
_QUEEN_STAGING_TOOLS = [
|
||||
# Read-only (inspect agent files, logs)
|
||||
"read_file",
|
||||
"list_directory",
|
||||
"search_files",
|
||||
"run_command",
|
||||
# Agent inspection
|
||||
"list_credentials",
|
||||
"get_worker_status",
|
||||
# Launch or go back
|
||||
"run_agent_with_input",
|
||||
"stop_worker_and_edit",
|
||||
]
|
||||
|
||||
# Running mode: worker is executing — monitor and control.
|
||||
_QUEEN_RUNNING_TOOLS = [
|
||||
# Read-only coding (for inspecting logs, files)
|
||||
"read_file",
|
||||
"list_directory",
|
||||
"search_files",
|
||||
"run_command",
|
||||
# Credentials
|
||||
"list_credentials",
|
||||
# Worker lifecycle
|
||||
"stop_worker",
|
||||
"stop_worker_and_edit",
|
||||
"get_worker_status",
|
||||
"inject_worker_message",
|
||||
# Monitoring
|
||||
"get_worker_health_summary",
|
||||
"notify_operator",
|
||||
]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Shared agent-building knowledge: core mandates, tool docs, meta-agent
|
||||
@@ -91,26 +130,35 @@ errors yourself. Don't declare success until validation passes.
|
||||
|
||||
# Tools
|
||||
|
||||
## Paths (MANDATORY)
|
||||
**Always use RELATIVE paths**
|
||||
(e.g. `exports/agent_name/config.py`, `exports/agent_name/nodes/__init__.py`).
|
||||
**Never use absolute paths** like `/mnt/data/...` or `/workspace/...` — they fail.
|
||||
The project root is implicit.
|
||||
|
||||
## File I/O
|
||||
- read_file(path, offset?, limit?) — read with line numbers
|
||||
- read_file(path, offset?, limit?, hashline?) — read with line numbers; \
|
||||
hashline=True for N:hhhh|content anchors (use with hashline_edit)
|
||||
- write_file(path, content) — create/overwrite, auto-mkdir
|
||||
- edit_file(path, old_text, new_text, replace_all?) — fuzzy-match edit
|
||||
- hashline_edit(path, edits, auto_cleanup?, encoding?) — anchor-based \
|
||||
editing using N:hhhh refs from read_file(hashline=True). Ops: set_line, \
|
||||
replace_lines, insert_after, insert_before, replace, append
|
||||
- list_directory(path, recursive?) — list contents
|
||||
- search_files(pattern, path?, include?) — regex search
|
||||
- search_files(pattern, path?, include?, hashline?) — regex search; \
|
||||
hashline=True for anchors in results
|
||||
- run_command(command, cwd?, timeout?) — shell execution
|
||||
- undo_changes(path?) — restore from git snapshot
|
||||
|
||||
## Meta-Agent
|
||||
- list_agent_tools(server_config_path?) — list all tool names available \
|
||||
for agent building, grouped by category. Call this FIRST before designing.
|
||||
- discover_mcp_tools(server_config_path?) — connect to MCP servers \
|
||||
and list all available tools with full schemas. Use for parameter details.
|
||||
- list_agent_tools(server_config_path?, output_schema?, group?) — discover \
|
||||
available tools grouped by category. output_schema: "simple" (default) or \
|
||||
"full" (includes input_schema). group: "all" (default) or a prefix like \
|
||||
"gmail". Call FIRST before designing.
|
||||
- validate_agent_tools(agent_path) — validate that all tools declared \
|
||||
in an agent's nodes actually exist. Call after building.
|
||||
- list_agents() — list all agent packages in exports/ with session counts
|
||||
- list_agent_sessions(agent_name, status?, limit?) — list sessions
|
||||
- get_agent_session_state(agent_name, session_id) — full session state
|
||||
- get_agent_session_memory(agent_name, session_id, key?) — memory data
|
||||
- list_agent_checkpoints(agent_name, session_id) — list checkpoints
|
||||
- get_agent_checkpoint(agent_name, session_id, checkpoint_id?) — load checkpoint
|
||||
- run_agent_tests(agent_name, test_types?, fail_fast?) — run pytest with parsing
|
||||
@@ -121,15 +169,14 @@ You are not just a file writer. You have deep integration with the \
|
||||
Hive framework:
|
||||
|
||||
## Tool Discovery (MANDATORY before designing)
|
||||
Before designing any agent, run list_agent_tools() to get all \
|
||||
available tool names. ONLY use tools from this list in your node \
|
||||
definitions. NEVER guess or fabricate tool names from memory.
|
||||
Before designing any agent, run list_agent_tools() to discover all \
|
||||
available tools. ONLY use tools from this list in your node definitions. \
|
||||
NEVER guess or fabricate tool names from memory.
|
||||
|
||||
For full parameter schemas when you need details:
|
||||
discover_mcp_tools()
|
||||
|
||||
To check a specific agent's configured tools:
|
||||
list_agent_tools("exports/{agent_name}/mcp_servers.json")
|
||||
list_agent_tools() # names + descriptions
|
||||
list_agent_tools(output_schema="full") # include input_schema
|
||||
list_agent_tools(group="gmail") # only gmail_* tools
|
||||
list_agent_tools("exports/{agent_name}/mcp_servers.json") # specific agent
|
||||
|
||||
## Agent Awareness
|
||||
Run list_agents() to see what agents already exist. Read their code \
|
||||
@@ -146,8 +193,7 @@ After writing agent code, validate structurally AND run tests:
|
||||
## Debugging Built Agents
|
||||
When a user says "my agent is failing" or "debug this agent":
|
||||
1. list_agent_sessions("{agent_name}") — find the session
|
||||
2. get_agent_session_state("{agent_name}", "{session_id}") — see status
|
||||
3. get_agent_session_memory("{agent_name}", "{session_id}") — inspect data
|
||||
2. get_worker_status
|
||||
4. list_agent_checkpoints / get_agent_checkpoint — trace execution
|
||||
|
||||
# Agent Building Workflow
|
||||
@@ -246,11 +292,12 @@ explicitly requests a one-shot/batch agent. Forever-alive agents loop \
|
||||
continuously — the user exits by closing the TUI. This is the standard \
|
||||
pattern for all interactive agents.
|
||||
|
||||
### Node Count Rules (HARD LIMITS)
|
||||
### Node Design Rules
|
||||
|
||||
**2-4 nodes** for all agents. Never exceed 4 unless the user explicitly \
|
||||
requests more. Each node boundary serializes outputs to shared memory \
|
||||
and DESTROYS all in-context information (tool results, reasoning, history).
|
||||
Each node boundary serializes outputs to shared memory \
|
||||
and DESTROYS all in-context information (tool results, reasoning, history). \
|
||||
Use as many nodes as the use case requires, but don't create nodes without \
|
||||
tools — merge them into nodes that do real work.
|
||||
|
||||
**MERGE nodes when:**
|
||||
- Node has NO tools (pure LLM reasoning) → merge into predecessor/successor
|
||||
@@ -264,10 +311,11 @@ and DESTROYS all in-context information (tool results, reasoning, history).
|
||||
- Fundamentally different tool sets
|
||||
- Fan-out parallelism (parallel branches MUST be separate)
|
||||
|
||||
**Typical patterns:**
|
||||
- 2 nodes: `interact (client-facing) → process (autonomous) → interact`
|
||||
- 3 nodes: `intake (CF) → process (auto) → review (CF) → intake`
|
||||
**Typical patterns (queen manages intake — NO client-facing intake node):**
|
||||
- 2 nodes: `process (autonomous) → review (client-facing) → process`
|
||||
- 1 node: `process (autonomous)` — simplest; queen handles all interaction
|
||||
- WRONG: 7 nodes where half have no tools and just do LLM reasoning
|
||||
- WRONG: Intake node that asks the user for requirements — the queen does intake
|
||||
|
||||
Read reference agents before designing:
|
||||
list_agents()
|
||||
@@ -280,20 +328,27 @@ use box-drawing characters and clear flow arrows:
|
||||
|
||||
```
|
||||
┌─────────────────────────┐
|
||||
│ intake (client-facing) │
|
||||
│ tools: set_output │
|
||||
└────────────┬────────────┘
|
||||
│ on_success
|
||||
▼
|
||||
┌─────────────────────────┐
|
||||
│ process (autonomous) │
|
||||
│ in: user_request │
|
||||
│ tools: web_search, │
|
||||
│ save_data │
|
||||
└────────────┬────────────┘
|
||||
│ on_success
|
||||
└──────► back to intake
|
||||
▼
|
||||
┌─────────────────────────┐
|
||||
│ review (client-facing) │
|
||||
│ tools: set_output │
|
||||
└────────────┬────────────┘
|
||||
│ on_success
|
||||
└──────► back to process
|
||||
```
|
||||
|
||||
The queen owns intake: she gathers user requirements, then calls \
|
||||
`run_agent_with_input(task)` with a structured task description. \
|
||||
When building the agent, design the entry node's `input_keys` to \
|
||||
match what the queen will provide at run time. No client-facing \
|
||||
intake node in the worker.
|
||||
|
||||
Follow the graph with a brief summary of each node's purpose. \
|
||||
Get user approval before implementing.
|
||||
|
||||
@@ -356,8 +411,9 @@ from .agent import (
|
||||
```
|
||||
|
||||
**entry_points**: `{"start": "first-node-id"}`
|
||||
For agents with multiple entry points (e.g. a reminder trigger), \
|
||||
add them: `{"start": "intake", "reminder": "reminder"}`
|
||||
The first node should be an autonomous processing node (NOT a \
|
||||
client-facing intake). For agents with multiple entry points, \
|
||||
add them: `{"start": "process", "reminder": "check"}`
|
||||
|
||||
**conversation_mode** — ONLY two valid values:
|
||||
- `"continuous"` — recommended for interactive agents (context carries \
|
||||
@@ -391,7 +447,8 @@ NO "mcpServers" wrapper. cwd "../../tools". command "uv".
|
||||
|
||||
**Storage**: `Path.home() / ".hive" / "agents" / "{name}"`
|
||||
|
||||
**Client-facing system prompts** — STEP 1/STEP 2 pattern:
|
||||
**Client-facing system prompts** (review/approval nodes only, NOT intake) \
|
||||
— STEP 1/STEP 2 pattern:
|
||||
```
|
||||
STEP 1 — Present to user (text only, NO tool calls):
|
||||
[instructions]
|
||||
@@ -399,6 +456,9 @@ STEP 1 — Present to user (text only, NO tool calls):
|
||||
STEP 2 — After user responds, call set_output:
|
||||
[set_output calls]
|
||||
```
|
||||
The queen manages intake. Workers should NOT have a client-facing node \
|
||||
that asks for requirements. Use client_facing=True only for review or \
|
||||
approval checkpoints mid-execution.
|
||||
|
||||
**Autonomous system prompts** — set_output in SEPARATE turn.
|
||||
|
||||
@@ -408,7 +468,10 @@ If list_agent_tools() shows these don't exist, use alternatives \
|
||||
(e.g. save_data/load_data for data persistence).
|
||||
|
||||
**Node rules**:
|
||||
- **2-4 nodes MAX.** Never exceed 4. Merge thin nodes aggressively.
|
||||
- **NO intake nodes.** The queen owns intake. She defines the entry \
|
||||
node's input_keys at build time and fills them via \
|
||||
`run_agent_with_input(task)` at run time.
|
||||
- Don't abuse nodes without tools — merge them into a node that does work.
|
||||
- A node with 0 tools is NOT a real node — merge it.
|
||||
- node_type "event_loop" for all regular graph nodes. Use "gcu" ONLY for
|
||||
browser automation subagents (see GCU appendix). GCU nodes MUST be in a
|
||||
@@ -542,50 +605,89 @@ start_agent("{name}") # triggers default entry point
|
||||
|
||||
_queen_tools_docs = """
|
||||
|
||||
## Worker Lifecycle
|
||||
- start_worker(task) — Start the worker with a task description. The \
|
||||
worker runs autonomously until it finishes or asks the user a question.
|
||||
- stop_worker() — Cancel the worker's current execution.
|
||||
- get_worker_status() — Check if the worker is idle, running, or waiting \
|
||||
for user input. Returns execution details.
|
||||
- inject_worker_message(content) — Send a message to the running worker. \
|
||||
Use this to relay user instructions or concerns.
|
||||
## Operating Modes
|
||||
|
||||
## Monitoring
|
||||
- get_worker_health_summary() — Read the latest health data from the judge.
|
||||
- notify_operator(ticket_id, analysis, urgency) — Alert the user about a \
|
||||
critical issue. Use sparingly.
|
||||
You operate in one of three modes. Your available tools change based on the \
|
||||
mode. The system notifies you when a mode change occurs.
|
||||
|
||||
## Agent Loading
|
||||
- load_built_agent(agent_path) — Load a newly built agent as the worker in \
|
||||
this session. If a worker is already loaded, it is automatically unloaded \
|
||||
first. Call after building and validating an agent to make it available \
|
||||
immediately.
|
||||
### BUILDING mode (default)
|
||||
You have full coding tools for building and modifying agents:
|
||||
- File I/O: read_file, write_file, edit_file, list_directory, search_files, \
|
||||
run_command, undo_changes
|
||||
- Meta-agent: list_agent_tools, validate_agent_tools, \
|
||||
list_agents, list_agent_sessions, \
|
||||
list_agent_checkpoints, get_agent_checkpoint, run_agent_tests
|
||||
- load_built_agent(agent_path) — Load the agent and switch to STAGING mode
|
||||
- list_credentials(credential_id?) — List authorized credentials
|
||||
|
||||
## Credentials
|
||||
- list_credentials(credential_id?) — List all authorized credentials in the \
|
||||
local store. Returns IDs, aliases, status, and identity metadata (never \
|
||||
secrets). Optionally filter by credential_id.
|
||||
When you finish building an agent, call load_built_agent(path) to stage it.
|
||||
|
||||
### STAGING mode (agent loaded, not yet running)
|
||||
The agent is loaded and ready to run. You can inspect it and launch it:
|
||||
- Read-only: read_file, list_directory, search_files, run_command
|
||||
- list_credentials(credential_id?) — Verify credentials are configured
|
||||
- get_worker_status() — Check the loaded worker
|
||||
- run_agent_with_input(task) — Start the worker and switch to RUNNING mode
|
||||
- stop_worker_and_edit() — Go back to BUILDING mode
|
||||
|
||||
In STAGING mode you do NOT have write tools. If you need to modify the agent, \
|
||||
call stop_worker_and_edit() to go back to BUILDING mode.
|
||||
|
||||
### RUNNING mode (worker is executing)
|
||||
The worker is running. You have monitoring and lifecycle tools:
|
||||
- Read-only: read_file, list_directory, search_files, run_command
|
||||
- get_worker_status() — Check worker status (idle, running, waiting)
|
||||
- inject_worker_message(content) — Send a message to the running worker
|
||||
- get_worker_health_summary() — Read the latest health data
|
||||
- notify_operator(ticket_id, analysis, urgency) — Alert the user (use sparingly)
|
||||
- stop_worker() — Stop the worker and return to STAGING mode, then ask the user what to do next
|
||||
- stop_worker_and_edit() — Stop the worker and switch back to BUILDING mode
|
||||
|
||||
In RUNNING mode you do NOT have write tools or agent construction tools. \
|
||||
If you need to modify the agent, call stop_worker_and_edit() to switch back \
|
||||
to BUILDING mode. To stop the worker and ask the user what to do next, call \
|
||||
stop_worker() to return to STAGING mode.
|
||||
|
||||
### Mode transitions
|
||||
- load_built_agent(path) → switches to STAGING mode
|
||||
- run_agent_with_input(task) → starts worker, switches to RUNNING mode
|
||||
- stop_worker() → stops worker, switches to STAGING mode (ask user: re-run or edit?)
|
||||
- stop_worker_and_edit() → stops worker (if running), switches to BUILDING mode
|
||||
"""
|
||||
|
||||
_queen_behavior = """
|
||||
# Behavior
|
||||
|
||||
## CRITICAL RULE — ask_user tool
|
||||
|
||||
Every response that ends with a question, a prompt, or expects user \
|
||||
input MUST finish with a call to ask_user(prompt, options). This is \
|
||||
NON-NEGOTIABLE. The system CANNOT detect that you are waiting for \
|
||||
input unless you call ask_user. You MUST call ask_user as the LAST \
|
||||
action in your response.
|
||||
|
||||
NEVER end a response with a question in text without calling ask_user. \
|
||||
NEVER rely on the user seeing your text and replying — call ask_user.
|
||||
|
||||
Always provide 2-4 short options that cover the most likely answers. \
|
||||
The user can always type a custom response.
|
||||
|
||||
Examples:
|
||||
- ask_user("What do you need?",
|
||||
["Build a new agent", "Run the loaded worker", "Help with code"])
|
||||
- ask_user("Which pattern?",
|
||||
["Simple 2-node", "Rich with feedback", "Custom"])
|
||||
- ask_user("Ready to proceed?",
|
||||
["Yes, go ahead", "Let me change something"])
|
||||
|
||||
## Greeting and identity
|
||||
|
||||
When the user greets you ("hi", "hello") or asks what you can do / \
|
||||
what you are, respond concisely. DO NOT list internal processes \
|
||||
(validation steps, AgentRunner.load, tool discovery). Focus on \
|
||||
user-facing capabilities:
|
||||
|
||||
1. Direct capabilities: file operations, shell commands, coding, \
|
||||
agent building & debugging.
|
||||
2. Delegation: describe what the loaded worker does in one sentence \
|
||||
(read the Worker Profile at the end of this prompt). If no worker \
|
||||
is loaded, say so.
|
||||
3. End with a short prompt: "What do you need?"
|
||||
|
||||
Keep it under 10 lines. No bullet-point dumps of every tool you have.
|
||||
When the user greets you or asks what you can do, respond concisely \
|
||||
(under 10 lines). DO NOT list internal processes. Focus on:
|
||||
1. Direct capabilities: coding, agent building & debugging.
|
||||
2. What the loaded worker does (one sentence from Worker Profile). \
|
||||
If no worker is loaded, say so.
|
||||
3. THEN call ask_user to prompt them — do NOT just write text.
|
||||
|
||||
## Direct coding
|
||||
You can do any coding task directly — reading files, writing code, running \
|
||||
@@ -596,7 +698,8 @@ The worker is a specialized agent (see Worker Profile at the end of this \
|
||||
prompt). It can ONLY do what its goal and tools allow.
|
||||
|
||||
**Decision rule — read the Worker Profile first:**
|
||||
- The user's request directly matches the worker's goal → start_worker(task)
|
||||
- The user's request directly matches the worker's goal → use \
|
||||
run_agent_with_input(task) (if in staging) or load then run (if in building)
|
||||
- Anything else → do it yourself. Do NOT reframe user requests into \
|
||||
subtasks to justify delegation.
|
||||
- Building, modifying, or configuring agents is ALWAYS your job. Never \
|
||||
@@ -604,16 +707,30 @@ delegate agent construction to the worker, even as a "research" subtask.
|
||||
|
||||
## When the user says "run", "execute", or "start" (without specifics)
|
||||
|
||||
The loaded worker is described in the Worker Profile below. Ask what \
|
||||
task or topic they want — do NOT call list_agents() or list directories. \
|
||||
The worker is already loaded. Just ask for the input the worker needs \
|
||||
(e.g., a research topic, a target domain, a job description).
|
||||
The loaded worker is described in the Worker Profile below. You MUST \
|
||||
ask the user what task or input they want using ask_user — do NOT \
|
||||
invent a task, do NOT call list_agents() or list directories. \
|
||||
The worker is already loaded. Just ask for the specific input the \
|
||||
worker needs (e.g., a research topic, a target domain, a job description). \
|
||||
NEVER call run_agent_with_input until the user has provided their input.
|
||||
|
||||
If NO worker is loaded, say so and offer to build one.
|
||||
|
||||
## When in staging mode (agent loaded, not running):
|
||||
- Tell the user the agent is loaded and ready.
|
||||
- For tasks matching the worker's goal: ALWAYS ask the user for their \
|
||||
specific input BEFORE calling run_agent_with_input(task). NEVER make up \
|
||||
or assume what the user wants. Use ask_user to collect the task details \
|
||||
(e.g., topic, target, requirements). Once you have the user's answer, \
|
||||
compose a structured task description from their input and call \
|
||||
run_agent_with_input(task). The worker has no intake node — it receives \
|
||||
your task and starts processing.
|
||||
- If the user wants to modify the agent, call stop_worker_and_edit().
|
||||
|
||||
## When idle (worker not running):
|
||||
- Greet the user. Mention what the worker can do in one sentence.
|
||||
- For tasks matching the worker's goal, call start_worker(task).
|
||||
- For tasks matching the worker's goal, use run_agent_with_input(task) \
|
||||
(if in staging) or load the agent first (if in building).
|
||||
- For everything else, do it directly.
|
||||
|
||||
## When the user clicks Run (external event notification)
|
||||
@@ -625,24 +742,37 @@ explain the problem clearly and help fix it. For credential errors, \
|
||||
guide the user to set up the missing credentials. For structural \
|
||||
issues, offer to fix the agent graph directly.
|
||||
|
||||
## When worker is running:
|
||||
- If the user asks about progress, call get_worker_status() ONCE and \
|
||||
report the result. Do NOT poll in a loop.
|
||||
- NEVER call get_worker_status() repeatedly without user input in between. \
|
||||
The worker will surface results through client-facing nodes. You do not \
|
||||
need to monitor it. One check per user request is enough.
|
||||
- If the user has a concern or instruction for the worker, call \
|
||||
inject_worker_message(content) to relay it.
|
||||
- You can still do coding tasks directly while the worker runs.
|
||||
- If an escalation ticket arrives from the judge, assess severity:
|
||||
- Low/transient: acknowledge silently, do not disturb the user.
|
||||
- High/critical: notify the user with a brief analysis and suggested action.
|
||||
- After starting the worker or checking its status, WAIT for the user's \
|
||||
next message. Do not take autonomous actions unless the user asks.
|
||||
## When worker is running — GO SILENT
|
||||
|
||||
## When worker asks user a question:
|
||||
- The system will route the user's response directly to the worker. \
|
||||
You do not need to relay it. The user will come back to you after responding.
|
||||
Once you call start_worker(), your job is DONE. Do NOT call ask_user, \
|
||||
do NOT call get_worker_status(), do NOT emit any text. Just stop. \
|
||||
The worker owns the conversation now — it has its own client-facing \
|
||||
nodes that talk to the user directly.
|
||||
|
||||
**After start_worker, your ENTIRE response should be ONE short \
|
||||
confirmation sentence with NO tool calls.** Example: \
|
||||
"Started the vulnerability assessment." — that's it. No ask_user, \
|
||||
no get_worker_status, no follow-up questions.
|
||||
|
||||
You only wake up again when:
|
||||
- The user explicitly addresses you (not answering a worker question)
|
||||
- A worker question is forwarded to you for relay
|
||||
- An escalation ticket arrives from the judge
|
||||
- The worker finishes
|
||||
|
||||
If the user explicitly asks about progress, call get_worker_status() \
|
||||
ONCE and report. Do NOT poll or check proactively.
|
||||
|
||||
For escalation tickets: low/transient → acknowledge silently. \
|
||||
High/critical → notify the user with a brief analysis.
|
||||
|
||||
## When the worker asks the user a question:
|
||||
- The user's answer is routed to you with context: \
|
||||
[Worker asked: "...", Options: ...] User answered: "...".
|
||||
- If the user is answering the worker's question normally, relay it \
|
||||
using inject_worker_message(answer_text). Then go silent again.
|
||||
- If the user is rejecting the approach, asking to stop, or giving \
|
||||
you an instruction, handle it yourself — do NOT relay.
|
||||
|
||||
## Showing or describing the loaded worker
|
||||
|
||||
@@ -658,16 +788,18 @@ building something new.
|
||||
When the user asks to change, modify, or update the loaded worker \
|
||||
(e.g., "change the report node", "add a node", "delete node X"):
|
||||
|
||||
1. Use the **Path** from the Worker Profile to locate the agent files.
|
||||
2. Read the relevant files (nodes/__init__.py, agent.py, etc.).
|
||||
3. Make the requested changes using edit_file / write_file.
|
||||
4. Run validation (default_agent.validate(), AgentRunner.load(), \
|
||||
1. Call stop_worker_and_edit() — this stops the worker and gives you \
|
||||
coding tools (switches to BUILDING mode).
|
||||
2. Use the **Path** from the Worker Profile to locate the agent files.
|
||||
3. Read the relevant files (nodes/__init__.py, agent.py, etc.).
|
||||
4. Make the requested changes using edit_file / write_file.
|
||||
5. Run validation (default_agent.validate(), AgentRunner.load(), \
|
||||
validate_agent_tools()).
|
||||
5. **Reload the modified worker**: call load_built_agent("{path}") \
|
||||
so the changes take effect immediately. If a worker is already loaded, \
|
||||
stop it first, then reload.
|
||||
6. **Reload the modified worker**: call load_built_agent("{path}") \
|
||||
so the changes take effect immediately (switches to STAGING mode). \
|
||||
Then call run_agent_with_input(task) to restart execution.
|
||||
|
||||
Do NOT skip step 5 — without reloading, the user will still be \
|
||||
Do NOT skip step 6 — without reloading, the user will still be \
|
||||
interacting with the old version.
|
||||
"""
|
||||
|
||||
@@ -676,9 +808,9 @@ _queen_phase_7 = """
|
||||
|
||||
After building and verifying, load the agent into the current session:
|
||||
load_built_agent("exports/{name}")
|
||||
This makes the agent available immediately — the user sees its graph, \
|
||||
the tab name updates, and you can delegate to it via start_worker(). \
|
||||
Do NOT tell the user to run `python -m {name} run` — load it here.
|
||||
This switches to STAGING mode — the user sees the agent's graph and \
|
||||
the tab name updates. Then call run_agent_with_input(task) to start it. \
|
||||
Do NOT tell the user to run `python -m {name} run` — load and run it here.
|
||||
"""
|
||||
|
||||
_queen_style = """
|
||||
@@ -808,21 +940,7 @@ queen_node = NodeSpec(
|
||||
"User's intent is understood, coding tasks are completed correctly, "
|
||||
"and the worker is managed effectively when delegated to."
|
||||
),
|
||||
tools=_SHARED_TOOLS
|
||||
+ [
|
||||
# Worker lifecycle
|
||||
"start_worker",
|
||||
"stop_worker",
|
||||
"get_worker_status",
|
||||
"inject_worker_message",
|
||||
# Monitoring
|
||||
"get_worker_health_summary",
|
||||
"notify_operator",
|
||||
# Agent loading
|
||||
"load_built_agent",
|
||||
# Credentials
|
||||
"list_credentials",
|
||||
],
|
||||
tools=sorted(set(_QUEEN_BUILDING_TOOLS + _QUEEN_STAGING_TOOLS + _QUEEN_RUNNING_TOOLS)),
|
||||
system_prompt=(
|
||||
"You are the Queen — the user's primary interface. You are a coding agent "
|
||||
"with the same capabilities as the Hive Coder worker, PLUS the ability to "
|
||||
@@ -836,20 +954,7 @@ queen_node = NodeSpec(
|
||||
),
|
||||
)
|
||||
|
||||
ALL_QUEEN_TOOLS = _SHARED_TOOLS + [
|
||||
# Worker lifecycle
|
||||
"start_worker",
|
||||
"stop_worker",
|
||||
"get_worker_status",
|
||||
"inject_worker_message",
|
||||
# Monitoring
|
||||
"get_worker_health_summary",
|
||||
"notify_operator",
|
||||
# Agent loading
|
||||
"load_built_agent",
|
||||
# Credentials
|
||||
"list_credentials",
|
||||
]
|
||||
ALL_QUEEN_TOOLS = sorted(set(_QUEEN_BUILDING_TOOLS + _QUEEN_STAGING_TOOLS + _QUEEN_RUNNING_TOOLS))
|
||||
|
||||
__all__ = [
|
||||
"coder_node",
|
||||
@@ -857,4 +962,7 @@ __all__ = [
|
||||
"queen_node",
|
||||
"ALL_QUEEN_TRIAGE_TOOLS",
|
||||
"ALL_QUEEN_TOOLS",
|
||||
"_QUEEN_BUILDING_TOOLS",
|
||||
"_QUEEN_STAGING_TOOLS",
|
||||
"_QUEEN_RUNNING_TOOLS",
|
||||
]
|
||||
|
||||
@@ -48,11 +48,11 @@ profile_setup → daily_intake → update_tracker → analyze_progress → gener
|
||||
```
|
||||
`analyze_progress` has no tools. `schedule_reminders` just sets one boolean. `report` just presents analysis. `update_tracker` and `generate_plan` are sequential autonomous work.
|
||||
|
||||
**Good example** (3 nodes):
|
||||
**Good example** (2 nodes):
|
||||
```
|
||||
intake (client-facing) → process (autonomous: track + analyze + plan) → intake (loop back)
|
||||
process (autonomous: track + analyze + plan) → review (client-facing) → process (loop back)
|
||||
```
|
||||
One client-facing node handles ALL user interaction (setup, logging, reports). One autonomous node handles ALL backend work (CSV update, analysis, plan generation) with tools and context preserved.
|
||||
The queen handles intake (gathering requirements from the user) and passes the task via `run_agent_with_input(task)`. One autonomous node handles ALL backend work (CSV update, analysis, plan generation) with tools and context preserved. One client-facing node handles review/approval when needed.
|
||||
|
||||
12. **Adding framework gating for LLM behavior** — Don't add output rollback, premature rejection, or interaction protocol injection. Fix with better prompts or custom judges.
|
||||
|
||||
@@ -109,3 +109,5 @@ def test_research_routes_back_to_interact(self):
|
||||
25. **Manually wiring browser tools on event_loop nodes** — If the agent needs browser automation, use `node_type="gcu"` which auto-includes all browser tools and prepends best-practices guidance. Do NOT manually list browser tool names on event_loop nodes — they may not exist in the MCP server or may be incomplete. See the GCU Guide appendix.
|
||||
|
||||
26. **Using GCU nodes as regular graph nodes** — GCU nodes (`node_type="gcu"`) are exclusively subagents. They must ONLY appear in a parent node's `sub_agents=["gcu-node-id"]` list and be invoked via `delegate_to_sub_agent()`. They must NEVER be connected via edges, used as entry nodes, or used as terminal nodes. If a GCU node appears as an edge source or target, the graph will fail pre-load validation.
|
||||
|
||||
27. **Adding a client-facing intake node to worker agents** — The queen owns intake. She defines the entry node's `input_keys` at build time and fills them via `run_agent_with_input(task)` at run time. Worker agents should start with an autonomous processing node, NOT a client-facing intake node that asks the user for requirements. Client-facing nodes in workers are for mid-execution review/approval only.
|
||||
|
||||
@@ -57,51 +57,28 @@ metadata = AgentMetadata()
|
||||
|
||||
from framework.graph import NodeSpec
|
||||
|
||||
# Node 1: Intake (client-facing)
|
||||
intake_node = NodeSpec(
|
||||
id="intake",
|
||||
name="Intake",
|
||||
description="Gather requirements from the user",
|
||||
# Node 1: Process (autonomous entry node)
|
||||
# The queen handles intake and passes structured input via
|
||||
# run_agent_with_input(task). NO client-facing intake node.
|
||||
# The queen defines input_keys at build time and fills them at run time.
|
||||
process_node = NodeSpec(
|
||||
id="process",
|
||||
name="Process",
|
||||
description="Execute the task using available tools",
|
||||
node_type="event_loop",
|
||||
client_facing=True,
|
||||
max_node_visits=0, # Unlimited for forever-alive
|
||||
input_keys=["topic"],
|
||||
output_keys=["brief"],
|
||||
success_criteria="The brief is specific and actionable.",
|
||||
system_prompt="""\
|
||||
You are an intake specialist.
|
||||
|
||||
**STEP 1 — Read and respond (text only, NO tool calls):**
|
||||
1. Read the topic provided
|
||||
2. If vague, ask 1-2 clarifying questions
|
||||
3. If clear, confirm your understanding
|
||||
|
||||
**STEP 2 — After the user confirms, call set_output:**
|
||||
- set_output("brief", "Clear description of what to do")
|
||||
""",
|
||||
tools=[],
|
||||
)
|
||||
|
||||
# Node 2: Worker (autonomous)
|
||||
worker_node = NodeSpec(
|
||||
id="worker",
|
||||
name="Worker",
|
||||
description="Do the main work",
|
||||
node_type="event_loop",
|
||||
max_node_visits=0,
|
||||
input_keys=["brief", "feedback"],
|
||||
input_keys=["user_request", "feedback"],
|
||||
output_keys=["results"],
|
||||
nullable_output_keys=["feedback"], # Only on feedback edge
|
||||
success_criteria="Results are complete and accurate.",
|
||||
system_prompt="""\
|
||||
You are a worker agent. Given a brief, do the work.
|
||||
|
||||
If feedback is provided, this is a follow-up — address the feedback.
|
||||
You are a processing agent. Your task is in memory under "user_request". \
|
||||
If "feedback" is present, this is a revision — address the feedback.
|
||||
|
||||
Work in phases:
|
||||
1. Use tools to gather/process data
|
||||
2. Analyze results
|
||||
3. Call set_output for each key in a SEPARATE turn:
|
||||
3. Call set_output in a SEPARATE turn:
|
||||
- set_output("results", "structured results")
|
||||
""",
|
||||
tools=["web_search", "web_scrape", "save_data", "load_data", "list_data_files"],
|
||||
@@ -115,7 +92,7 @@ review_node = NodeSpec(
|
||||
node_type="event_loop",
|
||||
client_facing=True,
|
||||
max_node_visits=0,
|
||||
input_keys=["results", "brief"],
|
||||
input_keys=["results", "user_request"],
|
||||
output_keys=["next_action", "feedback"],
|
||||
nullable_output_keys=["feedback"],
|
||||
success_criteria="User has reviewed and decided next steps.",
|
||||
@@ -128,14 +105,14 @@ Present the results to the user.
|
||||
3. Ask: satisfied, or want changes?
|
||||
|
||||
**STEP 2 — After user responds, call set_output:**
|
||||
- set_output("next_action", "new_topic") — if starting fresh
|
||||
- set_output("next_action", "done") — if satisfied
|
||||
- set_output("next_action", "revise") — if changes needed
|
||||
- set_output("feedback", "what to change") — only if revising
|
||||
""",
|
||||
tools=[],
|
||||
)
|
||||
|
||||
__all__ = ["intake_node", "worker_node", "review_node"]
|
||||
__all__ = ["process_node", "review_node"]
|
||||
```
|
||||
|
||||
## agent.py
|
||||
@@ -155,7 +132,7 @@ from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
|
||||
from framework.runtime.execution_stream import EntryPointSpec
|
||||
|
||||
from .config import default_config, metadata
|
||||
from .nodes import intake_node, worker_node, review_node
|
||||
from .nodes import process_node, review_node
|
||||
|
||||
# Goal definition
|
||||
goal = Goal(
|
||||
@@ -172,27 +149,26 @@ goal = Goal(
|
||||
)
|
||||
|
||||
# Node list
|
||||
nodes = [intake_node, worker_node, review_node]
|
||||
nodes = [process_node, review_node]
|
||||
|
||||
# Edge definitions
|
||||
edges = [
|
||||
EdgeSpec(id="intake-to-worker", source="intake", target="worker",
|
||||
EdgeSpec(id="process-to-review", source="process", target="review",
|
||||
condition=EdgeCondition.ON_SUCCESS, priority=1),
|
||||
EdgeSpec(id="worker-to-review", source="worker", target="review",
|
||||
condition=EdgeCondition.ON_SUCCESS, priority=1),
|
||||
# Feedback loop
|
||||
EdgeSpec(id="review-to-worker", source="review", target="worker",
|
||||
# Feedback loop — revise results
|
||||
EdgeSpec(id="review-to-process", source="review", target="process",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="str(next_action).lower() == 'revise'", priority=2),
|
||||
# Loop back for new topic
|
||||
EdgeSpec(id="review-to-intake", source="review", target="intake",
|
||||
# Loop back for next task (queen sends new input)
|
||||
EdgeSpec(id="review-done", source="review", target="process",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="str(next_action).lower() == 'new_topic'", priority=1),
|
||||
condition_expr="str(next_action).lower() == 'done'", priority=1),
|
||||
]
|
||||
|
||||
# Graph configuration
|
||||
entry_node = "intake"
|
||||
entry_points = {"start": "intake"}
|
||||
# Graph configuration — entry is the autonomous process node
|
||||
# The queen handles intake and passes the task via run_agent_with_input(task)
|
||||
entry_node = "process"
|
||||
entry_points = {"start": "process"}
|
||||
pause_nodes = []
|
||||
terminal_nodes = [] # Forever-alive
|
||||
|
||||
@@ -208,7 +184,7 @@ class MyAgent:
|
||||
self.goal = goal
|
||||
self.nodes = nodes
|
||||
self.edges = edges
|
||||
self.entry_node = entry_node
|
||||
self.entry_node = entry_node # "process" — autonomous entry
|
||||
self.entry_points = entry_points
|
||||
self.pause_nodes = pause_nodes
|
||||
self.terminal_nodes = terminal_nodes
|
||||
@@ -498,7 +474,7 @@ def tui():
|
||||
llm = LiteLLMProvider(model=agent.config.model, api_key=agent.config.api_key, api_base=agent.config.api_base)
|
||||
runtime = create_agent_runtime(
|
||||
graph=agent._build_graph(), goal=agent.goal, storage_path=storage,
|
||||
entry_points=[EntryPointSpec(id="start", name="Start", entry_node="intake", trigger_type="manual", isolation_level="isolated")],
|
||||
entry_points=[EntryPointSpec(id="start", name="Start", entry_node="process", trigger_type="manual", isolation_level="isolated")],
|
||||
llm=llm, tools=list(agent._tool_registry.get_tools().values()), tool_executor=agent._tool_registry.get_executor())
|
||||
await runtime.start()
|
||||
try:
|
||||
|
||||
@@ -131,13 +131,19 @@ downstream node only sees the serialized summary string.
|
||||
- A "report" node that presents analysis → merge into the client-facing node
|
||||
- A "confirm" or "schedule" node that doesn't call any external service → remove
|
||||
|
||||
**Typical agent structure (3 nodes):**
|
||||
**Typical agent structure (2 nodes):**
|
||||
```
|
||||
intake (client-facing) ←→ process (autonomous) ←→ review (client-facing)
|
||||
process (autonomous) ←→ review (client-facing)
|
||||
```
|
||||
Or for simpler agents, just 2 nodes:
|
||||
The queen owns intake — she gathers requirements from the user, then
|
||||
passes structured input via `run_agent_with_input(task)`. When building
|
||||
the agent, design the entry node's `input_keys` to match what the queen
|
||||
will provide at run time. Worker agents should NOT have a client-facing
|
||||
intake node. Client-facing nodes are for mid-execution review/approval only.
|
||||
|
||||
For simpler agents, just 1 autonomous node:
|
||||
```
|
||||
interact (client-facing) → process (autonomous) → interact (loop)
|
||||
process (autonomous) — loops back to itself
|
||||
```
|
||||
|
||||
### nullable_output_keys
|
||||
@@ -397,7 +403,7 @@ from .agent import (
|
||||
### Reference Agent
|
||||
|
||||
See `exports/gmail_inbox_guardian/agent.py` for a complete example with:
|
||||
- Primary client-facing intake node (user configures rules)
|
||||
- Primary client-facing node (user configures rules)
|
||||
- Timer-based scheduled inbox checks (every 20 min)
|
||||
- Webhook-triggered email event handling
|
||||
- Shared isolation for memory access across streams
|
||||
@@ -413,13 +419,13 @@ See `exports/gmail_inbox_guardian/agent.py` for a complete example with:
|
||||
## Tool Discovery
|
||||
|
||||
Do NOT rely on a static tool list — it will be outdated. Always use
|
||||
`list_agent_tools()` to get available tool names grouped by category.
|
||||
For full schemas with parameter details, use `discover_mcp_tools()`.
|
||||
`list_agent_tools()` to discover available tools, grouped by category.
|
||||
|
||||
```
|
||||
list_agent_tools() # all available tools
|
||||
list_agent_tools("exports/my_agent/mcp_servers.json") # specific agent
|
||||
discover_mcp_tools() # full schemas with params
|
||||
list_agent_tools() # names + descriptions, all groups
|
||||
list_agent_tools(output_schema="full") # include input_schema
|
||||
list_agent_tools(group="gmail") # only gmail_* tools
|
||||
list_agent_tools("exports/my_agent/mcp_servers.json") # specific agent's tools
|
||||
```
|
||||
|
||||
After building, validate tools exist: `validate_agent_tools("exports/{name}")`
|
||||
|
||||
@@ -21,7 +21,7 @@ Do NOT use GCU for:
|
||||
- Same underlying `EventLoopNode` class — no new imports needed
|
||||
- `tools=[]` is correct — tools are auto-populated at runtime
|
||||
|
||||
## GCU Architecture Pattern
|
||||
## GCU Architecture Pattern
|
||||
|
||||
GCU nodes are **subagents** — invoked via `delegate_to_sub_agent()`, not connected via edges.
|
||||
|
||||
|
||||
@@ -660,7 +660,7 @@ class GraphBuilder:
|
||||
# Generate Python code
|
||||
code = self._generate_code(graph)
|
||||
|
||||
Path(path).write_text(code)
|
||||
Path(path).write_text(code, encoding="utf-8")
|
||||
self.session.phase = BuildPhase.EXPORTED
|
||||
self._save_session()
|
||||
|
||||
@@ -754,7 +754,7 @@ class GraphBuilder:
|
||||
"""Save session to disk."""
|
||||
self.session.updated_at = datetime.now()
|
||||
path = self.storage_path / f"{self.session.id}.json"
|
||||
path.write_text(self.session.model_dump_json(indent=2))
|
||||
path.write_text(self.session.model_dump_json(indent=2), encoding="utf-8")
|
||||
|
||||
def _load_session(self, session_id: str) -> BuildSession:
|
||||
"""Load session from disk."""
|
||||
|
||||
@@ -92,7 +92,7 @@ def get_api_key() -> str | None:
|
||||
|
||||
def get_gcu_enabled() -> bool:
|
||||
"""Return whether GCU (browser automation) is enabled in user config."""
|
||||
return get_hive_config().get("gcu_enabled", False)
|
||||
return get_hive_config().get("gcu_enabled", True)
|
||||
|
||||
|
||||
def get_api_base() -> str | None:
|
||||
|
||||
@@ -69,7 +69,7 @@ def save_credential_key(key: str) -> Path:
|
||||
# Restrict the secrets directory itself
|
||||
path.parent.chmod(stat.S_IRWXU) # 0o700
|
||||
|
||||
path.write_text(key)
|
||||
path.write_text(key, encoding="utf-8")
|
||||
path.chmod(stat.S_IRUSR | stat.S_IWUSR) # 0o600
|
||||
|
||||
os.environ[CREDENTIAL_KEY_ENV_VAR] = key
|
||||
|
||||
@@ -73,6 +73,7 @@ from .provider import (
|
||||
TokenExpiredError,
|
||||
TokenPlacement,
|
||||
)
|
||||
from .zoho_provider import ZohoOAuth2Provider
|
||||
|
||||
__all__ = [
|
||||
# Types
|
||||
@@ -82,6 +83,7 @@ __all__ = [
|
||||
# Providers
|
||||
"BaseOAuth2Provider",
|
||||
"HubSpotOAuth2Provider",
|
||||
"ZohoOAuth2Provider",
|
||||
# Lifecycle
|
||||
"TokenLifecycleManager",
|
||||
"TokenRefreshResult",
|
||||
|
||||
@@ -0,0 +1,198 @@
|
||||
"""
|
||||
Zoho CRM-specific OAuth2 provider.
|
||||
|
||||
Pre-configured for Zoho's OAuth2 endpoints and CRM scopes.
|
||||
Extends BaseOAuth2Provider for Zoho-specific behavior.
|
||||
|
||||
Usage:
|
||||
provider = ZohoOAuth2Provider(
|
||||
client_id="your-client-id",
|
||||
client_secret="your-client-secret",
|
||||
accounts_domain="https://accounts.zoho.com", # or .in, .eu, etc.
|
||||
)
|
||||
|
||||
# Use with credential store
|
||||
store = CredentialStore(
|
||||
storage=EncryptedFileStorage(),
|
||||
providers=[provider],
|
||||
)
|
||||
|
||||
See: https://www.zoho.com/crm/developer/docs/api/v2/access-refresh.html
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
from typing import Any
|
||||
|
||||
from ..models import CredentialObject, CredentialRefreshError, CredentialType
|
||||
from .base_provider import BaseOAuth2Provider
|
||||
from .provider import OAuth2Config, OAuth2Token, TokenPlacement
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Default CRM scopes for Phase 1 (Leads, Contacts, Accounts, Deals, Notes)
|
||||
ZOHO_DEFAULT_SCOPES = [
|
||||
"ZohoCRM.modules.leads.ALL",
|
||||
"ZohoCRM.modules.contacts.ALL",
|
||||
"ZohoCRM.modules.accounts.ALL",
|
||||
"ZohoCRM.modules.deals.ALL",
|
||||
"ZohoCRM.modules.notes.CREATE",
|
||||
]
|
||||
|
||||
|
||||
class ZohoOAuth2Provider(BaseOAuth2Provider):
|
||||
"""
|
||||
Zoho CRM OAuth2 provider with pre-configured endpoints.
|
||||
|
||||
Handles Zoho-specific OAuth2 behavior:
|
||||
- Pre-configured token and authorization URLs (region-aware)
|
||||
- Default CRM scopes for Leads, Contacts, Accounts, Deals, Notes
|
||||
- Token validation via Zoho CRM API
|
||||
- Authorization header format: "Authorization: Zoho-oauthtoken {token}"
|
||||
|
||||
Example:
|
||||
provider = ZohoOAuth2Provider(
|
||||
client_id="your-zoho-client-id",
|
||||
client_secret="your-zoho-client-secret",
|
||||
accounts_domain="https://accounts.zoho.com", # US
|
||||
# or "https://accounts.zoho.in" for India
|
||||
# or "https://accounts.zoho.eu" for EU
|
||||
)
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
client_id: str,
|
||||
client_secret: str,
|
||||
accounts_domain: str = "https://accounts.zoho.com",
|
||||
api_domain: str | None = None,
|
||||
scopes: list[str] | None = None,
|
||||
):
|
||||
"""
|
||||
Initialize Zoho OAuth2 provider.
|
||||
|
||||
Args:
|
||||
client_id: Zoho OAuth2 client ID
|
||||
client_secret: Zoho OAuth2 client secret
|
||||
accounts_domain: Zoho accounts domain (region-specific)
|
||||
- US: https://accounts.zoho.com
|
||||
- India: https://accounts.zoho.in
|
||||
- EU: https://accounts.zoho.eu
|
||||
- etc.
|
||||
api_domain: Zoho API domain for CRM calls (used in validate).
|
||||
Defaults to ZOHO_API_DOMAIN env or https://www.zohoapis.com
|
||||
scopes: Override default scopes if needed
|
||||
"""
|
||||
base = accounts_domain.rstrip("/")
|
||||
token_url = f"{base}/oauth/v2/token"
|
||||
auth_url = f"{base}/oauth/v2/auth"
|
||||
|
||||
config = OAuth2Config(
|
||||
token_url=token_url,
|
||||
authorization_url=auth_url,
|
||||
client_id=client_id,
|
||||
client_secret=client_secret,
|
||||
default_scopes=scopes or ZOHO_DEFAULT_SCOPES,
|
||||
token_placement=TokenPlacement.HEADER_CUSTOM,
|
||||
custom_header_name="Authorization",
|
||||
)
|
||||
super().__init__(config, provider_id="zoho_crm_oauth2")
|
||||
self._accounts_domain = base
|
||||
self._api_domain = (
|
||||
api_domain or os.getenv("ZOHO_API_DOMAIN", "https://www.zohoapis.com")
|
||||
).rstrip("/")
|
||||
|
||||
@property
|
||||
def supported_types(self) -> list[CredentialType]:
|
||||
return [CredentialType.OAUTH2]
|
||||
|
||||
def format_for_request(self, token: OAuth2Token) -> dict[str, Any]:
|
||||
"""
|
||||
Format token for Zoho CRM API requests.
|
||||
|
||||
Zoho uses Authorization header: "Zoho-oauthtoken {access_token}"
|
||||
(not Bearer).
|
||||
"""
|
||||
return {
|
||||
"headers": {
|
||||
"Authorization": f"Zoho-oauthtoken {token.access_token}",
|
||||
"Content-Type": "application/json",
|
||||
"Accept": "application/json",
|
||||
}
|
||||
}
|
||||
|
||||
def validate(self, credential: CredentialObject) -> bool:
|
||||
"""
|
||||
Validate Zoho credential by making a lightweight API call.
|
||||
|
||||
Uses GET /crm/v2/users?type=CurrentUser (doesn't require module access).
|
||||
Treats 429 as valid-but-rate-limited.
|
||||
"""
|
||||
access_token = credential.get_key("access_token")
|
||||
if not access_token:
|
||||
return False
|
||||
|
||||
try:
|
||||
client = self._get_client()
|
||||
response = client.get(
|
||||
f"{self._api_domain}/crm/v2/users?type=CurrentUser",
|
||||
headers={
|
||||
"Authorization": f"Zoho-oauthtoken {access_token}",
|
||||
"Accept": "application/json",
|
||||
},
|
||||
timeout=self.config.request_timeout,
|
||||
)
|
||||
return response.status_code in (200, 429)
|
||||
except Exception as e:
|
||||
logger.debug("Zoho credential validation failed: %s", e)
|
||||
return False
|
||||
|
||||
def _parse_token_response(self, response_data: dict[str, Any]) -> OAuth2Token:
|
||||
"""
|
||||
Parse Zoho token response.
|
||||
|
||||
Zoho returns:
|
||||
{
|
||||
"access_token": "...",
|
||||
"refresh_token": "...",
|
||||
"expires_in": 3600,
|
||||
"api_domain": "https://www.zohoapis.com",
|
||||
"token_type": "Bearer"
|
||||
}
|
||||
"""
|
||||
token = OAuth2Token.from_token_response(response_data)
|
||||
if "api_domain" in response_data:
|
||||
token.raw_response["api_domain"] = response_data["api_domain"]
|
||||
return token
|
||||
|
||||
def refresh(self, credential: CredentialObject) -> CredentialObject:
|
||||
"""Refresh Zoho OAuth2 credential and persist DC metadata."""
|
||||
refresh_tok = credential.get_key("refresh_token")
|
||||
if not refresh_tok:
|
||||
raise CredentialRefreshError(f"Credential '{credential.id}' has no refresh_token")
|
||||
|
||||
try:
|
||||
new_token = self.refresh_access_token(refresh_tok)
|
||||
except Exception as e:
|
||||
raise CredentialRefreshError(f"Failed to refresh '{credential.id}': {e}") from e
|
||||
|
||||
credential.set_key("access_token", new_token.access_token, expires_at=new_token.expires_at)
|
||||
|
||||
if new_token.refresh_token and new_token.refresh_token != refresh_tok:
|
||||
credential.set_key("refresh_token", new_token.refresh_token)
|
||||
|
||||
api_domain = new_token.raw_response.get("api_domain")
|
||||
if isinstance(api_domain, str) and api_domain:
|
||||
credential.set_key("api_domain", api_domain.rstrip("/"))
|
||||
|
||||
accounts_server = new_token.raw_response.get("accounts-server")
|
||||
if isinstance(accounts_server, str) and accounts_server:
|
||||
credential.set_key("accounts_domain", accounts_server.rstrip("/"))
|
||||
|
||||
location = new_token.raw_response.get("location")
|
||||
if isinstance(location, str) and location:
|
||||
credential.set_key("location", location.strip().lower())
|
||||
|
||||
return credential
|
||||
@@ -568,7 +568,7 @@ def _load_nodes_from_python_agent(agent_path: Path) -> list:
|
||||
def _load_nodes_from_json_agent(agent_json: Path) -> list:
|
||||
"""Load nodes from a JSON-based agent."""
|
||||
try:
|
||||
with open(agent_json) as f:
|
||||
with open(agent_json, encoding="utf-8") as f:
|
||||
data = json.load(f)
|
||||
|
||||
from framework.graph import NodeSpec
|
||||
|
||||
@@ -227,7 +227,7 @@ class EncryptedFileStorage(CredentialStorage):
|
||||
index_path = self.base_path / "metadata" / "index.json"
|
||||
if not index_path.exists():
|
||||
return []
|
||||
with open(index_path) as f:
|
||||
with open(index_path, encoding="utf-8") as f:
|
||||
index = json.load(f)
|
||||
return list(index.get("credentials", {}).keys())
|
||||
|
||||
@@ -268,7 +268,7 @@ class EncryptedFileStorage(CredentialStorage):
|
||||
index_path = self.base_path / "metadata" / "index.json"
|
||||
|
||||
if index_path.exists():
|
||||
with open(index_path) as f:
|
||||
with open(index_path, encoding="utf-8") as f:
|
||||
index = json.load(f)
|
||||
else:
|
||||
index = {"credentials": {}, "version": "1.0"}
|
||||
@@ -283,7 +283,7 @@ class EncryptedFileStorage(CredentialStorage):
|
||||
|
||||
index["last_modified"] = datetime.now(UTC).isoformat()
|
||||
|
||||
with open(index_path, "w") as f:
|
||||
with open(index_path, "w", encoding="utf-8") as f:
|
||||
json.dump(index, f, indent=2)
|
||||
|
||||
|
||||
|
||||
@@ -152,6 +152,72 @@ def _compact_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]
|
||||
return compact
|
||||
|
||||
|
||||
def extract_tool_call_history(messages: list[Message], max_entries: int = 30) -> str:
|
||||
"""Build a compact tool call history from a list of messages.
|
||||
|
||||
Used in compaction summaries to prevent the LLM from re-calling
|
||||
tools it already called. Extracts tool call details, files saved,
|
||||
outputs set, and errors encountered.
|
||||
"""
|
||||
tool_calls_detail: dict[str, list[str]] = {}
|
||||
files_saved: list[str] = []
|
||||
outputs_set: list[str] = []
|
||||
errors: list[str] = []
|
||||
|
||||
def _summarize_input(name: str, args: dict) -> str:
|
||||
if name == "web_search":
|
||||
return args.get("query", "")
|
||||
if name == "web_scrape":
|
||||
return args.get("url", "")
|
||||
if name in ("load_data", "save_data"):
|
||||
return args.get("filename", "")
|
||||
return ""
|
||||
|
||||
for msg in messages:
|
||||
if msg.role == "assistant" and msg.tool_calls:
|
||||
for tc in msg.tool_calls:
|
||||
func = tc.get("function", {})
|
||||
name = func.get("name", "unknown")
|
||||
try:
|
||||
args = json.loads(func.get("arguments", "{}"))
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
args = {}
|
||||
|
||||
summary = _summarize_input(name, args)
|
||||
tool_calls_detail.setdefault(name, []).append(summary)
|
||||
|
||||
if name == "save_data" and args.get("filename"):
|
||||
files_saved.append(args["filename"])
|
||||
if name == "set_output" and args.get("key"):
|
||||
outputs_set.append(args["key"])
|
||||
|
||||
if msg.role == "tool" and msg.is_error:
|
||||
preview = msg.content[:120].replace("\n", " ")
|
||||
errors.append(preview)
|
||||
|
||||
parts: list[str] = []
|
||||
if tool_calls_detail:
|
||||
lines: list[str] = []
|
||||
for name, inputs in list(tool_calls_detail.items())[:max_entries]:
|
||||
count = len(inputs)
|
||||
non_empty = [s for s in inputs if s]
|
||||
if non_empty:
|
||||
detail_lines = [f" - {s[:120]}" for s in non_empty[:8]]
|
||||
lines.append(f" {name} ({count}x):\n" + "\n".join(detail_lines))
|
||||
else:
|
||||
lines.append(f" {name} ({count}x)")
|
||||
parts.append("TOOLS ALREADY CALLED:\n" + "\n".join(lines))
|
||||
if files_saved:
|
||||
unique = list(dict.fromkeys(files_saved))
|
||||
parts.append("FILES SAVED: " + ", ".join(unique))
|
||||
if outputs_set:
|
||||
unique = list(dict.fromkeys(outputs_set))
|
||||
parts.append("OUTPUTS SET: " + ", ".join(unique))
|
||||
if errors:
|
||||
parts.append("ERRORS (do NOT retry these):\n" + "\n".join(f" - {e}" for e in errors[:10]))
|
||||
return "\n\n".join(parts)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# ConversationStore protocol (Phase 2)
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -373,9 +439,36 @@ class NodeConversation:
|
||||
def _repair_orphaned_tool_calls(
|
||||
msgs: list[dict[str, Any]],
|
||||
) -> list[dict[str, Any]]:
|
||||
"""Ensure every tool_call has a matching tool-result message."""
|
||||
"""Ensure tool_call / tool_result pairs are consistent.
|
||||
|
||||
1. **Orphaned tool results** (tool_result with no preceding tool_use)
|
||||
are dropped. This happens when compaction removes an assistant
|
||||
message but leaves its tool-result messages behind.
|
||||
2. **Orphaned tool calls** (tool_use with no following tool_result)
|
||||
get a synthetic error result appended. This happens when a loop
|
||||
is cancelled mid-tool-execution.
|
||||
"""
|
||||
# Pass 1: collect all tool_call IDs from assistant messages so we
|
||||
# can identify orphaned tool-result messages.
|
||||
all_tool_call_ids: set[str] = set()
|
||||
for m in msgs:
|
||||
if m.get("role") == "assistant":
|
||||
for tc in m.get("tool_calls") or []:
|
||||
tc_id = tc.get("id")
|
||||
if tc_id:
|
||||
all_tool_call_ids.add(tc_id)
|
||||
|
||||
# Pass 2: build repaired list — drop orphaned tool results, patch
|
||||
# missing tool results.
|
||||
repaired: list[dict[str, Any]] = []
|
||||
for i, m in enumerate(msgs):
|
||||
# Drop tool-result messages whose tool_call_id has no matching
|
||||
# tool_use in any assistant message (orphaned by compaction).
|
||||
if m.get("role") == "tool":
|
||||
tid = m.get("tool_call_id")
|
||||
if tid and tid not in all_tool_call_ids:
|
||||
continue # skip orphaned result
|
||||
|
||||
repaired.append(m)
|
||||
tool_calls = m.get("tool_calls")
|
||||
if m.get("role") != "assistant" or not tool_calls:
|
||||
@@ -653,6 +746,7 @@ class NodeConversation:
|
||||
spillover_dir: str,
|
||||
keep_recent: int = 4,
|
||||
phase_graduated: bool = False,
|
||||
aggressive: bool = False,
|
||||
) -> None:
|
||||
"""Structure-preserving compaction: save freeform text to file, keep tool messages.
|
||||
|
||||
@@ -662,6 +756,11 @@ class NodeConversation:
|
||||
after pruning. Only freeform text exchanges (user messages,
|
||||
text-only assistant messages) are saved to a file and removed.
|
||||
|
||||
When *aggressive* is True, non-essential tool call pairs are also
|
||||
collapsed into a compact summary instead of being kept individually.
|
||||
Only ``set_output`` calls and error results are preserved; all other
|
||||
old tool pairs are replaced by a tool-call history summary.
|
||||
|
||||
The result: the agent retains exact knowledge of what tools it called,
|
||||
where each result is stored, and can load the conversation text if
|
||||
needed. No LLM summary call. No heuristics. Nothing lost.
|
||||
@@ -693,35 +792,91 @@ class NodeConversation:
|
||||
# Classify old messages: structural (keep) vs freeform (save to file)
|
||||
kept_structural: list[Message] = []
|
||||
freeform_lines: list[str] = []
|
||||
collapsed_msgs: list[Message] = []
|
||||
|
||||
for msg in old_messages:
|
||||
if msg.role == "tool":
|
||||
# Tool results — already pruned to ~30 tokens (file reference).
|
||||
# Keep in conversation.
|
||||
kept_structural.append(msg)
|
||||
elif msg.role == "assistant" and msg.tool_calls:
|
||||
# Assistant message with tool_calls — keep the tool_calls
|
||||
# with truncated arguments, clear the freeform text content.
|
||||
compact_tcs = _compact_tool_calls(msg.tool_calls)
|
||||
kept_structural.append(
|
||||
Message(
|
||||
seq=msg.seq,
|
||||
role=msg.role,
|
||||
content="",
|
||||
tool_calls=compact_tcs,
|
||||
is_error=msg.is_error,
|
||||
phase_id=msg.phase_id,
|
||||
is_transition_marker=msg.is_transition_marker,
|
||||
)
|
||||
if aggressive:
|
||||
# Aggressive: only keep set_output tool pairs and error results.
|
||||
# Everything else is collapsed into a tool-call history summary.
|
||||
# We need to track tool_call IDs to pair assistant messages with
|
||||
# their tool results.
|
||||
protected_tc_ids: set[str] = set()
|
||||
collapsible_tc_ids: set[str] = set()
|
||||
|
||||
# First pass: classify assistant messages
|
||||
for msg in old_messages:
|
||||
if msg.role != "assistant" or not msg.tool_calls:
|
||||
continue
|
||||
has_protected = any(
|
||||
tc.get("function", {}).get("name") == "set_output" for tc in msg.tool_calls
|
||||
)
|
||||
else:
|
||||
# Freeform text (user messages, text-only assistant messages)
|
||||
# — save to file and remove from conversation.
|
||||
role_label = msg.role
|
||||
text = msg.content
|
||||
if len(text) > 2000:
|
||||
text = text[:2000] + "…"
|
||||
freeform_lines.append(f"[{role_label}] (seq={msg.seq}): {text}")
|
||||
tc_ids = {tc.get("id", "") for tc in msg.tool_calls}
|
||||
if has_protected:
|
||||
protected_tc_ids |= tc_ids
|
||||
else:
|
||||
collapsible_tc_ids |= tc_ids
|
||||
|
||||
# Second pass: classify all messages
|
||||
for msg in old_messages:
|
||||
if msg.role == "tool":
|
||||
tc_id = msg.tool_use_id or ""
|
||||
if tc_id in protected_tc_ids:
|
||||
kept_structural.append(msg)
|
||||
elif msg.is_error:
|
||||
# Error results are always protected
|
||||
kept_structural.append(msg)
|
||||
# Protect the parent assistant message too
|
||||
protected_tc_ids.add(tc_id)
|
||||
else:
|
||||
collapsed_msgs.append(msg)
|
||||
elif msg.role == "assistant" and msg.tool_calls:
|
||||
tc_ids = {tc.get("id", "") for tc in msg.tool_calls}
|
||||
if tc_ids & protected_tc_ids:
|
||||
# Has at least one protected tool call — keep entire msg
|
||||
compact_tcs = _compact_tool_calls(msg.tool_calls)
|
||||
kept_structural.append(
|
||||
Message(
|
||||
seq=msg.seq,
|
||||
role=msg.role,
|
||||
content="",
|
||||
tool_calls=compact_tcs,
|
||||
is_error=msg.is_error,
|
||||
phase_id=msg.phase_id,
|
||||
is_transition_marker=msg.is_transition_marker,
|
||||
)
|
||||
)
|
||||
else:
|
||||
collapsed_msgs.append(msg)
|
||||
else:
|
||||
# Freeform text — save to file
|
||||
role_label = msg.role
|
||||
text = msg.content
|
||||
if len(text) > 2000:
|
||||
text = text[:2000] + "…"
|
||||
freeform_lines.append(f"[{role_label}] (seq={msg.seq}): {text}")
|
||||
else:
|
||||
# Standard mode: keep all tool call pairs as structural
|
||||
for msg in old_messages:
|
||||
if msg.role == "tool":
|
||||
kept_structural.append(msg)
|
||||
elif msg.role == "assistant" and msg.tool_calls:
|
||||
compact_tcs = _compact_tool_calls(msg.tool_calls)
|
||||
kept_structural.append(
|
||||
Message(
|
||||
seq=msg.seq,
|
||||
role=msg.role,
|
||||
content="",
|
||||
tool_calls=compact_tcs,
|
||||
is_error=msg.is_error,
|
||||
phase_id=msg.phase_id,
|
||||
is_transition_marker=msg.is_transition_marker,
|
||||
)
|
||||
)
|
||||
else:
|
||||
role_label = msg.role
|
||||
text = msg.content
|
||||
if len(text) > 2000:
|
||||
text = text[:2000] + "…"
|
||||
freeform_lines.append(f"[{role_label}] (seq={msg.seq}): {text}")
|
||||
|
||||
# Write freeform text to a numbered conversation file
|
||||
spill_path = Path(spillover_dir)
|
||||
@@ -741,13 +896,25 @@ class NodeConversation:
|
||||
conv_filename = ""
|
||||
|
||||
# Build reference message
|
||||
ref_parts: list[str] = []
|
||||
if conv_filename:
|
||||
ref_content = (
|
||||
ref_parts.append(
|
||||
f"[Previous conversation saved to '{conv_filename}'. "
|
||||
f"Use load_data('{conv_filename}') to review if needed.]"
|
||||
)
|
||||
else:
|
||||
ref_content = "[Previous freeform messages compacted.]"
|
||||
elif not collapsed_msgs:
|
||||
ref_parts.append("[Previous freeform messages compacted.]")
|
||||
|
||||
# Aggressive: add collapsed tool-call history to the reference
|
||||
if collapsed_msgs:
|
||||
tool_history = extract_tool_call_history(collapsed_msgs)
|
||||
if tool_history:
|
||||
ref_parts.append(tool_history)
|
||||
elif not ref_parts:
|
||||
ref_parts.append("[Previous tool calls compacted.]")
|
||||
|
||||
ref_content = "\n\n".join(ref_parts)
|
||||
|
||||
# Use a seq just before the first kept message
|
||||
recent_messages = list(self._messages[split:])
|
||||
if kept_structural:
|
||||
@@ -760,15 +927,13 @@ class NodeConversation:
|
||||
|
||||
ref_msg = Message(seq=ref_seq, role="user", content=ref_content)
|
||||
|
||||
# Persist: delete old messages from store, write reference + kept structural
|
||||
# Persist: delete old messages from store, write reference + kept structural.
|
||||
# In aggressive mode, collapsed messages may be interspersed with kept
|
||||
# messages, so we delete everything before the recent boundary and
|
||||
# rewrite only what we want to keep.
|
||||
if self._store:
|
||||
first_kept_seq = (
|
||||
kept_structural[0].seq
|
||||
if kept_structural
|
||||
else (recent_messages[0].seq if recent_messages else self._next_seq)
|
||||
)
|
||||
# Delete everything before the first structural message we're keeping
|
||||
await self._store.delete_parts_before(first_kept_seq)
|
||||
recent_boundary = recent_messages[0].seq if recent_messages else self._next_seq
|
||||
await self._store.delete_parts_before(recent_boundary)
|
||||
# Write the reference message
|
||||
await self._store.write_part(ref_msg.seq, ref_msg.to_storage_dict())
|
||||
# Write kept structural messages (they may have been modified)
|
||||
|
||||
@@ -431,8 +431,7 @@ class GraphSpec(BaseModel):
|
||||
max_tokens: int = Field(default=None) # resolved by _resolve_max_tokens validator
|
||||
|
||||
# Cleanup LLM for JSON extraction fallback (fast/cheap model preferred)
|
||||
# If not set, uses CEREBRAS_API_KEY -> cerebras/llama-3.3-70b or
|
||||
# ANTHROPIC_API_KEY -> claude-haiku-4-5 as fallback
|
||||
# If not set, uses CEREBRAS_API_KEY -> cerebras/llama-3.3-70b
|
||||
cleanup_llm_model: str | None = None
|
||||
|
||||
# Execution limits
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -138,6 +138,7 @@ class GraphExecutor:
|
||||
accounts_prompt: str = "",
|
||||
accounts_data: list[dict] | None = None,
|
||||
tool_provider_map: dict[str, str] | None = None,
|
||||
dynamic_tools_provider: Callable | None = None,
|
||||
):
|
||||
"""
|
||||
Initialize the executor.
|
||||
@@ -160,6 +161,8 @@ class GraphExecutor:
|
||||
accounts_prompt: Connected accounts block for system prompt injection
|
||||
accounts_data: Raw account data for per-node prompt generation
|
||||
tool_provider_map: Tool name to provider name mapping for account routing
|
||||
dynamic_tools_provider: Optional callback returning current
|
||||
tool list (for mode switching)
|
||||
"""
|
||||
self.runtime = runtime
|
||||
self.llm = llm
|
||||
@@ -178,12 +181,14 @@ class GraphExecutor:
|
||||
self.accounts_prompt = accounts_prompt
|
||||
self.accounts_data = accounts_data
|
||||
self.tool_provider_map = tool_provider_map
|
||||
self.dynamic_tools_provider = dynamic_tools_provider
|
||||
|
||||
# Initialize output cleaner
|
||||
# Initialize output cleaner — uses its own dedicated fast model (CEREBRAS_API_KEY),
|
||||
# never the main agent LLM. Passing the main LLM here would cause expensive
|
||||
# Anthropic calls for output cleaning whenever ANTHROPIC_API_KEY is set.
|
||||
self.cleansing_config = cleansing_config or CleansingConfig()
|
||||
self.output_cleaner = OutputCleaner(
|
||||
config=self.cleansing_config,
|
||||
llm_provider=llm,
|
||||
)
|
||||
|
||||
# Parallel execution settings
|
||||
@@ -286,6 +291,125 @@ class GraphExecutor:
|
||||
|
||||
return errors
|
||||
|
||||
# Max chars of formatted messages before proactively splitting for LLM.
|
||||
_PHASE_LLM_CHAR_LIMIT = 240_000
|
||||
_PHASE_LLM_MAX_DEPTH = 10
|
||||
|
||||
async def _phase_llm_compact(
|
||||
self,
|
||||
conversation: Any,
|
||||
next_spec: NodeSpec,
|
||||
messages: list,
|
||||
_depth: int = 0,
|
||||
) -> str:
|
||||
"""Summarise messages for phase-boundary compaction.
|
||||
|
||||
Uses the same recursive binary-search splitting as EventLoopNode.
|
||||
"""
|
||||
from framework.graph.conversation import extract_tool_call_history
|
||||
from framework.graph.event_loop_node import _is_context_too_large_error
|
||||
|
||||
if _depth > self._PHASE_LLM_MAX_DEPTH:
|
||||
raise RuntimeError("Phase LLM compaction recursion limit")
|
||||
|
||||
# Format messages
|
||||
lines: list[str] = []
|
||||
for m in messages:
|
||||
if m.role == "tool":
|
||||
c = m.content[:500] + ("..." if len(m.content) > 500 else "")
|
||||
lines.append(f"[tool result]: {c}")
|
||||
elif m.role == "assistant" and m.tool_calls:
|
||||
names = [tc.get("function", {}).get("name", "?") for tc in m.tool_calls]
|
||||
lines.append(
|
||||
f"[assistant (calls: {', '.join(names)})]: "
|
||||
f"{m.content[:200] if m.content else ''}"
|
||||
)
|
||||
else:
|
||||
lines.append(f"[{m.role}]: {m.content}")
|
||||
formatted = "\n\n".join(lines)
|
||||
|
||||
# Proactive split
|
||||
if len(formatted) > self._PHASE_LLM_CHAR_LIMIT and len(messages) > 1:
|
||||
summary = await self._phase_llm_compact_split(
|
||||
conversation,
|
||||
next_spec,
|
||||
messages,
|
||||
_depth,
|
||||
)
|
||||
else:
|
||||
max_tokens = getattr(conversation, "_max_history_tokens", 32000)
|
||||
target_tokens = max_tokens // 2
|
||||
target_chars = target_tokens * 4
|
||||
|
||||
prompt = (
|
||||
"You are compacting an AI agent's conversation history "
|
||||
"at a phase boundary.\n\n"
|
||||
f"NEXT PHASE: {next_spec.name}\n"
|
||||
)
|
||||
if next_spec.description:
|
||||
prompt += f"NEXT PHASE PURPOSE: {next_spec.description}\n"
|
||||
prompt += (
|
||||
f"\nCONVERSATION MESSAGES:\n{formatted}\n\n"
|
||||
"INSTRUCTIONS:\n"
|
||||
f"Write a summary of approximately {target_chars} characters "
|
||||
f"(~{target_tokens} tokens).\n"
|
||||
"Preserve user-stated rules, constraints, and preferences "
|
||||
"verbatim. Preserve key decisions and results from earlier "
|
||||
"phases. Preserve context needed for the next phase.\n"
|
||||
)
|
||||
summary_budget = max(1024, max_tokens // 2)
|
||||
try:
|
||||
response = await self._llm.acomplete(
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
system=(
|
||||
"You are a conversation compactor. Write a detailed "
|
||||
"summary preserving context for the next phase."
|
||||
),
|
||||
max_tokens=summary_budget,
|
||||
)
|
||||
summary = response.content
|
||||
except Exception as e:
|
||||
if _is_context_too_large_error(e) and len(messages) > 1:
|
||||
summary = await self._phase_llm_compact_split(
|
||||
conversation,
|
||||
next_spec,
|
||||
messages,
|
||||
_depth,
|
||||
)
|
||||
else:
|
||||
raise
|
||||
|
||||
# Append tool history at top level only
|
||||
if _depth == 0:
|
||||
tool_history = extract_tool_call_history(messages)
|
||||
if tool_history and "TOOLS ALREADY CALLED" not in summary:
|
||||
summary += "\n\n" + tool_history
|
||||
|
||||
return summary
|
||||
|
||||
async def _phase_llm_compact_split(
|
||||
self,
|
||||
conversation: Any,
|
||||
next_spec: NodeSpec,
|
||||
messages: list,
|
||||
_depth: int,
|
||||
) -> str:
|
||||
"""Split messages in half and summarise each half."""
|
||||
mid = max(1, len(messages) // 2)
|
||||
s1 = await self._phase_llm_compact(
|
||||
conversation,
|
||||
next_spec,
|
||||
messages[:mid],
|
||||
_depth + 1,
|
||||
)
|
||||
s2 = await self._phase_llm_compact(
|
||||
conversation,
|
||||
next_spec,
|
||||
messages[mid:],
|
||||
_depth + 1,
|
||||
)
|
||||
return s1 + "\n\n" + s2
|
||||
|
||||
async def execute(
|
||||
self,
|
||||
graph: GraphSpec,
|
||||
@@ -1291,9 +1415,7 @@ class GraphExecutor:
|
||||
# Set current phase for phase-aware compaction
|
||||
continuous_conversation.set_current_phase(next_spec.id)
|
||||
|
||||
# Opportunistic compaction at transition:
|
||||
# 1. Prune old tool results (free, no LLM call)
|
||||
# 2. If still over 80%, do a phase-graduated compact
|
||||
# Phase-boundary compaction (same flow as EventLoopNode._compact)
|
||||
if continuous_conversation.usage_ratio() > 0.5:
|
||||
await continuous_conversation.prune_old_tool_results(
|
||||
protect_tokens=2000,
|
||||
@@ -1307,38 +1429,62 @@ class GraphExecutor:
|
||||
_data_dir = (
|
||||
str(self._storage_path / "data") if self._storage_path else None
|
||||
)
|
||||
# Step 1: Structural compaction (>=80%)
|
||||
if _data_dir:
|
||||
_pre = continuous_conversation.usage_ratio()
|
||||
await continuous_conversation.compact_preserving_structure(
|
||||
spillover_dir=_data_dir,
|
||||
keep_recent=4,
|
||||
phase_graduated=True,
|
||||
)
|
||||
# Circuit breaker: if still over budget, fall back
|
||||
_post_ratio = continuous_conversation.usage_ratio()
|
||||
if _post_ratio >= 0.9 * _phase_ratio:
|
||||
self.logger.warning(
|
||||
" Structure-preserving compaction ineffective "
|
||||
"(%.0f%% -> %.0f%%), falling back to summary",
|
||||
_phase_ratio * 100,
|
||||
_post_ratio * 100,
|
||||
)
|
||||
summary = (
|
||||
f"Summary of earlier phases (before {next_spec.name}). "
|
||||
"See transition markers for phase details."
|
||||
)
|
||||
await continuous_conversation.compact(
|
||||
summary,
|
||||
if continuous_conversation.usage_ratio() >= 0.9 * _pre:
|
||||
await continuous_conversation.compact_preserving_structure(
|
||||
spillover_dir=_data_dir,
|
||||
keep_recent=4,
|
||||
phase_graduated=True,
|
||||
aggressive=True,
|
||||
)
|
||||
else:
|
||||
|
||||
# Step 2: LLM compaction (>95%)
|
||||
if (
|
||||
continuous_conversation.usage_ratio() > 0.95
|
||||
and self._llm is not None
|
||||
):
|
||||
self.logger.info(
|
||||
" LLM phase-boundary compaction (%.0f%% usage)",
|
||||
continuous_conversation.usage_ratio() * 100,
|
||||
)
|
||||
try:
|
||||
_llm_summary = await self._phase_llm_compact(
|
||||
continuous_conversation,
|
||||
next_spec,
|
||||
list(continuous_conversation.messages),
|
||||
)
|
||||
await continuous_conversation.compact(
|
||||
_llm_summary,
|
||||
keep_recent=2,
|
||||
phase_graduated=True,
|
||||
)
|
||||
except Exception as e:
|
||||
self.logger.warning(
|
||||
" Phase LLM compaction failed: %s",
|
||||
e,
|
||||
)
|
||||
|
||||
# Step 3: Emergency (only if still over budget)
|
||||
if continuous_conversation.needs_compaction():
|
||||
self.logger.warning(
|
||||
" Emergency phase compaction (%.0f%%)",
|
||||
continuous_conversation.usage_ratio() * 100,
|
||||
)
|
||||
summary = (
|
||||
f"Summary of earlier phases (before {next_spec.name}). "
|
||||
f"Summary of earlier phases "
|
||||
f"(before {next_spec.name}). "
|
||||
"See transition markers for phase details."
|
||||
)
|
||||
await continuous_conversation.compact(
|
||||
summary,
|
||||
keep_recent=4,
|
||||
keep_recent=1,
|
||||
phase_graduated=True,
|
||||
)
|
||||
|
||||
@@ -1651,6 +1797,7 @@ class GraphExecutor:
|
||||
node_registry=node_registry or {},
|
||||
all_tools=list(self.tools), # Full catalog for subagent tool resolution
|
||||
shared_node_registry=self.node_registry, # For subagent escalation routing
|
||||
dynamic_tools_provider=self.dynamic_tools_provider,
|
||||
)
|
||||
|
||||
VALID_NODE_TYPES = {
|
||||
|
||||
@@ -154,69 +154,17 @@ class HITLProtocol:
|
||||
"""
|
||||
Parse human's raw input into structured response.
|
||||
|
||||
Uses Haiku to intelligently extract answers for each question.
|
||||
Maps the raw input to the first question. For multi-question HITL,
|
||||
the caller should present one question at a time.
|
||||
"""
|
||||
import os
|
||||
|
||||
response = HITLResponse(request_id=request.request_id, raw_input=raw_input)
|
||||
|
||||
# If no questions, just return raw input
|
||||
if not request.questions:
|
||||
return response
|
||||
|
||||
# Try to use Haiku for intelligent parsing
|
||||
api_key = os.environ.get("ANTHROPIC_API_KEY")
|
||||
if not use_haiku or not api_key:
|
||||
# Simple fallback: treat as answer to first question
|
||||
if request.questions:
|
||||
response.answers[request.questions[0].id] = raw_input
|
||||
return response
|
||||
|
||||
# Use Haiku to extract answers
|
||||
try:
|
||||
import json
|
||||
|
||||
import anthropic
|
||||
|
||||
questions_str = "\n".join(
|
||||
[f"{i + 1}. {q.question} (id: {q.id})" for i, q in enumerate(request.questions)]
|
||||
)
|
||||
|
||||
prompt = f"""Parse the user's response and extract answers for each question.
|
||||
|
||||
Questions asked:
|
||||
{questions_str}
|
||||
|
||||
User's response:
|
||||
{raw_input}
|
||||
|
||||
Extract the answer for each question. Output JSON with question IDs as keys.
|
||||
|
||||
Example format:
|
||||
{{"question-1": "answer here", "question-2": "answer here"}}"""
|
||||
|
||||
client = anthropic.Anthropic(api_key=api_key)
|
||||
message = client.messages.create(
|
||||
model="claude-haiku-4-5-20251001",
|
||||
max_tokens=500,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
)
|
||||
|
||||
# Parse Haiku's response
|
||||
import re
|
||||
|
||||
response_text = message.content[0].text.strip()
|
||||
json_match = re.search(r"\{[^{}]*\}", response_text, re.DOTALL)
|
||||
|
||||
if json_match:
|
||||
parsed = json.loads(json_match.group())
|
||||
response.answers = parsed
|
||||
|
||||
except Exception:
|
||||
# Fallback: use raw input for first question
|
||||
if request.questions:
|
||||
response.answers[request.questions[0].id] = raw_input
|
||||
|
||||
# Map raw input to first question
|
||||
response.answers[request.questions[0].id] = raw_input
|
||||
return response
|
||||
|
||||
@staticmethod
|
||||
|
||||
@@ -544,6 +544,11 @@ class NodeContext:
|
||||
# the inject_input() routing chain can find.
|
||||
shared_node_registry: dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
# Dynamic tool provider — when set, EventLoopNode rebuilds the tool
|
||||
# list from this callback at the start of each iteration. Used by
|
||||
# the queen to switch between building-mode and running-mode tools.
|
||||
dynamic_tools_provider: Any = None # Callable[[], list[Tool]] | None
|
||||
|
||||
|
||||
@dataclass
|
||||
class NodeResult:
|
||||
@@ -580,7 +585,6 @@ class NodeResult:
|
||||
Generate a human-readable summary of this node's execution and output.
|
||||
|
||||
This is like toString() - it describes what the node produced in its current state.
|
||||
Uses Haiku to intelligently summarize complex outputs.
|
||||
"""
|
||||
if not self.success:
|
||||
return f"❌ Failed: {self.error}"
|
||||
@@ -588,59 +592,13 @@ class NodeResult:
|
||||
if not self.output:
|
||||
return "✓ Completed (no output)"
|
||||
|
||||
# Use Haiku to generate intelligent summary
|
||||
import os
|
||||
|
||||
api_key = os.environ.get("ANTHROPIC_API_KEY")
|
||||
|
||||
if not api_key:
|
||||
# Fallback: simple key-value listing
|
||||
parts = [f"✓ Completed with {len(self.output)} outputs:"]
|
||||
for key, value in list(self.output.items())[:5]: # Limit to 5 keys
|
||||
value_str = str(value)[:100]
|
||||
if len(str(value)) > 100:
|
||||
value_str += "..."
|
||||
parts.append(f" • {key}: {value_str}")
|
||||
return "\n".join(parts)
|
||||
|
||||
# Use Haiku to generate intelligent summary
|
||||
try:
|
||||
import json
|
||||
|
||||
import anthropic
|
||||
|
||||
node_context = ""
|
||||
if node_spec:
|
||||
node_context = f"\nNode: {node_spec.name}\nPurpose: {node_spec.description}"
|
||||
|
||||
output_json = json.dumps(self.output, indent=2, default=str)[:2000]
|
||||
prompt = (
|
||||
f"Generate a 1-2 sentence human-readable summary of "
|
||||
f"what this node produced.{node_context}\n\n"
|
||||
f"Node output:\n{output_json}\n\n"
|
||||
"Provide a concise, clear summary that a human can quickly "
|
||||
"understand. Focus on the key information produced."
|
||||
)
|
||||
|
||||
client = anthropic.Anthropic(api_key=api_key)
|
||||
message = client.messages.create(
|
||||
model="claude-haiku-4-5-20251001",
|
||||
max_tokens=200,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
)
|
||||
|
||||
summary = message.content[0].text.strip()
|
||||
return f"✓ {summary}"
|
||||
|
||||
except Exception:
|
||||
# Fallback on error
|
||||
parts = [f"✓ Completed with {len(self.output)} outputs:"]
|
||||
for key, value in list(self.output.items())[:3]:
|
||||
value_str = str(value)[:80]
|
||||
if len(str(value)) > 80:
|
||||
value_str += "..."
|
||||
parts.append(f" • {key}: {value_str}")
|
||||
return "\n".join(parts)
|
||||
parts = [f"✓ Completed with {len(self.output)} outputs:"]
|
||||
for key, value in list(self.output.items())[:5]: # Limit to 5 keys
|
||||
value_str = str(value)[:100]
|
||||
if len(str(value)) > 100:
|
||||
value_str += "..."
|
||||
parts.append(f" • {key}: {value_str}")
|
||||
return "\n".join(parts)
|
||||
|
||||
|
||||
class NodeProtocol(ABC):
|
||||
|
||||
@@ -170,7 +170,7 @@ def _dump_failed_request(
|
||||
"temperature": kwargs.get("temperature"),
|
||||
}
|
||||
|
||||
with open(filepath, "w") as f:
|
||||
with open(filepath, "w", encoding="utf-8") as f:
|
||||
json.dump(dump_data, f, indent=2, default=str)
|
||||
|
||||
return str(filepath)
|
||||
|
||||
@@ -162,7 +162,7 @@ def _load_session(session_id: str) -> BuildSession:
|
||||
if not session_file.exists():
|
||||
raise ValueError(f"Session '{session_id}' not found")
|
||||
|
||||
with open(session_file) as f:
|
||||
with open(session_file, encoding="utf-8") as f:
|
||||
data = json.load(f)
|
||||
|
||||
return BuildSession.from_dict(data)
|
||||
@@ -174,7 +174,7 @@ def _load_active_session() -> BuildSession | None:
|
||||
return None
|
||||
|
||||
try:
|
||||
with open(ACTIVE_SESSION_FILE) as f:
|
||||
with open(ACTIVE_SESSION_FILE, encoding="utf-8") as f:
|
||||
session_id = f.read().strip()
|
||||
|
||||
if session_id:
|
||||
@@ -228,7 +228,7 @@ def list_sessions() -> str:
|
||||
if SESSIONS_DIR.exists():
|
||||
for session_file in SESSIONS_DIR.glob("*.json"):
|
||||
try:
|
||||
with open(session_file) as f:
|
||||
with open(session_file, encoding="utf-8") as f:
|
||||
data = json.load(f)
|
||||
sessions.append(
|
||||
{
|
||||
@@ -248,7 +248,7 @@ def list_sessions() -> str:
|
||||
active_id = None
|
||||
if ACTIVE_SESSION_FILE.exists():
|
||||
try:
|
||||
with open(ACTIVE_SESSION_FILE) as f:
|
||||
with open(ACTIVE_SESSION_FILE, encoding="utf-8") as f:
|
||||
active_id = f.read().strip()
|
||||
except Exception:
|
||||
pass
|
||||
@@ -310,7 +310,7 @@ def delete_session(session_id: Annotated[str, "ID of the session to delete"]) ->
|
||||
_session = None
|
||||
|
||||
if ACTIVE_SESSION_FILE.exists():
|
||||
with open(ACTIVE_SESSION_FILE) as f:
|
||||
with open(ACTIVE_SESSION_FILE, encoding="utf-8") as f:
|
||||
active_id = f.read().strip()
|
||||
if active_id == session_id:
|
||||
ACTIVE_SESSION_FILE.unlink()
|
||||
@@ -2894,10 +2894,12 @@ def run_tests(
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
encoding="utf-8",
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=600, # 10 minute timeout
|
||||
env=env,
|
||||
stdin=subprocess.DEVNULL,
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
return json.dumps(
|
||||
@@ -3085,10 +3087,12 @@ def debug_test(
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
encoding="utf-8",
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=120, # 2 minute timeout for single test
|
||||
env=env,
|
||||
stdin=subprocess.DEVNULL,
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
return json.dumps(
|
||||
@@ -3712,82 +3716,6 @@ def list_agent_sessions(
|
||||
)
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def get_agent_session_state(
|
||||
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
|
||||
session_id: Annotated[str, "The session ID (e.g., 'session_20260208_143022_abc12345')"],
|
||||
) -> str:
|
||||
"""
|
||||
Load full session state for a specific session.
|
||||
|
||||
Returns complete session data including status, progress, result,
|
||||
metrics, and checkpoint info. Memory values are excluded to prevent
|
||||
context bloat -- use get_agent_session_memory to retrieve memory contents.
|
||||
"""
|
||||
state_path = Path(agent_work_dir) / "sessions" / session_id / "state.json"
|
||||
data = _read_session_json(state_path)
|
||||
if data is None:
|
||||
return json.dumps({"error": f"Session not found: {session_id}"})
|
||||
|
||||
memory = data.get("memory", {})
|
||||
data["memory_keys"] = list(memory.keys()) if isinstance(memory, dict) else []
|
||||
data["memory_size"] = len(memory) if isinstance(memory, dict) else 0
|
||||
data.pop("memory", None)
|
||||
|
||||
return json.dumps(data, indent=2, default=str)
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def get_agent_session_memory(
|
||||
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
|
||||
session_id: Annotated[str, "The session ID"],
|
||||
key: Annotated[str, "Specific memory key to retrieve. Empty for all."] = "",
|
||||
) -> str:
|
||||
"""
|
||||
Get memory contents from a session.
|
||||
|
||||
Memory stores intermediate results passed between nodes. Use this
|
||||
to inspect what data was produced during execution.
|
||||
|
||||
If key is provided, returns only that memory key's value.
|
||||
If key is empty, returns all memory keys and their values.
|
||||
"""
|
||||
state_path = Path(agent_work_dir) / "sessions" / session_id / "state.json"
|
||||
data = _read_session_json(state_path)
|
||||
if data is None:
|
||||
return json.dumps({"error": f"Session not found: {session_id}"})
|
||||
|
||||
memory = data.get("memory", {})
|
||||
if not isinstance(memory, dict):
|
||||
memory = {}
|
||||
|
||||
if key:
|
||||
if key not in memory:
|
||||
return json.dumps(
|
||||
{
|
||||
"error": f"Memory key not found: '{key}'",
|
||||
"available_keys": list(memory.keys()),
|
||||
}
|
||||
)
|
||||
value = memory[key]
|
||||
return json.dumps(
|
||||
{
|
||||
"session_id": session_id,
|
||||
"key": key,
|
||||
"value": value,
|
||||
"value_type": type(value).__name__,
|
||||
},
|
||||
indent=2,
|
||||
default=str,
|
||||
)
|
||||
|
||||
return json.dumps(
|
||||
{"session_id": session_id, "memory": memory, "total_keys": len(memory)},
|
||||
indent=2,
|
||||
default=str,
|
||||
)
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def list_agent_checkpoints(
|
||||
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
|
||||
|
||||
@@ -401,6 +401,43 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
|
||||
)
|
||||
serve_parser.set_defaults(func=cmd_serve)
|
||||
|
||||
# open command (serve + auto-open browser)
|
||||
open_parser = subparsers.add_parser(
|
||||
"open",
|
||||
help="Start HTTP server and open dashboard in browser",
|
||||
description="Shortcut for 'hive serve --open'. "
|
||||
"Starts the HTTP server and opens the dashboard.",
|
||||
)
|
||||
open_parser.add_argument(
|
||||
"--host",
|
||||
type=str,
|
||||
default="127.0.0.1",
|
||||
help="Host to bind (default: 127.0.0.1)",
|
||||
)
|
||||
open_parser.add_argument(
|
||||
"--port",
|
||||
"-p",
|
||||
type=int,
|
||||
default=8787,
|
||||
help="Port to listen on (default: 8787)",
|
||||
)
|
||||
open_parser.add_argument(
|
||||
"--agent",
|
||||
"-a",
|
||||
type=str,
|
||||
action="append",
|
||||
default=[],
|
||||
help="Agent path to preload (repeatable)",
|
||||
)
|
||||
open_parser.add_argument(
|
||||
"--model",
|
||||
"-m",
|
||||
type=str,
|
||||
default=None,
|
||||
help="LLM model for preloaded agents",
|
||||
)
|
||||
open_parser.set_defaults(func=cmd_open)
|
||||
|
||||
|
||||
def _load_resume_state(
|
||||
agent_path: str, session_id: str, checkpoint_id: str | None = None
|
||||
@@ -517,7 +554,7 @@ def cmd_run(args: argparse.Namespace) -> int:
|
||||
return 1
|
||||
elif args.input_file:
|
||||
try:
|
||||
with open(args.input_file) as f:
|
||||
with open(args.input_file, encoding="utf-8") as f:
|
||||
context = json.load(f)
|
||||
except (FileNotFoundError, json.JSONDecodeError) as e:
|
||||
print(f"Error reading input file: {e}", file=sys.stderr)
|
||||
@@ -659,7 +696,7 @@ def cmd_run(args: argparse.Namespace) -> int:
|
||||
|
||||
# Output results
|
||||
if args.output:
|
||||
with open(args.output, "w") as f:
|
||||
with open(args.output, "w", encoding="utf-8") as f:
|
||||
json.dump(output, f, indent=2, default=str)
|
||||
if not args.quiet:
|
||||
print(f"Results written to {args.output}")
|
||||
@@ -1053,62 +1090,19 @@ def _interactive_approval(request):
|
||||
def _format_natural_language_to_json(
|
||||
user_input: str, input_keys: list[str], agent_description: str, session_context: dict = None
|
||||
) -> dict:
|
||||
"""Use Haiku to convert natural language input to JSON based on agent's input schema."""
|
||||
import os
|
||||
"""Convert natural language input to JSON based on agent's input schema.
|
||||
|
||||
import anthropic
|
||||
Maps user input to the primary input field. For follow-up inputs,
|
||||
appends to the existing value.
|
||||
"""
|
||||
main_field = input_keys[0] if input_keys else "objective"
|
||||
|
||||
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
|
||||
|
||||
# Build prompt for Haiku
|
||||
session_info = ""
|
||||
if session_context:
|
||||
# Extract the main field (usually 'objective') that we'll append to
|
||||
main_field = input_keys[0] if input_keys else "objective"
|
||||
existing_value = session_context.get(main_field, "")
|
||||
if existing_value:
|
||||
return {main_field: f"{existing_value}\n\n{user_input}"}
|
||||
|
||||
session_info = (
|
||||
f'\n\nExisting {main_field}: "{existing_value}"\n\n'
|
||||
f"The user is providing ADDITIONAL information. Append this new "
|
||||
f"information to the existing {main_field} to create an enriched, "
|
||||
"more detailed version."
|
||||
)
|
||||
|
||||
prompt = f"""You are formatting user input for an agent that requires specific input fields.
|
||||
|
||||
Agent: {agent_description}
|
||||
|
||||
Required input fields: {", ".join(input_keys)}{session_info}
|
||||
|
||||
User input: {user_input}
|
||||
|
||||
{"If this is a follow-up, APPEND new info to the existing field value." if session_context else ""}
|
||||
|
||||
Output ONLY valid JSON, no explanation:"""
|
||||
|
||||
try:
|
||||
message = client.messages.create(
|
||||
model="claude-haiku-4-5-20251001", # Fast and cheap
|
||||
max_tokens=500,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
)
|
||||
|
||||
json_str = message.content[0].text.strip()
|
||||
# Remove markdown code blocks if present
|
||||
if json_str.startswith("```"):
|
||||
json_str = json_str.split("```")[1]
|
||||
if json_str.startswith("json"):
|
||||
json_str = json_str[4:]
|
||||
json_str = json_str.strip()
|
||||
|
||||
return json.loads(json_str)
|
||||
except Exception:
|
||||
# Fallback: try to infer the main field
|
||||
if len(input_keys) == 1:
|
||||
return {input_keys[0]: user_input}
|
||||
else:
|
||||
# Put it in the first field as fallback
|
||||
return {input_keys[0]: user_input}
|
||||
return {main_field: user_input}
|
||||
|
||||
|
||||
def cmd_shell(args: argparse.Namespace) -> int:
|
||||
@@ -1517,7 +1511,7 @@ def _extract_python_agent_metadata(agent_path: Path) -> tuple[str, str]:
|
||||
return fallback_name, fallback_desc
|
||||
|
||||
try:
|
||||
with open(config_path) as f:
|
||||
with open(config_path, encoding="utf-8") as f:
|
||||
tree = ast.parse(f.read())
|
||||
|
||||
# Find AgentMetadata class definition
|
||||
@@ -1928,14 +1922,27 @@ def cmd_setup_credentials(args: argparse.Namespace) -> int:
|
||||
def _open_browser(url: str) -> None:
|
||||
"""Open URL in the default browser (best-effort, non-blocking)."""
|
||||
import subprocess
|
||||
import sys
|
||||
|
||||
try:
|
||||
if sys.platform == "darwin":
|
||||
subprocess.Popen(["open", url], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
|
||||
subprocess.Popen(
|
||||
["open", url],
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.DEVNULL,
|
||||
encoding="utf-8",
|
||||
)
|
||||
elif sys.platform == "win32":
|
||||
subprocess.Popen(
|
||||
["cmd", "/c", "start", "", url],
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.DEVNULL,
|
||||
)
|
||||
elif sys.platform == "linux":
|
||||
subprocess.Popen(
|
||||
["xdg-open", url], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
|
||||
["xdg-open", url],
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.DEVNULL,
|
||||
encoding="utf-8",
|
||||
)
|
||||
except Exception:
|
||||
pass # Best-effort — don't crash if browser can't open
|
||||
@@ -1980,12 +1987,14 @@ def _build_frontend() -> bool:
|
||||
# Ensure deps are installed
|
||||
subprocess.run(
|
||||
["npm", "install", "--no-fund", "--no-audit"],
|
||||
encoding="utf-8",
|
||||
cwd=frontend_dir,
|
||||
check=True,
|
||||
capture_output=True,
|
||||
)
|
||||
subprocess.run(
|
||||
["npm", "run", "build"],
|
||||
encoding="utf-8",
|
||||
cwd=frontend_dir,
|
||||
check=True,
|
||||
capture_output=True,
|
||||
@@ -2074,3 +2083,9 @@ def cmd_serve(args: argparse.Namespace) -> int:
|
||||
print("\nServer stopped.")
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def cmd_open(args: argparse.Namespace) -> int:
|
||||
"""Start the HTTP API server and open the dashboard in the browser."""
|
||||
args.open = True
|
||||
return cmd_serve(args)
|
||||
|
||||
@@ -7,6 +7,8 @@ Supports both STDIO and HTTP transports using the official MCP Python SDK.
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
import threading
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any, Literal
|
||||
|
||||
@@ -73,6 +75,8 @@ class MCPClient:
|
||||
# Background event loop for persistent STDIO connection
|
||||
self._loop = None
|
||||
self._loop_thread = None
|
||||
# Serialize STDIO tool calls (avoids races, helps on Windows)
|
||||
self._stdio_call_lock = threading.Lock()
|
||||
|
||||
def _run_async(self, coro):
|
||||
"""
|
||||
@@ -156,11 +160,19 @@ class MCPClient:
|
||||
# Create server parameters
|
||||
# Always inherit parent environment and merge with any custom env vars
|
||||
merged_env = {**os.environ, **(self.config.env or {})}
|
||||
# On Windows, passing cwd can cause WinError 267 ("invalid directory name").
|
||||
# tool_registry passes cwd=None and uses absolute script paths when applicable.
|
||||
cwd = self.config.cwd
|
||||
if os.name == "nt" and cwd is not None:
|
||||
# Avoid passing cwd on Windows; tool_registry should have set cwd=None
|
||||
# and absolute script paths for tools-dir servers. If cwd is still set,
|
||||
# pass None to prevent WinError 267 (caller should use absolute paths).
|
||||
cwd = None
|
||||
server_params = StdioServerParameters(
|
||||
command=self.config.command,
|
||||
args=self.config.args,
|
||||
env=merged_env,
|
||||
cwd=self.config.cwd,
|
||||
cwd=cwd,
|
||||
)
|
||||
|
||||
# Store for later use
|
||||
@@ -184,10 +196,12 @@ class MCPClient:
|
||||
from mcp.client.stdio import stdio_client
|
||||
|
||||
# Create persistent stdio client context.
|
||||
# Redirect server stderr to devnull to prevent raw
|
||||
# output from leaking behind the TUI.
|
||||
devnull = open(os.devnull, "w") # noqa: SIM115
|
||||
self._stdio_context = stdio_client(server_params, errlog=devnull)
|
||||
# On Windows, use stderr so subprocess startup errors are visible.
|
||||
if os.name == "nt":
|
||||
errlog = sys.stderr
|
||||
else:
|
||||
errlog = open(os.devnull, "w") # noqa: SIM115
|
||||
self._stdio_context = stdio_client(server_params, errlog=errlog)
|
||||
(
|
||||
self._read_stream,
|
||||
self._write_stream,
|
||||
@@ -353,7 +367,8 @@ class MCPClient:
|
||||
raise ValueError(f"Unknown tool: {tool_name}")
|
||||
|
||||
if self.config.transport == "stdio":
|
||||
return self._run_async(self._call_tool_stdio_async(tool_name, arguments))
|
||||
with self._stdio_call_lock:
|
||||
return self._run_async(self._call_tool_stdio_async(tool_name, arguments))
|
||||
else:
|
||||
return self._call_tool_http(tool_name, arguments)
|
||||
|
||||
@@ -448,11 +463,15 @@ class MCPClient:
|
||||
if self._stdio_context:
|
||||
await self._stdio_context.__aexit__(None, None, None)
|
||||
except asyncio.CancelledError:
|
||||
logger.warning(
|
||||
logger.debug(
|
||||
"STDIO context cleanup was cancelled; proceeding with best-effort shutdown"
|
||||
)
|
||||
except Exception as e:
|
||||
logger.warning(f"Error closing STDIO context: {e}")
|
||||
msg = str(e).lower()
|
||||
if "cancel scope" in msg or "different task" in msg:
|
||||
logger.debug("STDIO context teardown (known anyio quirk): %s", e)
|
||||
else:
|
||||
logger.warning(f"Error closing STDIO context: {e}")
|
||||
finally:
|
||||
self._stdio_context = None
|
||||
|
||||
|
||||
+111
-21
@@ -39,6 +39,7 @@ logger = logging.getLogger(__name__)
|
||||
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
|
||||
CLAUDE_OAUTH_TOKEN_URL = "https://console.anthropic.com/v1/oauth/token"
|
||||
CLAUDE_OAUTH_CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"
|
||||
CLAUDE_KEYCHAIN_SERVICE = "Claude Code-credentials"
|
||||
|
||||
# Buffer in seconds before token expiry to trigger a proactive refresh
|
||||
_TOKEN_REFRESH_BUFFER_SECS = 300 # 5 minutes
|
||||
@@ -51,6 +52,96 @@ CODEX_KEYCHAIN_SERVICE = "Codex Auth"
|
||||
_CODEX_TOKEN_LIFETIME_SECS = 3600 # 1 hour (no explicit expiry field)
|
||||
|
||||
|
||||
def _read_claude_keychain() -> dict | None:
|
||||
"""Read Claude Code credentials from macOS Keychain.
|
||||
|
||||
Returns the parsed JSON dict, or None if not on macOS or entry missing.
|
||||
"""
|
||||
import getpass
|
||||
import platform
|
||||
import subprocess
|
||||
|
||||
if platform.system() != "Darwin":
|
||||
return None
|
||||
|
||||
try:
|
||||
account = getpass.getuser()
|
||||
result = subprocess.run(
|
||||
[
|
||||
"security",
|
||||
"find-generic-password",
|
||||
"-s",
|
||||
CLAUDE_KEYCHAIN_SERVICE,
|
||||
"-a",
|
||||
account,
|
||||
"-w",
|
||||
],
|
||||
capture_output=True,
|
||||
encoding="utf-8",
|
||||
timeout=5,
|
||||
)
|
||||
if result.returncode != 0:
|
||||
return None
|
||||
raw = result.stdout.strip()
|
||||
if not raw:
|
||||
return None
|
||||
return json.loads(raw)
|
||||
except (subprocess.TimeoutExpired, json.JSONDecodeError, OSError) as exc:
|
||||
logger.debug("Claude keychain read failed: %s", exc)
|
||||
return None
|
||||
|
||||
|
||||
def _save_claude_keychain(creds: dict) -> bool:
|
||||
"""Write Claude Code credentials to macOS Keychain. Returns True on success."""
|
||||
import getpass
|
||||
import platform
|
||||
import subprocess
|
||||
|
||||
if platform.system() != "Darwin":
|
||||
return False
|
||||
|
||||
try:
|
||||
account = getpass.getuser()
|
||||
data = json.dumps(creds)
|
||||
result = subprocess.run(
|
||||
[
|
||||
"security",
|
||||
"add-generic-password",
|
||||
"-U",
|
||||
"-s",
|
||||
CLAUDE_KEYCHAIN_SERVICE,
|
||||
"-a",
|
||||
account,
|
||||
"-w",
|
||||
data,
|
||||
],
|
||||
capture_output=True,
|
||||
timeout=5,
|
||||
)
|
||||
return result.returncode == 0
|
||||
except (subprocess.TimeoutExpired, OSError) as exc:
|
||||
logger.debug("Claude keychain write failed: %s", exc)
|
||||
return False
|
||||
|
||||
|
||||
def _read_claude_credentials() -> dict | None:
|
||||
"""Read Claude Code credentials from Keychain (macOS) or file (Linux/Windows)."""
|
||||
# Try macOS Keychain first
|
||||
creds = _read_claude_keychain()
|
||||
if creds:
|
||||
return creds
|
||||
|
||||
# Fall back to file
|
||||
if not CLAUDE_CREDENTIALS_FILE.exists():
|
||||
return None
|
||||
|
||||
try:
|
||||
with open(CLAUDE_CREDENTIALS_FILE, encoding="utf-8") as f:
|
||||
return json.load(f)
|
||||
except (json.JSONDecodeError, OSError):
|
||||
return None
|
||||
|
||||
|
||||
def _refresh_claude_code_token(refresh_token: str) -> dict | None:
|
||||
"""Refresh the Claude Code OAuth token using the refresh token.
|
||||
|
||||
@@ -89,16 +180,14 @@ def _refresh_claude_code_token(refresh_token: str) -> dict | None:
|
||||
|
||||
|
||||
def _save_refreshed_credentials(token_data: dict) -> None:
|
||||
"""Write refreshed token data back to ~/.claude/.credentials.json."""
|
||||
"""Write refreshed token data back to Keychain (macOS) or credentials file."""
|
||||
import time
|
||||
|
||||
if not CLAUDE_CREDENTIALS_FILE.exists():
|
||||
creds = _read_claude_credentials()
|
||||
if not creds:
|
||||
return
|
||||
|
||||
try:
|
||||
with open(CLAUDE_CREDENTIALS_FILE) as f:
|
||||
creds = json.load(f)
|
||||
|
||||
oauth = creds.get("claudeAiOauth", {})
|
||||
oauth["accessToken"] = token_data["access_token"]
|
||||
if "refresh_token" in token_data:
|
||||
@@ -107,9 +196,15 @@ def _save_refreshed_credentials(token_data: dict) -> None:
|
||||
oauth["expiresAt"] = int((time.time() + token_data["expires_in"]) * 1000)
|
||||
creds["claudeAiOauth"] = oauth
|
||||
|
||||
with open(CLAUDE_CREDENTIALS_FILE, "w") as f:
|
||||
json.dump(creds, f, indent=2)
|
||||
logger.debug("Claude Code credentials refreshed successfully")
|
||||
# Try Keychain first (macOS), fall back to file
|
||||
if _save_claude_keychain(creds):
|
||||
logger.debug("Claude Code credentials refreshed in Keychain")
|
||||
return
|
||||
|
||||
if CLAUDE_CREDENTIALS_FILE.exists():
|
||||
with open(CLAUDE_CREDENTIALS_FILE, "w", encoding="utf-8") as f:
|
||||
json.dump(creds, f, indent=2)
|
||||
logger.debug("Claude Code credentials refreshed in file")
|
||||
except (json.JSONDecodeError, OSError, KeyError) as exc:
|
||||
logger.debug("Failed to save refreshed credentials: %s", exc)
|
||||
|
||||
@@ -117,8 +212,8 @@ def _save_refreshed_credentials(token_data: dict) -> None:
|
||||
def get_claude_code_token() -> str | None:
|
||||
"""Get the OAuth token from Claude Code subscription with auto-refresh.
|
||||
|
||||
Reads from ~/.claude/.credentials.json which is created by the
|
||||
Claude Code CLI when users authenticate with their subscription.
|
||||
Reads from macOS Keychain (on Darwin) or ~/.claude/.credentials.json
|
||||
(on Linux/Windows), as created by the Claude Code CLI.
|
||||
|
||||
If the token is expired or close to expiry, attempts an automatic
|
||||
refresh using the stored refresh token.
|
||||
@@ -128,13 +223,8 @@ def get_claude_code_token() -> str | None:
|
||||
"""
|
||||
import time
|
||||
|
||||
if not CLAUDE_CREDENTIALS_FILE.exists():
|
||||
return None
|
||||
|
||||
try:
|
||||
with open(CLAUDE_CREDENTIALS_FILE) as f:
|
||||
creds = json.load(f)
|
||||
except (json.JSONDecodeError, OSError):
|
||||
creds = _read_claude_credentials()
|
||||
if not creds:
|
||||
return None
|
||||
|
||||
oauth = creds.get("claudeAiOauth", {})
|
||||
@@ -212,7 +302,7 @@ def _read_codex_keychain() -> dict | None:
|
||||
"-w",
|
||||
],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding="utf-8",
|
||||
timeout=5,
|
||||
)
|
||||
if result.returncode != 0:
|
||||
@@ -231,7 +321,7 @@ def _read_codex_auth_file() -> dict | None:
|
||||
if not CODEX_AUTH_FILE.exists():
|
||||
return None
|
||||
try:
|
||||
with open(CODEX_AUTH_FILE) as f:
|
||||
with open(CODEX_AUTH_FILE, encoding="utf-8") as f:
|
||||
return json.load(f)
|
||||
except (json.JSONDecodeError, OSError):
|
||||
return None
|
||||
@@ -324,7 +414,7 @@ def _save_refreshed_codex_credentials(auth_data: dict, token_data: dict) -> None
|
||||
|
||||
CODEX_AUTH_FILE.parent.mkdir(parents=True, exist_ok=True, mode=0o700)
|
||||
fd = os.open(CODEX_AUTH_FILE, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
|
||||
with os.fdopen(fd, "w") as f:
|
||||
with os.fdopen(fd, "w", encoding="utf-8") as f:
|
||||
json.dump(auth_data, f, indent=2)
|
||||
logger.debug("Codex credentials refreshed successfully")
|
||||
except (OSError, KeyError) as exc:
|
||||
@@ -869,7 +959,7 @@ class AgentRunner:
|
||||
if not agent_json_path.exists():
|
||||
raise FileNotFoundError(f"No agent.py or agent.json found in {agent_path}")
|
||||
|
||||
with open(agent_json_path) as f:
|
||||
with open(agent_json_path, encoding="utf-8") as f:
|
||||
graph, goal = load_agent_export(f.read())
|
||||
|
||||
return cls(
|
||||
|
||||
@@ -326,6 +326,103 @@ class ToolRegistry:
|
||||
"""Restore execution context to its previous state."""
|
||||
_execution_context.reset(token)
|
||||
|
||||
@staticmethod
|
||||
def resolve_mcp_stdio_config(server_config: dict[str, Any], base_dir: Path) -> dict[str, Any]:
|
||||
"""Resolve cwd and script paths for MCP stdio config (Windows compatibility).
|
||||
|
||||
Use this when building MCPServerConfig from a config file (e.g. in
|
||||
list_agent_tools, discover_mcp_tools) so hive-tools and other servers
|
||||
work on Windows. Call with base_dir = directory containing the config.
|
||||
"""
|
||||
registry = ToolRegistry()
|
||||
return registry._resolve_mcp_server_config(server_config, base_dir)
|
||||
|
||||
def _resolve_mcp_server_config(
|
||||
self, server_config: dict[str, Any], base_dir: Path
|
||||
) -> dict[str, Any]:
|
||||
"""Resolve cwd and script paths for MCP stdio servers (Windows compatibility).
|
||||
|
||||
On Windows, passing cwd to subprocess can cause WinError 267. We use cwd=None
|
||||
and absolute script paths when the server runs a .py script from the tools dir.
|
||||
If the resolved cwd doesn't exist (e.g. config from ~/.hive/agents/), fall back
|
||||
to Path.cwd() / "tools".
|
||||
"""
|
||||
config = dict(server_config)
|
||||
if config.get("transport") != "stdio":
|
||||
return config
|
||||
|
||||
cwd = config.get("cwd")
|
||||
args = list(config.get("args", []))
|
||||
if not cwd and not args:
|
||||
return config
|
||||
|
||||
# Resolve cwd relative to base_dir
|
||||
resolved_cwd: Path | None = None
|
||||
if cwd:
|
||||
if Path(cwd).is_absolute():
|
||||
resolved_cwd = Path(cwd)
|
||||
else:
|
||||
resolved_cwd = (base_dir / cwd).resolve()
|
||||
|
||||
# Find .py script in args (e.g. coder_tools_server.py, files_server.py)
|
||||
script_name = None
|
||||
for i, arg in enumerate(args):
|
||||
if isinstance(arg, str) and arg.endswith(".py"):
|
||||
script_name = arg
|
||||
script_idx = i
|
||||
break
|
||||
|
||||
if resolved_cwd is None:
|
||||
return config
|
||||
|
||||
# If resolved cwd doesn't exist or (when we have a script) doesn't contain it,
|
||||
# try fallback
|
||||
tools_fallback = Path.cwd() / "tools"
|
||||
need_fallback = not resolved_cwd.is_dir()
|
||||
if script_name and not need_fallback:
|
||||
need_fallback = not (resolved_cwd / script_name).exists()
|
||||
if need_fallback:
|
||||
fallback_ok = tools_fallback.is_dir()
|
||||
if script_name:
|
||||
fallback_ok = fallback_ok and (tools_fallback / script_name).exists()
|
||||
else:
|
||||
# No script (e.g. GCU); just need tools dir to exist
|
||||
pass
|
||||
if fallback_ok:
|
||||
resolved_cwd = tools_fallback
|
||||
logger.debug(
|
||||
"MCP server '%s': using fallback tools dir %s",
|
||||
config.get("name", "?"),
|
||||
resolved_cwd,
|
||||
)
|
||||
else:
|
||||
config["cwd"] = str(resolved_cwd)
|
||||
return config
|
||||
|
||||
if not script_name:
|
||||
# No .py script (e.g. GCU uses -m gcu.server); just set cwd
|
||||
config["cwd"] = str(resolved_cwd)
|
||||
return config
|
||||
|
||||
# For coder_tools_server, inject --project-root so writes go to the expected workspace
|
||||
if script_name and "coder_tools" in script_name:
|
||||
project_root = str(resolved_cwd.parent.resolve())
|
||||
args = list(args)
|
||||
if "--project-root" not in args:
|
||||
args.extend(["--project-root", project_root])
|
||||
config["args"] = args
|
||||
|
||||
if os.name == "nt":
|
||||
# Windows: cwd=None avoids WinError 267; use absolute script path
|
||||
config["cwd"] = None
|
||||
abs_script = str((resolved_cwd / script_name).resolve())
|
||||
args = list(config["args"])
|
||||
args[script_idx] = abs_script
|
||||
config["args"] = args
|
||||
else:
|
||||
config["cwd"] = str(resolved_cwd)
|
||||
return config
|
||||
|
||||
def load_mcp_config(self, config_path: Path) -> None:
|
||||
"""
|
||||
Load and register MCP servers from a config file.
|
||||
@@ -340,7 +437,7 @@ class ToolRegistry:
|
||||
self._mcp_config_path = Path(config_path)
|
||||
|
||||
try:
|
||||
with open(config_path) as f:
|
||||
with open(config_path, encoding="utf-8") as f:
|
||||
config = json.load(f)
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to load MCP config from {config_path}: {e}")
|
||||
@@ -357,9 +454,7 @@ class ToolRegistry:
|
||||
server_list = [{"name": name, **cfg} for name, cfg in config.items()]
|
||||
|
||||
for server_config in server_list:
|
||||
cwd = server_config.get("cwd")
|
||||
if cwd and not Path(cwd).is_absolute():
|
||||
server_config["cwd"] = str((base_dir / cwd).resolve())
|
||||
server_config = self._resolve_mcp_server_config(server_config, base_dir)
|
||||
try:
|
||||
self.register_mcp_server(server_config)
|
||||
except Exception as e:
|
||||
@@ -480,6 +575,11 @@ class ToolRegistry:
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to register MCP server: {e}")
|
||||
if "Connection closed" in str(e) and os.name == "nt":
|
||||
logger.debug(
|
||||
"On Windows, check that the MCP subprocess starts (e.g. uv in PATH, "
|
||||
"script path correct). Worker config uses base_dir = mcp_servers.json parent."
|
||||
)
|
||||
return 0
|
||||
|
||||
def _convert_mcp_tool_to_framework_tool(self, mcp_tool: Any) -> Tool:
|
||||
|
||||
@@ -1270,6 +1270,42 @@ class AgentRuntime:
|
||||
"""Get the registration for a specific graph (or None)."""
|
||||
return self._graphs.get(graph_id)
|
||||
|
||||
def cancel_all_tasks(self, loop: asyncio.AbstractEventLoop) -> bool:
|
||||
"""Cancel all running execution tasks across all graphs.
|
||||
|
||||
Schedules the cancellation on *loop* (the agent event loop) so
|
||||
that ``_execution_tasks`` is only read from the thread that owns
|
||||
it, avoiding cross-thread dict access. Safe to call from any
|
||||
thread (e.g. the Textual UI thread).
|
||||
|
||||
Blocks the caller for up to 5 seconds waiting for the result.
|
||||
For async callers, use :meth:`cancel_all_tasks_async` instead.
|
||||
"""
|
||||
future = asyncio.run_coroutine_threadsafe(self.cancel_all_tasks_async(), loop)
|
||||
try:
|
||||
return future.result(timeout=5)
|
||||
except Exception:
|
||||
logger.warning("cancel_all_tasks: timed out or failed")
|
||||
return False
|
||||
|
||||
async def cancel_all_tasks_async(self) -> bool:
|
||||
"""Cancel all running execution tasks (runs on the agent loop).
|
||||
|
||||
Iterates ``_execution_tasks`` and calls ``task.cancel()`` directly.
|
||||
Must be awaited on the agent event loop so dict access is
|
||||
thread-safe. Returns True if at least one task was cancelled.
|
||||
"""
|
||||
cancelled = False
|
||||
for gid in self.list_graphs():
|
||||
reg = self.get_graph_registration(gid)
|
||||
if reg:
|
||||
for stream in reg.streams.values():
|
||||
for task in list(stream._execution_tasks.values()):
|
||||
if task and not task.done():
|
||||
task.cancel()
|
||||
cancelled = True
|
||||
return cancelled
|
||||
|
||||
def _get_primary_session_state(
|
||||
self,
|
||||
exclude_entry_point: str,
|
||||
|
||||
@@ -137,6 +137,9 @@ class EventType(StrEnum):
|
||||
WORKER_LOADED = "worker_loaded"
|
||||
CREDENTIALS_REQUIRED = "credentials_required"
|
||||
|
||||
# Queen mode changes (building ↔ running)
|
||||
QUEEN_MODE_CHANGED = "queen_mode_changed"
|
||||
|
||||
# Subagent reports (one-way progress updates from sub-agents)
|
||||
SUBAGENT_REPORT = "subagent_report"
|
||||
|
||||
@@ -715,15 +718,24 @@ class EventBus:
|
||||
node_id: str,
|
||||
prompt: str = "",
|
||||
execution_id: str | None = None,
|
||||
options: list[str] | None = None,
|
||||
) -> None:
|
||||
"""Emit client input requested event (client_facing=True nodes)."""
|
||||
"""Emit client input requested event (client_facing=True nodes).
|
||||
|
||||
Args:
|
||||
options: Optional predefined choices for the user (1-3 items).
|
||||
The frontend appends an "Other" free-text option automatically.
|
||||
"""
|
||||
data: dict[str, Any] = {"prompt": prompt}
|
||||
if options:
|
||||
data["options"] = options
|
||||
await self.publish(
|
||||
AgentEvent(
|
||||
type=EventType.CLIENT_INPUT_REQUESTED,
|
||||
stream_id=stream_id,
|
||||
node_id=node_id,
|
||||
execution_id=execution_id,
|
||||
data={"prompt": prompt},
|
||||
data=data,
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
@@ -511,9 +511,11 @@ class ExecutionStream:
|
||||
logger.debug(f"Queued execution {execution_id} for stream {self.stream_id}")
|
||||
return execution_id
|
||||
|
||||
# Errors that indicate a fundamental configuration or environment problem.
|
||||
# Resurrecting after these is pointless — the same error will recur.
|
||||
# Errors that indicate resurrection won't help — the same error will recur.
|
||||
# Includes both configuration/environment errors and deterministic node
|
||||
# failures where the conversation/state hasn't changed.
|
||||
_FATAL_ERROR_PATTERNS: tuple[str, ...] = (
|
||||
# Configuration / environment
|
||||
"credential",
|
||||
"authentication",
|
||||
"unauthorized",
|
||||
@@ -525,6 +527,11 @@ class ExecutionStream:
|
||||
"permission denied",
|
||||
"invalid api",
|
||||
"configuration error",
|
||||
# Deterministic node failures — resurrecting at the same node with
|
||||
# the same conversation produces the same result.
|
||||
"node stalled",
|
||||
"ghost empty stream",
|
||||
"max iterations",
|
||||
)
|
||||
|
||||
@classmethod
|
||||
|
||||
@@ -821,5 +821,148 @@ class TestTimerEntryPoints:
|
||||
await runtime.stop()
|
||||
|
||||
|
||||
# === Cancel All Tasks Tests ===
|
||||
|
||||
|
||||
class TestCancelAllTasks:
|
||||
"""Tests for cancel_all_tasks and cancel_all_tasks_async."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cancel_all_tasks_async_returns_false_when_no_tasks(
|
||||
self, sample_graph, sample_goal, temp_storage
|
||||
):
|
||||
"""Test that cancel_all_tasks_async returns False with no running tasks."""
|
||||
runtime = AgentRuntime(
|
||||
graph=sample_graph,
|
||||
goal=sample_goal,
|
||||
storage_path=temp_storage,
|
||||
)
|
||||
|
||||
entry_spec = EntryPointSpec(
|
||||
id="webhook",
|
||||
name="Webhook",
|
||||
entry_node="process-webhook",
|
||||
trigger_type="webhook",
|
||||
)
|
||||
runtime.register_entry_point(entry_spec)
|
||||
await runtime.start()
|
||||
|
||||
try:
|
||||
result = await runtime.cancel_all_tasks_async()
|
||||
assert result is False
|
||||
finally:
|
||||
await runtime.stop()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cancel_all_tasks_async_cancels_running_task(
|
||||
self, sample_graph, sample_goal, temp_storage
|
||||
):
|
||||
"""Test that cancel_all_tasks_async cancels a running task and returns True."""
|
||||
runtime = AgentRuntime(
|
||||
graph=sample_graph,
|
||||
goal=sample_goal,
|
||||
storage_path=temp_storage,
|
||||
)
|
||||
|
||||
entry_spec = EntryPointSpec(
|
||||
id="webhook",
|
||||
name="Webhook",
|
||||
entry_node="process-webhook",
|
||||
trigger_type="webhook",
|
||||
)
|
||||
runtime.register_entry_point(entry_spec)
|
||||
await runtime.start()
|
||||
|
||||
try:
|
||||
# Inject a fake running task into the stream
|
||||
stream = runtime._streams["webhook"]
|
||||
|
||||
async def hang_forever():
|
||||
await asyncio.get_event_loop().create_future()
|
||||
|
||||
fake_task = asyncio.ensure_future(hang_forever())
|
||||
stream._execution_tasks["fake-exec"] = fake_task
|
||||
|
||||
result = await runtime.cancel_all_tasks_async()
|
||||
assert result is True
|
||||
|
||||
# Let the CancelledError propagate
|
||||
try:
|
||||
await fake_task
|
||||
except asyncio.CancelledError:
|
||||
pass
|
||||
assert fake_task.cancelled()
|
||||
|
||||
# Clean up
|
||||
del stream._execution_tasks["fake-exec"]
|
||||
finally:
|
||||
await runtime.stop()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cancel_all_tasks_async_cancels_multiple_tasks_across_streams(
|
||||
self, sample_graph, sample_goal, temp_storage
|
||||
):
|
||||
"""Test that cancel_all_tasks_async cancels tasks across multiple streams."""
|
||||
runtime = AgentRuntime(
|
||||
graph=sample_graph,
|
||||
goal=sample_goal,
|
||||
storage_path=temp_storage,
|
||||
)
|
||||
|
||||
# Register two entry points so we get two streams
|
||||
runtime.register_entry_point(
|
||||
EntryPointSpec(
|
||||
id="stream-a",
|
||||
name="Stream A",
|
||||
entry_node="process-webhook",
|
||||
trigger_type="webhook",
|
||||
)
|
||||
)
|
||||
runtime.register_entry_point(
|
||||
EntryPointSpec(
|
||||
id="stream-b",
|
||||
name="Stream B",
|
||||
entry_node="process-webhook",
|
||||
trigger_type="webhook",
|
||||
)
|
||||
)
|
||||
await runtime.start()
|
||||
|
||||
try:
|
||||
|
||||
async def hang_forever():
|
||||
await asyncio.get_event_loop().create_future()
|
||||
|
||||
stream_a = runtime._streams["stream-a"]
|
||||
stream_b = runtime._streams["stream-b"]
|
||||
|
||||
# Two tasks in stream A, one task in stream B
|
||||
task_a1 = asyncio.ensure_future(hang_forever())
|
||||
task_a2 = asyncio.ensure_future(hang_forever())
|
||||
task_b1 = asyncio.ensure_future(hang_forever())
|
||||
|
||||
stream_a._execution_tasks["exec-a1"] = task_a1
|
||||
stream_a._execution_tasks["exec-a2"] = task_a2
|
||||
stream_b._execution_tasks["exec-b1"] = task_b1
|
||||
|
||||
result = await runtime.cancel_all_tasks_async()
|
||||
assert result is True
|
||||
|
||||
# Let CancelledErrors propagate
|
||||
for task in [task_a1, task_a2, task_b1]:
|
||||
try:
|
||||
await task
|
||||
except asyncio.CancelledError:
|
||||
pass
|
||||
assert task.cancelled()
|
||||
|
||||
# Clean up
|
||||
del stream_a._execution_tasks["exec-a1"]
|
||||
del stream_a._execution_tasks["exec-a2"]
|
||||
del stream_b._execution_tasks["exec-b1"]
|
||||
finally:
|
||||
await runtime.stop()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pytest.main([__file__, "-v"])
|
||||
|
||||
@@ -38,6 +38,7 @@ DEFAULT_EVENT_TYPES = [
|
||||
EventType.WORKER_LOADED,
|
||||
EventType.CREDENTIALS_REQUIRED,
|
||||
EventType.SUBAGENT_REPORT,
|
||||
EventType.QUEEN_MODE_CHANGED,
|
||||
]
|
||||
|
||||
# Keepalive interval in seconds
|
||||
@@ -91,6 +92,7 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
|
||||
"node_loop_started",
|
||||
"credentials_required",
|
||||
"worker_loaded",
|
||||
"queen_mode_changed",
|
||||
}
|
||||
|
||||
client_disconnected = asyncio.Event()
|
||||
@@ -130,6 +132,29 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
|
||||
"SSE connected: session='%s', sub_id='%s', types=%d", session.id, sub_id, len(event_types)
|
||||
)
|
||||
|
||||
# Replay buffered events that were published before this SSE connected.
|
||||
# The EventBus keeps a history ring-buffer; we replay the subset that
|
||||
# produces visible chat messages so the frontend never misses early
|
||||
# queen output. Lifecycle events are NOT replayed to avoid duplicate
|
||||
# state transitions (turn counter increments, etc.).
|
||||
_REPLAY_TYPES = {
|
||||
EventType.CLIENT_OUTPUT_DELTA.value,
|
||||
EventType.EXECUTION_STARTED.value,
|
||||
EventType.CLIENT_INPUT_REQUESTED.value,
|
||||
}
|
||||
event_type_values = {et.value for et in event_types}
|
||||
replay_types = _REPLAY_TYPES & event_type_values
|
||||
replayed = 0
|
||||
for past_event in event_bus._event_history:
|
||||
if past_event.type.value in replay_types:
|
||||
try:
|
||||
queue.put_nowait(past_event.to_dict())
|
||||
replayed += 1
|
||||
except asyncio.QueueFull:
|
||||
break
|
||||
if replayed:
|
||||
logger.info("SSE replayed %d buffered events for session='%s'", replayed, session.id)
|
||||
|
||||
event_count = 0
|
||||
close_reason = "unknown"
|
||||
try:
|
||||
|
||||
@@ -64,6 +64,16 @@ async def handle_trigger(request: web.Request) -> web.Response:
|
||||
session_state=session_state,
|
||||
)
|
||||
|
||||
# Cancel queen's in-progress LLM turn so it picks up the mode change cleanly
|
||||
if session.queen_executor:
|
||||
node = session.queen_executor.node_registry.get("queen")
|
||||
if node and hasattr(node, "cancel_current_turn"):
|
||||
node.cancel_current_turn()
|
||||
|
||||
# Switch queen to running mode (mirrors run_agent_with_input tool behavior)
|
||||
if session.mode_state is not None:
|
||||
await session.mode_state.switch_to_running(source="frontend")
|
||||
|
||||
return web.json_response({"execution_id": execution_id})
|
||||
|
||||
|
||||
@@ -124,6 +134,35 @@ async def handle_chat(request: web.Request) -> web.Response:
|
||||
return web.json_response({"error": "Queen not available"}, status=503)
|
||||
|
||||
|
||||
async def handle_queen_context(request: web.Request) -> web.Response:
|
||||
"""POST /api/sessions/{session_id}/queen-context — queue context for the queen.
|
||||
|
||||
Unlike /chat, this does NOT trigger an LLM response. The message is
|
||||
queued in the queen's injection queue and will be drained on her next
|
||||
natural iteration (prefixed with [External event]:).
|
||||
|
||||
Body: {"message": "..."}
|
||||
"""
|
||||
session, err = resolve_session(request)
|
||||
if err:
|
||||
return err
|
||||
|
||||
body = await request.json()
|
||||
message = body.get("message", "")
|
||||
|
||||
if not message:
|
||||
return web.json_response({"error": "message is required"}, status=400)
|
||||
|
||||
queen_executor = session.queen_executor
|
||||
if queen_executor is not None:
|
||||
node = queen_executor.node_registry.get("queen")
|
||||
if node is not None and hasattr(node, "inject_event"):
|
||||
await node.inject_event(message, is_client_input=False)
|
||||
return web.json_response({"status": "queued", "delivered": True})
|
||||
|
||||
return web.json_response({"error": "Queen not available"}, status=503)
|
||||
|
||||
|
||||
async def handle_worker_input(request: web.Request) -> web.Response:
|
||||
"""POST /api/sessions/{session_id}/worker-input — send input to waiting worker node.
|
||||
|
||||
@@ -249,6 +288,60 @@ async def handle_resume(request: web.Request) -> web.Response:
|
||||
)
|
||||
|
||||
|
||||
async def handle_pause(request: web.Request) -> web.Response:
|
||||
"""POST /api/sessions/{session_id}/pause — pause the worker (queen stays alive).
|
||||
|
||||
Mirrors the queen's stop_worker() tool: cancels all active worker
|
||||
executions, pauses timers so nothing auto-restarts, but does NOT
|
||||
touch the queen so she can observe and react to the pause.
|
||||
"""
|
||||
session, err = resolve_session(request)
|
||||
if err:
|
||||
return err
|
||||
|
||||
if not session.worker_runtime:
|
||||
return web.json_response({"error": "No worker loaded in this session"}, status=503)
|
||||
|
||||
runtime = session.worker_runtime
|
||||
cancelled = []
|
||||
|
||||
for graph_id in runtime.list_graphs():
|
||||
reg = runtime.get_graph_registration(graph_id)
|
||||
if reg is None:
|
||||
continue
|
||||
for _ep_id, stream in reg.streams.items():
|
||||
# Signal shutdown on active nodes to abort in-flight LLM streams
|
||||
for executor in stream._active_executors.values():
|
||||
for node in executor.node_registry.values():
|
||||
if hasattr(node, "signal_shutdown"):
|
||||
node.signal_shutdown()
|
||||
if hasattr(node, "cancel_current_turn"):
|
||||
node.cancel_current_turn()
|
||||
|
||||
for exec_id in list(stream.active_execution_ids):
|
||||
try:
|
||||
ok = await stream.cancel_execution(exec_id)
|
||||
if ok:
|
||||
cancelled.append(exec_id)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Pause timers so the next tick doesn't restart execution
|
||||
runtime.pause_timers()
|
||||
|
||||
# Switch to staging (agent still loaded, ready to re-run)
|
||||
if session.mode_state is not None:
|
||||
await session.mode_state.switch_to_staging(source="frontend")
|
||||
|
||||
return web.json_response(
|
||||
{
|
||||
"stopped": bool(cancelled),
|
||||
"cancelled": cancelled,
|
||||
"timers_paused": True,
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
async def handle_stop(request: web.Request) -> web.Response:
|
||||
"""POST /api/sessions/{session_id}/stop — cancel a running execution.
|
||||
|
||||
@@ -282,6 +375,16 @@ async def handle_stop(request: web.Request) -> web.Response:
|
||||
|
||||
cancelled = await stream.cancel_execution(execution_id)
|
||||
if cancelled:
|
||||
# Cancel queen's in-progress LLM turn
|
||||
if session.queen_executor:
|
||||
node = session.queen_executor.node_registry.get("queen")
|
||||
if node and hasattr(node, "cancel_current_turn"):
|
||||
node.cancel_current_turn()
|
||||
|
||||
# Switch to staging (agent still loaded, ready to re-run)
|
||||
if session.mode_state is not None:
|
||||
await session.mode_state.switch_to_staging(source="frontend")
|
||||
|
||||
return web.json_response(
|
||||
{
|
||||
"stopped": True,
|
||||
@@ -365,8 +468,9 @@ def register_routes(app: web.Application) -> None:
|
||||
app.router.add_post("/api/sessions/{session_id}/trigger", handle_trigger)
|
||||
app.router.add_post("/api/sessions/{session_id}/inject", handle_inject)
|
||||
app.router.add_post("/api/sessions/{session_id}/chat", handle_chat)
|
||||
app.router.add_post("/api/sessions/{session_id}/queen-context", handle_queen_context)
|
||||
app.router.add_post("/api/sessions/{session_id}/worker-input", handle_worker_input)
|
||||
app.router.add_post("/api/sessions/{session_id}/pause", handle_stop)
|
||||
app.router.add_post("/api/sessions/{session_id}/pause", handle_pause)
|
||||
app.router.add_post("/api/sessions/{session_id}/resume", handle_resume)
|
||||
app.router.add_post("/api/sessions/{session_id}/stop", handle_stop)
|
||||
app.router.add_post("/api/sessions/{session_id}/cancel-queen", handle_cancel_queen)
|
||||
|
||||
@@ -48,6 +48,7 @@ def _get_manager(request: web.Request) -> SessionManager:
|
||||
def _session_to_live_dict(session) -> dict:
|
||||
"""Serialize a live Session to the session-primary JSON shape."""
|
||||
info = session.worker_info
|
||||
mode_state = getattr(session, "mode_state", None)
|
||||
return {
|
||||
"session_id": session.id,
|
||||
"worker_id": session.worker_id,
|
||||
@@ -60,6 +61,7 @@ def _session_to_live_dict(session) -> dict:
|
||||
"loaded_at": session.loaded_at,
|
||||
"uptime_seconds": round(time.time() - session.loaded_at, 1),
|
||||
"intro_message": getattr(session.runner, "intro_message", "") or "",
|
||||
"queen_mode": mode_state.mode if mode_state else "building",
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -40,6 +40,8 @@ class Session:
|
||||
runner: Any | None = None # AgentRunner
|
||||
worker_runtime: Any | None = None # AgentRuntime
|
||||
worker_info: Any | None = None # AgentInfo
|
||||
# Queen mode state (building/staging/running)
|
||||
mode_state: Any = None # QueenModeState
|
||||
# Judge (active when worker is loaded)
|
||||
judge_task: asyncio.Task | None = None
|
||||
escalation_sub: str | None = None
|
||||
@@ -425,16 +427,26 @@ class SessionManager:
|
||||
except Exception:
|
||||
logger.warning("Queen: MCP config failed to load", exc_info=True)
|
||||
|
||||
# Mode state for building/running mode switching
|
||||
from framework.tools.queen_lifecycle_tools import (
|
||||
QueenModeState,
|
||||
register_queen_lifecycle_tools,
|
||||
)
|
||||
|
||||
# Start in staging when the caller provided an agent, building otherwise.
|
||||
initial_mode = "staging" if worker_identity else "building"
|
||||
mode_state = QueenModeState(mode=initial_mode, event_bus=session.event_bus)
|
||||
session.mode_state = mode_state
|
||||
|
||||
# Always register lifecycle tools — they check session.worker_runtime
|
||||
# at call time, so they work even if no worker is loaded yet.
|
||||
from framework.tools.queen_lifecycle_tools import register_queen_lifecycle_tools
|
||||
|
||||
register_queen_lifecycle_tools(
|
||||
queen_registry,
|
||||
session=session,
|
||||
session_id=session.id,
|
||||
session_manager=self,
|
||||
manager_session_id=session.id,
|
||||
mode_state=mode_state,
|
||||
)
|
||||
|
||||
# Monitoring tools need concrete worker paths — only register when present
|
||||
@@ -452,6 +464,32 @@ class SessionManager:
|
||||
queen_tools = list(queen_registry.get_tools().values())
|
||||
queen_tool_executor = queen_registry.get_executor()
|
||||
|
||||
# Partition tools into mode-specific sets
|
||||
from framework.agents.hive_coder.nodes import (
|
||||
_QUEEN_BUILDING_TOOLS,
|
||||
_QUEEN_RUNNING_TOOLS,
|
||||
_QUEEN_STAGING_TOOLS,
|
||||
)
|
||||
|
||||
building_names = set(_QUEEN_BUILDING_TOOLS)
|
||||
staging_names = set(_QUEEN_STAGING_TOOLS)
|
||||
running_names = set(_QUEEN_RUNNING_TOOLS)
|
||||
|
||||
registered_names = {t.name for t in queen_tools}
|
||||
missing_building = building_names - registered_names
|
||||
if missing_building:
|
||||
logger.warning(
|
||||
"Queen: %d/%d building tools NOT registered: %s",
|
||||
len(missing_building),
|
||||
len(building_names),
|
||||
sorted(missing_building),
|
||||
)
|
||||
logger.info("Queen: registered tools: %s", sorted(registered_names))
|
||||
|
||||
mode_state.building_tools = [t for t in queen_tools if t.name in building_names]
|
||||
mode_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
|
||||
mode_state.running_tools = [t for t in queen_tools if t.name in running_names]
|
||||
|
||||
# Build queen graph with adjusted prompt + tools
|
||||
_orig_node = _queen_graph.nodes[0]
|
||||
base_prompt = _orig_node.system_prompt or ""
|
||||
@@ -493,12 +531,37 @@ class SessionManager:
|
||||
storage_path=queen_dir,
|
||||
loop_config=queen_graph.loop_config,
|
||||
execution_id=session.id,
|
||||
dynamic_tools_provider=mode_state.get_current_tools,
|
||||
)
|
||||
session.queen_executor = executor
|
||||
|
||||
# Wire inject_notification so mode switches notify the queen LLM
|
||||
async def _inject_mode_notification(content: str) -> None:
|
||||
node = executor.node_registry.get("queen")
|
||||
if node is not None and hasattr(node, "inject_event"):
|
||||
await node.inject_event(content)
|
||||
|
||||
mode_state.inject_notification = _inject_mode_notification
|
||||
|
||||
# Auto-switch to staging when worker execution finishes naturally
|
||||
from framework.runtime.event_bus import EventType as _ET
|
||||
|
||||
async def _on_worker_done(event):
|
||||
if event.stream_id == "queen":
|
||||
return
|
||||
if mode_state.mode == "running":
|
||||
await mode_state.switch_to_staging(source="auto")
|
||||
|
||||
session.event_bus.subscribe(
|
||||
event_types=[_ET.EXECUTION_COMPLETED, _ET.EXECUTION_FAILED],
|
||||
handler=_on_worker_done,
|
||||
)
|
||||
|
||||
logger.info(
|
||||
"Queen starting with %d tools: %s",
|
||||
len(queen_tools),
|
||||
[t.name for t in queen_tools],
|
||||
"Queen starting in %s mode with %d tools: %s",
|
||||
mode_state.mode,
|
||||
len(mode_state.get_current_tools()),
|
||||
[t.name for t in mode_state.get_current_tools()],
|
||||
)
|
||||
result = await executor.execute(
|
||||
graph=queen_graph,
|
||||
|
||||
@@ -74,6 +74,7 @@ class MockStream:
|
||||
is_awaiting_input: bool = False
|
||||
_execution_tasks: dict = field(default_factory=dict)
|
||||
_active_executors: dict = field(default_factory=dict)
|
||||
active_execution_ids: set = field(default_factory=set)
|
||||
|
||||
async def cancel_execution(self, execution_id: str) -> bool:
|
||||
return execution_id in self._execution_tasks
|
||||
@@ -117,6 +118,9 @@ class MockRuntime:
|
||||
async def inject_input(self, node_id, content, graph_id=None, *, is_client_input=False):
|
||||
return True
|
||||
|
||||
def pause_timers(self):
|
||||
pass
|
||||
|
||||
async def get_goal_progress(self):
|
||||
return {"progress": 0.5, "criteria": []}
|
||||
|
||||
@@ -537,18 +541,8 @@ class TestExecution:
|
||||
assert resp.status == 400
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_pause_not_found(self):
|
||||
session = _make_session()
|
||||
app = _make_app_with_session(session)
|
||||
async with TestClient(TestServer(app)) as client:
|
||||
resp = await client.post(
|
||||
"/api/sessions/test_agent/pause",
|
||||
json={"execution_id": "nonexistent"},
|
||||
)
|
||||
assert resp.status == 404
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_pause_missing_execution_id(self):
|
||||
async def test_pause_no_active_executions(self):
|
||||
"""Pause with no active executions returns stopped=False."""
|
||||
session = _make_session()
|
||||
app = _make_app_with_session(session)
|
||||
async with TestClient(TestServer(app)) as client:
|
||||
@@ -556,7 +550,26 @@ class TestExecution:
|
||||
"/api/sessions/test_agent/pause",
|
||||
json={},
|
||||
)
|
||||
assert resp.status == 400
|
||||
assert resp.status == 200
|
||||
data = await resp.json()
|
||||
assert data["stopped"] is False
|
||||
assert data["cancelled"] == []
|
||||
assert data["timers_paused"] is True
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_pause_does_not_cancel_queen(self):
|
||||
"""Pause should stop the worker but leave the queen running."""
|
||||
session = _make_session()
|
||||
app = _make_app_with_session(session)
|
||||
async with TestClient(TestServer(app)) as client:
|
||||
resp = await client.post(
|
||||
"/api/sessions/test_agent/pause",
|
||||
json={},
|
||||
)
|
||||
assert resp.status == 200
|
||||
# Queen's cancel_current_turn should NOT have been called
|
||||
queen_node = session.queen_executor.node_registry["queen"]
|
||||
queen_node.cancel_current_turn.assert_not_called()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_goal_progress(self):
|
||||
|
||||
@@ -270,10 +270,10 @@ def _edit_test_code(code: str) -> str:
|
||||
|
||||
try:
|
||||
# Open editor
|
||||
subprocess.run([editor, temp_path], check=True)
|
||||
subprocess.run([editor, temp_path], check=True, encoding="utf-8")
|
||||
|
||||
# Read edited code
|
||||
with open(temp_path) as f:
|
||||
with open(temp_path, encoding="utf-8") as f:
|
||||
return f.read()
|
||||
except subprocess.CalledProcessError:
|
||||
print("Editor failed, keeping original code")
|
||||
|
||||
@@ -190,6 +190,7 @@ def cmd_test_run(args: argparse.Namespace) -> int:
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
encoding="utf-8",
|
||||
env=env,
|
||||
timeout=600, # 10 minute timeout
|
||||
)
|
||||
@@ -248,6 +249,7 @@ def cmd_test_debug(args: argparse.Namespace) -> int:
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
encoding="utf-8",
|
||||
env=env,
|
||||
timeout=120, # 2 minute timeout for single test
|
||||
)
|
||||
|
||||
@@ -36,7 +36,7 @@ from __future__ import annotations
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
from dataclasses import dataclass
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
@@ -66,6 +66,125 @@ class WorkerSessionAdapter:
|
||||
worker_path: Path | None = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class QueenModeState:
|
||||
"""Mutable state container for queen operating mode.
|
||||
|
||||
Three modes: building → staging → running.
|
||||
Shared between the dynamic_tools_provider callback and tool handlers
|
||||
that trigger mode transitions.
|
||||
"""
|
||||
|
||||
mode: str = "building" # "building", "staging", or "running"
|
||||
building_tools: list = field(default_factory=list) # list[Tool]
|
||||
staging_tools: list = field(default_factory=list) # list[Tool]
|
||||
running_tools: list = field(default_factory=list) # list[Tool]
|
||||
inject_notification: Any = None # async (str) -> None
|
||||
event_bus: Any = None # EventBus — for emitting QUEEN_MODE_CHANGED events
|
||||
|
||||
def get_current_tools(self) -> list:
|
||||
"""Return tools for the current mode."""
|
||||
if self.mode == "running":
|
||||
return list(self.running_tools)
|
||||
if self.mode == "staging":
|
||||
return list(self.staging_tools)
|
||||
return list(self.building_tools)
|
||||
|
||||
async def _emit_mode_event(self) -> None:
|
||||
"""Publish a QUEEN_MODE_CHANGED event so the frontend updates the tag."""
|
||||
if self.event_bus is not None:
|
||||
await self.event_bus.publish(
|
||||
AgentEvent(
|
||||
type=EventType.QUEEN_MODE_CHANGED,
|
||||
stream_id="queen",
|
||||
data={"mode": self.mode},
|
||||
)
|
||||
)
|
||||
|
||||
async def switch_to_running(self, source: str = "tool") -> None:
|
||||
"""Switch to running mode and notify the queen.
|
||||
|
||||
Args:
|
||||
source: Who triggered the switch — "tool" (queen LLM),
|
||||
"frontend" (user clicked Run), or "auto" (system).
|
||||
"""
|
||||
if self.mode == "running":
|
||||
return
|
||||
self.mode = "running"
|
||||
tool_names = [t.name for t in self.running_tools]
|
||||
logger.info("Queen mode → running (source=%s, tools: %s)", source, tool_names)
|
||||
await self._emit_mode_event()
|
||||
if self.inject_notification:
|
||||
if source == "frontend":
|
||||
msg = (
|
||||
"[MODE CHANGE] The user clicked Run in the UI. Switched to RUNNING mode. "
|
||||
"Worker is now executing. You have monitoring/lifecycle tools: "
|
||||
+ ", ".join(tool_names)
|
||||
+ "."
|
||||
)
|
||||
else:
|
||||
msg = (
|
||||
"[MODE CHANGE] Switched to RUNNING mode. "
|
||||
"Worker is executing. You now have monitoring/lifecycle tools: "
|
||||
+ ", ".join(tool_names)
|
||||
+ "."
|
||||
)
|
||||
await self.inject_notification(msg)
|
||||
|
||||
async def switch_to_staging(self, source: str = "tool") -> None:
|
||||
"""Switch to staging mode and notify the queen.
|
||||
|
||||
Args:
|
||||
source: Who triggered the switch — "tool", "frontend", or "auto".
|
||||
"""
|
||||
if self.mode == "staging":
|
||||
return
|
||||
self.mode = "staging"
|
||||
tool_names = [t.name for t in self.staging_tools]
|
||||
logger.info("Queen mode → staging (source=%s, tools: %s)", source, tool_names)
|
||||
await self._emit_mode_event()
|
||||
if self.inject_notification:
|
||||
if source == "frontend":
|
||||
msg = (
|
||||
"[MODE CHANGE] The user stopped the worker from the UI. "
|
||||
"Switched to STAGING mode. Agent is still loaded. "
|
||||
"Available tools: " + ", ".join(tool_names) + "."
|
||||
)
|
||||
elif source == "auto":
|
||||
msg = (
|
||||
"[MODE CHANGE] Worker execution completed. Switched to STAGING mode. "
|
||||
"Agent is still loaded. Call run_agent_with_input(task) to run again. "
|
||||
"Available tools: " + ", ".join(tool_names) + "."
|
||||
)
|
||||
else:
|
||||
msg = (
|
||||
"[MODE CHANGE] Switched to STAGING mode. "
|
||||
"Agent loaded and ready. Call run_agent_with_input(task) to start, "
|
||||
"or stop_worker_and_edit() to go back to building. "
|
||||
"Available tools: " + ", ".join(tool_names) + "."
|
||||
)
|
||||
await self.inject_notification(msg)
|
||||
|
||||
async def switch_to_building(self, source: str = "tool") -> None:
|
||||
"""Switch to building mode and notify the queen.
|
||||
|
||||
Args:
|
||||
source: Who triggered the switch — "tool", "frontend", or "auto".
|
||||
"""
|
||||
if self.mode == "building":
|
||||
return
|
||||
self.mode = "building"
|
||||
tool_names = [t.name for t in self.building_tools]
|
||||
logger.info("Queen mode → building (source=%s, tools: %s)", source, tool_names)
|
||||
await self._emit_mode_event()
|
||||
if self.inject_notification:
|
||||
await self.inject_notification(
|
||||
"[MODE CHANGE] Switched to BUILDING mode. "
|
||||
"Lifecycle tools removed. Full coding tools restored. "
|
||||
"Call load_built_agent(path) when ready to stage."
|
||||
)
|
||||
|
||||
|
||||
def build_worker_profile(runtime: AgentRuntime, agent_path: Path | str | None = None) -> str:
|
||||
"""Build a worker capability profile from its graph/goal definition.
|
||||
|
||||
@@ -120,6 +239,8 @@ def register_queen_lifecycle_tools(
|
||||
# Server context — enables load_built_agent tool
|
||||
session_manager: Any = None,
|
||||
manager_session_id: str | None = None,
|
||||
# Mode switching
|
||||
mode_state: QueenModeState | None = None,
|
||||
) -> int:
|
||||
"""Register queen lifecycle tools.
|
||||
|
||||
@@ -136,6 +257,9 @@ def register_queen_lifecycle_tools(
|
||||
for ``load_built_agent`` to hot-load a worker.
|
||||
manager_session_id: (Server only) The session's ID in the manager,
|
||||
used with ``session_manager.load_worker()``.
|
||||
mode_state: (Optional) Mutable mode state for building/running
|
||||
mode switching. When provided, load_built_agent switches to
|
||||
running mode and stop_worker_and_edit switches to building mode.
|
||||
|
||||
Returns the number of tools registered.
|
||||
"""
|
||||
@@ -343,6 +467,75 @@ def register_queen_lifecycle_tools(
|
||||
registry.register("stop_worker", _stop_tool, lambda inputs: stop_worker())
|
||||
tools_registered += 1
|
||||
|
||||
# --- stop_worker_and_edit -------------------------------------------------
|
||||
|
||||
async def stop_worker_and_edit() -> str:
|
||||
"""Stop the worker and switch to building mode for editing the agent."""
|
||||
stop_result = await stop_worker()
|
||||
|
||||
# Switch to building mode
|
||||
if mode_state is not None:
|
||||
await mode_state.switch_to_building()
|
||||
|
||||
result = json.loads(stop_result)
|
||||
result["mode"] = "building"
|
||||
result["message"] = (
|
||||
"Worker stopped. You are now in building mode. "
|
||||
"Use your coding tools to modify the agent, then call "
|
||||
"load_built_agent(path) to stage it again."
|
||||
)
|
||||
return json.dumps(result)
|
||||
|
||||
_stop_edit_tool = Tool(
|
||||
name="stop_worker_and_edit",
|
||||
description=(
|
||||
"Stop the running worker and switch to building mode. "
|
||||
"Use this when you need to modify the agent's code, nodes, or configuration. "
|
||||
"After editing, call load_built_agent(path) to reload and run."
|
||||
),
|
||||
parameters={"type": "object", "properties": {}},
|
||||
)
|
||||
registry.register(
|
||||
"stop_worker_and_edit", _stop_edit_tool, lambda inputs: stop_worker_and_edit()
|
||||
)
|
||||
tools_registered += 1
|
||||
|
||||
# --- stop_worker (Running → Staging) -------------------------------------
|
||||
|
||||
async def stop_worker_to_staging() -> str:
|
||||
"""Stop the running worker and switch to staging mode.
|
||||
|
||||
After stopping, ask the user whether they want to:
|
||||
1. Re-run the agent with new input → call run_agent_with_input(task)
|
||||
2. Edit the agent code → call stop_worker_and_edit() to go to building mode
|
||||
"""
|
||||
stop_result = await stop_worker()
|
||||
|
||||
# Switch to staging mode
|
||||
if mode_state is not None:
|
||||
await mode_state.switch_to_staging()
|
||||
|
||||
result = json.loads(stop_result)
|
||||
result["mode"] = "staging"
|
||||
result["message"] = (
|
||||
"Worker stopped. You are now in staging mode. "
|
||||
"Ask the user: would they like to re-run with new input, "
|
||||
"or edit the agent code?"
|
||||
)
|
||||
return json.dumps(result)
|
||||
|
||||
_stop_worker_tool = Tool(
|
||||
name="stop_worker",
|
||||
description=(
|
||||
"Stop the running worker and switch to staging mode. "
|
||||
"After stopping, ask the user whether they want to re-run "
|
||||
"with new input or edit the agent code."
|
||||
),
|
||||
parameters={"type": "object", "properties": {}},
|
||||
)
|
||||
registry.register("stop_worker", _stop_worker_tool, lambda inputs: stop_worker_to_staging())
|
||||
tools_registered += 1
|
||||
|
||||
# --- get_worker_status ----------------------------------------------------
|
||||
|
||||
def _get_event_bus():
|
||||
@@ -648,7 +841,7 @@ def register_queen_lifecycle_tools(
|
||||
injectable = stream.get_injectable_nodes()
|
||||
if injectable:
|
||||
target_node_id = injectable[0]["node_id"]
|
||||
ok = await stream.inject_input(target_node_id, content)
|
||||
ok = await stream.inject_input(target_node_id, content, is_client_input=True)
|
||||
if ok:
|
||||
return json.dumps(
|
||||
{
|
||||
@@ -818,11 +1011,24 @@ def register_queen_lifecycle_tools(
|
||||
str(resolved_path),
|
||||
)
|
||||
info = updated_session.worker_info
|
||||
|
||||
# Switch to staging mode after successful load
|
||||
if mode_state is not None:
|
||||
await mode_state.switch_to_staging()
|
||||
|
||||
worker_name = info.name if info else updated_session.worker_id
|
||||
return json.dumps(
|
||||
{
|
||||
"status": "loaded",
|
||||
"mode": "staging",
|
||||
"message": (
|
||||
f"Successfully loaded '{worker_name}'. "
|
||||
"You are now in STAGING mode. "
|
||||
"Call run_agent_with_input(task) to start the worker, "
|
||||
"or stop_worker_and_edit() to go back to building."
|
||||
),
|
||||
"worker_id": updated_session.worker_id,
|
||||
"worker_name": info.name if info else updated_session.worker_id,
|
||||
"worker_name": worker_name,
|
||||
"goal": info.goal_name if info else "",
|
||||
"node_count": info.node_count if info else 0,
|
||||
}
|
||||
@@ -857,5 +1063,125 @@ def register_queen_lifecycle_tools(
|
||||
)
|
||||
tools_registered += 1
|
||||
|
||||
# --- run_agent_with_input ------------------------------------------------
|
||||
|
||||
async def run_agent_with_input(task: str) -> str:
|
||||
"""Run the loaded worker agent with the given task input.
|
||||
|
||||
Performs preflight checks (credentials, MCP resync), triggers the
|
||||
worker's default entry point, and switches to running mode.
|
||||
"""
|
||||
runtime = _get_runtime()
|
||||
if runtime is None:
|
||||
return json.dumps({"error": "No worker loaded in this session."})
|
||||
|
||||
try:
|
||||
# Pre-flight: validate credentials and resync MCP servers.
|
||||
loop = asyncio.get_running_loop()
|
||||
|
||||
async def _preflight():
|
||||
cred_error: CredentialError | None = None
|
||||
try:
|
||||
await loop.run_in_executor(
|
||||
None,
|
||||
lambda: validate_credentials(
|
||||
runtime.graph.nodes,
|
||||
interactive=False,
|
||||
skip=False,
|
||||
),
|
||||
)
|
||||
except CredentialError as e:
|
||||
cred_error = e
|
||||
|
||||
runner = getattr(session, "runner", None)
|
||||
if runner:
|
||||
try:
|
||||
await loop.run_in_executor(
|
||||
None,
|
||||
lambda: runner._tool_registry.resync_mcp_servers_if_needed(),
|
||||
)
|
||||
except Exception as e:
|
||||
logger.warning("MCP resync failed: %s", e)
|
||||
|
||||
if cred_error is not None:
|
||||
raise cred_error
|
||||
|
||||
try:
|
||||
await asyncio.wait_for(_preflight(), timeout=_START_PREFLIGHT_TIMEOUT)
|
||||
except TimeoutError:
|
||||
logger.warning(
|
||||
"run_agent_with_input preflight timed out after %ds — proceeding",
|
||||
_START_PREFLIGHT_TIMEOUT,
|
||||
)
|
||||
except CredentialError:
|
||||
raise # handled below
|
||||
|
||||
# Resume timers in case they were paused by a previous stop
|
||||
runtime.resume_timers()
|
||||
|
||||
# Get session state from any prior execution for memory continuity
|
||||
session_state = runtime._get_primary_session_state("default") or {}
|
||||
|
||||
if session_id:
|
||||
session_state["resume_session_id"] = session_id
|
||||
|
||||
exec_id = await runtime.trigger(
|
||||
entry_point_id="default",
|
||||
input_data={"user_request": task},
|
||||
session_state=session_state,
|
||||
)
|
||||
|
||||
# Switch to running mode
|
||||
if mode_state is not None:
|
||||
await mode_state.switch_to_running()
|
||||
|
||||
return json.dumps(
|
||||
{
|
||||
"status": "started",
|
||||
"mode": "running",
|
||||
"execution_id": exec_id,
|
||||
"task": task,
|
||||
}
|
||||
)
|
||||
except CredentialError as e:
|
||||
error_payload = credential_errors_to_json(e)
|
||||
error_payload["agent_path"] = str(getattr(session, "worker_path", "") or "")
|
||||
|
||||
bus = getattr(session, "event_bus", None)
|
||||
if bus is not None:
|
||||
await bus.publish(
|
||||
AgentEvent(
|
||||
type=EventType.CREDENTIALS_REQUIRED,
|
||||
stream_id="queen",
|
||||
data=error_payload,
|
||||
)
|
||||
)
|
||||
return json.dumps(error_payload)
|
||||
except Exception as e:
|
||||
return json.dumps({"error": f"Failed to start worker: {e}"})
|
||||
|
||||
_run_input_tool = Tool(
|
||||
name="run_agent_with_input",
|
||||
description=(
|
||||
"Run the loaded worker agent with the given task. Validates credentials, "
|
||||
"triggers the worker's default entry point, and switches to running mode. "
|
||||
"Use this after loading an agent (staging mode) to start execution."
|
||||
),
|
||||
parameters={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"task": {
|
||||
"type": "string",
|
||||
"description": "The task or input for the worker agent to execute",
|
||||
},
|
||||
},
|
||||
"required": ["task"],
|
||||
},
|
||||
)
|
||||
registry.register(
|
||||
"run_agent_with_input", _run_input_tool, lambda inputs: run_agent_with_input(**inputs)
|
||||
)
|
||||
tools_registered += 1
|
||||
|
||||
logger.info("Registered %d queen lifecycle tools", tools_registered)
|
||||
return tools_registered
|
||||
|
||||
+43
-37
@@ -256,7 +256,7 @@ class AdenTUI(App):
|
||||
"""Override to use native `open` for file:// URLs on macOS."""
|
||||
if url.startswith("file://") and platform.system() == "Darwin":
|
||||
path = url.removeprefix("file://")
|
||||
subprocess.Popen(["open", path])
|
||||
subprocess.Popen(["open", path], encoding="utf-8")
|
||||
else:
|
||||
super().open_url(url, new_tab=new_tab)
|
||||
|
||||
@@ -475,7 +475,10 @@ class AdenTUI(App):
|
||||
from framework.graph.executor import GraphExecutor
|
||||
from framework.runner.tool_registry import ToolRegistry
|
||||
from framework.runtime.core import Runtime
|
||||
from framework.tools.queen_lifecycle_tools import register_queen_lifecycle_tools
|
||||
from framework.tools.queen_lifecycle_tools import (
|
||||
QueenModeState,
|
||||
register_queen_lifecycle_tools,
|
||||
)
|
||||
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
|
||||
|
||||
log = logging.getLogger("tui.queen")
|
||||
@@ -536,12 +539,16 @@ class AdenTUI(App):
|
||||
except Exception:
|
||||
log.warning("Queen: MCP config failed to load", exc_info=True)
|
||||
|
||||
# Worker is already loaded in TUI path → start in staging mode.
|
||||
mode_state = QueenModeState(mode="staging", event_bus=event_bus)
|
||||
|
||||
register_queen_lifecycle_tools(
|
||||
queen_registry,
|
||||
worker_runtime=self.runtime,
|
||||
event_bus=event_bus,
|
||||
storage_path=storage_path,
|
||||
session_id=session_id,
|
||||
mode_state=mode_state,
|
||||
)
|
||||
register_worker_monitoring_tools(
|
||||
queen_registry,
|
||||
@@ -553,6 +560,20 @@ class AdenTUI(App):
|
||||
queen_tools = list(queen_registry.get_tools().values())
|
||||
queen_tool_executor = queen_registry.get_executor()
|
||||
|
||||
# Partition tools into mode-specific sets
|
||||
from framework.agents.hive_coder.nodes import (
|
||||
_QUEEN_BUILDING_TOOLS,
|
||||
_QUEEN_RUNNING_TOOLS,
|
||||
_QUEEN_STAGING_TOOLS,
|
||||
)
|
||||
|
||||
building_names = set(_QUEEN_BUILDING_TOOLS)
|
||||
staging_names = set(_QUEEN_STAGING_TOOLS)
|
||||
running_names = set(_QUEEN_RUNNING_TOOLS)
|
||||
mode_state.building_tools = [t for t in queen_tools if t.name in building_names]
|
||||
mode_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
|
||||
mode_state.running_tools = [t for t in queen_tools if t.name in running_names]
|
||||
|
||||
# Build worker profile for queen's system prompt.
|
||||
from framework.tools.queen_lifecycle_tools import build_worker_profile
|
||||
|
||||
@@ -593,12 +614,23 @@ class AdenTUI(App):
|
||||
stream_id="queen",
|
||||
storage_path=queen_dir,
|
||||
loop_config=queen_graph.loop_config,
|
||||
dynamic_tools_provider=mode_state.get_current_tools,
|
||||
)
|
||||
self._queen_executor = executor
|
||||
|
||||
# Wire inject_notification so mode switches notify the queen LLM
|
||||
async def _inject_mode_notification(content: str) -> None:
|
||||
node = executor.node_registry.get("queen")
|
||||
if node is not None and hasattr(node, "inject_event"):
|
||||
await node.inject_event(content)
|
||||
|
||||
mode_state.inject_notification = _inject_mode_notification
|
||||
|
||||
log.info(
|
||||
"Queen starting with %d tools: %s",
|
||||
len(queen_tools),
|
||||
[t.name for t in queen_tools],
|
||||
"Queen starting in %s mode with %d tools: %s",
|
||||
mode_state.mode,
|
||||
len(mode_state.get_current_tools()),
|
||||
[t.name for t in mode_state.get_current_tools()],
|
||||
)
|
||||
# The queen's event_loop node runs forever (continuous mode).
|
||||
# It blocks on _await_user_input() after each LLM turn,
|
||||
@@ -1611,46 +1643,20 @@ class AdenTUI(App):
|
||||
self.notify(f"Logs {mode}", severity="information", timeout=2)
|
||||
|
||||
def action_pause_execution(self) -> None:
|
||||
"""Immediately pause execution by cancelling task (bound to Ctrl+Z)."""
|
||||
"""Immediately pause execution by cancelling all running tasks (bound to Ctrl+Z)."""
|
||||
if self.chat_repl is None or self.runtime is None:
|
||||
return
|
||||
try:
|
||||
if not self.chat_repl._current_exec_id:
|
||||
if self.runtime.cancel_all_tasks(self.chat_repl._agent_loop):
|
||||
self.chat_repl._current_exec_id = None
|
||||
self.notify(
|
||||
"No active execution to pause",
|
||||
"All executions stopped",
|
||||
severity="information",
|
||||
timeout=3,
|
||||
)
|
||||
return
|
||||
|
||||
task_cancelled = False
|
||||
all_streams = []
|
||||
active_reg = self.runtime.get_graph_registration(self.runtime.active_graph_id)
|
||||
if active_reg:
|
||||
all_streams.extend(active_reg.streams.values())
|
||||
for gid in self.runtime.list_graphs():
|
||||
if gid == self.runtime.active_graph_id:
|
||||
continue
|
||||
reg = self.runtime.get_graph_registration(gid)
|
||||
if reg:
|
||||
all_streams.extend(reg.streams.values())
|
||||
|
||||
for stream in all_streams:
|
||||
exec_id = self.chat_repl._current_exec_id
|
||||
task = stream._execution_tasks.get(exec_id)
|
||||
if task and not task.done():
|
||||
task.cancel()
|
||||
task_cancelled = True
|
||||
self.notify(
|
||||
"Execution paused - state saved",
|
||||
severity="information",
|
||||
timeout=3,
|
||||
)
|
||||
break
|
||||
|
||||
if not task_cancelled:
|
||||
else:
|
||||
self.notify(
|
||||
"Execution already completed",
|
||||
"No active executions",
|
||||
severity="information",
|
||||
timeout=2,
|
||||
)
|
||||
|
||||
@@ -488,7 +488,7 @@ class ChatRepl(Vertical):
|
||||
if not state_file.exists():
|
||||
continue
|
||||
|
||||
with open(state_file) as f:
|
||||
with open(state_file, encoding="utf-8") as f:
|
||||
state = json.load(f)
|
||||
|
||||
status = state.get("status", "").lower()
|
||||
@@ -547,7 +547,7 @@ class ChatRepl(Vertical):
|
||||
|
||||
# Read session state
|
||||
try:
|
||||
with open(state_file) as f:
|
||||
with open(state_file, encoding="utf-8") as f:
|
||||
state = json.load(f)
|
||||
|
||||
# Track this session for /resume <number> lookup
|
||||
@@ -599,7 +599,7 @@ class ChatRepl(Vertical):
|
||||
try:
|
||||
import json
|
||||
|
||||
with open(state_file) as f:
|
||||
with open(state_file, encoding="utf-8") as f:
|
||||
state = json.load(f)
|
||||
|
||||
# Basic info
|
||||
@@ -640,7 +640,7 @@ class ChatRepl(Vertical):
|
||||
# Load and show checkpoints
|
||||
for i, cp_file in enumerate(checkpoint_files[-5:], 1): # Last 5
|
||||
try:
|
||||
with open(cp_file) as f:
|
||||
with open(cp_file, encoding="utf-8") as f:
|
||||
cp_data = json.load(f)
|
||||
|
||||
cp_id = cp_data.get("checkpoint_id", cp_file.stem)
|
||||
@@ -687,7 +687,7 @@ class ChatRepl(Vertical):
|
||||
|
||||
import json
|
||||
|
||||
with open(state_file) as f:
|
||||
with open(state_file, encoding="utf-8") as f:
|
||||
state = json.load(f)
|
||||
|
||||
# Resume from session state (not checkpoint)
|
||||
@@ -868,27 +868,17 @@ class ChatRepl(Vertical):
|
||||
self._write_history(f"[dim]{traceback.format_exc()}[/dim]")
|
||||
|
||||
async def _cmd_pause(self) -> None:
|
||||
"""Immediately pause execution by cancelling task (same as Ctrl+Z)."""
|
||||
# Check if there's a current execution
|
||||
if not self._current_exec_id:
|
||||
self._write_history("[bold yellow]No active execution to pause[/bold yellow]")
|
||||
self._write_history(" Start an execution first, then use /pause during execution")
|
||||
return
|
||||
|
||||
# Find and cancel the execution task - executor will catch and save state
|
||||
task_cancelled = False
|
||||
for stream in self.runtime._streams.values():
|
||||
exec_id = self._current_exec_id
|
||||
task = stream._execution_tasks.get(exec_id)
|
||||
if task and not task.done():
|
||||
task.cancel()
|
||||
task_cancelled = True
|
||||
self._write_history("[bold green]⏸ Execution paused - state saved[/bold green]")
|
||||
self._write_history(" Resume later with: [bold]/resume[/bold]")
|
||||
break
|
||||
|
||||
if not task_cancelled:
|
||||
self._write_history("[bold yellow]Execution already completed[/bold yellow]")
|
||||
"""Immediately pause execution by cancelling all running tasks (same as Ctrl+Z)."""
|
||||
future = asyncio.run_coroutine_threadsafe(
|
||||
self.runtime.cancel_all_tasks_async(), self._agent_loop
|
||||
)
|
||||
result = await asyncio.wrap_future(future)
|
||||
if result:
|
||||
self._current_exec_id = None
|
||||
self._write_history("[bold green]⏸ All executions stopped[/bold green]")
|
||||
self._write_history(" Resume later with: [bold]/resume[/bold]")
|
||||
else:
|
||||
self._write_history("[bold yellow]No active executions[/bold yellow]")
|
||||
|
||||
async def _cmd_coder(self, reason: str = "") -> None:
|
||||
"""User-initiated escalation to Hive Coder."""
|
||||
@@ -1112,7 +1102,7 @@ class ChatRepl(Vertical):
|
||||
continue
|
||||
|
||||
try:
|
||||
with open(state_file) as f:
|
||||
with open(state_file, encoding="utf-8") as f:
|
||||
state = json.load(f)
|
||||
|
||||
status = state.get("status", "").lower()
|
||||
|
||||
@@ -38,6 +38,7 @@ def _linux_file_dialog() -> subprocess.CompletedProcess | None:
|
||||
"--title=Select a PDF file",
|
||||
"--file-filter=PDF files (*.pdf)|*.pdf",
|
||||
],
|
||||
encoding="utf-8",
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=300,
|
||||
@@ -54,6 +55,7 @@ def _linux_file_dialog() -> subprocess.CompletedProcess | None:
|
||||
".",
|
||||
"PDF files (*.pdf)",
|
||||
],
|
||||
encoding="utf-8",
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=300,
|
||||
@@ -79,6 +81,7 @@ def _pick_pdf_subprocess() -> Path | None:
|
||||
'POSIX path of (choose file of type {"com.adobe.pdf"} '
|
||||
'with prompt "Select a PDF file")',
|
||||
],
|
||||
encoding="utf-8",
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=300,
|
||||
@@ -93,6 +96,7 @@ def _pick_pdf_subprocess() -> Path | None:
|
||||
)
|
||||
result = subprocess.run(
|
||||
["powershell", "-NoProfile", "-Command", ps_script],
|
||||
encoding="utf-8",
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=300,
|
||||
|
||||
@@ -199,10 +199,11 @@ def _copy_to_clipboard(text: str) -> None:
|
||||
"""Copy text to system clipboard using platform-native tools."""
|
||||
try:
|
||||
if sys.platform == "darwin":
|
||||
subprocess.run(["pbcopy"], input=text.encode(), check=True, timeout=5)
|
||||
subprocess.run(["pbcopy"], encoding="utf-8", input=text.encode(), check=True, timeout=5)
|
||||
elif sys.platform == "win32":
|
||||
subprocess.run(
|
||||
["clip.exe"],
|
||||
encoding="utf-8",
|
||||
input=text.encode("utf-16le"),
|
||||
check=True,
|
||||
timeout=5,
|
||||
@@ -211,6 +212,7 @@ def _copy_to_clipboard(text: str) -> None:
|
||||
try:
|
||||
subprocess.run(
|
||||
["xclip", "-selection", "clipboard"],
|
||||
encoding="utf-8",
|
||||
input=text.encode(),
|
||||
check=True,
|
||||
timeout=5,
|
||||
@@ -218,6 +220,7 @@ def _copy_to_clipboard(text: str) -> None:
|
||||
except (subprocess.SubprocessError, FileNotFoundError):
|
||||
subprocess.run(
|
||||
["xsel", "--clipboard", "--input"],
|
||||
encoding="utf-8",
|
||||
input=text.encode(),
|
||||
check=True,
|
||||
timeout=5,
|
||||
|
||||
@@ -37,6 +37,10 @@ export const executionApi = {
|
||||
chat: (sessionId: string, message: string) =>
|
||||
api.post<ChatResult>(`/sessions/${sessionId}/chat`, { message }),
|
||||
|
||||
/** Queue context for the queen without triggering an LLM response. */
|
||||
queenContext: (sessionId: string, message: string) =>
|
||||
api.post<ChatResult>(`/sessions/${sessionId}/queen-context`, { message }),
|
||||
|
||||
workerInput: (sessionId: string, message: string) =>
|
||||
api.post<ChatResult>(`/sessions/${sessionId}/worker-input`, { message }),
|
||||
|
||||
|
||||
@@ -12,6 +12,8 @@ export interface LiveSession {
|
||||
loaded_at: number;
|
||||
uptime_seconds: number;
|
||||
intro_message?: string;
|
||||
/** Queen operating mode — "building", "staging", or "running" */
|
||||
queen_mode?: "building" | "staging" | "running";
|
||||
/** Present in 409 conflict responses when worker is still loading */
|
||||
loading?: boolean;
|
||||
}
|
||||
@@ -271,6 +273,7 @@ export type EventTypeName =
|
||||
| "escalation_requested"
|
||||
| "worker_loaded"
|
||||
| "credentials_required"
|
||||
| "queen_mode_changed"
|
||||
| "subagent_report";
|
||||
|
||||
export interface AgentEvent {
|
||||
|
||||
@@ -31,6 +31,7 @@ interface AgentGraphProps {
|
||||
version?: string;
|
||||
runState?: RunState;
|
||||
building?: boolean;
|
||||
queenMode?: "building" | "staging" | "running";
|
||||
}
|
||||
|
||||
// --- Extracted RunButton so hover state survives parent re-renders ---
|
||||
@@ -145,7 +146,7 @@ function truncateLabel(label: string, availablePx: number, fontSize: number): st
|
||||
return label.slice(0, Math.max(maxChars - 1, 1)) + "\u2026";
|
||||
}
|
||||
|
||||
export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, onPause, version, runState: externalRunState, building }: AgentGraphProps) {
|
||||
export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, onPause, version, runState: externalRunState, building, queenMode }: AgentGraphProps) {
|
||||
const [localRunState, setLocalRunState] = useState<RunState>("idle");
|
||||
const runState = externalRunState ?? localRunState;
|
||||
const runBtnRef = useRef<HTMLButtonElement>(null);
|
||||
@@ -277,7 +278,7 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
|
||||
</span>
|
||||
)}
|
||||
</div>
|
||||
<RunButton runState={runState} disabled={nodes.length === 0} onRun={handleRun} onPause={onPause ?? (() => {})} btnRef={runBtnRef} />
|
||||
<RunButton runState={runState} disabled={nodes.length === 0 || queenMode === "building"} onRun={handleRun} onPause={onPause ?? (() => {})} btnRef={runBtnRef} />
|
||||
</div>
|
||||
<div className="flex-1 flex items-center justify-center px-5">
|
||||
{building ? (
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
import { memo, useState, useRef, useEffect } from "react";
|
||||
import { Send, Square, Crown, Cpu, Check, Loader2, Reply } from "lucide-react";
|
||||
import { Send, Square, Crown, Cpu, Check, Loader2 } from "lucide-react";
|
||||
import MarkdownContent from "@/components/MarkdownContent";
|
||||
import QuestionWidget from "@/components/QuestionWidget";
|
||||
|
||||
export interface ChatMessage {
|
||||
id: string;
|
||||
@@ -20,15 +21,25 @@ interface ChatPanelProps {
|
||||
messages: ChatMessage[];
|
||||
onSend: (message: string, thread: string) => void;
|
||||
isWaiting?: boolean;
|
||||
/** When true a worker is thinking (not yet streaming) */
|
||||
isWorkerWaiting?: boolean;
|
||||
/** When true the queen is busy (typing or streaming) — shows the stop button */
|
||||
isBusy?: boolean;
|
||||
activeThread: string;
|
||||
/** When true, the worker is waiting for user input — shows inline reply box */
|
||||
workerAwaitingInput?: boolean;
|
||||
/** When true, the input is disabled (e.g. during loading) */
|
||||
disabled?: boolean;
|
||||
/** Called when user clicks the stop button to cancel the queen's current turn */
|
||||
onCancel?: () => void;
|
||||
/** Called when user submits a reply to the worker's input request */
|
||||
onWorkerReply?: (message: string) => void;
|
||||
/** Pending question from ask_user — replaces textarea when present */
|
||||
pendingQuestion?: string | null;
|
||||
/** Options for the pending question */
|
||||
pendingOptions?: string[] | null;
|
||||
/** Called when user submits an answer to the pending question */
|
||||
onQuestionSubmit?: (answer: string, isOther: boolean) => void;
|
||||
/** Called when user dismisses the pending question without answering */
|
||||
onQuestionDismiss?: () => void;
|
||||
/** Queen operating mode — shown as a tag on queen messages */
|
||||
queenMode?: "building" | "staging" | "running";
|
||||
}
|
||||
|
||||
const queenColor = "hsl(45,95%,58%)";
|
||||
@@ -133,76 +144,7 @@ function ToolActivityRow({ content }: { content: string }) {
|
||||
);
|
||||
}
|
||||
|
||||
/** Inline reply box that appears below a worker's input request in the chat thread. */
|
||||
function WorkerInputReply({ onSubmit, disabled }: { onSubmit: (text: string) => void; disabled?: boolean }) {
|
||||
const [value, setValue] = useState("");
|
||||
const [sent, setSent] = useState(false);
|
||||
const inputRef = useRef<HTMLTextAreaElement>(null);
|
||||
|
||||
useEffect(() => {
|
||||
if (!disabled && !sent) inputRef.current?.focus();
|
||||
}, [disabled, sent]);
|
||||
|
||||
const handleSubmit = (e: React.FormEvent) => {
|
||||
e.preventDefault();
|
||||
if (!value.trim() || sent) return;
|
||||
onSubmit(value.trim());
|
||||
setSent(true);
|
||||
};
|
||||
|
||||
if (sent) {
|
||||
return (
|
||||
<div className="ml-10 flex items-center gap-1.5 text-[11px] text-muted-foreground py-1">
|
||||
<Check className="w-3 h-3 text-emerald-500" />
|
||||
<span>Response sent</span>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
return (
|
||||
<form onSubmit={handleSubmit} className="ml-10 mt-1">
|
||||
<div
|
||||
className="flex items-center gap-2 rounded-xl px-3 py-2 border transition-colors"
|
||||
style={{
|
||||
backgroundColor: `${workerColor}08`,
|
||||
borderColor: `${workerColor}30`,
|
||||
}}
|
||||
>
|
||||
<Reply className="w-3.5 h-3.5 flex-shrink-0" style={{ color: workerColor }} />
|
||||
<textarea
|
||||
ref={inputRef}
|
||||
rows={1}
|
||||
value={value}
|
||||
onChange={(e) => {
|
||||
setValue(e.target.value);
|
||||
const ta = e.target;
|
||||
ta.style.height = "auto";
|
||||
ta.style.height = `${Math.min(ta.scrollHeight, 120)}px`;
|
||||
}}
|
||||
onKeyDown={(e) => {
|
||||
if (e.key === "Enter" && !e.shiftKey) {
|
||||
e.preventDefault();
|
||||
handleSubmit(e);
|
||||
}
|
||||
}}
|
||||
placeholder="Reply to worker..."
|
||||
disabled={disabled}
|
||||
className="flex-1 bg-transparent text-sm text-foreground outline-none placeholder:text-muted-foreground disabled:opacity-50 resize-none overflow-y-auto"
|
||||
/>
|
||||
<button
|
||||
type="submit"
|
||||
disabled={!value.trim() || disabled}
|
||||
className="p-1.5 rounded-lg transition-opacity disabled:opacity-30 hover:opacity-90"
|
||||
style={{ backgroundColor: workerColor, color: "white" }}
|
||||
>
|
||||
<Send className="w-3.5 h-3.5" />
|
||||
</button>
|
||||
</div>
|
||||
</form>
|
||||
);
|
||||
}
|
||||
|
||||
const MessageBubble = memo(function MessageBubble({ msg }: { msg: ChatMessage }) {
|
||||
const MessageBubble = memo(function MessageBubble({ msg, queenMode }: { msg: ChatMessage; queenMode?: "building" | "staging" | "running" }) {
|
||||
const isUser = msg.type === "user";
|
||||
const isQueen = msg.role === "queen";
|
||||
const color = getColor(msg.agent, msg.role);
|
||||
@@ -257,7 +199,13 @@ const MessageBubble = memo(function MessageBubble({ msg }: { msg: ChatMessage })
|
||||
isQueen ? "bg-primary/15 text-primary" : "bg-muted text-muted-foreground"
|
||||
}`}
|
||||
>
|
||||
{isQueen ? "Queen" : "Worker"}
|
||||
{isQueen
|
||||
? queenMode === "running"
|
||||
? "running mode"
|
||||
: queenMode === "staging"
|
||||
? "staging mode"
|
||||
: "building mode"
|
||||
: "Worker"}
|
||||
</span>
|
||||
</div>
|
||||
<div
|
||||
@@ -270,12 +218,14 @@ const MessageBubble = memo(function MessageBubble({ msg }: { msg: ChatMessage })
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}, (prev, next) => prev.msg.id === next.msg.id && prev.msg.content === next.msg.content);
|
||||
}, (prev, next) => prev.msg.id === next.msg.id && prev.msg.content === next.msg.content && prev.queenMode === next.queenMode);
|
||||
|
||||
export default function ChatPanel({ messages, onSend, isWaiting, activeThread, workerAwaitingInput, disabled, onCancel, onWorkerReply }: ChatPanelProps) {
|
||||
export default function ChatPanel({ messages, onSend, isWaiting, isWorkerWaiting, isBusy, activeThread, disabled, onCancel, pendingQuestion, pendingOptions, onQuestionSubmit, onQuestionDismiss, queenMode }: ChatPanelProps) {
|
||||
const [input, setInput] = useState("");
|
||||
const [readMap, setReadMap] = useState<Record<string, number>>({});
|
||||
const bottomRef = useRef<HTMLDivElement>(null);
|
||||
const scrollRef = useRef<HTMLDivElement>(null);
|
||||
const stickToBottom = useRef(true);
|
||||
const textareaRef = useRef<HTMLTextAreaElement>(null);
|
||||
|
||||
const threadMessages = messages.filter((m) => {
|
||||
@@ -292,10 +242,24 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, w
|
||||
// Suppress unused var
|
||||
void readMap;
|
||||
|
||||
const lastMsg = threadMessages[threadMessages.length - 1];
|
||||
// Autoscroll: only when user is already near the bottom
|
||||
const handleScroll = () => {
|
||||
const el = scrollRef.current;
|
||||
if (!el) return;
|
||||
const distFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight;
|
||||
stickToBottom.current = distFromBottom < 80;
|
||||
};
|
||||
|
||||
useEffect(() => {
|
||||
bottomRef.current?.scrollIntoView({ behavior: "smooth" });
|
||||
}, [threadMessages.length, lastMsg?.content, workerAwaitingInput]);
|
||||
if (stickToBottom.current) {
|
||||
bottomRef.current?.scrollIntoView({ behavior: "smooth" });
|
||||
}
|
||||
}, [threadMessages, pendingQuestion, isWaiting, isWorkerWaiting]);
|
||||
|
||||
// Always start pinned to bottom when switching threads
|
||||
useEffect(() => {
|
||||
stickToBottom.current = true;
|
||||
}, [activeThread]);
|
||||
|
||||
const handleSubmit = (e: React.FormEvent) => {
|
||||
e.preventDefault();
|
||||
@@ -305,17 +269,6 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, w
|
||||
if (textareaRef.current) textareaRef.current.style.height = "auto";
|
||||
};
|
||||
|
||||
// Find the last worker message to attach the inline reply box below.
|
||||
// For explicit ask_user, this will be the worker_input_request message.
|
||||
// For auto-block, this will be the last client_output_delta streamed message.
|
||||
const lastWorkerMsgIdx = workerAwaitingInput
|
||||
? threadMessages.reduce(
|
||||
(last, m, i) =>
|
||||
m.role === "worker" && m.type !== "tool_status" && m.type !== "system" ? i : last,
|
||||
-1,
|
||||
)
|
||||
: -1;
|
||||
|
||||
return (
|
||||
<div className="flex flex-col h-full min-w-0">
|
||||
{/* Compact sub-header */}
|
||||
@@ -324,20 +277,44 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, w
|
||||
</div>
|
||||
|
||||
{/* Messages */}
|
||||
<div className="flex-1 overflow-auto px-5 py-4 space-y-3">
|
||||
{threadMessages.map((msg, idx) => (
|
||||
<div ref={scrollRef} onScroll={handleScroll} className="flex-1 overflow-auto px-5 py-4 space-y-3">
|
||||
{threadMessages.map((msg) => (
|
||||
<div key={msg.id}>
|
||||
<MessageBubble msg={msg} />
|
||||
{idx === lastWorkerMsgIdx && onWorkerReply && (
|
||||
<WorkerInputReply onSubmit={onWorkerReply} />
|
||||
)}
|
||||
<MessageBubble msg={msg} queenMode={queenMode} />
|
||||
</div>
|
||||
))}
|
||||
|
||||
{isWaiting && (
|
||||
<div className="flex gap-3">
|
||||
<div className="w-7 h-7 rounded-xl bg-muted flex items-center justify-center">
|
||||
<Cpu className="w-3.5 h-3.5 text-muted-foreground" />
|
||||
<div
|
||||
className="flex-shrink-0 w-9 h-9 rounded-xl flex items-center justify-center"
|
||||
style={{
|
||||
backgroundColor: `${queenColor}18`,
|
||||
border: `1.5px solid ${queenColor}35`,
|
||||
boxShadow: `0 0 12px ${queenColor}20`,
|
||||
}}
|
||||
>
|
||||
<Crown className="w-4 h-4" style={{ color: queenColor }} />
|
||||
</div>
|
||||
<div className="border border-primary/20 bg-primary/5 rounded-2xl rounded-tl-md px-4 py-3">
|
||||
<div className="flex gap-1.5">
|
||||
<span className="w-1.5 h-1.5 rounded-full bg-muted-foreground animate-bounce" style={{ animationDelay: "0ms" }} />
|
||||
<span className="w-1.5 h-1.5 rounded-full bg-muted-foreground animate-bounce" style={{ animationDelay: "150ms" }} />
|
||||
<span className="w-1.5 h-1.5 rounded-full bg-muted-foreground animate-bounce" style={{ animationDelay: "300ms" }} />
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
)}
|
||||
{isWorkerWaiting && !isWaiting && (
|
||||
<div className="flex gap-3">
|
||||
<div
|
||||
className="flex-shrink-0 w-7 h-7 rounded-xl flex items-center justify-center"
|
||||
style={{
|
||||
backgroundColor: `${workerColor}18`,
|
||||
border: `1.5px solid ${workerColor}35`,
|
||||
}}
|
||||
>
|
||||
<Cpu className="w-3.5 h-3.5" style={{ color: workerColor }} />
|
||||
</div>
|
||||
<div className="bg-muted/60 rounded-2xl rounded-tl-md px-4 py-3">
|
||||
<div className="flex gap-1.5">
|
||||
@@ -351,48 +328,57 @@ export default function ChatPanel({ messages, onSend, isWaiting, activeThread, w
|
||||
<div ref={bottomRef} />
|
||||
</div>
|
||||
|
||||
{/* Input — always connected to Queen */}
|
||||
<form onSubmit={handleSubmit} className="p-4 border-t border-border">
|
||||
<div className="flex items-center gap-3 bg-muted/40 rounded-xl px-4 py-2.5 border border-border focus-within:border-primary/40 transition-colors">
|
||||
<textarea
|
||||
ref={textareaRef}
|
||||
rows={1}
|
||||
value={input}
|
||||
onChange={(e) => {
|
||||
setInput(e.target.value);
|
||||
const ta = e.target;
|
||||
ta.style.height = "auto";
|
||||
ta.style.height = `${Math.min(ta.scrollHeight, 160)}px`;
|
||||
}}
|
||||
onKeyDown={(e) => {
|
||||
if (e.key === "Enter" && !e.shiftKey) {
|
||||
e.preventDefault();
|
||||
handleSubmit(e);
|
||||
}
|
||||
}}
|
||||
placeholder={disabled ? "Connecting to agent..." : "Message Queen Bee..."}
|
||||
disabled={disabled}
|
||||
className="flex-1 bg-transparent text-sm text-foreground outline-none placeholder:text-muted-foreground disabled:opacity-50 disabled:cursor-not-allowed resize-none overflow-y-auto"
|
||||
/>
|
||||
{isWaiting && onCancel ? (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onCancel}
|
||||
className="p-2 rounded-lg bg-destructive text-destructive-foreground hover:opacity-90 transition-opacity"
|
||||
>
|
||||
<Square className="w-4 h-4" />
|
||||
</button>
|
||||
) : (
|
||||
<button
|
||||
type="submit"
|
||||
disabled={!input.trim() || disabled}
|
||||
className="p-2 rounded-lg bg-primary text-primary-foreground disabled:opacity-30 hover:opacity-90 transition-opacity"
|
||||
>
|
||||
<Send className="w-4 h-4" />
|
||||
</button>
|
||||
)}
|
||||
</div>
|
||||
</form>
|
||||
{/* Input area — question widget replaces textarea when a question is pending */}
|
||||
{pendingQuestion && pendingOptions && onQuestionSubmit ? (
|
||||
<QuestionWidget
|
||||
question={pendingQuestion}
|
||||
options={pendingOptions}
|
||||
onSubmit={onQuestionSubmit}
|
||||
onDismiss={onQuestionDismiss}
|
||||
/>
|
||||
) : (
|
||||
<form onSubmit={handleSubmit} className="p-4">
|
||||
<div className="flex items-center gap-3 bg-muted/40 rounded-xl px-4 py-2.5 border border-border focus-within:border-primary/40 transition-colors">
|
||||
<textarea
|
||||
ref={textareaRef}
|
||||
rows={1}
|
||||
value={input}
|
||||
onChange={(e) => {
|
||||
setInput(e.target.value);
|
||||
const ta = e.target;
|
||||
ta.style.height = "auto";
|
||||
ta.style.height = `${Math.min(ta.scrollHeight, 160)}px`;
|
||||
}}
|
||||
onKeyDown={(e) => {
|
||||
if (e.key === "Enter" && !e.shiftKey) {
|
||||
e.preventDefault();
|
||||
handleSubmit(e);
|
||||
}
|
||||
}}
|
||||
placeholder={disabled ? "Connecting to agent..." : "Message Queen Bee..."}
|
||||
disabled={disabled}
|
||||
className="flex-1 bg-transparent text-sm text-foreground outline-none placeholder:text-muted-foreground disabled:opacity-50 disabled:cursor-not-allowed resize-none overflow-y-auto"
|
||||
/>
|
||||
{isBusy && onCancel ? (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onCancel}
|
||||
className="p-2 rounded-lg bg-amber-500/15 text-amber-400 border border-amber-500/40 hover:bg-amber-500/25 transition-colors"
|
||||
>
|
||||
<Square className="w-4 h-4" />
|
||||
</button>
|
||||
) : (
|
||||
<button
|
||||
type="submit"
|
||||
disabled={!input.trim() || disabled}
|
||||
className="p-2 rounded-lg bg-primary text-primary-foreground disabled:opacity-30 hover:opacity-90 transition-opacity"
|
||||
>
|
||||
<Send className="w-4 h-4" />
|
||||
</button>
|
||||
)}
|
||||
</div>
|
||||
</form>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
@@ -0,0 +1,142 @@
|
||||
import { useState, useRef, useEffect, useCallback } from "react";
|
||||
import { Send, MessageCircleQuestion, X } from "lucide-react";
|
||||
|
||||
export interface QuestionWidgetProps {
|
||||
/** The question text shown to the user */
|
||||
question: string;
|
||||
/** 1-3 predefined options. The UI appends an "Other" free-text option. */
|
||||
options: string[];
|
||||
/** Called with the selected option label or custom text, and whether "Other" was chosen */
|
||||
onSubmit: (answer: string, isOther: boolean) => void;
|
||||
/** Called when user dismisses the question without answering */
|
||||
onDismiss?: () => void;
|
||||
}
|
||||
|
||||
export default function QuestionWidget({ question, options, onSubmit, onDismiss }: QuestionWidgetProps) {
|
||||
const [selected, setSelected] = useState<number | null>(null);
|
||||
const [customText, setCustomText] = useState("");
|
||||
const [submitted, setSubmitted] = useState(false);
|
||||
const inputRef = useRef<HTMLInputElement>(null);
|
||||
const containerRef = useRef<HTMLDivElement>(null);
|
||||
|
||||
// "Other" is always the last option index
|
||||
const otherIndex = options.length;
|
||||
const isOtherSelected = selected === otherIndex;
|
||||
|
||||
// Focus the text input when "Other" is selected
|
||||
useEffect(() => {
|
||||
if (isOtherSelected) {
|
||||
inputRef.current?.focus();
|
||||
}
|
||||
}, [isOtherSelected]);
|
||||
|
||||
const canSubmit = selected !== null && (!isOtherSelected || customText.trim().length > 0);
|
||||
|
||||
const handleSubmit = useCallback(() => {
|
||||
if (!canSubmit || submitted) return;
|
||||
setSubmitted(true);
|
||||
if (isOtherSelected) {
|
||||
onSubmit(customText.trim(), true);
|
||||
} else {
|
||||
onSubmit(options[selected!], false);
|
||||
}
|
||||
}, [canSubmit, submitted, isOtherSelected, customText, options, selected, onSubmit]);
|
||||
|
||||
// Keyboard: Enter to submit, number keys to select (only when text input is not focused)
|
||||
useEffect(() => {
|
||||
const handleKeyDown = (e: KeyboardEvent) => {
|
||||
if (submitted) return;
|
||||
const inTextInput = e.target === inputRef.current;
|
||||
|
||||
if (e.key === "Enter" && !e.shiftKey) {
|
||||
e.preventDefault();
|
||||
handleSubmit();
|
||||
return;
|
||||
}
|
||||
|
||||
// Number keys 1-4 select options — skip when typing in the "Other" field
|
||||
if (!inTextInput) {
|
||||
const num = parseInt(e.key, 10);
|
||||
if (num >= 1 && num <= options.length + 1) {
|
||||
e.preventDefault();
|
||||
setSelected(num - 1);
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
window.addEventListener("keydown", handleKeyDown);
|
||||
return () => window.removeEventListener("keydown", handleKeyDown);
|
||||
}, [handleSubmit, submitted, options.length]);
|
||||
|
||||
if (submitted) return null;
|
||||
|
||||
return (
|
||||
<div ref={containerRef} className="p-4">
|
||||
<div className="bg-card border border-border rounded-xl shadow-sm overflow-hidden">
|
||||
{/* Header / Question */}
|
||||
<div className="px-5 pt-4 pb-3 flex items-start gap-3">
|
||||
<div className="w-7 h-7 rounded-lg bg-primary/10 border border-primary/20 flex items-center justify-center flex-shrink-0 mt-0.5">
|
||||
<MessageCircleQuestion className="w-3.5 h-3.5 text-primary" />
|
||||
</div>
|
||||
<p className="text-sm font-medium text-foreground leading-relaxed flex-1">{question}</p>
|
||||
{onDismiss && (
|
||||
<button
|
||||
onClick={onDismiss}
|
||||
className="p-1 rounded-md text-muted-foreground hover:text-foreground hover:bg-muted/60 transition-colors flex-shrink-0"
|
||||
>
|
||||
<X className="w-4 h-4" />
|
||||
</button>
|
||||
)}
|
||||
</div>
|
||||
|
||||
{/* Options */}
|
||||
<div className="px-5 pb-3 space-y-1.5">
|
||||
{options.map((option, idx) => (
|
||||
<button
|
||||
key={idx}
|
||||
onClick={() => setSelected(idx)}
|
||||
className={`w-full text-left px-4 py-2.5 rounded-lg border text-sm transition-colors ${
|
||||
selected === idx
|
||||
? "border-primary bg-primary/10 text-foreground"
|
||||
: "border-border/60 bg-muted/20 text-foreground hover:border-primary/40 hover:bg-muted/40"
|
||||
}`}
|
||||
>
|
||||
<span className="text-xs text-muted-foreground mr-2">{idx + 1}.</span>
|
||||
{option}
|
||||
</button>
|
||||
))}
|
||||
|
||||
{/* "Other" — inline text input that auto-selects on focus */}
|
||||
<input
|
||||
ref={inputRef}
|
||||
type="text"
|
||||
value={customText}
|
||||
onFocus={() => setSelected(otherIndex)}
|
||||
onChange={(e) => {
|
||||
setSelected(otherIndex);
|
||||
setCustomText(e.target.value);
|
||||
}}
|
||||
placeholder="Type a custom response..."
|
||||
className={`w-full px-4 py-2.5 rounded-lg border border-dashed text-sm transition-colors bg-transparent placeholder:text-muted-foreground focus:outline-none ${
|
||||
isOtherSelected
|
||||
? "border-primary bg-primary/10 text-foreground"
|
||||
: "border-border text-muted-foreground hover:border-primary/40"
|
||||
}`}
|
||||
/>
|
||||
</div>
|
||||
|
||||
{/* Submit */}
|
||||
<div className="px-5 pb-4">
|
||||
<button
|
||||
onClick={handleSubmit}
|
||||
disabled={!canSubmit}
|
||||
className="w-full flex items-center justify-center gap-2 py-2.5 rounded-lg text-sm font-medium bg-primary text-primary-foreground hover:bg-primary/90 disabled:opacity-30 disabled:cursor-not-allowed transition-colors"
|
||||
>
|
||||
<Send className="w-3.5 h-3.5" />
|
||||
Submit
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
@@ -167,3 +167,12 @@
|
||||
/* Applies the slide-in-from-right keyframes — presumably defined earlier in
   this stylesheet (not visible here); verify the @keyframes name matches. */
.animate-in.slide-in-from-right {
  animation: slide-in-from-right 0.2s ease-out;
}

/* Slide-up animation for question widget */
@keyframes slide-in-from-bottom {
  /* Start slightly below the final position and transparent... */
  from { transform: translateY(16px); opacity: 0; }
  /* ...then settle into place fully opaque. */
  to { transform: translateY(0); opacity: 1; }
}
.animate-in.slide-in-from-bottom {
  animation: slide-in-from-bottom 0.25s ease-out;
}
|
||||
|
||||
@@ -8,6 +8,7 @@ import TopBar from "@/components/TopBar";
|
||||
import { TAB_STORAGE_KEY, loadPersistedTabs, savePersistedTabs, type PersistedTabState } from "@/lib/tab-persistence";
|
||||
import NodeDetailPanel from "@/components/NodeDetailPanel";
|
||||
import CredentialsModal, { type Credential, createFreshCredentials, cloneCredentials, allRequiredCredentialsMet, clearCredentialCache } from "@/components/CredentialsModal";
|
||||
|
||||
import { agentsApi } from "@/api/agents";
|
||||
import { executionApi } from "@/api/execution";
|
||||
import { graphsApi } from "@/api/graphs";
|
||||
@@ -240,6 +241,8 @@ interface AgentBackendState {
|
||||
/** The message ID of the current worker input request (for inline reply box) */
|
||||
workerInputMessageId: string | null;
|
||||
queenBuilding: boolean;
|
||||
/** Queen operating mode — "building" (coding), "staging" (loaded), or "running" (executing) */
|
||||
queenMode: "building" | "staging" | "running";
|
||||
workerRunState: "idle" | "deploying" | "running";
|
||||
currentExecutionId: string | null;
|
||||
nodeLogs: Record<string, string[]>;
|
||||
@@ -247,8 +250,18 @@ interface AgentBackendState {
|
||||
subagentReports: { subagent_id: string; message: string; data?: Record<string, unknown>; timestamp: string }[];
|
||||
isTyping: boolean;
|
||||
isStreaming: boolean;
|
||||
/** True only when the queen's LLM is actively processing (not worker) */
|
||||
queenIsTyping: boolean;
|
||||
/** True only when a worker's LLM is actively processing (not queen) */
|
||||
workerIsTyping: boolean;
|
||||
llmSnapshots: Record<string, string>;
|
||||
activeToolCalls: Record<string, { name: string; done: boolean; streamId: string }>;
|
||||
/** Structured question text from ask_user with options */
|
||||
pendingQuestion: string | null;
|
||||
/** Predefined choices from ask_user (1-3 items); UI appends "Other" */
|
||||
pendingOptions: string[] | null;
|
||||
/** Whether the pending question came from queen or worker */
|
||||
pendingQuestionSource: "queen" | "worker" | null;
|
||||
}
|
||||
|
||||
function defaultAgentState(): AgentBackendState {
|
||||
@@ -264,6 +277,7 @@ function defaultAgentState(): AgentBackendState {
|
||||
awaitingInput: false,
|
||||
workerInputMessageId: null,
|
||||
queenBuilding: false,
|
||||
queenMode: "building",
|
||||
workerRunState: "idle",
|
||||
currentExecutionId: null,
|
||||
nodeLogs: {},
|
||||
@@ -271,8 +285,13 @@ function defaultAgentState(): AgentBackendState {
|
||||
subagentReports: [],
|
||||
isTyping: false,
|
||||
isStreaming: false,
|
||||
queenIsTyping: false,
|
||||
workerIsTyping: false,
|
||||
llmSnapshots: {},
|
||||
activeToolCalls: {},
|
||||
pendingQuestion: null,
|
||||
pendingOptions: null,
|
||||
pendingQuestionSource: null,
|
||||
};
|
||||
}
|
||||
|
||||
@@ -352,8 +371,14 @@ export default function Workspace() {
|
||||
if (persisted) {
|
||||
const restored = { ...persisted.activeSessionByAgent };
|
||||
const urlSessions = sessionsByAgent[initialAgent];
|
||||
if (urlSessions?.length && !restored[initialAgent]) {
|
||||
restored[initialAgent] = urlSessions[0].id;
|
||||
if (urlSessions?.length) {
|
||||
// When a prompt was submitted from home, activate the newly created
|
||||
// session (last in array) instead of the previously active one.
|
||||
if (initialPrompt && hasExplicitAgent) {
|
||||
restored[initialAgent] = urlSessions[urlSessions.length - 1].id;
|
||||
} else if (!restored[initialAgent]) {
|
||||
restored[initialAgent] = urlSessions[0].id;
|
||||
}
|
||||
}
|
||||
return restored;
|
||||
}
|
||||
@@ -632,7 +657,11 @@ export default function Workspace() {
|
||||
const result = await sessionsApi.get(existingSessionId);
|
||||
if (result.loading) continue;
|
||||
return result as LiveSession;
|
||||
} catch {
|
||||
} catch (pollErr) {
|
||||
// 404 = agent failed to load and was cleaned up — stop immediately
|
||||
if (pollErr instanceof ApiError && pollErr.status === 404) {
|
||||
throw new Error("Agent failed to load");
|
||||
}
|
||||
if (i === maxAttempts - 1) throw loadErr;
|
||||
}
|
||||
}
|
||||
@@ -648,7 +677,13 @@ export default function Workspace() {
|
||||
// failed, the throw inside the catch exits the outer try block.
|
||||
const session = liveSession!;
|
||||
const displayName = formatAgentDisplayName(session.worker_name || agentType);
|
||||
updateAgentState(agentType, { sessionId: session.session_id, displayName });
|
||||
const initialMode = session.queen_mode || (session.has_worker ? "staging" : "building");
|
||||
updateAgentState(agentType, {
|
||||
sessionId: session.session_id,
|
||||
displayName,
|
||||
queenMode: initialMode,
|
||||
queenBuilding: initialMode === "building",
|
||||
});
|
||||
|
||||
// Update the session label
|
||||
setSessionsByAgent((prev) => {
|
||||
@@ -921,7 +956,7 @@ export default function Workspace() {
|
||||
} catch {
|
||||
// Best-effort — queen may have already finished
|
||||
}
|
||||
updateAgentState(activeWorker, { isTyping: false, isStreaming: false });
|
||||
updateAgentState(activeWorker, { isTyping: false, isStreaming: false, queenIsTyping: false, workerIsTyping: false });
|
||||
}, [agentStates, activeWorker, updateAgentState]);
|
||||
|
||||
// --- Node log helper (writes into agentStates) ---
|
||||
@@ -1004,7 +1039,7 @@ export default function Workspace() {
|
||||
case "execution_started":
|
||||
if (isQueen) {
|
||||
turnCounterRef.current[turnKey] = currentTurn + 1;
|
||||
updateAgentState(agentType, { isTyping: true });
|
||||
updateAgentState(agentType, { isTyping: true, queenIsTyping: true });
|
||||
} else {
|
||||
// Warn if prior LLM snapshots are being dropped (edge case: execution_completed never arrived)
|
||||
const priorSnapshots = agentStates[agentType]?.llmSnapshots || {};
|
||||
@@ -1015,6 +1050,7 @@ export default function Workspace() {
|
||||
updateAgentState(agentType, {
|
||||
isTyping: true,
|
||||
isStreaming: false,
|
||||
workerIsTyping: true,
|
||||
awaitingInput: false,
|
||||
workerRunState: "running",
|
||||
currentExecutionId: event.execution_id || agentStates[agentType]?.currentExecutionId || null,
|
||||
@@ -1022,6 +1058,9 @@ export default function Workspace() {
|
||||
subagentReports: [],
|
||||
llmSnapshots: {},
|
||||
activeToolCalls: {},
|
||||
pendingQuestion: null,
|
||||
pendingOptions: null,
|
||||
pendingQuestionSource: null,
|
||||
});
|
||||
markAllNodesAs(agentType, ["running", "looping", "complete", "error"], "pending");
|
||||
}
|
||||
@@ -1029,7 +1068,7 @@ export default function Workspace() {
|
||||
|
||||
case "execution_completed":
|
||||
if (isQueen) {
|
||||
updateAgentState(agentType, { isTyping: false });
|
||||
updateAgentState(agentType, { isTyping: false, queenIsTyping: false });
|
||||
} else {
|
||||
// Flush any remaining LLM snapshots before clearing state
|
||||
const completedSnapshots = agentStates[agentType]?.llmSnapshots || {};
|
||||
@@ -1041,11 +1080,15 @@ export default function Workspace() {
|
||||
updateAgentState(agentType, {
|
||||
isTyping: false,
|
||||
isStreaming: false,
|
||||
workerIsTyping: false,
|
||||
awaitingInput: false,
|
||||
workerInputMessageId: null,
|
||||
workerRunState: "idle",
|
||||
currentExecutionId: null,
|
||||
llmSnapshots: {},
|
||||
pendingQuestion: null,
|
||||
pendingOptions: null,
|
||||
pendingQuestionSource: null,
|
||||
});
|
||||
markAllNodesAs(agentType, ["running", "looping"], "complete");
|
||||
|
||||
@@ -1070,7 +1113,7 @@ export default function Workspace() {
|
||||
|
||||
// Mark streaming when LLM text is actively arriving
|
||||
if (event.type === "llm_text_delta" || event.type === "client_output_delta") {
|
||||
updateAgentState(agentType, { isStreaming: true });
|
||||
updateAgentState(agentType, { isStreaming: true, ...(isQueen ? {} : { workerIsTyping: false }) });
|
||||
}
|
||||
|
||||
if (event.type === "llm_text_delta" && !isQueen && event.node_id) {
|
||||
@@ -1092,8 +1135,41 @@ export default function Workspace() {
|
||||
|
||||
if (event.type === "client_input_requested") {
|
||||
console.log('[CLIENT_INPUT_REQ] stream_id:', streamId, 'isQueen:', isQueen, 'node_id:', event.node_id, 'prompt:', (event.data?.prompt as string)?.slice(0, 80), 'agentType:', agentType);
|
||||
const rawOptions = event.data?.options;
|
||||
const options = Array.isArray(rawOptions) ? (rawOptions as string[]) : null;
|
||||
if (isQueen) {
|
||||
updateAgentState(agentType, { awaitingInput: true, isTyping: false, isStreaming: false, queenBuilding: false });
|
||||
const prompt = (event.data?.prompt as string) || "";
|
||||
const isAutoBlock = !prompt && !options;
|
||||
// Queen auto-block (empty prompt, no options) should not
|
||||
// overwrite a pending worker question — the worker's
|
||||
// QuestionWidget must stay visible. Use the updater form
|
||||
// to read the latest state and avoid stale-closure races
|
||||
// when worker and queen events arrive in the same batch.
|
||||
setAgentStates(prev => {
|
||||
const cur = prev[agentType] || defaultAgentState();
|
||||
const workerQuestionActive = cur.pendingQuestionSource === "worker";
|
||||
if (isAutoBlock && workerQuestionActive) {
|
||||
return { ...prev, [agentType]: {
|
||||
...cur,
|
||||
awaitingInput: true,
|
||||
isTyping: false,
|
||||
isStreaming: false,
|
||||
queenIsTyping: false,
|
||||
queenBuilding: false,
|
||||
}};
|
||||
}
|
||||
return { ...prev, [agentType]: {
|
||||
...cur,
|
||||
awaitingInput: true,
|
||||
isTyping: false,
|
||||
isStreaming: false,
|
||||
queenIsTyping: false,
|
||||
queenBuilding: false,
|
||||
pendingQuestion: prompt || null,
|
||||
pendingOptions: options,
|
||||
pendingQuestionSource: "queen",
|
||||
}};
|
||||
});
|
||||
} else {
|
||||
// Worker input request.
|
||||
// If the prompt is non-empty (explicit ask_user), create a visible
|
||||
@@ -1121,18 +1197,22 @@ export default function Workspace() {
|
||||
awaitingInput: true,
|
||||
isTyping: false,
|
||||
isStreaming: false,
|
||||
queenIsTyping: false,
|
||||
pendingQuestion: prompt || null,
|
||||
pendingOptions: options,
|
||||
pendingQuestionSource: options ? "worker" : null,
|
||||
});
|
||||
}
|
||||
}
|
||||
if (event.type === "execution_paused") {
|
||||
updateAgentState(agentType, { isTyping: false, isStreaming: false, awaitingInput: false, workerInputMessageId: null });
|
||||
updateAgentState(agentType, { isTyping: false, isStreaming: false, queenIsTyping: false, workerIsTyping: false, awaitingInput: false, workerInputMessageId: null, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
|
||||
if (!isQueen) {
|
||||
updateAgentState(agentType, { workerRunState: "idle", currentExecutionId: null });
|
||||
markAllNodesAs(agentType, ["running", "looping"], "pending");
|
||||
}
|
||||
}
|
||||
if (event.type === "execution_failed") {
|
||||
updateAgentState(agentType, { isTyping: false, isStreaming: false, awaitingInput: false, workerInputMessageId: null });
|
||||
updateAgentState(agentType, { isTyping: false, isStreaming: false, queenIsTyping: false, workerIsTyping: false, awaitingInput: false, workerInputMessageId: null, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
|
||||
if (!isQueen) {
|
||||
updateAgentState(agentType, { workerRunState: "idle", currentExecutionId: null });
|
||||
if (event.node_id) {
|
||||
@@ -1164,7 +1244,11 @@ export default function Workspace() {
|
||||
|
||||
case "node_loop_iteration":
|
||||
turnCounterRef.current[turnKey] = currentTurn + 1;
|
||||
updateAgentState(agentType, { isStreaming: false, activeToolCalls: {}, awaitingInput: false });
|
||||
if (isQueen) {
|
||||
updateAgentState(agentType, { isStreaming: false, activeToolCalls: {}, awaitingInput: false, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
|
||||
} else {
|
||||
updateAgentState(agentType, { isStreaming: false, workerIsTyping: true, activeToolCalls: {}, awaitingInput: false, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
|
||||
}
|
||||
if (!isQueen && event.node_id) {
|
||||
const pendingText = agentStates[agentType]?.llmSnapshots[event.node_id];
|
||||
if (pendingText?.trim()) {
|
||||
@@ -1212,13 +1296,7 @@ export default function Workspace() {
|
||||
case "tool_call_started": {
|
||||
console.log('[TOOL_PILL] tool_call_started received:', { isQueen, nodeId: event.node_id, streamId: event.stream_id, agentType, executionId: event.execution_id, toolName: event.data?.tool_name });
|
||||
|
||||
// Detect queen building: when the queen starts writing/editing files, she's building an agent
|
||||
if (isQueen) {
|
||||
const tn = (event.data?.tool_name as string) || "";
|
||||
if (tn === "write_file" || tn === "edit_file") {
|
||||
updateAgentState(agentType, { queenBuilding: true });
|
||||
}
|
||||
}
|
||||
// queenBuilding is now driven by queen_mode_changed events
|
||||
|
||||
if (event.node_id) {
|
||||
if (!isQueen) {
|
||||
@@ -1453,6 +1531,19 @@ export default function Workspace() {
|
||||
break;
|
||||
}
|
||||
|
||||
case "queen_mode_changed": {
|
||||
const rawMode = event.data?.mode as string;
|
||||
const newMode: "building" | "staging" | "running" =
|
||||
rawMode === "running" ? "running" : rawMode === "staging" ? "staging" : "building";
|
||||
updateAgentState(agentType, {
|
||||
queenMode: newMode,
|
||||
queenBuilding: newMode === "building",
|
||||
// Sync workerRunState so the RunButton reflects the mode
|
||||
workerRunState: newMode === "running" ? "running" : "idle",
|
||||
});
|
||||
break;
|
||||
}
|
||||
|
||||
case "worker_loaded": {
|
||||
const workerName = event.data?.worker_name as string | undefined;
|
||||
const agentPathFromEvent = event.data?.agent_path as string | undefined;
|
||||
@@ -1561,6 +1652,11 @@ export default function Workspace() {
|
||||
return;
|
||||
}
|
||||
|
||||
// If queen has a pending question widget, dismiss it when user types directly
|
||||
if (agentStates[activeWorker]?.pendingQuestionSource === "queen") {
|
||||
updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
|
||||
}
|
||||
|
||||
const userMsg: ChatMessage = {
|
||||
id: makeId(), agent: "You", agentColor: "",
|
||||
content: text, timestamp: "", type: "user", thread, createdAt: Date.now(),
|
||||
@@ -1571,7 +1667,7 @@ export default function Workspace() {
|
||||
s.id === activeSession.id ? { ...s, messages: [...s.messages, userMsg] } : s
|
||||
),
|
||||
}));
|
||||
updateAgentState(activeWorker, { isTyping: true });
|
||||
updateAgentState(activeWorker, { isTyping: true, queenIsTyping: true });
|
||||
|
||||
if (state?.sessionId && state?.ready) {
|
||||
executionApi.chat(state.sessionId, text).catch((err: unknown) => {
|
||||
@@ -1587,7 +1683,7 @@ export default function Workspace() {
|
||||
s.id === activeSession.id ? { ...s, messages: [...s.messages, errorChatMsg] } : s
|
||||
),
|
||||
}));
|
||||
updateAgentState(activeWorker, { isTyping: false, isStreaming: false });
|
||||
updateAgentState(activeWorker, { isTyping: false, isStreaming: false, queenIsTyping: false });
|
||||
});
|
||||
} else {
|
||||
const errorMsg: ChatMessage = {
|
||||
@@ -1624,7 +1720,7 @@ export default function Workspace() {
|
||||
}));
|
||||
|
||||
// Clear awaiting state optimistically
|
||||
updateAgentState(activeWorker, { awaitingInput: false, workerInputMessageId: null, isTyping: true });
|
||||
updateAgentState(activeWorker, { awaitingInput: false, workerInputMessageId: null, isTyping: true, pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
|
||||
|
||||
executionApi.workerInput(state.sessionId, text).catch((err: unknown) => {
|
||||
const errMsg = err instanceof Error ? err.message : String(err);
|
||||
@@ -1643,6 +1739,90 @@ export default function Workspace() {
|
||||
});
|
||||
}, [activeWorker, activeSession, agentStates, updateAgentState]);
|
||||
|
||||
// --- handleWorkerQuestionAnswer: route predefined answers direct to worker, "Other" through queen ---
|
||||
const handleWorkerQuestionAnswer = useCallback((answer: string, isOther: boolean) => {
  if (!activeSession) return;
  const state = agentStates[activeWorker];
  // Capture the question/options BEFORE clearing pending state below —
  // they are needed to build the context message for the queen.
  const question = state?.pendingQuestion || "";
  const opts = state?.pendingOptions;

  if (isOther) {
    // "Other" free-text → route through queen for evaluation
    updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
    if (question && opts && state?.sessionId && state?.ready) {
      // Wrap the answer with the original question + options so the queen
      // has full context for evaluating the free-form reply.
      const formatted = `[Worker asked: "${question}" | Options: ${opts.join(", ")}]\nUser answered: "${answer}"`;
      // Echo only the raw answer (not the wrapper) into the visible chat.
      const userMsg: ChatMessage = {
        id: makeId(), agent: "You", agentColor: "",
        content: answer, timestamp: "", type: "user", thread: activeWorker, createdAt: Date.now(),
      };
      setSessionsByAgent(prev => ({
        ...prev,
        [activeWorker]: prev[activeWorker].map(s =>
          s.id === activeSession.id ? { ...s, messages: [...s.messages, userMsg] } : s
        ),
      }));
      // Optimistically mark the queen as thinking while the request is in flight.
      updateAgentState(activeWorker, { isTyping: true, queenIsTyping: true });
      executionApi.chat(state.sessionId, formatted).catch((err: unknown) => {
        // On failure: surface a system message in the thread and reset the
        // typing indicators set optimistically above.
        const errMsg = err instanceof Error ? err.message : String(err);
        const errorChatMsg: ChatMessage = {
          id: makeId(), agent: "System", agentColor: "",
          content: `Failed to send message: ${errMsg}`,
          timestamp: "", type: "system", thread: activeWorker, createdAt: Date.now(),
        };
        setSessionsByAgent(prev => ({
          ...prev,
          [activeWorker]: prev[activeWorker].map(s =>
            s.id === activeSession.id ? { ...s, messages: [...s.messages, errorChatMsg] } : s
          ),
        }));
        updateAgentState(activeWorker, { isTyping: false, isStreaming: false, queenIsTyping: false });
      });
    } else {
      // Missing question context or session not ready — fall back to a
      // plain chat send of the raw answer.
      handleSend(answer, activeWorker);
    }
  } else {
    // Predefined option → send directly to worker
    handleWorkerReply(answer);
    // Queue context for queen (fire-and-forget, no LLM response triggered)
    if (question && state?.sessionId && state?.ready) {
      const notification = `[Worker asked: "${question}" | User selected: "${answer}"]`;
      executionApi.queenContext(state.sessionId, notification).catch(() => {});
    }
  }
}, [activeWorker, activeSession, agentStates, handleWorkerReply, handleSend, updateAgentState, setSessionsByAgent]);
|
||||
|
||||
// --- handleQueenQuestionAnswer: submit queen's own question answer via /chat ---
|
||||
// The queen asked the question herself, so she already has context — just send the raw answer.
|
||||
// Submit the answer to a question the queen asked herself. She already has the
// question context, so the raw answer is sent via the normal chat path; the
// "isOther" distinction is irrelevant here (hence the underscore parameter).
const handleQueenQuestionAnswer = useCallback((answer: string, _isOther: boolean) => {
  // Clear the widget immediately so it disappears before the chat round-trip.
  updateAgentState(activeWorker, { pendingQuestion: null, pendingOptions: null, pendingQuestionSource: null });
  handleSend(answer, activeWorker);
}, [activeWorker, handleSend, updateAgentState]);
|
||||
|
||||
// --- handleQuestionDismiss: user closed the question widget without answering ---
|
||||
// Injects a dismiss signal so the blocked node can continue.
|
||||
const handleQuestionDismiss = useCallback(() => {
  const state = agentStates[activeWorker];
  if (!state?.sessionId) return;
  // Read source/question BEFORE the state update below wipes them.
  const source = state.pendingQuestionSource;
  const question = state.pendingQuestion || "";

  // Clear UI state immediately
  updateAgentState(activeWorker, {
    pendingQuestion: null,
    pendingOptions: null,
    pendingQuestionSource: null,
    awaitingInput: false,
  });

  // Unblock the waiting node with a dismiss signal.
  // Worker questions go through workerInput, queen questions through chat;
  // both are best-effort — a failure here just leaves the node blocked.
  const dismissMsg = `[User dismissed the question: "${question}"]`;
  if (source === "worker") {
    executionApi.workerInput(state.sessionId, dismissMsg).catch(() => {});
  } else {
    executionApi.chat(state.sessionId, dismissMsg).catch(() => {});
  }
}, [agentStates, activeWorker, updateAgentState]);
|
||||
|
||||
const handleLoadAgent = useCallback(async (agentPath: string) => {
|
||||
const state = agentStates[activeWorker];
|
||||
if (!state?.sessionId) return;
|
||||
@@ -1795,6 +1975,7 @@ export default function Workspace() {
|
||||
onPause={handlePause}
|
||||
runState={activeAgentState?.workerRunState ?? "idle"}
|
||||
building={activeAgentState?.queenBuilding ?? false}
|
||||
queenMode={activeAgentState?.queenMode ?? "building"}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
@@ -1856,16 +2037,23 @@ export default function Workspace() {
|
||||
messages={activeSession.messages}
|
||||
onSend={handleSend}
|
||||
onCancel={handleCancelQueen}
|
||||
onWorkerReply={handleWorkerReply}
|
||||
activeThread={activeWorker}
|
||||
isWaiting={(activeAgentState?.isTyping && !activeAgentState?.isStreaming) ?? false}
|
||||
workerAwaitingInput={
|
||||
(activeAgentState?.awaitingInput && activeAgentState?.workerRunState === "running") ?? false
|
||||
}
|
||||
isWaiting={(activeAgentState?.queenIsTyping && !activeAgentState?.isStreaming) ?? false}
|
||||
isWorkerWaiting={(activeAgentState?.workerIsTyping && !activeAgentState?.isStreaming) ?? false}
|
||||
isBusy={activeAgentState?.queenIsTyping ?? false}
|
||||
disabled={
|
||||
(activeAgentState?.loading ?? true) ||
|
||||
!(activeAgentState?.queenReady)
|
||||
}
|
||||
queenMode={activeAgentState?.queenMode ?? "building"}
|
||||
pendingQuestion={activeAgentState?.awaitingInput ? activeAgentState.pendingQuestion : null}
|
||||
pendingOptions={activeAgentState?.awaitingInput ? activeAgentState.pendingOptions : null}
|
||||
onQuestionSubmit={
|
||||
activeAgentState?.pendingQuestionSource === "queen"
|
||||
? handleQueenQuestionAnswer
|
||||
: handleWorkerQuestionAnswer
|
||||
}
|
||||
onQuestionDismiss={handleQuestionDismiss}
|
||||
/>
|
||||
)}
|
||||
</div>
|
||||
|
||||
+10
-3
@@ -53,7 +53,13 @@ def log_error(message: str):
|
||||
def run_command(cmd: list, error_msg: str) -> bool:
|
||||
"""Run a command and return success status."""
|
||||
try:
|
||||
subprocess.run(cmd, check=True, capture_output=True, text=True)
|
||||
subprocess.run(
|
||||
cmd,
|
||||
check=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
return True
|
||||
except subprocess.CalledProcessError as e:
|
||||
log_error(error_msg)
|
||||
@@ -97,7 +103,7 @@ def main():
|
||||
if mcp_config_path.exists():
|
||||
log_success("MCP configuration found at .mcp.json")
|
||||
logger.info("Configuration:")
|
||||
with open(mcp_config_path) as f:
|
||||
with open(mcp_config_path, encoding="utf-8") as f:
|
||||
config = json.load(f)
|
||||
logger.info(json.dumps(config, indent=2))
|
||||
else:
|
||||
@@ -114,7 +120,7 @@ def main():
|
||||
}
|
||||
}
|
||||
|
||||
with open(mcp_config_path, "w") as f:
|
||||
with open(mcp_config_path, "w", encoding="utf-8") as f:
|
||||
json.dump(config, f, indent=2)
|
||||
|
||||
log_success("Created .mcp.json")
|
||||
@@ -129,6 +135,7 @@ def main():
|
||||
check=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
log_success("MCP server module verified")
|
||||
except subprocess.CalledProcessError as e:
|
||||
|
||||
@@ -68,6 +68,7 @@ class TestFrameworkModule:
|
||||
[sys.executable, "-m", "framework", "--help"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding="utf-8",
|
||||
cwd=str(project_root / "core"),
|
||||
)
|
||||
assert result.returncode == 0
|
||||
@@ -79,6 +80,7 @@ class TestFrameworkModule:
|
||||
[sys.executable, "-m", "framework", "list", "--help"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding="utf-8",
|
||||
cwd=str(project_root / "core"),
|
||||
)
|
||||
assert result.returncode == 0
|
||||
@@ -104,6 +106,7 @@ class TestHiveEntryPoint:
|
||||
["hive", "--help"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
assert result.returncode == 0
|
||||
assert "run" in result.stdout.lower()
|
||||
@@ -115,6 +118,7 @@ class TestHiveEntryPoint:
|
||||
["hive", "list", "--help"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
assert result.returncode == 0
|
||||
|
||||
@@ -124,5 +128,6 @@ class TestHiveEntryPoint:
|
||||
["hive", "run", "nonexistent_agent_xyz"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
assert result.returncode != 0
|
||||
|
||||
@@ -578,7 +578,11 @@ class TestClientFacingBlocking:
|
||||
"""signal_shutdown should unblock a waiting client_facing node."""
|
||||
llm = MockStreamingLLM(
|
||||
scenarios=[
|
||||
tool_call_scenario("ask_user", {"question": "Waiting..."}, tool_use_id="ask_1"),
|
||||
tool_call_scenario(
|
||||
"ask_user",
|
||||
{"question": "Waiting...", "options": ["Continue", "Stop"]},
|
||||
tool_use_id="ask_1",
|
||||
),
|
||||
]
|
||||
)
|
||||
bus = EventBus()
|
||||
@@ -600,7 +604,11 @@ class TestClientFacingBlocking:
|
||||
"""CLIENT_INPUT_REQUESTED should be published when ask_user blocks."""
|
||||
llm = MockStreamingLLM(
|
||||
scenarios=[
|
||||
tool_call_scenario("ask_user", {"question": "Hello!"}, tool_use_id="ask_1"),
|
||||
tool_call_scenario(
|
||||
"ask_user",
|
||||
{"question": "Hello!", "options": ["Yes", "No"]},
|
||||
tool_use_id="ask_1",
|
||||
),
|
||||
]
|
||||
)
|
||||
bus = EventBus()
|
||||
@@ -796,7 +804,7 @@ class TestClientFacingExpectingWork:
|
||||
|
||||
async def user_then_shutdown():
|
||||
await asyncio.sleep(0.05)
|
||||
await node.inject_event("furwise.app")
|
||||
await node.inject_event("furwise.app", is_client_input=True)
|
||||
# Node should auto-block on "Monitoring..." text.
|
||||
# Give it time to reach the block, then shutdown.
|
||||
await asyncio.sleep(0.1)
|
||||
@@ -2027,3 +2035,61 @@ class TestExecutionId:
|
||||
node_spec=node_spec, memory=SharedMemory(), goal=goal, input_data={}
|
||||
)
|
||||
assert ctx.execution_id == ""
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Subagent memory snapshot includes accumulator outputs
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestSubagentAccumulatorMemory:
    """Verify that subagent memory construction merges accumulator outputs
    and includes the subagent's input_keys in read permissions."""

    def test_accumulator_values_merged_into_parent_data(self):
        """Keys from OutputAccumulator should appear in subagent memory."""
        # Mirror what _execute_subagent does internally: the parent's shared
        # memory holds user_request but NOT tweet_content.
        parent = SharedMemory()
        parent.write("user_request", "post a joke")
        snapshot = parent.read_all()  # {"user_request": "post a joke"}

        # The accumulator carries tweet_content (set via set_output before
        # delegation).
        accumulator = OutputAccumulator(values={"tweet_content": "Hello world!"})

        # The merge under test: accumulator values fill in only the keys the
        # parent snapshot is missing — existing parent entries always win.
        merged = {**accumulator.to_dict(), **snapshot}

        # Rebuild the subagent's memory from the merged snapshot.
        child = SharedMemory()
        for key, value in merged.items():
            child.write(key, value, validate=False)

        allowed = set(merged) | {"tweet_content"}
        scoped = child.with_permissions(read_keys=list(allowed), write_keys=[])

        # Both reads would have raised PermissionError before the fix.
        assert scoped.read("tweet_content") == "Hello world!"
        assert scoped.read("user_request") == "post a joke"

    def test_input_keys_allowed_even_if_not_in_data(self):
        """Subagent input_keys should be in read permissions even if the
        key doesn't exist in memory (returns None instead of PermissionError)."""
        parent = SharedMemory()
        parent.write("user_request", "hi")
        snapshot = parent.read_all()

        child = SharedMemory()
        for key, value in snapshot.items():
            child.write(key, value, validate=False)

        # "tweet_content" is granted via input_keys even though it is absent
        # from the snapshot.
        allowed = set(snapshot) | {"tweet_content"}
        scoped = child.with_permissions(read_keys=list(allowed), write_keys=[])

        # Missing-but-permitted key reads back as None rather than raising.
        assert scoped.read("tweet_content") is None
        assert scoped.read("user_request") == "hi"
|
||||
|
||||
@@ -232,7 +232,7 @@ async def test_shared_session_reuses_directory_and_memory(tmp_path):
|
||||
# Verify primary session's state.json exists and has the primary entry_point
|
||||
primary_state_path = tmp_path / "sessions" / primary_exec_id / "state.json"
|
||||
assert primary_state_path.exists()
|
||||
primary_state = json.loads(primary_state_path.read_text())
|
||||
primary_state = json.loads(primary_state_path.read_text(encoding="utf-8"))
|
||||
assert primary_state["entry_point"] == "primary"
|
||||
|
||||
# Async stream — simulates a webhook entry point sharing the session
|
||||
@@ -275,7 +275,7 @@ async def test_shared_session_reuses_directory_and_memory(tmp_path):
|
||||
|
||||
# State.json should NOT have been overwritten by the async execution
|
||||
# (it should still show the primary entry point)
|
||||
final_state = json.loads(primary_state_path.read_text())
|
||||
final_state = json.loads(primary_state_path.read_text(encoding="utf-8"))
|
||||
assert final_state["entry_point"] == "primary"
|
||||
|
||||
# Verify only ONE session directory exists (not two)
|
||||
|
||||
@@ -2,11 +2,12 @@
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from typing import Any
|
||||
|
||||
import pytest
|
||||
|
||||
from framework.graph.conversation import Message, NodeConversation
|
||||
from framework.graph.conversation import Message, NodeConversation, extract_tool_call_history
|
||||
from framework.storage.conversation_store import FileConversationStore
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -930,3 +931,600 @@ class TestConversationIntegration:
|
||||
assert restored.next_seq == 4
|
||||
assert restored.messages[0].content == "new msg"
|
||||
assert restored.messages[0].seq == 2
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers for aggressive compaction tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _make_tool_call(call_id: str, name: str, args: dict) -> dict:
|
||||
return {
|
||||
"id": call_id,
|
||||
"type": "function",
|
||||
"function": {"name": name, "arguments": json.dumps(args)},
|
||||
}
|
||||
|
||||
|
||||
async def _build_tool_heavy_conversation(
|
||||
store: MockConversationStore | None = None,
|
||||
) -> NodeConversation:
|
||||
"""Build a conversation with many tool call pairs.
|
||||
|
||||
Layout: user msg, then 5x (assistant with append_data tool_call + tool result),
|
||||
then 1x (assistant with set_output tool_call + tool result), then user msg + assistant msg.
|
||||
"""
|
||||
conv = NodeConversation(store=store)
|
||||
await conv.add_user_message("Process the data") # seq 0
|
||||
|
||||
for i in range(5):
|
||||
args = {"filename": "output.html", "content": "x" * 500}
|
||||
tc = [_make_tool_call(f"call_{i}", "append_data", args)]
|
||||
conv._messages.append(
|
||||
Message(
|
||||
seq=conv._next_seq,
|
||||
role="assistant",
|
||||
content=f"Appending part {i}",
|
||||
tool_calls=tc,
|
||||
)
|
||||
)
|
||||
if store:
|
||||
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
|
||||
conv._next_seq += 1
|
||||
conv._messages.append(
|
||||
Message(
|
||||
seq=conv._next_seq,
|
||||
role="tool",
|
||||
content='{"success": true}',
|
||||
tool_use_id=f"call_{i}",
|
||||
)
|
||||
)
|
||||
if store:
|
||||
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
|
||||
conv._next_seq += 1
|
||||
|
||||
# set_output call — must be protected
|
||||
so_tc = [_make_tool_call("call_so", "set_output", {"key": "result", "value": "done"})]
|
||||
conv._messages.append(
|
||||
Message(seq=conv._next_seq, role="assistant", content="Setting output", tool_calls=so_tc)
|
||||
)
|
||||
if store:
|
||||
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
|
||||
conv._next_seq += 1
|
||||
conv._messages.append(
|
||||
Message(
|
||||
seq=conv._next_seq,
|
||||
role="tool",
|
||||
content="Output 'result' set successfully.",
|
||||
tool_use_id="call_so",
|
||||
)
|
||||
)
|
||||
if store:
|
||||
await store.write_part(conv._next_seq, conv._messages[-1].to_storage_dict())
|
||||
conv._next_seq += 1
|
||||
|
||||
# Recent messages
|
||||
await conv.add_user_message("Continue")
|
||||
await conv.add_assistant_message("Working on it")
|
||||
return conv
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Tests: aggressive structural compaction
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestAggressiveStructuralCompaction:
|
||||
@pytest.mark.asyncio
|
||||
async def test_aggressive_collapses_tool_pairs(self, tmp_path):
|
||||
"""Aggressive mode should collapse non-essential tool pairs into a summary."""
|
||||
conv = await _build_tool_heavy_conversation()
|
||||
spill = str(tmp_path)
|
||||
|
||||
await conv.compact_preserving_structure(
|
||||
spillover_dir=spill,
|
||||
keep_recent=2,
|
||||
aggressive=True,
|
||||
)
|
||||
|
||||
# The 5 append_data pairs (10 msgs) + 1 user msg should be collapsed.
|
||||
# Remaining: ref_msg + set_output pair (2 msgs) + 2 recent = 5
|
||||
assert conv.message_count == 5
|
||||
assert conv.messages[0].role == "user" # ref message
|
||||
assert "TOOLS ALREADY CALLED" in conv.messages[0].content
|
||||
assert "append_data (5x)" in conv.messages[0].content
|
||||
|
||||
# set_output pair should be preserved
|
||||
assert conv.messages[1].role == "assistant"
|
||||
assert conv.messages[1].tool_calls is not None
|
||||
assert conv.messages[1].tool_calls[0]["function"]["name"] == "set_output"
|
||||
assert conv.messages[2].role == "tool"
|
||||
|
||||
# Recent messages intact
|
||||
assert conv.messages[3].content == "Continue"
|
||||
assert conv.messages[4].content == "Working on it"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_aggressive_preserves_set_output(self, tmp_path):
|
||||
"""set_output tool calls are always protected in aggressive mode."""
|
||||
conv = await _build_tool_heavy_conversation()
|
||||
spill = str(tmp_path)
|
||||
|
||||
await conv.compact_preserving_structure(
|
||||
spillover_dir=spill,
|
||||
keep_recent=2,
|
||||
aggressive=True,
|
||||
)
|
||||
|
||||
# Find all tool calls in remaining messages
|
||||
tool_names = []
|
||||
for msg in conv.messages:
|
||||
if msg.tool_calls:
|
||||
for tc in msg.tool_calls:
|
||||
tool_names.append(tc["function"]["name"])
|
||||
|
||||
assert "set_output" in tool_names
|
||||
# append_data should NOT be in remaining messages (collapsed)
|
||||
assert "append_data" not in tool_names
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_aggressive_preserves_errors(self, tmp_path):
|
||||
"""Error tool results are always protected in aggressive mode."""
|
||||
conv = NodeConversation()
|
||||
await conv.add_user_message("Start")
|
||||
|
||||
# Regular tool call
|
||||
tc1 = [_make_tool_call("call_ok", "web_search", {"query": "test"})]
|
||||
conv._messages.append(
|
||||
Message(seq=conv._next_seq, role="assistant", content="", tool_calls=tc1)
|
||||
)
|
||||
conv._next_seq += 1
|
||||
conv._messages.append(
|
||||
Message(seq=conv._next_seq, role="tool", content="results", tool_use_id="call_ok")
|
||||
)
|
||||
conv._next_seq += 1
|
||||
|
||||
# Error tool call
|
||||
tc2 = [_make_tool_call("call_err", "web_scrape", {"url": "http://broken.com"})]
|
||||
conv._messages.append(
|
||||
Message(seq=conv._next_seq, role="assistant", content="", tool_calls=tc2)
|
||||
)
|
||||
conv._next_seq += 1
|
||||
conv._messages.append(
|
||||
Message(
|
||||
seq=conv._next_seq,
|
||||
role="tool",
|
||||
content="Connection timeout",
|
||||
tool_use_id="call_err",
|
||||
is_error=True,
|
||||
)
|
||||
)
|
||||
conv._next_seq += 1
|
||||
|
||||
await conv.add_user_message("Next")
|
||||
await conv.add_assistant_message("OK")
|
||||
|
||||
spill = str(tmp_path)
|
||||
await conv.compact_preserving_structure(
|
||||
spillover_dir=spill,
|
||||
keep_recent=2,
|
||||
aggressive=True,
|
||||
)
|
||||
|
||||
# Error pair should be preserved
|
||||
error_msgs = [m for m in conv.messages if m.role == "tool" and m.is_error]
|
||||
assert len(error_msgs) == 1
|
||||
assert error_msgs[0].content == "Connection timeout"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_standard_mode_keeps_all_tool_pairs(self, tmp_path):
|
||||
"""Non-aggressive mode should keep all tool pairs (existing behavior)."""
|
||||
conv = await _build_tool_heavy_conversation()
|
||||
spill = str(tmp_path)
|
||||
|
||||
await conv.compact_preserving_structure(
|
||||
spillover_dir=spill,
|
||||
keep_recent=2,
|
||||
aggressive=False,
|
||||
)
|
||||
|
||||
# All 6 tool pairs (12 msgs) should be kept as structural.
|
||||
# Removed: 1 user msg (freeform). Remaining: ref + 12 structural + 2 recent = 15
|
||||
assert conv.message_count == 15
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_two_pass_sequence(self, tmp_path):
|
||||
"""Standard pass then aggressive pass produces valid result."""
|
||||
conv = await _build_tool_heavy_conversation()
|
||||
spill = str(tmp_path)
|
||||
|
||||
# Pass 1: standard
|
||||
await conv.compact_preserving_structure(
|
||||
spillover_dir=spill,
|
||||
keep_recent=2,
|
||||
)
|
||||
after_standard = conv.message_count
|
||||
assert after_standard == 15 # all structural kept
|
||||
|
||||
# Pass 2: aggressive
|
||||
await conv.compact_preserving_structure(
|
||||
spillover_dir=spill,
|
||||
keep_recent=2,
|
||||
aggressive=True,
|
||||
)
|
||||
after_aggressive = conv.message_count
|
||||
assert after_aggressive < after_standard
|
||||
# ref + set_output pair + 2 recent = 5
|
||||
assert after_aggressive == 5
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_aggressive_persists_correctly(self, tmp_path):
|
||||
"""Aggressive compaction correctly updates the store."""
|
||||
store = MockConversationStore()
|
||||
conv = await _build_tool_heavy_conversation(store=store)
|
||||
spill = str(tmp_path)
|
||||
|
||||
await conv.compact_preserving_structure(
|
||||
spillover_dir=spill,
|
||||
keep_recent=2,
|
||||
aggressive=True,
|
||||
)
|
||||
|
||||
# Verify store state matches in-memory state
|
||||
parts = await store.read_parts()
|
||||
assert len(parts) == conv.message_count
|
||||
|
||||
|
||||
class TestExtractToolCallHistory:
|
||||
def test_basic_extraction(self):
|
||||
msgs = [
|
||||
Message(
|
||||
seq=0,
|
||||
role="assistant",
|
||||
content="",
|
||||
tool_calls=[
|
||||
_make_tool_call("c1", "web_search", {"query": "python async"}),
|
||||
],
|
||||
),
|
||||
Message(seq=1, role="tool", content="results", tool_use_id="c1"),
|
||||
Message(
|
||||
seq=2,
|
||||
role="assistant",
|
||||
content="",
|
||||
tool_calls=[
|
||||
_make_tool_call(
|
||||
"c2", "save_data", {"filename": "output.txt", "content": "data"}
|
||||
),
|
||||
],
|
||||
),
|
||||
Message(seq=3, role="tool", content="saved", tool_use_id="c2"),
|
||||
]
|
||||
result = extract_tool_call_history(msgs)
|
||||
assert "web_search (1x)" in result
|
||||
assert "save_data (1x)" in result
|
||||
assert "FILES SAVED: output.txt" in result
|
||||
|
||||
def test_errors_included(self):
|
||||
msgs = [
|
||||
Message(
|
||||
seq=0,
|
||||
role="tool",
|
||||
content="Connection refused",
|
||||
is_error=True,
|
||||
tool_use_id="c1",
|
||||
),
|
||||
]
|
||||
result = extract_tool_call_history(msgs)
|
||||
assert "ERRORS" in result
|
||||
assert "Connection refused" in result
|
||||
|
||||
def test_empty_messages(self):
|
||||
assert extract_tool_call_history([]) == ""
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Tests for _is_context_too_large_error
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestIsContextTooLargeError:
|
||||
def test_context_window_class_name(self):
|
||||
from framework.graph.event_loop_node import _is_context_too_large_error
|
||||
|
||||
class ContextWindowExceededError(Exception):
|
||||
pass
|
||||
|
||||
assert _is_context_too_large_error(ContextWindowExceededError("x"))
|
||||
|
||||
def test_openai_context_length(self):
|
||||
from framework.graph.event_loop_node import _is_context_too_large_error
|
||||
|
||||
err = RuntimeError("This model's maximum context length is 128000 tokens")
|
||||
assert _is_context_too_large_error(err)
|
||||
|
||||
def test_anthropic_too_long(self):
|
||||
from framework.graph.event_loop_node import _is_context_too_large_error
|
||||
|
||||
err = RuntimeError("prompt is too long: 150000 tokens > 100000")
|
||||
assert _is_context_too_large_error(err)
|
||||
|
||||
def test_generic_exceeds_limit(self):
|
||||
from framework.graph.event_loop_node import _is_context_too_large_error
|
||||
|
||||
err = ValueError("Request exceeds token limit")
|
||||
assert _is_context_too_large_error(err)
|
||||
|
||||
def test_unrelated_error(self):
|
||||
from framework.graph.event_loop_node import _is_context_too_large_error
|
||||
|
||||
assert not _is_context_too_large_error(ValueError("connection refused"))
|
||||
assert not _is_context_too_large_error(RuntimeError("timeout"))
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Tests for _format_messages_for_summary
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestFormatMessagesForSummary:
|
||||
def test_user_assistant_messages(self):
|
||||
from framework.graph.event_loop_node import EventLoopNode
|
||||
|
||||
msgs = [
|
||||
Message(seq=0, role="user", content="Hello world"),
|
||||
Message(seq=1, role="assistant", content="Hi there"),
|
||||
]
|
||||
result = EventLoopNode._format_messages_for_summary(msgs)
|
||||
assert "[user]: Hello world" in result
|
||||
assert "[assistant]: Hi there" in result
|
||||
|
||||
def test_tool_result_truncated(self):
|
||||
from framework.graph.event_loop_node import EventLoopNode
|
||||
|
||||
msgs = [
|
||||
Message(seq=0, role="tool", content="x" * 1000, tool_use_id="c1"),
|
||||
]
|
||||
result = EventLoopNode._format_messages_for_summary(msgs)
|
||||
assert "[tool result]:" in result
|
||||
assert "..." in result
|
||||
# Should be truncated to 500 + "..."
|
||||
assert len(result) < 600
|
||||
|
||||
def test_assistant_with_tool_calls(self):
|
||||
from framework.graph.event_loop_node import EventLoopNode
|
||||
|
||||
tc = [_make_tool_call("c1", "web_search", {"query": "test"})]
|
||||
msgs = [
|
||||
Message(seq=0, role="assistant", content="Searching", tool_calls=tc),
|
||||
]
|
||||
result = EventLoopNode._format_messages_for_summary(msgs)
|
||||
assert "web_search" in result
|
||||
assert "[assistant (calls:" in result
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Tests for _llm_compact (recursive binary-search)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestLlmCompact:
|
||||
"""Test the recursive LLM compaction with mock LLM."""
|
||||
|
||||
def _make_node(self):
|
||||
"""Create a minimal EventLoopNode for testing."""
|
||||
from framework.graph.event_loop_node import EventLoopNode, LoopConfig
|
||||
|
||||
config = LoopConfig(max_history_tokens=32000)
|
||||
node = EventLoopNode.__new__(EventLoopNode)
|
||||
node._config = config
|
||||
node._event_bus = None
|
||||
node._judge = None
|
||||
node._approval_callback = None
|
||||
node._tool_executor = None
|
||||
node._adaptive_learner = None
|
||||
# Set class-level constants (already on class, but explicit)
|
||||
return node
|
||||
|
||||
def _make_ctx(self, llm_responses=None, llm_error=None):
|
||||
"""Create a mock NodeContext with controllable LLM."""
|
||||
from unittest.mock import AsyncMock, MagicMock
|
||||
|
||||
from framework.graph.node import NodeSpec
|
||||
|
||||
spec = NodeSpec(
|
||||
id="test",
|
||||
name="Test Node",
|
||||
description="A test node",
|
||||
node_type="event_loop",
|
||||
input_keys=[],
|
||||
output_keys=["result"],
|
||||
)
|
||||
|
||||
ctx = MagicMock()
|
||||
ctx.node_spec = spec
|
||||
ctx.node_id = "test"
|
||||
ctx.stream_id = "test"
|
||||
ctx.continuous_mode = False
|
||||
ctx.runtime_logger = None
|
||||
|
||||
mock_llm = AsyncMock()
|
||||
if llm_error:
|
||||
mock_llm.acomplete.side_effect = llm_error
|
||||
elif llm_responses:
|
||||
responses = []
|
||||
for text in llm_responses:
|
||||
resp = MagicMock()
|
||||
resp.content = text
|
||||
responses.append(resp)
|
||||
mock_llm.acomplete.side_effect = responses
|
||||
else:
|
||||
resp = MagicMock()
|
||||
resp.content = "Summary of conversation."
|
||||
mock_llm.acomplete.return_value = resp
|
||||
|
||||
ctx.llm = mock_llm
|
||||
return ctx
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_single_call_success(self):
|
||||
node = self._make_node()
|
||||
ctx = self._make_ctx()
|
||||
msgs = [
|
||||
Message(seq=0, role="user", content="Do something"),
|
||||
Message(seq=1, role="assistant", content="Done"),
|
||||
]
|
||||
result = await node._llm_compact(ctx, msgs, None)
|
||||
assert "Summary of conversation." in result
|
||||
ctx.llm.acomplete.assert_called_once()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_context_too_large_triggers_split(self):
|
||||
"""When LLM raises context error, should split and retry."""
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
node = self._make_node()
|
||||
|
||||
call_count = 0
|
||||
|
||||
async def mock_acomplete(**kwargs):
|
||||
nonlocal call_count
|
||||
call_count += 1
|
||||
# First call with full messages → fail
|
||||
# Subsequent calls with smaller chunks → succeed
|
||||
if call_count == 1:
|
||||
raise RuntimeError("This model's maximum context length is 128000 tokens")
|
||||
resp = MagicMock()
|
||||
resp.content = f"Summary part {call_count}"
|
||||
return resp
|
||||
|
||||
ctx = self._make_ctx()
|
||||
ctx.llm.acomplete = mock_acomplete
|
||||
|
||||
msgs = [Message(seq=i, role="user", content=f"Message {i}") for i in range(10)]
|
||||
result = await node._llm_compact(ctx, msgs, None)
|
||||
# Should have split and produced two summaries
|
||||
assert "Summary part" in result
|
||||
assert call_count >= 3 # 1 failure + 2 successful halves
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_non_context_error_propagates(self):
|
||||
"""Non-context errors should propagate, not trigger splitting."""
|
||||
node = self._make_node()
|
||||
ctx = self._make_ctx(llm_error=ValueError("API key invalid"))
|
||||
msgs = [
|
||||
Message(seq=0, role="user", content="Hello"),
|
||||
Message(seq=1, role="assistant", content="Hi"),
|
||||
]
|
||||
with pytest.raises(ValueError, match="API key invalid"):
|
||||
await node._llm_compact(ctx, msgs, None)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_proactive_split_for_large_input(self):
|
||||
"""Messages exceeding char limit should be split proactively."""
|
||||
node = self._make_node()
|
||||
# Lower the limit for testing
|
||||
node._LLM_COMPACT_CHAR_LIMIT = 100
|
||||
|
||||
ctx = self._make_ctx(
|
||||
llm_responses=["Part 1 summary", "Part 2 summary"],
|
||||
)
|
||||
msgs = [
|
||||
Message(seq=0, role="user", content="x" * 80),
|
||||
Message(seq=1, role="user", content="y" * 80),
|
||||
]
|
||||
result = await node._llm_compact(ctx, msgs, None)
|
||||
assert "Part 1 summary" in result
|
||||
assert "Part 2 summary" in result
|
||||
# LLM should have been called twice (no failure, proactive split)
|
||||
assert ctx.llm.acomplete.call_count == 2
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_tool_history_appended_at_top_level(self):
|
||||
"""Tool history should only be appended at depth 0."""
|
||||
node = self._make_node()
|
||||
ctx = self._make_ctx()
|
||||
|
||||
tc = [_make_tool_call("c1", "web_search", {"query": "test"})]
|
||||
msgs = [
|
||||
Message(seq=0, role="assistant", content="", tool_calls=tc),
|
||||
Message(seq=1, role="tool", content="results", tool_use_id="c1"),
|
||||
]
|
||||
result = await node._llm_compact(ctx, msgs, None)
|
||||
assert "TOOLS ALREADY CALLED" in result
|
||||
assert "web_search" in result
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Orphaned tool result repair
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestRepairOrphanedToolCalls:
|
||||
"""Test _repair_orphaned_tool_calls handles both directions."""
|
||||
|
||||
def test_orphaned_tool_result_dropped(self):
|
||||
"""Tool result with no matching tool_use should be dropped."""
|
||||
msgs = [
|
||||
# tool result with no preceding assistant tool_use
|
||||
{"role": "tool", "tool_call_id": "orphan_1", "content": "stale result"},
|
||||
{"role": "user", "content": "hello"},
|
||||
{"role": "assistant", "content": "hi"},
|
||||
]
|
||||
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
|
||||
assert len(repaired) == 2
|
||||
assert repaired[0]["role"] == "user"
|
||||
assert repaired[1]["role"] == "assistant"
|
||||
|
||||
def test_valid_tool_pair_preserved(self):
|
||||
"""Tool result with matching tool_use should be kept."""
|
||||
msgs = [
|
||||
{"role": "user", "content": "search"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"tool_calls": [{"id": "tc_1", "function": {"name": "search", "arguments": "{}"}}],
|
||||
},
|
||||
{"role": "tool", "tool_call_id": "tc_1", "content": "results"},
|
||||
]
|
||||
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
|
||||
assert len(repaired) == 3
|
||||
assert repaired[2]["tool_call_id"] == "tc_1"
|
||||
|
||||
def test_orphaned_tool_use_gets_stub(self):
|
||||
"""Tool use with no following tool result gets a synthetic error stub."""
|
||||
msgs = [
|
||||
{"role": "user", "content": "search"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"tool_calls": [{"id": "tc_1", "function": {"name": "search", "arguments": "{}"}}],
|
||||
},
|
||||
# No tool result follows
|
||||
{"role": "user", "content": "what happened?"},
|
||||
]
|
||||
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
|
||||
# Should insert a synthetic tool result between assistant and user
|
||||
assert len(repaired) == 4
|
||||
assert repaired[2]["role"] == "tool"
|
||||
assert repaired[2]["tool_call_id"] == "tc_1"
|
||||
assert "interrupted" in repaired[2]["content"].lower()
|
||||
|
||||
def test_mixed_orphans(self):
|
||||
"""Both orphaned results and orphaned calls handled together."""
|
||||
msgs = [
|
||||
# Orphaned result (no matching tool_use)
|
||||
{"role": "tool", "tool_call_id": "gone_1", "content": "old result"},
|
||||
{"role": "user", "content": "try again"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"tool_calls": [{"id": "tc_2", "function": {"name": "fetch", "arguments": "{}"}}],
|
||||
},
|
||||
# Missing result for tc_2
|
||||
{"role": "user", "content": "done?"},
|
||||
]
|
||||
repaired = NodeConversation._repair_orphaned_tool_calls(msgs)
|
||||
# orphaned result dropped, stub added for tc_2
|
||||
roles = [m["role"] for m in repaired]
|
||||
assert roles == ["user", "assistant", "tool", "user"]
|
||||
assert repaired[2]["tool_call_id"] == "tc_2"
|
||||
|
||||
@@ -184,7 +184,7 @@ class TestPathTraversalWithActualFiles:
|
||||
|
||||
# Create a secret file outside storage
|
||||
secret_file = tmpdir_path / "secret.txt"
|
||||
secret_file.write_text("SENSITIVE_DATA")
|
||||
secret_file.write_text("SENSITIVE_DATA", encoding="utf-8")
|
||||
|
||||
storage = FileStorage(storage_dir)
|
||||
|
||||
@@ -193,7 +193,7 @@ class TestPathTraversalWithActualFiles:
|
||||
storage.get_runs_by_goal("../secret")
|
||||
|
||||
# Verify the secret file was not accessed (still contains original data)
|
||||
assert secret_file.read_text() == "SENSITIVE_DATA"
|
||||
assert secret_file.read_text(encoding="utf-8") == "SENSITIVE_DATA"
|
||||
|
||||
def test_cannot_write_outside_storage(self):
|
||||
"""Verify that we can't write files outside storage directory."""
|
||||
|
||||
@@ -353,7 +353,9 @@ class TestRuntimeLogger:
|
||||
# Verify the file exists and has one line
|
||||
jsonl_path = tmp_path / "logs" / "sessions" / run_id / "logs" / "tool_logs.jsonl"
|
||||
assert jsonl_path.exists()
|
||||
lines = [line for line in jsonl_path.read_text().strip().split("\n") if line]
|
||||
lines = [
|
||||
line for line in jsonl_path.read_text(encoding="utf-8").strip().split("\n") if line
|
||||
]
|
||||
assert len(lines) == 1
|
||||
|
||||
data = json.loads(lines[0])
|
||||
@@ -376,7 +378,8 @@ class TestRuntimeLogger:
|
||||
|
||||
jsonl_path = tmp_path / "logs" / "sessions" / run_id / "logs" / "details.jsonl"
|
||||
assert jsonl_path.exists()
|
||||
lines = [line for line in jsonl_path.read_text().strip().split("\n") if line]
|
||||
content = jsonl_path.read_text(encoding="utf-8").strip()
|
||||
lines = [line for line in content.split("\n") if line]
|
||||
assert len(lines) == 1
|
||||
|
||||
data = json.loads(lines[0])
|
||||
|
||||
@@ -98,7 +98,7 @@ class TestFileStorageRunOperations:
|
||||
assert run_file.exists()
|
||||
|
||||
# Verify it's valid JSON
|
||||
with open(run_file) as f:
|
||||
with open(run_file, encoding="utf-8") as f:
|
||||
data = json.load(f)
|
||||
assert data["id"] == "my_run"
|
||||
|
||||
|
||||
+14
-3
@@ -71,6 +71,7 @@ def main():
|
||||
capture_output=True,
|
||||
text=True,
|
||||
check=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
framework_path = result.stdout.strip()
|
||||
success(f"installed at {framework_path}")
|
||||
@@ -84,7 +85,12 @@ def main():
|
||||
missing_deps = []
|
||||
for dep in ["mcp", "fastmcp"]:
|
||||
try:
|
||||
subprocess.run([sys.executable, "-c", f"import {dep}"], capture_output=True, check=True)
|
||||
subprocess.run(
|
||||
[sys.executable, "-c", f"import {dep}"],
|
||||
capture_output=True,
|
||||
check=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
except subprocess.CalledProcessError:
|
||||
missing_deps.append(dep)
|
||||
|
||||
@@ -103,6 +109,7 @@ def main():
|
||||
capture_output=True,
|
||||
text=True,
|
||||
check=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
success("loads successfully")
|
||||
except subprocess.CalledProcessError as e:
|
||||
@@ -115,7 +122,7 @@ def main():
|
||||
mcp_config = script_dir / ".mcp.json"
|
||||
if mcp_config.exists():
|
||||
try:
|
||||
with open(mcp_config) as f:
|
||||
with open(mcp_config, encoding="utf-8") as f:
|
||||
config = json.load(f)
|
||||
|
||||
if "mcpServers" in config and "agent-builder" in config["mcpServers"]:
|
||||
@@ -149,7 +156,10 @@ def main():
|
||||
for module in modules_to_check:
|
||||
try:
|
||||
subprocess.run(
|
||||
[sys.executable, "-c", f"import {module}"], capture_output=True, check=True
|
||||
[sys.executable, "-c", f"import {module}"],
|
||||
capture_output=True,
|
||||
check=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
except subprocess.CalledProcessError:
|
||||
failed_modules.append(module)
|
||||
@@ -174,6 +184,7 @@ def main():
|
||||
text=True,
|
||||
check=True,
|
||||
timeout=5,
|
||||
encoding="utf-8",
|
||||
)
|
||||
if "OK" in result.stdout:
|
||||
success("server can start")
|
||||
|
||||
+19
-2
@@ -27,8 +27,22 @@ uv run python -c "import framework; import aden_tools; print('✓ Setup complete
|
||||
|
||||
## Building Your First Agent
|
||||
|
||||
Agents are not included by default in a fresh clone.
|
||||
|
||||
Agents are created using Claude Code or by manual creation in the
|
||||
exports/ directory. Until an agent exists, agent validation and run
|
||||
commands will fail.
|
||||
|
||||
### Option 1: Using Claude Code Skills (Recommended)
|
||||
|
||||
This is the recommended way to create your first agent.
|
||||
|
||||
**Requirements**
|
||||
|
||||
- Anthropic (Claude) API access
|
||||
- Claude Code CLI installed
|
||||
- Unix-based shell (macOS, Linux, or Windows via WSL)
|
||||
|
||||
```bash
|
||||
# Setup already done via quickstart.sh above
|
||||
|
||||
@@ -120,7 +134,10 @@ hive/
|
||||
## Running an Agent
|
||||
|
||||
```bash
|
||||
# Browse and run agents interactively (Recommended)
|
||||
# Launch the web dashboard in your browser
|
||||
hive open
|
||||
|
||||
# Browse and run agents in terminal
|
||||
hive tui
|
||||
|
||||
# Run a specific agent
|
||||
@@ -164,7 +181,7 @@ PYTHONPATH=exports uv run python -m my_agent test --type success
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **TUI Dashboard**: Run `hive tui` to explore agents interactively
|
||||
1. **Dashboard**: Run `hive open` to launch the web dashboard, or `hive tui` for the terminal UI
|
||||
2. **Detailed Setup**: See [environment-setup.md](./environment-setup.md)
|
||||
3. **Developer Guide**: See [developer-guide.md](./developer-guide.md)
|
||||
4. **Build Agents**: Use `/hive` skill in Claude Code
|
||||
|
||||
@@ -37,8 +37,6 @@ Ported from `agent_builder_server.py` lines 3484-3856. Pure filesystem reads —
|
||||
| Tool | Purpose |
|
||||
|------|---------|
|
||||
| `list_agent_sessions(agent_name, status?, limit?)` | List sessions, filterable by status |
|
||||
| `get_agent_session_state(agent_name, session_id)` | Full session state (memory excluded to prevent context bloat) |
|
||||
| `get_agent_session_memory(agent_name, session_id, key?)` | Read memory contents from a session |
|
||||
| `list_agent_checkpoints(agent_name, session_id)` | List checkpoints for debugging |
|
||||
| `get_agent_checkpoint(agent_name, session_id, checkpoint_id?)` | Load a checkpoint's full state |
|
||||
|
||||
@@ -67,7 +65,7 @@ Add all 8 tools after the existing `undo_changes` tool:
|
||||
|
||||
# ── Meta-agent: Session & checkpoint inspection ───────────────
|
||||
# _resolve_hive_agent_path(), _read_session_json(), _scan_agent_sessions(), _truncate_value()
|
||||
# list_agent_sessions(), get_agent_session_state(), get_agent_session_memory()
|
||||
# list_agent_sessions(), list_agent_checkpoints(), get_agent_checkpoint()
|
||||
# list_agent_checkpoints(), get_agent_checkpoint()
|
||||
|
||||
# ── Meta-agent: Test execution ────────────────────────────────
|
||||
|
||||
@@ -43,7 +43,7 @@ Dedicated tool server providing:
|
||||
- **File I/O**: `read_file` (with line numbers, offset/limit), `write_file` (auto-mkdir), `edit_file` (9-strategy fuzzy matching ported from opencode), `list_directory`, `search_files` (regex)
|
||||
- **Shell**: `run_command` (timeout, cwd, output truncation)
|
||||
- **Git**: `undo_changes` (snapshot-based rollback)
|
||||
- **Meta-agent**: `discover_mcp_tools`, `list_agents`, `list_agent_sessions`, `get_agent_session_state`, `get_agent_session_memory`, `list_agent_checkpoints`, `get_agent_checkpoint`, `run_agent_tests`
|
||||
- **Meta-agent**: `discover_mcp_tools`, `list_agents`, `list_agent_sessions`, `list_agent_checkpoints`, `get_agent_checkpoint`, `run_agent_tests`
|
||||
|
||||
All file operations sandboxed to a configurable project root.
|
||||
|
||||
|
||||
@@ -16,7 +16,7 @@ The agent is deeply integrated with the framework: it can discover available MCP
|
||||
- **`reference/`** — Framework guide, file templates, and anti-patterns docs embedded as agent reference material
|
||||
|
||||
### New: Coder Tools MCP Server (`tools/coder_tools_server.py`)
|
||||
- 1500-line MCP server providing 15 tools: `read_file`, `write_file`, `edit_file` (with opencode-style 9-strategy fuzzy matching), `list_directory`, `search_files`, `run_command`, `undo_changes`, `discover_mcp_tools`, `list_agents`, `list_agent_sessions`, `get_agent_session_state`, `get_agent_session_memory`, `list_agent_checkpoints`, `get_agent_checkpoint`, `run_agent_tests`
|
||||
- 1500-line MCP server providing 13 tools: `read_file`, `write_file`, `edit_file` (with opencode-style 9-strategy fuzzy matching), `list_directory`, `search_files`, `run_command`, `undo_changes`, `discover_mcp_tools`, `list_agents`, `list_agent_sessions`, `list_agent_checkpoints`, `get_agent_checkpoint`, `run_agent_tests`
|
||||
- Path-scoped security: all file operations sandboxed to project root
|
||||
- Git-based undo: automatic snapshots before writes with `undo_changes` rollback
|
||||
|
||||
|
||||
+5
-5
@@ -145,7 +145,7 @@ Implement the core execution engine where every Agent operates as an isolated, a
|
||||
- [x] SharedState manager (runtime/shared_state.py)
|
||||
- [x] Session-based storage (storage/session_store.py)
|
||||
- [x] Isolation levels: ISOLATED, SHARED, SYNCHRONIZED
|
||||
- [ ] **Default Monitoring Hooks**
|
||||
- [x] **Default Monitoring Hooks**
|
||||
- [ ] Performance metrics collection
|
||||
- [ ] Resource usage tracking
|
||||
- [ ] Health check endpoints
|
||||
@@ -590,7 +590,7 @@ Write the Quick Start guide, detailed tool usage documentation, and set up the M
|
||||
- [x] README with examples
|
||||
- [x] Contributing guidelines
|
||||
- [x] GitHub Page setup
|
||||
- [ ] **Tool Usage Documentation**
|
||||
- [x] **Tool Usage Documentation**
|
||||
- [ ] Comprehensive tool documentation
|
||||
- [ ] Tool integration examples
|
||||
- [ ] Best practices guide
|
||||
@@ -643,7 +643,7 @@ Expose basic REST/WebSocket endpoints for external control (Start, Stop, Pause,
|
||||
- [x] Load/unload/start/restart in AgentRuntime
|
||||
- [x] State persistence
|
||||
- [x] Recovery mechanisms
|
||||
- [ ] **REST API Endpoints**
|
||||
- [x] **REST API Endpoints**
|
||||
- [ ] Start endpoint for agent execution
|
||||
- [ ] Stop endpoint for graceful shutdown
|
||||
- [ ] Pause endpoint for execution suspension
|
||||
@@ -661,7 +661,7 @@ Implement automated test execution, agent version control, and mandatory test-pa
|
||||
- [x] Test framework with pytest integration (testing/)
|
||||
- [x] Test result reporting
|
||||
- [x] Test CLI commands (test-run, test-debug, etc.)
|
||||
- [ ] **Automated Testing Pipeline**
|
||||
- [x] **Automated Testing Pipeline**
|
||||
- [ ] CI integration (GitHub Actions, etc.)
|
||||
- [ ] Mandatory test-passing gates
|
||||
- [ ] Coverage reporting
|
||||
@@ -873,7 +873,7 @@ Build native frontend configurations to easily connect Open Hive's backend to lo
|
||||
- [ ] Node.js runtime support
|
||||
- [ ] Browser runtime support
|
||||
- [ ] **Platform Compatibility**
|
||||
- [ ] Windows support improvements
|
||||
- [x] Windows support improvements
|
||||
- [ ] macOS optimization
|
||||
- [ ] Linux distribution support
|
||||
|
||||
|
||||
@@ -1,59 +0,0 @@
|
||||
# TUI Dashboard Guide
|
||||
|
||||
## Launching the TUI
|
||||
|
||||
There are two ways to launch the TUI dashboard:
|
||||
|
||||
```bash
|
||||
# Browse and select an agent interactively
|
||||
hive tui
|
||||
|
||||
# Launch the TUI for a specific agent
|
||||
hive run exports/my_agent --tui
|
||||
```
|
||||
|
||||
`hive tui` scans both `exports/` and `examples/templates/` for available agents, then presents a selection menu.
|
||||
|
||||
## Dashboard Panels
|
||||
|
||||
The TUI dashboard is divided into four areas:
|
||||
|
||||
- **Status Bar** - Shows the current agent name, execution state, and model in use
|
||||
- **Graph Overview** - Live visualization of the agent's node graph with highlighted active node
|
||||
- **Log Pane** - Scrollable event log streaming node transitions, LLM calls, and tool outputs
|
||||
- **Chat REPL** - Input area for interacting with client-facing nodes (`ask_user()` prompts appear here)
|
||||
|
||||
## Keybindings
|
||||
|
||||
| Key | Action |
|
||||
|---------------|-----------------------|
|
||||
| `Tab` | Next panel |
|
||||
| `Shift+Tab` | Previous panel |
|
||||
| `Ctrl+S` | Save SVG screenshot |
|
||||
| `Ctrl+O` | Command palette |
|
||||
| `Q` | Quit |
|
||||
|
||||
## Panel Cycle Order
|
||||
|
||||
`Tab` cycles: **Log Pane → Graph View → Chat Input**
|
||||
|
||||
## Text Selection
|
||||
|
||||
Textual apps capture the mouse, so normal click-drag selection won't work by default. To select and copy text from any pane:
|
||||
|
||||
1. **Hold `Shift`** while clicking and dragging — this bypasses Textual's mouse capture and lets your terminal handle selection natively.
|
||||
2. Copy with your terminal's shortcut (`Cmd+C` on macOS, `Ctrl+Shift+C` on most Linux terminals).
|
||||
|
||||
## Log Pane Scrolling
|
||||
|
||||
The log pane uses `auto_scroll=False`. New output only scrolls to the bottom when you are already at the bottom of the log. If you've scrolled up to read earlier output, it stays in place.
|
||||
|
||||
## Screenshots
|
||||
|
||||
`Ctrl+S` saves an SVG screenshot to the `screenshots/` directory with a timestamped filename. Open the SVG in any browser to view it.
|
||||
|
||||
## Tips
|
||||
|
||||
- Use `--mock` mode to explore agent execution without spending API credits: `hive run exports/my_agent --tui --mock`
|
||||
- Override the default model with `--model`: `hive run exports/my_agent --model gpt-4o`
|
||||
- Screenshots are saved as SVG files to `screenshots/` and can be opened in any browser
|
||||
@@ -191,7 +191,7 @@ Both events are handled in the cross-graph filter (events from non-active graphs
|
||||
|
||||
## Known Gaps
|
||||
|
||||
**Gap 1 — Resolved.** The queen is now the full `HiveCoderAgent` graph (not a minimal hand-assembled subset). `_load_judge_and_queen` calls `HiveCoderAgent._setup(mock_mode=True)` to load hive-tools MCP, then merges those tools into the worker runtime alongside monitoring tools. When the operator connects via Ctrl+Q, they get `coder_node` with `read_file`, `write_file`, `run_command`, `restart_agent`, `get_agent_session_state`, and all other hive-tools. The `ticket_triage_node` still handles auto-triage on ticket events. `self._queen_agent` is held on the TUI instance to keep the MCP process alive.
|
||||
**Gap 1 — Resolved.** The queen is now the full `HiveCoderAgent` graph (not a minimal hand-assembled subset). `_load_judge_and_queen` calls `HiveCoderAgent._setup(mock_mode=True)` to load hive-tools MCP, then merges those tools into the worker runtime alongside monitoring tools. When the operator connects via Ctrl+Q, they get `coder_node` with `read_file`, `write_file`, `run_command`, `restart_agent`, and all other hive-tools. The `ticket_triage_node` still handles auto-triage on ticket events. `self._queen_agent` is held on the TUI instance to keep the MCP process alive.
|
||||
|
||||
**Gap 2 — LLM-hang detection latency.**
|
||||
If the worker's LLM call hangs (API never returns), no new log entries are written. The judge detects this on its next timer tick (≤2 min). Bounded latency, not zero.
|
||||
|
||||
@@ -8,7 +8,7 @@ from framework.graph.executor import ExecutionResult
|
||||
from framework.graph.checkpoint_config import CheckpointConfig
|
||||
from framework.llm import LiteLLMProvider
|
||||
from framework.runner.tool_registry import ToolRegistry
|
||||
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
|
||||
from framework.runtime.agent_runtime import create_agent_runtime
|
||||
from framework.runtime.execution_stream import EntryPointSpec
|
||||
|
||||
from .config import default_config, metadata
|
||||
@@ -90,7 +90,7 @@ edges = [
|
||||
source="confirm-draft",
|
||||
target="intake",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="batch_complete == True and send_started == True and send_count >= 1 and sent_message_ids is not None and len(sent_message_ids) >= 1",
|
||||
condition_expr="batch_complete == True",
|
||||
priority=1,
|
||||
),
|
||||
]
|
||||
@@ -251,9 +251,7 @@ class EmailReplyAgent:
|
||||
errors.append(f"Terminal node '{t}' not found")
|
||||
for ep_id, nid in self.entry_points.items():
|
||||
if nid not in node_ids:
|
||||
errors.append(
|
||||
f"Entry point '{ep_id}' references unknown node '{nid}'"
|
||||
)
|
||||
errors.append(f"Entry point '{ep_id}' references unknown node '{nid}'")
|
||||
return {"valid": len(errors) == 0, "errors": errors, "warnings": warnings}
|
||||
|
||||
|
||||
|
||||
@@ -36,7 +36,9 @@ default_config = RuntimeConfig()
|
||||
class AgentMetadata:
|
||||
name: str = "Email Reply Agent"
|
||||
version: str = "1.0.0"
|
||||
description: str = "Filter unreplied emails, confirm recipients, send personalized replies."
|
||||
description: str = (
|
||||
"Filter unreplied emails, confirm recipients, send personalized replies."
|
||||
)
|
||||
intro_message: str = "Tell me which emails you want to reply to (e.g., 'emails from @company.com in the last week')."
|
||||
|
||||
|
||||
|
||||
@@ -83,8 +83,8 @@ confirm_draft_node = NodeSpec(
|
||||
client_facing=True,
|
||||
max_node_visits=0,
|
||||
input_keys=["email_list", "filter_criteria"],
|
||||
output_keys=["batch_complete", "restart", "send_started", "send_count", "sent_message_ids", "send_failures"],
|
||||
nullable_output_keys=["batch_complete", "restart", "send_started", "send_count", "sent_message_ids", "send_failures"],
|
||||
output_keys=["batch_complete", "restart"],
|
||||
nullable_output_keys=["batch_complete", "restart"],
|
||||
success_criteria="User confirmed recipients and personalized replies sent for each.",
|
||||
system_prompt="""\
|
||||
You are a Gmail reply assistant. Present emails for confirmation, then send personalized replies.
|
||||
@@ -99,22 +99,14 @@ You are a Gmail reply assistant. Present emails for confirmation, then send pers
|
||||
**STEP 2 — Handle user response:**
|
||||
|
||||
If user CONFIRMS (says yes, go ahead, sounds good, etc.):
|
||||
1. Immediately call set_output("send_started", True) before any send tools.
|
||||
2. For EACH email in email_list, call gmail_reply_email with:
|
||||
For EACH email in email_list:
|
||||
1. Read the subject and snippet
|
||||
2. Use tone_guidance from filter_criteria + any user-specified preferences
|
||||
3. Call gmail_reply_email with:
|
||||
- message_id: the email's message_id
|
||||
- html: personalized 2-4 sentence reply based on email context, using tone_guidance from filter_criteria and any new user preferences.
|
||||
3. Track send results during this run:
|
||||
- send_count: number of successful gmail_reply_email calls
|
||||
- sent_message_ids: list of message_ids successfully replied to
|
||||
- send_failures: list of {"message_id": "...", "error": "..."} for failed sends
|
||||
4. REQUIRED completion gate:
|
||||
- You MUST NOT set batch_complete=True unless send_started is True AND send_count >= 1 AND sent_message_ids is non-empty.
|
||||
- If no sends succeeded, do NOT set batch_complete=True. Instead explain what failed and ask user whether to retry or restart.
|
||||
5. After successful sends, call set_output in a separate turn:
|
||||
- set_output("send_count", <int>)
|
||||
- set_output("sent_message_ids", <list>)
|
||||
- set_output("send_failures", <list>)
|
||||
- set_output("batch_complete", True)
|
||||
- html: personalized 2-4 sentence reply based on email context
|
||||
(The tool automatically handles recipient, subject, and threading)
|
||||
4. After all replies sent, call: set_output("batch_complete", True)
|
||||
|
||||
If user wants to CHANGE LOGIC/FILTER (says change filter, different criteria, not these emails, wrong emails, etc.):
|
||||
1. Acknowledge their request
|
||||
|
||||
@@ -1,7 +1,5 @@
|
||||
"""Structural tests for Email Reply Agent."""
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
class TestAgentStructure:
|
||||
"""Test agent graph structure."""
|
||||
|
||||
@@ -78,5 +78,6 @@ if (-not $env:HIVE_CREDENTIAL_KEY) {
|
||||
}
|
||||
|
||||
# ── Run the Hive CLI ────────────────────────────────────────────────
|
||||
|
||||
# PYTHONUTF8=1: use UTF-8 for default encoding (fixes charmap decode errors on Windows)
|
||||
$env:PYTHONUTF8 = "1"
|
||||
& uv run hive @args
|
||||
|
||||
+261
-62
@@ -130,8 +130,8 @@ function Test-DefenderExclusions {
|
||||
|
||||
# Normalize and filter null/empty values
|
||||
$safePrefixes = $safePrefixes | Where-Object { $_ } | ForEach-Object {
|
||||
[System.IO.Path]::GetFullPath($_)
|
||||
}
|
||||
try { [System.IO.Path]::GetFullPath($_) } catch { $null }
|
||||
} | Where-Object { $_ }
|
||||
|
||||
try {
|
||||
# Check if Defender cmdlets are available (may not exist on older Windows)
|
||||
@@ -157,15 +157,20 @@ function Test-DefenderExclusions {
|
||||
$existing = $prefs.ExclusionPath
|
||||
if (-not $existing) { $existing = @() }
|
||||
|
||||
# Normalize existing paths for comparison
|
||||
# Normalize existing paths for comparison (some may contain wildcards
|
||||
# or env vars that GetFullPath rejects — skip those gracefully)
|
||||
$existing = $existing | Where-Object { $_ } | ForEach-Object {
|
||||
[System.IO.Path]::GetFullPath($_)
|
||||
try { [System.IO.Path]::GetFullPath($_) } catch { $_ }
|
||||
}
|
||||
|
||||
# Normalize paths and find missing exclusions
|
||||
$missing = @()
|
||||
foreach ($path in $Paths) {
|
||||
$normalized = [System.IO.Path]::GetFullPath($path)
|
||||
try {
|
||||
$normalized = [System.IO.Path]::GetFullPath($path)
|
||||
} catch {
|
||||
continue # Skip paths with unsupported format
|
||||
}
|
||||
|
||||
# Security: Ensure path is within safe boundaries
|
||||
$isSafe = $false
|
||||
@@ -250,7 +255,11 @@ function Add-DefenderExclusions {
|
||||
|
||||
foreach ($path in $Paths) {
|
||||
try {
|
||||
$normalized = [System.IO.Path]::GetFullPath($path)
|
||||
try {
|
||||
$normalized = [System.IO.Path]::GetFullPath($path)
|
||||
} catch {
|
||||
$normalized = $path # Use raw path if normalization fails
|
||||
}
|
||||
Add-MpPreference -ExclusionPath $normalized -ErrorAction Stop
|
||||
$added += $normalized
|
||||
} catch {
|
||||
@@ -408,6 +417,58 @@ Write-Ok "uv detected: $uvVersion"
|
||||
Write-Host ""
|
||||
|
||||
# Check for Node.js (needed for frontend dashboard)
|
||||
function Install-NodeViaFnm {
|
||||
<#
|
||||
.SYNOPSIS
|
||||
Install Node.js 20 via fnm (Fast Node Manager) - mirrors nvm approach in quickstart.sh
|
||||
#>
|
||||
$fnmCmd = Get-Command fnm -ErrorAction SilentlyContinue
|
||||
if (-not $fnmCmd) {
|
||||
$fnmDir = Join-Path $env:LOCALAPPDATA "fnm"
|
||||
$fnmExe = Join-Path $fnmDir "fnm.exe"
|
||||
if (-not (Test-Path $fnmExe)) {
|
||||
try {
|
||||
Write-Host " Downloading fnm (Fast Node Manager)..." -ForegroundColor DarkGray
|
||||
$zipUrl = "https://github.com/Schniz/fnm/releases/latest/download/fnm-windows.zip"
|
||||
$zipPath = Join-Path $env:TEMP "fnm-install.zip"
|
||||
Invoke-WebRequest -Uri $zipUrl -OutFile $zipPath -UseBasicParsing -ErrorAction Stop
|
||||
if (-not (Test-Path $fnmDir)) { New-Item -ItemType Directory -Path $fnmDir -Force | Out-Null }
|
||||
Expand-Archive -Path $zipPath -DestinationPath $fnmDir -Force
|
||||
Remove-Item $zipPath -Force -ErrorAction SilentlyContinue
|
||||
} catch {
|
||||
Write-Fail "fnm download failed"
|
||||
Write-Host " Install Node.js 20+ manually from https://nodejs.org" -ForegroundColor DarkGray
|
||||
return $false
|
||||
}
|
||||
}
|
||||
if (Test-Path (Join-Path $fnmDir "fnm.exe")) {
|
||||
$env:PATH = "$fnmDir;$env:PATH"
|
||||
} else {
|
||||
Write-Fail "fnm binary not found after download"
|
||||
Write-Host " Install Node.js 20+ manually from https://nodejs.org" -ForegroundColor DarkGray
|
||||
return $false
|
||||
}
|
||||
}
|
||||
|
||||
try {
|
||||
$null = & fnm install 20 2>&1
|
||||
if ($LASTEXITCODE -ne 0) { throw "fnm install 20 exited with code $LASTEXITCODE" }
|
||||
& fnm env --use-on-cd --shell powershell | Out-String | Invoke-Expression
|
||||
$null = & fnm use 20 2>&1
|
||||
$testNode = Get-Command node -ErrorAction SilentlyContinue
|
||||
if ($testNode) {
|
||||
$ver = & node --version 2>$null
|
||||
Write-Ok "Node.js $ver installed via fnm"
|
||||
return $true
|
||||
}
|
||||
throw "node not found after fnm install"
|
||||
} catch {
|
||||
Write-Fail "Node.js installation failed"
|
||||
Write-Host " Install manually from https://nodejs.org" -ForegroundColor DarkGray
|
||||
return $false
|
||||
}
|
||||
}
|
||||
|
||||
$NodeAvailable = $false
|
||||
$nodeCmd = Get-Command node -ErrorAction SilentlyContinue
|
||||
if ($nodeCmd) {
|
||||
@@ -419,12 +480,13 @@ if ($nodeCmd) {
|
||||
$NodeAvailable = $true
|
||||
} else {
|
||||
Write-Warn "Node.js $nodeVersion found (20+ required for frontend dashboard)"
|
||||
Write-Host " Install from https://nodejs.org" -ForegroundColor DarkGray
|
||||
Write-Host " Installing Node.js 20 via fnm..." -ForegroundColor Yellow
|
||||
$NodeAvailable = Install-NodeViaFnm
|
||||
}
|
||||
}
|
||||
} else {
|
||||
Write-Warn "Node.js not found (optional, needed for web dashboard)"
|
||||
Write-Host " Install from https://nodejs.org" -ForegroundColor DarkGray
|
||||
Write-Warn "Node.js not found. Installing via fnm..."
|
||||
$NodeAvailable = Install-NodeViaFnm
|
||||
}
|
||||
Write-Host ""
|
||||
|
||||
@@ -736,8 +798,8 @@ $ProviderMap = [ordered]@{
|
||||
}
|
||||
|
||||
$DefaultModels = @{
|
||||
anthropic = "claude-opus-4-6"
|
||||
openai = "gpt-5.2"
|
||||
anthropic = "claude-haiku-4-5-20251001"
|
||||
openai = "gpt-5-mini"
|
||||
gemini = "gemini-3-flash-preview"
|
||||
groq = "moonshotai/kimi-k2-instruct-0905"
|
||||
cerebras = "zai-glm-4.7"
|
||||
@@ -749,14 +811,14 @@ $DefaultModels = @{
|
||||
# Model choices: array of hashtables per provider
|
||||
$ModelChoices = @{
|
||||
anthropic = @(
|
||||
@{ Id = "claude-opus-4-6"; Label = "Opus 4.6 - Most capable (recommended)"; MaxTokens = 32768 },
|
||||
@{ Id = "claude-sonnet-4-5-20250929"; Label = "Sonnet 4.5 - Best balance"; MaxTokens = 16384 },
|
||||
@{ Id = "claude-sonnet-4-20250514"; Label = "Sonnet 4 - Fast + capable"; MaxTokens = 8192 },
|
||||
@{ Id = "claude-haiku-4-5-20251001"; Label = "Haiku 4.5 - Fast + cheap"; MaxTokens = 8192 }
|
||||
@{ Id = "claude-haiku-4-5-20251001"; Label = "Haiku 4.5 - Fast + cheap (recommended)"; MaxTokens = 8192 },
|
||||
@{ Id = "claude-sonnet-4-20250514"; Label = "Sonnet 4 - Fast + capable"; MaxTokens = 8192 },
|
||||
@{ Id = "claude-sonnet-4-5-20250929"; Label = "Sonnet 4.5 - Best balance"; MaxTokens = 16384 },
|
||||
@{ Id = "claude-opus-4-6"; Label = "Opus 4.6 - Most capable"; MaxTokens = 32768 }
|
||||
)
|
||||
openai = @(
|
||||
@{ Id = "gpt-5.2"; Label = "GPT-5.2 - Most capable (recommended)"; MaxTokens = 16384 },
|
||||
@{ Id = "gpt-5-mini"; Label = "GPT-5 Mini - Fast + cheap"; MaxTokens = 16384 }
|
||||
@{ Id = "gpt-5-mini"; Label = "GPT-5 Mini - Fast + cheap (recommended)"; MaxTokens = 16384 },
|
||||
@{ Id = "gpt-5.2"; Label = "GPT-5.2 - Most capable"; MaxTokens = 16384 }
|
||||
)
|
||||
gemini = @(
|
||||
@{ Id = "gemini-3-flash-preview"; Label = "Gemini 3 Flash - Fast (recommended)"; MaxTokens = 8192 },
|
||||
@@ -783,6 +845,17 @@ function Get-ModelSelection {
|
||||
return @{ Model = $choices[0].Id; MaxTokens = $choices[0].MaxTokens }
|
||||
}
|
||||
|
||||
# Find default index from previous model (if same provider)
|
||||
$defaultIdx = "1"
|
||||
if ($PrevModel -and $PrevProvider -eq $ProviderId) {
|
||||
for ($j = 0; $j -lt $choices.Count; $j++) {
|
||||
if ($choices[$j].Id -eq $PrevModel) {
|
||||
$defaultIdx = [string]($j + 1)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Write-Host ""
|
||||
Write-Color -Text "Select a model:" -Color White
|
||||
Write-Host ""
|
||||
@@ -794,8 +867,8 @@ function Get-ModelSelection {
|
||||
Write-Host ""
|
||||
|
||||
while ($true) {
|
||||
$raw = Read-Host "Enter choice [1]"
|
||||
if ([string]::IsNullOrWhiteSpace($raw)) { $raw = "1" }
|
||||
$raw = Read-Host "Enter choice [$defaultIdx]"
|
||||
if ([string]::IsNullOrWhiteSpace($raw)) { $raw = $defaultIdx }
|
||||
if ($raw -match '^\d+$') {
|
||||
$num = [int]$raw
|
||||
if ($num -ge 1 -and $num -le $choices.Count) {
|
||||
@@ -851,6 +924,60 @@ $ProviderMenuUrls = @(
|
||||
"https://cloud.cerebras.ai/"
|
||||
)
|
||||
|
||||
# ── Read previous configuration (if any) ──────────────────────
|
||||
$PrevProvider = ""
|
||||
$PrevModel = ""
|
||||
$PrevEnvVar = ""
|
||||
$PrevSubMode = ""
|
||||
if (Test-Path $HiveConfigFile) {
|
||||
try {
|
||||
$prevConfig = Get-Content -Path $HiveConfigFile -Raw | ConvertFrom-Json
|
||||
$prevLlm = $prevConfig.llm
|
||||
if ($prevLlm) {
|
||||
$PrevProvider = if ($prevLlm.provider) { $prevLlm.provider } else { "" }
|
||||
$PrevModel = if ($prevLlm.model) { $prevLlm.model } else { "" }
|
||||
$PrevEnvVar = if ($prevLlm.api_key_env_var) { $prevLlm.api_key_env_var } else { "" }
|
||||
if ($prevLlm.use_claude_code_subscription) { $PrevSubMode = "claude_code" }
|
||||
elseif ($prevLlm.use_codex_subscription) { $PrevSubMode = "codex" }
|
||||
elseif ($prevLlm.api_base -and $prevLlm.api_base -like "*api.z.ai*") { $PrevSubMode = "zai_code" }
|
||||
}
|
||||
} catch { }
|
||||
}
|
||||
|
||||
# Compute default menu number (only if credential is still valid)
|
||||
$DefaultChoice = ""
|
||||
if ($PrevSubMode -or $PrevProvider) {
|
||||
$prevCredValid = $false
|
||||
switch ($PrevSubMode) {
|
||||
"claude_code" { if ($ClaudeCredDetected) { $prevCredValid = $true } }
|
||||
"zai_code" { if ($ZaiCredDetected) { $prevCredValid = $true } }
|
||||
"codex" { if ($CodexCredDetected) { $prevCredValid = $true } }
|
||||
default {
|
||||
if ($PrevEnvVar) {
|
||||
$envVal = [System.Environment]::GetEnvironmentVariable($PrevEnvVar, "Process")
|
||||
if (-not $envVal) { $envVal = [System.Environment]::GetEnvironmentVariable($PrevEnvVar, "User") }
|
||||
if ($envVal) { $prevCredValid = $true }
|
||||
}
|
||||
}
|
||||
}
|
||||
if ($prevCredValid) {
|
||||
switch ($PrevSubMode) {
|
||||
"claude_code" { $DefaultChoice = "1" }
|
||||
"zai_code" { $DefaultChoice = "2" }
|
||||
"codex" { $DefaultChoice = "3" }
|
||||
}
|
||||
if (-not $DefaultChoice) {
|
||||
switch ($PrevProvider) {
|
||||
"anthropic" { $DefaultChoice = "4" }
|
||||
"openai" { $DefaultChoice = "5" }
|
||||
"gemini" { $DefaultChoice = "6" }
|
||||
"groq" { $DefaultChoice = "7" }
|
||||
"cerebras" { $DefaultChoice = "8" }
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# ── Show unified provider selection menu ─────────────────────
|
||||
Write-Color -Text "Select your default LLM provider:" -Color White
|
||||
Write-Host ""
|
||||
@@ -896,8 +1023,18 @@ Write-Color -Text "9" -Color Cyan -NoNewline
|
||||
Write-Host ") Skip for now"
|
||||
Write-Host ""
|
||||
|
||||
if ($DefaultChoice) {
|
||||
Write-Color -Text " Previously configured: $PrevProvider/$PrevModel. Press Enter to keep." -Color DarkGray
|
||||
Write-Host ""
|
||||
}
|
||||
|
||||
while ($true) {
|
||||
$raw = Read-Host "Enter choice (1-9)"
|
||||
if ($DefaultChoice) {
|
||||
$raw = Read-Host "Enter choice (1-9) [$DefaultChoice]"
|
||||
if ([string]::IsNullOrWhiteSpace($raw)) { $raw = $DefaultChoice }
|
||||
} else {
|
||||
$raw = Read-Host "Enter choice (1-9)"
|
||||
}
|
||||
if ($raw -match '^\d+$') {
|
||||
$num = [int]$raw
|
||||
if ($num -ge 1 -and $num -le 9) { break }
|
||||
@@ -974,28 +1111,68 @@ switch ($num) {
|
||||
$providerName = $ProviderMenuNames[$provIdx] -replace ' - .*', '' # strip description
|
||||
$signupUrl = $ProviderMenuUrls[$provIdx]
|
||||
|
||||
# Check if key is already set
|
||||
$existingKey = [System.Environment]::GetEnvironmentVariable($SelectedEnvVar, "User")
|
||||
if (-not $existingKey) { $existingKey = [System.Environment]::GetEnvironmentVariable($SelectedEnvVar, "Process") }
|
||||
if (-not $existingKey) {
|
||||
Write-Host ""
|
||||
Write-Host "Get your API key from: " -NoNewline
|
||||
Write-Color -Text $signupUrl -Color Cyan
|
||||
Write-Host ""
|
||||
$apiKey = Read-Host "Paste your $providerName API key (or press Enter to skip)"
|
||||
# Prompt for key (allow replacement if already set) with verification + retry
|
||||
while ($true) {
|
||||
$existingKey = [System.Environment]::GetEnvironmentVariable($SelectedEnvVar, "User")
|
||||
if (-not $existingKey) { $existingKey = [System.Environment]::GetEnvironmentVariable($SelectedEnvVar, "Process") }
|
||||
|
||||
if ($existingKey) {
|
||||
$masked = $existingKey.Substring(0, [Math]::Min(4, $existingKey.Length)) + "..." + $existingKey.Substring([Math]::Max(0, $existingKey.Length - 4))
|
||||
Write-Host ""
|
||||
Write-Color -Text " $([char]0x2B22) Current key: $masked" -Color Green
|
||||
$apiKey = Read-Host " Press Enter to keep, or paste a new key to replace"
|
||||
} else {
|
||||
Write-Host ""
|
||||
Write-Host "Get your API key from: " -NoNewline
|
||||
Write-Color -Text $signupUrl -Color Cyan
|
||||
Write-Host ""
|
||||
$apiKey = Read-Host "Paste your $providerName API key (or press Enter to skip)"
|
||||
}
|
||||
|
||||
if ($apiKey) {
|
||||
[System.Environment]::SetEnvironmentVariable($SelectedEnvVar, $apiKey, "User")
|
||||
Set-Item -Path "Env:\$SelectedEnvVar" -Value $apiKey
|
||||
Write-Host ""
|
||||
Write-Ok "API key saved as User environment variable: $SelectedEnvVar"
|
||||
Write-Color -Text " (Persisted for all future sessions)" -Color DarkGray
|
||||
} else {
|
||||
|
||||
# Health check the new key
|
||||
Write-Host " Verifying API key... " -NoNewline
|
||||
try {
|
||||
$hcResult = & uv run python (Join-Path $ScriptDir "scripts/check_llm_key.py") $SelectedProviderId $apiKey 2>$null
|
||||
$hcJson = $hcResult | ConvertFrom-Json
|
||||
if ($hcJson.valid -eq $true) {
|
||||
Write-Color -Text "ok" -Color Green
|
||||
break
|
||||
} elseif ($hcJson.valid -eq $false) {
|
||||
Write-Color -Text "failed" -Color Red
|
||||
Write-Warn $hcJson.message
|
||||
# Undo the save so user can retry cleanly
|
||||
[System.Environment]::SetEnvironmentVariable($SelectedEnvVar, $null, "User")
|
||||
Remove-Item -Path "Env:\$SelectedEnvVar" -ErrorAction SilentlyContinue
|
||||
Write-Host ""
|
||||
Read-Host " Press Enter to try again"
|
||||
# loop back to key prompt
|
||||
} else {
|
||||
Write-Color -Text "--" -Color Yellow
|
||||
Write-Color -Text " Could not verify key (network issue). The key has been saved." -Color DarkGray
|
||||
break
|
||||
}
|
||||
} catch {
|
||||
Write-Color -Text "--" -Color Yellow
|
||||
Write-Color -Text " Could not verify key (network issue). The key has been saved." -Color DarkGray
|
||||
break
|
||||
}
|
||||
} elseif (-not $existingKey) {
|
||||
# No existing key and user skipped
|
||||
Write-Host ""
|
||||
Write-Warn "Skipped. Set the environment variable manually when ready:"
|
||||
Write-Host " [System.Environment]::SetEnvironmentVariable('$SelectedEnvVar', 'your-key', 'User')"
|
||||
$SelectedEnvVar = ""
|
||||
$SelectedProviderId = ""
|
||||
break
|
||||
} else {
|
||||
# User pressed Enter with existing key — keep it
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1011,26 +1188,67 @@ switch ($num) {
|
||||
}
|
||||
}
|
||||
|
||||
# For ZAI subscription: prompt for API key if not already set
|
||||
# For ZAI subscription: prompt for API key (allow replacement if already set) with verification + retry
|
||||
if ($SubscriptionMode -eq "zai_code") {
|
||||
$existingZai = [System.Environment]::GetEnvironmentVariable("ZAI_API_KEY", "User")
|
||||
if (-not $existingZai) { $existingZai = $env:ZAI_API_KEY }
|
||||
if (-not $existingZai) {
|
||||
Write-Host ""
|
||||
$apiKey = Read-Host "Paste your ZAI API key (or press Enter to skip)"
|
||||
while ($true) {
|
||||
$existingZai = [System.Environment]::GetEnvironmentVariable("ZAI_API_KEY", "User")
|
||||
if (-not $existingZai) { $existingZai = $env:ZAI_API_KEY }
|
||||
|
||||
if ($existingZai) {
|
||||
$masked = $existingZai.Substring(0, [Math]::Min(4, $existingZai.Length)) + "..." + $existingZai.Substring([Math]::Max(0, $existingZai.Length - 4))
|
||||
Write-Host ""
|
||||
Write-Color -Text " $([char]0x2B22) Current ZAI key: $masked" -Color Green
|
||||
$apiKey = Read-Host " Press Enter to keep, or paste a new key to replace"
|
||||
} else {
|
||||
Write-Host ""
|
||||
$apiKey = Read-Host "Paste your ZAI API key (or press Enter to skip)"
|
||||
}
|
||||
|
||||
if ($apiKey) {
|
||||
[System.Environment]::SetEnvironmentVariable("ZAI_API_KEY", $apiKey, "User")
|
||||
$env:ZAI_API_KEY = $apiKey
|
||||
Write-Host ""
|
||||
Write-Ok "ZAI API key saved as User environment variable"
|
||||
} else {
|
||||
|
||||
# Health check the new key
|
||||
Write-Host " Verifying ZAI API key... " -NoNewline
|
||||
try {
|
||||
$hcResult = & uv run python (Join-Path $ScriptDir "scripts/check_llm_key.py") "zai" $apiKey "https://api.z.ai/api/coding/paas/v4" 2>$null
|
||||
$hcJson = $hcResult | ConvertFrom-Json
|
||||
if ($hcJson.valid -eq $true) {
|
||||
Write-Color -Text "ok" -Color Green
|
||||
break
|
||||
} elseif ($hcJson.valid -eq $false) {
|
||||
Write-Color -Text "failed" -Color Red
|
||||
Write-Warn $hcJson.message
|
||||
# Undo the save so user can retry cleanly
|
||||
[System.Environment]::SetEnvironmentVariable("ZAI_API_KEY", $null, "User")
|
||||
Remove-Item -Path "Env:\ZAI_API_KEY" -ErrorAction SilentlyContinue
|
||||
Write-Host ""
|
||||
Read-Host " Press Enter to try again"
|
||||
# loop back to key prompt
|
||||
} else {
|
||||
Write-Color -Text "--" -Color Yellow
|
||||
Write-Color -Text " Could not verify key (network issue). The key has been saved." -Color DarkGray
|
||||
break
|
||||
}
|
||||
} catch {
|
||||
Write-Color -Text "--" -Color Yellow
|
||||
Write-Color -Text " Could not verify key (network issue). The key has been saved." -Color DarkGray
|
||||
break
|
||||
}
|
||||
} elseif (-not $existingZai) {
|
||||
# No existing key and user skipped
|
||||
Write-Host ""
|
||||
Write-Warn "Skipped. Add your ZAI API key later:"
|
||||
Write-Color -Text " [System.Environment]::SetEnvironmentVariable('ZAI_API_KEY', 'your-key', 'User')" -Color Cyan
|
||||
$SelectedEnvVar = ""
|
||||
$SelectedProviderId = ""
|
||||
$SubscriptionMode = ""
|
||||
break
|
||||
} else {
|
||||
# User pressed Enter with existing key — keep it
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1081,37 +1299,18 @@ if ($SelectedProviderId) {
|
||||
Write-Host ""
|
||||
|
||||
# ============================================================
|
||||
# Step 5b: Browser Automation (GCU)
|
||||
# Step 5b: Browser Automation (GCU) — always enabled
|
||||
# ============================================================
|
||||
|
||||
Write-Host ""
|
||||
Write-Color -Text "Enable browser automation?" -Color White
|
||||
Write-Color -Text "This lets your agents control a real browser - navigate websites, fill forms," -Color DarkGray
|
||||
Write-Color -Text "scrape dynamic pages, and interact with web UIs." -Color DarkGray
|
||||
Write-Host ""
|
||||
Write-Host " " -NoNewline; Write-Color -Text "1)" -Color Cyan -NoNewline; Write-Host " Yes"
|
||||
Write-Host " " -NoNewline; Write-Color -Text "2)" -Color Cyan -NoNewline; Write-Host " No"
|
||||
Write-Host ""
|
||||
|
||||
do {
|
||||
$gcuChoice = Read-Host "Enter choice (1-2)"
|
||||
} while ($gcuChoice -ne "1" -and $gcuChoice -ne "2")
|
||||
|
||||
$GcuEnabled = $false
|
||||
if ($gcuChoice -eq "1") {
|
||||
$GcuEnabled = $true
|
||||
Write-Ok "Browser automation enabled"
|
||||
} else {
|
||||
Write-Color -Text " Browser automation skipped" -Color DarkGray
|
||||
}
|
||||
Write-Ok "Browser automation enabled"
|
||||
|
||||
# Patch gcu_enabled into configuration.json
|
||||
if (Test-Path $HiveConfigFile) {
|
||||
$existingConfig = Get-Content -Path $HiveConfigFile -Raw | ConvertFrom-Json
|
||||
$existingConfig | Add-Member -NotePropertyName "gcu_enabled" -NotePropertyValue $GcuEnabled -Force
|
||||
$existingConfig | Add-Member -NotePropertyName "gcu_enabled" -NotePropertyValue $true -Force
|
||||
$existingConfig | ConvertTo-Json -Depth 4 | Set-Content -Path $HiveConfigFile -Encoding UTF8
|
||||
} elseif ($GcuEnabled) {
|
||||
# No config file yet (user skipped LLM provider) - create minimal one
|
||||
} else {
|
||||
if (-not (Test-Path $HiveConfigDir)) {
|
||||
New-Item -ItemType Directory -Path $HiveConfigDir -Force | Out-Null
|
||||
}
|
||||
@@ -1425,7 +1624,7 @@ if ($FrontendBuilt) {
|
||||
Write-Color -Text " Starting server on http://localhost:8787" -Color DarkGray
|
||||
Write-Color -Text " Press Ctrl+C to stop" -Color DarkGray
|
||||
Write-Host ""
|
||||
& (Join-Path $ScriptDir "hive.ps1") serve --open
|
||||
& (Join-Path $ScriptDir "hive.ps1") open
|
||||
} else {
|
||||
Write-Color -Text "═══════════════════════════════════════════════════════" -Color Yellow
|
||||
Write-Host ""
|
||||
|
||||
+242
-98
@@ -407,7 +407,7 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then
|
||||
)
|
||||
|
||||
declare -A DEFAULT_MODELS=(
|
||||
["anthropic"]="claude-haiku-4-5"
|
||||
["anthropic"]="claude-haiku-4-5-20251001"
|
||||
["openai"]="gpt-5-mini"
|
||||
["gemini"]="gemini-3-flash-preview"
|
||||
["groq"]="moonshotai/kimi-k2-instruct-0905"
|
||||
@@ -420,12 +420,12 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then
|
||||
# Model choices per provider: composite-key associative arrays
|
||||
# Keys: "provider:index" -> value
|
||||
declare -A MODEL_CHOICES_ID=(
|
||||
["anthropic:0"]="claude-opus-4-6"
|
||||
["anthropic:1"]="claude-sonnet-4-5-20250929"
|
||||
["anthropic:2"]="claude-sonnet-4-20250514"
|
||||
["anthropic:3"]="claude-haiku-4-5-20251001"
|
||||
["openai:0"]="gpt-5.2"
|
||||
["openai:1"]="gpt-5-mini"
|
||||
["anthropic:0"]="claude-haiku-4-5-20251001"
|
||||
["anthropic:1"]="claude-sonnet-4-20250514"
|
||||
["anthropic:2"]="claude-sonnet-4-5-20250929"
|
||||
["anthropic:3"]="claude-opus-4-6"
|
||||
["openai:0"]="gpt-5-mini"
|
||||
["openai:1"]="gpt-5.2"
|
||||
["gemini:0"]="gemini-3-flash-preview"
|
||||
["gemini:1"]="gemini-3.1-pro-preview"
|
||||
["groq:0"]="moonshotai/kimi-k2-instruct-0905"
|
||||
@@ -435,12 +435,12 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then
|
||||
)
|
||||
|
||||
declare -A MODEL_CHOICES_LABEL=(
|
||||
["anthropic:0"]="Opus 4.6 - Most capable (recommended)"
|
||||
["anthropic:1"]="Sonnet 4.5 - Best balance"
|
||||
["anthropic:2"]="Sonnet 4 - Fast + capable"
|
||||
["anthropic:3"]="Haiku 4.5 - Fast + cheap"
|
||||
["openai:0"]="GPT-5.2 - Most capable (recommended)"
|
||||
["openai:1"]="GPT-5 Mini - Fast + cheap"
|
||||
["anthropic:0"]="Haiku 4.5 - Fast + cheap (recommended)"
|
||||
["anthropic:1"]="Sonnet 4 - Fast + capable"
|
||||
["anthropic:2"]="Sonnet 4.5 - Best balance"
|
||||
["anthropic:3"]="Opus 4.6 - Most capable"
|
||||
["openai:0"]="GPT-5 Mini - Fast + cheap (recommended)"
|
||||
["openai:1"]="GPT-5.2 - Most capable"
|
||||
["gemini:0"]="Gemini 3 Flash - Fast (recommended)"
|
||||
["gemini:1"]="Gemini 3.1 Pro - Best quality"
|
||||
["groq:0"]="Kimi K2 - Best quality (recommended)"
|
||||
@@ -450,10 +450,10 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then
|
||||
)
|
||||
|
||||
declare -A MODEL_CHOICES_MAXTOKENS=(
|
||||
["anthropic:0"]=32768
|
||||
["anthropic:1"]=16384
|
||||
["anthropic:2"]=8192
|
||||
["anthropic:3"]=8192
|
||||
["anthropic:0"]=8192
|
||||
["anthropic:1"]=8192
|
||||
["anthropic:2"]=16384
|
||||
["anthropic:3"]=32768
|
||||
["openai:0"]=16384
|
||||
["openai:1"]=16384
|
||||
["gemini:0"]=8192
|
||||
@@ -508,7 +508,7 @@ else
|
||||
|
||||
# Default models by provider id (parallel arrays)
|
||||
MODEL_PROVIDER_IDS=(anthropic openai gemini groq cerebras mistral together_ai deepseek)
|
||||
MODEL_DEFAULTS=("claude-opus-4-6" "gpt-5.2" "gemini-3-flash-preview" "moonshotai/kimi-k2-instruct-0905" "zai-glm-4.7" "mistral-large-latest" "meta-llama/Llama-3.3-70B-Instruct-Turbo" "deepseek-chat")
|
||||
MODEL_DEFAULTS=("claude-haiku-4-5-20251001" "gpt-5-mini" "gemini-3-flash-preview" "moonshotai/kimi-k2-instruct-0905" "zai-glm-4.7" "mistral-large-latest" "meta-llama/Llama-3.3-70B-Instruct-Turbo" "deepseek-chat")
|
||||
|
||||
# Helper: get provider display name for an env var
|
||||
get_provider_name() {
|
||||
@@ -552,9 +552,9 @@ else
|
||||
# Model choices per provider - flat parallel arrays with provider offsets
|
||||
# Provider order: anthropic(4), openai(2), gemini(2), groq(2), cerebras(2)
|
||||
MC_PROVIDERS=(anthropic anthropic anthropic anthropic openai openai gemini gemini groq groq cerebras cerebras)
|
||||
MC_IDS=("claude-opus-4-6" "claude-sonnet-4-5-20250929" "claude-sonnet-4-20250514" "claude-haiku-4-5-20251001" "gpt-5.2" "gpt-5-mini" "gemini-3-flash-preview" "gemini-3.1-pro-preview" "moonshotai/kimi-k2-instruct-0905" "openai/gpt-oss-120b" "zai-glm-4.7" "qwen3-235b-a22b-instruct-2507")
|
||||
MC_LABELS=("Opus 4.6 - Most capable (recommended)" "Sonnet 4.5 - Best balance" "Sonnet 4 - Fast + capable" "Haiku 4.5 - Fast + cheap" "GPT-5.2 - Most capable (recommended)" "GPT-5 Mini - Fast + cheap" "Gemini 3 Flash - Fast (recommended)" "Gemini 3.1 Pro - Best quality" "Kimi K2 - Best quality (recommended)" "GPT-OSS 120B - Fast reasoning" "ZAI-GLM 4.7 - Best quality (recommended)" "Qwen3 235B - Frontier reasoning")
|
||||
MC_MAXTOKENS=(32768 16384 8192 8192 16384 16384 8192 8192 8192 8192 8192 8192)
|
||||
MC_IDS=("claude-haiku-4-5-20251001" "claude-sonnet-4-20250514" "claude-sonnet-4-5-20250929" "claude-opus-4-6" "gpt-5-mini" "gpt-5.2" "gemini-3-flash-preview" "gemini-3.1-pro-preview" "moonshotai/kimi-k2-instruct-0905" "openai/gpt-oss-120b" "zai-glm-4.7" "qwen3-235b-a22b-instruct-2507")
|
||||
MC_LABELS=("Haiku 4.5 - Fast + cheap (recommended)" "Sonnet 4 - Fast + capable" "Sonnet 4.5 - Best balance" "Opus 4.6 - Most capable" "GPT-5 Mini - Fast + cheap (recommended)" "GPT-5.2 - Most capable" "Gemini 3 Flash - Fast (recommended)" "Gemini 3.1 Pro - Best quality" "Kimi K2 - Best quality (recommended)" "GPT-OSS 120B - Fast reasoning" "ZAI-GLM 4.7 - Best quality (recommended)" "Qwen3 235B - Frontier reasoning")
|
||||
MC_MAXTOKENS=(8192 8192 16384 32768 16384 16384 8192 8192 8192 8192 8192 8192)
|
||||
|
||||
# Helper: get number of model choices for a provider
|
||||
get_model_choice_count() {
|
||||
@@ -687,6 +687,19 @@ prompt_model_selection() {
|
||||
echo -e "${BOLD}Select a model:${NC}"
|
||||
echo ""
|
||||
|
||||
# Find default index from previous model (if same provider)
|
||||
local default_idx=""
|
||||
if [ -n "$PREV_MODEL" ] && [ "$provider_id" = "$PREV_PROVIDER" ]; then
|
||||
local j=0
|
||||
while [ $j -lt "$count" ]; do
|
||||
if [ "$(get_model_choice_id "$provider_id" "$j")" = "$PREV_MODEL" ]; then
|
||||
default_idx=$((j + 1))
|
||||
break
|
||||
fi
|
||||
j=$((j + 1))
|
||||
done
|
||||
fi
|
||||
|
||||
local i=0
|
||||
while [ $i -lt "$count" ]; do
|
||||
local label
|
||||
@@ -701,7 +714,12 @@ prompt_model_selection() {
|
||||
|
||||
local choice
|
||||
while true; do
|
||||
read -r -p "Enter choice (1-$count): " choice || true
|
||||
if [ -n "$default_idx" ]; then
|
||||
read -r -p "Enter choice (1-$count) [$default_idx]: " choice || true
|
||||
choice="${choice:-$default_idx}"
|
||||
else
|
||||
read -r -p "Enter choice (1-$count): " choice || true
|
||||
fi
|
||||
if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -ge 1 ] && [ "$choice" -le "$count" ]; then
|
||||
local idx=$((choice - 1))
|
||||
SELECTED_MODEL="$(get_model_choice_id "$provider_id" "$idx")"
|
||||
@@ -781,7 +799,9 @@ SUBSCRIPTION_MODE="" # "claude_code" | "codex" | "zai_code" | ""
|
||||
|
||||
# ── Credential detection (silent — just set flags) ───────────
|
||||
CLAUDE_CRED_DETECTED=false
|
||||
if [ -f "$HOME/.claude/.credentials.json" ]; then
|
||||
if command -v security &>/dev/null && security find-generic-password -s "Claude Code-credentials" &>/dev/null 2>&1; then
|
||||
CLAUDE_CRED_DETECTED=true
|
||||
elif [ -f "$HOME/.claude/.credentials.json" ]; then
|
||||
CLAUDE_CRED_DETECTED=true
|
||||
fi
|
||||
|
||||
@@ -814,6 +834,65 @@ else
|
||||
done
|
||||
fi
|
||||
|
||||
# ── Read previous configuration (if any) ──────────────────────
|
||||
PREV_PROVIDER=""
|
||||
PREV_MODEL=""
|
||||
PREV_ENV_VAR=""
|
||||
PREV_SUB_MODE=""
|
||||
if [ -f "$HIVE_CONFIG_FILE" ]; then
|
||||
eval "$($PYTHON_CMD -c "
|
||||
import json, sys
|
||||
try:
|
||||
with open('$HIVE_CONFIG_FILE') as f:
|
||||
c = json.load(f)
|
||||
llm = c.get('llm', {})
|
||||
print(f'PREV_PROVIDER={llm.get(\"provider\", \"\")}')
|
||||
print(f'PREV_MODEL={llm.get(\"model\", \"\")}')
|
||||
print(f'PREV_ENV_VAR={llm.get(\"api_key_env_var\", \"\")}')
|
||||
sub = ''
|
||||
if llm.get('use_claude_code_subscription'): sub = 'claude_code'
|
||||
elif llm.get('use_codex_subscription'): sub = 'codex'
|
||||
elif 'api.z.ai' in llm.get('api_base', ''): sub = 'zai_code'
|
||||
print(f'PREV_SUB_MODE={sub}')
|
||||
except Exception:
|
||||
pass
|
||||
" 2>/dev/null)" || true
|
||||
fi
|
||||
|
||||
# Compute default menu number from previous config (only if credential is still valid)
|
||||
DEFAULT_CHOICE=""
|
||||
if [ -n "$PREV_SUB_MODE" ] || [ -n "$PREV_PROVIDER" ]; then
|
||||
PREV_CRED_VALID=false
|
||||
case "$PREV_SUB_MODE" in
|
||||
claude_code) [ "$CLAUDE_CRED_DETECTED" = true ] && PREV_CRED_VALID=true ;;
|
||||
zai_code) [ "$ZAI_CRED_DETECTED" = true ] && PREV_CRED_VALID=true ;;
|
||||
codex) [ "$CODEX_CRED_DETECTED" = true ] && PREV_CRED_VALID=true ;;
|
||||
*)
|
||||
# API key provider — check if the env var is set
|
||||
if [ -n "$PREV_ENV_VAR" ] && [ -n "${!PREV_ENV_VAR}" ]; then
|
||||
PREV_CRED_VALID=true
|
||||
fi
|
||||
;;
|
||||
esac
|
||||
|
||||
if [ "$PREV_CRED_VALID" = true ]; then
|
||||
case "$PREV_SUB_MODE" in
|
||||
claude_code) DEFAULT_CHOICE=1 ;;
|
||||
zai_code) DEFAULT_CHOICE=2 ;;
|
||||
codex) DEFAULT_CHOICE=3 ;;
|
||||
esac
|
||||
if [ -z "$DEFAULT_CHOICE" ]; then
|
||||
case "$PREV_PROVIDER" in
|
||||
anthropic) DEFAULT_CHOICE=4 ;;
|
||||
openai) DEFAULT_CHOICE=5 ;;
|
||||
gemini) DEFAULT_CHOICE=6 ;;
|
||||
groq) DEFAULT_CHOICE=7 ;;
|
||||
cerebras) DEFAULT_CHOICE=8 ;;
|
||||
esac
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# ── Show unified provider selection menu ─────────────────────
|
||||
echo -e "${BOLD}Select your default LLM provider:${NC}"
|
||||
echo ""
|
||||
@@ -858,8 +937,18 @@ done
|
||||
echo -e " ${CYAN}9)${NC} Skip for now"
|
||||
echo ""
|
||||
|
||||
if [ -n "$DEFAULT_CHOICE" ]; then
|
||||
echo -e " ${DIM}Previously configured: ${PREV_PROVIDER}/${PREV_MODEL}. Press Enter to keep.${NC}"
|
||||
echo ""
|
||||
fi
|
||||
|
||||
while true; do
|
||||
read -r -p "Enter choice (1-9): " choice || true
|
||||
if [ -n "$DEFAULT_CHOICE" ]; then
|
||||
read -r -p "Enter choice (1-9) [$DEFAULT_CHOICE]: " choice || true
|
||||
choice="${choice:-$DEFAULT_CHOICE}"
|
||||
else
|
||||
read -r -p "Enter choice (1-9): " choice || true
|
||||
fi
|
||||
if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -ge 1 ] && [ "$choice" -le 9 ]; then
|
||||
break
|
||||
fi
|
||||
@@ -968,48 +1057,132 @@ case $choice in
|
||||
;;
|
||||
esac
|
||||
|
||||
# For API-key providers: prompt for key if not already set
|
||||
if [ -z "$SUBSCRIPTION_MODE" ] && [ -n "$SELECTED_ENV_VAR" ] && [ -z "${!SELECTED_ENV_VAR}" ]; then
|
||||
echo ""
|
||||
echo -e "Get your API key from: ${CYAN}$SIGNUP_URL${NC}"
|
||||
echo ""
|
||||
read -r -p "Paste your $PROVIDER_NAME API key (or press Enter to skip): " API_KEY
|
||||
# For API-key providers: prompt for key (allow replacement if already set)
|
||||
if [ -z "$SUBSCRIPTION_MODE" ] && [ -n "$SELECTED_ENV_VAR" ]; then
|
||||
while true; do
|
||||
CURRENT_KEY="${!SELECTED_ENV_VAR}"
|
||||
if [ -n "$CURRENT_KEY" ]; then
|
||||
# Key exists — offer to keep or replace
|
||||
MASKED_KEY="${CURRENT_KEY:0:4}...${CURRENT_KEY: -4}"
|
||||
echo ""
|
||||
echo -e " ${GREEN}⬢${NC} Current key: ${DIM}$MASKED_KEY${NC}"
|
||||
read -r -p " Press Enter to keep, or paste a new key to replace: " API_KEY
|
||||
else
|
||||
# No key — prompt for one
|
||||
echo ""
|
||||
echo -e "Get your API key from: ${CYAN}$SIGNUP_URL${NC}"
|
||||
echo ""
|
||||
read -r -p "Paste your $PROVIDER_NAME API key (or press Enter to skip): " API_KEY
|
||||
fi
|
||||
|
||||
if [ -n "$API_KEY" ]; then
|
||||
echo "" >> "$SHELL_RC_FILE"
|
||||
echo "# Hive Agent Framework - $PROVIDER_NAME API key" >> "$SHELL_RC_FILE"
|
||||
echo "export $SELECTED_ENV_VAR=\"$API_KEY\"" >> "$SHELL_RC_FILE"
|
||||
export "$SELECTED_ENV_VAR=$API_KEY"
|
||||
echo ""
|
||||
echo -e "${GREEN}⬢${NC} API key saved to $SHELL_RC_FILE"
|
||||
else
|
||||
echo ""
|
||||
echo -e "${YELLOW}Skipped.${NC} Add your API key to $SHELL_RC_FILE when ready."
|
||||
SELECTED_ENV_VAR=""
|
||||
SELECTED_PROVIDER_ID=""
|
||||
fi
|
||||
if [ -n "$API_KEY" ]; then
|
||||
# Remove old export line(s) for this env var from shell rc, then append new
|
||||
sed -i.bak "/^export ${SELECTED_ENV_VAR}=/d" "$SHELL_RC_FILE" && rm -f "${SHELL_RC_FILE}.bak"
|
||||
echo "" >> "$SHELL_RC_FILE"
|
||||
echo "# Hive Agent Framework - $PROVIDER_NAME API key" >> "$SHELL_RC_FILE"
|
||||
echo "export $SELECTED_ENV_VAR=\"$API_KEY\"" >> "$SHELL_RC_FILE"
|
||||
export "$SELECTED_ENV_VAR=$API_KEY"
|
||||
echo ""
|
||||
echo -e "${GREEN}⬢${NC} API key saved to $SHELL_RC_FILE"
|
||||
# Health check the new key
|
||||
echo -n " Verifying API key... "
|
||||
HC_RESULT=$(uv run python "$SCRIPT_DIR/scripts/check_llm_key.py" "$SELECTED_PROVIDER_ID" "$API_KEY" 2>/dev/null) || true
|
||||
HC_VALID=$(echo "$HC_RESULT" | $PYTHON_CMD -c "import json,sys; print(json.loads(sys.stdin.read()).get('valid',''))" 2>/dev/null) || true
|
||||
HC_MSG=$(echo "$HC_RESULT" | $PYTHON_CMD -c "import json,sys; print(json.loads(sys.stdin.read()).get('message',''))" 2>/dev/null) || true
|
||||
if [ "$HC_VALID" = "True" ]; then
|
||||
echo -e "${GREEN}ok${NC}"
|
||||
break
|
||||
elif [ "$HC_VALID" = "False" ]; then
|
||||
echo -e "${RED}failed${NC}"
|
||||
echo -e " ${YELLOW}⚠ $HC_MSG${NC}"
|
||||
# Undo the save so the user can retry cleanly
|
||||
sed -i.bak "/^export ${SELECTED_ENV_VAR}=/d" "$SHELL_RC_FILE" && rm -f "${SHELL_RC_FILE}.bak"
|
||||
# Remove the comment line we just added
|
||||
sed -i.bak "/^# Hive Agent Framework - $PROVIDER_NAME API key$/d" "$SHELL_RC_FILE" && rm -f "${SHELL_RC_FILE}.bak"
|
||||
unset "$SELECTED_ENV_VAR"
|
||||
echo ""
|
||||
read -r -p " Press Enter to try again: " _
|
||||
# Loop back to key prompt
|
||||
else
|
||||
echo -e "${YELLOW}--${NC}"
|
||||
echo -e " ${DIM}Could not verify key (network issue). The key has been saved.${NC}"
|
||||
break
|
||||
fi
|
||||
elif [ -z "$CURRENT_KEY" ]; then
|
||||
# No existing key and user skipped — abort provider
|
||||
echo ""
|
||||
echo -e "${YELLOW}Skipped.${NC} Add your API key to $SHELL_RC_FILE when ready."
|
||||
SELECTED_ENV_VAR=""
|
||||
SELECTED_PROVIDER_ID=""
|
||||
break
|
||||
else
|
||||
# User pressed Enter with existing key — keep it, proceed normally
|
||||
break
|
||||
fi
|
||||
done
|
||||
fi
|
||||
|
||||
# For ZAI subscription: always prompt for API key
|
||||
# For ZAI subscription: prompt for API key (allow replacement if already set)
|
||||
if [ "$SUBSCRIPTION_MODE" = "zai_code" ]; then
|
||||
echo ""
|
||||
read -r -p "Paste your ZAI API key (or press Enter to skip): " API_KEY
|
||||
while true; do
|
||||
if [ "$ZAI_CRED_DETECTED" = true ] && [ -n "$ZAI_API_KEY" ]; then
|
||||
# Key exists — offer to keep or replace
|
||||
MASKED_KEY="${ZAI_API_KEY:0:4}...${ZAI_API_KEY: -4}"
|
||||
echo ""
|
||||
echo -e " ${GREEN}⬢${NC} Current ZAI key: ${DIM}$MASKED_KEY${NC}"
|
||||
read -r -p " Press Enter to keep, or paste a new key to replace: " API_KEY
|
||||
else
|
||||
# No key — prompt for one
|
||||
echo ""
|
||||
read -r -p "Paste your ZAI API key (or press Enter to skip): " API_KEY
|
||||
fi
|
||||
|
||||
if [ -n "$API_KEY" ]; then
|
||||
echo "" >> "$SHELL_RC_FILE"
|
||||
echo "# Hive Agent Framework - ZAI Code subscription API key" >> "$SHELL_RC_FILE"
|
||||
echo "export ZAI_API_KEY=\"$API_KEY\"" >> "$SHELL_RC_FILE"
|
||||
export ZAI_API_KEY="$API_KEY"
|
||||
echo ""
|
||||
echo -e "${GREEN}⬢${NC} ZAI API key saved to $SHELL_RC_FILE"
|
||||
else
|
||||
echo ""
|
||||
echo -e "${YELLOW}Skipped.${NC} Add your ZAI API key to $SHELL_RC_FILE when ready:"
|
||||
echo -e " ${CYAN}echo 'export ZAI_API_KEY=\"your-key\"' >> $SHELL_RC_FILE${NC}"
|
||||
SELECTED_ENV_VAR=""
|
||||
SELECTED_PROVIDER_ID=""
|
||||
SUBSCRIPTION_MODE=""
|
||||
fi
|
||||
if [ -n "$API_KEY" ]; then
|
||||
sed -i.bak "/^export ZAI_API_KEY=/d" "$SHELL_RC_FILE" && rm -f "${SHELL_RC_FILE}.bak"
|
||||
echo "" >> "$SHELL_RC_FILE"
|
||||
echo "# Hive Agent Framework - ZAI Code subscription API key" >> "$SHELL_RC_FILE"
|
||||
echo "export ZAI_API_KEY=\"$API_KEY\"" >> "$SHELL_RC_FILE"
|
||||
export ZAI_API_KEY="$API_KEY"
|
||||
echo ""
|
||||
echo -e "${GREEN}⬢${NC} ZAI API key saved to $SHELL_RC_FILE"
|
||||
# Health check the new key
|
||||
echo -n " Verifying ZAI API key... "
|
||||
HC_RESULT=$(uv run python "$SCRIPT_DIR/scripts/check_llm_key.py" "zai" "$API_KEY" "https://api.z.ai/api/coding/paas/v4" 2>/dev/null) || true
|
||||
HC_VALID=$(echo "$HC_RESULT" | $PYTHON_CMD -c "import json,sys; print(json.loads(sys.stdin.read()).get('valid',''))" 2>/dev/null) || true
|
||||
HC_MSG=$(echo "$HC_RESULT" | $PYTHON_CMD -c "import json,sys; print(json.loads(sys.stdin.read()).get('message',''))" 2>/dev/null) || true
|
||||
if [ "$HC_VALID" = "True" ]; then
|
||||
echo -e "${GREEN}ok${NC}"
|
||||
break
|
||||
elif [ "$HC_VALID" = "False" ]; then
|
||||
echo -e "${RED}failed${NC}"
|
||||
echo -e " ${YELLOW}⚠ $HC_MSG${NC}"
|
||||
# Undo the save so the user can retry cleanly
|
||||
sed -i.bak "/^export ZAI_API_KEY=/d" "$SHELL_RC_FILE" && rm -f "${SHELL_RC_FILE}.bak"
|
||||
sed -i.bak "/^# Hive Agent Framework - ZAI Code subscription API key$/d" "$SHELL_RC_FILE" && rm -f "${SHELL_RC_FILE}.bak"
|
||||
unset ZAI_API_KEY
|
||||
ZAI_CRED_DETECTED=false
|
||||
echo ""
|
||||
read -r -p " Press Enter to try again: " _
|
||||
# Loop back to key prompt
|
||||
else
|
||||
echo -e "${YELLOW}--${NC}"
|
||||
echo -e " ${DIM}Could not verify key (network issue). The key has been saved.${NC}"
|
||||
break
|
||||
fi
|
||||
elif [ "$ZAI_CRED_DETECTED" = false ] || [ -z "$ZAI_API_KEY" ]; then
|
||||
# No existing key and user skipped — abort provider
|
||||
echo ""
|
||||
echo -e "${YELLOW}Skipped.${NC} Add your ZAI API key to $SHELL_RC_FILE when ready:"
|
||||
echo -e " ${CYAN}echo 'export ZAI_API_KEY=\"your-key\"' >> $SHELL_RC_FILE${NC}"
|
||||
SELECTED_ENV_VAR=""
|
||||
SELECTED_PROVIDER_ID=""
|
||||
SUBSCRIPTION_MODE=""
|
||||
break
|
||||
else
|
||||
# User pressed Enter with existing key — keep it, proceed normally
|
||||
break
|
||||
fi
|
||||
done
|
||||
fi
|
||||
|
||||
# Prompt for model if not already selected (manual provider path)
|
||||
@@ -1037,52 +1210,22 @@ fi
|
||||
echo ""
|
||||
|
||||
# ============================================================
|
||||
# Step 4b: Browser Automation (GCU)
|
||||
# Step 4b: Browser Automation (GCU) — always enabled
|
||||
# ============================================================
|
||||
|
||||
echo -e "${BOLD}Enable browser automation?${NC}"
|
||||
echo -e "${DIM}This lets your agents control a real browser — navigate websites, fill forms,${NC}"
|
||||
echo -e "${DIM}scrape dynamic pages, and interact with web UIs.${NC}"
|
||||
echo ""
|
||||
echo -e " ${CYAN}${BOLD}1)${NC} ${BOLD}Yes${NC}"
|
||||
echo -e " ${CYAN}2)${NC} No"
|
||||
echo ""
|
||||
|
||||
while true; do
|
||||
read -r -p "Enter choice (1-2, default 1): " gcu_choice || true
|
||||
gcu_choice="${gcu_choice:-1}"
|
||||
if [ "$gcu_choice" = "1" ] || [ "$gcu_choice" = "2" ]; then
|
||||
break
|
||||
fi
|
||||
echo -e "${RED}Invalid choice. Please enter 1 or 2${NC}"
|
||||
done
|
||||
|
||||
if [ "$gcu_choice" = "1" ]; then
|
||||
GCU_ENABLED=true
|
||||
echo -e "${GREEN}⬢${NC} Browser automation enabled"
|
||||
else
|
||||
GCU_ENABLED=false
|
||||
echo -e "${DIM}⬡ Browser automation skipped${NC}"
|
||||
fi
|
||||
echo -e "${GREEN}⬢${NC} Browser automation enabled"
|
||||
|
||||
# Patch gcu_enabled into configuration.json
|
||||
if [ "$GCU_ENABLED" = "true" ]; then
|
||||
GCU_PY_VAL="True"
|
||||
else
|
||||
GCU_PY_VAL="False"
|
||||
fi
|
||||
|
||||
if [ -f "$HIVE_CONFIG_FILE" ]; then
|
||||
uv run python -c "
|
||||
import json
|
||||
with open('$HIVE_CONFIG_FILE') as f:
|
||||
config = json.load(f)
|
||||
config['gcu_enabled'] = $GCU_PY_VAL
|
||||
config['gcu_enabled'] = True
|
||||
with open('$HIVE_CONFIG_FILE', 'w') as f:
|
||||
json.dump(config, f, indent=2)
|
||||
"
|
||||
elif [ "$GCU_ENABLED" = "true" ]; then
|
||||
# No config file yet (user skipped LLM provider) — create minimal one
|
||||
else
|
||||
mkdir -p "$HIVE_CONFIG_DIR"
|
||||
uv run python -c "
|
||||
import json
|
||||
@@ -1352,9 +1495,10 @@ if [ "$FRONTEND_BUILT" = true ]; then
|
||||
echo -e " ${DIM}Starting server on http://localhost:8787${NC}"
|
||||
echo -e " ${DIM}Press Ctrl+C to stop${NC}"
|
||||
echo ""
|
||||
# exec replaces the quickstart process with hive serve
|
||||
# --open tells it to auto-open the browser once the server is ready
|
||||
exec "$SCRIPT_DIR/hive" serve --open
|
||||
echo -e " ${DIM}Tip: You can restart the dashboard anytime with:${NC} ${CYAN}hive open${NC}"
|
||||
echo ""
|
||||
# exec replaces the quickstart process with hive open
|
||||
exec "$SCRIPT_DIR/hive" open
|
||||
else
|
||||
# No frontend — show manual instructions
|
||||
echo -e "${YELLOW}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
|
||||
@@ -0,0 +1,125 @@
|
||||
"""Validate an LLM API key without consuming tokens.
|
||||
|
||||
Usage:
|
||||
python scripts/check_llm_key.py <provider_id> <api_key> [api_base]
|
||||
|
||||
Exit codes:
|
||||
0 = valid key
|
||||
1 = invalid key
|
||||
2 = inconclusive (timeout, network error)
|
||||
|
||||
Output: single JSON line {"valid": bool, "message": str}
|
||||
"""
|
||||
|
||||
import json
|
||||
import sys
|
||||
|
||||
import httpx
|
||||
|
||||
TIMEOUT = 10.0
|
||||
|
||||
|
||||
def check_anthropic(api_key: str, **_: str) -> dict:
|
||||
"""Send empty messages to trigger 400 without consuming tokens."""
|
||||
with httpx.Client(timeout=TIMEOUT) as client:
|
||||
r = client.post(
|
||||
"https://api.anthropic.com/v1/messages",
|
||||
headers={
|
||||
"x-api-key": api_key,
|
||||
"anthropic-version": "2023-06-01",
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
json={"model": "claude-sonnet-4-20250514", "max_tokens": 1, "messages": []},
|
||||
)
|
||||
if r.status_code in (200, 400, 429):
|
||||
return {"valid": True, "message": "API key valid"}
|
||||
if r.status_code == 401:
|
||||
return {"valid": False, "message": "Invalid API key"}
|
||||
if r.status_code == 403:
|
||||
return {"valid": False, "message": "API key lacks permissions"}
|
||||
return {"valid": False, "message": f"Unexpected status {r.status_code}"}
|
||||
|
||||
|
||||
def check_openai_compatible(api_key: str, endpoint: str, name: str) -> dict:
|
||||
"""GET /models on any OpenAI-compatible API."""
|
||||
with httpx.Client(timeout=TIMEOUT) as client:
|
||||
r = client.get(
|
||||
endpoint,
|
||||
headers={"Authorization": f"Bearer {api_key}"},
|
||||
)
|
||||
if r.status_code in (200, 429):
|
||||
return {"valid": True, "message": f"{name} API key valid"}
|
||||
if r.status_code == 401:
|
||||
return {"valid": False, "message": f"Invalid {name} API key"}
|
||||
if r.status_code == 403:
|
||||
return {"valid": False, "message": f"{name} API key lacks permissions"}
|
||||
return {"valid": False, "message": f"{name} API returned status {r.status_code}"}
|
||||
|
||||
|
||||
def check_gemini(api_key: str, **_: str) -> dict:
|
||||
"""List models with query param auth."""
|
||||
with httpx.Client(timeout=TIMEOUT) as client:
|
||||
r = client.get(
|
||||
"https://generativelanguage.googleapis.com/v1beta/models",
|
||||
params={"key": api_key},
|
||||
)
|
||||
if r.status_code in (200, 429):
|
||||
return {"valid": True, "message": "Gemini API key valid"}
|
||||
if r.status_code in (400, 401, 403):
|
||||
return {"valid": False, "message": "Invalid Gemini API key"}
|
||||
return {"valid": False, "message": f"Gemini API returned status {r.status_code}"}
|
||||
|
||||
|
||||
PROVIDERS = {
|
||||
"anthropic": lambda key, **kw: check_anthropic(key),
|
||||
"openai": lambda key, **kw: check_openai_compatible(
|
||||
key, "https://api.openai.com/v1/models", "OpenAI"
|
||||
),
|
||||
"gemini": lambda key, **kw: check_gemini(key),
|
||||
"groq": lambda key, **kw: check_openai_compatible(
|
||||
key, "https://api.groq.com/openai/v1/models", "Groq"
|
||||
),
|
||||
"cerebras": lambda key, **kw: check_openai_compatible(
|
||||
key, "https://api.cerebras.ai/v1/models", "Cerebras"
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
def main() -> None:
|
||||
if len(sys.argv) < 3:
|
||||
print(json.dumps({"valid": False, "message": "Usage: check_llm_key.py <provider> <key> [api_base]"}))
|
||||
sys.exit(2)
|
||||
|
||||
provider_id = sys.argv[1]
|
||||
api_key = sys.argv[2]
|
||||
api_base = sys.argv[3] if len(sys.argv) > 3 else ""
|
||||
|
||||
try:
|
||||
if api_base:
|
||||
# Custom API base (ZAI or other OpenAI-compatible)
|
||||
endpoint = api_base.rstrip("/") + "/models"
|
||||
result = check_openai_compatible(api_key, endpoint, "ZAI")
|
||||
elif provider_id in PROVIDERS:
|
||||
result = PROVIDERS[provider_id](api_key)
|
||||
else:
|
||||
result = {"valid": True, "message": f"No health check for {provider_id}"}
|
||||
print(json.dumps(result))
|
||||
sys.exit(0)
|
||||
|
||||
print(json.dumps(result))
|
||||
sys.exit(0 if result["valid"] else 1)
|
||||
|
||||
except httpx.TimeoutException:
|
||||
print(json.dumps({"valid": None, "message": "Request timed out"}))
|
||||
sys.exit(2)
|
||||
except httpx.RequestError as e:
|
||||
msg = str(e)
|
||||
# Redact key from error messages
|
||||
if api_key in msg:
|
||||
msg = msg.replace(api_key, "***")
|
||||
print(json.dumps({"valid": None, "message": f"Connection failed: {msg}"}))
|
||||
sys.exit(2)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -20,6 +20,7 @@ def test_check_requirements():
|
||||
[sys.executable, "scripts/check_requirements.py", "json", "sys", "os"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
print(f"Exit code: {result.returncode}")
|
||||
print(f"Output:\n{result.stdout}")
|
||||
@@ -39,6 +40,7 @@ def test_check_requirements():
|
||||
[sys.executable, "scripts/check_requirements.py", "json", "nonexistent_module"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
print(f"Exit code: {result.returncode}")
|
||||
print(f"Output:\n{result.stdout}")
|
||||
|
||||
@@ -0,0 +1,66 @@
|
||||
# MSSQL Connection Configuration Template
|
||||
#
|
||||
# Copy this file to .env and fill in your actual values
|
||||
# DO NOT commit the .env file to version control!
|
||||
|
||||
# ============================================================================
|
||||
# SQL Server Connection - Choose ONE format below:
|
||||
# ============================================================================
|
||||
|
||||
# OPTION 1: Local named instance
|
||||
MSSQL_SERVER=localhost\SQLEXPRESS
|
||||
|
||||
# OPTION 2: Local default instance
|
||||
# MSSQL_SERVER=localhost
|
||||
|
||||
# OPTION 3: Remote server with default port (1433)
|
||||
# MSSQL_SERVER=192.168.1.100
|
||||
|
||||
# OPTION 4: Remote server with custom port (comma-separated)
|
||||
# MSSQL_SERVER=192.168.1.100,1433
|
||||
|
||||
# OPTION 5: Remote named instance
|
||||
# MSSQL_SERVER=PRODUCTION-SERVER\INSTANCE01
|
||||
|
||||
# OPTION 6: Domain server name
|
||||
# MSSQL_SERVER=sql-prod.company.com
|
||||
|
||||
# OPTION 7: Domain server with port
|
||||
# MSSQL_SERVER=sql-prod.company.com,1433
|
||||
|
||||
# ============================================================================
|
||||
# Database Configuration
|
||||
# ============================================================================
|
||||
MSSQL_DATABASE=AdenTestDB
|
||||
|
||||
# ============================================================================
|
||||
# Authentication - Choose ONE method:
|
||||
# ============================================================================
|
||||
|
||||
# METHOD 1: SQL Server Authentication (username/password)
|
||||
# Use this for: remote servers, Linux servers, specific SQL logins
|
||||
MSSQL_USERNAME=sa
|
||||
MSSQL_PASSWORD=your_password_here
|
||||
|
||||
# METHOD 2: Windows Authentication (leave both empty)
|
||||
# Use this for: local Windows servers, domain-joined environments
|
||||
# MSSQL_USERNAME=
|
||||
# MSSQL_PASSWORD=
|
||||
|
||||
# ============================================================================
|
||||
# Important Notes:
|
||||
# ============================================================================
|
||||
# - Port format: Use comma (,) not colon - Example: server,1433
|
||||
# - Named instances: Use backslash (\) - Example: SERVER\INSTANCE
|
||||
# - Default port: 1433 (can be omitted if using default)
|
||||
# - ODBC Driver: Requires "ODBC Driver 17 for SQL Server" or newer
|
||||
# - Security: Never commit this file with real credentials!
|
||||
# - Escaping: In some shells, escape backslashes (\\) when setting env vars
|
||||
# ============================================================================
|
||||
|
||||
# Example Production Configurations:
|
||||
# -----------------------------------
|
||||
# Azure SQL: MSSQL_SERVER=yourserver.database.windows.net
|
||||
# AWS RDS: MSSQL_SERVER=yourinstance.region.rds.amazonaws.com,1433
|
||||
# Docker: MSSQL_SERVER=localhost,1401
|
||||
# Kubernetes: MSSQL_SERVER=mssql-service.namespace.svc.cluster.local,1433
|
||||
+12
-8
@@ -72,6 +72,7 @@ python mcp_server.py
|
||||
| `apply_diff` | Apply diff patches to files |
|
||||
| `apply_patch` | Apply unified patches to files |
|
||||
| `grep_search` | Search file contents with regex |
|
||||
| `hashline_edit` | Anchor-based file editing with hash-validated line references |
|
||||
| `execute_command_tool` | Execute shell commands |
|
||||
| `save_data` / `load_data` | Persist and retrieve structured data across steps |
|
||||
| `serve_file_to_user` | Serve a file for the user to download |
|
||||
@@ -175,14 +176,17 @@ tools/
|
||||
│ └── tools/ # Tool implementations
|
||||
│ ├── example_tool/
|
||||
│ ├── file_system_toolkits/ # File operation tools
|
||||
│ │ ├── view_file.py
|
||||
│ │ ├── write_to_file.py
|
||||
│ │ ├── list_dir.py
|
||||
│ │ ├── replace_file_content.py
|
||||
│ │ ├── apply_diff.py
|
||||
│ │ ├── apply_patch.py
|
||||
│ │ ├── grep_search.py
|
||||
│ │ └── execute_command_tool.py
|
||||
│ │ ├── security.py
|
||||
│ │ ├── hashline.py
|
||||
│ │ ├── view_file/
|
||||
│ │ ├── write_to_file/
|
||||
│ │ ├── list_dir/
|
||||
│ │ ├── replace_file_content/
|
||||
│ │ ├── apply_diff/
|
||||
│ │ ├── apply_patch/
|
||||
│ │ ├── grep_search/
|
||||
│ │ ├── hashline_edit/
|
||||
│ │ └── execute_command_tool/
|
||||
│ ├── web_search_tool/
|
||||
│ ├── web_scrape_tool/
|
||||
│ ├── pdf_read_tool/
|
||||
|
||||
+175
-234
@@ -71,8 +71,49 @@ def _find_project_root() -> str:
|
||||
|
||||
def _resolve_path(path: str) -> str:
|
||||
"""Resolve path relative to PROJECT_ROOT. Raises ValueError if outside."""
|
||||
# Normalize slashes for cross-platform (e.g. exports/hi_agent from LLM)
|
||||
path = path.replace("/", os.sep)
|
||||
if os.path.isabs(path):
|
||||
resolved = os.path.abspath(path)
|
||||
try:
|
||||
common = os.path.commonpath([resolved, PROJECT_ROOT])
|
||||
except ValueError:
|
||||
common = ""
|
||||
if common != PROJECT_ROOT:
|
||||
# LLM may emit wrong-root paths (/mnt/data, /workspace, etc.).
|
||||
# Strip known prefixes and treat the remainder as relative to PROJECT_ROOT.
|
||||
path_norm = path.replace("\\", "/")
|
||||
for prefix in (
|
||||
"/mnt/data/",
|
||||
"/mnt/data",
|
||||
"/workspace/",
|
||||
"/workspace",
|
||||
"/repo/",
|
||||
"/repo",
|
||||
):
|
||||
p = prefix.rstrip("/") + "/"
|
||||
prefix_stripped = prefix.rstrip("/")
|
||||
if path_norm.startswith(p) or (
|
||||
path_norm.startswith(prefix_stripped) and len(path_norm) > len(prefix)
|
||||
):
|
||||
suffix = path_norm[len(prefix_stripped) :].lstrip("/")
|
||||
if suffix:
|
||||
path = suffix.replace("/", os.sep)
|
||||
resolved = os.path.abspath(os.path.join(PROJECT_ROOT, path))
|
||||
break
|
||||
else:
|
||||
# Try extracting exports/ or core/ subpath from the absolute path
|
||||
parts = path.split(os.sep)
|
||||
if "exports" in parts:
|
||||
idx = parts.index("exports")
|
||||
path = os.sep.join(parts[idx:])
|
||||
resolved = os.path.abspath(os.path.join(PROJECT_ROOT, path))
|
||||
elif "core" in parts:
|
||||
idx = parts.index("core")
|
||||
path = os.sep.join(parts[idx:])
|
||||
resolved = os.path.abspath(os.path.join(PROJECT_ROOT, path))
|
||||
else:
|
||||
raise ValueError(f"Access denied: '{path}' is outside the project root.")
|
||||
else:
|
||||
resolved = os.path.abspath(os.path.join(PROJECT_ROOT, path))
|
||||
try:
|
||||
@@ -90,7 +131,9 @@ def _resolve_path(path: str) -> str:
|
||||
def _snapshot_git(*args: str) -> str:
|
||||
"""Run a git command with the snapshot GIT_DIR and PROJECT_ROOT worktree."""
|
||||
cmd = ["git", "--git-dir", SNAPSHOT_DIR, "--work-tree", PROJECT_ROOT, *args]
|
||||
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
|
||||
result = subprocess.run(
|
||||
cmd, capture_output=True, text=True, timeout=30, encoding="utf-8", stdin=subprocess.DEVNULL
|
||||
)
|
||||
return result.stdout.strip()
|
||||
|
||||
|
||||
@@ -104,6 +147,8 @@ def _ensure_snapshot_repo():
|
||||
["git", "init", "--bare", SNAPSHOT_DIR],
|
||||
capture_output=True,
|
||||
timeout=10,
|
||||
stdin=subprocess.DEVNULL,
|
||||
encoding="utf-8",
|
||||
)
|
||||
_snapshot_git("config", "core.autocrlf", "false")
|
||||
|
||||
@@ -125,6 +170,37 @@ def _take_snapshot() -> str:
|
||||
MAX_COMMAND_OUTPUT = 30_000 # chars before truncation
|
||||
|
||||
|
||||
def _translate_command_for_windows(command: str) -> str:
|
||||
"""Translate common Unix commands to Windows equivalents."""
|
||||
if os.name != "nt":
|
||||
return command
|
||||
cmd = command.strip()
|
||||
|
||||
# mkdir -p: Unix creates parents; Windows mkdir already does; -p becomes a dir name
|
||||
if cmd.startswith("mkdir -p ") or cmd.startswith("mkdir -p\t"):
|
||||
rest = cmd[9:].lstrip().replace("/", os.sep)
|
||||
return "mkdir " + rest
|
||||
|
||||
# ls / pwd: cmd.exe uses dir and cd
|
||||
# Order matters: replace longer patterns first
|
||||
for unix, win in [
|
||||
("ls -la", "dir /a"),
|
||||
("ls -al", "dir /a"),
|
||||
("ls -l", "dir"),
|
||||
("ls -a", "dir /a"),
|
||||
("ls ", "dir "),
|
||||
("pwd", "cd"),
|
||||
]:
|
||||
cmd = cmd.replace(unix, win)
|
||||
# Standalone "ls" at end (e.g. "cd x && ls")
|
||||
if cmd.endswith(" ls"):
|
||||
cmd = cmd[:-3] + " dir"
|
||||
elif cmd == "ls":
|
||||
cmd = "dir"
|
||||
|
||||
return cmd
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def run_command(command: str, cwd: str = "", timeout: int = 120) -> str:
|
||||
"""Execute a shell command in the project context.
|
||||
@@ -144,6 +220,7 @@ def run_command(command: str, cwd: str = "", timeout: int = 120) -> str:
|
||||
work_dir = _resolve_path(cwd) if cwd else PROJECT_ROOT
|
||||
|
||||
try:
|
||||
command = _translate_command_for_windows(command)
|
||||
start = time.monotonic()
|
||||
result = subprocess.run(
|
||||
command,
|
||||
@@ -152,11 +229,16 @@ def run_command(command: str, cwd: str = "", timeout: int = 120) -> str:
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout,
|
||||
stdin=subprocess.DEVNULL,
|
||||
encoding="utf-8",
|
||||
env={
|
||||
**os.environ,
|
||||
"PYTHONPATH": (
|
||||
f"{PROJECT_ROOT}/core:{PROJECT_ROOT}/exports"
|
||||
f":{PROJECT_ROOT}/core/framework/agents"
|
||||
"PYTHONPATH": os.pathsep.join(
|
||||
[
|
||||
os.path.join(PROJECT_ROOT, "core"),
|
||||
os.path.join(PROJECT_ROOT, "exports"),
|
||||
os.path.join(PROJECT_ROOT, "core", "framework", "agents"),
|
||||
]
|
||||
),
|
||||
},
|
||||
)
|
||||
@@ -228,6 +310,8 @@ def undo_changes(path: str = "") -> str:
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
stdin=subprocess.DEVNULL,
|
||||
encoding="utf-8",
|
||||
)
|
||||
return f"Restored: {path}"
|
||||
else:
|
||||
@@ -247,121 +331,34 @@ def undo_changes(path: str = "") -> str:
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def discover_mcp_tools(server_config_path: str = "") -> str:
|
||||
"""Discover available MCP tools by connecting to servers defined in a config file.
|
||||
def list_agent_tools(
|
||||
server_config_path: str = "",
|
||||
output_schema: str = "simple",
|
||||
group: str = "all",
|
||||
) -> str:
|
||||
"""Discover tools available for agent building, grouped by category.
|
||||
|
||||
Connects to each MCP server, lists all tools with full schemas, then
|
||||
disconnects. Use this to see what tools are available before designing
|
||||
an agent — never rely on static documentation.
|
||||
|
||||
Args:
|
||||
server_config_path: Path to mcp_servers.json (relative to project root).
|
||||
Default: the hive-tools server config at tools/mcp_servers.json.
|
||||
Can also point to any agent's mcp_servers.json.
|
||||
|
||||
Returns:
|
||||
JSON listing of all tools with names, descriptions, and input schemas
|
||||
"""
|
||||
# Resolve config path
|
||||
if not server_config_path:
|
||||
# Default: look for the main hive-tools mcp_servers.json
|
||||
candidates = [
|
||||
os.path.join(PROJECT_ROOT, "tools", "mcp_servers.json"),
|
||||
os.path.join(PROJECT_ROOT, "mcp_servers.json"),
|
||||
]
|
||||
config_path = None
|
||||
for c in candidates:
|
||||
if os.path.isfile(c):
|
||||
config_path = c
|
||||
break
|
||||
if not config_path:
|
||||
return "Error: No mcp_servers.json found. Provide server_config_path."
|
||||
else:
|
||||
config_path = _resolve_path(server_config_path)
|
||||
if not os.path.isfile(config_path):
|
||||
return f"Error: Config file not found: {server_config_path}"
|
||||
|
||||
try:
|
||||
with open(config_path, encoding="utf-8") as f:
|
||||
servers_config = json.load(f)
|
||||
except (json.JSONDecodeError, OSError) as e:
|
||||
return f"Error reading config: {e}"
|
||||
|
||||
# Import MCPClient (deferred — needs PYTHONPATH to include core/)
|
||||
try:
|
||||
from framework.runner.mcp_client import MCPClient, MCPServerConfig
|
||||
except ImportError:
|
||||
return "Error: Cannot import MCPClient. Ensure PYTHONPATH includes the core/ directory."
|
||||
|
||||
all_tools = []
|
||||
errors = []
|
||||
config_dir = os.path.dirname(config_path)
|
||||
|
||||
for server_name, server_conf in servers_config.items():
|
||||
# Resolve cwd relative to config file location
|
||||
cwd = server_conf.get("cwd", "")
|
||||
if cwd and not os.path.isabs(cwd):
|
||||
cwd = os.path.abspath(os.path.join(config_dir, cwd))
|
||||
|
||||
try:
|
||||
config = MCPServerConfig(
|
||||
name=server_name,
|
||||
transport=server_conf.get("transport", "stdio"),
|
||||
command=server_conf.get("command"),
|
||||
args=server_conf.get("args", []),
|
||||
env=server_conf.get("env", {}),
|
||||
cwd=cwd or None,
|
||||
url=server_conf.get("url"),
|
||||
headers=server_conf.get("headers", {}),
|
||||
)
|
||||
client = MCPClient(config)
|
||||
client.connect()
|
||||
tools = client.list_tools()
|
||||
|
||||
for tool in tools:
|
||||
all_tools.append(
|
||||
{
|
||||
"server": server_name,
|
||||
"name": tool.name,
|
||||
"description": tool.description,
|
||||
"input_schema": tool.input_schema,
|
||||
}
|
||||
)
|
||||
|
||||
client.disconnect()
|
||||
except Exception as e:
|
||||
errors.append({"server": server_name, "error": str(e)})
|
||||
|
||||
result = {
|
||||
"tools": all_tools,
|
||||
"total": len(all_tools),
|
||||
"servers_queried": len(servers_config),
|
||||
}
|
||||
if errors:
|
||||
result["errors"] = errors
|
||||
|
||||
return json.dumps(result, indent=2, default=str)
|
||||
|
||||
|
||||
# ── Meta-agent: Agent tool catalog ────────────────────────────────────────
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def list_agent_tools(server_config_path: str = "") -> str:
|
||||
"""List all tools available for agent building from the hive-tools MCP server.
|
||||
|
||||
Returns tool names grouped by category. Use this BEFORE designing an agent
|
||||
to know exactly which tools exist. Only use tools from this list in node
|
||||
definitions — never guess or fabricate tool names.
|
||||
Connects to each MCP server, lists tools, then disconnects. Use this
|
||||
BEFORE designing an agent to know exactly which tools exist. Only use
|
||||
tools from this list in node definitions — never guess or fabricate.
|
||||
|
||||
Args:
|
||||
server_config_path: Path to mcp_servers.json. Default: tools/mcp_servers.json
|
||||
(the standard hive-tools server). Can also point to an agent's config
|
||||
to see what tools that specific agent has access to.
|
||||
output_schema: "simple" (default) returns name and description per tool.
|
||||
"full" also includes server and input_schema.
|
||||
group: "all" (default) returns every category. A prefix like "gmail"
|
||||
returns only that group's tools.
|
||||
|
||||
Returns:
|
||||
JSON with tool names grouped by prefix (e.g. gmail_*, slack_*, etc.)
|
||||
JSON with tools grouped by prefix (e.g. gmail_*, slack_*).
|
||||
"""
|
||||
if output_schema not in ("simple", "full"):
|
||||
return json.dumps(
|
||||
{"error": f"Invalid output_schema: {output_schema!r}. Use 'simple' or 'full'."}
|
||||
)
|
||||
|
||||
# Resolve config path
|
||||
if not server_config_path:
|
||||
candidates = [
|
||||
@@ -387,53 +384,75 @@ def list_agent_tools(server_config_path: str = "") -> str:
|
||||
return json.dumps({"error": f"Failed to read config: {e}"})
|
||||
|
||||
try:
|
||||
from pathlib import Path
|
||||
|
||||
from framework.runner.mcp_client import MCPClient, MCPServerConfig
|
||||
from framework.runner.tool_registry import ToolRegistry
|
||||
except ImportError:
|
||||
return json.dumps({"error": "Cannot import MCPClient"})
|
||||
|
||||
all_tools: list[dict] = []
|
||||
errors = []
|
||||
config_dir = os.path.dirname(config_path)
|
||||
config_dir = Path(config_path).parent
|
||||
|
||||
for server_name, server_conf in servers_config.items():
|
||||
cwd = server_conf.get("cwd", "")
|
||||
if cwd and not os.path.isabs(cwd):
|
||||
cwd = os.path.abspath(os.path.join(config_dir, cwd))
|
||||
resolved = ToolRegistry.resolve_mcp_stdio_config(
|
||||
{"name": server_name, **server_conf}, config_dir
|
||||
)
|
||||
try:
|
||||
config = MCPServerConfig(
|
||||
name=server_name,
|
||||
transport=server_conf.get("transport", "stdio"),
|
||||
command=server_conf.get("command"),
|
||||
args=server_conf.get("args", []),
|
||||
env=server_conf.get("env", {}),
|
||||
cwd=cwd or None,
|
||||
url=server_conf.get("url"),
|
||||
headers=server_conf.get("headers", {}),
|
||||
transport=resolved.get("transport", "stdio"),
|
||||
command=resolved.get("command"),
|
||||
args=resolved.get("args", []),
|
||||
env=resolved.get("env", {}),
|
||||
cwd=resolved.get("cwd"),
|
||||
url=resolved.get("url"),
|
||||
headers=resolved.get("headers", {}),
|
||||
)
|
||||
client = MCPClient(config)
|
||||
client.connect()
|
||||
for tool in client.list_tools():
|
||||
all_tools.append({"name": tool.name, "description": tool.description})
|
||||
all_tools.append(
|
||||
{
|
||||
"server": server_name,
|
||||
"name": tool.name,
|
||||
"description": tool.description,
|
||||
"input_schema": tool.input_schema,
|
||||
}
|
||||
)
|
||||
client.disconnect()
|
||||
except Exception as e:
|
||||
errors.append({"server": server_name, "error": str(e)})
|
||||
|
||||
# Group by prefix (e.g., gmail_, slack_, stripe_)
|
||||
groups: dict[str, list[str]] = {}
|
||||
groups: dict[str, list[dict]] = {}
|
||||
for t in sorted(all_tools, key=lambda x: x["name"]):
|
||||
parts = t["name"].split("_", 1)
|
||||
prefix = parts[0] if len(parts) > 1 else "general"
|
||||
groups.setdefault(prefix, []).append(t["name"])
|
||||
groups.setdefault(prefix, []).append(t)
|
||||
|
||||
# Filter to a specific group
|
||||
if group != "all":
|
||||
groups = {group: groups[group]} if group in groups else {}
|
||||
|
||||
# Apply output schema
|
||||
if output_schema == "simple":
|
||||
groups = {
|
||||
prefix: [{"name": t["name"], "description": t["description"]} for t in tools]
|
||||
for prefix, tools in groups.items()
|
||||
}
|
||||
|
||||
all_names = sorted(t["name"] for tools in groups.values() for t in tools)
|
||||
result: dict = {
|
||||
"total": len(all_tools),
|
||||
"total": len(all_names),
|
||||
"tools_by_category": groups,
|
||||
"all_tool_names": sorted(t["name"] for t in all_tools),
|
||||
"all_tool_names": all_names,
|
||||
}
|
||||
if errors:
|
||||
result["errors"] = errors
|
||||
|
||||
return json.dumps(result, indent=2)
|
||||
return json.dumps(result, indent=2, default=str)
|
||||
|
||||
|
||||
# ── Meta-agent: Agent tool validation ─────────────────────────────────────
|
||||
@@ -478,19 +497,24 @@ def validate_agent_tools(agent_path: str) -> str:
|
||||
if not os.path.isdir(resolved):
|
||||
return json.dumps({"error": f"Agent directory not found: {agent_path}"})
|
||||
|
||||
agent_dir = resolved # Keep path; 'resolved' is reused for MCP config in loop
|
||||
|
||||
# --- Discover available tools from agent's MCP servers ---
|
||||
mcp_config_path = os.path.join(resolved, "mcp_servers.json")
|
||||
mcp_config_path = os.path.join(agent_dir, "mcp_servers.json")
|
||||
if not os.path.isfile(mcp_config_path):
|
||||
return json.dumps({"error": f"No mcp_servers.json found in {agent_path}"})
|
||||
|
||||
try:
|
||||
from pathlib import Path
|
||||
|
||||
from framework.runner.mcp_client import MCPClient, MCPServerConfig
|
||||
from framework.runner.tool_registry import ToolRegistry
|
||||
except ImportError:
|
||||
return json.dumps({"error": "Cannot import MCPClient"})
|
||||
|
||||
available_tools: set[str] = set()
|
||||
discovery_errors = []
|
||||
config_dir = os.path.dirname(mcp_config_path)
|
||||
config_dir = Path(mcp_config_path).parent
|
||||
|
||||
try:
|
||||
with open(mcp_config_path, encoding="utf-8") as f:
|
||||
@@ -499,19 +523,19 @@ def validate_agent_tools(agent_path: str) -> str:
|
||||
return json.dumps({"error": f"Failed to read mcp_servers.json: {e}"})
|
||||
|
||||
for server_name, server_conf in servers_config.items():
|
||||
cwd = server_conf.get("cwd", "")
|
||||
if cwd and not os.path.isabs(cwd):
|
||||
cwd = os.path.abspath(os.path.join(config_dir, cwd))
|
||||
resolved = ToolRegistry.resolve_mcp_stdio_config(
|
||||
{"name": server_name, **server_conf}, config_dir
|
||||
)
|
||||
try:
|
||||
config = MCPServerConfig(
|
||||
name=server_name,
|
||||
transport=server_conf.get("transport", "stdio"),
|
||||
command=server_conf.get("command"),
|
||||
args=server_conf.get("args", []),
|
||||
env=server_conf.get("env", {}),
|
||||
cwd=cwd or None,
|
||||
url=server_conf.get("url"),
|
||||
headers=server_conf.get("headers", {}),
|
||||
transport=resolved.get("transport", "stdio"),
|
||||
command=resolved.get("command"),
|
||||
args=resolved.get("args", []),
|
||||
env=resolved.get("env", {}),
|
||||
cwd=resolved.get("cwd"),
|
||||
url=resolved.get("url"),
|
||||
headers=resolved.get("headers", {}),
|
||||
)
|
||||
client = MCPClient(config)
|
||||
client.connect()
|
||||
@@ -522,7 +546,7 @@ def validate_agent_tools(agent_path: str) -> str:
|
||||
discovery_errors.append({"server": server_name, "error": str(e)})
|
||||
|
||||
# --- Load agent nodes and extract declared tools ---
|
||||
agent_py = os.path.join(resolved, "agent.py")
|
||||
agent_py = os.path.join(agent_dir, "agent.py")
|
||||
if not os.path.isfile(agent_py):
|
||||
return json.dumps({"error": f"No agent.py found in {agent_path}"})
|
||||
|
||||
@@ -530,8 +554,8 @@ def validate_agent_tools(agent_path: str) -> str:
|
||||
import importlib.util
|
||||
import sys
|
||||
|
||||
package_name = os.path.basename(resolved)
|
||||
parent_dir = os.path.dirname(os.path.abspath(resolved))
|
||||
package_name = os.path.basename(agent_dir)
|
||||
parent_dir = os.path.dirname(os.path.abspath(agent_dir))
|
||||
if parent_dir not in sys.path:
|
||||
sys.path.insert(0, parent_dir)
|
||||
|
||||
@@ -564,7 +588,7 @@ def validate_agent_tools(agent_path: str) -> str:
|
||||
result["missing_tools"] = missing_by_node
|
||||
result["message"] = (
|
||||
f"FAIL: {sum(len(v) for v in missing_by_node.values())} tool(s) declared "
|
||||
f"in nodes do not exist. Run discover_mcp_tools() to see available tools "
|
||||
f"in nodes do not exist. Run list_agent_tools() to see available tools "
|
||||
f"and fix the node definitions."
|
||||
)
|
||||
else:
|
||||
@@ -785,94 +809,6 @@ def list_agent_sessions(
|
||||
)
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def get_agent_session_state(agent_name: str, session_id: str) -> str:
|
||||
"""Load full session state (excluding memory to prevent context bloat).
|
||||
|
||||
Returns status, progress, result, metrics, and checkpoint info.
|
||||
Use get_agent_session_memory to read memory contents separately.
|
||||
|
||||
Args:
|
||||
agent_name: Agent package name (e.g. 'deep_research_agent')
|
||||
session_id: Session ID (e.g. 'session_20260208_143022_abc12345')
|
||||
|
||||
Returns:
|
||||
JSON with full session state
|
||||
"""
|
||||
agent_dir = _resolve_hive_agent_path(agent_name)
|
||||
state_path = agent_dir / "sessions" / session_id / "state.json"
|
||||
data = _read_session_json(state_path)
|
||||
if data is None:
|
||||
return json.dumps({"error": f"Session not found: {session_id}"})
|
||||
|
||||
# Exclude memory values but show keys
|
||||
memory = data.get("memory", {})
|
||||
data["memory_keys"] = list(memory.keys()) if isinstance(memory, dict) else []
|
||||
data["memory_size"] = len(memory) if isinstance(memory, dict) else 0
|
||||
data.pop("memory", None)
|
||||
|
||||
return json.dumps(data, indent=2, default=str)
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def get_agent_session_memory(
|
||||
agent_name: str,
|
||||
session_id: str,
|
||||
key: str = "",
|
||||
) -> str:
|
||||
"""Read memory contents from a session.
|
||||
|
||||
Memory stores intermediate results passed between nodes. Use this
|
||||
to inspect what data was produced during execution.
|
||||
|
||||
Args:
|
||||
agent_name: Agent package name
|
||||
session_id: Session ID
|
||||
key: Specific memory key to retrieve. Empty for all keys.
|
||||
|
||||
Returns:
|
||||
JSON with memory contents
|
||||
"""
|
||||
agent_dir = _resolve_hive_agent_path(agent_name)
|
||||
state_path = agent_dir / "sessions" / session_id / "state.json"
|
||||
data = _read_session_json(state_path)
|
||||
if data is None:
|
||||
return json.dumps({"error": f"Session not found: {session_id}"})
|
||||
|
||||
memory = data.get("memory", {})
|
||||
if not isinstance(memory, dict):
|
||||
memory = {}
|
||||
|
||||
if key:
|
||||
if key not in memory:
|
||||
return json.dumps(
|
||||
{
|
||||
"error": f"Memory key not found: '{key}'",
|
||||
"available_keys": list(memory.keys()),
|
||||
}
|
||||
)
|
||||
return json.dumps(
|
||||
{
|
||||
"session_id": session_id,
|
||||
"key": key,
|
||||
"value": memory[key],
|
||||
"value_type": type(memory[key]).__name__,
|
||||
},
|
||||
indent=2,
|
||||
default=str,
|
||||
)
|
||||
|
||||
return json.dumps(
|
||||
{
|
||||
"session_id": session_id,
|
||||
"memory": memory,
|
||||
"total_keys": len(memory),
|
||||
},
|
||||
indent=2,
|
||||
default=str,
|
||||
)
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def list_agent_checkpoints(
|
||||
agent_name: str,
|
||||
@@ -1074,13 +1010,16 @@ def run_agent_tests(
|
||||
cmd.append("-x")
|
||||
cmd.append("--tb=short")
|
||||
|
||||
# Set PYTHONPATH
|
||||
# Set PYTHONPATH (use pathsep for Windows)
|
||||
env = os.environ.copy()
|
||||
pythonpath = env.get("PYTHONPATH", "")
|
||||
core_path = os.path.join(PROJECT_ROOT, "core")
|
||||
exports_path = os.path.join(PROJECT_ROOT, "exports")
|
||||
fw_agents_path = os.path.join(PROJECT_ROOT, "core", "framework", "agents")
|
||||
env["PYTHONPATH"] = f"{core_path}:{exports_path}:{fw_agents_path}:{PROJECT_ROOT}:{pythonpath}"
|
||||
path_parts = [core_path, exports_path, fw_agents_path, PROJECT_ROOT]
|
||||
if pythonpath:
|
||||
path_parts.append(pythonpath)
|
||||
env["PYTHONPATH"] = os.pathsep.join(path_parts)
|
||||
|
||||
try:
|
||||
result = subprocess.run(
|
||||
@@ -1089,6 +1028,8 @@ def run_agent_tests(
|
||||
text=True,
|
||||
timeout=120,
|
||||
env=env,
|
||||
stdin=subprocess.DEVNULL,
|
||||
encoding="utf-8",
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
return json.dumps(
|
||||
@@ -1212,7 +1153,7 @@ def main() -> None:
|
||||
register_file_tools(
|
||||
mcp,
|
||||
resolve_path=_resolve_path,
|
||||
before_write=_take_snapshot,
|
||||
before_write=None, # Git snapshot causes stdio deadlock on Windows; undo_changes limited
|
||||
project_root=PROJECT_ROOT,
|
||||
)
|
||||
|
||||
|
||||
@@ -0,0 +1,120 @@
|
||||
"""
|
||||
Database Initialization Script Runner for AdenTestDB
|
||||
|
||||
This script executes the SQL initialization file to create the AdenTestDB database.
|
||||
Make sure your SQL Server is running before executing this script.
|
||||
"""
|
||||
|
||||
import os
|
||||
|
||||
import pyodbc
|
||||
from dotenv import load_dotenv
|
||||
|
||||
# Load environment variables from .env
|
||||
load_dotenv()
|
||||
|
||||
# Database connection settings (from environment variables)
|
||||
SERVER = os.getenv("MSSQL_SERVER", r"MONSTER\MSSQLSERVERR")
|
||||
USERNAME = os.getenv("MSSQL_USERNAME")
|
||||
PASSWORD = os.getenv("MSSQL_PASSWORD")
|
||||
|
||||
# SQL file path
|
||||
SQL_FILE = os.path.join(os.path.dirname(__file__), "init_aden_testdb.sql")
|
||||
|
||||
|
||||
def execute_sql_file():
|
||||
"""Execute the SQL initialization file."""
|
||||
connection = None
|
||||
|
||||
try:
|
||||
# Read SQL file
|
||||
if not os.path.exists(SQL_FILE):
|
||||
print(f"[ERROR] SQL file not found: {SQL_FILE}")
|
||||
return False
|
||||
|
||||
with open(SQL_FILE, encoding="utf-8") as f:
|
||||
sql_script = f.read()
|
||||
|
||||
print("=" * 70)
|
||||
print("AdenTestDB Database Initialization")
|
||||
print("=" * 70)
|
||||
print(f"Server: {SERVER}")
|
||||
print(f"SQL Script: {SQL_FILE}")
|
||||
print()
|
||||
|
||||
# Connect to master database (to create new database)
|
||||
connection_string = (
|
||||
f"DRIVER={{ODBC Driver 17 for SQL Server}};"
|
||||
f"SERVER={SERVER};"
|
||||
f"DATABASE=master;"
|
||||
f"UID={USERNAME};"
|
||||
f"PWD={PASSWORD};"
|
||||
)
|
||||
|
||||
print("Connecting to SQL Server...")
|
||||
connection = pyodbc.connect(connection_string)
|
||||
connection.autocommit = True # Required for CREATE DATABASE
|
||||
cursor = connection.cursor()
|
||||
|
||||
print("[OK] Connected successfully!")
|
||||
print()
|
||||
print("Executing SQL script...")
|
||||
print("-" * 70)
|
||||
|
||||
# Split by GO statements and execute each batch
|
||||
batches = sql_script.split("\nGO\n")
|
||||
|
||||
for i, batch in enumerate(batches, 1):
|
||||
batch = batch.strip()
|
||||
if batch and not batch.startswith("--"):
|
||||
try:
|
||||
cursor.execute(batch)
|
||||
# Print any messages from the server
|
||||
while cursor.nextset():
|
||||
pass
|
||||
except pyodbc.Error as e:
|
||||
# Some statements might not return results, that's OK
|
||||
if "No results" not in str(e):
|
||||
print(f"Warning in batch {i}: {str(e)}")
|
||||
|
||||
print("-" * 70)
|
||||
print()
|
||||
print("=" * 70)
|
||||
print("[SUCCESS] Database initialization completed successfully!")
|
||||
print("=" * 70)
|
||||
print()
|
||||
print("Next steps:")
|
||||
print("1. Run: python test_mssql_connection.py")
|
||||
print("2. Verify the relational schema and sample data")
|
||||
print()
|
||||
|
||||
return True
|
||||
|
||||
except pyodbc.Error as e:
|
||||
print()
|
||||
print("=" * 70)
|
||||
print("[ERROR] Database initialization failed!")
|
||||
print("=" * 70)
|
||||
print(f"Error detail: {str(e)}")
|
||||
print()
|
||||
print("Possible solutions:")
|
||||
print("1. Ensure SQL Server is running")
|
||||
print("2. Check server name, username, and password")
|
||||
print("3. Ensure you have permission to create databases")
|
||||
print("4. Verify ODBC Driver 17 for SQL Server is installed")
|
||||
print()
|
||||
return False
|
||||
|
||||
except Exception as e:
|
||||
print(f"\n[ERROR] Unexpected error: {str(e)}")
|
||||
return False
|
||||
|
||||
finally:
|
||||
if connection:
|
||||
connection.close()
|
||||
print("Connection closed.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
success = execute_sql_file()
|
||||
exit(0 if success else 1)
|
||||
@@ -0,0 +1,134 @@
|
||||
"""
|
||||
Grant Permissions to AdenTestDB
|
||||
|
||||
This script grants the necessary permissions to the 'sa' user to access AdenTE testDB.
|
||||
"""
|
||||
|
||||
import pyodbc
|
||||
|
||||
SERVER = r"MONSTER\MSSQLSERVERR"
|
||||
USERNAME = "sa"
|
||||
PASSWORD = "622622aA."
|
||||
|
||||
|
||||
def grant_permissions():
|
||||
"""Grant permissions to the database."""
|
||||
connection = None
|
||||
|
||||
try:
|
||||
# Connect to AdenTestDB
|
||||
connection_string = (
|
||||
f"DRIVER={{ODBC Driver 17 for SQL Server}};"
|
||||
f"SERVER={SERVER};"
|
||||
f"DATABASE=AdenTestDB;"
|
||||
f"UID={USERNAME};"
|
||||
f"PWD={PASSWORD};"
|
||||
f"TrustServerCertificate=yes;"
|
||||
)
|
||||
|
||||
print("=" * 70)
|
||||
print("Granting Permissions to AdenTestDB")
|
||||
print("=" * 70)
|
||||
print(f"Server: {SERVER}")
|
||||
print()
|
||||
|
||||
print("Connecting to database...")
|
||||
connection = pyodbc.connect(connection_string)
|
||||
cursor = connection.cursor()
|
||||
|
||||
print("[OK] Connected successfully!")
|
||||
print()
|
||||
|
||||
# Grant permissions
|
||||
print("Granting permissions...")
|
||||
|
||||
try:
|
||||
cursor.execute("GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::dbo TO sa")
|
||||
print("[OK] Granted schema permissions to sa")
|
||||
except pyodbc.Error as e:
|
||||
print(f"Note: {str(e)}")
|
||||
|
||||
connection.commit()
|
||||
|
||||
print()
|
||||
print("=" * 70)
|
||||
print("[SUCCESS] Permissions granted!")
|
||||
print("=" * 70)
|
||||
print()
|
||||
print("You can now run: python test_mssql_connection.py")
|
||||
|
||||
return True
|
||||
|
||||
except pyodbc.Error:
|
||||
# If we can't connect, try connecting to master and creating user
|
||||
try:
|
||||
connection_string = (
|
||||
f"DRIVER={{ODBC Driver 17 for SQL Server}};"
|
||||
f"SERVER={SERVER};"
|
||||
f"DATABASE=master;"
|
||||
f"UID={USERNAME};"
|
||||
f"PWD={PASSWORD};"
|
||||
f"TrustServerCertificate=yes;"
|
||||
)
|
||||
|
||||
print("Attempting to grant permissions via master database...")
|
||||
connection = pyodbc.connect(connection_string)
|
||||
cursor = connection.cursor()
|
||||
|
||||
# Create login if not exists
|
||||
try:
|
||||
cursor.execute(f"""
|
||||
IF NOT EXISTS (SELECT * FROM sys.server_principals WHERE name = 'sa')
|
||||
BEGIN
|
||||
CREATE LOGIN sa WITH PASSWORD = '{PASSWORD}'
|
||||
END
|
||||
""")
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Switch to AdenTestDB and grant permissions
|
||||
cursor.execute("USE AdenTestDB")
|
||||
|
||||
# Create user if not exists
|
||||
try:
|
||||
cursor.execute("""
|
||||
IF NOT EXISTS (SELECT * FROM sys.database_principals WHERE name = 'sa')
|
||||
BEGIN
|
||||
CREATE USER sa FOR LOGIN sa
|
||||
END
|
||||
""")
|
||||
print("[OK] Created database user")
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Grant permissions
|
||||
cursor.execute("ALTER ROLE db_datareader ADD MEMBER sa")
|
||||
cursor.execute("ALTER ROLE db_datawriter ADD MEMBER sa")
|
||||
|
||||
connection.commit()
|
||||
|
||||
print("[OK] Permissions granted successfully!")
|
||||
return True
|
||||
|
||||
except Exception as inner_e:
|
||||
print("\n[ERROR] Could not grant permissions!")
|
||||
print(f"Error: {str(inner_e)}")
|
||||
print()
|
||||
print("The database was created successfully, but there's a permission issue.")
|
||||
print("Please run this SQL command in SQL Server Management Studio:")
|
||||
print()
|
||||
print("USE AdenTestDB;")
|
||||
print("GO")
|
||||
print("ALTER ROLE db_datareader ADD MEMBER sa;")
|
||||
print("ALTER ROLE db_datawriter ADD MEMBER sa;")
|
||||
print("GO")
|
||||
return False
|
||||
|
||||
finally:
|
||||
if connection:
|
||||
connection.close()
|
||||
print("\nConnection closed.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
grant_permissions()
|
||||
@@ -0,0 +1,183 @@
|
||||
-- ============================================================================
|
||||
-- AdenTestDB Database Initialization Script
|
||||
-- ============================================================================
|
||||
-- Purpose: Create a professional testing database for Aden Hive MSSQL tool
|
||||
-- Author: Database Architect
|
||||
-- Date: 2026-02-08
|
||||
-- ============================================================================
|
||||
|
||||
USE master;
|
||||
GO
|
||||
|
||||
-- Drop database if exists (for clean recreation)
|
||||
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'AdenTestDB')
|
||||
BEGIN
|
||||
ALTER DATABASE AdenTestDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
|
||||
DROP DATABASE AdenTestDB;
|
||||
PRINT 'Existing AdenTestDB dropped successfully.';
|
||||
END
|
||||
GO
|
||||
|
||||
-- Create new database
|
||||
CREATE DATABASE AdenTestDB;
|
||||
GO
|
||||
|
||||
PRINT 'AdenTestDB created successfully.';
|
||||
GO
|
||||
|
||||
USE AdenTestDB;
|
||||
GO
|
||||
|
||||
-- ============================================================================
|
||||
-- TABLE: Departments
|
||||
-- ============================================================================
|
||||
-- Purpose: Store department information with budget tracking
|
||||
-- ============================================================================
|
||||
|
||||
CREATE TABLE Departments (
|
||||
department_id INT IDENTITY(1,1) NOT NULL,
|
||||
name NVARCHAR(100) NOT NULL,
|
||||
budget DECIMAL(15,2) NOT NULL,
|
||||
created_date DATETIME NOT NULL DEFAULT GETDATE(),
|
||||
|
||||
CONSTRAINT PK_Departments PRIMARY KEY (department_id),
|
||||
CONSTRAINT UK_Departments_Name UNIQUE (name),
|
||||
CONSTRAINT CK_Departments_Budget CHECK (budget >= 0)
|
||||
);
|
||||
GO
|
||||
|
||||
-- Create index for performance optimization
|
||||
CREATE INDEX IX_Departments_Name ON Departments(name);
|
||||
GO
|
||||
|
||||
PRINT 'Departments table created successfully.';
|
||||
GO
|
||||
|
||||
-- ============================================================================
|
||||
-- TABLE: Employees
|
||||
-- ============================================================================
|
||||
-- Purpose: Store employee information with department association
|
||||
-- ============================================================================
|
||||
|
||||
CREATE TABLE Employees (
|
||||
employee_id INT IDENTITY(1000,1) NOT NULL,
|
||||
first_name NVARCHAR(50) NOT NULL,
|
||||
last_name NVARCHAR(50) NOT NULL,
|
||||
email NVARCHAR(100) NOT NULL,
|
||||
salary DECIMAL(12,2) NOT NULL,
|
||||
hire_date DATETIME NOT NULL,
|
||||
department_id INT NOT NULL,
|
||||
|
||||
CONSTRAINT PK_Employees PRIMARY KEY (employee_id),
|
||||
CONSTRAINT UK_Employees_Email UNIQUE (email),
|
||||
CONSTRAINT CK_Employees_Salary CHECK (salary >= 0),
|
||||
CONSTRAINT FK_Employees_Departments
|
||||
FOREIGN KEY (department_id) REFERENCES Departments(department_id)
|
||||
ON DELETE CASCADE
|
||||
ON UPDATE CASCADE
|
||||
);
|
||||
GO
|
||||
|
||||
-- Create indexes for performance optimization
|
||||
CREATE INDEX IX_Employees_DepartmentId ON Employees(department_id);
|
||||
CREATE INDEX IX_Employees_LastName ON Employees(last_name);
|
||||
CREATE INDEX IX_Employees_Email ON Employees(email);
|
||||
GO
|
||||
|
||||
PRINT 'Employees table created successfully.';
|
||||
GO
|
||||
|
||||
-- ============================================================================
|
||||
-- SAMPLE DATA: Departments
|
||||
-- ============================================================================
|
||||
|
||||
INSERT INTO Departments (name, budget, created_date) VALUES
|
||||
('Engineering', 2500000.00, '2023-01-15'),
|
||||
('Human Resources', 800000.00, '2023-01-15'),
|
||||
('Sales', 1500000.00, '2023-01-20'),
|
||||
('Marketing', 1200000.00, '2023-02-01'),
|
||||
('Finance', 1000000.00, '2023-02-10');
|
||||
GO
|
||||
|
||||
PRINT 'Sample departments inserted successfully.';
|
||||
GO
|
||||
|
||||
-- ============================================================================
|
||||
-- SAMPLE DATA: Employees
|
||||
-- ============================================================================
|
||||
|
||||
INSERT INTO Employees (first_name, last_name, email, salary, hire_date, department_id) VALUES
|
||||
-- Engineering Department (ID: 1)
|
||||
('John', 'Smith', 'john.smith@adenhive.com', 120000.00, '2023-03-01', 1),
|
||||
('Sarah', 'Johnson', 'sarah.johnson@adenhive.com', 115000.00, '2023-03-15', 1),
|
||||
('Michael', 'Chen', 'michael.chen@adenhive.com', 125000.00, '2023-04-01', 1),
|
||||
('Emily', 'Rodriguez', 'emily.rodriguez@adenhive.com', 110000.00, '2023-05-10', 1),
|
||||
('David', 'Kim', 'david.kim@adenhive.com', 105000.00, '2024-01-15', 1),
|
||||
|
||||
-- Human Resources Department (ID: 2)
|
||||
('Lisa', 'Anderson', 'lisa.anderson@adenhive.com', 85000.00, '2023-02-20', 2),
|
||||
('James', 'Wilson', 'james.wilson@adenhive.com', 80000.00, '2023-06-01', 2),
|
||||
|
||||
-- Sales Department (ID: 3)
|
||||
('Jennifer', 'Taylor', 'jennifer.taylor@adenhive.com', 95000.00, '2023-04-15', 3),
|
||||
('Robert', 'Martinez', 'robert.martinez@adenhive.com', 90000.00, '2023-05-01', 3),
|
||||
('Amanda', 'Garcia', 'amanda.garcia@adenhive.com', 92000.00, '2023-07-20', 3),
|
||||
|
||||
-- Marketing Department (ID: 4)
|
||||
('Christopher', 'Lee', 'christopher.lee@adenhive.com', 88000.00, '2023-03-10', 4),
|
||||
('Michelle', 'White', 'michelle.white@adenhive.com', 86000.00, '2023-08-01', 4),
|
||||
('Kevin', 'Brown', 'kevin.brown@adenhive.com', 84000.00, '2024-02-01', 4),
|
||||
|
||||
-- Finance Department (ID: 5)
|
||||
('Jessica', 'Davis', 'jessica.davis@adenhive.com', 98000.00, '2023-02-15', 5),
|
||||
('Daniel', 'Miller', 'daniel.miller@adenhive.com', 95000.00, '2023-09-01', 5);
|
||||
GO
|
||||
|
||||
PRINT 'Sample employees inserted successfully.';
|
||||
GO
|
||||
|
||||
-- ============================================================================
|
||||
-- VERIFICATION QUERIES
|
||||
-- ============================================================================
|
||||
|
||||
PRINT '';
|
||||
PRINT '============================================================';
|
||||
PRINT 'Database Setup Summary';
|
||||
PRINT '============================================================';
|
||||
|
||||
-- Count departments
|
||||
DECLARE @DeptCount INT;
|
||||
SELECT @DeptCount = COUNT(*) FROM Departments;
|
||||
PRINT 'Total Departments: ' + CAST(@DeptCount AS NVARCHAR(10));
|
||||
|
||||
-- Count employees
|
||||
DECLARE @EmpCount INT;
|
||||
SELECT @EmpCount = COUNT(*) FROM Employees;
|
||||
PRINT 'Total Employees: ' + CAST(@EmpCount AS NVARCHAR(10));
|
||||
|
||||
-- Show department summary
|
||||
PRINT '';
|
||||
PRINT 'Department Summary:';
|
||||
PRINT '------------------------------------------------------------';
|
||||
SELECT
|
||||
d.name AS Department,
|
||||
COUNT(e.employee_id) AS Employees,
|
||||
d.budget AS Budget,
|
||||
FORMAT(d.budget / NULLIF(COUNT(e.employee_id), 0), 'C', 'en-US') AS BudgetPerEmployee
|
||||
FROM Departments d
|
||||
LEFT JOIN Employees e ON d.department_id = e.department_id
|
||||
GROUP BY d.name, d.budget
|
||||
ORDER BY d.name;
|
||||
GO
|
||||
|
||||
PRINT '';
|
||||
PRINT '============================================================';
|
||||
PRINT 'AdenTestDB initialization completed successfully!';
|
||||
PRINT '============================================================';
|
||||
PRINT '';
|
||||
PRINT 'Next Steps:';
|
||||
PRINT '1. Run: python test_mssql_connection.py';
|
||||
PRINT '2. Verify JOIN queries work correctly';
|
||||
PRINT '3. Test relational integrity';
|
||||
PRINT '============================================================';
|
||||
GO
|
||||
@@ -0,0 +1,208 @@
|
||||
"""
|
||||
Payroll Analysis Tool
|
||||
Analyzes total payroll costs by department and identifies highest-paid employee
|
||||
"""
|
||||
|
||||
import io
|
||||
import os
|
||||
import sys
|
||||
|
||||
import pyodbc
|
||||
from dotenv import load_dotenv
|
||||
|
||||
# Force UTF-8 encoding for console output
|
||||
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
|
||||
|
||||
# Load environment variables from .env file
|
||||
load_dotenv()
|
||||
|
||||
# Database connection settings (from environment variables)
|
||||
SERVER = os.getenv("MSSQL_SERVER", r"MONSTER\MSSQLSERVERR")
|
||||
DATABASE = os.getenv("MSSQL_DATABASE", "AdenTestDB")
|
||||
USERNAME = os.getenv("MSSQL_USERNAME")
|
||||
PASSWORD = os.getenv("MSSQL_PASSWORD")
|
||||
|
||||
|
||||
def main():
|
||||
"""Main analysis function."""
|
||||
connection = None
|
||||
|
||||
try:
|
||||
print("=" * 80)
|
||||
print(" COMPANY PAYROLL ANALYSIS")
|
||||
print("=" * 80)
|
||||
print(f"Server: {SERVER}")
|
||||
print(f"Database: {DATABASE}")
|
||||
print()
|
||||
|
||||
# Connect to database
|
||||
if USERNAME and PASSWORD:
|
||||
# SQL Server Authentication
|
||||
connection_string = (
|
||||
f"DRIVER={{ODBC Driver 17 for SQL Server}};"
|
||||
f"SERVER={SERVER};"
|
||||
f"DATABASE={DATABASE};"
|
||||
f"UID={USERNAME};"
|
||||
f"PWD={PASSWORD};"
|
||||
)
|
||||
else:
|
||||
# Windows Authentication
|
||||
connection_string = (
|
||||
f"DRIVER={{ODBC Driver 17 for SQL Server}};"
|
||||
f"SERVER={SERVER};"
|
||||
f"DATABASE={DATABASE};"
|
||||
f"Trusted_Connection=yes;"
|
||||
)
|
||||
|
||||
print("Connecting to database...")
|
||||
connection = pyodbc.connect(connection_string)
|
||||
cursor = connection.cursor()
|
||||
print("✓ Connection successful!")
|
||||
print()
|
||||
|
||||
# Analysis 1: Total Payroll by Department
|
||||
print("=" * 80)
|
||||
print(" TOTAL SALARY COSTS BY DEPARTMENT")
|
||||
print("=" * 80)
|
||||
|
||||
payroll_query = """
|
||||
SELECT
|
||||
d.name AS department_name,
|
||||
COUNT(e.employee_id) AS employee_count,
|
||||
SUM(e.salary) AS total_salary_cost,
|
||||
AVG(e.salary) AS avg_salary
|
||||
FROM Departments d
|
||||
LEFT JOIN Employees e ON d.department_id = e.department_id
|
||||
GROUP BY d.name
|
||||
ORDER BY total_salary_cost DESC
|
||||
"""
|
||||
|
||||
cursor.execute(payroll_query)
|
||||
|
||||
print(
|
||||
f"\n{'Department':<25} {'Employees':<12} {'Total Salary Cost':<20} {'Avg Salary':<15}"
|
||||
)
|
||||
print("-" * 80)
|
||||
|
||||
total_company_payroll = 0
|
||||
total_employees = 0
|
||||
|
||||
for row in cursor:
|
||||
dept_name = row[0]
|
||||
emp_count = row[1]
|
||||
total_salary = row[2] if row[2] else 0
|
||||
avg_salary = row[3] if row[3] else 0
|
||||
|
||||
total_company_payroll += total_salary
|
||||
total_employees += emp_count
|
||||
|
||||
total_salary_str = f"${total_salary:,.2f}"
|
||||
avg_salary_str = f"${avg_salary:,.2f}" if avg_salary > 0 else "N/A"
|
||||
|
||||
print(f"{dept_name:<25} {emp_count:<12} {total_salary_str:<20} {avg_salary_str:<15}")
|
||||
|
||||
print("-" * 80)
|
||||
print(f"{'TOTAL COMPANY':<25} {total_employees:<12} ${total_company_payroll:,.2f}")
|
||||
print("-" * 80)
|
||||
print()
|
||||
|
||||
# Analysis 2: Highest Paid Employee
|
||||
print("=" * 80)
|
||||
print(" HIGHEST PAID EMPLOYEE")
|
||||
print("=" * 80)
|
||||
|
||||
highest_paid_query = """
|
||||
SELECT TOP 1
|
||||
e.employee_id,
|
||||
e.first_name + ' ' + e.last_name AS full_name,
|
||||
e.email,
|
||||
e.salary,
|
||||
d.name AS department_name
|
||||
FROM Employees e
|
||||
INNER JOIN Departments d ON e.department_id = d.department_id
|
||||
ORDER BY e.salary DESC
|
||||
"""
|
||||
|
||||
cursor.execute(highest_paid_query)
|
||||
top_employee = cursor.fetchone()
|
||||
|
||||
if top_employee:
|
||||
print(f"\n{'Field':<20} {'Value':<50}")
|
||||
print("-" * 80)
|
||||
print(f"{'Employee ID':<20} {top_employee[0]}")
|
||||
print(f"{'Name':<20} {top_employee[1]}")
|
||||
print(f"{'Email':<20} {top_employee[2]}")
|
||||
print(f"{'Department':<20} {top_employee[4]}")
|
||||
print(f"{'Salary':<20} ${top_employee[3]:,.2f}")
|
||||
print("-" * 80)
|
||||
else:
|
||||
print("\nNo employees found in the database.")
|
||||
|
||||
print()
|
||||
|
||||
# Additional Analysis: Top 5 Highest Paid Employees
|
||||
print("=" * 80)
|
||||
print(" TOP 5 HIGHEST PAID EMPLOYEES")
|
||||
print("=" * 80)
|
||||
|
||||
top_5_query = """
|
||||
SELECT TOP 5
|
||||
e.first_name + ' ' + e.last_name AS full_name,
|
||||
d.name AS department_name,
|
||||
e.salary
|
||||
FROM Employees e
|
||||
INNER JOIN Departments d ON e.department_id = d.department_id
|
||||
ORDER BY e.salary DESC
|
||||
"""
|
||||
|
||||
cursor.execute(top_5_query)
|
||||
|
||||
print(f"\n{'Rank':<6} {'Name':<30} {'Department':<25} {'Salary':<15}")
|
||||
print("-" * 80)
|
||||
|
||||
rank = 1
|
||||
for row in cursor:
|
||||
full_name = row[0]
|
||||
dept_name = row[1]
|
||||
salary = row[2]
|
||||
|
||||
print(f"{rank:<6} {full_name:<30} {dept_name:<25} ${salary:,.2f}")
|
||||
rank += 1
|
||||
|
||||
print("-" * 80)
|
||||
print()
|
||||
|
||||
# Summary
|
||||
print("=" * 80)
|
||||
print(" ANALYSIS SUMMARY")
|
||||
print("=" * 80)
|
||||
print(f"✓ Total Employees: {total_employees}")
|
||||
print(f"✓ Total Company Payroll: ${total_company_payroll:,.2f}")
|
||||
print(
|
||||
f"✓ Average Employee Salary: ${total_company_payroll / total_employees:,.2f}"
|
||||
if total_employees > 0
|
||||
else "N/A"
|
||||
)
|
||||
print("=" * 80)
|
||||
print("\nPayroll analysis completed successfully!")
|
||||
|
||||
except pyodbc.Error as e:
|
||||
print("\n[ERROR] Database operation failed!")
|
||||
print(f"Error detail: {str(e)}")
|
||||
print()
|
||||
print("Possible solutions:")
|
||||
print("1. Ensure SQL Server is running")
|
||||
print("2. Verify database access permissions")
|
||||
print("3. Check connection string configuration")
|
||||
|
||||
except Exception as e:
|
||||
print(f"\n[ERROR] Unexpected error: {str(e)}")
|
||||
|
||||
finally:
|
||||
if connection:
|
||||
connection.close()
|
||||
print("\nConnection closed.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -31,6 +31,7 @@ dependencies = [
|
||||
"litellm>=1.81.0",
|
||||
"dnspython>=2.4.0",
|
||||
"resend>=2.0.0",
|
||||
"asana>=3.2.0",
|
||||
"google-analytics-data>=0.18.0",
|
||||
"framework",
|
||||
"stripe>=14.3.0",
|
||||
@@ -60,6 +61,10 @@ sql = [
|
||||
bigquery = [
|
||||
"google-cloud-bigquery>=3.0.0",
|
||||
]
|
||||
databricks = [
|
||||
"databricks-sdk>=0.30.0",
|
||||
"databricks-mcp>=0.1.0",
|
||||
]
|
||||
all = [
|
||||
"RestrictedPython>=7.0",
|
||||
"pytesseract>=0.3.10",
|
||||
@@ -67,6 +72,8 @@ all = [
|
||||
"duckdb>=1.0.0",
|
||||
"openpyxl>=3.1.0",
|
||||
"google-cloud-bigquery>=3.0.0",
|
||||
"databricks-sdk>=0.30.0",
|
||||
"databricks-mcp>=0.1.0",
|
||||
]
|
||||
|
||||
[tool.uv.sources]
|
||||
@@ -107,6 +114,10 @@ lint.isort.section-order = [
|
||||
[tool.pytest.ini_options]
|
||||
testpaths = ["tests"]
|
||||
asyncio_mode = "auto"
|
||||
addopts = "-m 'not live'"
|
||||
markers = [
|
||||
"live: Tests that call real external APIs (require credentials, never run in CI)",
|
||||
]
|
||||
|
||||
[dependency-groups]
|
||||
dev = [
|
||||
|
||||
@@ -0,0 +1,117 @@
|
||||
"""
|
||||
Query Average Salary by Department
|
||||
"""
|
||||
|
||||
import io
|
||||
import os
|
||||
import sys
|
||||
|
||||
import pyodbc
|
||||
from dotenv import load_dotenv
|
||||
|
||||
# Force UTF-8 encoding for console output
|
||||
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
|
||||
|
||||
# Load environment variables from .env file
|
||||
load_dotenv()
|
||||
|
||||
# Database connection settings (from environment variables)
|
||||
SERVER = os.getenv("MSSQL_SERVER", r"MONSTER\\MSSQLSERVERR")
|
||||
DATABASE = os.getenv("MSSQL_DATABASE", "AdenTestDB")
|
||||
USERNAME = os.getenv("MSSQL_USERNAME")
|
||||
PASSWORD = os.getenv("MSSQL_PASSWORD")
|
||||
|
||||
|
||||
def main():
|
||||
"""Query and display average salary by department."""
|
||||
connection = None
|
||||
|
||||
try:
|
||||
# Connect to database
|
||||
if USERNAME and PASSWORD:
|
||||
# SQL Server Authentication
|
||||
connection_string = (
|
||||
f"DRIVER={{ODBC Driver 17 for SQL Server}};"
|
||||
f"SERVER={SERVER};"
|
||||
f"DATABASE={DATABASE};"
|
||||
f"UID={USERNAME};"
|
||||
f"PWD={PASSWORD};"
|
||||
)
|
||||
else:
|
||||
# Windows Authentication
|
||||
connection_string = (
|
||||
f"DRIVER={{ODBC Driver 17 for SQL Server}};"
|
||||
f"SERVER={SERVER};"
|
||||
f"DATABASE={DATABASE};"
|
||||
f"Trusted_Connection=yes;"
|
||||
)
|
||||
|
||||
connection = pyodbc.connect(connection_string)
|
||||
cursor = connection.cursor()
|
||||
|
||||
# Query to get average salary by department, sorted by average salary descending
|
||||
query = """
|
||||
SELECT
|
||||
d.name AS department,
|
||||
AVG(e.salary) AS avg_salary,
|
||||
COUNT(e.employee_id) AS emp_count
|
||||
FROM Departments d
|
||||
LEFT JOIN Employees e ON d.department_id = e.department_id
|
||||
WHERE e.salary IS NOT NULL
|
||||
GROUP BY d.name
|
||||
ORDER BY avg_salary DESC
|
||||
"""
|
||||
|
||||
cursor.execute(query)
|
||||
results = cursor.fetchall()
|
||||
|
||||
if not results:
|
||||
print("No salary data found.")
|
||||
return
|
||||
|
||||
# Get the highest average salary for highlighting
|
||||
highest_avg = results[0][1] if results else 0
|
||||
|
||||
print("=" * 80)
|
||||
print(" AVERAGE SALARY BY DEPARTMENT (Sorted Highest to Lowest)")
|
||||
print("=" * 80)
|
||||
print()
|
||||
print(f"{'Rank':<6} {'Department':<25} {'Avg Salary':<20} {'Employees':<12}")
|
||||
print("-" * 80)
|
||||
|
||||
for idx, row in enumerate(results, 1):
|
||||
department = row[0]
|
||||
avg_salary = row[1]
|
||||
emp_count = row[2]
|
||||
|
||||
avg_salary_str = f"${avg_salary:,.2f}"
|
||||
|
||||
# Highlight the department with the highest average
|
||||
if avg_salary == highest_avg:
|
||||
# Use special formatting for the highest
|
||||
prefix = f"{'>>> ' + str(idx):<6}"
|
||||
print(f"{prefix} {department:<25} {avg_salary_str:<20} {emp_count:<12} ⭐ HIGHEST")
|
||||
else:
|
||||
print(f"{idx:<6} {department:<25} {avg_salary_str:<20} {emp_count:<12}")
|
||||
|
||||
print("-" * 80)
|
||||
print()
|
||||
print("📊 Summary:")
|
||||
print(f" • Total departments with employees: {len(results)}")
|
||||
print(f" • Highest average salary: ${highest_avg:,.2f} ({results[0][0]})")
|
||||
print(f" • Lowest average salary: ${results[-1][1]:,.2f} ({results[-1][0]})")
|
||||
print("=" * 80)
|
||||
|
||||
except pyodbc.Error as e:
|
||||
print(f"\n[ERROR] Database operation failed: {str(e)}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"\n[ERROR] Unexpected error: {str(e)}")
|
||||
|
||||
finally:
|
||||
if connection:
|
||||
connection.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -33,7 +33,6 @@ Usage:
|
||||
})
|
||||
|
||||
Credential categories:
|
||||
- llm.py: LLM provider credentials (anthropic, openai, etc.)
|
||||
- search.py: Search tool credentials (brave_search, google_search, etc.)
|
||||
- email.py: Email provider credentials (resend, google/gmail)
|
||||
- apollo.py: Apollo.io API credentials
|
||||
@@ -55,20 +54,35 @@ To add a new credential:
|
||||
3. If new category, import and merge it in this __init__.py
|
||||
"""
|
||||
|
||||
from .airtable import AIRTABLE_CREDENTIALS
|
||||
from .apify import APIFY_CREDENTIALS
|
||||
from .apollo import APOLLO_CREDENTIALS
|
||||
from .asana import ASANA_CREDENTIALS
|
||||
from .attio import ATTIO_CREDENTIALS
|
||||
from .aws_s3 import AWS_S3_CREDENTIALS
|
||||
from .azure_sql import AZURE_SQL_CREDENTIALS
|
||||
from .base import CredentialError, CredentialSpec
|
||||
from .bigquery import BIGQUERY_CREDENTIALS
|
||||
from .brevo import BREVO_CREDENTIALS
|
||||
from .browser import get_aden_auth_url, get_aden_setup_url, open_browser
|
||||
from .calcom import CALCOM_CREDENTIALS
|
||||
from .calendly import CALENDLY_CREDENTIALS
|
||||
from .cloudinary import CLOUDINARY_CREDENTIALS
|
||||
from .confluence import CONFLUENCE_CREDENTIALS
|
||||
from .databricks import DATABRICKS_CREDENTIALS
|
||||
from .discord import DISCORD_CREDENTIALS
|
||||
from .docker_hub import DOCKER_HUB_CREDENTIALS
|
||||
from .email import EMAIL_CREDENTIALS
|
||||
from .gcp_vision import GCP_VISION_CREDENTIALS
|
||||
from .github import GITHUB_CREDENTIALS
|
||||
from .gitlab import GITLAB_CREDENTIALS
|
||||
from .google_analytics import GOOGLE_ANALYTICS_CREDENTIALS
|
||||
from .google_calendar import GOOGLE_CALENDAR_CREDENTIALS
|
||||
from .google_docs import GOOGLE_DOCS_CREDENTIALS
|
||||
from .google_maps import GOOGLE_MAPS_CREDENTIALS
|
||||
from .google_search_console import GOOGLE_SEARCH_CONSOLE_CREDENTIALS
|
||||
from .google_sheets import GOOGLE_SHEETS_CREDENTIALS
|
||||
from .greenhouse import GREENHOUSE_CREDENTIALS
|
||||
from .health_check import (
|
||||
BaseHttpHealthChecker,
|
||||
HealthCheckResult,
|
||||
@@ -76,11 +90,33 @@ from .health_check import (
|
||||
validate_integration_wiring,
|
||||
)
|
||||
from .hubspot import HUBSPOT_CREDENTIALS
|
||||
from .huggingface import HUGGINGFACE_CREDENTIALS
|
||||
from .intercom import INTERCOM_CREDENTIALS
|
||||
from .llm import LLM_CREDENTIALS
|
||||
from .jira import JIRA_CREDENTIALS
|
||||
from .kafka import KAFKA_CREDENTIALS
|
||||
from .langfuse import LANGFUSE_CREDENTIALS
|
||||
from .linear import LINEAR_CREDENTIALS
|
||||
from .lusha import LUSHA_CREDENTIALS
|
||||
from .microsoft_graph import MICROSOFT_GRAPH_CREDENTIALS
|
||||
from .mongodb import MONGODB_CREDENTIALS
|
||||
from .n8n import N8N_CREDENTIALS
|
||||
from .news import NEWS_CREDENTIALS
|
||||
from .notion import NOTION_CREDENTIALS
|
||||
from .obsidian import OBSIDIAN_CREDENTIALS
|
||||
from .pagerduty import PAGERDUTY_CREDENTIALS
|
||||
from .pinecone import PINECONE_CREDENTIALS
|
||||
from .pipedrive import PIPEDRIVE_CREDENTIALS
|
||||
from .plaid import PLAID_CREDENTIALS
|
||||
from .postgres import POSTGRES_CREDENTIALS
|
||||
from .powerbi import POWERBI_CREDENTIALS
|
||||
from .pushover import PUSHOVER_CREDENTIALS
|
||||
from .quickbooks import QUICKBOOKS_CREDENTIALS
|
||||
from .razorpay import RAZORPAY_CREDENTIALS
|
||||
from .reddit import REDDIT_CREDENTIALS
|
||||
from .redis import REDIS_CREDENTIALS
|
||||
from .redshift import REDSHIFT_CREDENTIALS
|
||||
from .salesforce import SALESFORCE_CREDENTIALS
|
||||
from .sap import SAP_CREDENTIALS
|
||||
from .search import SEARCH_CREDENTIALS
|
||||
from .serpapi import SERPAPI_CREDENTIALS
|
||||
from .shell_config import (
|
||||
@@ -89,26 +125,48 @@ from .shell_config import (
|
||||
get_shell_config_path,
|
||||
get_shell_source_command,
|
||||
)
|
||||
from .shopify import SHOPIFY_CREDENTIALS
|
||||
from .slack import SLACK_CREDENTIALS
|
||||
from .snowflake import SNOWFLAKE_CREDENTIALS
|
||||
from .store_adapter import CredentialStoreAdapter
|
||||
from .stripe import STRIPE_CREDENTIALS
|
||||
from .supabase import SUPABASE_CREDENTIALS
|
||||
from .telegram import TELEGRAM_CREDENTIALS
|
||||
from .terraform import TERRAFORM_CREDENTIALS
|
||||
from .tines import TINES_CREDENTIALS
|
||||
from .trello import TRELLO_CREDENTIALS
|
||||
from .twilio import TWILIO_CREDENTIALS
|
||||
from .twitter import TWITTER_CREDENTIALS
|
||||
from .vercel import VERCEL_CREDENTIALS
|
||||
from .youtube import YOUTUBE_CREDENTIALS
|
||||
from .zendesk import ZENDESK_CREDENTIALS
|
||||
from .zoho_crm import ZOHO_CRM_CREDENTIALS
|
||||
from .zoom import ZOOM_CREDENTIALS
|
||||
|
||||
# Merged registry of all credentials
|
||||
CREDENTIAL_SPECS = {
|
||||
**LLM_CREDENTIALS,
|
||||
**AIRTABLE_CREDENTIALS,
|
||||
**NEWS_CREDENTIALS,
|
||||
**SEARCH_CREDENTIALS,
|
||||
**EMAIL_CREDENTIALS,
|
||||
**GCP_VISION_CREDENTIALS,
|
||||
**APIFY_CREDENTIALS,
|
||||
**AWS_S3_CREDENTIALS,
|
||||
**ASANA_CREDENTIALS,
|
||||
**APOLLO_CREDENTIALS,
|
||||
**ATTIO_CREDENTIALS,
|
||||
**DISCORD_CREDENTIALS,
|
||||
**GITHUB_CREDENTIALS,
|
||||
**GOOGLE_ANALYTICS_CREDENTIALS,
|
||||
**GOOGLE_DOCS_CREDENTIALS,
|
||||
**GOOGLE_MAPS_CREDENTIALS,
|
||||
**GOOGLE_SEARCH_CONSOLE_CREDENTIALS,
|
||||
**HUGGINGFACE_CREDENTIALS,
|
||||
**HUBSPOT_CREDENTIALS,
|
||||
**INTERCOM_CREDENTIALS,
|
||||
**LINEAR_CREDENTIALS,
|
||||
**MONGODB_CREDENTIALS,
|
||||
**PAGERDUTY_CREDENTIALS,
|
||||
**GOOGLE_CALENDAR_CREDENTIALS,
|
||||
**SLACK_CREDENTIALS,
|
||||
**SERPAPI_CREDENTIALS,
|
||||
@@ -116,9 +174,50 @@ CREDENTIAL_SPECS = {
|
||||
**TELEGRAM_CREDENTIALS,
|
||||
**BIGQUERY_CREDENTIALS,
|
||||
**CALCOM_CREDENTIALS,
|
||||
**CALENDLY_CREDENTIALS,
|
||||
**DATABRICKS_CREDENTIALS,
|
||||
**DOCKER_HUB_CREDENTIALS,
|
||||
**PIPEDRIVE_CREDENTIALS,
|
||||
**STRIPE_CREDENTIALS,
|
||||
**BREVO_CREDENTIALS,
|
||||
**POSTGRES_CREDENTIALS,
|
||||
**QUICKBOOKS_CREDENTIALS,
|
||||
**MICROSOFT_GRAPH_CREDENTIALS,
|
||||
**PUSHOVER_CREDENTIALS,
|
||||
**REDIS_CREDENTIALS,
|
||||
**SUPABASE_CREDENTIALS,
|
||||
**VERCEL_CREDENTIALS,
|
||||
**YOUTUBE_CREDENTIALS,
|
||||
**PINECONE_CREDENTIALS,
|
||||
**PLAID_CREDENTIALS,
|
||||
**TRELLO_CREDENTIALS,
|
||||
**CONFLUENCE_CREDENTIALS,
|
||||
**CLOUDINARY_CREDENTIALS,
|
||||
**GITLAB_CREDENTIALS,
|
||||
**GOOGLE_SHEETS_CREDENTIALS,
|
||||
**GREENHOUSE_CREDENTIALS,
|
||||
**JIRA_CREDENTIALS,
|
||||
**NOTION_CREDENTIALS,
|
||||
**REDDIT_CREDENTIALS,
|
||||
**TINES_CREDENTIALS,
|
||||
**TWITTER_CREDENTIALS,
|
||||
**TWILIO_CREDENTIALS,
|
||||
**ZENDESK_CREDENTIALS,
|
||||
**ZOHO_CRM_CREDENTIALS,
|
||||
**TERRAFORM_CREDENTIALS,
|
||||
**LUSHA_CREDENTIALS,
|
||||
**POWERBI_CREDENTIALS,
|
||||
**SNOWFLAKE_CREDENTIALS,
|
||||
**AZURE_SQL_CREDENTIALS,
|
||||
**KAFKA_CREDENTIALS,
|
||||
**REDSHIFT_CREDENTIALS,
|
||||
**SAP_CREDENTIALS,
|
||||
**SALESFORCE_CREDENTIALS,
|
||||
**SHOPIFY_CREDENTIALS,
|
||||
**ZOOM_CREDENTIALS,
|
||||
**N8N_CREDENTIALS,
|
||||
**LANGFUSE_CREDENTIALS,
|
||||
**OBSIDIAN_CREDENTIALS,
|
||||
}
|
||||
|
||||
__all__ = [
|
||||
@@ -145,7 +244,7 @@ __all__ = [
|
||||
# Merged registry
|
||||
"CREDENTIAL_SPECS",
|
||||
# Category registries (for direct access if needed)
|
||||
"LLM_CREDENTIALS",
|
||||
"AIRTABLE_CREDENTIALS",
|
||||
"NEWS_CREDENTIALS",
|
||||
"SEARCH_CREDENTIALS",
|
||||
"EMAIL_CREDENTIALS",
|
||||
@@ -154,18 +253,68 @@ __all__ = [
|
||||
"GOOGLE_ANALYTICS_CREDENTIALS",
|
||||
"GOOGLE_DOCS_CREDENTIALS",
|
||||
"GOOGLE_MAPS_CREDENTIALS",
|
||||
"GOOGLE_SEARCH_CONSOLE_CREDENTIALS",
|
||||
"HUGGINGFACE_CREDENTIALS",
|
||||
"HUBSPOT_CREDENTIALS",
|
||||
"INTERCOM_CREDENTIALS",
|
||||
"LINEAR_CREDENTIALS",
|
||||
"MONGODB_CREDENTIALS",
|
||||
"PAGERDUTY_CREDENTIALS",
|
||||
"GOOGLE_CALENDAR_CREDENTIALS",
|
||||
"SLACK_CREDENTIALS",
|
||||
"APIFY_CREDENTIALS",
|
||||
"AWS_S3_CREDENTIALS",
|
||||
"ASANA_CREDENTIALS",
|
||||
"APOLLO_CREDENTIALS",
|
||||
"ATTIO_CREDENTIALS",
|
||||
"SERPAPI_CREDENTIALS",
|
||||
"RAZORPAY_CREDENTIALS",
|
||||
"TELEGRAM_CREDENTIALS",
|
||||
"BIGQUERY_CREDENTIALS",
|
||||
"CALCOM_CREDENTIALS",
|
||||
"CALENDLY_CREDENTIALS",
|
||||
"DATABRICKS_CREDENTIALS",
|
||||
"DISCORD_CREDENTIALS",
|
||||
"DOCKER_HUB_CREDENTIALS",
|
||||
"PIPEDRIVE_CREDENTIALS",
|
||||
"STRIPE_CREDENTIALS",
|
||||
"BREVO_CREDENTIALS",
|
||||
"POSTGRES_CREDENTIALS",
|
||||
"QUICKBOOKS_CREDENTIALS",
|
||||
"MICROSOFT_GRAPH_CREDENTIALS",
|
||||
"PUSHOVER_CREDENTIALS",
|
||||
"REDIS_CREDENTIALS",
|
||||
"SUPABASE_CREDENTIALS",
|
||||
"VERCEL_CREDENTIALS",
|
||||
"YOUTUBE_CREDENTIALS",
|
||||
"PINECONE_CREDENTIALS",
|
||||
"PLAID_CREDENTIALS",
|
||||
"TRELLO_CREDENTIALS",
|
||||
"CONFLUENCE_CREDENTIALS",
|
||||
"CLOUDINARY_CREDENTIALS",
|
||||
"GITLAB_CREDENTIALS",
|
||||
"GOOGLE_SHEETS_CREDENTIALS",
|
||||
"GREENHOUSE_CREDENTIALS",
|
||||
"JIRA_CREDENTIALS",
|
||||
"NOTION_CREDENTIALS",
|
||||
"REDDIT_CREDENTIALS",
|
||||
"TINES_CREDENTIALS",
|
||||
"TWITTER_CREDENTIALS",
|
||||
"TWILIO_CREDENTIALS",
|
||||
"ZENDESK_CREDENTIALS",
|
||||
"ZOHO_CRM_CREDENTIALS",
|
||||
"TERRAFORM_CREDENTIALS",
|
||||
"LUSHA_CREDENTIALS",
|
||||
"POWERBI_CREDENTIALS",
|
||||
"SNOWFLAKE_CREDENTIALS",
|
||||
"AZURE_SQL_CREDENTIALS",
|
||||
"KAFKA_CREDENTIALS",
|
||||
"REDSHIFT_CREDENTIALS",
|
||||
"SAP_CREDENTIALS",
|
||||
"SALESFORCE_CREDENTIALS",
|
||||
"SHOPIFY_CREDENTIALS",
|
||||
"ZOOM_CREDENTIALS",
|
||||
"N8N_CREDENTIALS",
|
||||
"LANGFUSE_CREDENTIALS",
|
||||
"OBSIDIAN_CREDENTIALS",
|
||||
]
|
||||
|
||||
@@ -0,0 +1,37 @@
|
||||
"""
|
||||
Airtable credentials.
|
||||
|
||||
Contains credentials for the Airtable Web API.
|
||||
Requires AIRTABLE_PAT (Personal Access Token).
|
||||
"""
|
||||
|
||||
from .base import CredentialSpec
|
||||
|
||||
AIRTABLE_CREDENTIALS = {
|
||||
"airtable_pat": CredentialSpec(
|
||||
env_var="AIRTABLE_PAT",
|
||||
tools=[
|
||||
"airtable_list_records",
|
||||
"airtable_get_record",
|
||||
"airtable_create_records",
|
||||
"airtable_update_records",
|
||||
"airtable_list_bases",
|
||||
"airtable_get_base_schema",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://airtable.com/create/tokens",
|
||||
description="Airtable Personal Access Token",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""To set up Airtable API access:
|
||||
1. Go to https://airtable.com/create/tokens
|
||||
2. Create a new Personal Access Token
|
||||
3. Grant scopes: data.records:read, data.records:write, schema.bases:read
|
||||
4. Select the bases to grant access to
|
||||
5. Set environment variable:
|
||||
export AIRTABLE_PAT=your-personal-access-token""",
|
||||
health_check_endpoint="",
|
||||
credential_id="airtable_pat",
|
||||
credential_key="api_key",
|
||||
),
|
||||
}
|
||||
@@ -0,0 +1,34 @@
|
||||
"""
|
||||
Apify credentials.
|
||||
|
||||
Contains credentials for Apify web scraping and automation platform.
|
||||
"""
|
||||
|
||||
from .base import CredentialSpec
|
||||
|
||||
APIFY_CREDENTIALS = {
|
||||
"apify": CredentialSpec(
|
||||
env_var="APIFY_API_TOKEN",
|
||||
tools=[
|
||||
"apify_run_actor",
|
||||
"apify_get_run",
|
||||
"apify_get_dataset_items",
|
||||
"apify_list_actors",
|
||||
"apify_list_runs",
|
||||
"apify_get_kv_store_record",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://docs.apify.com/api/v2",
|
||||
description="Apify API token for running web scraping actors and retrieving datasets",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""To get an Apify API token:
|
||||
1. Go to https://console.apify.com/account/integrations
|
||||
2. Copy your personal API token
|
||||
3. Set the environment variable:
|
||||
export APIFY_API_TOKEN=your-api-token""",
|
||||
health_check_endpoint="https://api.apify.com/v2/users/me",
|
||||
credential_id="apify",
|
||||
credential_key="api_key",
|
||||
),
|
||||
}
|
||||
@@ -0,0 +1,35 @@
|
||||
"""
|
||||
Asana credentials.
|
||||
|
||||
Contains credentials for Asana task and project management.
|
||||
"""
|
||||
|
||||
from .base import CredentialSpec
|
||||
|
||||
ASANA_CREDENTIALS = {
|
||||
"asana": CredentialSpec(
|
||||
env_var="ASANA_ACCESS_TOKEN",
|
||||
tools=[
|
||||
"asana_list_workspaces",
|
||||
"asana_list_projects",
|
||||
"asana_list_tasks",
|
||||
"asana_get_task",
|
||||
"asana_create_task",
|
||||
"asana_search_tasks",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://developers.asana.com/docs/personal-access-token",
|
||||
description="Asana personal access token for task and project management",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""To get an Asana personal access token:
|
||||
1. Go to https://app.asana.com/0/my-apps
|
||||
2. Click 'Create new token'
|
||||
3. Give it a name and copy the token
|
||||
4. Set the environment variable:
|
||||
export ASANA_ACCESS_TOKEN=your-pat""",
|
||||
health_check_endpoint="https://app.asana.com/api/1.0/users/me",
|
||||
credential_id="asana",
|
||||
credential_key="api_key",
|
||||
),
|
||||
}
|
||||
@@ -0,0 +1,55 @@
|
||||
"""
|
||||
Attio tool credentials.
|
||||
|
||||
Contains credentials for Attio CRM integration.
|
||||
"""
|
||||
|
||||
from .base import CredentialSpec
|
||||
|
||||
ATTIO_CREDENTIALS = {
|
||||
"attio": CredentialSpec(
|
||||
env_var="ATTIO_API_KEY",
|
||||
tools=[
|
||||
"attio_record_list",
|
||||
"attio_record_get",
|
||||
"attio_record_create",
|
||||
"attio_record_update",
|
||||
"attio_record_assert",
|
||||
"attio_list_lists",
|
||||
"attio_list_entries_get",
|
||||
"attio_list_entry_create",
|
||||
"attio_list_entry_delete",
|
||||
"attio_task_create",
|
||||
"attio_task_list",
|
||||
"attio_task_get",
|
||||
"attio_task_delete",
|
||||
"attio_members_list",
|
||||
"attio_member_get",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://attio.com/help/apps/other-apps/generating-an-api-key",
|
||||
description="Attio API key for CRM integration",
|
||||
# Auth method support
|
||||
aden_supported=False,
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""To get an Attio API key:
|
||||
1. Go to Attio Settings > Developers > Access tokens
|
||||
2. Click "Generate new token"
|
||||
3. Name your token (e.g., "Hive Agent")
|
||||
4. Select required scopes:
|
||||
- record_permission:read-write
|
||||
- object_configuration:read
|
||||
- list_entry:read-write
|
||||
- list_configuration:read
|
||||
- task:read-write
|
||||
- user_management:read
|
||||
5. Copy the generated token""",
|
||||
# Health check configuration
|
||||
health_check_endpoint="https://api.attio.com/v2/workspace_members",
|
||||
health_check_method="GET",
|
||||
# Credential store mapping
|
||||
credential_id="attio",
|
||||
credential_key="api_key",
|
||||
),
|
||||
}
|
||||
@@ -0,0 +1,57 @@
|
||||
"""
|
||||
AWS S3 credentials.
|
||||
|
||||
Contains credentials for AWS S3 REST API with SigV4 signing.
|
||||
Requires AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
|
||||
"""
|
||||
|
||||
from .base import CredentialSpec
|
||||
|
||||
AWS_S3_CREDENTIALS = {
|
||||
"aws_access_key": CredentialSpec(
|
||||
env_var="AWS_ACCESS_KEY_ID",
|
||||
tools=[
|
||||
"s3_list_buckets",
|
||||
"s3_list_objects",
|
||||
"s3_get_object",
|
||||
"s3_put_object",
|
||||
"s3_delete_object",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html",
|
||||
description="AWS Access Key ID for S3 API access",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""To set up AWS S3 API access:
|
||||
1. Go to AWS IAM > Users > Security credentials
|
||||
2. Create a new access key
|
||||
3. Set environment variables:
|
||||
export AWS_ACCESS_KEY_ID=your-access-key-id
|
||||
export AWS_SECRET_ACCESS_KEY=your-secret-access-key
|
||||
export AWS_REGION=us-east-1""",
|
||||
health_check_endpoint="",
|
||||
credential_id="aws_access_key",
|
||||
credential_key="api_key",
|
||||
credential_group="aws",
|
||||
),
|
||||
"aws_secret_key": CredentialSpec(
|
||||
env_var="AWS_SECRET_ACCESS_KEY",
|
||||
tools=[
|
||||
"s3_list_buckets",
|
||||
"s3_list_objects",
|
||||
"s3_get_object",
|
||||
"s3_put_object",
|
||||
"s3_delete_object",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html",
|
||||
description="AWS Secret Access Key for S3 API access",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""See AWS_ACCESS_KEY_ID instructions above.""",
|
||||
health_check_endpoint="",
|
||||
credential_id="aws_secret_key",
|
||||
credential_key="api_key",
|
||||
credential_group="aws",
|
||||
),
|
||||
}
|
||||
@@ -0,0 +1,55 @@
|
||||
"""
|
||||
Azure SQL Database management credentials.
|
||||
|
||||
Contains credentials for the Azure SQL REST API (management plane).
|
||||
Requires AZURE_SQL_ACCESS_TOKEN and AZURE_SUBSCRIPTION_ID.
|
||||
"""
|
||||
|
||||
from .base import CredentialSpec
|
||||
|
||||
AZURE_SQL_CREDENTIALS = {
|
||||
"azure_sql_token": CredentialSpec(
|
||||
env_var="AZURE_SQL_ACCESS_TOKEN",
|
||||
tools=[
|
||||
"azure_sql_list_servers",
|
||||
"azure_sql_get_server",
|
||||
"azure_sql_list_databases",
|
||||
"azure_sql_get_database",
|
||||
"azure_sql_list_firewall_rules",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://learn.microsoft.com/en-us/rest/api/sql/",
|
||||
description="Azure Bearer token for SQL management API (scope: management.azure.com)",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""To set up Azure SQL management API access:
|
||||
1. Register an app in Azure AD (Entra ID)
|
||||
2. Assign SQL DB Contributor or Reader role
|
||||
3. Obtain a token via client credentials flow (scope: https://management.azure.com/.default)
|
||||
4. Set environment variables:
|
||||
export AZURE_SQL_ACCESS_TOKEN=your-bearer-token
|
||||
export AZURE_SUBSCRIPTION_ID=your-subscription-id""",
|
||||
health_check_endpoint="",
|
||||
credential_id="azure_sql_token",
|
||||
credential_key="api_key",
|
||||
),
|
||||
"azure_subscription_id": CredentialSpec(
|
||||
env_var="AZURE_SUBSCRIPTION_ID",
|
||||
tools=[
|
||||
"azure_sql_list_servers",
|
||||
"azure_sql_get_server",
|
||||
"azure_sql_list_databases",
|
||||
"azure_sql_get_database",
|
||||
"azure_sql_list_firewall_rules",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://learn.microsoft.com/en-us/azure/azure-portal/get-subscription-tenant-id",
|
||||
description="Azure subscription ID for resource management",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""See AZURE_SQL_ACCESS_TOKEN instructions above.""",
|
||||
health_check_endpoint="",
|
||||
credential_id="azure_subscription_id",
|
||||
credential_key="api_key",
|
||||
),
|
||||
}
|
||||
@@ -1,8 +1,6 @@
|
||||
"""
|
||||
Brevo tool credentials.
|
||||
|
||||
Contains credentials for Brevo (formerly Sendinblue) transactional email,
|
||||
SMS, and contact management integration.
|
||||
Contains credentials for Brevo email and SMS integration.
|
||||
"""
|
||||
|
||||
from .base import CredentialSpec
|
||||
@@ -16,26 +14,22 @@ BREVO_CREDENTIALS = {
|
||||
"brevo_create_contact",
|
||||
"brevo_get_contact",
|
||||
"brevo_update_contact",
|
||||
"brevo_get_email_stats",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://app.brevo.com/settings/keys/api",
|
||||
description="Brevo API key for transactional email, SMS, and contact management",
|
||||
# Auth method support
|
||||
aden_supported=False,
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""To get a Brevo API key:
|
||||
1. Go to https://app.brevo.com and create an account (or sign in)
|
||||
2. Navigate to Settings > API Keys (or visit https://app.brevo.com/settings/keys/api)
|
||||
3. Click "Generate a new API key"
|
||||
4. Give it a name (e.g., "Hive Agent")
|
||||
5. Copy the API key (starts with xkeysib-)
|
||||
6. Store it securely - you won't be able to see it again!
|
||||
7. Note: For sending emails, you'll need a verified sender domain or email""",
|
||||
# Health check configuration
|
||||
1. Sign up or log in at https://www.brevo.com
|
||||
2. Go to Settings → API Keys
|
||||
3. Click 'Generate a new API key'
|
||||
4. Give it a name (e.g., 'Hive Agent')
|
||||
5. Copy the API key and set it as BREVO_API_KEY""",
|
||||
health_check_endpoint="https://api.brevo.com/v3/account",
|
||||
health_check_method="GET",
|
||||
# Credential store mapping
|
||||
credential_id="brevo",
|
||||
credential_key="api_key",
|
||||
),
|
||||
|
||||
@@ -40,6 +40,7 @@ def open_browser(url: str) -> tuple[bool, str]:
|
||||
["open", url],
|
||||
check=True,
|
||||
capture_output=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
return True, "Opened in browser"
|
||||
|
||||
@@ -50,6 +51,7 @@ def open_browser(url: str) -> tuple[bool, str]:
|
||||
["xdg-open", url],
|
||||
check=True,
|
||||
capture_output=True,
|
||||
encoding="utf-8",
|
||||
)
|
||||
return True, "Opened in browser"
|
||||
except FileNotFoundError:
|
||||
|
||||
@@ -0,0 +1,34 @@
|
||||
"""
|
||||
Calendly credentials.
|
||||
|
||||
Contains credentials for the Calendly API v2.
|
||||
Requires CALENDLY_PAT (Personal Access Token).
|
||||
"""
|
||||
|
||||
from .base import CredentialSpec
|
||||
|
||||
CALENDLY_CREDENTIALS = {
|
||||
"calendly_pat": CredentialSpec(
|
||||
env_var="CALENDLY_PAT",
|
||||
tools=[
|
||||
"calendly_get_current_user",
|
||||
"calendly_list_event_types",
|
||||
"calendly_list_scheduled_events",
|
||||
"calendly_get_scheduled_event",
|
||||
"calendly_list_invitees",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://developer.calendly.com/how-to-authenticate-with-personal-access-tokens",
|
||||
description="Calendly Personal Access Token",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""To set up Calendly API access:
|
||||
1. Go to https://calendly.com/integrations/api_webhooks
|
||||
2. Generate a Personal Access Token
|
||||
3. Set environment variable:
|
||||
export CALENDLY_PAT=your-personal-access-token""",
|
||||
health_check_endpoint="https://api.calendly.com/users/me",
|
||||
credential_id="calendly_pat",
|
||||
credential_key="api_key",
|
||||
),
|
||||
}
|
||||
@@ -0,0 +1,74 @@
|
||||
"""
|
||||
Cloudinary credentials.
|
||||
|
||||
Contains credentials for Cloudinary image/video management.
|
||||
Requires CLOUDINARY_CLOUD_NAME, CLOUDINARY_API_KEY, and CLOUDINARY_API_SECRET.
|
||||
"""
|
||||
|
||||
from .base import CredentialSpec
|
||||
|
||||
CLOUDINARY_CREDENTIALS = {
|
||||
"cloudinary_cloud_name": CredentialSpec(
|
||||
env_var="CLOUDINARY_CLOUD_NAME",
|
||||
tools=[
|
||||
"cloudinary_upload",
|
||||
"cloudinary_list_resources",
|
||||
"cloudinary_get_resource",
|
||||
"cloudinary_delete_resource",
|
||||
"cloudinary_search",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://console.cloudinary.com/",
|
||||
description="Cloudinary cloud name from your dashboard",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""To set up Cloudinary access:
|
||||
1. Go to https://console.cloudinary.com/
|
||||
2. Copy your Cloud Name, API Key, and API Secret from the dashboard
|
||||
3. Set environment variables:
|
||||
export CLOUDINARY_CLOUD_NAME=your-cloud-name
|
||||
export CLOUDINARY_API_KEY=your-api-key
|
||||
export CLOUDINARY_API_SECRET=your-api-secret""",
|
||||
health_check_endpoint="",
|
||||
credential_id="cloudinary_cloud_name",
|
||||
credential_key="api_key",
|
||||
),
|
||||
"cloudinary_key": CredentialSpec(
|
||||
env_var="CLOUDINARY_API_KEY",
|
||||
tools=[
|
||||
"cloudinary_upload",
|
||||
"cloudinary_list_resources",
|
||||
"cloudinary_get_resource",
|
||||
"cloudinary_delete_resource",
|
||||
"cloudinary_search",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://console.cloudinary.com/",
|
||||
description="Cloudinary API key for authentication",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""See CLOUDINARY_CLOUD_NAME instructions above.""",
|
||||
health_check_endpoint="",
|
||||
credential_id="cloudinary_key",
|
||||
credential_key="api_key",
|
||||
),
|
||||
"cloudinary_secret": CredentialSpec(
|
||||
env_var="CLOUDINARY_API_SECRET",
|
||||
tools=[
|
||||
"cloudinary_upload",
|
||||
"cloudinary_list_resources",
|
||||
"cloudinary_get_resource",
|
||||
"cloudinary_delete_resource",
|
||||
"cloudinary_search",
|
||||
],
|
||||
required=True,
|
||||
startup_required=False,
|
||||
help_url="https://console.cloudinary.com/",
|
||||
description="Cloudinary API secret for authentication",
|
||||
direct_api_key_supported=True,
|
||||
api_key_instructions="""See CLOUDINARY_CLOUD_NAME instructions above.""",
|
||||
health_check_endpoint="",
|
||||
credential_id="cloudinary_secret",
|
||||
credential_key="api_key",
|
||||
),
|
||||
}
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user