Merge remote-tracking branch 'origin/main' into feature/quickstart-credential-store

This commit is contained in:
Timothy
2026-02-04 20:03:44 -08:00
50 changed files with 13098 additions and 1708 deletions
@@ -267,7 +267,7 @@ This returns JSON with the goal, nodes, edges, and MCP server configurations
- NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
- NOT: `{"first-node-id"}` (WRONG - this is a set)
**Use the example agent** at `.claude/skills/building-agents-construction/examples/online_research_agent/` as a template for file structure and patterns.
**Use the example agent** at `.claude/skills/building-agents-construction/examples/deep_research_agent/` as a template for file structure and patterns. It demonstrates: STEP 1/STEP 2 prompts, client-facing nodes, feedback loops, nullable_output_keys, and data tools.
**AFTER writing all files, tell the user:**
@@ -354,7 +354,7 @@ mcp__agent-builder__get_session_status()
## REFERENCE: System Prompt Best Practice
For event_loop nodes, instruct the LLM to use `set_output` for structured outputs:
For **internal** event_loop nodes (not client-facing), instruct the LLM to use `set_output`:
```
Use set_output(key, value) to store your results. For example:
@@ -363,71 +363,55 @@ Use set_output(key, value) to store your results. For example:
Do NOT return raw JSON. Use the set_output tool to produce outputs.
```
For **client-facing** event_loop nodes, use the STEP 1/STEP 2 pattern:
```
**STEP 1 — Respond to the user (text only, NO tool calls):**
[Present information, ask questions, etc.]
**STEP 2 — After the user responds, call set_output:**
- set_output("key", "value based on user's response")
```
This prevents the LLM from calling `set_output` before the user has had a chance to respond. The "NO tool calls" instruction in STEP 1 ensures the node blocks for user input before proceeding.
---
## CRITICAL: EventLoopNode Registration
## EventLoopNode Runtime
**`AgentRuntime` does NOT support `event_loop` nodes.** The `AgentRuntime` / `create_agent_runtime()` path creates `GraphExecutor` instances internally without passing a `node_registry`, causing all `event_loop` nodes to fail at runtime with:
```
EventLoopNode 'node-id' not found in registry. Register it with executor.register_node() before execution.
```
**The correct pattern**: Use `GraphExecutor` directly with a `node_registry` dict containing `EventLoopNode` instances:
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. Both direct `GraphExecutor` and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically. No manual `node_registry` setup is needed.
```python
from framework.graph.executor import GraphExecutor, ExecutionResult
from framework.graph.event_loop_node import EventLoopNode, LoopConfig
from framework.runtime.event_bus import EventBus
from framework.runtime.core import Runtime # REQUIRED - executor calls runtime.start_run()
# Direct execution
from framework.graph.executor import GraphExecutor
from framework.runtime.core import Runtime
# 1. Build node_registry with EventLoopNode instances
event_bus = EventBus()
node_registry = {}
for node_spec in nodes:
if node_spec.node_type == "event_loop":
node_registry[node_spec.id] = EventLoopNode(
event_bus=event_bus,
judge=None, # implicit judge: accepts when output_keys are filled
config=LoopConfig(
max_iterations=50,
max_tool_calls_per_turn=15,
stall_detection_threshold=3,
max_history_tokens=32000,
),
tool_executor=tool_executor,
)
# 2. Create Runtime for run tracking (GraphExecutor calls runtime.start_run())
storage_path = Path.home() / ".hive" / "my_agent"
storage_path.mkdir(parents=True, exist_ok=True)
runtime = Runtime(storage_path)
# 3. Create GraphExecutor WITH node_registry and runtime
executor = GraphExecutor(
runtime=runtime, # NOT None - executor needs this for run tracking
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
node_registry=node_registry, # EventLoopNode instances
storage_path=storage_path,
)
# 4. Execute
result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
```
**DO NOT use `AgentRuntime` or `create_agent_runtime()` for agents with `event_loop` nodes.**
**DO NOT pass `runtime=None` to `GraphExecutor`** — it will crash with `'NoneType' object has no attribute 'start_run'`.
---
## COMMON MISTAKES TO AVOID
1. **Using `AgentRuntime` with event_loop nodes** - `AgentRuntime` does not register EventLoopNodes. Use `GraphExecutor` directly with `node_registry`
2. **Passing `runtime=None` to GraphExecutor** - The executor calls `runtime.start_run()` internally. Always provide a `Runtime(storage_path)` instance
3. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
4. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
5. **Skipping validation** - Always validate nodes and graph before proceeding
6. **Not waiting for approval** - Always ask user before major steps
7. **Displaying this file** - Execute the steps, don't show documentation
1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
3. **Skipping validation** - Always validate nodes and graph before proceeding
4. **Not waiting for approval** - Always ask user before major steps
5. **Displaying this file** - Execute the steps, don't show documentation
6. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
7. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
8. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
9. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
@@ -0,0 +1,24 @@
"""
Deep Research Agent - Interactive, rigorous research with TUI conversation.
Research any topic through multi-source web search, quality evaluation,
and synthesis. Features client-facing TUI interaction at key checkpoints
for user guidance and iterative deepening.
"""
from .agent import DeepResearchAgent, default_agent, goal, nodes, edges
from .config import RuntimeConfig, AgentMetadata, default_config, metadata
__version__ = "1.0.0"
__all__ = [
"DeepResearchAgent",
"default_agent",
"goal",
"nodes",
"edges",
"RuntimeConfig",
"AgentMetadata",
"default_config",
"metadata",
]
@@ -1,5 +1,5 @@
"""
CLI entry point for Online Research Agent.
CLI entry point for Deep Research Agent.
Uses AgentRuntime for multi-entrypoint support with HITL pause/resume.
"""
@@ -10,7 +10,7 @@ import logging
import sys
import click
from .agent import default_agent, OnlineResearchAgent
from .agent import default_agent, DeepResearchAgent
def setup_logging(verbose=False, debug=False):
@@ -28,7 +28,7 @@ def setup_logging(verbose=False, debug=False):
@click.group()
@click.version_option(version="1.0.0")
def cli():
"""Online Research Agent - Deep-dive research with narrative reports."""
"""Deep Research Agent - Interactive, rigorous research with TUI conversation."""
pass
@@ -59,6 +59,83 @@ def run(topic, mock, quiet, verbose, debug):
sys.exit(0 if result.success else 1)
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def tui(mock, verbose, debug):
"""Launch the TUI dashboard for interactive research."""
setup_logging(verbose=verbose, debug=debug)
try:
from framework.tui.app import AdenTUI
except ImportError:
click.echo("TUI requires the 'textual' package. Install with: pip install textual")
sys.exit(1)
from pathlib import Path
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import create_agent_runtime
from framework.runtime.event_bus import EventBus
from framework.runtime.execution_stream import EntryPointSpec
async def run_with_tui():
agent = DeepResearchAgent()
# Build graph and tools
agent._event_bus = EventBus()
agent._tool_registry = ToolRegistry()
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock:
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
tools = list(agent._tool_registry.get_tools().values())
tool_executor = agent._tool_registry.get_executor()
graph = agent._build_graph()
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
runtime = create_agent_runtime(
graph=graph,
goal=agent.goal,
storage_path=storage_path,
entry_points=[
EntryPointSpec(
id="start",
name="Start Research",
entry_node="intake",
trigger_type="manual",
isolation_level="isolated",
),
],
llm=llm,
tools=tools,
tool_executor=tool_executor,
)
await runtime.start()
try:
app = AdenTUI(runtime)
await app.run_async()
finally:
await runtime.stop()
asyncio.run(run_with_tui())
@cli.command()
@click.option("--json", "output_json", is_flag=True)
def info(output_json):
@@ -71,6 +148,7 @@ def info(output_json):
click.echo(f"Version: {info_data['version']}")
click.echo(f"Description: {info_data['description']}")
click.echo(f"\nNodes: {', '.join(info_data['nodes'])}")
click.echo(f"Client-facing: {', '.join(info_data['client_facing_nodes'])}")
click.echo(f"Entry: {info_data['entry_node']}")
click.echo(f"Terminal: {', '.join(info_data['terminal_nodes'])}")
@@ -81,6 +159,9 @@ def validate():
validation = default_agent.validate()
if validation["valid"]:
click.echo("Agent is valid")
if validation["warnings"]:
for warning in validation["warnings"]:
click.echo(f" WARNING: {warning}")
else:
click.echo("Agent has errors:")
for error in validation["errors"]:
@@ -91,7 +172,7 @@ def validate():
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
def shell(verbose):
"""Interactive research session."""
"""Interactive research session (CLI, no TUI)."""
asyncio.run(_interactive_shell(verbose))
@@ -99,10 +180,10 @@ async def _interactive_shell(verbose=False):
"""Async interactive shell."""
setup_logging(verbose=verbose)
click.echo("=== Online Research Agent ===")
click.echo("=== Deep Research Agent ===")
click.echo("Enter a topic to research (or 'quit' to exit):\n")
agent = OnlineResearchAgent()
agent = DeepResearchAgent()
await agent.start()
try:
@@ -118,7 +199,7 @@ async def _interactive_shell(verbose=False):
if not topic.strip():
continue
click.echo("\nResearching... (this may take a few minutes)\n")
click.echo("\nResearching...\n")
result = await agent.trigger_and_wait("start", {"topic": topic})
@@ -128,16 +209,14 @@ async def _interactive_shell(verbose=False):
if result.success:
output = result.output
if "file_path" in output:
click.echo(f"\nReport saved to: {output['file_path']}\n")
if "final_report" in output:
click.echo("\n--- Report Preview ---\n")
preview = (
output["final_report"][:500] + "..."
if len(output.get("final_report", "")) > 500
else output.get("final_report", "")
)
click.echo(preview)
if "report_content" in output:
click.echo("\n--- Report ---\n")
click.echo(output["report_content"])
click.echo("\n")
if "references" in output:
click.echo("--- References ---\n")
for ref in output.get("references", []):
click.echo(f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}")
click.echo("\n")
else:
click.echo(f"\nResearch failed: {result.error}\n")
@@ -148,7 +227,6 @@ async def _interactive_shell(verbose=False):
except Exception as e:
click.echo(f"Error: {e}", err=True)
import traceback
traceback.print_exc()
finally:
await agent.stop()
@@ -1,9 +1,8 @@
"""Agent graph construction for Online Research Agent."""
"""Agent graph construction for Deep Research Agent."""
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult, GraphExecutor
from framework.graph.event_loop_node import EventLoopNode, LoopConfig
from framework.runtime.event_bus import EventBus
from framework.runtime.core import Runtime
from framework.llm import LiteLLMProvider
@@ -11,164 +10,132 @@ from framework.runner.tool_registry import ToolRegistry
from .config import default_config, metadata
from .nodes import (
parse_query_node,
search_sources_node,
fetch_content_node,
evaluate_sources_node,
synthesize_findings_node,
write_report_node,
quality_check_node,
save_report_node,
intake_node,
research_node,
review_node,
report_node,
)
# Goal definition
goal = Goal(
id="comprehensive-online-research",
name="Comprehensive Online Research",
description="Research any topic by searching multiple sources, synthesizing information, and producing a well-structured narrative report with citations.",
id="rigorous-interactive-research",
name="Rigorous Interactive Research",
description=(
"Research any topic by searching diverse sources, analyzing findings, "
"and producing a cited report — with user checkpoints to guide direction."
),
success_criteria=[
SuccessCriterion(
id="source-coverage",
description="Query 10+ diverse sources",
id="source-diversity",
description="Use multiple diverse, authoritative sources",
metric="source_count",
target=">=10",
weight=0.20,
),
SuccessCriterion(
id="relevance",
description="All sources directly address the query",
metric="relevance_score",
target="90%",
target=">=5",
weight=0.25,
),
SuccessCriterion(
id="synthesis",
description="Synthesize findings into coherent narrative",
metric="coherence_score",
target="85%",
weight=0.25,
),
SuccessCriterion(
id="citations",
description="Include citations for all claims",
id="citation-coverage",
description="Every factual claim in the report cites its source",
metric="citation_coverage",
target="100%",
weight=0.15,
weight=0.25,
),
SuccessCriterion(
id="actionable",
description="Report answers the user's question",
metric="answer_completeness",
id="user-satisfaction",
description="User reviews findings before report generation",
metric="user_approval",
target="true",
weight=0.25,
),
SuccessCriterion(
id="report-completeness",
description="Final report answers the original research questions",
metric="question_coverage",
target="90%",
weight=0.15,
weight=0.25,
),
],
constraints=[
Constraint(
id="no-hallucination",
description="Only include information found in sources",
description="Only include information found in fetched sources",
constraint_type="quality",
category="accuracy",
),
Constraint(
id="source-attribution",
description="Every factual claim must cite its source",
description="Every claim must cite its source with a numbered reference",
constraint_type="quality",
category="accuracy",
),
Constraint(
id="recency-preference",
description="Prefer recent sources when relevant",
constraint_type="quality",
category="relevance",
),
Constraint(
id="no-paywalled",
description="Avoid sources that require payment to access",
id="user-checkpoint",
description="Present findings to the user before writing the final report",
constraint_type="functional",
category="accessibility",
category="interaction",
),
],
)
# Node list
nodes = [
parse_query_node,
search_sources_node,
fetch_content_node,
evaluate_sources_node,
synthesize_findings_node,
write_report_node,
quality_check_node,
save_report_node,
intake_node,
research_node,
review_node,
report_node,
]
# Edge definitions
edges = [
# intake -> research
EdgeSpec(
id="parse-to-search",
source="parse-query",
target="search-sources",
id="intake-to-research",
source="intake",
target="research",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
# research -> review
EdgeSpec(
id="search-to-fetch",
source="search-sources",
target="fetch-content",
id="research-to-review",
source="research",
target="review",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
# review -> research (feedback loop)
EdgeSpec(
id="fetch-to-evaluate",
source="fetch-content",
target="evaluate-sources",
condition=EdgeCondition.ON_SUCCESS,
id="review-to-research-feedback",
source="review",
target="research",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == True",
priority=1,
),
# review -> report (user satisfied)
EdgeSpec(
id="evaluate-to-synthesize",
source="evaluate-sources",
target="synthesize-findings",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
EdgeSpec(
id="synthesize-to-write",
source="synthesize-findings",
target="write-report",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
EdgeSpec(
id="write-to-quality",
source="write-report",
target="quality-check",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
EdgeSpec(
id="quality-to-save",
source="quality-check",
target="save-report",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
id="review-to-report",
source="review",
target="report",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == False",
priority=2,
),
]
# Graph configuration
entry_node = "parse-query"
entry_points = {"start": "parse-query"}
entry_node = "intake"
entry_points = {"start": "intake"}
pause_nodes = []
terminal_nodes = ["save-report"]
terminal_nodes = ["report"]
class OnlineResearchAgent:
class DeepResearchAgent:
"""
Online Research Agent - Deep-dive research with narrative reports.
Deep Research Agent — 4-node pipeline with user checkpoints.
Uses GraphExecutor directly with EventLoopNode instances registered
in the node_registry for multi-turn tool execution.
Flow: intake -> research -> review -> report
                   ^           |
                   +-----------+   feedback loop (if user wants more)
"""
def __init__(self, config=None):
@@ -188,7 +155,7 @@ class OnlineResearchAgent:
def _build_graph(self) -> GraphSpec:
"""Build the GraphSpec."""
return GraphSpec(
id="online-research-agent-graph",
id="deep-research-agent-graph",
goal_id=self.goal.id,
version="1.0.0",
entry_node=self.entry_node,
@@ -201,29 +168,11 @@ class OnlineResearchAgent:
max_tokens=self.config.max_tokens,
)
def _build_node_registry(self, tool_executor=None) -> dict:
"""Create EventLoopNode instances for all event_loop nodes."""
registry = {}
for node_spec in self.nodes:
if node_spec.node_type == "event_loop":
registry[node_spec.id] = EventLoopNode(
event_bus=self._event_bus,
judge=None, # implicit judge: accept when output_keys are filled
config=LoopConfig(
max_iterations=50,
max_tool_calls_per_turn=15,
stall_detection_threshold=3,
max_history_tokens=32000,
),
tool_executor=tool_executor,
)
return registry
def _setup(self, mock_mode=False) -> GraphExecutor:
"""Set up the executor with all components."""
from pathlib import Path
storage_path = Path.home() / ".hive" / "online_research_agent"
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
self._event_bus = EventBus()
@@ -245,7 +194,6 @@ class OnlineResearchAgent:
tools = list(self._tool_registry.get_tools().values())
self._graph = self._build_graph()
node_registry = self._build_node_registry(tool_executor=tool_executor)
runtime = Runtime(storage_path)
self._executor = GraphExecutor(
@@ -253,7 +201,8 @@ class OnlineResearchAgent:
llm=llm,
tools=tools,
tool_executor=tool_executor,
node_registry=node_registry,
event_bus=self._event_bus,
storage_path=storage_path,
)
return self._executor
@@ -317,7 +266,7 @@ class OnlineResearchAgent:
"entry_points": self.entry_points,
"pause_nodes": self.pause_nodes,
"terminal_nodes": self.terminal_nodes,
"multi_entrypoint": True,
"client_facing_nodes": [n.id for n in self.nodes if n.client_facing],
}
def validate(self):
@@ -339,10 +288,6 @@ class OnlineResearchAgent:
if terminal not in node_ids:
errors.append(f"Terminal node '{terminal}' not found")
for pause in self.pause_nodes:
if pause not in node_ids:
errors.append(f"Pause node '{pause}' not found")
for ep_id, node_id in self.entry_points.items():
if node_id not in node_ids:
errors.append(
@@ -357,4 +302,4 @@ class OnlineResearchAgent:
# Create default instance
default_agent = OnlineResearchAgent()
default_agent = DeepResearchAgent()
@@ -32,12 +32,15 @@ class RuntimeConfig:
default_config = RuntimeConfig()
# Agent metadata
@dataclass
class AgentMetadata:
name: str = "Online Research Agent"
name: str = "Deep Research Agent"
version: str = "1.0.0"
description: str = "Research any topic by searching multiple sources, synthesizing information, and producing a well-structured narrative report with citations."
description: str = (
"Interactive research agent that rigorously investigates topics through "
"multi-source search, quality evaluation, and synthesis - with TUI conversation "
"at key checkpoints for user guidance and feedback."
)
metadata = AgentMetadata()
@@ -0,0 +1,147 @@
"""Node definitions for Deep Research Agent."""
from framework.graph import NodeSpec
# Node 1: Intake (client-facing)
# Brief conversation to clarify what the user wants researched.
intake_node = NodeSpec(
id="intake",
name="Research Intake",
description="Discuss the research topic with the user, clarify scope, and confirm direction",
node_type="event_loop",
client_facing=True,
input_keys=["topic"],
output_keys=["research_brief"],
system_prompt="""\
You are a research intake specialist. The user wants to research a topic.
Have a brief conversation to clarify what they need.
**STEP 1 — Read and respond (text only, NO tool calls):**
1. Read the topic provided
2. If it's vague, ask 1-2 clarifying questions (scope, angle, depth)
3. If it's already clear, confirm your understanding and ask the user to confirm
Keep it short. Don't over-ask.
**STEP 2 — After the user confirms, call set_output:**
- set_output("research_brief", "A clear paragraph describing exactly what to research, \
what questions to answer, what scope to cover, and how deep to go.")
""",
tools=[],
)
# Node 2: Research
# The workhorse — searches the web, fetches content, analyzes sources.
# One node with both tools avoids the context-passing overhead of 5 separate nodes.
research_node = NodeSpec(
id="research",
name="Research",
description="Search the web, fetch source content, and compile findings",
node_type="event_loop",
max_node_visits=3,
input_keys=["research_brief", "feedback"],
output_keys=["findings", "sources", "gaps"],
nullable_output_keys=["feedback"],
system_prompt="""\
You are a research agent. Given a research brief, find and analyze sources.
If feedback is provided, this is a follow-up round — focus on the gaps identified.
Work in phases:
1. **Search**: Use web_search with 3-5 diverse queries covering different angles.
Prioritize authoritative sources (.edu, .gov, established publications).
2. **Fetch**: Use web_scrape on the most promising URLs (aim for 5-8 sources).
Skip URLs that fail. Extract the substantive content.
3. **Analyze**: Review what you've collected. Identify key findings, themes,
and any contradictions between sources.
Important:
- Work in batches of 3-4 tool calls at a time to manage context
- After each batch, assess whether you have enough material
- Prefer quality over quantity — 5 good sources beat 15 thin ones
- Track which URL each finding comes from (you'll need citations later)
When done, use set_output:
- set_output("findings", "Structured summary: key findings with source URLs for each claim. \
Include themes, contradictions, and confidence levels.")
- set_output("sources", [{"url": "...", "title": "...", "summary": "..."}])
- set_output("gaps", "What aspects of the research brief are NOT well-covered yet, if any.")
""",
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
)
# Node 3: Review (client-facing)
# Shows the user what was found and asks whether to dig deeper or proceed.
review_node = NodeSpec(
id="review",
name="Review Findings",
description="Present findings to user and decide whether to research more or write the report",
node_type="event_loop",
client_facing=True,
max_node_visits=3,
input_keys=["findings", "sources", "gaps", "research_brief"],
output_keys=["needs_more_research", "feedback"],
system_prompt="""\
Present the research findings to the user clearly and concisely.
**STEP 1 — Present (your first message, text only, NO tool calls):**
1. **Summary** (2-3 sentences of what was found)
2. **Key Findings** (bulleted, with confidence levels)
3. **Sources Used** (count and quality assessment)
4. **Gaps** (what's still unclear or under-covered)
End by asking: Are they satisfied, or do they want deeper research? \
Should we proceed to writing the final report?
**STEP 2 — After the user responds, call set_output:**
- set_output("needs_more_research", "true") if they want more
- set_output("needs_more_research", "false") if they're satisfied
- set_output("feedback", "What the user wants explored further, or empty string")
""",
tools=[],
)
# Node 4: Report (client-facing)
# Writes the final report and presents it to the user.
report_node = NodeSpec(
id="report",
name="Write & Deliver Report",
description="Write a cited report from the findings and present it to the user",
node_type="event_loop",
client_facing=True,
input_keys=["findings", "sources", "research_brief"],
output_keys=["delivery_status"],
system_prompt="""\
Write a comprehensive research report and present it to the user.
**STEP 1 — Write and present the report (text only, NO tool calls):**
Report structure:
1. **Executive Summary** (2-3 paragraphs)
2. **Findings** (organized by theme, with [n] citations)
3. **Analysis** (synthesis, implications, areas of debate)
4. **Conclusion** (key takeaways, confidence assessment)
5. **References** (numbered list of sources cited)
Requirements:
- Every factual claim must cite its source with [n] notation
- Be objective — present multiple viewpoints where sources disagree
- Distinguish well-supported conclusions from speculation
- Answer the original research questions from the brief
End by asking the user if they have questions or want to save the report.
**STEP 2 — After the user responds:**
- Answer follow-up questions from the research material
- If they want to save, use write_to_file tool
- When the user is satisfied: set_output("delivery_status", "completed")
""",
tools=["write_to_file"],
)
__all__ = [
"intake_node",
"research_node",
"review_node",
"report_node",
]
@@ -1,80 +0,0 @@
# Online Research Agent
Deep-dive research agent that searches 10+ sources and produces comprehensive narrative reports with citations.
## Features
- Generates multiple search queries from a topic
- Searches and fetches 15+ web sources
- Evaluates and ranks sources by relevance
- Synthesizes findings into themes
- Writes narrative report with numbered citations
- Quality checks for uncited claims
- Saves report to local markdown file
## Usage
### CLI
```bash
# Show agent info
uv run python -m online_research_agent info
# Validate structure
uv run python -m online_research_agent validate
# Run research on a topic
uv run python -m online_research_agent run --topic "impact of AI on healthcare"
# Interactive shell
uv run python -m online_research_agent shell
```
### Python API
```python
from online_research_agent import default_agent
# Simple usage
result = await default_agent.run({"topic": "climate change solutions"})
# Check output
if result.success:
print(f"Report saved to: {result.output['file_path']}")
print(result.output['final_report'])
```
## Workflow
```
parse-query → search-sources → fetch-content → evaluate-sources
                                                      ↓
write-report ← synthesize-findings ←──────────────────┘
      ↓
quality-check → save-report
```
## Output
Reports are saved to `./research_reports/` as markdown files with:
1. Executive Summary
2. Introduction
3. Key Findings (by theme)
4. Analysis
5. Conclusion
6. References
## Requirements
- Python 3.11+
- LLM provider API key (Groq, Cerebras, etc.)
- Internet access for web search/fetch
## Configuration
Edit `config.py` to change:
- `model`: LLM model (default: groq/moonshotai/kimi-k2-instruct-0905)
- `temperature`: Generation temperature (default: 0.7)
- `max_tokens`: Max tokens per response (default: 16384)
@@ -1,23 +0,0 @@
"""
Online Research Agent - Deep-dive research with narrative reports.
Research any topic by searching multiple sources, synthesizing information,
and producing a well-structured narrative report with citations.
"""
from .agent import OnlineResearchAgent, default_agent, goal, nodes, edges
from .config import RuntimeConfig, AgentMetadata, default_config, metadata
__version__ = "1.0.0"
__all__ = [
"OnlineResearchAgent",
"default_agent",
"goal",
"nodes",
"edges",
"RuntimeConfig",
"AgentMetadata",
"default_config",
"metadata",
]
@@ -1,232 +0,0 @@
"""Node definitions for Online Research Agent."""
from framework.graph import NodeSpec
# Node 1: Parse Query
parse_query_node = NodeSpec(
id="parse-query",
name="Parse Query",
description="Analyze the research topic and generate 3-5 diverse search queries to cover different aspects",
node_type="event_loop",
input_keys=["topic"],
output_keys=["search_queries", "research_focus", "key_aspects"],
system_prompt="""\
You are a research query strategist. Given a research topic, analyze it and generate search queries.
Your task:
1. Understand the core research question
2. Identify 3-5 key aspects to investigate
3. Generate 3-5 diverse search queries that will find comprehensive information
Use set_output to store each result:
- set_output("research_focus", "Brief statement of what we're researching")
- set_output("key_aspects", ["aspect1", "aspect2", "aspect3"])
- set_output("search_queries", ["query 1", "query 2", "query 3", "query 4", "query 5"])
""",
tools=[],
)
# Node 2: Search Sources
search_sources_node = NodeSpec(
id="search-sources",
name="Search Sources",
description="Execute web searches using the generated queries to find 15+ source URLs",
node_type="event_loop",
input_keys=["search_queries", "research_focus"],
output_keys=["source_urls", "search_results_summary"],
system_prompt="""\
You are a research assistant executing web searches. Use the web_search tool to find sources.
Your task:
1. Execute each search query using web_search tool
2. Collect URLs from search results
3. Aim for 15+ diverse sources
After searching, use set_output to store results:
- set_output("source_urls", ["url1", "url2", ...])
- set_output("search_results_summary", "Brief summary of what was found")
""",
tools=["web_search"],
)
# Node 3: Fetch Content
fetch_content_node = NodeSpec(
id="fetch-content",
name="Fetch Content",
description="Fetch and extract content from the discovered source URLs",
node_type="event_loop",
input_keys=["source_urls", "research_focus"],
output_keys=["fetched_sources", "fetch_errors"],
system_prompt="""\
You are a content fetcher. Use web_scrape tool to retrieve content from URLs.
Your task:
1. Fetch content from each source URL using web_scrape tool
2. Extract the main content relevant to the research focus
3. Track any URLs that failed to fetch
After fetching, use set_output to store results:
- set_output("fetched_sources", [{"url": "...", "title": "...", "content": "..."}])
- set_output("fetch_errors", ["url that failed", ...])
""",
tools=["web_scrape"],
)
# Node 4: Evaluate Sources
evaluate_sources_node = NodeSpec(
id="evaluate-sources",
name="Evaluate Sources",
description="Score sources for relevance and quality, filter to top 10",
node_type="event_loop",
input_keys=["fetched_sources", "research_focus", "key_aspects"],
output_keys=["ranked_sources", "source_analysis"],
system_prompt="""\
You are a source evaluator. Assess each source for quality and relevance.
Scoring criteria:
- Relevance to research focus (1-10)
- Source credibility (1-10)
- Information depth (1-10)
- Recency if relevant (1-10)
Your task:
1. Score each source
2. Rank by combined score
3. Select top 10 sources
4. Note what each source uniquely contributes
Use set_output to store results:
- set_output("ranked_sources", [{"url": "...", "title": "...", "score": 8.5}])
- set_output("source_analysis", "Overview of source quality and coverage")
""",
tools=[],
)
# Node 5: Synthesize Findings
synthesize_findings_node = NodeSpec(
id="synthesize-findings",
name="Synthesize Findings",
description="Extract key facts from sources and identify common themes",
node_type="event_loop",
input_keys=["ranked_sources", "research_focus", "key_aspects"],
output_keys=["key_findings", "themes", "source_citations"],
system_prompt="""\
You are a research synthesizer. Analyze multiple sources to extract insights.
Your task:
1. Identify key facts from each source
2. Find common themes across sources
3. Note contradictions or debates
4. Build a citation map (fact -> source URL)
Use set_output to store each result:
- set_output("key_findings", [{"finding": "...", "sources": ["url1"], "confidence": "high"}])
- set_output("themes", [{"theme": "...", "description": "...", "supporting_sources": [...]}])
- set_output("source_citations", {"fact or claim": ["url1", "url2"]})
""",
tools=[],
)
# Node 6: Write Report
write_report_node = NodeSpec(
id="write-report",
name="Write Report",
description="Generate a narrative report with proper citations",
node_type="event_loop",
input_keys=[
"key_findings",
"themes",
"source_citations",
"research_focus",
"ranked_sources",
],
output_keys=["report_content", "references"],
system_prompt="""\
You are a research report writer. Create a well-structured narrative report.
Report structure:
1. Executive Summary (2-3 paragraphs)
2. Introduction (context and scope)
3. Key Findings (organized by theme)
4. Analysis (synthesis and implications)
5. Conclusion
6. References (numbered list of all sources)
Citation format: Use numbered citations like [1], [2] that correspond to the References section.
IMPORTANT:
- Every factual claim MUST have a citation
- Write in clear, professional prose
- Be objective and balanced
- Highlight areas of consensus and debate
Use set_output to store results:
- set_output("report_content", "Full markdown report text with citations...")
- set_output("references", [{"number": 1, "url": "...", "title": "..."}])
""",
tools=[],
)
# Node 7: Quality Check
quality_check_node = NodeSpec(
id="quality-check",
name="Quality Check",
description="Verify all claims have citations and report is coherent",
node_type="event_loop",
input_keys=["report_content", "references", "source_citations"],
output_keys=["quality_score", "issues", "final_report"],
system_prompt="""\
You are a quality assurance reviewer. Check the research report for issues.
Check for:
1. Uncited claims (factual statements without [n] citation)
2. Broken citations (references to non-existent numbers)
3. Coherence (logical flow between sections)
4. Completeness (all key aspects covered)
5. Accuracy (claims match source content)
If issues found, fix them in the final report.
Use set_output to store results:
- set_output("quality_score", 0.95)
- set_output("issues", [{"type": "uncited_claim", "location": "...", "fixed": true}])
- set_output("final_report", "Corrected full report with all issues fixed...")
""",
tools=[],
)
# Node 8: Save Report
save_report_node = NodeSpec(
id="save-report",
name="Save Report",
description="Write the final report to a local markdown file",
node_type="event_loop",
input_keys=["final_report", "references", "research_focus"],
output_keys=["file_path", "save_status"],
system_prompt="""\
You are a file manager. Save the research report to disk.
Your task:
1. Generate a filename from the research focus (slugified, with date)
2. Use the write_to_file tool to save the report as markdown
3. Save to the ./research_reports/ directory
Filename format: research_YYYY-MM-DD_topic-slug.md
Use set_output to store results:
- set_output("file_path", "research_reports/research_2026-01-23_topic-name.md")
- set_output("save_status", "success")
""",
tools=["write_to_file"],
)
__all__ = [
"parse_query_node",
"search_sources_node",
"fetch_content_node",
"evaluate_sources_node",
"synthesize_findings_node",
"write_report_node",
"quality_check_node",
"save_report_node",
]
+70 -22
@@ -158,6 +158,43 @@ intake_node = NodeSpec(
> **Legacy Note:** The old `pause_nodes` / `entry_points` pattern still works but `client_facing=True` is preferred for new agents.
**STEP 1 / STEP 2 Prompt Pattern:** For client-facing nodes, structure the system prompt with two explicit phases:
```python
system_prompt="""\
**STEP 1 — Respond to the user (text only, NO tool calls):**
[Present information, ask questions, etc.]
**STEP 2 — After the user responds, call set_output:**
[Call set_output with the structured outputs]
"""
```
This prevents the LLM from calling `set_output` prematurely before the user has had a chance to respond.
### Node Design: Fewer, Richer Nodes
Prefer fewer nodes that do more work over many thin single-purpose nodes:
- **Bad**: 8 thin nodes (parse query → search → fetch → evaluate → synthesize → write → check → save)
- **Good**: 4 rich nodes (intake → research → review → report)
Why: Each node boundary requires serializing outputs and passing context. Fewer nodes means the LLM retains full context of its work within the node. A research node that searches, fetches, and analyzes keeps all the source material in its conversation history.
### nullable_output_keys for Cross-Edge Inputs
When a node receives inputs that only arrive on certain edges (e.g., `feedback` only comes from a review → research feedback loop, not from intake → research), mark those keys as `nullable_output_keys`:
```python
research_node = NodeSpec(
id="research",
input_keys=["research_brief", "feedback"],
nullable_output_keys=["feedback"], # Not present on first visit
max_node_visits=3,
...
)
```
## Event Loop Architecture Concepts
### How EventLoopNode Works
@@ -169,40 +206,30 @@ An event loop node runs a multi-turn loop:
4. Judge evaluates: ACCEPT (exit loop), RETRY (loop again), or ESCALATE
5. Repeat until judge ACCEPTs or max_iterations reached
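A minimal sketch of that loop shape, assuming hypothetical `llm_turn` and `judge` callables in place of the framework's internals (the real `EventLoopNode` adds streaming, durable state, stall detection, and history compaction):

```python
# Hedged sketch of the loop above; llm_turn and judge are hypothetical
# stand-ins for the framework's LLM/tool plumbing and JudgeProtocol.
from dataclasses import dataclass, field


@dataclass
class Accumulator:
    outputs: dict = field(default_factory=dict)

    def set_output(self, key: str, value) -> None:
        # The real accumulator writes through to the store immediately.
        self.outputs[key] = value


async def run_event_loop(llm_turn, judge, accumulator: Accumulator,
                         max_iterations: int = 50) -> dict:
    for iteration in range(max_iterations):
        # Steps 1-3: one LLM turn, executing any tool calls; set_output
        # lands in the accumulator as a synthetic tool.
        assistant_text = await llm_turn(accumulator)
        # Step 4: the judge decides ACCEPT / RETRY / ESCALATE.
        verdict = judge(assistant_text, accumulator)
        if verdict == "ACCEPT":
            return accumulator.outputs
        if verdict == "ESCALATE":
            raise RuntimeError(f"judge escalated at iteration {iteration}")
        # RETRY: step 5, loop again with the judge's feedback in history.
    raise RuntimeError("max_iterations reached without ACCEPT")
```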
### CRITICAL: EventLoopNode Runtime Requirements
### EventLoopNode Runtime
EventLoopNodes are **not auto-created** by the graph executor. They must be explicitly instantiated and registered in a `node_registry` dict before execution.
**Required components:**
1. **`EventLoopNode` instances** — One per event_loop NodeSpec, registered in `node_registry`
2. **`Runtime` instance** — `GraphExecutor` calls `runtime.start_run()` internally. Passing `None` crashes the executor
3. **`GraphExecutor` (not `AgentRuntime`)** — `AgentRuntime`/`create_agent_runtime()` does NOT pass `node_registry` to the internal `GraphExecutor`, so all event_loop nodes fail with "not found in registry"
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. You do NOT need to manually register them. Both `GraphExecutor` (direct) and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically.
```python
# Direct execution — executor auto-creates EventLoopNodes
from framework.graph.executor import GraphExecutor
from framework.graph.event_loop_node import EventLoopNode, LoopConfig
from framework.runtime.event_bus import EventBus
from framework.runtime.core import Runtime
# Build node_registry
event_bus = EventBus()
node_registry = {}
for node_spec in nodes:
if node_spec.node_type == "event_loop":
node_registry[node_spec.id] = EventLoopNode(
event_bus=event_bus,
config=LoopConfig(max_iterations=50, max_tool_calls_per_turn=15),
tool_executor=tool_executor,
)
# Create executor with Runtime and node_registry
runtime = Runtime(storage_path)
executor = GraphExecutor(
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
node_registry=node_registry,
storage_path=storage_path,
)
result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
# TUI execution — AgentRuntime also works
from framework.runtime.agent_runtime import create_agent_runtime
runtime = create_agent_runtime(
graph=graph, goal=goal, storage_path=storage_path,
entry_points=[...], llm=llm, tools=tools, tool_executor=tool_executor,
)
```
@@ -210,8 +237,12 @@ executor = GraphExecutor(
Nodes produce structured outputs by calling `set_output(key, value)` — a synthetic tool injected by the framework. When the LLM calls `set_output`, the value is stored in the output accumulator and made available to downstream nodes via shared memory.
`set_output` is NOT a real tool — it is excluded from `real_tool_results`. For client-facing nodes, this means a turn where the LLM only calls `set_output` (no other tools) is treated as a conversational boundary and will block for user input.
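A rough illustration of that split (the tool-call dicts here are assumed shapes, not the framework's actual types):

```python
# Illustrative only: shows why a set_output-only turn leaves
# real_tool_results empty and so reads as a conversational boundary.
def split_tool_calls(tool_calls: list[dict]) -> tuple[list[dict], list[str]]:
    real_tool_results, outputs_set = [], []
    for call in tool_calls:
        if call["name"] == "set_output":
            outputs_set.append(call["args"]["key"])  # synthetic: recorded, not executed as a tool
        else:
            real_tool_results.append(call)           # real tool: executed normally
    return real_tool_results, outputs_set


real, keys = split_tool_calls(
    [{"name": "set_output", "args": {"key": "findings", "value": "..."}}]
)
assert real == [] and keys == ["findings"]  # set_output-only turn: blocks for user input
```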
### JudgeProtocol
**The judge is the SOLE mechanism for acceptance decisions.** Do not add ad-hoc framework gating, output rollback, or premature rejection logic. If the LLM calls `set_output` too early, fix it with better prompts or a custom judge — not framework-level guards.
The judge controls when a node's loop exits:
- **Implicit judge** (default, no judge configured): ACCEPTs when the LLM finishes with no tool calls and all required output keys are set
- **SchemaJudge**: Validates outputs against a Pydantic model
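As a sketch of what schema-based acceptance looks like (the actual `SchemaJudge` constructor and verdict types may differ):

```python
# Sketch of Pydantic-backed acceptance, approximating what SchemaJudge
# does; the real class and its verdict objects may differ.
from pydantic import BaseModel, ValidationError


class ResearchOutputs(BaseModel):
    findings: str
    sources: list[dict]
    gaps: str


def evaluate(outputs: dict) -> str:
    try:
        ResearchOutputs(**outputs)
        return "ACCEPT"
    except ValidationError as e:
        # RETRY feedback tells the LLM which keys are missing or invalid
        return f"RETRY: {len(e.errors())} schema errors"
```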
@@ -225,6 +256,23 @@ Controls loop behavior:
- `stall_detection_threshold` (default 3) — detects repeated identical responses
- `max_history_tokens` (default 32000) — triggers conversation compaction
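For example, a `LoopConfig` mirroring the defaults above, as used elsewhere in these docs:

```python
from framework.graph.event_loop_node import LoopConfig

# Values mirror the documented defaults; any of them can be tuned per node.
config = LoopConfig(
    max_iterations=50,
    max_tool_calls_per_turn=15,
    stall_detection_threshold=3,   # identical responses before stall handling
    max_history_tokens=32000,      # compaction trigger
)
```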
### Data Tools (Spillover Management)
When tool results exceed the context window, the framework automatically saves them to a spillover directory and truncates with a hint. Nodes that produce or consume large data should include the data tools:
- `save_data(filename, data, data_dir)` — Write data to a file in the data directory
- `load_data(filename, data_dir, offset=0, limit=50)` — Read data with line-based pagination
- `list_data_files(data_dir)` — List available data files
These are real MCP tools (not synthetic). Add them to nodes that handle large tool results:
```python
research_node = NodeSpec(
...
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
)
```
### Fan-Out / Fan-In
Multiple ON_SUCCESS edges from the same source create parallel execution. All branches run concurrently via `asyncio.gather()`. Parallel event_loop nodes must have disjoint `output_keys`.
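A fan-out sketch using `EdgeSpec` as shown elsewhere in these docs (the node ids are illustrative):

```python
from framework.graph import EdgeSpec, EdgeCondition

edges = [
    EdgeSpec(id="research-to-summarize", source="research", target="summarize",
             condition=EdgeCondition.ON_SUCCESS, priority=1),
    EdgeSpec(id="research-to-extract", source="research", target="extract-refs",
             condition=EdgeCondition.ON_SUCCESS, priority=1),
]
# Both targets run concurrently (asyncio.gather); their output_keys must
# be disjoint so parallel writes don't collide in shared memory.
```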
@@ -61,28 +61,38 @@ For agents needing multi-turn conversations with users, use `client_facing=True`
A client-facing node streams LLM output to the user and blocks for user input between conversational turns. This replaces the old pause/resume pattern.
```python
# Client-facing node blocks for user input
# Client-facing node with STEP 1/STEP 2 prompt pattern
intake_node = NodeSpec(
id="intake",
name="Intake",
description="Gather requirements from the user",
node_type="event_loop",
client_facing=True,
input_keys=[],
output_keys=["repo_url", "project_url"],
system_prompt="You are the intake agent. Ask the user for their repo URL and project URL. When you have both, call set_output for each.",
input_keys=["topic"],
output_keys=["research_brief"],
system_prompt="""\
You are an intake specialist.
**STEP 1 — Read and respond (text only, NO tool calls):**
1. Read the topic provided
2. If it's vague, ask 1-2 clarifying questions
3. If it's clear, confirm your understanding
**STEP 2 — After the user confirms, call set_output:**
- set_output("research_brief", "Clear description of what to research")
""",
)
# Internal node runs without user interaction
scanner_node = NodeSpec(
id="scanner",
name="Scanner",
description="Scan the repository",
research_node = NodeSpec(
id="research",
name="Research",
description="Search and analyze sources",
node_type="event_loop",
input_keys=["repo_url"],
output_keys=["scan_results"],
system_prompt="Scan the repository at {repo_url}...",
tools=["scan_github_repo"],
input_keys=["research_brief"],
output_keys=["findings", "sources"],
system_prompt="Research the topic using web_search and web_scrape...",
tools=["web_search", "web_scrape", "load_data", "save_data"],
)
```
@@ -91,6 +101,9 @@ scanner_node = NodeSpec(
- User input is injected via `node.inject_event(text)`
- When the LLM calls `set_output` to produce structured outputs, the judge evaluates and ACCEPTs
- Internal nodes (non-client-facing) run their entire loop without blocking
- `set_output` is a synthetic tool — a turn with only `set_output` calls (no real tools) triggers user input blocking
**STEP 1/STEP 2 pattern:** Always structure client-facing prompts with explicit phases. STEP 1 is text-only conversation. STEP 2 calls `set_output` after user confirmation. This prevents the LLM from calling `set_output` prematurely before the user responds.
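A rough host-side sketch of feeding input to a blocked client-facing node; `inject_event()` and `signal_shutdown()` are the documented entry points, everything else is assumed wiring:

```python
# Host-side sketch: feed user input to a blocked client-facing node.
# The node handle and read_user_input callable are assumptions here.
async def chat_with_node(node, read_user_input):
    while True:
        text = await read_user_input()   # e.g. from a TUI input widget
        if text.strip().lower() in ("quit", "exit"):
            node.signal_shutdown()       # unblocks the node and ends its loop
            break
        node.inject_event(text)          # unblocks _await_user_input();
                                         # the judge then evaluates
```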
### When to Use client_facing
@@ -160,6 +173,12 @@ EdgeSpec(
## Judge Patterns
**Core Principle: The judge is the SOLE mechanism for acceptance decisions.** Never add ad-hoc framework gating to compensate for LLM behavior. If the LLM calls `set_output` prematurely, fix the system prompt or use a custom judge. Anti-patterns to avoid:
- Output rollback logic
- `_user_has_responded` flags
- Premature set_output rejection
- Interaction protocol injection into system prompts
Judges control when an event_loop node's loop exits. Choose based on validation needs.
### Implicit Judge (Default)
@@ -241,15 +260,34 @@ EventLoopNode automatically manages context window usage with tiered compaction:
### Spillover Pattern
For large tool results, use `save_data()` to write to disk and pass the filename through `set_output`. This keeps the LLM context window small.
The framework automatically truncates large tool results and saves full content to a spillover directory. The LLM receives a truncation message with instructions to use `load_data` to read the full result.
```
LLM calls save_data(filename, large_data) → file written to spillover/
LLM calls set_output("results_file", filename) → filename stored in output
Downstream node calls load_data(filename) → reads from spillover/
For explicit data management, use the data tools (real MCP tools, not synthetic):
```python
# save_data, load_data, list_data_files are real MCP tools
# Each takes a data_dir parameter since the MCP server is shared
# Saving large results
save_data(filename="sources.json", data=large_json_string, data_dir="/path/to/spillover")
# Reading with pagination (line-based offset/limit)
load_data(filename="sources.json", data_dir="/path/to/spillover", offset=0, limit=50)
# Listing available files
list_data_files(data_dir="/path/to/spillover")
```
The `load_data()` tool supports `offset` and `limit` parameters for paginated reading of large files.
Add data tools to nodes that handle large tool results:
```python
research_node = NodeSpec(
...
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
)
```
The `data_dir` is passed by the framework (from the node's spillover directory). The LLM sees `data_dir` in truncation messages and uses it when calling `load_data`.
## Anti-Patterns
@@ -259,6 +297,29 @@ The `load_data()` tool supports `offset` and `limit` parameters for paginated re
- **Don't hide code in session** — Write to files as components are approved
- **Don't wait to write files** — Agent visible from first step
- **Don't batch everything** — Write incrementally, one component at a time
- **Don't create too many thin nodes** — Prefer fewer, richer nodes (see below)
- **Don't add framework gating for LLM behavior** — Fix prompts or use judges instead
### Fewer, Richer Nodes
A common mistake is splitting work into too many small single-purpose nodes. Each node boundary requires serializing outputs, losing in-context information, and adding edge complexity.
| Bad (8 thin nodes) | Good (4 rich nodes) |
|---------------------|---------------------|
| parse-query | intake (client-facing) |
| search-sources | research (search + fetch + analyze) |
| fetch-content | review (client-facing) |
| evaluate-sources | report (write + deliver) |
| synthesize-findings | |
| write-report | |
| quality-check | |
| save-report | |
**Why fewer nodes are better:**
- The LLM retains full context of its work within a single node
- A research node that searches, fetches, and analyzes keeps all source material in its conversation history
- Fewer edges means simpler graph and fewer failure points
- Data tools (`save_data`/`load_data`) handle context window limits within a single node
### MCP Tools - Correct Usage
+1 -5
@@ -55,14 +55,10 @@ jobs:
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Install dependencies
- name: Install dependencies and run tests
run: |
cd core
uv sync
- name: Run tests
run: |
cd core
uv run pytest tests/ -v
test-tools:
-1
@@ -54,7 +54,6 @@ __pycache__/
*.egg-info/
.eggs/
*.egg
uv.lock
# Generated runtime data
core/data/
+2 -3
@@ -198,9 +198,8 @@ hive/ # Repository root
│ ├── quizzes/ # Developer quizzes
│ └── i18n/ # Translations
├── scripts/ # Build & utility scripts
│   ├── setup-python.sh # Python environment setup
│ └── setup.sh # Legacy setup script
├── scripts/ # Utility scripts
│   └── auto-close-duplicates.ts # GitHub duplicate issue closer
├── quickstart.sh # Interactive setup wizard
├── ENVIRONMENT_SETUP.md # Complete Python setup guide
+10 -52
@@ -21,42 +21,18 @@ This will:
- Fix package compatibility issues (openai + litellm)
- Verify all installations
## Quick Setup (Windows PowerShell)
## Windows Setup
Windows users can use the native PowerShell setup script.
Windows users should use **WSL (Windows Subsystem for Linux)** to set up and run agents.
Before running the script, allow script execution for the current session:
```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
```
Run setup from the project root:
```powershell
./scripts/setup-python.ps1
```
This will:
- Check Python version (requires 3.11+)
- Create a local `.venv` virtual environment
- Install the core framework package (`framework`)
- Install the tools package (`aden_tools`)
- Fix package compatibility issues (openai + litellm)
- Verify all installations
After setup, activate the virtual environment:
```powershell
.\.venv\Scripts\Activate.ps1
```
Set `PYTHONPATH` (required in every new PowerShell session):
```powershell
$env:PYTHONPATH="core;exports"
```
1. [Install WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install) if you haven't already:
```powershell
wsl --install
```
2. Open your WSL terminal, clone the repo, and run the quickstart script:
```bash
./quickstart.sh
```
## Alpine Linux Setup
@@ -326,12 +302,6 @@ Or run the setup script:
./quickstart.sh
```
Windows:
```powershell
./scripts/setup-python.ps1
```
### "ModuleNotFoundError: No module named 'openai.\_models'"
**Cause:** Outdated `openai` package (0.27.x) incompatible with `litellm`
@@ -375,12 +345,6 @@ uv pip uninstall framework tools
./quickstart.sh
```
Windows:
```powershell
./scripts/setup-python.ps1
```
## Package Structure
The Hive framework consists of three Python packages:
@@ -479,12 +443,6 @@ This design allows agents in `exports/` to be:
./quickstart.sh
```
Windows:
```powershell
./scripts/setup-python.ps1
```
### 2. Build Agent (Claude Code)
```
-4
@@ -4,7 +4,6 @@
- **Added empty response retry logic** — LLM provider now detects empty responses (e.g. Gemini returning 200 with no content on rate limit) and retries with exponential backoff, preventing hallucinated output from the cleanup LLM
- **Added context-aware input compaction** — LLM nodes now estimate input token count before calling the model and progressively truncate the largest values if they exceed the context window budget
- **Increased rate limit retries to 10** with verbose `[retry]` and `[compaction]` logging that includes model name, finish reason, and attempt count
- **Updated setup scripts** — `scripts/setup-python.sh` now installs Playwright Chromium browser automatically for web scraping support
- **Interactive quickstart onboarding** — `quickstart.sh` rewritten as bee-themed interactive wizard that detects existing API keys (including Claude Code subscription), lets user pick ONE default LLM provider, and saves configuration to `~/.hive/configuration.json`
- **Fixed lint errors** across `hubspot_tool.py` (line length) and `agent_builder_server.py` (unused variable)
@@ -24,8 +23,6 @@
- `tools/src/aden_tools/tools/web_scrape_tool/README.md` — Updated docs
- `tools/pyproject.toml` — Added `playwright`, `playwright-stealth` deps
- `tools/Dockerfile` — Added `playwright install chromium --with-deps`
- `scripts/setup-python.sh` — Added Playwright Chromium browser install step
### LLM Reliability
- `core/framework/llm/litellm.py` — Empty response retry + max retries 10 + verbose logging
- `core/framework/graph/node.py` — Input compaction via `_compact_inputs()`, `_estimate_tokens()`, `_get_context_limit()`
@@ -41,7 +38,6 @@
## Test plan
- [ ] Run `make lint` — passes clean
- [ ] Run `./quickstart.sh` and verify interactive flow works, config saved to `~/.hive/configuration.json`
- [ ] Run `./scripts/setup-python.sh` and verify Playwright Chromium installs
- [ ] Run `pytest tests/tools/test_web_scrape_tool.py -v`
- [ ] Run agent against a JS-heavy site and verify `web_scrape` returns rendered content
- [ ] Set `HUBSPOT_ACCESS_TOKEN` and verify HubSpot tool CRUD operations work
+30
@@ -0,0 +1,30 @@
# TUI Text Selection and Copy Guide
## Keybindings
| Key | Action |
|---------------|-----------------------|
| `Tab` | Next panel |
| `Shift+Tab` | Previous panel |
| `Ctrl+S` | Save SVG screenshot |
| `Ctrl+O` | Command palette |
| `Q` | Quit |
## Panel Cycle Order
`Tab` cycles: **Log Pane → Graph View → Chat Input**
## Text Selection
Textual apps capture the mouse, so normal click-drag selection won't work by default. To select and copy text from any pane:
1. **Hold `Shift`** while clicking and dragging — this bypasses Textual's mouse capture and lets your terminal handle selection natively.
2. Copy with your terminal's shortcut (`Cmd+C` on macOS, `Ctrl+Shift+C` on most Linux terminals).
## Log Pane Scrolling
The log pane uses `auto_scroll=False`. New output only scrolls to the bottom when you are already at the bottom of the log. If you've scrolled up to read earlier output, it stays in place.
## Screenshots
`Ctrl+S` saves an SVG screenshot to the `screenshots/` directory with a timestamped filename. Open the SVG in any browser to view it.
+110 -84
@@ -144,19 +144,19 @@ class EventLoopNode(NodeProtocol):
1. Try to restore from durable state (crash recovery)
2. If no prior state, init from NodeSpec.system_prompt + input_keys
3. Loop: drain injection queue -> stream LLM -> execute tools
-> if client_facing + no tools: block for user input (inject_event)
-> if not client_facing or tools present: judge evaluates
-> if client_facing + no real tools: block for user input
-> judge evaluates (acceptance criteria)
(each add_* and set_output writes through to store immediately)
4. Publish events to EventBus at each stage
5. Write cursor after each iteration
6. Terminate when judge returns ACCEPT, shutdown signaled, or max iterations
7. Build output dict from OutputAccumulator
Client-facing blocking: When ``client_facing=True`` and the LLM produces
text without tool calls (a natural conversational turn), the node blocks
via ``_await_user_input()`` until ``inject_event()`` or ``signal_shutdown()``
is called. This separates blocking (node concern) from output evaluation
(judge concern).
Client-facing blocking: When ``client_facing=True`` and the LLM finishes
without real tool calls (stop_reason != tool_call), the node blocks via
``_await_user_input()`` until ``inject_event()`` or ``signal_shutdown()``
is called. After user input, the judge evaluates — the judge is the
sole mechanism for acceptance decisions.
Always returns NodeResult with retryable=False semantics. The executor
must NOT retry event loop nodes -- retry is handled internally by the
@@ -212,8 +212,10 @@ class EventLoopNode(NodeProtocol):
# 2. Restore or create new conversation + accumulator
conversation, accumulator, start_iteration = await self._restore(ctx)
if conversation is None:
system_prompt = ctx.node_spec.system_prompt or ""
conversation = NodeConversation(
system_prompt=ctx.node_spec.system_prompt or "",
system_prompt=system_prompt,
max_history_tokens=self._config.max_history_tokens,
output_keys=ctx.node_spec.output_keys or None,
store=self._conversation_store,
@@ -276,15 +278,20 @@ class EventLoopNode(NodeProtocol):
iteration,
len(conversation.messages),
)
assistant_text, tool_results_list, turn_tokens = await self._run_single_turn(
ctx, conversation, tools, iteration, accumulator
)
(
assistant_text,
real_tool_results,
outputs_set,
turn_tokens,
) = await self._run_single_turn(ctx, conversation, tools, iteration, accumulator)
logger.info(
"[%s] iter=%d: LLM done — text=%d chars, tool_calls=%d, tokens=%s, accumulator=%s",
"[%s] iter=%d: LLM done — text=%d chars, real_tools=%d, "
"outputs_set=%s, tokens=%s, accumulator=%s",
node_id,
iteration,
len(assistant_text),
len(tool_results_list),
len(real_tool_results),
outputs_set or "[]",
turn_tokens,
{k: ("set" if v is not None else "None") for k, v in accumulator.to_dict().items()},
)
@@ -300,6 +307,31 @@ class EventLoopNode(NodeProtocol):
if conversation.needs_compaction():
await self._compact_tiered(ctx, conversation, accumulator)
# 6e'''. Empty response guard — if the LLM returned nothing
# (no text, no real tools, no set_output) and all required
# outputs are already set, accept immediately. This prevents
# wasted iterations when the LLM has genuinely finished its
# work (e.g. after calling set_output in a previous turn).
truly_empty = not assistant_text and not real_tool_results and not outputs_set
if truly_empty and accumulator is not None:
missing = self._get_missing_output_keys(
accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
)
if not missing:
logger.info(
"[%s] iter=%d: empty response but all outputs set — accepting",
node_id,
iteration,
)
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
latency_ms = int((time.time() - start_time) * 1000)
return NodeResult(
success=True,
output=accumulator.to_dict(),
tokens_used=total_input_tokens + total_output_tokens,
latency_ms=latency_ms,
)
# 6f. Stall detection
recent_responses.append(assistant_text)
if len(recent_responses) > self._config.stall_detection_threshold:
@@ -321,18 +353,17 @@ class EventLoopNode(NodeProtocol):
# 6g. Write cursor checkpoint
await self._write_cursor(ctx, conversation, accumulator, iteration)
# 6h. Client-facing input wait
logger.info(
"[%s] iter=%d: 6h check — client_facing=%s, tool_results=%d",
node_id,
iteration,
ctx.node_spec.client_facing,
len(tool_results_list),
)
if ctx.node_spec.client_facing and not tool_results_list:
# LLM finished speaking (no tool calls) on a client-facing node.
# This is a conversational turn boundary: block for user input
# instead of running the judge.
# 6h. Client-facing input blocking
#
# For client_facing nodes, block for user input whenever the
# LLM finishes without making real tool calls (i.e. the LLM's
# stop_reason is not tool_call). set_output is separated from
# real tools by _run_single_turn, so this correctly treats
# set_output-only turns as conversational boundaries.
#
# After user input, always fall through to judge evaluation
# (6i). The judge handles all acceptance decisions.
if ctx.node_spec.client_facing and not real_tool_results:
if self._shutdown:
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
latency_ms = int((time.time() - start_time) * 1000)
@@ -347,7 +378,6 @@ class EventLoopNode(NodeProtocol):
got_input = await self._await_user_input(ctx)
logger.info("[%s] iter=%d: unblocked, got_input=%s", node_id, iteration, got_input)
if not got_input:
# Shutdown signaled during wait
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
latency_ms = int((time.time() - start_time) * 1000)
return NodeResult(
@@ -357,46 +387,13 @@ class EventLoopNode(NodeProtocol):
latency_ms=latency_ms,
)
# Clear stall detection — user input resets the conversation
recent_responses.clear()
# For nodes with an explicit judge, fall through to judge
# evaluation so the LLM gets structured feedback about missing
# outputs (e.g. "Missing output keys: [...]"). Without this,
# the LLM may generate text like "Ready to proceed!" without
# ever calling set_output, and the judge feedback never reaches it.
#
# For nodes without a judge (HITL review/approval with all-
# nullable keys), keep conversing UNLESS the LLM has already
# set an output — in that case fall through to the implicit
# judge which will ACCEPT and terminate the node.
if self._judge is None:
has_outputs = accumulator and any(
v is not None for v in accumulator.to_dict().values()
)
if not has_outputs:
logger.info(
"[%s] iter=%d: no judge, no outputs, continuing",
node_id,
iteration,
)
continue
logger.info(
"[%s] iter=%d: no judge, outputs set — implicit judge",
node_id,
iteration,
)
else:
logger.info(
"[%s] iter=%d: has judge, falling through to 6i",
node_id,
iteration,
)
# Fall through to judge evaluation (6i)
# 6i. Judge evaluation
should_judge = (
(iteration + 1) % self._config.judge_every_n_turns == 0
or not tool_results_list # no tool calls = natural stop
or not real_tool_results # no real tool calls = natural stop
)
logger.info("[%s] iter=%d: 6i should_judge=%s", node_id, iteration, should_judge)
@@ -406,7 +403,7 @@ class EventLoopNode(NodeProtocol):
conversation,
accumulator,
assistant_text,
tool_results_list,
real_tool_results,
iteration,
)
fb_preview = (verdict.feedback or "")[:200]
@@ -526,16 +523,24 @@ class EventLoopNode(NodeProtocol):
tools: list[Tool],
iteration: int,
accumulator: OutputAccumulator,
) -> tuple[str, list[dict], dict[str, int]]:
) -> tuple[str, list[dict], list[str], dict[str, int]]:
"""Run a single LLM turn with streaming and tool execution.
Returns (assistant_text, tool_results, token_counts).
Returns (assistant_text, real_tool_results, outputs_set, token_counts).
``real_tool_results`` contains only results from actual tools (web_search,
etc.), NOT from the synthetic ``set_output`` tool. ``outputs_set`` lists
the output keys written via ``set_output`` during this turn. This
separation lets the caller treat set_output as a framework concern
rather than a tool-execution concern.
"""
stream_id = ctx.node_id
node_id = ctx.node_id
token_counts: dict[str, int] = {"input": 0, "output": 0}
tool_call_count = 0
final_text = ""
# Track output keys set via set_output across all inner iterations
outputs_set_this_turn: list[str] = []
# Inner tool loop: stream may produce tool calls requiring re-invocation
while True:
@@ -606,10 +611,10 @@ class EventLoopNode(NodeProtocol):
# If no tool calls, turn is complete
if not tool_calls:
return final_text, [], token_counts
return final_text, [], outputs_set_this_turn, token_counts
# Execute tool calls
tool_results: list[dict] = []
# Execute tool calls — separate real tools from set_output
real_tool_results: list[dict] = []
limit_hit = False
executed_in_batch = 0
for tc in tool_calls:
@@ -624,21 +629,21 @@ class EventLoopNode(NodeProtocol):
stream_id, node_id, tc.tool_use_id, tc.tool_name, tc.tool_input
)
# Handle set_output synthetic tool
logger.info(
"[%s] tool_call: %s(%s)",
node_id,
tc.tool_name,
json.dumps(tc.tool_input)[:200],
)
if tc.tool_name == "set_output":
# --- Framework-level set_output handling ---
result = self._handle_set_output(tc.tool_input, ctx.node_spec.output_keys)
result = ToolResult(
tool_use_id=tc.tool_use_id,
content=result.content,
is_error=result.is_error,
)
# Async write-through for set_output
if not result.is_error:
value = tc.tool_input["value"]
# Parse JSON strings into native types so downstream
@@ -652,26 +657,27 @@ class EventLoopNode(NodeProtocol):
except (json.JSONDecodeError, TypeError):
pass
await accumulator.set(tc.tool_input["key"], value)
outputs_set_this_turn.append(tc.tool_input["key"])
else:
# Execute real tool
# --- Real tool execution ---
result = await self._execute_tool(tc)
# Truncate large results to prevent context blowup
result = self._truncate_tool_result(result, tc.tool_name)
real_tool_results.append(
{
"tool_use_id": tc.tool_use_id,
"tool_name": tc.tool_name,
"content": result.content,
"is_error": result.is_error,
}
)
# Record tool result in conversation (write-through)
# Record tool result in conversation (both real and set_output
# go into the conversation for LLM context continuity)
await conversation.add_tool_result(
tool_use_id=tc.tool_use_id,
content=result.content,
is_error=result.is_error,
)
tool_results.append(
{
"tool_use_id": tc.tool_use_id,
"tool_name": tc.tool_name,
"content": result.content,
"is_error": result.is_error,
}
)
# Publish tool call completed
await self._publish_tool_completed(
@@ -708,7 +714,9 @@ class EventLoopNode(NodeProtocol):
content=discard_msg,
is_error=True,
)
tool_results.append(
# Discarded calls go into real_tool_results so the
# caller sees they were attempted (for judge context).
real_tool_results.append(
{
"tool_use_id": tc.tool_use_id,
"tool_name": tc.tool_name,
@@ -716,9 +724,24 @@ class EventLoopNode(NodeProtocol):
"is_error": True,
}
)
# Prune old tool results NOW to prevent context bloat on the
# next turn. The char-based token estimator underestimates
# actual API tokens, so the standard compaction check in the
# outer loop may not trigger in time.
protect = max(2000, self._config.max_history_tokens // 12)
pruned = await conversation.prune_old_tool_results(
protect_tokens=protect,
min_prune_tokens=max(1000, protect // 3),
)
if pruned > 0:
logger.info(
"Post-limit pruning: cleared %d old tool results (budget: %d)",
pruned,
self._config.max_history_tokens,
)
# Limit hit — return from this turn so the judge can
# evaluate instead of looping back for another stream.
return final_text, tool_results, token_counts
return final_text, real_tool_results, outputs_set_this_turn, token_counts
# --- Mid-turn pruning: prevent context blowup within a single turn ---
if conversation.usage_ratio() >= 0.6:
@@ -1025,7 +1048,8 @@ class EventLoopNode(NodeProtocol):
truncated = (
f"[Result from {tool_name}: {len(result.content)} chars — "
f"too large for context, saved to '{filename}'. "
f"Use load_data('{filename}') to read the full result.]\n\n"
f"Use load_data(filename='{filename}', data_dir='{spill_dir}') "
f"to read the full result.]\n\n"
f"Preview:\n{preview}"
)
logger.info(
@@ -1244,9 +1268,11 @@ class EventLoopNode(NodeProtocol):
# 5. Spillover files hint
if self._config.spillover_dir:
spill = self._config.spillover_dir
parts.append(
"NOTE: Large tool results were saved to files. "
"Use load_data('<filename>') to read them."
f"Use load_data(filename='<filename>', data_dir='{spill}') "
"to read them."
)
# 6. Tool call history (prevent re-calling tools)
+145 -47
@@ -14,6 +14,7 @@ import logging
import warnings
from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
@@ -128,6 +129,9 @@ class GraphExecutor:
cleansing_config: CleansingConfig | None = None,
enable_parallel_execution: bool = True,
parallel_config: ParallelExecutionConfig | None = None,
event_bus: Any | None = None,
stream_id: str = "",
storage_path: str | Path | None = None,
):
"""
Initialize the executor.
@@ -142,6 +146,9 @@ class GraphExecutor:
cleansing_config: Optional output cleansing configuration
enable_parallel_execution: Enable parallel fan-out execution (default True)
parallel_config: Configuration for parallel execution behavior
event_bus: Optional event bus for emitting node lifecycle events
stream_id: Stream ID for event correlation
storage_path: Optional base path for conversation persistence
"""
self.runtime = runtime
self.llm = llm
@@ -151,6 +158,9 @@ class GraphExecutor:
self.approval_callback = approval_callback
self.validator = OutputValidator()
self.logger = logging.getLogger(__name__)
self._event_bus = event_bus
self._stream_id = stream_id
self._storage_path = Path(storage_path) if storage_path else None
# Initialize output cleaner
self.cleansing_config = cleansing_config or CleansingConfig()
@@ -357,13 +367,33 @@ class GraphExecutor:
description=f"Validation errors for {current_node_id}: {validation_errors}",
)
# Emit node-started event (skip event_loop nodes — they emit their own)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_started(
stream_id=self._stream_id, node_id=current_node_id
)
# Execute node
self.logger.info(" Executing...")
result = await node_impl.execute(ctx)
# Emit node-completed event (skip event_loop nodes)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_completed(
stream_id=self._stream_id, node_id=current_node_id, iterations=1
)
if result.success:
# Validate output before accepting it
if result.output and node_spec.output_keys:
# Validate output before accepting it.
# Skip for event_loop nodes — their judge system is
# the sole acceptance mechanism (see WP-8). Empty
# strings and other flexible outputs are legitimate
# for LLM-driven nodes that already passed the judge.
if (
result.output
and node_spec.output_keys
and node_spec.node_type != "event_loop"
):
validation = self.validator.validate_all(
output=result.output,
expected_keys=node_spec.output_keys,
@@ -441,48 +471,66 @@ class GraphExecutor:
_is_retry = True
continue
else:
# Max retries exceeded - fail the execution
# Max retries exceeded - check for failure handlers
self.logger.error(
f" ✗ Max retries ({max_retries}) exceeded for node {current_node_id}"
)
self.runtime.report_problem(
severity="critical",
description=(
f"Node {current_node_id} failed after "
f"{max_retries} attempts: {result.error}"
),
)
self.runtime.end_run(
success=False,
output_data=memory.read_all(),
narrative=(
f"Failed at {node_spec.name} after "
f"{max_retries} retries: {result.error}"
),
# Check if there's an ON_FAILURE edge to follow
next_node = self._follow_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
current_node_spec=node_spec,
result=result, # result.success=False triggers ON_FAILURE
memory=memory,
)
# Calculate quality metrics
total_retries_count = sum(node_retry_counts.values())
nodes_failed = list(node_retry_counts.keys())
if next_node:
# Found a failure handler - route to it
self.logger.info(f" → Routing to failure handler: {next_node}")
current_node_id = next_node
continue # Continue execution with handler
else:
# No failure handler - terminate execution
self.runtime.report_problem(
severity="critical",
description=(
f"Node {current_node_id} failed after "
f"{max_retries} attempts: {result.error}"
),
)
self.runtime.end_run(
success=False,
output_data=memory.read_all(),
narrative=(
f"Failed at {node_spec.name} after "
f"{max_retries} retries: {result.error}"
),
)
return ExecutionResult(
success=False,
error=(
f"Node '{node_spec.name}' failed after "
f"{max_retries} attempts: {result.error}"
),
output=memory.read_all(),
steps_executed=steps,
total_tokens=total_tokens,
total_latency_ms=total_latency,
path=path,
total_retries=total_retries_count,
nodes_with_failures=nodes_failed,
retry_details=dict(node_retry_counts),
had_partial_failures=len(nodes_failed) > 0,
execution_quality="failed",
node_visit_counts=dict(node_visit_counts),
)
# Calculate quality metrics
total_retries_count = sum(node_retry_counts.values())
nodes_failed = list(node_retry_counts.keys())
return ExecutionResult(
success=False,
error=(
f"Node '{node_spec.name}' failed after "
f"{max_retries} attempts: {result.error}"
),
output=memory.read_all(),
steps_executed=steps,
total_tokens=total_tokens,
total_latency_ms=total_latency,
path=path,
total_retries=total_retries_count,
nodes_with_failures=nodes_failed,
retry_details=dict(node_retry_counts),
had_partial_failures=len(nodes_failed) > 0,
execution_quality="failed",
node_visit_counts=dict(node_visit_counts),
)
# Check if we just executed a pause node - if so, save state and return
# This must happen BEFORE determining next node, since pause nodes may have no edges
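With this change a graph can declare an explicit failure handler. A hedged sketch of what such an edge could look like; `EdgeCondition.ON_FAILURE` semantics come from the code above, but the exact `EdgeSpec` constructor fields are assumptions:

```python
from framework.graph.edge import EdgeCondition, EdgeSpec

# Taken when "fetch-data" exhausts its retries (result.success=False).
# Field names beyond source/target/condition are illustrative.
fallback_edge = EdgeSpec(
    id="fetch-to-failure-handler",
    source="fetch-data",
    target="handle-fetch-failure",
    condition=EdgeCondition.ON_FAILURE,
)
```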
@@ -781,11 +829,43 @@ class GraphExecutor:
)
if node_spec.node_type == "event_loop":
# Event loop nodes must be pre-registered (like function nodes)
raise RuntimeError(
f"EventLoopNode '{node_spec.id}' not found in registry. "
"Register it with executor.register_node() before execution."
# Auto-create EventLoopNode with sensible defaults.
# Custom configs can still be pre-registered via node_registry.
from framework.graph.event_loop_node import EventLoopNode, LoopConfig
# Create a FileConversationStore if a storage path is available
conv_store = None
if self._storage_path:
from framework.storage.conversation_store import FileConversationStore
store_path = self._storage_path / "conversations" / node_spec.id
conv_store = FileConversationStore(base_path=store_path)
# Auto-configure spillover directory for large tool results.
# When a tool result exceeds max_tool_result_chars, the full
# content is written to spillover_dir and the agent gets a
# truncated preview with instructions to use load_data().
spillover = None
if self._storage_path:
spillover = str(self._storage_path / "data")
node = EventLoopNode(
event_bus=self._event_bus,
judge=None, # implicit judge: accept when output_keys are filled
config=LoopConfig(
max_iterations=100 if node_spec.client_facing else 50,
max_tool_calls_per_turn=10,
stall_detection_threshold=3,
max_history_tokens=32000,
max_tool_result_chars=3_000,
spillover_dir=spillover,
),
tool_executor=self.tool_executor,
conversation_store=conv_store,
)
# Cache so inject_event() is reachable for client-facing input
self.node_registry[node_spec.id] = node
return node
# Should never reach here due to validation above
raise RuntimeError(f"Unhandled node type: {node_spec.node_type}")
@@ -814,9 +894,12 @@ class GraphExecutor:
source_node_name=current_node_spec.name if current_node_spec else current_node_id,
target_node_name=target_node_spec.name if target_node_spec else edge.target,
):
# Validate and clean output before mapping inputs
# Validate and clean output before mapping inputs.
# Use full memory state (not just result.output) because
# target input_keys may come from earlier nodes in the
# graph, not only from the immediate source node.
if self.cleansing_config.enabled and target_node_spec:
output_to_validate = result.output
output_to_validate = memory.read_all()
validation = self.output_cleaner.validate_output(
output=output_to_validate,
@@ -1012,10 +1095,13 @@ class GraphExecutor:
branch.status = "running"
try:
# Validate and clean output before mapping inputs (same as _follow_edges)
# Validate and clean output before mapping inputs (same as _follow_edges).
# Use full memory state since target input_keys may come
# from earlier nodes, not just the immediate source.
if self.cleansing_config.enabled and node_spec:
mem_snapshot = memory.read_all()
validation = self.output_cleaner.validate_output(
output=source_result.output,
output=mem_snapshot,
source_node_id=source_node_spec.id if source_node_spec else "unknown",
target_node_spec=node_spec,
)
@@ -1026,7 +1112,7 @@ class GraphExecutor:
f"{branch.node_id}: {validation.errors}"
)
cleaned_output = self.output_cleaner.clean_output(
output=source_result.output,
output=mem_snapshot,
source_node_id=source_node_spec.id if source_node_spec else "unknown",
target_node_spec=node_spec,
validation_errors=validation.errors,
@@ -1049,12 +1135,24 @@ class GraphExecutor:
ctx = self._build_context(node_spec, memory, goal, mapped, graph.max_tokens)
node_impl = self._get_node_implementation(node_spec, graph.cleanup_llm_model)
# Emit node-started event (skip event_loop nodes)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_started(
stream_id=self._stream_id, node_id=branch.node_id
)
self.logger.info(
f" ▶ Branch {node_spec.name}: executing (attempt {attempt + 1})"
)
result = await node_impl.execute(ctx)
last_result = result
# Emit node-completed event (skip event_loop nodes)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_completed(
stream_id=self._stream_id, node_id=branch.node_id, iterations=1
)
if result.success:
# Write outputs to shared memory using async write
for key, value in result.output.items():
+4 -1
@@ -144,8 +144,11 @@ class OutputCleaner:
errors = []
warnings = []
# Check 1: Required input keys present
# Check 1: Required input keys present (skip nullable keys)
nullable = set(getattr(target_node_spec, "nullable_output_keys", None) or [])
for key in target_node_spec.input_keys:
if key in nullable:
continue
if key not in output:
errors.append(f"Missing required key: '{key}'")
continue
+10 -6
@@ -572,17 +572,21 @@ class LiteLLMProvider(LLMProvider):
# and we skip the retry path — nothing was yielded in vain.)
has_content = accumulated_text or tool_calls_acc
if not has_content and attempt < RATE_LIMIT_MAX_RETRIES:
# If the conversation ends with an assistant message,
# an empty stream is expected (nothing new to say).
# Don't retry — just flush whatever we have.
# If the conversation ends with an assistant or tool
# message, an empty stream is expected — the LLM has
# nothing new to say. Don't burn retries on this;
# let the caller (EventLoopNode) decide what to do.
# Typical case: client_facing node where the LLM set
# all outputs via set_output tool calls, and the tool
# results are the last messages.
last_role = next(
(m["role"] for m in reversed(full_messages) if m.get("role") != "system"),
None,
)
if last_role == "assistant":
if last_role in ("assistant", "tool"):
logger.debug(
"[stream] Empty response after assistant message — "
"expected, not retrying."
"[stream] Empty response after %s message — expected, not retrying.",
last_role,
)
for event in tail_events:
yield event
+56 -15
@@ -1105,17 +1105,30 @@ def validate_graph() -> str:
errors.append(f"Unreachable nodes: {unreachable}")
# === CONTEXT FLOW VALIDATION ===
# Build dependency map (node_id -> list of nodes it depends on)
# Build dependency maps — separate forward edges from feedback edges.
# Feedback edges (priority < 0) create cycles; they must not block the
# topological sort. Context they carry arrives on *revisits*, not on
# the first execution of a node.
feedback_edge_ids = {e.id for e in session.edges if e.priority < 0}
forward_dependencies: dict[str, list[str]] = {node.id: [] for node in session.nodes}
feedback_sources: dict[str, list[str]] = {node.id: [] for node in session.nodes}
# Combined map kept for error-message generation (all deps)
dependencies: dict[str, list[str]] = {node.id: [] for node in session.nodes}
for edge in session.edges:
if edge.target in dependencies:
dependencies[edge.target].append(edge.source)
if edge.target not in forward_dependencies:
continue
dependencies[edge.target].append(edge.source)
if edge.id in feedback_edge_ids:
feedback_sources[edge.target].append(edge.source)
else:
forward_dependencies[edge.target].append(edge.source)
# Build output map (node_id -> keys it produces)
node_outputs: dict[str, set[str]] = {node.id: set(node.output_keys) for node in session.nodes}
# Compute available context for each node (what keys it can read)
# Using topological order
# Using topological order on the forward-edge DAG
available_context: dict[str, set[str]] = {}
computed = set()
nodes_by_id = {n.id: n for n in session.nodes}
@@ -1125,7 +1138,8 @@ def validate_graph() -> str:
# Entry nodes can only read from initial context
initial_context_keys: set[str] = set()
# Compute in topological order
# Compute in topological order (forward edges only — feedback edges
# don't block, since their context arrives on revisits)
remaining = {n.id for n in session.nodes}
max_iterations = len(session.nodes) * 2
@@ -1134,18 +1148,23 @@ def validate_graph() -> str:
break
for node_id in list(remaining):
deps = dependencies.get(node_id, [])
fwd_deps = forward_dependencies.get(node_id, [])
# Can compute if all dependencies are computed (or no dependencies)
if all(d in computed for d in deps):
# Collect outputs from all dependencies
# Can compute if all FORWARD dependencies are computed
if all(d in computed for d in fwd_deps):
# Collect outputs from all forward dependencies
available = set(initial_context_keys)
for dep_id in deps:
# Add outputs from dependency
for dep_id in fwd_deps:
available.update(node_outputs.get(dep_id, set()))
# Also add what was available to the dependency (transitive)
available.update(available_context.get(dep_id, set()))
# Also include context from already-computed feedback
# sources (bonus, not blocking)
for fb_src in feedback_sources.get(node_id, []):
if fb_src in computed:
available.update(node_outputs.get(fb_src, set()))
available.update(available_context.get(fb_src, set()))
available_context[node_id] = available
computed.add(node_id)
remaining.remove(node_id)
@@ -1155,15 +1174,37 @@ def validate_graph() -> str:
context_errors = []
context_warnings = []
missing_inputs: dict[str, list[str]] = {}
feedback_only_inputs: dict[str, list[str]] = {}
for node in session.nodes:
available = available_context.get(node.id, set())
for input_key in node.input_keys:
if input_key not in available:
if node.id not in missing_inputs:
missing_inputs[node.id] = []
missing_inputs[node.id].append(input_key)
# Check if this input is provided by a feedback source
fb_provides = set()
for fb_src in feedback_sources.get(node.id, []):
fb_provides.update(node_outputs.get(fb_src, set()))
fb_provides.update(available_context.get(fb_src, set()))
if input_key in fb_provides:
# Input arrives via feedback edge — warn, don't error
if node.id not in feedback_only_inputs:
feedback_only_inputs[node.id] = []
feedback_only_inputs[node.id].append(input_key)
else:
if node.id not in missing_inputs:
missing_inputs[node.id] = []
missing_inputs[node.id].append(input_key)
# Warn about feedback-only inputs (available on revisits, not first run)
for node_id, fb_keys in feedback_only_inputs.items():
fb_srcs = feedback_sources.get(node_id, [])
context_warnings.append(
f"Node '{node_id}' input(s) {fb_keys} are only provided via "
f"feedback edge(s) from {fb_srcs}. These will be available on "
f"revisits but not on the first execution."
)
# Generate helpful error messages
for node_id, missing in missing_inputs.items():
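To make the forward-only pass concrete, here is a self-contained didactic reimplementation of the availability computation described above; it is not the tool's actual code, just the same idea on plain dicts (initial-context seeding is omitted):

```python
def compute_available(nodes, edges, outputs):
    # edges: (source, target, priority); priority < 0 marks feedback.
    forward = {n: [] for n in nodes}
    feedback = {n: [] for n in nodes}
    for src, dst, priority in edges:
        (feedback if priority < 0 else forward)[dst].append(src)

    available, computed, remaining = {}, set(), set(nodes)
    while remaining:
        progressed = False
        for node in list(remaining):
            # Only FORWARD deps gate computation; feedback never blocks.
            if all(dep in computed for dep in forward[node]):
                avail = set()
                for dep in forward[node]:
                    avail |= outputs[dep] | available[dep]
                # Feedback context is a bonus if its source is computed.
                for src in feedback[node]:
                    if src in computed:
                        avail |= outputs[src] | available[src]
                available[node] = avail
                computed.add(node)
                remaining.discard(node)
                progressed = True
        if not progressed:
            break  # a genuine forward cycle remains
    return available

# writer -> reviewer (forward); reviewer -> writer (feedback, priority -1)
nodes = ["writer", "reviewer"]
edges = [("writer", "reviewer", 1), ("reviewer", "writer", -1)]
outputs = {"writer": {"draft"}, "reviewer": {"revision_notes"}}
print(compute_available(nodes, edges, outputs))
# writer gets set(), reviewer gets {'draft'}; a writer input of
# "revision_notes" is a revisit-only warning, not a hard error.
```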
+85 -28
@@ -56,6 +56,18 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
action="store_true",
help="Show detailed execution logs (steps, LLM calls, etc.)",
)
run_parser.add_argument(
"--tui",
action="store_true",
help="Launch interactive terminal dashboard",
)
run_parser.add_argument(
"--model",
"-m",
type=str,
default=None,
help="LLM model to use (any LiteLLM-compatible name)",
)
run_parser.set_defaults(func=cmd_run)
# info command
@@ -205,38 +217,83 @@ def cmd_run(args: argparse.Namespace) -> int:
print(f"Error reading input file: {e}", file=sys.stderr)
return 1
# Load and run agent
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=getattr(args, "model", "claude-haiku-4-5-20251001"),
)
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
# Run the agent (with TUI or standard)
if getattr(args, "tui", False):
from framework.tui.app import AdenTUI
# Auto-inject user_id if the agent expects it but it's not provided
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
if "user_id" in entry_input_keys and context.get("user_id") is None:
import os
async def run_with_tui():
try:
# Load runner inside the async loop to ensure strict loop affinity
# (only one load — avoids spawning duplicate MCP subprocesses)
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=True,
)
except Exception as e:
print(f"Error loading agent: {e}")
return
context["user_id"] = os.environ.get("USER", "default_user")
# Force setup inside the loop
if runner._agent_runtime is None:
runner._setup()
if not args.quiet:
info = runner.info()
print(f"Agent: {info.name}")
print(f"Goal: {info.goal_name}")
print(f"Steps: {info.node_count}")
print(f"Input: {json.dumps(context)}")
print()
print("=" * 60)
print("Executing agent...")
print("=" * 60)
print()
# Start runtime before TUI so it's ready for user input
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
# Run the agent
result = asyncio.run(runner.run(context))
app = AdenTUI(runner._agent_runtime)
# TUI handles execution via ChatRepl — user submits input,
# ChatRepl calls runtime.trigger_and_wait(). No auto-launch.
await app.run_async()
except Exception as e:
import traceback
traceback.print_exc()
print(f"TUI error: {e}")
await runner.cleanup_async()
return None
asyncio.run(run_with_tui())
print("TUI session ended.")
return 0
else:
# Standard execution — load runner here (not shared with TUI path)
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=False,
)
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
# Auto-inject user_id if the agent expects it but it's not provided
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
if "user_id" in entry_input_keys and context.get("user_id") is None:
import os
context["user_id"] = os.environ.get("USER", "default_user")
if not args.quiet:
info = runner.info()
print(f"Agent: {info.name}")
print(f"Goal: {info.goal_name}")
print(f"Steps: {info.node_count}")
print(f"Input: {json.dumps(context)}")
print()
print("=" * 60)
print("Executing agent...")
print("=" * 60)
print()
result = asyncio.run(runner.run(context))
# Format output
output = {
+9
@@ -362,6 +362,15 @@ class MCPClient:
# Call tool using persistent session
result = await self._session.call_tool(tool_name, arguments=arguments)
# Check for server-side errors (validation failures, tool exceptions, etc.)
if getattr(result, "isError", False):
error_text = ""
if result.content:
content_item = result.content[0]
if hasattr(content_item, "text"):
error_text = content_item.text
raise RuntimeError(f"MCP tool '{tool_name}' failed: {error_text}")
# Extract content
if result.content:
# MCP returns content as a list of content items
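A hedged usage sketch of the new behavior; the client's public method name is assumed here to be `call_tool`:

```python
import logging

logger = logging.getLogger(__name__)

async def search(client):
    try:
        # Server-side validation failures and tool exceptions now raise
        # instead of coming back as ordinary content.
        return await client.call_tool("web_search", {"query": "aden framework"})
    except RuntimeError as exc:
        logger.warning("MCP tool failed: %s", exc)
        return None
```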
+212 -21
@@ -28,6 +28,33 @@ logger = logging.getLogger(__name__)
# Configuration paths
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
def _ensure_credential_key_env() -> None:
"""Load HIVE_CREDENTIAL_KEY from shell config if not already in environment.
The setup-credentials skill writes the encryption key to ~/.zshrc or ~/.bashrc.
If the user hasn't sourced their config in the current shell, this reads it
directly so the runner (and any MCP subprocesses it spawns) can unlock the
encrypted credential store.
Only HIVE_CREDENTIAL_KEY is loaded this way; all other secrets (API keys, etc.)
come from the credential store itself.
"""
if os.environ.get("HIVE_CREDENTIAL_KEY"):
return
try:
from aden_tools.credentials.shell_config import check_env_var_in_shell_config
found, value = check_env_var_in_shell_config("HIVE_CREDENTIAL_KEY")
if found and value:
os.environ["HIVE_CREDENTIAL_KEY"] = value
logger.debug("Loaded HIVE_CREDENTIAL_KEY from shell config")
except ImportError:
pass
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
@@ -236,6 +263,15 @@ class AgentRunner:
result = await runner.run({"lead_id": "123"})
"""
@staticmethod
def _resolve_default_model() -> str:
"""Resolve the default model from ~/.hive/configuration.json."""
config = get_hive_config()
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
return "anthropic/claude-sonnet-4-20250514"
def __init__(
self,
agent_path: Path,
@@ -243,7 +279,8 @@ class AgentRunner:
goal: Goal,
mock_mode: bool = False,
storage_path: Path | None = None,
model: str = "cerebras/zai-glm-4.7",
model: str | None = None,
enable_tui: bool = False,
):
"""
Initialize the runner (use AgentRunner.load() instead).
@@ -254,14 +291,15 @@ class AgentRunner:
goal: Loaded Goal object
mock_mode: If True, use mock LLM responses
storage_path: Path for runtime storage (defaults to temp)
model: Model to use - any LiteLLM-compatible model name
(e.g., "claude-sonnet-4-20250514", "gpt-4o-mini", "gemini/gemini-pro")
model: Model to use (reads from agent config or ~/.hive/configuration.json if None)
enable_tui: If True, forces use of AgentRuntime with EventBus
"""
self.agent_path = agent_path
self.graph = graph
self.goal = goal
self.mock_mode = mock_mode
self.model = model
self.model = model or self._resolve_default_model()
self.enable_tui = enable_tui
# Set up storage
if storage_path:
@@ -275,6 +313,10 @@ class AgentRunner:
self._storage_path = default_storage
self._temp_dir = None
# Load HIVE_CREDENTIAL_KEY from shell config if not in env.
# Must happen before MCP subprocesses are spawned so they inherit it.
_ensure_credential_key_env()
# Initialize components
self._tool_registry = ToolRegistry()
self._runtime: Runtime | None = None
@@ -296,32 +338,121 @@ class AgentRunner:
if mcp_config_path.exists():
self._load_mcp_servers_from_config(mcp_config_path)
@staticmethod
def _import_agent_module(agent_path: Path):
"""Import an agent package from its directory path.
Tries package import first (works when exports/ is on sys.path,
which cli.py:_configure_paths() ensures). Falls back to direct
file import of agent.py via importlib.util.
"""
import importlib
package_name = agent_path.name
# Try importing as a package (works when exports/ is on sys.path)
try:
return importlib.import_module(package_name)
except ImportError:
pass
# Fallback: import agent.py directly via file path
import importlib.util
agent_py = agent_path / "agent.py"
if not agent_py.exists():
raise FileNotFoundError(
f"No importable agent found at {agent_path}. "
f"Expected a Python package with agent.py."
)
spec = importlib.util.spec_from_file_location(
f"{package_name}.agent",
agent_py,
submodule_search_locations=[str(agent_path)],
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
@classmethod
def load(
cls,
agent_path: str | Path,
mock_mode: bool = False,
storage_path: Path | None = None,
model: str = "cerebras/zai-glm-4.7",
model: str | None = None,
enable_tui: bool = False,
) -> "AgentRunner":
"""
Load an agent from an export folder.
Imports the agent's Python package and reads module-level variables
(goal, nodes, edges, etc.) to build a GraphSpec. Falls back to
agent.json if no Python module is found.
Args:
agent_path: Path to agent folder (containing agent.json)
agent_path: Path to agent folder
mock_mode: If True, use mock LLM responses
storage_path: Path for runtime storage (defaults to temp)
model: LLM model to use (any LiteLLM-compatible model name)
storage_path: Path for runtime storage (defaults to ~/.hive/storage/{name})
model: LLM model to use (reads from agent's default_config if None)
enable_tui: If True, forces use of AgentRuntime with EventBus
Returns:
AgentRunner instance ready to run
"""
agent_path = Path(agent_path)
# Load agent.json
# Try loading from Python module first (code-based agents)
agent_py = agent_path / "agent.py"
if agent_py.exists():
agent_module = cls._import_agent_module(agent_path)
goal = getattr(agent_module, "goal", None)
nodes = getattr(agent_module, "nodes", None)
edges = getattr(agent_module, "edges", None)
if goal is None or nodes is None or edges is None:
raise ValueError(
f"Agent at {agent_path} must define 'goal', 'nodes', and 'edges' "
f"in agent.py (or __init__.py)"
)
# Read model and max_tokens from agent's config if not explicitly provided
agent_config = getattr(agent_module, "default_config", None)
if model is None:
if agent_config and hasattr(agent_config, "model"):
model = agent_config.model
max_tokens = getattr(agent_config, "max_tokens", 1024) if agent_config else 1024
# Build GraphSpec from module-level variables
graph = GraphSpec(
id=f"{agent_path.name}-graph",
goal_id=goal.id,
version="1.0.0",
entry_node=getattr(agent_module, "entry_node", nodes[0].id),
entry_points=getattr(agent_module, "entry_points", {}),
terminal_nodes=getattr(agent_module, "terminal_nodes", []),
pause_nodes=getattr(agent_module, "pause_nodes", []),
nodes=nodes,
edges=edges,
max_tokens=max_tokens,
)
return cls(
agent_path=agent_path,
graph=graph,
goal=goal,
mock_mode=mock_mode,
storage_path=storage_path,
model=model,
enable_tui=enable_tui,
)
# Fallback: load from agent.json (legacy JSON-based agents)
agent_json_path = agent_path / "agent.json"
if not agent_json_path.exists():
raise FileNotFoundError(f"agent.json not found in {agent_path}")
raise FileNotFoundError(f"No agent.py or agent.json found in {agent_path}")
with open(agent_json_path) as f:
graph, goal = load_agent_export(f.read())
@@ -333,6 +464,7 @@ class AgentRunner:
mock_mode=mock_mode,
storage_path=storage_path,
model=model,
enable_tui=enable_tui,
)
def register_tool(
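A hedged usage sketch of the new code-based loading path; the export folder path and the import location of `AgentRunner` are illustrative:

```python
import asyncio

# model=None triggers the agent default_config / ~/.hive/configuration.json
# fallback described above.
runner = AgentRunner.load("exports/deep_research_agent", model=None)
result = asyncio.run(runner.run({"user_id": "demo"}))
```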
@@ -471,16 +603,25 @@ class AgentRunner:
api_key_env = self._get_api_key_env_var(self.model)
if api_key_env and os.environ.get(api_key_env):
self._llm = LiteLLMProvider(model=self.model)
elif api_key_env:
print(f"Warning: {api_key_env} not set. LLM calls will fail.")
print(f"Set it with: export {api_key_env}=your-api-key")
else:
# Fall back to credential store
api_key = self._get_api_key_from_credential_store()
if api_key:
self._llm = LiteLLMProvider(model=self.model, api_key=api_key)
# Set env var so downstream code (e.g. cleanup LLM in
# node._extract_json) can also find it
if api_key_env:
os.environ[api_key_env] = api_key
elif api_key_env:
print(f"Warning: {api_key_env} not set. LLM calls will fail.")
print(f"Set it with: export {api_key_env}=your-api-key")
# Get tools for executor/runtime
tools = list(self._tool_registry.get_tools().values())
tool_executor = self._tool_registry.get_executor()
if self._uses_async_entry_points:
# Multi-entry-point mode: use AgentRuntime
if self._uses_async_entry_points or self.enable_tui:
# Multi-entry-point mode or TUI mode: use AgentRuntime
self._setup_agent_runtime(tools, tool_executor)
else:
# Single-entry-point mode: use legacy GraphExecutor
@@ -518,6 +659,33 @@ class AgentRunner:
# Default: assume OpenAI-compatible
return "OPENAI_API_KEY"
def _get_api_key_from_credential_store(self) -> str | None:
"""Get the LLM API key from the encrypted credential store.
Maps model name to credential store ID (e.g. "anthropic/..." -> "anthropic")
and retrieves the key via CredentialStore.get().
"""
if not os.environ.get("HIVE_CREDENTIAL_KEY"):
return None
# Map model prefix to credential store ID
model_lower = self.model.lower()
cred_id = None
if model_lower.startswith("anthropic/") or model_lower.startswith("claude"):
cred_id = "anthropic"
# Add more mappings as providers are added to LLM_CREDENTIALS
if cred_id is None:
return None
try:
from framework.credentials import CredentialStore
store = CredentialStore.with_encrypted_storage()
return store.get(cred_id)
except Exception:
return None
def _setup_legacy_executor(self, tools: list, tool_executor: Callable | None) -> None:
"""Set up legacy single-entry-point execution using GraphExecutor."""
# Create runtime
@@ -549,6 +717,19 @@ class AgentRunner:
)
entry_points.append(ep)
# If TUI enabled but no entry points (single-entry agent), create default
if not entry_points and self.enable_tui and self.graph.entry_node:
logger.info("Creating default entry point for TUI")
entry_points.append(
EntryPointSpec(
id="default",
name="Default",
entry_node=self.graph.entry_node,
trigger_type="manual",
isolation_level="shared",
)
)
# Create AgentRuntime with all entry points
self._agent_runtime = create_agent_runtime(
graph=self.graph,
@@ -599,7 +780,7 @@ class AgentRunner:
error=error_msg,
)
if self._uses_async_entry_points:
if self._uses_async_entry_points or self.enable_tui:
# Multi-entry-point mode: use AgentRuntime
return await self._run_with_agent_runtime(
input_data=input_data or {},
@@ -891,15 +1072,25 @@ class AgentRunner:
EnvVarStorage,
)
# Build env mapping for fallback
# Build env mapping for credential lookup
env_mapping = {
(spec.credential_id or name): spec.env_var
for name, spec in CREDENTIAL_SPECS.items()
}
storage = CompositeStorage(
primary=EncryptedFileStorage(),
fallbacks=[EnvVarStorage(env_mapping=env_mapping)],
)
# Only use EncryptedFileStorage if the encryption key is configured;
# otherwise just check env vars (avoids generating a throwaway key)
storages: list = [EnvVarStorage(env_mapping=env_mapping)]
if os.environ.get("HIVE_CREDENTIAL_KEY"):
storages.insert(0, EncryptedFileStorage())
if len(storages) == 1:
storage = storages[0]
else:
storage = CompositeStorage(
primary=storages[0],
fallbacks=storages[1:],
)
store = CredentialStore(storage=storage)
# Build reverse mappings
+21 -2
@@ -33,6 +33,11 @@ class ToolRegistry:
4. Manually registered tools
"""
# Framework-internal context keys injected into tool calls.
# Stripped from LLM-facing schemas (the LLM doesn't know these values)
# and auto-injected at call time for tools that accept them.
CONTEXT_PARAMS = frozenset({"workspace_id", "agent_id", "session_id"})
def __init__(self):
self._tools: dict[str, RegisteredTool] = {}
self._mcp_clients: list[Any] = [] # List of MCPClient instances
@@ -275,7 +280,16 @@ class ToolRegistry:
return
base_dir = config_path.parent
for server_config in config.get("servers", []):
# Support both formats:
# {"servers": [{"name": "x", ...}]} (list format)
# {"server-name": {"transport": ...}, ...} (dict format)
server_list = config.get("servers", [])
if not server_list and "servers" not in config:
# Treat top-level keys as server names
server_list = [{"name": name, **cfg} for name, cfg in config.items()]
for server_config in server_list:
cwd = server_config.get("cwd")
if cwd and not Path(cwd).is_absolute():
server_config["cwd"] = str((base_dir / cwd).resolve())
@@ -333,7 +347,7 @@ class ToolRegistry:
# Register each tool
count = 0
for mcp_tool in client.list_tools():
# Convert MCP tool to framework Tool
# Convert MCP tool to framework Tool (strips context params from LLM schema)
tool = self._convert_mcp_tool_to_framework_tool(mcp_tool)
# Create executor that calls the MCP server
@@ -395,6 +409,11 @@ class ToolRegistry:
properties = input_schema.get("properties", {})
required = input_schema.get("required", [])
# Strip framework-internal context params from LLM-facing schema.
# The LLM can't know these values; they're auto-injected at call time.
properties = {k: v for k, v in properties.items() if k not in self.CONTEXT_PARAMS}
required = [r for r in required if r not in self.CONTEXT_PARAMS]
# Convert to framework Tool format
tool = Tool(
name=mcp_tool.name,
+19
@@ -296,6 +296,25 @@ class AgentRuntime:
raise ValueError(f"Entry point '{entry_point_id}' not found")
return await stream.wait_for_completion(exec_id, timeout)
async def inject_input(self, node_id: str, content: str) -> bool:
"""Inject user input into a running client-facing node.
Routes input to the EventLoopNode identified by ``node_id``
across all active streams. Used by the TUI ChatRepl to deliver
user responses during client-facing node execution.
Args:
node_id: The node currently waiting for input
content: The user's input text
Returns:
True if input was delivered, False if no matching node found
"""
for stream in self._streams.values():
if await stream.inject_input(node_id, content):
return True
return False
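A hedged sketch of what the TUI ChatRepl does with this API when a client-facing node is waiting; the node id is illustrative:

```python
delivered = await runtime.inject_input("collect-preferences", "Option B, please")
if not delivered:
    print("no client-facing node is currently waiting for input")
```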
async def get_goal_progress(self) -> dict[str, Any]:
"""
Evaluate goal progress across all streams.
+28 -1
@@ -153,6 +153,7 @@ class ExecutionStream:
# Execution tracking
self._active_executions: dict[str, ExecutionContext] = {}
self._execution_tasks: dict[str, asyncio.Task] = {}
self._active_executors: dict[str, GraphExecutor] = {}
self._execution_results: OrderedDict[str, ExecutionResult] = OrderedDict()
self._execution_result_times: dict[str, float] = {}
self._completion_events: dict[str, asyncio.Event] = {}
@@ -237,6 +238,21 @@ class ExecutionStream:
)
)
async def inject_input(self, node_id: str, content: str) -> bool:
"""Inject user input into a running client-facing EventLoopNode.
Searches active executors for a node matching ``node_id`` and calls
its ``inject_event()`` method to unblock ``_await_user_input()``.
Returns True if input was delivered, False otherwise.
"""
for executor in self._active_executors.values():
node = executor.node_registry.get(node_id)
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(content)
return True
return False
async def execute(
self,
input_data: dict[str, Any],
@@ -314,13 +330,21 @@ class ExecutionStream:
# Create runtime adapter for this execution
runtime_adapter = StreamRuntimeAdapter(self._runtime, execution_id)
# Create executor for this execution
# Create executor for this execution.
# Scope storage by execution_id so each execution gets
# fresh conversations and spillover directories.
exec_storage = self._storage.base_path / "sessions" / execution_id
executor = GraphExecutor(
runtime=runtime_adapter,
llm=self._llm,
tools=self._tools,
tool_executor=self._tool_executor,
event_bus=self._event_bus,
stream_id=self.stream_id,
storage_path=exec_storage,
)
# Track executor so inject_input() can reach EventLoopNode instances
self._active_executors[execution_id] = executor
# Create modified graph with entry point
# We need to override the entry_node to use our entry point
@@ -334,6 +358,9 @@ class ExecutionStream:
session_state=ctx.session_state,
)
# Clean up executor reference
self._active_executors.pop(execution_id, None)
# Store result with retention
self._record_execution_result(execution_id, result)
+518
@@ -0,0 +1,518 @@
import logging
import time
from textual.app import App, ComposeResult
from textual.binding import Binding
from textual.containers import Container, Horizontal, Vertical
from textual.widgets import Footer, Label
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import AgentEvent, EventType
from framework.tui.widgets.chat_repl import ChatRepl
from framework.tui.widgets.graph_view import GraphOverview
from framework.tui.widgets.log_pane import LogPane
class StatusBar(Container):
"""Live status bar showing agent execution state."""
DEFAULT_CSS = """
StatusBar {
dock: top;
height: 1;
background: $panel;
color: $text;
padding: 0 1;
}
StatusBar > Label {
width: 100%;
}
"""
def __init__(self, graph_id: str = ""):
super().__init__()
self._graph_id = graph_id
self._state = "idle"
self._active_node: str | None = None
self._node_detail: str = ""
self._start_time: float | None = None
self._final_elapsed: float | None = None
def compose(self) -> ComposeResult:
yield Label(id="status-content")
def on_mount(self) -> None:
self._refresh()
self.set_interval(1.0, self._refresh)
def _format_elapsed(self, seconds: float) -> str:
total = int(seconds)
hours, remainder = divmod(total, 3600)
mins, secs = divmod(remainder, 60)
if hours:
return f"{hours}:{mins:02d}:{secs:02d}"
return f"{mins}:{secs:02d}"
def _refresh(self) -> None:
parts: list[str] = []
if self._graph_id:
parts.append(f"[bold]{self._graph_id}[/bold]")
if self._state == "idle":
parts.append("[dim]○ idle[/dim]")
elif self._state == "running":
parts.append("[bold green]● running[/bold green]")
elif self._state == "completed":
parts.append("[green]✓ done[/green]")
elif self._state == "failed":
parts.append("[bold red]✗ failed[/bold red]")
if self._active_node:
node_str = f"[cyan]{self._active_node}[/cyan]"
if self._node_detail:
node_str += f" [dim]({self._node_detail})[/dim]"
parts.append(node_str)
if self._state == "running" and self._start_time:
parts.append(f"[dim]{self._format_elapsed(time.time() - self._start_time)}[/dim]")
elif self._final_elapsed is not None:
parts.append(f"[dim]{self._format_elapsed(self._final_elapsed)}[/dim]")
try:
label = self.query_one("#status-content", Label)
label.update("".join(parts))
except Exception:
pass
def set_graph_id(self, graph_id: str) -> None:
self._graph_id = graph_id
self._refresh()
def set_running(self, entry_node: str = "") -> None:
self._state = "running"
self._active_node = entry_node or None
self._node_detail = ""
self._start_time = time.time()
self._final_elapsed = None
self._refresh()
def set_completed(self) -> None:
self._state = "completed"
if self._start_time:
self._final_elapsed = time.time() - self._start_time
self._active_node = None
self._node_detail = ""
self._start_time = None
self._refresh()
def set_failed(self, error: str = "") -> None:
self._state = "failed"
if self._start_time:
self._final_elapsed = time.time() - self._start_time
self._node_detail = error[:40] if error else ""
self._start_time = None
self._refresh()
def set_active_node(self, node_id: str, detail: str = "") -> None:
self._active_node = node_id
self._node_detail = detail
self._refresh()
def set_node_detail(self, detail: str) -> None:
self._node_detail = detail
self._refresh()
class AdenTUI(App):
TITLE = "Aden TUI Dashboard"
COMMAND_PALETTE_BINDING = "ctrl+o"
CSS = """
Screen {
layout: vertical;
background: $surface;
}
#left-pane {
width: 60%;
height: 100%;
layout: vertical;
background: $surface;
}
GraphOverview {
height: 40%;
background: $panel;
padding: 0;
}
LogPane {
height: 60%;
background: $surface;
padding: 0;
margin-bottom: 1;
}
ChatRepl {
width: 40%;
height: 100%;
background: $panel;
border-left: tall $primary;
padding: 0;
}
#chat-history {
height: 1fr;
width: 100%;
background: $surface;
border: none;
scrollbar-background: $panel;
scrollbar-color: $primary;
}
RichLog {
background: $surface;
border: none;
scrollbar-background: $panel;
scrollbar-color: $primary;
}
Input {
background: $surface;
border: tall $primary;
margin-top: 1;
}
Input:focus {
border: tall $accent;
}
StatusBar {
background: $panel;
color: $text;
height: 1;
padding: 0 1;
}
Footer {
background: $panel;
color: $text-muted;
}
"""
BINDINGS = [
Binding("q", "quit", "Quit"),
Binding("ctrl+s", "screenshot", "Screenshot (SVG)", show=True, priority=True),
Binding("tab", "focus_next", "Next Panel", show=True),
Binding("shift+tab", "focus_previous", "Previous Panel", show=False),
]
def __init__(self, runtime: AgentRuntime):
super().__init__()
self.runtime = runtime
self.log_pane = LogPane()
self.graph_view = GraphOverview(runtime)
self.chat_repl = ChatRepl(runtime)
self.status_bar = StatusBar(graph_id=runtime.graph.id)
self.is_ready = False
def compose(self) -> ComposeResult:
yield self.status_bar
yield Horizontal(
Vertical(
self.log_pane,
self.graph_view,
id="left-pane",
),
self.chat_repl,
)
yield Footer()
async def on_mount(self) -> None:
"""Called when app starts."""
self.title = "Aden TUI Dashboard"
# Add logging setup
self._setup_logging_queue()
# Set ready immediately so _poll_logs can process messages
self.is_ready = True
# Add event subscription with delay to ensure TUI is fully initialized
self.call_later(self._init_runtime_connection)
# Delay initial log messages until layout is fully rendered
def write_initial_logs():
logging.info("TUI Dashboard initialized successfully")
logging.info("Waiting for agent execution to start...")
# Wait for layout to be fully rendered before writing logs
self.set_timer(0.2, write_initial_logs)
def _setup_logging_queue(self) -> None:
"""Setup a thread-safe queue for logs."""
try:
import queue
from logging.handlers import QueueHandler
self.log_queue = queue.Queue()
self.queue_handler = QueueHandler(self.log_queue)
self.queue_handler.setLevel(logging.INFO)
# Get root logger
root_logger = logging.getLogger()
# Remove ALL existing handlers to prevent stdout output
# This is critical - StreamHandlers cause text to appear in header
for handler in root_logger.handlers[:]:
root_logger.removeHandler(handler)
# Add ONLY our queue handler
root_logger.addHandler(self.queue_handler)
root_logger.setLevel(logging.INFO)
# Suppress LiteLLM logging completely
litellm_logger = logging.getLogger("LiteLLM")
litellm_logger.setLevel(logging.CRITICAL) # Only show critical errors
litellm_logger.propagate = False # Don't propagate to root logger
# Start polling
self.set_interval(0.1, self._poll_logs)
except Exception:
pass
def _poll_logs(self) -> None:
"""Poll the log queue and update UI."""
if not self.is_ready:
return
try:
while not self.log_queue.empty():
record = self.log_queue.get_nowait()
# Filter out framework/library logs
if record.name.startswith(("textual", "LiteLLM", "litellm")):
continue
self.log_pane.write_python_log(record)
except Exception:
pass
_EVENT_TYPES = [
EventType.LLM_TEXT_DELTA,
EventType.CLIENT_OUTPUT_DELTA,
EventType.TOOL_CALL_STARTED,
EventType.TOOL_CALL_COMPLETED,
EventType.EXECUTION_STARTED,
EventType.EXECUTION_COMPLETED,
EventType.EXECUTION_FAILED,
EventType.NODE_LOOP_STARTED,
EventType.NODE_LOOP_ITERATION,
EventType.NODE_LOOP_COMPLETED,
EventType.CLIENT_INPUT_REQUESTED,
EventType.NODE_STALLED,
EventType.GOAL_PROGRESS,
EventType.GOAL_ACHIEVED,
EventType.CONSTRAINT_VIOLATION,
EventType.STATE_CHANGED,
EventType.NODE_INPUT_BLOCKED,
]
_LOG_PANE_EVENTS = frozenset(_EVENT_TYPES) - {
EventType.LLM_TEXT_DELTA,
EventType.CLIENT_OUTPUT_DELTA,
}
async def _init_runtime_connection(self) -> None:
"""Subscribe to runtime events with an async handler."""
try:
self._subscription_id = self.runtime.subscribe_to_events(
event_types=self._EVENT_TYPES,
handler=self._handle_event,
)
except Exception:
pass
async def _handle_event(self, event: AgentEvent) -> None:
"""Called from the agent thread — bridge to Textual's main thread."""
try:
self.call_from_thread(self._route_event, event)
except Exception:
pass
def _route_event(self, event: AgentEvent) -> None:
"""Route incoming events to widgets. Runs on Textual's main thread."""
if not self.is_ready:
return
try:
et = event.type
# --- Chat REPL events ---
if et in (EventType.LLM_TEXT_DELTA, EventType.CLIENT_OUTPUT_DELTA):
self.chat_repl.handle_text_delta(
event.data.get("content", ""),
event.data.get("snapshot", ""),
)
elif et == EventType.TOOL_CALL_STARTED:
self.chat_repl.handle_tool_started(
event.data.get("tool_name", "unknown"),
event.data.get("tool_input", {}),
)
elif et == EventType.TOOL_CALL_COMPLETED:
self.chat_repl.handle_tool_completed(
event.data.get("tool_name", "unknown"),
event.data.get("result", ""),
event.data.get("is_error", False),
)
elif et == EventType.EXECUTION_COMPLETED:
self.chat_repl.handle_execution_completed(event.data.get("output", {}))
elif et == EventType.EXECUTION_FAILED:
self.chat_repl.handle_execution_failed(event.data.get("error", "Unknown error"))
elif et == EventType.CLIENT_INPUT_REQUESTED:
self.chat_repl.handle_input_requested(
event.node_id or event.data.get("node_id", ""),
)
# --- Graph view events ---
if et in (
EventType.EXECUTION_STARTED,
EventType.EXECUTION_COMPLETED,
EventType.EXECUTION_FAILED,
):
self.graph_view.update_execution(event)
if et == EventType.NODE_LOOP_STARTED:
self.graph_view.handle_node_loop_started(event.node_id or "")
elif et == EventType.NODE_LOOP_ITERATION:
self.graph_view.handle_node_loop_iteration(
event.node_id or "",
event.data.get("iteration", 0),
)
elif et == EventType.NODE_LOOP_COMPLETED:
self.graph_view.handle_node_loop_completed(event.node_id or "")
elif et == EventType.NODE_STALLED:
self.graph_view.handle_stalled(
event.node_id or "",
event.data.get("reason", ""),
)
if et == EventType.TOOL_CALL_STARTED:
self.graph_view.handle_tool_call(
event.node_id or "",
event.data.get("tool_name", "unknown"),
started=True,
)
elif et == EventType.TOOL_CALL_COMPLETED:
self.graph_view.handle_tool_call(
event.node_id or "",
event.data.get("tool_name", "unknown"),
started=False,
)
# --- Status bar events ---
if et == EventType.EXECUTION_STARTED:
entry_node = event.data.get("entry_node") or (
self.runtime.graph.entry_node if self.runtime else ""
)
self.status_bar.set_running(entry_node)
elif et == EventType.EXECUTION_COMPLETED:
self.status_bar.set_completed()
elif et == EventType.EXECUTION_FAILED:
self.status_bar.set_failed(event.data.get("error", ""))
elif et == EventType.NODE_LOOP_STARTED:
self.status_bar.set_active_node(event.node_id or "", "thinking...")
elif et == EventType.NODE_LOOP_ITERATION:
self.status_bar.set_node_detail(f"step {event.data.get('iteration', '?')}")
elif et == EventType.TOOL_CALL_STARTED:
self.status_bar.set_node_detail(f"{event.data.get('tool_name', '')}...")
elif et == EventType.TOOL_CALL_COMPLETED:
self.status_bar.set_node_detail("thinking...")
elif et == EventType.NODE_STALLED:
self.status_bar.set_node_detail(f"stalled: {event.data.get('reason', '')}")
# --- Log pane events ---
if et in self._LOG_PANE_EVENTS:
self.log_pane.write_event(event)
except Exception:
pass
def save_screenshot(self, filename: str | None = None) -> str:
"""Save a screenshot of the current screen as SVG (viewable in browsers).
Args:
filename: Optional filename for the screenshot. If None, generates timestamp-based name.
Returns:
Path to the saved SVG file.
"""
from datetime import datetime
from pathlib import Path
# Create screenshots directory
screenshots_dir = Path("screenshots")
screenshots_dir.mkdir(exist_ok=True)
# Generate filename if not provided
if filename is None:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"tui_screenshot_{timestamp}.svg"
# Ensure .svg extension
if not filename.endswith(".svg"):
filename += ".svg"
# Full path
filepath = screenshots_dir / filename
# Temporarily hide borders for cleaner screenshot
chat_widget = self.query_one(ChatRepl)
original_chat_border = chat_widget.styles.border_left
chat_widget.styles.border_left = ("none", "transparent")
# Hide all Input widget borders
input_widgets = self.query("Input")
original_input_borders = []
for input_widget in input_widgets:
original_input_borders.append(input_widget.styles.border)
input_widget.styles.border = ("none", "transparent")
try:
# Get SVG data from Textual and save it
svg_data = self.export_screenshot()
filepath.write_text(svg_data, encoding="utf-8")
finally:
# Restore the original borders
chat_widget.styles.border_left = original_chat_border
for i, input_widget in enumerate(input_widgets):
input_widget.styles.border = original_input_borders[i]
return str(filepath)
def action_screenshot(self) -> None:
"""Take a screenshot (bound to Ctrl+S)."""
try:
filepath = self.save_screenshot()
self.notify(
f"Screenshot saved: {filepath} (SVG - open in browser)",
severity="information",
timeout=5,
)
except Exception as e:
self.notify(f"Screenshot failed: {e}", severity="error", timeout=5)
async def on_unmount(self) -> None:
"""Cleanup on app shutdown."""
self.is_ready = False
try:
if hasattr(self, "_subscription_id"):
self.runtime.unsubscribe_from_events(self._subscription_id)
except Exception:
pass
try:
if hasattr(self, "queue_handler"):
logging.getLogger().removeHandler(self.queue_handler)
except Exception:
pass
@@ -0,0 +1,303 @@
"""
Chat / REPL Widget - Uses RichLog for append-only, selection-safe display.
Streaming display approach:
- The processing-indicator Label is used as a live status bar during streaming
(Label.update() replaces text in-place, unlike RichLog which is append-only).
- On EXECUTION_COMPLETED, the final output is written to RichLog as permanent history.
- Tool events are written directly to RichLog as discrete status lines.
Client-facing input:
- When a client_facing=True EventLoopNode emits CLIENT_INPUT_REQUESTED, the
ChatRepl transitions to "waiting for input" state: input is re-enabled and
subsequent submissions are routed to runtime.inject_input() instead of
starting a new execution.
"""
import asyncio
import threading
from typing import Any
from textual.app import ComposeResult
from textual.containers import Vertical
from textual.widgets import Input, Label, RichLog
from framework.runtime.agent_runtime import AgentRuntime
class ChatRepl(Vertical):
"""Widget for interactive chat/REPL."""
DEFAULT_CSS = """
ChatRepl {
width: 100%;
height: 100%;
layout: vertical;
}
ChatRepl > RichLog {
width: 100%;
height: 1fr;
background: $surface;
border: none;
scrollbar-background: $panel;
scrollbar-color: $primary;
}
ChatRepl > #processing-indicator {
width: 100%;
height: 1;
background: $primary 20%;
color: $text;
text-style: bold;
display: none;
}
ChatRepl > Input {
width: 100%;
height: auto;
dock: bottom;
background: $surface;
border: tall $primary;
margin-top: 1;
}
ChatRepl > Input:focus {
border: tall $accent;
}
"""
def __init__(self, runtime: AgentRuntime):
super().__init__()
self.runtime = runtime
self._current_exec_id: str | None = None
self._streaming_snapshot: str = ""
self._waiting_for_input: bool = False
self._input_node_id: str | None = None
# Dedicated event loop for agent execution.
# Keeps blocking runtime code (LLM calls, MCP tools) off
# the Textual event loop so the UI stays responsive.
self._agent_loop = asyncio.new_event_loop()
self._agent_thread = threading.Thread(
target=self._agent_loop.run_forever,
daemon=True,
name="agent-execution",
)
self._agent_thread.start()
def compose(self) -> ComposeResult:
yield RichLog(id="chat-history", highlight=True, markup=True, auto_scroll=False, wrap=True)
yield Label("Agent is processing...", id="processing-indicator")
yield Input(placeholder="Enter input for agent...", id="chat-input")
def _write_history(self, content: str) -> None:
"""Write to chat history, only auto-scrolling if user is at the bottom."""
history = self.query_one("#chat-history", RichLog)
was_at_bottom = history.is_vertical_scroll_end
history.write(content)
if was_at_bottom:
history.scroll_end(animate=False)
def on_mount(self) -> None:
"""Add welcome message when widget mounts."""
history = self.query_one("#chat-history", RichLog)
history.write("[bold cyan]Chat REPL Ready[/bold cyan] — Type your input below\n")
async def on_input_submitted(self, message: Input.Submitted) -> None:
"""Handle input submission — either start new execution or inject input."""
user_input = message.value.strip()
if not user_input:
return
# Client-facing input: route to the waiting node
if self._waiting_for_input and self._input_node_id:
self._write_history(f"[bold green]You:[/bold green] {user_input}")
message.input.value = ""
# Disable input while agent processes the response
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = True
chat_input.placeholder = "Enter input for agent..."
self._waiting_for_input = False
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Thinking...")
node_id = self._input_node_id
self._input_node_id = None
try:
future = asyncio.run_coroutine_threadsafe(
self.runtime.inject_input(node_id, user_input),
self._agent_loop,
)
await asyncio.wrap_future(future)
except Exception as e:
self._write_history(f"[bold red]Error delivering input:[/bold red] {e}")
return
# Double-submit guard: reject input while an execution is in-flight
if self._current_exec_id is not None:
self._write_history("[dim]Agent is still running — please wait.[/dim]")
return
indicator = self.query_one("#processing-indicator", Label)
# Append user message and clear input
self._write_history(f"[bold green]You:[/bold green] {user_input}")
message.input.value = ""
try:
# Get entry point
entry_points = self.runtime.get_entry_points()
if not entry_points:
self._write_history("[bold red]Error:[/bold red] No entry points")
return
# Determine the input key from the entry node
entry_point = entry_points[0]
entry_node = self.runtime.graph.get_node(entry_point.entry_node)
if entry_node and entry_node.input_keys:
input_key = entry_node.input_keys[0]
else:
input_key = "input"
# Reset streaming state
self._streaming_snapshot = ""
# Show processing indicator
indicator.update("Thinking...")
indicator.display = True
# Disable input while the agent is working
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = True
# Submit execution to the dedicated agent loop so blocking
# runtime code (LLM, MCP tools) never touches Textual's loop.
# trigger() returns immediately with an exec_id; the heavy
# execution task runs entirely on the agent thread.
future = asyncio.run_coroutine_threadsafe(
self.runtime.trigger(
entry_point_id=entry_point.id,
input_data={input_key: user_input},
),
self._agent_loop,
)
# wrap_future lets us await without blocking Textual's loop
self._current_exec_id = await asyncio.wrap_future(future)
except Exception as e:
indicator.display = False
self._current_exec_id = None
# Re-enable input on error
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = False
self._write_history(f"[bold red]Error:[/bold red] {e}")
# -- Event handlers called by app.py _handle_event --
def handle_text_delta(self, content: str, snapshot: str) -> None:
"""Handle a streaming text token from the LLM."""
self._streaming_snapshot = snapshot
# Show a truncated live preview in the indicator label
indicator = self.query_one("#processing-indicator", Label)
preview = snapshot[-80:] if len(snapshot) > 80 else snapshot
# Replace newlines for single-line display
preview = preview.replace("\n", " ")
indicator.update(
f"Thinking: ...{preview}" if len(snapshot) > 80 else f"Thinking: {preview}"
)
def handle_tool_started(self, tool_name: str, tool_input: dict[str, Any]) -> None:
"""Handle a tool call starting."""
# Update indicator to show tool activity
indicator = self.query_one("#processing-indicator", Label)
indicator.update(f"Using tool: {tool_name}...")
# Write a discrete status line to history
self._write_history(f"[dim]Tool: {tool_name}[/dim]")
def handle_tool_completed(self, tool_name: str, result: str, is_error: bool) -> None:
"""Handle a tool call completing."""
result_str = str(result)
preview = result_str[:200] + "..." if len(result_str) > 200 else result_str
preview = preview.replace("\n", " ")
if is_error:
self._write_history(f"[dim red]Tool {tool_name} error: {preview}[/dim red]")
else:
self._write_history(f"[dim]Tool {tool_name} result: {preview}[/dim]")
# Restore thinking indicator
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Thinking...")
def handle_execution_completed(self, output: dict[str, Any]) -> None:
"""Handle execution finishing successfully."""
indicator = self.query_one("#processing-indicator", Label)
indicator.display = False
# Write the final streaming snapshot to permanent history (if any)
if self._streaming_snapshot:
self._write_history(f"[bold blue]Agent:[/bold blue] {self._streaming_snapshot}")
else:
output_str = str(output.get("output_string", output))
self._write_history(f"[bold blue]Agent:[/bold blue] {output_str}")
self._write_history("") # separator
self._current_exec_id = None
self._streaming_snapshot = ""
self._waiting_for_input = False
self._input_node_id = None
# Re-enable input
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = False
chat_input.placeholder = "Enter input for agent..."
chat_input.focus()
def handle_execution_failed(self, error: str) -> None:
"""Handle execution failing."""
indicator = self.query_one("#processing-indicator", Label)
indicator.display = False
self._write_history(f"[bold red]Error:[/bold red] {error}")
self._write_history("") # separator
self._current_exec_id = None
self._streaming_snapshot = ""
self._waiting_for_input = False
self._input_node_id = None
# Re-enable input
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = False
chat_input.placeholder = "Enter input for agent..."
chat_input.focus()
def handle_input_requested(self, node_id: str) -> None:
"""Handle a client-facing node requesting user input.
Transitions to 'waiting for input' state: flushes the current
streaming snapshot to history, re-enables the input widget,
and sets a flag so the next submission routes to inject_input().
"""
# Flush accumulated streaming text as agent output
if self._streaming_snapshot:
self._write_history(f"[bold blue]Agent:[/bold blue] {self._streaming_snapshot}")
self._streaming_snapshot = ""
self._waiting_for_input = True
self._input_node_id = node_id or None
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Waiting for your input...")
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = False
chat_input.placeholder = "Type your response..."
chat_input.focus()
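The dedicated agent loop is the load-bearing concurrency choice in this widget: blocking runtime work runs on a second event loop in a daemon thread, and the UI awaits the result via `wrap_future`. A minimal standalone sketch of the same pattern, with illustrative names (`slow_agent_call` and `ui_side` are not framework APIs):

```python
import asyncio
import threading

# Dedicated loop on a daemon thread, mirroring ChatRepl._agent_loop.
agent_loop = asyncio.new_event_loop()
threading.Thread(target=agent_loop.run_forever, daemon=True).start()

async def slow_agent_call() -> str:
    await asyncio.sleep(1)  # stand-in for a blocking LLM / MCP tool call
    return "done"

async def ui_side() -> None:
    # Schedule on the agent loop; returns a concurrent.futures.Future.
    future = asyncio.run_coroutine_threadsafe(slow_agent_call(), agent_loop)
    # wrap_future bridges it back so this loop can await without blocking.
    print(await asyncio.wrap_future(future))

asyncio.run(ui_side())
```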
@@ -0,0 +1,194 @@
"""
Graph/Tree Overview Widget - Displays real agent graph structure.
"""
from textual.app import ComposeResult
from textual.containers import Vertical
from textual.widgets import RichLog
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import EventType
class GraphOverview(Vertical):
"""Widget to display Agent execution graph/tree with real data."""
DEFAULT_CSS = """
GraphOverview {
width: 100%;
height: 100%;
background: $panel;
}
GraphOverview > RichLog {
width: 100%;
height: 100%;
background: $panel;
border: none;
scrollbar-background: $surface;
scrollbar-color: $primary;
}
"""
def __init__(self, runtime: AgentRuntime):
super().__init__()
self.runtime = runtime
self.active_node: str | None = None
self.execution_path: list[str] = []
# Per-node status strings shown next to the node in the graph display.
# e.g. {"planner": "thinking...", "searcher": "web_search..."}
self._node_status: dict[str, str] = {}
def compose(self) -> ComposeResult:
# Use RichLog for formatted output
yield RichLog(id="graph-display", highlight=True, markup=True)
def on_mount(self) -> None:
"""Display initial graph structure."""
self._display_graph()
def _topo_order(self) -> list[str]:
"""BFS from entry_node following edges."""
graph = self.runtime.graph
visited: list[str] = []
seen: set[str] = set()
queue = [graph.entry_node]
while queue:
nid = queue.pop(0)
if nid in seen:
continue
seen.add(nid)
visited.append(nid)
for edge in graph.get_outgoing_edges(nid):
if edge.target not in seen:
queue.append(edge.target)
# Append orphan nodes not reachable from entry
for node in graph.nodes:
if node.id not in seen:
visited.append(node.id)
return visited
def _render_node_line(self, node_id: str) -> str:
"""Render a single node with status symbol and optional status text."""
graph = self.runtime.graph
is_terminal = node_id in (graph.terminal_nodes or [])
is_active = node_id == self.active_node
is_done = node_id in self.execution_path and not is_active
status = self._node_status.get(node_id, "")
if is_active:
sym = "[bold green]●[/bold green]"
elif is_done:
sym = "[dim]✓[/dim]"
elif is_terminal:
sym = "[yellow]■[/yellow]"
else:
sym = ""
if is_active:
name = f"[bold green]{node_id}[/bold green]"
elif is_done:
name = f"[dim]{node_id}[/dim]"
else:
name = node_id
suffix = f" [italic]{status}[/italic]" if status else ""
return f" {sym} {name}{suffix}"
def _render_edges(self, node_id: str) -> list[str]:
"""Render edge connectors from this node to its targets."""
edges = self.runtime.graph.get_outgoing_edges(node_id)
if not edges:
return []
        if len(edges) == 1:
            # Single outgoing edge: a plain vertical connector
            return ["   │", "   ▼"]
# Fan-out: show branches
lines: list[str] = []
for i, edge in enumerate(edges):
connector = "" if i == len(edges) - 1 else ""
cond = ""
if edge.condition.value not in ("always", "on_success"):
cond = f" [dim]({edge.condition.value})[/dim]"
lines.append(f" {connector}──▶ {edge.target}{cond}")
return lines
def _display_graph(self) -> None:
"""Display the graph as an ASCII DAG with edge connectors."""
display = self.query_one("#graph-display", RichLog)
display.clear()
graph = self.runtime.graph
display.write(f"[bold cyan]Agent Graph:[/bold cyan] {graph.id}\n")
# Render each node in topological order with edges
ordered = self._topo_order()
for node_id in ordered:
display.write(self._render_node_line(node_id))
for edge_line in self._render_edges(node_id):
display.write(edge_line)
# Execution path footer
if self.execution_path:
display.write("")
display.write(f"[dim]Path:[/dim] {''.join(self.execution_path[-5:])}")
def update_active_node(self, node_id: str) -> None:
"""Update the currently active node."""
self.active_node = node_id
if node_id not in self.execution_path:
self.execution_path.append(node_id)
self._display_graph()
def update_execution(self, event) -> None:
"""Update the displayed node status based on execution lifecycle events."""
if event.type == EventType.EXECUTION_STARTED:
self._node_status.clear()
self.execution_path.clear()
entry_node = event.data.get("entry_node") or (
self.runtime.graph.entry_node if self.runtime else None
)
if entry_node:
self.update_active_node(entry_node)
elif event.type == EventType.EXECUTION_COMPLETED:
self.active_node = None
self._node_status.clear()
self._display_graph()
elif event.type == EventType.EXECUTION_FAILED:
error = event.data.get("error", "Unknown error")
if self.active_node:
self._node_status[self.active_node] = f"[red]FAILED: {error}[/red]"
self.active_node = None
self._display_graph()
# -- Event handlers called by app.py _handle_event --
def handle_node_loop_started(self, node_id: str) -> None:
"""A node's event loop has started."""
self._node_status[node_id] = "thinking..."
self.update_active_node(node_id)
def handle_node_loop_iteration(self, node_id: str, iteration: int) -> None:
"""A node advanced to a new loop iteration."""
self._node_status[node_id] = f"step {iteration}"
self._display_graph()
def handle_node_loop_completed(self, node_id: str) -> None:
"""A node's event loop completed."""
self._node_status.pop(node_id, None)
self._display_graph()
def handle_tool_call(self, node_id: str, tool_name: str, *, started: bool) -> None:
"""Show tool activity next to the active node."""
if started:
self._node_status[node_id] = f"{tool_name}..."
else:
# Restore to generic thinking status after tool completes
self._node_status[node_id] = "thinking..."
self._display_graph()
def handle_stalled(self, node_id: str, reason: str) -> None:
"""Highlight a stalled node."""
self._node_status[node_id] = f"[red]stalled: {reason}[/red]"
self._display_graph()
@@ -0,0 +1,147 @@
"""
Log Pane Widget - Uses RichLog for reliable rendering.
"""
import logging
from datetime import datetime
from textual.app import ComposeResult
from textual.containers import Container
from textual.widgets import RichLog
from framework.runtime.event_bus import AgentEvent, EventType
class LogPane(Container):
"""Widget to display logs with reliable rendering."""
_EVENT_FORMAT: dict[EventType, tuple[str, str]] = {
EventType.EXECUTION_STARTED: (">>", "bold cyan"),
EventType.EXECUTION_COMPLETED: ("<<", "bold green"),
EventType.EXECUTION_FAILED: ("!!", "bold red"),
EventType.TOOL_CALL_STARTED: ("->", "yellow"),
EventType.TOOL_CALL_COMPLETED: ("<-", "green"),
EventType.NODE_LOOP_STARTED: ("@@", "cyan"),
EventType.NODE_LOOP_ITERATION: ("..", "dim"),
EventType.NODE_LOOP_COMPLETED: ("@@", "dim"),
EventType.NODE_STALLED: ("!!", "bold yellow"),
EventType.NODE_INPUT_BLOCKED: ("!!", "yellow"),
EventType.GOAL_PROGRESS: ("%%", "blue"),
EventType.GOAL_ACHIEVED: ("**", "bold green"),
EventType.CONSTRAINT_VIOLATION: ("!!", "bold red"),
EventType.STATE_CHANGED: ("~~", "dim"),
EventType.CLIENT_INPUT_REQUESTED: ("??", "magenta"),
}
_LOG_LEVEL_COLORS = {
logging.DEBUG: "dim",
logging.INFO: "",
logging.WARNING: "yellow",
logging.ERROR: "red",
logging.CRITICAL: "bold red",
}
DEFAULT_CSS = """
LogPane {
width: 100%;
height: 100%;
}
LogPane > RichLog {
width: 100%;
height: 100%;
background: $surface;
border: none;
scrollbar-background: $panel;
scrollbar-color: $primary;
}
"""
def compose(self) -> ComposeResult:
# RichLog is designed for log display and doesn't have TextArea's rendering issues
yield RichLog(id="main-log", highlight=True, markup=True, auto_scroll=False)
def write_event(self, event: AgentEvent) -> None:
"""Format an AgentEvent with timestamp + symbol and write to the log."""
ts = event.timestamp.strftime("%H:%M:%S")
symbol, color = self._EVENT_FORMAT.get(event.type, ("--", "dim"))
text = self._extract_event_text(event)
self.write_log(f"[dim]{ts}[/dim] [{color}]{symbol} {text}[/{color}]")
def _extract_event_text(self, event: AgentEvent) -> str:
"""Extract human-readable text from an event's data dict."""
et = event.type
data = event.data
if et == EventType.EXECUTION_STARTED:
return "Execution started"
elif et == EventType.EXECUTION_COMPLETED:
return "Execution completed"
elif et == EventType.EXECUTION_FAILED:
return f"Execution FAILED: {data.get('error', 'unknown')}"
elif et == EventType.TOOL_CALL_STARTED:
return f"Tool call: {data.get('tool_name', 'unknown')}"
elif et == EventType.TOOL_CALL_COMPLETED:
name = data.get("tool_name", "unknown")
if data.get("is_error"):
preview = str(data.get("result", ""))[:80]
return f"Tool error: {name} - {preview}"
return f"Tool done: {name}"
elif et == EventType.NODE_LOOP_STARTED:
return f"Node started: {event.node_id or 'unknown'}"
elif et == EventType.NODE_LOOP_ITERATION:
return f"{event.node_id or 'unknown'} iteration {data.get('iteration', '?')}"
elif et == EventType.NODE_LOOP_COMPLETED:
return f"Node done: {event.node_id or 'unknown'}"
elif et == EventType.NODE_STALLED:
reason = data.get("reason", "")
node = event.node_id or "unknown"
return f"Node stalled: {node} - {reason}" if reason else f"Node stalled: {node}"
elif et == EventType.NODE_INPUT_BLOCKED:
return f"Node input blocked: {event.node_id or 'unknown'}"
elif et == EventType.GOAL_PROGRESS:
return f"Goal progress: {data.get('progress', '?')}"
elif et == EventType.GOAL_ACHIEVED:
return "Goal achieved"
elif et == EventType.CONSTRAINT_VIOLATION:
return f"Constraint violated: {data.get('description', 'unknown')}"
elif et == EventType.STATE_CHANGED:
return f"State changed: {data.get('key', 'unknown')}"
elif et == EventType.CLIENT_INPUT_REQUESTED:
return "Waiting for user input"
else:
return f"{et.value}: {data}"
def write_python_log(self, record: logging.LogRecord) -> None:
"""Format a Python log record with timestamp and severity color."""
ts = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
color = self._LOG_LEVEL_COLORS.get(record.levelno, "")
msg = record.getMessage()
if color:
self.write_log(f"[dim]{ts}[/dim] [{color}]{record.levelname}[/{color}] {msg}")
else:
self.write_log(f"[dim]{ts}[/dim] {record.levelname} {msg}")
def write_log(self, message: str) -> None:
"""Write a log message to the log pane."""
try:
# Check if widget is mounted
if not self.is_mounted:
return
log = self.query_one("#main-log", RichLog)
# Check if log is mounted
if not log.is_mounted:
return
# Only auto-scroll if user is already at the bottom
was_at_bottom = log.is_vertical_scroll_end
log.write(message)
if was_at_bottom:
log.scroll_end(animate=False)
except Exception:
pass
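The `write_python_log` path implies a bridge from Python's `logging` module into this pane; the app-side wiring is not shown in this diff beyond a `queue_handler` being removed on unmount. A plausible minimal sketch of that bridge, with the queue and the drain interval as assumptions:

```python
import logging
import queue
from logging.handlers import QueueHandler

# Queue-based bridge: the app's teardown removes a `queue_handler`,
# so something of this shape is the likely wiring (details assumed).
log_queue: "queue.Queue[logging.LogRecord]" = queue.Queue()
logging.getLogger().addHandler(QueueHandler(log_queue))

def drain_into_pane(pane: "LogPane") -> None:
    # Called periodically on the UI thread (e.g., via App.set_interval),
    # so widget access stays single-threaded.
    while True:
        try:
            record = log_queue.get_nowait()
        except queue.Empty:
            break
        pane.write_python_log(record)
```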
@@ -18,7 +18,8 @@ dependencies = [
"tools",
]
# [project.optional-dependencies]
[project.optional-dependencies]
tui = ["textual>=0.75.0"]
[project.scripts]
hive = "framework.cli:main"
@@ -104,8 +104,10 @@ def test_event_loop_node_spec_accepted():
# --- _get_node_implementation() tests ---
def test_unregistered_event_loop_raises(runtime):
"""An event_loop node not in the registry should raise RuntimeError."""
def test_unregistered_event_loop_auto_creates(runtime):
"""An event_loop node not in the registry should be auto-created."""
from framework.graph.event_loop_node import EventLoopNode
spec = NodeSpec(
id="el1",
name="Event Loop",
@@ -114,8 +116,10 @@ def test_unregistered_event_loop_raises(runtime):
)
executor = GraphExecutor(runtime=runtime)
with pytest.raises(RuntimeError, match="not found in registry"):
executor._get_node_implementation(spec)
result = executor._get_node_implementation(spec)
assert isinstance(result, EventLoopNode)
# Auto-created node should be cached in registry
assert "el1" in executor.node_registry
def test_registered_event_loop_returns_impl(runtime):
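The renamed test pins the auto-creation contract in place of the old RuntimeError path. A rough sketch of the lookup it implies; this is reconstructed from the assertions, not copied from the framework, and the helper name and constructor arguments are assumptions:

```python
def _get_node_implementation(self, spec: NodeSpec) -> NodeProtocol:
    impl = self.node_registry.get(spec.id)
    if impl is None and spec.node_type == "event_loop":
        # New behavior: build a default EventLoopNode on demand and cache
        # it, instead of raising "not found in registry".
        impl = self._create_default_event_loop_node(spec)  # assumed helper
        self.node_registry[spec.id] = impl
    return impl
```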
@@ -5,7 +5,7 @@ Focused on minimal success and failure scenarios.
import pytest
from framework.graph.edge import GraphSpec
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.executor import GraphExecutor
from framework.graph.goal import Goal
from framework.graph.node import NodeResult, NodeSpec
@@ -130,3 +130,169 @@ async def test_executor_single_node_failure():
assert result.success is False
assert result.error is not None
assert result.path == ["n1"]
# ---- Fake event bus that records calls ----
class FakeEventBus:
def __init__(self):
self.events = []
async def emit_node_loop_started(self, **kwargs):
self.events.append(("started", kwargs))
async def emit_node_loop_completed(self, **kwargs):
self.events.append(("completed", kwargs))
@pytest.mark.asyncio
async def test_executor_emits_node_events():
"""Executor should emit NODE_LOOP_STARTED/COMPLETED for each non-event_loop node."""
runtime = DummyRuntime()
event_bus = FakeEventBus()
graph = GraphSpec(
id="graph-ev",
goal_id="g-ev",
nodes=[
NodeSpec(
id="n1",
name="first",
description="first node",
node_type="llm_generate",
input_keys=[],
output_keys=["result"],
max_retries=0,
),
NodeSpec(
id="n2",
name="second",
description="second node",
node_type="llm_generate",
input_keys=["result"],
output_keys=["result"],
max_retries=0,
),
],
edges=[
EdgeSpec(
id="e1",
source="n1",
target="n2",
condition=EdgeCondition.ON_SUCCESS,
),
],
entry_node="n1",
terminal_nodes=["n2"],
)
executor = GraphExecutor(
runtime=runtime,
node_registry={
"n1": SuccessNode(),
"n2": SuccessNode(),
},
event_bus=event_bus,
stream_id="test-stream",
)
goal = Goal(id="g-ev", name="event-test", description="test events")
result = await executor.execute(graph=graph, goal=goal)
assert result.success is True
assert result.path == ["n1", "n2"]
# Should have 4 events: started/completed for n1, then started/completed for n2
assert len(event_bus.events) == 4
assert event_bus.events[0] == ("started", {"stream_id": "test-stream", "node_id": "n1"})
assert event_bus.events[1] == (
"completed",
{"stream_id": "test-stream", "node_id": "n1", "iterations": 1},
)
assert event_bus.events[2] == ("started", {"stream_id": "test-stream", "node_id": "n2"})
assert event_bus.events[3] == (
"completed",
{"stream_id": "test-stream", "node_id": "n2", "iterations": 1},
)
# ---- Fake event_loop node (registered, so executor won't emit for it) ----
class FakeEventLoopNode:
def validate_input(self, ctx):
return []
async def execute(self, ctx):
return NodeResult(success=True, output={"result": "loop-done"}, tokens_used=1, latency_ms=1)
@pytest.mark.asyncio
async def test_executor_skips_events_for_event_loop_nodes():
"""Executor should NOT emit events for event_loop nodes (they emit their own)."""
runtime = DummyRuntime()
event_bus = FakeEventBus()
graph = GraphSpec(
id="graph-el",
goal_id="g-el",
nodes=[
NodeSpec(
id="el1",
name="event-loop-node",
description="event loop node",
node_type="event_loop",
input_keys=[],
output_keys=["result"],
max_retries=0,
),
],
edges=[],
entry_node="el1",
)
executor = GraphExecutor(
runtime=runtime,
node_registry={"el1": FakeEventLoopNode()},
event_bus=event_bus,
stream_id="test-stream",
)
goal = Goal(id="g-el", name="el-test", description="test event_loop guard")
result = await executor.execute(graph=graph, goal=goal)
assert result.success is True
# No events should have been emitted — event_loop nodes are skipped
assert len(event_bus.events) == 0
@pytest.mark.asyncio
async def test_executor_no_events_without_event_bus():
"""Executor should work fine without an event bus (backward compat)."""
runtime = DummyRuntime()
graph = GraphSpec(
id="graph-nobus",
goal_id="g-nobus",
nodes=[
NodeSpec(
id="n1",
name="node1",
description="test node",
node_type="llm_generate",
input_keys=[],
output_keys=["result"],
max_retries=0,
)
],
edges=[],
entry_node="n1",
)
# No event_bus passed — should not crash
executor = GraphExecutor(
runtime=runtime,
node_registry={"n1": SuccessNode()},
)
goal = Goal(id="g-nobus", name="nobus-test", description="no event bus")
result = await executor.execute(graph=graph, goal=goal)
assert result.success is True
@@ -0,0 +1,360 @@
"""
Test that ON_FAILURE edges are followed when a node fails after max retries.
Verifies the fix for Issue #3449 where the executor would immediately terminate
when max retries were exceeded, without checking for ON_FAILURE edges that could
route to error handler nodes.
"""
from unittest.mock import AsyncMock, MagicMock
import pytest
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.executor import GraphExecutor
from framework.graph.goal import Goal
from framework.graph.node import NodeContext, NodeProtocol, NodeResult, NodeSpec
from framework.runtime.core import Runtime
class AlwaysFailsNode(NodeProtocol):
"""A node that always fails."""
def __init__(self):
self.attempt_count = 0
async def execute(self, ctx: NodeContext) -> NodeResult:
self.attempt_count += 1
return NodeResult(success=False, error=f"Permanent error (attempt {self.attempt_count})")
class FailureHandlerNode(NodeProtocol):
"""A node that handles failures from upstream nodes."""
def __init__(self):
self.executed = False
self.execute_count = 0
async def execute(self, ctx: NodeContext) -> NodeResult:
self.executed = True
self.execute_count += 1
return NodeResult(
success=True,
output={"handled": True, "recovery": "graceful"},
)
class SuccessNode(NodeProtocol):
"""A node that always succeeds with configurable output."""
def __init__(self, output: dict | None = None):
self.execute_count = 0
self._output = output or {"result": "ok"}
async def execute(self, ctx: NodeContext) -> NodeResult:
self.execute_count += 1
return NodeResult(success=True, output=self._output)
@pytest.fixture(autouse=True)
def fast_sleep(monkeypatch):
"""Mock asyncio.sleep to avoid real delays from exponential backoff."""
monkeypatch.setattr("asyncio.sleep", AsyncMock())
@pytest.fixture
def runtime():
"""Create a mock Runtime for testing."""
runtime = MagicMock(spec=Runtime)
runtime.start_run = MagicMock(return_value="test_run_id")
runtime.decide = MagicMock(return_value="test_decision_id")
runtime.record_outcome = MagicMock()
runtime.end_run = MagicMock()
runtime.report_problem = MagicMock()
runtime.set_node = MagicMock()
return runtime
@pytest.fixture
def goal():
return Goal(
id="test_goal",
name="Test Goal",
description="Test ON_FAILURE edge routing",
)
@pytest.mark.asyncio
async def test_on_failure_edge_followed_after_max_retries(runtime, goal):
"""
When a node fails after exhausting max retries, ON_FAILURE edges should
be followed to route execution to a failure handler node.
"""
nodes = [
NodeSpec(
id="failing",
name="Failing Node",
description="Always fails",
node_type="function",
output_keys=[],
max_retries=1,
),
NodeSpec(
id="handler",
name="Failure Handler",
description="Handles failures",
node_type="function",
output_keys=["handled", "recovery"],
),
]
edges = [
EdgeSpec(
id="fail_to_handler",
source="failing",
target="handler",
condition=EdgeCondition.ON_FAILURE,
),
]
graph = GraphSpec(
id="test_graph",
goal_id="test_goal",
name="Test Graph",
entry_node="failing",
nodes=nodes,
edges=edges,
terminal_nodes=["handler"],
)
executor = GraphExecutor(runtime=runtime)
failing_node = AlwaysFailsNode()
handler_node = FailureHandlerNode()
executor.register_node("failing", failing_node)
executor.register_node("handler", handler_node)
result = await executor.execute(graph, goal, {})
# The handler should have executed
assert handler_node.executed, "Failure handler was not executed"
assert handler_node.execute_count == 1
# Overall execution should succeed (handler recovered)
assert result.success
# Handler node should appear in the execution path
assert "handler" in result.path
@pytest.mark.asyncio
async def test_no_on_failure_edge_still_terminates(runtime, goal):
"""
When a node fails after max retries and there is no ON_FAILURE edge,
the executor should terminate with a failure result (original behavior).
"""
nodes = [
NodeSpec(
id="failing",
name="Failing Node",
description="Always fails",
node_type="function",
output_keys=[],
max_retries=1,
),
]
graph = GraphSpec(
id="test_graph",
goal_id="test_goal",
name="Test Graph",
entry_node="failing",
nodes=[nodes[0]],
edges=[],
terminal_nodes=["failing"],
)
executor = GraphExecutor(runtime=runtime)
failing_node = AlwaysFailsNode()
executor.register_node("failing", failing_node)
result = await executor.execute(graph, goal, {})
assert not result.success
assert "failed after 1 attempts" in result.error
@pytest.mark.asyncio
async def test_on_failure_edge_not_followed_on_success(runtime, goal):
"""
ON_FAILURE edges should NOT be followed when a node succeeds.
Only ON_SUCCESS edges should fire.
"""
nodes = [
NodeSpec(
id="working",
name="Working Node",
description="Always succeeds",
node_type="function",
output_keys=["result"],
),
NodeSpec(
id="handler",
name="Failure Handler",
description="Should not be reached",
node_type="function",
output_keys=["handled"],
),
NodeSpec(
id="next",
name="Next Node",
description="Normal successor",
node_type="function",
output_keys=["done"],
),
]
edges = [
EdgeSpec(
id="on_fail",
source="working",
target="handler",
condition=EdgeCondition.ON_FAILURE,
),
EdgeSpec(
id="on_success",
source="working",
target="next",
condition=EdgeCondition.ON_SUCCESS,
),
]
graph = GraphSpec(
id="test_graph",
goal_id="test_goal",
name="Test Graph",
entry_node="working",
nodes=nodes,
edges=edges,
terminal_nodes=["handler", "next"],
)
executor = GraphExecutor(runtime=runtime)
executor.register_node("working", SuccessNode(output={"result": "ok"}))
handler_node = FailureHandlerNode()
executor.register_node("handler", handler_node)
executor.register_node("next", SuccessNode(output={"done": True}))
result = await executor.execute(graph, goal, {})
assert result.success
assert not handler_node.executed, "Failure handler should not run on success"
assert "next" in result.path, "Should follow ON_SUCCESS edge to 'next'"
@pytest.mark.asyncio
async def test_on_failure_edge_with_zero_retries(runtime, goal):
"""
ON_FAILURE edges should work even when max_retries=0 (no retries allowed).
The node fails once and immediately routes to the failure handler.
"""
nodes = [
NodeSpec(
id="fragile",
name="Fragile Node",
description="Fails with no retries",
node_type="function",
output_keys=[],
max_retries=0,
),
NodeSpec(
id="handler",
name="Failure Handler",
description="Handles failures",
node_type="function",
output_keys=["handled", "recovery"],
),
]
edges = [
EdgeSpec(
id="fail_to_handler",
source="fragile",
target="handler",
condition=EdgeCondition.ON_FAILURE,
),
]
graph = GraphSpec(
id="test_graph",
goal_id="test_goal",
name="Test Graph",
entry_node="fragile",
nodes=nodes,
edges=edges,
terminal_nodes=["handler"],
)
executor = GraphExecutor(runtime=runtime)
failing_node = AlwaysFailsNode()
handler_node = FailureHandlerNode()
executor.register_node("fragile", failing_node)
executor.register_node("handler", handler_node)
result = await executor.execute(graph, goal, {})
# Should route to handler after single failure (no retries)
assert failing_node.attempt_count == 1
assert handler_node.executed
assert result.success
@pytest.mark.asyncio
async def test_on_failure_handler_appears_in_path(runtime, goal):
"""
The failure handler node should appear in the execution path.
"""
nodes = [
NodeSpec(
id="failing",
name="Failing Node",
description="Always fails",
node_type="function",
output_keys=[],
max_retries=1,
),
NodeSpec(
id="handler",
name="Failure Handler",
description="Handles failures",
node_type="function",
output_keys=["handled", "recovery"],
),
]
edges = [
EdgeSpec(
id="fail_to_handler",
source="failing",
target="handler",
condition=EdgeCondition.ON_FAILURE,
),
]
graph = GraphSpec(
id="test_graph",
goal_id="test_goal",
name="Test Graph",
entry_node="failing",
nodes=nodes,
edges=edges,
terminal_nodes=["handler"],
)
executor = GraphExecutor(runtime=runtime)
executor.register_node("failing", AlwaysFailsNode())
executor.register_node("handler", FailureHandlerNode())
result = await executor.execute(graph, goal, {})
assert "failing" in result.path
assert "handler" in result.path
assert result.node_visit_counts.get("handler") == 1
Generated file: diff suppressed because it is too large.
@@ -83,7 +83,7 @@ git clone https://github.com/adenhq/hive.git
cd hive
# Run the Python environment configuration
./scripts/setup-python.sh
./quickstart.sh
```
This installs:
@@ -236,7 +236,7 @@ hive/
```bash
# One-time configuration
./scripts/setup-python.sh
./quickstart.sh
# This installs:
# - framework package (core runtime)
@@ -9,14 +9,11 @@
},
"license": "Apache-2.0",
"scripts": {
"setup": "echo '⚠️ This npm setup is for the archived web application. For agent development, use: ./scripts/setup-python.sh' && bash scripts/setup.sh",
"test:duplicates": "bun test scripts/auto-close-duplicates"
},
"devDependencies": {
"@types/node": "^20.10.0",
"tsx": "^4.7.0",
"typescript": "^5.3.0",
"yaml": "^2.3.0"
"typescript": "^5.3.0"
},
"engines": {
"node": ">=20.0.0",
@@ -1,180 +0,0 @@
/**
* Environment Generator Script
*
* Reads config.yaml and generates .env files for each service.
* This provides a single source of truth for configuration while
* maintaining compatibility with standard .env file workflows.
*
* Usage: npx tsx scripts/generate-env.ts
*/
import { readFileSync, writeFileSync, existsSync } from 'fs';
import { parse } from 'yaml';
import { join, dirname } from 'path';
import { fileURLToPath } from 'url';
const __dirname = dirname(fileURLToPath(import.meta.url));
const PROJECT_ROOT = join(__dirname, '..');
interface Config {
app: {
name: string;
environment: string;
log_level: string;
};
server: {
frontend: {
port: number;
};
backend: {
port: number;
host: string;
};
};
timescaledb: {
url: string;
port: number;
};
mongodb: {
url: string;
database: string;
erp_database: string;
port: number;
};
redis: {
url: string;
port: number;
};
auth: {
jwt_secret: string;
jwt_expires_in: string;
passphrase: string;
};
npm: {
token: string;
};
cors: {
origin: string;
};
features: {
registration: boolean;
rate_limiting: boolean;
request_logging: boolean;
mcp_server: boolean;
};
}
function loadConfig(): Config {
const configPath = join(PROJECT_ROOT, 'config.yaml');
if (!existsSync(configPath)) {
console.error('Error: config.yaml not found.');
console.error('Run: cp config.yaml.example config.yaml');
process.exit(1);
}
const configContent = readFileSync(configPath, 'utf-8');
return parse(configContent) as Config;
}
function generateRootEnv(config: Config): string {
return `# Generated from config.yaml - do not edit directly
# Regenerate with: npm run generate:env
# Application
NODE_ENV=${config.app.environment}
APP_NAME=${config.app.name}
LOG_LEVEL=${config.app.log_level}
# Ports
FRONTEND_PORT=${config.server.frontend.port}
BACKEND_PORT=${config.server.backend.port}
TSDB_PORT=${config.timescaledb.port}
MONGODB_PORT=${config.mongodb.port}
REDIS_PORT=${config.redis.port}
# API URL for frontend
VITE_API_URL=http://localhost:${config.server.backend.port}
# MongoDB
MONGODB_DBNAME=${config.mongodb.database}
MONGODB_ERP_DBNAME=${config.mongodb.erp_database}
# Authentication
JWT_SECRET=${config.auth.jwt_secret}
PASSPHRASE=${config.auth.passphrase}
# NPM (for Docker builds with private packages)
NPM_TOKEN=${config.npm.token}
# CORS
CORS_ORIGIN=${config.cors.origin}
`;
}
function generateFrontendEnv(config: Config): string {
return `# Generated from config.yaml - do not edit directly
# Regenerate with: npm run generate:env
VITE_API_URL=http://localhost:${config.server.backend.port}
VITE_APP_NAME=${config.app.name}
VITE_APP_ENV=${config.app.environment}
`;
}
function generateBackendEnv(config: Config): string {
return `# Generated from config.yaml - do not edit directly
# Regenerate with: npm run generate:env
# Server
NODE_ENV=${config.app.environment}
PORT=${config.server.backend.port}
# Application
LOG_LEVEL=${config.app.log_level}
# TimescaleDB (PostgreSQL)
TSDB_PG_URL=${config.timescaledb.url}
# MongoDB
MONGODB_URL=${config.mongodb.url}
MONGODB_DBNAME=${config.mongodb.database}
MONGODB_ERP_DBNAME=${config.mongodb.erp_database}
# Redis
REDIS_URL=${config.redis.url}
# Authentication
JWT_SECRET=${config.auth.jwt_secret}
PASSPHRASE=${config.auth.passphrase}
# Features
FEATURE_MCP_SERVER=${config.features.mcp_server}
`;
}
function main() {
console.log('Generating environment files from config.yaml...\n');
const config = loadConfig();
// Generate root .env (for docker-compose)
const rootEnvPath = join(PROJECT_ROOT, '.env');
writeFileSync(rootEnvPath, generateRootEnv(config));
console.log(`✓ Generated ${rootEnvPath}`);
// Generate frontend .env
const frontendEnvPath = join(PROJECT_ROOT, 'honeycomb', '.env');
writeFileSync(frontendEnvPath, generateFrontendEnv(config));
console.log(`✓ Generated ${frontendEnvPath}`);
// Generate backend .env
const backendEnvPath = join(PROJECT_ROOT, 'hive', '.env');
writeFileSync(backendEnvPath, generateBackendEnv(config));
console.log(`✓ Generated ${backendEnvPath}`);
console.log('\nDone! Environment files have been generated.');
console.log('\nNote: These files are git-ignored. Regenerate after editing config.yaml.');
}
main();
@@ -1,251 +0,0 @@
<#
setup-python.ps1 - Python Environment Setup for Aden Agent Framework
This script sets up the Python environment with all required packages
for building and running goal-driven agents.
#>
$ErrorActionPreference = "Stop"
# Colors for output
$RED = "Red"
$GREEN = "Green"
$YELLOW = "Yellow"
$BLUE = "Cyan"
# Get the directory where this script is located
$SCRIPT_DIR = Split-Path -Parent $MyInvocation.MyCommand.Path
$PROJECT_ROOT = Split-Path -Parent $SCRIPT_DIR
Write-Host ""
Write-Host "=================================================="
Write-Host " Aden Agent Framework - Python Setup"
Write-Host "=================================================="
Write-Host ""
# Check for Python
$pythonCmd = $null
if (Get-Command python -ErrorAction SilentlyContinue) {
$pythonCmd = "python"
}
if (-not $pythonCmd) {
Write-Host "Error: Python is not installed." -ForegroundColor $RED
Write-Host "Please install Python 3.11+ from https://python.org"
exit 1
}
# Check Python version
$versionInfo = & $pythonCmd -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')"
$major = & $pythonCmd -c "import sys; print(sys.version_info.major)"
$minor = & $pythonCmd -c "import sys; print(sys.version_info.minor)"
Write-Host "Detected Python: $versionInfo" -ForegroundColor $BLUE
if ($major -lt 3 -or ($major -eq 3 -and $minor -lt 11)) {
Write-Host "Error: Python 3.11+ is required (found $versionInfo)" -ForegroundColor $RED
Write-Host "Please upgrade your Python installation"
exit 1
}
if ($minor -lt 11) {
Write-Host "Warning: Python 3.11+ is recommended for best compatibility" -ForegroundColor $YELLOW
Write-Host "You have Python $versionInfo which may work but is not officially supported" -ForegroundColor $YELLOW
Write-Host ""
}
Write-Host "[OK] Python version check passed" -ForegroundColor $GREEN
Write-Host ""
# Create and activate virtual environment
Write-Host "=================================================="
Write-Host "Setting up Python Virtual Environment"
Write-Host "=================================================="
Write-Host ""
$VENV_PATH = Join-Path $PROJECT_ROOT ".venv"
$VENV_PYTHON = Join-Path $VENV_PATH "Scripts\python.exe"
$VENV_ACTIVATE = Join-Path $VENV_PATH "Scripts\Activate.ps1"
if (-not (Test-Path $VENV_PYTHON)) {
Write-Host "Creating virtual environment at .venv..."
& $pythonCmd -m venv $VENV_PATH
Write-Host "[OK] Virtual environment created" -ForegroundColor $GREEN
}
else {
Write-Host "[OK] Virtual environment already exists" -ForegroundColor $GREEN
}
# Activate venv
Write-Host "Activating virtual environment..."
& $VENV_ACTIVATE
Write-Host "[OK] Virtual environment activated" -ForegroundColor $GREEN
# From here on, always use venv python
$pythonCmd = $VENV_PYTHON
Write-Host ""
# Check for pip
try {
& $pythonCmd -m pip --version | Out-Null
}
catch {
Write-Host "Error: pip is not installed" -ForegroundColor $RED
Write-Host "Please install pip for Python $versionInfo"
exit 1
}
Write-Host "[OK] pip detected" -ForegroundColor $GREEN
Write-Host ""
# Upgrade pip, setuptools, and wheel
Write-Host "Upgrading pip, setuptools, and wheel..."
& $pythonCmd -m pip install --upgrade pip setuptools wheel
Write-Host "[OK] Core packages upgraded" -ForegroundColor $GREEN
Write-Host ""
# Install core framework package
Write-Host "=================================================="
Write-Host "Installing Core Framework Package"
Write-Host "=================================================="
Write-Host ""
Set-Location "$PROJECT_ROOT\core"
if (Test-Path "pyproject.toml") {
Write-Host "Installing framework from core/ (editable mode)..."
& $pythonCmd -m pip install -e . | Out-Null
Write-Host "[OK] Framework package installed" -ForegroundColor $GREEN
}
else {
Write-Host "[WARN] No pyproject.toml found in core/, skipping framework installation" -ForegroundColor $YELLOW
}
Write-Host ""
# Install tools package
Write-Host "=================================================="
Write-Host "Installing Tools Package (aden_tools)"
Write-Host "=================================================="
Write-Host ""
Set-Location "$PROJECT_ROOT\tools"
if (Test-Path "pyproject.toml") {
Write-Host "Installing aden_tools from tools/ (editable mode)..."
& $pythonCmd -m pip install -e . | Out-Null
Write-Host "[OK] Tools package installed" -ForegroundColor $GREEN
}
else {
Write-Host "Error: No pyproject.toml found in tools/" -ForegroundColor $RED
exit 1
}
Write-Host ""
# Fix openai version compatibility with litellm
Write-Host "=================================================="
Write-Host "Fixing Package Compatibility"
Write-Host "=================================================="
Write-Host ""
try {
$openaiVersion = & $pythonCmd -c "import openai; print(openai.__version__)"
}
catch {
$openaiVersion = "not_installed"
}
if ($openaiVersion -eq "not_installed") {
Write-Host "Installing openai package..."
& $pythonCmd -m pip install "openai>=1.0.0" | Out-Null
Write-Host "[OK] openai package installed" -ForegroundColor $GREEN
}
elseif ($openaiVersion.StartsWith("0.")) {
Write-Host "Found old openai version: $openaiVersion" -ForegroundColor $YELLOW
Write-Host "Upgrading to openai 1.x+ for litellm compatibility..."
& $pythonCmd -m pip install --upgrade "openai>=1.0.0" | Out-Null
$openaiVersion = & $pythonCmd -c "import openai; print(openai.__version__)"
Write-Host "[OK] openai upgraded to $openaiVersion" -ForegroundColor $GREEN
}
else {
Write-Host "[OK] openai $openaiVersion is compatible" -ForegroundColor $GREEN
}
Write-Host ""
# Verify installations
Write-Host "=================================================="
Write-Host "Verifying Installation"
Write-Host "=================================================="
Write-Host ""
Set-Location $PROJECT_ROOT
# Test framework import
& $pythonCmd -c "import framework" 2>$null
if ($LASTEXITCODE -eq 0) {
Write-Host "[OK] framework package imports successfully" -ForegroundColor Green
}
else {
Write-Host "[FAIL] framework package import failed" -ForegroundColor Red
}
# Test aden_tools import
& $pythonCmd -c "import aden_tools" 2>$null
if ($LASTEXITCODE -eq 0) {
Write-Host "[OK] aden_tools package imports successfully" -ForegroundColor Green
}
else {
Write-Host "[FAIL] aden_tools package import failed" -ForegroundColor Red
exit 1
}
# Test litellm
& $pythonCmd -c "import litellm" 2>$null
if ($LASTEXITCODE -eq 0) {
Write-Host "[OK] litellm package imports successfully" -ForegroundColor $GREEN
}
else {
Write-Host "[WARN] litellm import had issues (may be OK if not using LLM features)" -ForegroundColor $YELLOW
}
Write-Host ""
# Print agent commands
Write-Host "=================================================="
Write-Host " Setup Complete!"
Write-Host "=================================================="
Write-Host ""
Write-Host "Python packages installed:"
Write-Host " - framework (core agent runtime)"
Write-Host " - aden_tools (tools and MCP servers)"
Write-Host " - All dependencies and compatibility fixes applied"
Write-Host ""
Write-Host "To run agents on Windows (PowerShell):"
Write-Host ""
Write-Host "1. From the project root, set PYTHONPATH:"
Write-Host " `$env:PYTHONPATH=`"exports`""
Write-Host ""
Write-Host "2. Run an agent command:"
Write-Host " uv run python -m agent_name validate"
Write-Host " uv run python -m agent_name info"
Write-Host " uv run python -m agent_name run --input '{...}'"
Write-Host ""
Write-Host "Example (support_ticket_agent):"
Write-Host " uv run python -m support_ticket_agent validate"
Write-Host " uv run python -m support_ticket_agent info"
Write-Host " uv run python -m support_ticket_agent run --input '{""ticket_content"":""..."",""customer_id"":""..."",""ticket_id"":""...""}'"
Write-Host ""
Write-Host "Notes:"
Write-Host " - Ensure the virtual environment is activated (.venv)"
Write-Host " - PYTHONPATH must be set in each new PowerShell session"
Write-Host ""
Write-Host "Documentation:"
Write-Host " $PROJECT_ROOT\README.md"
Write-Host ""
Write-Host "Agent Examples:"
Write-Host " $PROJECT_ROOT\exports\"
Write-Host ""
@@ -1,308 +0,0 @@
#!/bin/bash
#
# setup-python.sh - Python Environment Setup for Aden Agent Framework
#
# DEPRECATED: Use ./quickstart.sh instead. It does everything this script
# does plus verifies MCP configuration, Claude Code skills, and API keys.
#
# This script is kept for CI/headless environments where the extra
# verification steps in quickstart.sh are not needed.
#
set -e
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Get the directory where this script is located
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
# Python Version
REQUIRED_PYTHON_VERSION="3.11"
# Python version split into Major and Minor
IFS='.' read -r PYTHON_MAJOR_VERSION PYTHON_MINOR_VERSION <<< "$REQUIRED_PYTHON_VERSION"
# Available python interpreter (follows sequence)
POSSIBLE_PYTHONS=("python3" "python" "py")
# Default python interpreter (initialized)
PYTHON_CMD=()
echo ""
echo "=================================================="
echo " Aden Agent Framework - Python Setup"
echo "=================================================="
echo ""
echo -e "${YELLOW}NOTE: Consider using ./quickstart.sh instead for a complete setup.${NC}"
echo ""
# Available Python interpreter
for cmd in "${POSSIBLE_PYTHONS[@]}"; do
# Check for python interpreter
if command -v "$cmd" >/dev/null 2>&1; then
# Specific check for Windows 'py' launcher
if [ "$cmd" = "py" ]; then
CURRENT_CMD=(py -3)
else
CURRENT_CMD=("$cmd")
fi
# Check Python version
if "${CURRENT_CMD[@]}" -c "import sys; sys.exit(0 if sys.version_info >= ($PYTHON_MAJOR_VERSION, $PYTHON_MINOR_VERSION) else 1)" >/dev/null 2>&1; then
echo -e "${GREEN}${NC} interpreter detected: ${CURRENT_CMD[@]}"
# Check for pip
if "${CURRENT_CMD[@]}" -m pip --version >/dev/null 2>&1; then
PYTHON_CMD=("${CURRENT_CMD[@]}")
echo -e "${GREEN}${NC} pip detected"
echo ""
break
else
echo -e "${RED}${NC} pip not found"
echo ""
fi
else
echo -e "${RED}${NC} ${CURRENT_CMD[@]} not found"
echo ""
fi
fi
done
# Display error message if python not found
if [ "${#PYTHON_CMD[@]}" -eq 0 ]; then
echo -e "${RED}Error:${NC} No suitable Python interpreter found with pip installed."
echo ""
echo "Requirements:"
echo " • Python $PYTHON_MAJOR_VERSION.$PYTHON_MINOR_VERSION+"
echo " • pip installed"
echo ""
echo "Tried the following commands:"
echo " ${POSSIBLE_PYTHONS[*]}"
echo ""
echo "Please install Python from:"
echo " https://www.python.org/downloads/"
exit 1
fi
# Display Python version
PYTHON_VERSION=$("${PYTHON_CMD[@]}" -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
echo -e "${BLUE}Detected Python:${NC} $PYTHON_VERSION"
echo -e "${GREEN}${NC} Python version check passed"
echo ""
# Check for uv
if ! command -v uv &> /dev/null; then
echo -e "${RED}Error: uv is not installed${NC}"
echo "Please install uv from https://github.com/astral-sh/uv"
exit 1
fi
echo -e "${GREEN}${NC} uv detected"
echo ""
# Install core framework package
echo "=================================================="
echo "Installing Core Framework Package"
echo "=================================================="
echo ""
cd "$PROJECT_ROOT/core"
# Create venv if it doesn't exist
if [ ! -d ".venv" ]; then
echo "Creating virtual environment in core/.venv..."
uv venv
echo -e "${GREEN}${NC} Virtual environment created"
else
echo -e "${GREEN}${NC} Virtual environment already exists"
fi
echo ""
if [ -f "pyproject.toml" ]; then
echo "Installing framework from core/ (editable mode)..."
CORE_PYTHON=".venv/bin/python"
if uv pip install --python "$CORE_PYTHON" -e .; then
echo -e "${GREEN}${NC} Framework package installed"
else
echo -e "${YELLOW}${NC} Framework installation encountered issues (may be OK if already installed)"
fi
else
echo -e "${YELLOW}${NC} No pyproject.toml found in core/, skipping framework installation"
fi
echo ""
# Install tools package
echo "=================================================="
echo "Installing Tools Package (aden_tools)"
echo "=================================================="
echo ""
cd "$PROJECT_ROOT/tools"
# Create venv if it doesn't exist
if [ ! -d ".venv" ]; then
echo "Creating virtual environment in tools/.venv..."
uv venv
echo -e "${GREEN}${NC} Virtual environment created"
else
echo -e "${GREEN}${NC} Virtual environment already exists"
fi
echo ""
if [ -f "pyproject.toml" ]; then
echo "Installing aden_tools from tools/ (editable mode)..."
TOOLS_PYTHON=".venv/bin/python"
if uv pip install --python "$TOOLS_PYTHON" -e .; then
echo -e "${GREEN}${NC} Tools package installed"
else
echo -e "${RED}${NC} Tools installation failed"
exit 1
fi
else
echo -e "${RED}Error: No pyproject.toml found in tools/${NC}"
exit 1
fi
echo ""
# Install Playwright browser for web scraping
echo "=================================================="
echo "Installing Playwright Browser"
echo "=================================================="
echo ""
if $PYTHON_CMD -c "import playwright" > /dev/null 2>&1; then
echo "Installing Chromium browser for web scraping..."
if $PYTHON_CMD -m playwright install chromium > /dev/null 2>&1; then
echo -e "${GREEN}${NC} Playwright Chromium installed"
else
echo -e "${YELLOW}${NC} Playwright browser install failed (web_scrape tool may not work)"
echo " Run manually: uv run python -m playwright install chromium"
fi
else
echo -e "${YELLOW}${NC} Playwright not found, skipping browser install"
fi
echo ""
# Fix openai version compatibility with litellm
echo "=================================================="
echo "Fixing Package Compatibility"
echo "=================================================="
echo ""
TOOLS_PYTHON="$PROJECT_ROOT/tools/.venv/bin/python"
# Check openai version in tools venv
OPENAI_VERSION=$($TOOLS_PYTHON -c "import openai; print(openai.__version__)" 2>/dev/null || echo "not_installed")
if [ "$OPENAI_VERSION" = "not_installed" ]; then
echo "Installing openai package..."
uv pip install --python "$TOOLS_PYTHON" "openai>=1.0.0"
echo -e "${GREEN}${NC} openai package installed"
elif [[ "$OPENAI_VERSION" =~ ^0\. ]]; then
echo -e "${YELLOW}Found old openai version: $OPENAI_VERSION${NC}"
echo "Upgrading to openai 1.x+ for litellm compatibility..."
uv pip install --python "$TOOLS_PYTHON" --upgrade "openai>=1.0.0"
OPENAI_VERSION=$($TOOLS_PYTHON -c "import openai; print(openai.__version__)" 2>/dev/null)
echo -e "${GREEN}${NC} openai upgraded to $OPENAI_VERSION"
else
echo -e "${GREEN}${NC} openai $OPENAI_VERSION is compatible"
fi
echo ""
# Ensure exports directory exists
echo "=================================================="
echo "Checking Directory Structure"
echo "=================================================="
echo ""
if [ ! -d "$PROJECT_ROOT/exports" ]; then
echo "Creating exports directory..."
mkdir -p "$PROJECT_ROOT/exports"
echo "# Agent Exports" > "$PROJECT_ROOT/exports/README.md"
echo "" >> "$PROJECT_ROOT/exports/README.md"
echo "This directory is the default location for generated agent packages." >> "$PROJECT_ROOT/exports/README.md"
echo -e "${GREEN}${NC} Created exports directory"
else
echo -e "${GREEN}${NC} exports directory exists"
fi
echo ""
# Verify installations
echo "=================================================="
echo "Verifying Installation"
echo "=================================================="
echo ""
cd "$PROJECT_ROOT"
# Test framework import using core venv
CORE_PYTHON="$PROJECT_ROOT/core/.venv/bin/python"
if [ -f "$CORE_PYTHON" ]; then
if $CORE_PYTHON -c "import framework; print('framework OK')" > /dev/null 2>&1; then
echo -e "${GREEN}${NC} framework package imports successfully"
else
echo -e "${RED}${NC} framework package import failed"
echo -e "${YELLOW} Note: This may be OK if you don't need the framework${NC}"
fi
else
echo -e "${RED}${NC} core/.venv not found - venv creation may have failed${NC}"
exit 1
fi
# Test aden_tools import using tools venv
TOOLS_PYTHON="$PROJECT_ROOT/tools/.venv/bin/python"
if [ -f "$TOOLS_PYTHON" ]; then
if $TOOLS_PYTHON -c "import aden_tools; print('aden_tools OK')" > /dev/null 2>&1; then
echo -e "${GREEN}${NC} aden_tools package imports successfully"
else
echo -e "${RED}${NC} aden_tools package import failed"
exit 1
fi
else
echo -e "${RED}${NC} tools/.venv not found - venv creation may have failed${NC}"
exit 1
fi
# Test litellm + openai compatibility using tools venv
if $TOOLS_PYTHON -c "import litellm; print('litellm OK')" > /dev/null 2>&1; then
echo -e "${GREEN}${NC} litellm package imports successfully"
else
echo -e "${YELLOW}${NC} litellm import had issues (may be OK if not using LLM features)"
fi
echo ""
# Print agent commands
echo "=================================================="
echo " Setup Complete!"
echo "=================================================="
echo ""
echo "Python packages installed:"
echo " • framework (core agent runtime)"
echo " • aden_tools (tools and MCP servers)"
echo " • All dependencies and compatibility fixes applied"
echo ""
echo "To run agents, use:"
echo ""
echo " ${BLUE}# From project root:${NC}"
echo " PYTHONPATH=exports uv run python -m agent_name validate"
echo " PYTHONPATH=exports uv run python -m agent_name info"
echo " PYTHONPATH=exports uv run python -m agent_name run --input '{...}'"
echo ""
echo "Available commands for your new agent:"
echo " PYTHONPATH=exports uv run python -m support_ticket_agent validate"
echo " PYTHONPATH=exports uv run python -m support_ticket_agent info"
echo " PYTHONPATH=exports uv run python -m support_ticket_agent run --input '{\"ticket_content\":\"...\",\"customer_id\":\"...\",\"ticket_id\":\"...\"}'"
echo ""
echo "To build new agents, use Claude Code skills:"
echo " • /building-agents - Build a new agent"
echo " • /testing-agent - Test an existing agent"
echo ""
echo "Documentation: ${PROJECT_ROOT}/README.md"
echo "Agent Examples: ${PROJECT_ROOT}/exports/"
echo ""
@@ -1,79 +0,0 @@
#!/bin/bash
# Legacy Web Application Setup Script
# NOTE: This script is for the archived honeycomb/hive web application.
# For agent development, use: ./quickstart.sh
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
echo "==================================="
echo " Legacy Web App Setup (Archived)"
echo "==================================="
echo ""
echo "⚠️ This script is for the archived web application."
echo " For agent development, use: ./quickstart.sh"
echo ""
# Check for Node.js
if ! command -v node &> /dev/null; then
echo "Error: Node.js is not installed."
echo "Please install Node.js 20+ from https://nodejs.org"
exit 1
fi
NODE_VERSION=$(node -v | cut -d'v' -f2 | cut -d'.' -f1)
if [ "$NODE_VERSION" -lt 20 ]; then
echo "Error: Node.js 20+ is required (found v$NODE_VERSION)"
exit 1
fi
echo "✓ Node.js $(node -v) detected"
# Check for Docker (optional)
if command -v docker &> /dev/null; then
echo "✓ Docker $(docker --version | cut -d' ' -f3 | tr -d ',') detected"
else
echo "⚠ Docker not found (optional, needed for containerized deployment)"
fi
echo ""
# Create config.yaml if it doesn't exist
if [ ! -f "$PROJECT_ROOT/config.yaml" ]; then
echo "Creating config.yaml from template..."
cp "$PROJECT_ROOT/config.yaml.example" "$PROJECT_ROOT/config.yaml"
echo "✓ Created config.yaml"
echo ""
echo " Please review and edit config.yaml with your settings."
echo ""
else
echo "✓ config.yaml already exists"
fi
# Install dependencies
echo ""
echo "Installing dependencies..."
cd "$PROJECT_ROOT"
npm install
echo "✓ Dependencies installed"
# Generate environment files
echo ""
echo "Generating environment files from config.yaml..."
npx tsx scripts/generate-env.ts
echo "✓ Environment files generated"
echo ""
echo "==================================="
echo " Setup Complete (Legacy)"
echo "==================================="
echo ""
echo "⚠️ NOTE: The honeycomb/hive web application has been archived."
echo ""
echo "For agent development, please use:"
echo " ./quickstart.sh"
echo ""
echo "See ENVIRONMENT_SETUP.md for complete agent development guide."
echo ""
@@ -26,6 +26,7 @@ from .email_tool import register_tools as register_email
from .example_tool import register_tools as register_example
from .file_system_toolkits.apply_diff import register_tools as register_apply_diff
from .file_system_toolkits.apply_patch import register_tools as register_apply_patch
from .file_system_toolkits.data_tools import register_tools as register_data_tools
from .file_system_toolkits.execute_command_tool import (
register_tools as register_execute_command,
)
@@ -82,6 +83,7 @@ def register_all_tools(
register_apply_patch(mcp)
register_grep_search(mcp)
register_execute_command(mcp)
register_data_tools(mcp)
register_csv(mcp)
return [
@@ -97,6 +99,9 @@ def register_all_tools(
"apply_patch",
"grep_search",
"execute_command_tool",
"load_data",
"save_data",
"list_data_files",
"csv_read",
"csv_write",
"csv_append",
@@ -0,0 +1,3 @@
from .data_tools import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,179 @@
"""
Data Tools - Load, save, and list data files for agent pipelines.
These tools let agents store large intermediate results in files and
retrieve them with pagination, keeping the LLM conversation context small.
Used in conjunction with the spillover system: when a tool result is too
large, the framework writes it to a file and the agent can load it back
with load_data().
"""
from __future__ import annotations
import json
from pathlib import Path
from mcp.server.fastmcp import FastMCP
def register_tools(mcp: FastMCP) -> None:
"""Register data management tools with the MCP server."""
@mcp.tool()
def save_data(filename: str, data: str, data_dir: str) -> dict:
"""
Purpose
Save data to a file for later retrieval by this or downstream nodes.
When to use
Store large results (search results, profiles, analysis) instead
of passing them inline through set_output.
Returns a brief summary with the filename to reference later.
Rules & Constraints
filename must be a simple name like 'results.json'; no paths or '..'
data_dir must be the absolute path to the data directory
Args:
filename: Simple filename like 'github_users.json'. No paths or '..'.
data: The string data to write (typically JSON).
data_dir: Absolute path to the data directory.
Returns:
Dict with success status and file metadata, or error dict
"""
if not filename or ".." in filename or "/" in filename or "\\" in filename:
return {"error": "Invalid filename. Use simple names like 'users.json'"}
if not data_dir:
return {"error": "data_dir is required"}
try:
dir_path = Path(data_dir)
dir_path.mkdir(parents=True, exist_ok=True)
path = dir_path / filename
path.write_text(data, encoding="utf-8")
lines = data.count("\n") + 1
return {
"success": True,
"filename": filename,
"size_bytes": len(data.encode("utf-8")),
"lines": lines,
"preview": data[:200] + ("..." if len(data) > 200 else ""),
}
except Exception as e:
return {"error": f"Failed to save data: {str(e)}"}
@mcp.tool()
def load_data(
filename: str,
data_dir: str,
offset: int = 0,
limit: int = 50,
) -> dict:
"""
Purpose
Load data from a previously saved file with pagination.
When to use
Retrieve large tool results that were spilled to disk.
Read data saved by save_data or by the spillover system.
Page through large files without loading everything into context.
Rules & Constraints
filename must match a file in data_dir
Returns a page of lines with metadata about the full file
Args:
filename: The filename to load (as shown in spillover messages or save_data results).
data_dir: Absolute path to the data directory.
offset: 0-based line number to start reading from. Default 0.
limit: Max number of lines to return. Default 50.
Returns:
Dict with content, pagination info, and metadata
Examples:
load_data('users.json', '/path/to/data') # first 50 lines
load_data('users.json', '/path/to/data', offset=50, limit=50) # next 50
load_data('users.json', '/path/to/data', limit=200) # first 200 lines
"""
if not filename or ".." in filename or "/" in filename or "\\" in filename:
return {"error": "Invalid filename"}
if not data_dir:
return {"error": "data_dir is required"}
try:
offset = int(offset)
limit = int(limit)
path = Path(data_dir) / filename
if not path.exists():
return {"error": f"File not found: {filename}"}
content = path.read_text(encoding="utf-8")
size_bytes = len(content.encode("utf-8"))
# If content is a single long line, try to pretty-print JSON so
# line-based pagination actually works.
all_lines = content.split("\n")
if len(all_lines) <= 2 and size_bytes > 500:
try:
parsed = json.loads(content)
content = json.dumps(parsed, indent=2, ensure_ascii=False)
all_lines = content.split("\n")
except (json.JSONDecodeError, TypeError, ValueError):
pass
total = len(all_lines)
start = min(offset, total)
end = min(start + limit, total)
sliced = all_lines[start:end]
return {
"success": True,
"filename": filename,
"content": "\n".join(sliced),
"total_lines": total,
"size_bytes": size_bytes,
"offset": start,
"lines_returned": len(sliced),
"has_more": end < total,
}
except Exception as e:
return {"error": f"Failed to load data: {str(e)}"}
@mcp.tool()
def list_data_files(data_dir: str) -> dict:
"""
Purpose
List all data files in the data directory.
When to use
Discover what intermediate results or spillover files are available.
Check what data was saved by previous nodes in the pipeline.
Args:
data_dir: Absolute path to the data directory.
Returns:
Dict with list of files and their sizes
"""
if not data_dir:
return {"error": "data_dir is required"}
try:
dir_path = Path(data_dir)
if not dir_path.exists():
return {"files": []}
files = []
for f in sorted(dir_path.iterdir()):
if f.is_file():
files.append(
{
"filename": f.name,
"size_bytes": f.stat().st_size,
}
)
return {"files": files}
except Exception as e:
return {"error": f"Failed to list data files: {str(e)}"}
@@ -11,6 +11,7 @@ Auto-detection: If provider="auto", tries Brave first (backward compatible), the
from __future__ import annotations
import os
import time
from typing import TYPE_CHECKING, Literal
import httpx
@@ -35,27 +36,35 @@ def register_tools(
cse_id: str,
) -> dict:
"""Execute search using Google Custom Search API."""
response = httpx.get(
"https://www.googleapis.com/customsearch/v1",
params={
"key": api_key,
"cx": cse_id,
"q": query,
"num": min(num_results, 10),
"lr": f"lang_{language}",
"gl": country,
},
timeout=30.0,
)
max_retries = 3
for attempt in range(max_retries + 1):
response = httpx.get(
"https://www.googleapis.com/customsearch/v1",
params={
"key": api_key,
"cx": cse_id,
"q": query,
"num": min(num_results, 10),
"lr": f"lang_{language}",
"gl": country,
},
timeout=30.0,
)
if response.status_code == 401:
return {"error": "Invalid Google API key"}
elif response.status_code == 403:
return {"error": "Google API key not authorized or quota exceeded"}
elif response.status_code == 429:
return {"error": "Google rate limit exceeded. Try again later."}
elif response.status_code != 200:
return {"error": f"Google API request failed: HTTP {response.status_code}"}
if response.status_code == 429 and attempt < max_retries:
time.sleep(2**attempt)
continue
if response.status_code == 401:
return {"error": "Invalid Google API key"}
elif response.status_code == 403:
return {"error": "Google API key not authorized or quota exceeded"}
elif response.status_code == 429:
return {"error": "Google rate limit exceeded. Try again later."}
elif response.status_code != 200:
return {"error": f"Google API request failed: HTTP {response.status_code}"}
break
data = response.json()
results = []
@@ -82,26 +91,34 @@ def register_tools(
api_key: str,
) -> dict:
"""Execute search using Brave Search API."""
response = httpx.get(
"https://api.search.brave.com/res/v1/web/search",
params={
"q": query,
"count": min(num_results, 20),
"country": country,
},
headers={
"X-Subscription-Token": api_key,
"Accept": "application/json",
},
timeout=30.0,
)
max_retries = 3
for attempt in range(max_retries + 1):
response = httpx.get(
"https://api.search.brave.com/res/v1/web/search",
params={
"q": query,
"count": min(num_results, 20),
"country": country,
},
headers={
"X-Subscription-Token": api_key,
"Accept": "application/json",
},
timeout=30.0,
)
if response.status_code == 401:
return {"error": "Invalid Brave API key"}
elif response.status_code == 429:
return {"error": "Brave rate limit exceeded. Try again later."}
elif response.status_code != 200:
return {"error": f"Brave API request failed: HTTP {response.status_code}"}
if response.status_code == 429 and attempt < max_retries:
time.sleep(2**attempt)
continue
if response.status_code == 401:
return {"error": "Invalid Brave API key"}
elif response.status_code == 429:
return {"error": "Brave rate limit exceeded. Try again later."}
elif response.status_code != 200:
return {"error": f"Brave API request failed: HTTP {response.status_code}"}
break
data = response.json()
results = []
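Both providers now retry HTTP 429 with exponential backoff before surfacing an error. A sketch of that loop in isolation (the function name is illustrative, not part of the tool code):

```python
import time

import httpx


def get_with_backoff(url: str, max_retries: int = 3, **kwargs) -> httpx.Response:
    """Retry on HTTP 429, sleeping 1s, 2s, 4s between attempts; return the last response."""
    for attempt in range(max_retries + 1):
        response = httpx.get(url, **kwargs)
        if response.status_code == 429 and attempt < max_retries:
            time.sleep(2**attempt)  # exponential backoff: 2^attempt seconds
            continue
        return response
    return response  # not reached; the final attempt always returns above
```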
Generated file (+3419 lines): diff suppressed because it is too large.
Generated file (+3478 lines): diff suppressed because it is too large.