Compare commits

...

86 Commits

Author SHA1 Message Date
Timothy @aden a12163d63f Merge pull request #4304 from adenhq/fix/init-config
model selection + max_tokens in quickstart
2026-02-09 20:11:55 -08:00
RichardTang-Aden 0cd6f21980 Merge pull request #4270 from TimothyZhang7/feature/hard-goal-negotiation
Feature/hard goal negotiation
2026-02-09 20:04:20 -08:00
Richard Tang a88fc1d75c fix: remove the unnecessary summary before checking capabilities and gaps 2026-02-09 19:59:49 -08:00
Richard Tang e9bde26611 fix: fixed minor issues introduced by the merge 2026-02-09 19:45:55 -08:00
Richard Tang c02f40622c Merge remote-tracking branch 'upstream/main' into feature/hard-goal-negotiation 2026-02-09 19:42:55 -08:00
Timothy @aden 3328a388b3 Merge pull request #3877 from adenhq/fix/oauth-refresh
(micro-fix): update oauth to refresh token
2026-02-09 19:30:49 -08:00
Richard Tang 8f632eb005 feat: add communication style guideline 2026-02-09 19:28:48 -08:00
Richard Tang c8ee961436 fix: update the step label to avoid confusion 2026-02-09 19:04:05 -08:00
Richard Tang bc9f6b0af8 feat: update goal negotiation for a more conversational negotiation 2026-02-09 18:52:07 -08:00
bryan 7d48f17867 model selection + max_tokens in quickstart 2026-02-09 18:07:57 -08:00
RichardTang-Aden 736ae65a1d Merge pull request #4262 from adenhq/feat/build-from-sample
Build from Sample Agent
2026-02-09 16:05:42 -08:00
Bryan @ Aden 76c9f7c9a9 Merge pull request #1834 from fermano/feat/observability-trace-context
feat(observability): structured logging for trace context
2026-02-09 15:25:51 -08:00
Fernando Mano 32ad225d7f feat(observability): Adding OTel-compliant logging to L3 tool logs as introduced by #3715. -- remove redundant text from readme.md 2026-02-09 19:56:17 -03:00
bryan 7ae6f67470 updates to skills, renaming, suggested agents, remove changelog 2026-02-09 13:49:36 -08:00
Timothy @aden 594bceb8f5 Merge branch 'adenhq:main' into feature/hard-goal-negotiation 2026-02-09 12:28:19 -08:00
bryan 9dc0f48ec9 implemented building from sample agent template and updated deep research agent 2026-02-09 12:13:41 -08:00
Fernando Mano ce5a2d4a81 feat(observability): Adding OTel-compliant logging to L3 tool logs as introduced by #3715. -- remove line that would cause third-party loggers to log twice 2026-02-09 09:36:25 -03:00
Fernando Mano 7f489cee46 Merge branch 'main' into feat/observability-trace-context 2026-02-09 09:25:51 -03:00
Anjali Yadav 3c2d669a2f fix(credentials): correctly resolve integration_id in AdenCredentialResponse.from_dict (#3965)
* fix(credentials): respect integration_id in AdenCredentialResponse.from_dict

* style: fix forward reference annotation for Ruff
2026-02-09 17:52:55 +08:00
Timothy @aden ec36e96499 Merge pull request #4146 from TimothyZhang7/main
docs(release): release v0.4.2 - resumable sessions
2026-02-08 20:49:59 -08:00
Timothy 9ecd4980e4 chore: release v0.4.2 - resumable sessions
- Add comprehensive resumable session functionality
- Immediate pause with Ctrl+Z and /pause command
- Auto-save state on quit
- Session management with /resume and /sessions commands
- Full memory and conversation history restoration
- See CHANGELOG.md for complete list of changes
2026-02-08 20:44:36 -08:00
Timothy @aden 64446ff9b6 Merge pull request #4141 from TimothyZhang7/feature/resumable-sessions
Feature/resumable sessions

Release candidate for v0.4.2
2026-02-08 20:40:33 -08:00
Timothy e3d2262292 fix: quit timeout, and tui interactions 2026-02-08 20:30:30 -08:00
Timothy 891cfa387a Merge branch 'main' into feature/resumable-sessions 2026-02-08 19:46:30 -08:00
Timothy f0243fddf2 feat: session resumable states and checkpoint system 2026-02-08 19:42:02 -08:00
Bryan @ Aden 85ff8e364b Merge pull request #3828 from Sandeepa-git/docs/fix-contributing-typo
docs(contributing): fix formatting typo in issue link
2026-02-08 19:07:48 -08:00
Bryan @ Aden 75f1afe8e3 Merge pull request #3857 from Manudeserti/docs/add-deep-research-readme
docs: add missing README for Deep Research Agent
2026-02-08 19:07:40 -08:00
Bryan @ Aden 7b660311e5 Merge pull request #4025 from hamzanajam7/docs/fix-getting-started-project-structure
docs(getting-started): fix project structure tree for tools and mcp_server location
2026-02-08 18:44:24 -08:00
Bryan @ Aden 98a493296d Merge pull request #4026 from hamzanajam7/docs/add-contributing-link-readme
docs(readme): add Contributing link to Quick Links section
2026-02-08 18:43:23 -08:00
RichardTang-Aden bc2a42aed2 Merge pull request #3901 from Templar121/docs/clarify-hive-test-generation
docs: clarify test generation responsibility in hive skill
2026-02-08 14:22:31 -08:00
Gaurav kapur 8b501d9091 fix: write node outputs to memory before edge evaluation (#3599) (#3694)
* fix: write node outputs to memory before edge evaluation (#3599)

* test: add regression tests for conditional edge direct key access
2026-02-08 23:23:37 +08:00
Fernando Mano 0304b392b2 feat(observability): Adding OTel-compliant logging to L3 tool logs as introduced by #3715. 2026-02-07 19:52:03 -03:00
hamzanajam7 ae9b4e82fe docs(readme): add Contributing link to Quick Links section
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 14:52:50 -05:00
hamzanajam7 4bac5e4c46 docs(getting-started): fix project structure tree for tools and mcp_server location
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 14:49:04 -05:00
Fernando Mano c4d3400ec4 Merge main into feat/observability-trace-context; resolve execution_stream conflicts 2026-02-07 16:49:04 -03:00
Amit Kumar 6d0a3b952a feat(tools): add Apollo.io contact and company data enrichment integration (#3167)
Add Apollo.io MCP tool integration for B2B contact and company data
enrichment. Implements 4 MCP tools:
- apollo_enrich_person: Enrich contact by email, LinkedIn URL, or name+domain
- apollo_enrich_company: Enrich company by domain
- apollo_search_people: Search contacts with filters (titles, seniorities, etc.)
- apollo_search_companies: Search companies with filters (industries, size, etc.)

Features:
- Authentication via X-Api-Key header (APOLLO_API_KEY env var)
- Credential spec in dedicated apollo.py (follows repo pattern)
- Comprehensive error handling (401, 403, 404, 422, 429)
- Full test coverage (36 tests)

Closes #3061
2026-02-07 21:57:13 +08:00
Subhayan Mukherjee 873fcd5822 docs: clarify test generation responsibility in hive skill 2026-02-07 11:39:52 +05:30
RichardTang-Aden 2a98d3a489 Merge pull request #3890 from RichardTang-Aden/update-readme-gifs
docs(readme): quick fix for the doc links
2026-02-06 20:34:34 -08:00
Richard Tang b681ba03b1 chore: quick fix for the doc links 2026-02-06 20:32:20 -08:00
RichardTang-Aden fe775a36c0 Merge pull request #3887 from RichardTang-Aden/update-readme-gifs
feat: add video in the README
2026-02-06 20:21:48 -08:00
Timothy @aden 2df9adcb43 Merge pull request #3886 from TimothyZhang7/fix/quickstart-secret-key
fix(micro-fix): quickstart secret key setup
2026-02-06 20:21:06 -08:00
Richard Tang c756cbf6d5 feat: add video in the README 2026-02-06 20:20:53 -08:00
Timothy d0ac67c9d3 fix: quickstart secret key setup 2026-02-06 20:18:12 -08:00
Timothy 47cd55052f feat: hive-create needs to do some hard negotiation 2026-02-06 19:56:05 -08:00
bryan fb203b5bdf update oauth to refresh token 2026-02-06 19:43:30 -08:00
RichardTang-Aden 6ee47e243d Merge pull request #3876 from RichardTang-Aden/update-readme
Docs Update readme
2026-02-06 19:39:16 -08:00
Richard Tang c1844b7a9d docs: improve readme 2026-02-06 19:30:16 -08:00
Richard Tang 99a29e79e5 fix: fix the documentation python run to uv run 2026-02-06 19:22:16 -08:00
Richard Tang 589a66ef26 docs: remove unused docs 2026-02-06 19:19:49 -08:00
RichardTang-Aden 3f960763cb Merge pull request #3875 from RichardTang-Aden/update-readme
Update readme images
2026-02-06 19:08:46 -08:00
Richard Tang 15f8f3783c chore: update images 2026-02-06 19:07:47 -08:00
Richard Tang a2b045c7e3 chore: remove unnecessary links 2026-02-06 18:18:50 -08:00
Richard Tang 055cef2fdc feat: improve quickstart.sh messages 2026-02-06 18:15:13 -08:00
Timothy @aden 6c6c69cbc3 Merge pull request #3872 from TimothyZhang7/refactor/consolidate-multi-level-log-for-tui
docs(path): Align Agent Storage Path to .hive/agents/{agent_name}/
2026-02-06 17:40:39 -08:00
Timothy 6fe0062e6e refactor(path): consolidate tui runner log path 2026-02-06 17:33:32 -08:00
Richard Tang 26b8b2f448 chore: move unused docs 2026-02-06 17:11:13 -08:00
Timothy @aden 7e40d6950a Merge pull request #3871 from TimothyZhang7/main
fix(micro-fix): uv paths in templates
2026-02-06 17:07:19 -08:00
Timothy 590bfa92cb chore: fix mcp server default config 2026-02-06 17:04:03 -08:00
Timothy f0e89a1720 fix: mcp server config with uv 2026-02-06 17:01:42 -08:00
Timothy @aden 575563b1e8 Merge pull request #3870 from adenhq/feat/multi-level-logging
fix: hardening hive cli setup
2026-02-06 16:37:37 -08:00
RichardTang-Aden 2f57ca10f7 Merge pull request #3862 from adenhq/feat/hive-tui
(micro-fix): documentation update
2026-02-06 16:19:46 -08:00
RichardTang-Aden 75c2d541c4 Merge branch 'main' into feat/hive-tui 2026-02-06 16:19:30 -08:00
Richard Tang b666f8b50b docs: minor doc update 2026-02-06 16:16:56 -08:00
RichardTang-Aden 09f9322676 Merge pull request #3863 from RichardTang-Aden/fix-remove-old-mock-mode
Fix remove old mock mode
2026-02-06 16:02:01 -08:00
Richard Tang f9a864ef93 fix: remove mock mode in the template 2026-02-06 15:59:48 -08:00
Richard Tang 27f28afe9c fix: remove --mock in the codebase + documentation 2026-02-06 15:59:22 -08:00
Timothy @aden 8f85722fef Merge pull request #3715 from adenhq/feat/multi-level-logging
Feat/multi level logging
2026-02-06 15:59:16 -08:00
bryan 5588445a01 documentation update 2026-02-06 15:59:01 -08:00
Timothy @aden cee632f50c Merge pull request #3855 from adenhq/feat/hive-tui
update tui to support menu, highlight/copy, update quickstart
2026-02-06 15:24:10 -08:00
Manudeserti 3f6bdda2a0 docs: add missing README for deep_research_agent 2026-02-06 18:11:00 -03:00
RichardTang-Aden 51e81d80fc Merge pull request #3853 from adenhq/docs-key-concepts
Docs key concepts
2026-02-06 12:45:16 -08:00
Richard Tang cd014e41e4 docs: update links in the README.md 2026-02-06 12:44:34 -08:00
Richard Tang 830f11c47d docs: add key concept section 2026-02-06 12:41:22 -08:00
Sandeepa f2492bd4d4 docs(contributing): fix formatting typo in issue link 2026-02-07 00:22:48 +05:30
Timothy @aden b22be7a6cb Merge pull request #3818 from TimothyZhang7/main
(micro-fix)(skills): cursor skill symlinks to claude skill
2026-02-06 09:32:23 -08:00
Timothy 433967f0cf fix: cursor skill symlinks to claude skill 2026-02-05 18:11:24 -08:00
Fernando Mano 9d156325e0 Merge branch 'main' into feat/observability-trace-context 2026-02-05 17:06:07 -03:00
Fernando Mano 4310852ee6 chore: Merge branch 'main' into feat/observability-trace-context 2026-01-30 15:09:54 -03:00
Fernando Mano 853f1e9873 chore: Merge remote-tracking branch 'refs/remotes/origin/feat/observability-trace-context' into feat/observability-trace-context 2026-01-28 16:52:38 -03:00
Fernando Mano ae5fe84fb2 feat(observability): Structured logging with automatic trace context propagation -- fix ruff formatting errors 2026-01-28 15:04:06 -03:00
Fernando Mano 92b538d5ae Merge branch 'adenhq:main' into feat/observability-trace-context 2026-01-28 14:52:37 -03:00
Fernando Mano 5351703949 feat(observability): Structured logging with automatic trace context propagation -- fix lint error 2026-01-28 14:52:02 -03:00
Fernando Mano 7ba8169444 feat(observability): Structured logging with automatic trace context propagation -- remove colored logs for some cases when in prod mode 2026-01-28 12:46:54 -03:00
Fernando Mano d090c954ae feat(observability): Structured logging with automatic trace context propagation -- adjust all logs to print full uuids when in prod mode and include documentation 2026-01-28 12:31:11 -03:00
Fernando Mano 9bee1666f1 chore: Merge branch 'main' into feat/observability-trace-context 2026-01-28 11:35:13 -03:00
Fernando Mano fb94637339 feat(observability): Structured logging with automatic trace context propagation 2026-01-28 11:27:24 -03:00
95 changed files with 7565 additions and 807 deletions
+508 -47
@@ -1,10 +1,10 @@
---
name: hive-create
description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
description: Step-by-step guide for building goal-driven agents. Qualifies use cases first (the good, bad, and ugly), then creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
license: Apache-2.0
metadata:
author: hive
version: "2.1"
version: "2.2"
type: procedural
part_of: hive
requires: hive-concepts
@@ -14,66 +14,427 @@ metadata:
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.**
**CRITICAL: DO NOT explore the codebase, read source files, or search for code before starting.** All context you need is in this skill file. When this skill is loaded, IMMEDIATELY begin executing Step 1 — call the MCP tools listed in Step 1 as your FIRST action. Do not explain what you will do, do not investigate the project structure, do not read any files — just execute Step 1 now.
**CRITICAL: DO NOT explore the codebase, read source files, or search for code before starting.** All context you need is in this skill file. When this skill is loaded, IMMEDIATELY begin executing Step 0 — determine the build path as your FIRST action. Do not explain what you will do, do not investigate the project structure, do not read any files — just execute Step 0 now.
---
## STEP 1: Initialize Build Environment
## STEP 0: Choose Build Path
**If the user has already indicated whether they want to build from scratch or from a template, skip this question and proceed to the appropriate step.**
Otherwise, ask:
```
AskUserQuestion(questions=[{
"question": "How would you like to build your agent?",
"header": "Build Path",
"options": [
{"label": "From scratch", "description": "Design goal, nodes, and graph collaboratively from nothing"},
{"label": "From a template", "description": "Start from a working sample agent and customize it"}
],
"multiSelect": false
}])
```
- If **From scratch**: Proceed to STEP 1A
- If **From a template**: Proceed to STEP 1B
---
## STEP 1A: Initialize Build Environment (From Scratch)
**EXECUTE THESE TOOL CALLS NOW** (silent setup — no user interaction needed):
1. Register the hive-tools MCP server:
1. Check for existing sessions:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="python",
args='["mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
mcp__agent-builder__list_sessions()
```
- If a session with this agent name already exists, load it with `mcp__agent-builder__load_session_by_id(session_id="...")` and skip to step 3.
- If no matching session exists, proceed to step 2. (A sketch of this resume-or-create flow appears at the end of STEP 1A.)
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
```
mcp__agent-builder__create_session(name="AGENT_NAME")
```
3. Discover available tools:
3. Register the hive-tools MCP server:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="uv",
args='["run", "python", "mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
```
4. Discover available tools:
```
mcp__agent-builder__list_mcp_tools()
```
4. Create the package directory:
5. Create the package directory:
```bash
mkdir -p exports/AGENT_NAME/nodes
```
**Save the tool list for step 3** — you will need it for node design in STEP 3.
**Save the tool list for STEP 4** — you will need it for node design.
**THEN immediately proceed to STEP 2** (do NOT display setup results to the user — just move on).
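For illustration, the resume-or-create decision in steps 1-2 can be sketched in Python (a minimal sketch only: `mcp` is a hypothetical client wrapper for the `mcp__agent-builder__*` tools, and the session record fields are assumed):
```python
# Minimal sketch of the resume-or-create flow; `mcp` is a hypothetical
# client wrapper and the session record fields ("name", "id") are assumed.
def init_build_session(mcp, agent_name: str) -> None:
    sessions = mcp.list_sessions()
    existing = next((s for s in sessions if s.get("name") == agent_name), None)
    if existing:
        # A matching session already exists: load it rather than duplicating.
        mcp.load_session_by_id(session_id=existing["id"])
    else:
        # No match: start a fresh build session.
        mcp.create_session(name=agent_name)
```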
---
## STEP 1B: Initialize Build Environment (From Template)
**EXECUTE THESE STEPS NOW:**
### 1B.1: Discover available templates
List the template directories and read each template's `agent.json` to get its name and description:
```bash
ls examples/templates/
```
For each directory found, read `examples/templates/TEMPLATE_DIR/agent.json` with the Read tool and extract:
- `agent.name` — the template's display name
- `agent.description` — what the template does
### 1B.2: Present templates to user
Show the user a table of available templates:
> **Available Templates:**
>
> | # | Template | Description |
> |---|----------|-------------|
> | 1 | [name from agent.json] | [description from agent.json] |
> | 2 | ... | ... |
Then ask the user to pick a template and provide a name for their new agent:
```
AskUserQuestion(questions=[{
"question": "Which template would you like to start from?",
"header": "Template",
"options": [
{"label": "[template 1 name]", "description": "[template 1 description]"},
{"label": "[template 2 name]", "description": "[template 2 description]"},
...
],
"multiSelect": false
}, {
"question": "What should the new agent be named? (snake_case)",
"header": "Agent Name",
"options": [
{"label": "Use template name", "description": "Keep the original template name as-is"},
{"label": "Custom name", "description": "I'll provide a new snake_case name"}
],
"multiSelect": false
}])
```
### 1B.3: Copy template to exports
```bash
cp -r examples/templates/TEMPLATE_DIR exports/NEW_AGENT_NAME
```
### 1B.4: Create session and register MCP (same logic as STEP 1A)
First, check for existing sessions:
```
mcp__agent-builder__list_sessions()
```
- If a session with this agent name already exists, load it with `mcp__agent-builder__load_session_by_id(session_id="...")` and skip to `list_mcp_tools`.
- If no matching session exists, create one:
```
mcp__agent-builder__create_session(name="NEW_AGENT_NAME")
```
Then register MCP and discover tools:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="uv",
args='["run", "python", "mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
```
```
mcp__agent-builder__list_mcp_tools()
```
### 1B.5: Load template into builder session
Import the entire agent definition in one call:
```
mcp__agent-builder__import_from_export(agent_json_path="exports/NEW_AGENT_NAME/agent.json")
```
This reads the agent.json and populates the builder session with the goal, all nodes, and all edges.
**THEN immediately proceed to STEP 2.**
---
## STEP 2: Define Goal Together with User
**A responsible engineer doesn't jump into building. First, understand the problem and be transparent about what the framework can and cannot do.**
**If starting from a template**, the goal is already loaded in the builder session. Present the existing goal to the user using the format below and ask for approval. Skip the collaborative drafting questions — go straight to presenting and asking "Do you approve this goal, or would you like to modify it?"
**If the user has NOT already described what they want to build**, start by asking what kind of agent they have in mind:
```
AskUserQuestion(questions=[{
"question": "What kind of agent do you want to build? Select an option below, or choose 'Other' to describe your own.",
"header": "Agent type",
"options": [
{"label": "Data collection", "description": "Gathers information from the web, analyzes it, and produces a report or sends outreach (e.g. market research, news digest, email campaigns, competitive analysis)"},
{"label": "Workflow automation", "description": "Automates a multi-step business process end-to-end (e.g. lead qualification, content publishing pipeline, data entry)"},
{"label": "Personal assistant", "description": "Handles recurring tasks or monitors for events and acts on them (e.g. daily briefings, meeting prep, file organization)"}
],
"multiSelect": false
}])
```
Use the user's selection (or their custom description if they chose "Other") as context when shaping the goal below. If the user already described what they want before this step, skip the question and proceed directly.
**DO NOT propose a complete goal on your own.** Instead, collaborate with the user to define it.
**START by asking the user to help shape the goal:**
### 2a: Fast Discovery (3-8 Turns)
> I've set up the build environment and discovered [N] available tools. Let's define the goal for your agent together.
>
> To get started, can you help me understand:
>
> 1. **What should this agent accomplish?** (the core purpose)
> 2. **How will we know it succeeded?** (what does "done" look like)
> 3. **Are there any hard constraints?** (things it must never do, quality bars, etc.)
**The core principle**: Discovery should feel like progress, not paperwork. The stakeholder should walk away feeling like you understood them faster than anyone else would have.
**WAIT for the user to respond.** Use their input to draft:
**Communication style**: Be concise. Say less. Mean more. Impatient stakeholders don't want a wall of text — they want to know you get it. Every sentence you say should either move the conversation forward or prove you understood something. If it does neither, cut it.
**Ask Question Rules: Respect Their Time.** Every question must earn its place by:
1. **Preventing a costly wrong turn** — you're about to build the wrong thing
2. **Unlocking a shortcut** — their answer lets you simplify the design
3. **Surfacing a dealbreaker** — there's a constraint that changes everything
4. **Providing options** — offer options with your questions when possible, but always allow the user to type something beyond them.
If a question doesn't do one of these, don't ask it. Make an assumption, state it, and move on.
---
#### 2a.1: Let Them Talk, But Listen Like an Architect
When the stakeholder describes what they want, don't just hear the words — listen for the architecture underneath. While they talk, mentally construct:
- **The actors**: Who are the people/systems involved?
- **The trigger**: What kicks off the workflow?
- **The core loop**: What's the main thing that happens repeatedly?
- **The output**: What's the valuable thing produced at the end?
- **The pain**: What about today's situation is broken, slow, or missing?
You are extracting a **domain model** from natural language in real time. Most stakeholders won't give you this structure explicitly — they'll give you a story. Your job is to hear the structure inside the story.
| They say... | You're hearing... |
|-------------|-------------------|
| Nouns they repeat | Your entities |
| Verbs they emphasize | Your core operations |
| Frustrations they mention | Your design constraints |
| Workarounds they describe | What the system must replace |
| People they name | Your user types |
---
#### 2a.2: Use Domain Knowledge to Fill In the Blanks
You have broad knowledge of how systems work. Use it aggressively.
If they say "I need a research agent," you already know it probably involves: search, summarization, source tracking, and iteration. Don't ask about each — use them as your starting mental model and let their specifics override your defaults.
If they say "I need to monitor files and alert me," you know this probably involves: watch patterns, triggers, notifications, and state tracking.
**The key move**: Take your general knowledge of the domain and merge it with the specifics they've given you. The result is a draft understanding that's 60-80% right before you've asked a single question. Your questions close the remaining 20-40%.
---
#### 2a.3: Play Back a Proposed Model (Not a List of Questions)
After listening, present a **concrete picture** of what you think they need. Make it specific enough that they can spot what's wrong.
**Pattern: "Here's what I heard — tell me where I'm off"**
> "OK here's how I'm picturing this: [User type] needs to [core action]. Right now they're [current painful workflow]. What you want is [proposed solution that replaces the pain].
>
> The way I'd structure this: [key entities] connected by [key relationships], with the main flow being [trigger → steps → outcome].
>
> For the MVP, I'd focus on [the one thing that delivers the most value] and hold off on [things that can wait].
>
> Before I start — [1-2 specific questions you genuinely can't infer]."
Why this works:
- **Proves you were listening** — they don't feel like they have to repeat themselves
- **Shows competence** — you're already thinking in systems
- **Fast to correct** — "no, it's more like X" takes 10 seconds vs. answering 15 questions
- **Creates momentum** — heading toward building, not more talking
---
#### 2a.4: Ask Only What You Cannot Infer
Your questions should be **narrow, specific, and consequential**. Never ask what you could answer yourself.
**Good questions** (high-stakes, can't infer):
- "Who's the primary user — you or your end customers?"
- "Is this replacing a spreadsheet, or is there literally nothing today?"
- "Does this need to integrate with anything, or standalone?"
- "Is there existing data to migrate, or starting fresh?"
**Bad questions** (low-stakes, inferable):
- "What should happen if there's an error?" *(handle gracefully, obviously)*
- "Should it have search?" *(if there's a list, yes)*
- "How should we handle permissions?" *(follow standard patterns)*
- "What tools should I use?" *(your call, not theirs)*
---
#### Conversation Flow (3-5 Turns)
| Turn | Who | What |
|------|-----|------|
| 1 | User | Describes what they need |
| 2 | Agent | Plays back understanding as a proposed model. Asks 1-2 critical questions max. |
| 3 | User | Corrects, confirms, or adds detail |
| 4 | Agent | Adjusts model, confirms MVP scope, states assumptions, declares starting point |
| *(5)* | *(Only if Turn 3 revealed something that fundamentally changes the approach)* |
**AFTER the conversation, IMMEDIATELY proceed to 2b. DO NOT skip to building.**
---
#### Anti-Patterns
| Don't | Do Instead |
|-------|------------|
| Open with a list of questions | Open with what you understood from their request |
| "What are your requirements?" | "Here's what I think you need — am I right?" |
| Ask about every edge case | Handle with smart defaults, flag in summary |
| 10+ turn discovery conversation | 3-8 turns. Start building, iterate with real software. |
| Being lazy and not understanding what the user wants to achieve | Understand the "what" and the "why" |
| Ask for permission to start | State your plan and start |
| Wait for certainty | Start at 80% confidence, iterate the rest |
| Ask what tech/tools to use | That's your job. Decide, disclose, move on. |
---
### 2b: Capability Assessment
**After the user responds, analyze the fit.** Present this assessment honestly:
> **Framework Fit Assessment**
>
> Based on what you've described, here's my honest assessment of how well this framework fits your use case:
>
> **What Works Well (The Good):**
> - [List 2-4 things the framework handles well for this use case]
> - Examples: multi-turn conversations, human-in-the-loop review, tool orchestration, structured outputs
>
> **Limitations to Be Aware Of (The Bad):**
> - [List 2-3 limitations that apply but are workable]
> - Examples: LLM latency means not suitable for sub-second responses, context window limits for very large documents, cost per run for heavy tool usage
>
> **Potential Deal-Breakers (The Ugly):**
> - [List any significant challenges or missing capabilities — be honest]
> - Examples: no tool available for X, would require custom MCP server, framework not designed for Y
**Be specific.** Reference the actual tools discovered in Step 1. If the user needs `send_email` but it's not available, say so. If they need real-time streaming from a database, explain that's not how the framework works.
### 2c: Gap Analysis
**Identify specific gaps** between what the user wants and what you can deliver:
| Requirement | Framework Support | Gap/Workaround |
|-------------|-------------------|----------------|
| [User need] | [✅ Supported / ⚠️ Partial / ❌ Not supported] | [How to handle or why it's a problem] |
**Examples of gaps to identify:**
- Missing tools (user needs X, but only Y and Z are available)
- Scope issues (user wants to process 10,000 items, but LLM rate limits apply)
- Interaction mismatches (user wants CLI-only, but agent is designed for TUI)
- Data flow issues (user needs to persist state across runs, but sessions are isolated)
- Latency requirements (user needs instant responses, but LLM calls take seconds)
### 2d: Recommendation
**Give a clear recommendation:**
> **My Recommendation:**
>
> [One of these three:]
>
> **✅ PROCEED** — This is a good fit. The framework handles your core needs well. [List any minor caveats.]
>
> **⚠️ PROCEED WITH SCOPE ADJUSTMENT** — This can work, but we should adjust: [specific changes]. Without these adjustments, you'll hit [specific problems].
>
> **🛑 RECONSIDER** — This framework may not be the right tool for this job because [specific reasons]. Consider instead: [alternatives — simpler script, different framework, custom solution].
### 2e: Get Explicit Acknowledgment
**CALL AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Based on this assessment, how would you like to proceed?",
"header": "Proceed",
"options": [
{"label": "Proceed as described", "description": "I understand the limitations, let's build it"},
{"label": "Adjust scope", "description": "Let's modify the requirements to fit better"},
{"label": "More questions", "description": "I have questions about the assessment"},
{"label": "Reconsider", "description": "Maybe this isn't the right approach"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Proceed**: Move to STEP 3
- If **Adjust scope**: Discuss what to change, update your notes, re-assess if needed
- If **More questions**: Answer them honestly, then ask again
- If **Reconsider**: Discuss alternatives. If they decide to proceed anyway, that's their informed choice
---
## STEP 3: Define Goal Together with User
**Now that the use case is qualified, collaborate on the goal definition.**
**START by synthesizing what you learned:**
> Based on our discussion, here's my understanding of the goal:
>
> **Core purpose:** [what you understood from 2a]
> **Success looks like:** [what you inferred]
> **Key constraints:** [what you inferred]
>
> Let me refine this with you:
>
> 1. **What should this agent accomplish?** (confirm or correct my understanding)
> 2. **How will we know it succeeded?** (what specific outcomes matter)
> 3. **Are there any hard constraints?** (things it must never do, quality bars)
**WAIT for the user to respond.** Use their input (and the agent type they selected) to draft:
- Goal ID (kebab-case)
- Goal name
@@ -115,12 +476,14 @@ AskUserQuestion(questions=[{
**WAIT for user response.**
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 3
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 4
- If **Modify**: Ask what they want to change, update the draft, ask again
---
## STEP 3: Design Conceptual Nodes
## STEP 4: Design Conceptual Nodes
**If starting from a template**, the nodes are already loaded in the builder session. Present the existing nodes using the table format below and ask for approval. Skip the design phase.
**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist.
@@ -173,12 +536,14 @@ AskUserQuestion(questions=[{
**WAIT for user response.**
- If **Approve**: Proceed to STEP 4
- If **Approve**: Proceed to STEP 5
- If **Modify**: Ask what they want to change, update design, ask again
---
## STEP 4: Design Full Graph and Review
## STEP 5: Design Full Graph and Review
**If starting from a template**, the edges are already loaded in the builder session. Render the existing graph as ASCII art and present it to the user for approval. Skip the edge design phase.
**DETERMINE the edges** connecting the approved nodes. For each edge:
@@ -288,16 +653,37 @@ AskUserQuestion(questions=[{
**WAIT for user response.**
- If **Approve**: Proceed to STEP 5
- If **Approve**: Proceed to STEP 6
- If **Modify**: Ask what they want to change, update the graph, re-render, ask again
---
## STEP 5: Build the Agent
## STEP 6: Build the Agent
**NOW — and only now — write the actual code.** The user has approved the goal, nodes, and graph.
### 5a: Register nodes and edges with MCP
### 6a: Register nodes and edges with MCP
**If starting from a template**, the copied files will be overwritten with the approved design. You MUST replace every occurrence of the old template name with the new agent name. Here is the complete checklist — miss NONE of these:
| File | What to rename |
|------|---------------|
| `config.py` | `AgentMetadata.name` — the display name shown in TUI agent selection |
| `config.py` | `AgentMetadata.description` — agent description |
| `agent.py` | Module docstring (line 1) |
| `agent.py` | `class OldNameAgent:` → `class NewNameAgent:` |
| `agent.py` | `GraphSpec(id="old-name-graph")` → `GraphSpec(id="new-name-graph")` — shown in TUI status bar |
| `agent.py` | Storage path: `Path.home() / ".hive" / "agents" / "old_name"` → `"new_name"` |
| `__main__.py` | Module docstring (line 1) |
| `__main__.py` | `from .agent import ... OldNameAgent` → `NewNameAgent` |
| `__main__.py` | CLI help string in `def cli()` docstring |
| `__main__.py` | All `OldNameAgent()` instantiations |
| `__main__.py` | Storage path (duplicated from agent.py) |
| `__main__.py` | Shell banner string (e.g. `"=== Old Name Agent ==="`) |
| `__init__.py` | Package docstring |
| `__init__.py` | `from .agent import OldNameAgent` import |
| `__init__.py` | `__all__` list entry |
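After renaming, a quick scan can cross-check the table above (a minimal sketch, not part of the skill; the old/new names are placeholders):
```python
from pathlib import Path

# Placeholders: substitute the real template name and new agent name.
OLD_VARIANTS = ["old_name", "old-name", "OldName"]  # snake_case, kebab-case, ClassName
EXPORT_DIR = Path("exports/new_agent_name")

for path in EXPORT_DIR.rglob("*"):
    if path.is_file():
        text = path.read_text(errors="ignore")
        for needle in OLD_VARIANTS:
            if needle in text:
                print(f"{path}: still contains {needle!r}")
```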
**If starting from a template and no modifications were made in Steps 2-5**, the nodes and edges are already registered. Skip to validation (`mcp__agent-builder__validate_graph()`). If modifications were made, re-register the changed nodes/edges (the MCP tools handle duplicates by overwriting).
**FOR EACH approved node**, call:
@@ -337,9 +723,9 @@ mcp__agent-builder__validate_graph()
```
- If invalid: Fix the issues and re-validate
- If valid: Continue to 5b
- If valid: Continue to 6b
### 5b: Write Python package files
### 6b: Write Python package files
**EXPORT the graph data:**
@@ -363,6 +749,24 @@ mcp__agent-builder__export_graph()
- NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
- NOT: `{"first-node-id"}` (WRONG - this is a set)
**IMPORTANT mcp_servers.json format:**
```json
{
"hive-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../tools",
"description": "Hive tools MCP server"
}
}
```
- NO `"mcpServers"` wrapper (that's Claude Desktop format, NOT hive format)
- `cwd` MUST be `"../../tools"` (relative from `exports/AGENT_NAME/` to `tools/`)
- `command` MUST be `"uv"` with `"args": ["run", "python", ...]` (NOT bare `"python"` which fails on Mac)
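A quick sanity check for these rules (a minimal sketch; substitute your agent name in the path):
```python
import json
from pathlib import Path

# Validate exports/AGENT_NAME/mcp_servers.json against the rules above.
config = json.loads(Path("exports/AGENT_NAME/mcp_servers.json").read_text())

assert "mcpServers" not in config, "flat format required: no Claude Desktop wrapper"
server = config["hive-tools"]
assert server["command"] == "uv", "command must be 'uv', not bare 'python'"
assert server["args"][:2] == ["run", "python"], "args must start with ['run', 'python']"
assert server["cwd"] == "../../tools", "cwd must be '../../tools'"
```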
**Use the example agent** at `.claude/skills/hive-create/examples/deep_research_agent/` as a template for file structure and patterns. It demonstrates: STEP 1/STEP 2 prompts, client-facing nodes, feedback loops, nullable_output_keys, and data tools.
**AFTER writing all files, tell the user:**
@@ -381,7 +785,7 @@ mcp__agent-builder__export_graph()
---
## STEP 6: Verify and Test
## STEP 7: Verify and Test
**RUN validation:**
@@ -407,7 +811,9 @@ cd /home/timothy/oss/hive && PYTHONPATH=exports uv run python -m AGENT_NAME vali
│ │
│ 2. RUN YOUR AGENT: │
│ │
PYTHONPATH=core:exports python -m AGENT_NAME tui
hive tui
│ │
│ Then select your agent from the list and press Enter. │
│ │
│ 3. DEBUG ANY ISSUES: │
│ │
@@ -487,7 +893,7 @@ EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. Both direct `
from framework.graph.executor import GraphExecutor
from framework.runtime.core import Runtime
storage_path = Path.home() / ".hive" / "my_agent"
storage_path = Path.home() / ".hive" / "agents" / "my_agent"
storage_path.mkdir(parents=True, exist_ok=True)
runtime = Runtime(storage_path)
@@ -505,15 +911,70 @@ result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
---
## REFERENCE: Framework Capabilities for Qualification
Use this reference during STEP 2 to give accurate, honest assessments.
### What the Framework Does Well (The Good)
| Capability | Description |
|------------|-------------|
| Multi-turn conversations | Client-facing nodes stream to users and block for input |
| Human-in-the-loop review | Approval checkpoints with feedback loops back to earlier nodes |
| Tool orchestration | LLM can call multiple tools, framework handles execution |
| Structured outputs | `set_output` produces validated, typed outputs |
| Parallel execution | Fan-out/fan-in for concurrent node execution |
| Context management | Automatic compaction and spillover for large data |
| Error recovery | Retry logic, judges, and feedback edges for self-correction |
| Session persistence | State saved to disk, resumable sessions |
### Framework Limitations (The Bad)
| Limitation | Impact | Workaround |
|------------|--------|------------|
| LLM latency | 2-10+ seconds per turn | Not suitable for real-time/low-latency needs |
| Context window limits | ~128K tokens max | Use data tools for spillover, design for chunking |
| Cost per run | LLM API calls cost money | Budget planning, caching where possible |
| Rate limits | API throttling on heavy usage | Backoff, queue management |
| Node boundaries lose context | Outputs must be serialized | Prefer fewer, richer nodes |
| Single-threaded within node | One LLM call at a time per node | Use fan-out for parallelism |
### Not Designed For (The Ugly)
| Use Case | Why It's Problematic | Alternative |
|----------|---------------------|-------------|
| Long-running daemons | Framework is request-response, not persistent | External scheduler + agent |
| Sub-second responses | LLM latency is inherent | Traditional code, no LLM |
| Processing millions of items | Context windows and rate limits | Batch processing + sampling |
| Real-time streaming data | No built-in pub/sub or streaming input | Custom MCP server + agent |
| Guaranteed determinism | LLM outputs vary | Function nodes for deterministic parts |
| Offline/air-gapped | Requires LLM API access | Local models (not currently supported) |
| Multi-user concurrency | Single-user session model | Separate agent instances per user |
### Tool Availability Reality Check
**Before promising any capability, check `list_mcp_tools()`.** Common gaps:
- **Email**: May not have `send_email` — check before promising email automation
- **Calendar**: May not have calendar APIs — check before promising scheduling
- **Database**: May not have SQL tools — check before promising data queries
- **File system**: Has data tools but not arbitrary filesystem access
- **External APIs**: Depends entirely on what MCP servers are registered
---
## COMMON MISTAKES TO AVOID
1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
3. **Skipping validation** - Always validate nodes and graph before proceeding
4. **Not waiting for approval** - Always ask user before major steps
5. **Displaying this file** - Execute the steps, don't show documentation
6. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
7. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
8. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
9. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
10. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
1. **Skipping use case qualification** - A responsible engineer qualifies the use case BEFORE building. Be transparent about what works, what doesn't, and what's problematic
2. **Hiding limitations** - Don't oversell the framework. If a tool doesn't exist or a capability is missing, say so upfront
3. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
4. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
5. **Skipping validation** - Always validate nodes and graph before proceeding
6. **Not waiting for approval** - Always ask user before major steps
7. **Displaying this file** - Execute the steps, don't show documentation
8. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
9. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
10. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
11. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
12. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
13. **Wrong mcp_servers.json format** - Use flat format (no `"mcpServers"` wrapper), `cwd` must be `"../../tools"`, and `command` must be `"uv"` with args `["run", "python", ...]`
@@ -90,7 +90,7 @@ def tui(mock, verbose, debug):
agent._event_bus = EventBus()
agent._tool_registry = ToolRegistry()
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
@@ -177,7 +177,7 @@ class DeepResearchAgent:
"""Set up the executor with all components."""
from pathlib import Path
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
self._event_bus = EventBus()
@@ -1,33 +1,8 @@
"""Runtime configuration."""
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
from dataclasses import dataclass
from framework.config import RuntimeConfig
default_config = RuntimeConfig()
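For context, the removed `_load_preferred_model` helper above read `~/.hive/configuration.json` with this shape (inferred from its parsing logic; the values are illustrative):
```json
{
  "llm": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514"
  }
}
```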
@@ -1,8 +1,8 @@
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, and write_to_file"
}
+94 -9
@@ -34,7 +34,7 @@ Before using this skill, ensure:
1. You have an exported agent in `exports/{agent_name}/`
2. The agent has been run at least once (logs exist)
3. Runtime logging is enabled (default in Hive framework)
4. You have access to the agent's working directory at `~/.hive/{agent_name}/`
4. You have access to the agent's working directory at `~/.hive/agents/{agent_name}/`
---
@@ -51,7 +51,7 @@ Before using this skill, ensure:
- Confirm the agent exists in `exports/{agent_name}/`
2. **Determine agent working directory:**
- Calculate: `~/.hive/{agent_name}/`
- Calculate: `~/.hive/agents/{agent_name}/`
- Verify this directory exists and contains session logs
3. **Read agent configuration:**
@@ -495,11 +495,96 @@ max_node_visits=3 # Prevent getting stuck
- Confirm it calls set_output eventually
```
#### Template 6: Checkpoint Recovery (Post-Fix Resume)
```markdown
## Recovery Strategy: Resume from Last Clean Checkpoint
**Situation:** You've fixed the issue, but the failed session is stuck mid-execution
**Solution:** Resume execution from a checkpoint before the failure
### Option A: Auto-Resume from Latest Checkpoint (Recommended)
Use CLI arguments to auto-resume when launching TUI:
```bash
PYTHONPATH=core:exports python -m {agent_name} --tui \
--resume-session {session_id}
```
This will:
- Load session state from `state.json`
- Continue from where it paused/failed
- Apply your fixes immediately
### Option B: Resume from Specific Checkpoint (Time-Travel)
If you need to go back to an earlier point:
```bash
PYTHONPATH=core:exports python -m {agent_name} --tui \
--resume-session {session_id} \
--checkpoint {checkpoint_id}
```
Example:
```bash
PYTHONPATH=core:exports python -m deep_research_agent --tui \
--resume-session session_20260208_143022_abc12345 \
--checkpoint cp_node_complete_intake_143030
```
### Option C: Use TUI Commands
Alternatively, launch TUI normally and use commands:
```bash
# Launch TUI
PYTHONPATH=core:exports python -m {agent_name} --tui
# In TUI, use commands:
/resume {session_id} # Resume from session state
/recover {session_id} {checkpoint_id} # Recover from specific checkpoint
```
### When to Use Each Option:
**Use `/resume` (or --resume-session) when:**
- You fixed credentials and want to retry
- Agent paused and you want to continue
- Agent failed and you want to retry from last state
**Use `/recover` (or --resume-session + --checkpoint) when:**
- You need to go back to an earlier checkpoint
- You want to try a different path from a specific point
- Debugging requires time-travel to earlier state
### Find Available Checkpoints:
```bash
# In TUI:
/sessions {session_id}
# This shows all checkpoints with timestamps:
Available Checkpoints: (3)
1. cp_node_complete_intake_143030
2. cp_node_complete_research_143115
3. cp_pause_research_143130
```
**Verification:**
- Use `--resume-session` to test your fix immediately
- No need to re-run from the beginning
- Session continues with your code changes applied
```
**Selecting the right template:**
- Match the issue category from Stage 4
- Customize with specific details from Stage 5
- Include actual error messages and code snippets
- Provide file paths and line numbers when possible
- **Always include recovery commands** (Template 6) after providing fix recommendations
---
@@ -532,7 +617,7 @@ max_node_visits=3 # Prevent getting stuck
**Check if issue is resolved:**
```
query_runtime_logs(
agent_work_dir="~/.hive/{agent_name}",
agent_work_dir="~/.hive/agents/{agent_name}",
status="needs_attention",
limit=5
)
@@ -542,7 +627,7 @@ max_node_visits=3 # Prevent getting stuck
**Verify specific node behavior:**
```
query_runtime_log_details(
agent_work_dir="~/.hive/{agent_name}",
agent_work_dir="~/.hive/agents/{agent_name}",
run_id="{new_run_id}",
node_id="{fixed_node_id}"
)
@@ -671,7 +756,7 @@ You: "I'll help debug the twitter_outreach agent. Let me gather context..."
Context:
- Agent: twitter_outreach
- Goal: twitter-outreach-multi-loop
- Working Dir: ~/.hive/twitter_outreach
- Working Dir: ~/.hive/agents/twitter_outreach
- Success Criteria: ["Successfully send 5 personalized outreach messages"]
- Constraints: ["Must verify handle exists", "Must personalize message"]
- Nodes: intake-collector, profile-analyzer, message-composer, outreach-sender
@@ -834,12 +919,12 @@ Your agent should now work correctly!"
## Storage Locations Reference
**New unified storage (default):**
- Logs: `~/.hive/{agent_name}/sessions/session_YYYYMMDD_HHMMSS_{uuid}/logs/`
- State: `~/.hive/{agent_name}/sessions/{session_id}/state.json`
- Conversations: `~/.hive/{agent_name}/sessions/{session_id}/conversations/`
- Logs: `~/.hive/agents/{agent_name}/sessions/session_YYYYMMDD_HHMMSS_{uuid}/logs/`
- State: `~/.hive/agents/{agent_name}/sessions/{session_id}/state.json`
- Conversations: `~/.hive/agents/{agent_name}/sessions/{session_id}/conversations/`
**Old storage (deprecated, still supported):**
- Logs: `~/.hive/{agent_name}/runtime_logs/runs/{run_id}/`
- Logs: `~/.hive/agents/{agent_name}/runtime_logs/runs/{run_id}/`
The MCP tools automatically check both locations.
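That two-location fallback can be sketched as follows (a minimal sketch mirroring, not reproducing, the MCP tools; the path layout comes from the lists above):
```python
from pathlib import Path

def find_log_roots(agent_name: str) -> list[Path]:
    """Return existing log locations, preferring the new unified layout."""
    base = Path.home() / ".hive" / "agents" / agent_name
    candidates = [
        base / "sessions",               # new: per-session logs under sessions/
        base / "runtime_logs" / "runs",  # old: deprecated runs/ layout
    ]
    return [p for p in candidates if p.is_dir()]
```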
+33 -19
@@ -19,14 +19,18 @@ metadata:
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT explore the codebase or read source files. ROUTE to the correct skill IMMEDIATELY.**
When this skill is loaded, determine what the user needs and invoke the appropriate skill NOW:
- **User wants to build an agent** → Invoke `/hive-create` immediately
- **User wants to test an agent** → Invoke `/hive-test` immediately
- **User wants to learn concepts** → Invoke `/hive-concepts` immediately
- **User wants patterns/optimization** → Invoke `/hive-patterns` immediately
- **User wants to set up credentials** → Invoke `/hive-credentials` immediately
- **User has a failing/broken agent** → Invoke `/hive-debugger` immediately
- **Unclear what user needs** → Ask the user (do NOT explore the codebase to figure it out)
When this skill is loaded, **ALWAYS use the AskUserQuestion tool** to present options:
```
Use AskUserQuestion with these options:
- "Build a new agent" → Then invoke /hive-create
- "Test an existing agent" → Then invoke /hive-test
- "Learn agent concepts" → Then invoke /hive-concepts
- "Optimize agent design" → Then invoke /hive-patterns
- "Set up credentials" → Then invoke /hive-credentials
- "Debug a failing agent" → Then invoke /hive-debugger
- "Other" (please describe what you want to achieve)
```
**DO NOT:** Read source files, explore the codebase, search for code, or do any investigation before routing. The sub-skills handle all of that.
@@ -73,7 +77,6 @@ Use this meta-skill when:
## Phase 0: Understand Concepts (Optional)
**Duration**: 5-10 minutes
**Skill**: `/hive-concepts`
**Input**: Questions about agent architecture
@@ -95,9 +98,8 @@ Use this meta-skill when:
## Phase 1: Build Agent Structure
**Duration**: 15-30 minutes
**Skill**: `/hive-create`
**Input**: User requirements ("Build an agent that...")
**Input**: User requirements ("Build an agent that...") or a template to start from
### What This Phase Does
@@ -166,7 +168,6 @@ exports/agent_name/
## Phase 1.5: Optimize Design (Optional)
**Duration**: 10-15 minutes
**Skill**: `/hive-patterns`
**Input**: Completed agent structure
@@ -191,22 +192,21 @@ exports/agent_name/
## Phase 2: Test & Validate
**Duration**: 20-40 minutes
**Skill**: `/hive-test`
**Input**: Working agent from Phase 1
### What This Phase Does
Creates comprehensive test suite:
- Constraint tests (verify hard requirements)
- Success criteria tests (measure goal achievement)
- Edge case tests (handle failures gracefully)
- Integration tests (end-to-end workflows)
Guides the creation and execution of a comprehensive test suite:
- Constraint tests
- Success criteria tests
- Edge case tests
- Integration tests
### Process
1. **Analyze agent** - Read goal, constraints, success criteria
2. **Generate tests** - Create pytest files in `exports/agent_name/tests/`
2. **Generate tests** - The calling agent writes pytest files in `exports/agent_name/tests/` using hive-test guidelines and templates
3. **User approval** - Review and approve each test
4. **Run evaluation** - Execute tests and collect results
5. **Debug failures** - Identify and fix issues
@@ -287,6 +287,19 @@ User: "Build an agent (first time)"
→ Done: Production-ready agent
```
### Pattern 1c: Build from Template
```
User: "Build an agent based on the deep research template"
→ Use /hive-create
→ Select "From a template" path
→ Pick template, name new agent
→ Review/modify goal, nodes, graph
→ Agent exported with customizations
→ Use /hive-test
→ Done: Customized agent
```
### Pattern 2: Test Existing Agent
```
@@ -490,6 +503,7 @@ The workflow is **flexible** - skip phases as needed, iterate freely, and adapt
- Have clear requirements
- Ready to write code
- Want step-by-step guidance
- Want to start from an existing template and customize it
**Choose hive-patterns when:**
- Agent structure complete
-1
@@ -1 +0,0 @@
../../.claude/skills/agent-workflow
@@ -1 +0,0 @@
../../.claude/skills/building-agents-construction
-1
@@ -1 +0,0 @@
../../.claude/skills/building-agents-core
-1
@@ -1 +0,0 @@
../../.claude/skills/building-agents-patterns
+1
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-test
-1
@@ -1 +0,0 @@
../../.claude/skills/testing-agent
-41
@@ -1,41 +0,0 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Initial project structure
- React frontend (honeycomb) with Vite and TypeScript
- Node.js backend (hive) with Express and TypeScript
- Docker Compose configuration for local development
- Configuration system via `config.yaml`
- GitHub Actions CI/CD workflows
- Comprehensive documentation
### Changed
- N/A
### Deprecated
- N/A
### Removed
- N/A
### Fixed
- tools: Fixed web_scrape tool attempting to parse non-HTML content (PDF, JSON) as HTML (#487)
### Security
- N/A
## [0.1.0] - 2025-01-13
### Added
- Initial release
[Unreleased]: https://github.com/adenhq/hive/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/adenhq/hive/releases/tag/v0.1.0
+3 -3
@@ -1,10 +1,10 @@
# Contributing to Aden Agent Framework
Thank you for your interest in contributing to the Aden Agent Framework! This document provides guidelines and information for contributors. We're especially looking for help building tools, integrations([check #2805](https://github.com/adenhq/hive/issues/2805)), and example agents for the framework. If you're interested in extending its functionality, this is the perfect place to start.
Thank you for your interest in contributing to the Aden Agent Framework! This document provides guidelines and information for contributors. We're especially looking for help building tools, integrations ([check #2805](https://github.com/adenhq/hive/issues/2805)), and example agents for the framework. If you're interested in extending its functionality, this is the perfect place to start.
## Code of Conduct
By participating in this project, you agree to abide by our [Code of Conduct](CODE_OF_CONDUCT.md).
By participating in this project, you agree to abide by our [Code of Conduct](docs/CODE_OF_CONDUCT.md).
## Issue Assignment Policy
@@ -159,4 +159,4 @@ By submitting a Pull Request, you agree that your contributions will be licensed
Feel free to open an issue for questions or join our [Discord community](https://discord.com/invite/MXE49hrKDk).
Thank you for contributing!
Thank you for contributing!
+67 -93
@@ -1,5 +1,5 @@
<p align="center">
<img width="100%" alt="Hive Banner" src="https://storage.googleapis.com/aden-prod-assets/website/aden-title-card.png" />
<img width="100%" alt="Hive Banner" src="https://github.com/user-attachments/assets/a027429b-5d3c-4d34-88e4-0feaeaabbab3" />
</p>
<p align="center">
@@ -13,16 +13,20 @@
<a href="docs/i18n/ko.md">한국어</a>
</p>
[![Apache 2.0 License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/adenhq/hive/blob/main/LICENSE)
[![Y Combinator](https://img.shields.io/badge/Y%20Combinator-Aden-orange)](https://www.ycombinator.com/companies/aden)
[![Discord](https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb)](https://discord.com/invite/MXE49hrKDk)
[![Twitter Follow](https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5)](https://x.com/aden_hq)
[![LinkedIn](https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff)](https://www.linkedin.com/company/teamaden/)
<p align="center">
<a href="https://github.com/adenhq/hive/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache 2.0 License" /></a>
<a href="https://www.ycombinator.com/companies/aden"><img src="https://img.shields.io/badge/Y%20Combinator-Aden-orange" alt="Y Combinator" /></a>
<a href="https://discord.com/invite/MXE49hrKDk"><img src="https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb" alt="Discord" /></a>
<a href="https://x.com/aden_hq"><img src="https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5" alt="Twitter Follow" /></a>
<a href="https://www.linkedin.com/company/teamaden/"><img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff" alt="LinkedIn" /></a>
<img src="https://img.shields.io/badge/MCP-102_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
<p align="center">
<img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
<img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
<img src="https://img.shields.io/badge/Goal--Driven-Development-purple?style=flat-square" alt="Goal-Driven" />
<img src="https://img.shields.io/badge/Headless-Development-purple?style=flat-square" alt="Headless" />
<img src="https://img.shields.io/badge/Human--in--the--Loop-orange?style=flat-square" alt="HITL" />
<img src="https://img.shields.io/badge/Production--Ready-red?style=flat-square" alt="Production" />
</p>
@@ -30,15 +34,16 @@
<img src="https://img.shields.io/badge/OpenAI-supported-412991?style=flat-square&logo=openai" alt="OpenAI" />
<img src="https://img.shields.io/badge/Anthropic-supported-d4a574?style=flat-square" alt="Anthropic" />
<img src="https://img.shields.io/badge/Google_Gemini-supported-4285F4?style=flat-square&logo=google" alt="Gemini" />
<img src="https://img.shields.io/badge/MCP-19_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
## Overview
Build reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Build autonomous, reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.
https://github.com/user-attachments/assets/846c0cc7-ffd6-47fa-b4b7-495494857a55
## Who Is Hive For?
Hive is designed for developers and teams who want to build **production-grade AI agents** without manually wiring complex workflows.
@@ -58,36 +63,23 @@ Hive may not be the best fit if you're only experimenting with simple agent ch
Use Hive when you need:
- Long-running, autonomous agents
- Multi-agent coordination
- Strong guardrails, process, and controls
- Continuous improvement based on failures
- Strong monitoring, safety, and budget controls
- Multi-agent coordination
- A framework that evolves with your goals
## What is Aden
<p align="center">
<img width="100%" alt="Aden Architecture" src="docs/assets/aden-architecture-diagram.jpg" />
</p>
Aden is a platform for building, deploying, operating, and adapting AI agents:
- **Build** - A Coding Agent generates specialized Worker Agents (Sales, Marketing, Ops) from natural language goals
- **Deploy** - Headless deployment with CI/CD integration and full API lifecycle management
- **Operate** - Real-time monitoring, observability, and runtime guardrails keep agents reliable
- **Adapt** - Continuous evaluation, supervision, and adaptation ensure agents improve over time
- **Infra** - Shared memory, LLM integrations, tools, and skills power every agent
## Quick Links
- **[Documentation](https://docs.adenhq.com/)** - Complete guides and API reference
- **[Self-Hosting Guide](https://docs.adenhq.com/getting-started/quickstart)** - Deploy Hive on your infrastructure
- **[Changelog](https://github.com/adenhq/hive/releases)** - Latest updates and releases
<!-- - **[Roadmap](https://adenhq.com/roadmap)** - Upcoming features and plans -->
- **[Roadmap](docs/roadmap.md)** - Upcoming features and plans
- **[Report Issues](https://github.com/adenhq/hive/issues)** - Bug reports and feature requests
- **[Contributing](CONTRIBUTING.md)** - How to contribute and submit PRs
## Quick Start
## Prerequisites
### Prerequisites
- Python 3.11+ for agent development
- Claude Code or Cursor for utilizing agent skills
@@ -109,7 +101,9 @@ This sets up:
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
- All required Python dependencies
- **credential store** - Encrypted API key storage (`~/.hive/credentials`)
- **LLM provider** - Interactive default model configuration
- All required Python dependencies with `uv`
### Build Your First Agent
@@ -118,34 +112,39 @@ This sets up:
claude> /hive
# Test your agent
claude> /hive-test
claude> /hive-debugger
# Run your agent
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
# (at separate terminal) Launch the interactive dashboard
hive tui
# Or run directly
hive run exports/your_agent_name --input '{"key": "value"}'
```
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
### Cursor IDE Support
Skills are also available in Cursor. To enable:
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
2. Run `MCP: Enable` to enable MCP servers
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`)
## Features
- **Goal-Driven Development** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **Adaptiveness** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **Dynamic Node Connections** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **Human-in-the-Loop** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Interactive TUI Dashboard** - Terminal-based dashboard with live graph view, event log, and chat interface for agent interaction
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
## Integration
<a href="https://github.com/adenhq/hive/tree/main/tools/src/aden_tools/tools"><img width="100%" alt="Integration" src="https://github.com/user-attachments/assets/a1573f93-cf02-4bb8-b3d5-b305b05b1e51" /></a>
Hive is built to be model-agnostic and system-agnostic.
- **LLM flexibility** - Hive Framework is designed to support various types of LLMs, including hosted and local models through LiteLLM-compatible providers.
- **Business system connectivity** - Hive Framework is designed to connect to all kinds of business systems as tools, such as CRM, support, messaging, data, file, and internal APIs via MCP.
## Why Aden
Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
@@ -182,9 +181,9 @@ flowchart LR
style V6 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
```
### The Aden Advantage
### The Hive Advantage
| Traditional Frameworks | Aden |
| Traditional Frameworks | Hive |
| -------------------------- | -------------------------------------- |
| Hardcode agent workflows | Describe goals in natural language |
| Manual graph definition | Auto-generated agent graphs |
@@ -195,46 +194,41 @@ flowchart LR
### How It Works
1. **Define Your Goal** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the agent graph, connection code, and test cases
3. **Workers Execute** → SDK-wrapped nodes run with full observability and tool access
1. **[Define Your Goal](docs/key_concepts/goals_outcome.md)** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the [agent graph](docs/key_concepts/graph.md), connection code, and test cases
3. **[Workers Execute](docs/key_concepts/worker_agent.md)** → SDK-wrapped nodes run with full observability and tool access
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **Adaptiveness** → On failure, the system evolves the graph and redeploys automatically
5. **[Adaptiveness](docs/key_concepts/evolution.md)** → On failure, the system evolves the graph and redeploys automatically
## Run pre-built Agents (Coming Soon)
## Run Agents
### Run a sample agent
Aden Hive provides a list of featured agents that you can use and build on top of.
### Run an agent shared by others
Put the agent in `exports/` and run `PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'`
For building and running goal-driven agents with the framework:
The `hive` CLI is the primary interface for running agents.
```bash
# One-time setup
./quickstart.sh
# Browse and run agents interactively (Recommended)
hive tui
# This sets up:
# - framework package (core runtime)
# - aden_tools package (MCP tools)
# - All Python dependencies
# Run a specific agent directly
hive run exports/my_agent --input '{"task": "Your input here"}'
# Build new agents using Agent Skills
claude> /hive
# Run a specific agent with the TUI dashboard
hive run exports/my_agent --tui
# Run agents
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
# Interactive REPL
hive shell
```
The TUI scans both `exports/` and `examples/templates/` for available agents.
> **Using Python directly (alternative):** You can also run agents with `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
See [environment-setup.md](docs/environment-setup.md) for complete setup instructions.
## Documentation
- **[Developer Guide](docs/developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [TUI Guide](docs/tui-selection-guide.md) - Interactive dashboard usage
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
@@ -368,10 +362,6 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS
## Frequently Asked Questions (FAQ)
**Q: Does Hive depend on LangChain or other agent frameworks?**
No. Hive is built from the ground up with no dependencies on LangChain, CrewAI, or other agent frameworks. The framework is designed to be lean and flexible, generating agent graphs dynamically rather than relying on predefined components.
**Q: What LLM providers does Hive support?**
Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, and many more. Simply set the appropriate API key environment variable and specify the model name.
@@ -382,37 +372,25 @@ Yes! Hive supports local models through LiteLLM. Simply use the model name forma
**Q: What makes Hive different from other agent frameworks?**
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys. This self-improving loop is unique to Aden.
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, [evolves the agent graph](docs/key_concepts/evolution.md), and redeploys. This self-improving loop is unique to Aden.
**Q: Is Hive open-source?**
Yes, Hive is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
**Q: Does Hive collect data from users?**
Hive collects telemetry data for monitoring and observability purposes, including token usage, latency metrics, and cost tracking. Content capture (prompts and responses) is configurable and stored with team-scoped data isolation. All data stays within your infrastructure when self-hosted.
**Q: What deployment options does Hive support?**
Hive supports self-hosted deployments via Python packages. See the [Environment Setup Guide](docs/environment-setup.md) for installation instructions. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.
**Q: Can Hive handle complex, production-scale use cases?**
Yes. Hive is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.
**Q: Does Hive support human-in-the-loop workflows?**
Yes, Hive fully supports human-in-the-loop workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What monitoring and debugging tools does Hive provide?**
Hive includes comprehensive observability features: real-time WebSocket streaming for live agent execution monitoring, TimescaleDB-powered analytics for cost and performance metrics, health check endpoints for Kubernetes integration, and MCP tools for agent execution, including file operations, web search, data processing, and more.
Yes, Hive fully supports [human-in-the-loop](docs/key_concepts/graph.md#human-in-the-loop) workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What programming languages does Hive support?**
The Hive framework is built in Python. A JavaScript/TypeScript SDK is on the roadmap.
**Q: Can Aden agents interact with external tools and APIs?**
**Q: Can Hive agents interact with external tools and APIs?**
Yes. Aden's SDK-wrapped nodes provide built-in tool access, and the framework supports flexible tool ecosystems. Agents can integrate with external APIs, databases, and services through the node architecture.
@@ -436,10 +414,6 @@ Aden's adaptation loop begins working from the first execution. When an agent fa
Hive focuses on generating agents that run real business processes, rather than generic agents. This vision emphasizes outcome-driven design, adaptability, and an easy-to-use set of tools and integrations.
**Q: Does Aden offer enterprise support?**
For enterprise inquiries, contact the Aden team through [adenhq.com](https://adenhq.com) or join our [Discord community](https://discord.com/invite/MXE49hrKDk) for support and discussions.
---
<p align="center">
+1
View File
@@ -1,4 +1,5 @@
exports/
docs/
.agent-builder-sessions/
.pytest_cache/
**/__pycache__/
+2 -2
View File
@@ -4,8 +4,8 @@
"name": "tools",
"description": "Aden tools including web search, file operations, and PDF reading",
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../tools",
"env": {
"BRAVE_SEARCH_API_KEY": "${BRAVE_SEARCH_API_KEY}"
+64
View File
@@ -0,0 +1,64 @@
"""Shared Hive configuration utilities.
Centralises reading of ~/.hive/configuration.json so that the runner
and every agent template share one implementation instead of copy-pasting
helper functions.
"""
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from framework.graph.edge import DEFAULT_MAX_TOKENS
# ---------------------------------------------------------------------------
# Low-level config file access
# ---------------------------------------------------------------------------
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
def get_hive_config() -> dict[str, Any]:
"""Load hive configuration from ~/.hive/configuration.json."""
if not HIVE_CONFIG_FILE.exists():
return {}
try:
with open(HIVE_CONFIG_FILE) as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return {}
# ---------------------------------------------------------------------------
# Derived helpers
# ---------------------------------------------------------------------------
def get_preferred_model() -> str:
"""Return the user's preferred LLM model string (e.g. 'anthropic/claude-sonnet-4-20250514')."""
llm = get_hive_config().get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
return "anthropic/claude-sonnet-4-20250514"
def get_max_tokens() -> int:
"""Return the configured max_tokens, falling back to DEFAULT_MAX_TOKENS."""
return get_hive_config().get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
# ---------------------------------------------------------------------------
# RuntimeConfig shared across agent templates
# ---------------------------------------------------------------------------
@dataclass
class RuntimeConfig:
"""Agent runtime configuration loaded from ~/.hive/configuration.json."""
model: str = field(default_factory=get_preferred_model)
temperature: float = 0.7
max_tokens: int = field(default_factory=get_max_tokens)
api_key: str | None = None
api_base: str | None = None
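A minimal usage sketch for these helpers, assuming the module is importable as `framework.config` (the `GraphSpec` validator later in this diff imports `get_max_tokens` from there) and that `~/.hive/configuration.json` follows the `llm.provider` / `llm.model` / `llm.max_tokens` shape implied by the readers above:
```python
# Illustrative only: the configuration.json shape is inferred, not a documented schema.
# Assumed file contents:
#   {"llm": {"provider": "anthropic", "model": "claude-sonnet-4-20250514", "max_tokens": 4096}}
from framework.config import RuntimeConfig, get_max_tokens, get_preferred_model

config = RuntimeConfig()      # model and max_tokens resolve via default_factory
print(get_preferred_model())  # "anthropic/claude-sonnet-4-20250514" (or the built-in default)
print(get_max_tokens())       # 4096 here; DEFAULT_MAX_TOKENS when the key is absent
print(config.temperature)     # plain default, 0.7
```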
+19 -4
View File
@@ -143,19 +143,34 @@ class AdenCredentialResponse:
def from_dict(
cls, data: dict[str, Any], integration_id: str | None = None
) -> AdenCredentialResponse:
"""Create from API response dictionary."""
"""Create from API response dictionary or normalized credential dict."""
expires_at = None
if data.get("expires_at"):
expires_at = datetime.fromisoformat(data["expires_at"].replace("Z", "+00:00"))
resolved_integration_id = (
integration_id
or data.get("integration_id")
or data.get("alias")
or data.get("provider", "")
)
resolved_integration_type = data.get("integration_type") or data.get("provider", "")
metadata = data.get("metadata")
if metadata is None and data.get("email"):
metadata = {"email": data.get("email")}
if metadata is None:
metadata = {}
return cls(
integration_id=integration_id or data.get("alias", data.get("provider", "")),
integration_type=data.get("provider", ""),
integration_id=resolved_integration_id,
integration_type=resolved_integration_type,
access_token=data["access_token"],
token_type=data.get("token_type", "Bearer"),
expires_at=expires_at,
scopes=data.get("scopes", []),
metadata={"email": data.get("email")} if data.get("email") else {},
metadata=metadata,
)
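This hunk widens `from_dict` to accept both raw API payloads and already-normalized credential dicts. A hypothetical sketch of the precedence it implements (the import path for `AdenCredentialResponse` is not shown in this diff, so it is elided here):
```python
# Illustrative only. Resolution order for integration_id:
#   explicit argument > data["integration_id"] > data["alias"] > data["provider"]
api_payload = {  # raw API shape (hypothetical values)
    "provider": "google",
    "alias": "gmail-work",
    "access_token": "tok-123",
    "email": "user@example.com",
}
normalized = {  # normalized shape (hypothetical values)
    "integration_id": "gmail-work",
    "integration_type": "google",
    "access_token": "tok-123",
    "metadata": {"email": "user@example.com"},
}

cred_a = AdenCredentialResponse.from_dict(api_payload)  # integration_id == "gmail-work"
cred_b = AdenCredentialResponse.from_dict(normalized)   # integration_id == "gmail-work"
assert cred_a.integration_id == cred_b.integration_id
```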
+2 -1
View File
@@ -9,7 +9,7 @@ from framework.graph.client_io import (
from framework.graph.code_sandbox import CodeSandbox, safe_eval, safe_exec
from framework.graph.context_handoff import ContextHandoff, HandoffContext
from framework.graph.conversation import ConversationStore, Message, NodeConversation
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.edge import DEFAULT_MAX_TOKENS, EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.event_loop_node import (
EventLoopNode,
JudgeProtocol,
@@ -58,6 +58,7 @@ __all__ = [
"EdgeSpec",
"EdgeCondition",
"GraphSpec",
"DEFAULT_MAX_TOKENS",
# Executor (fixed graph)
"GraphExecutor",
# Plan (flexible execution)
+85
View File
@@ -0,0 +1,85 @@
"""
Checkpoint Configuration - Controls checkpoint behavior during execution.
"""
from dataclasses import dataclass
@dataclass
class CheckpointConfig:
"""
Configuration for checkpoint behavior during graph execution.
Controls when checkpoints are created, how they're stored,
and when they're pruned.
"""
# Enable/disable checkpointing
enabled: bool = True
# When to checkpoint
checkpoint_on_node_start: bool = True
checkpoint_on_node_complete: bool = True
# Pruning (time-based)
checkpoint_max_age_days: int = 7 # Prune checkpoints older than 1 week
prune_every_n_nodes: int = 10 # Check for pruning every N nodes
# Performance
async_checkpoint: bool = True # Don't block execution on checkpoint writes
# What to include in checkpoints
include_full_memory: bool = True
include_metrics: bool = True
def should_checkpoint_node_start(self) -> bool:
"""Return True if a checkpoint should be created before node execution."""
return self.enabled and self.checkpoint_on_node_start
def should_checkpoint_node_complete(self) -> bool:
"""Return True if a checkpoint should be created after node execution."""
return self.enabled and self.checkpoint_on_node_complete
def should_prune_checkpoints(self, nodes_executed: int) -> bool:
"""
Decide whether checkpoints should be pruned based on execution progress.
Args:
nodes_executed: Number of nodes executed so far
Returns:
True if old checkpoints should be checked for and pruned
"""
return (
self.enabled
and self.prune_every_n_nodes > 0
and nodes_executed % self.prune_every_n_nodes == 0
)
# Default configuration for most agents
DEFAULT_CHECKPOINT_CONFIG = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=True,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
prune_every_n_nodes=10,
async_checkpoint=True,
)
# Minimal configuration (only checkpoint at node completion)
MINIMAL_CHECKPOINT_CONFIG = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
prune_every_n_nodes=20,
async_checkpoint=True,
)
# Disabled configuration (no checkpointing)
DISABLED_CHECKPOINT_CONFIG = CheckpointConfig(
enabled=False,
)
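These presets feed into `GraphExecutor.execute()` via the `checkpoint_config` parameter added later in this diff. A rough sketch, assuming `executor`, `graph`, and `goal` are already constructed elsewhere:
```python
# Illustrative only: choose a checkpoint policy per run.
from framework.graph.checkpoint_config import MINIMAL_CHECKPOINT_CONFIG

async def run_once(executor, graph, goal):
    return await executor.execute(
        graph=graph,
        goal=goal,
        input_data={"task": "example"},
        checkpoint_config=MINIMAL_CHECKPOINT_CONFIG,  # checkpoint only on node completion
    )
```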
+14 -2
View File
@@ -24,10 +24,12 @@ given the current goal, context, and execution state.
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
from pydantic import BaseModel, Field, model_validator
from framework.graph.safe_eval import safe_eval
DEFAULT_MAX_TOKENS = 8192
class EdgeCondition(StrEnum):
"""When an edge should be traversed."""
@@ -424,7 +426,7 @@ class GraphSpec(BaseModel):
# Default LLM settings
default_model: str = "claude-haiku-4-5-20251001"
max_tokens: int = 1024
max_tokens: int = Field(default=None) # resolved by _resolve_max_tokens validator
# Cleanup LLM for JSON extraction fallback (fast/cheap model preferred)
# If not set, uses CEREBRAS_API_KEY -> cerebras/llama-3.3-70b or
@@ -447,6 +449,16 @@ class GraphSpec(BaseModel):
model_config = {"extra": "allow"}
@model_validator(mode="before")
@classmethod
def _resolve_max_tokens(cls, values: Any) -> Any:
"""Resolve max_tokens from the global config store when not explicitly set."""
if isinstance(values, dict) and values.get("max_tokens") is None:
from framework.config import get_max_tokens
values["max_tokens"] = get_max_tokens()
return values
def get_node(self, node_id: str) -> Any | None:
"""Get a node by ID."""
for node in self.nodes:
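The net effect of the `_resolve_max_tokens` validator above: a `GraphSpec` built without an explicit `max_tokens` now inherits the user's configured value instead of a hardcoded 1024. A small sketch of the fallback chain, using only imports confirmed elsewhere in this diff:
```python
# Illustrative only: the value a spec inherits when max_tokens is left unset.
from framework.config import get_max_tokens          # reads ~/.hive/configuration.json
from framework.graph.edge import DEFAULT_MAX_TOKENS  # 8192

print(get_max_tokens())    # user-configured llm.max_tokens, else DEFAULT_MAX_TOKENS
print(DEFAULT_MAX_TOKENS)  # 8192
```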
+16 -2
View File
@@ -1763,7 +1763,19 @@ class EventLoopNode(NodeProtocol):
conversation: NodeConversation,
iteration: int,
) -> bool:
"""Check if pause has been requested. Returns True if paused."""
"""
Check if pause has been requested. Returns True if paused.
Note: This check happens BEFORE starting iteration N, after completing N-1.
If paused, the node exits having completed {iteration} iterations (0 to iteration-1).
"""
# Check executor-level pause event (for /pause command, Ctrl+Z)
if ctx.pause_event and ctx.pause_event.is_set():
completed = iteration # 0-indexed: iteration=3 means 3 iterations completed (0,1,2)
logger.info(f"⏸ Pausing after {completed} iteration(s) completed (executor-level)")
return True
# Check context-level pause flags (legacy/alternative methods)
pause_requested = ctx.input_data.get("pause_requested", False)
if not pause_requested:
try:
@@ -1771,8 +1783,10 @@ class EventLoopNode(NodeProtocol):
except (PermissionError, KeyError):
pause_requested = False
if pause_requested:
logger.info(f"Pause requested at iteration {iteration}")
completed = iteration
logger.info(f"⏸ Pausing after {completed} iteration(s) completed (context-level)")
return True
return False
# -------------------------------------------------------------------
+354 -3
View File
@@ -17,6 +17,7 @@ from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.goal import Goal
from framework.graph.node import (
@@ -32,7 +33,10 @@ from framework.graph.node import (
from framework.graph.output_cleaner import CleansingConfig, OutputCleaner
from framework.graph.validator import OutputValidator
from framework.llm.provider import LLMProvider, Tool
from framework.observability import set_trace_context
from framework.runtime.core import Runtime
from framework.schemas.checkpoint import Checkpoint
from framework.storage.checkpoint_store import CheckpointStore
@dataclass
@@ -179,6 +183,9 @@ class GraphExecutor:
self.enable_parallel_execution = enable_parallel_execution
self._parallel_config = parallel_config or ParallelExecutionConfig()
# Pause/resume control
self._pause_requested = asyncio.Event()
def _validate_tools(self, graph: GraphSpec) -> list[str]:
"""
Validate that all tools declared by nodes are available.
@@ -208,6 +215,7 @@ class GraphExecutor:
goal: Goal,
input_data: dict[str, Any] | None = None,
session_state: dict[str, Any] | None = None,
checkpoint_config: "CheckpointConfig | None" = None,
) -> ExecutionResult:
"""
Execute a graph for a goal.
@@ -221,6 +229,9 @@ class GraphExecutor:
Returns:
ExecutionResult with output and metrics
"""
# Add agent_id to trace context for correlation
set_trace_context(agent_id=graph.id)
# Validate graph
errors = graph.validate()
if errors:
@@ -246,6 +257,12 @@ class GraphExecutor:
# Initialize execution state
memory = SharedMemory()
# Initialize checkpoint store if checkpointing is enabled
checkpoint_store: CheckpointStore | None = None
if checkpoint_config and checkpoint_config.enabled and self._storage_path:
checkpoint_store = CheckpointStore(self._storage_path)
self.logger.info("✓ Checkpointing enabled")
# Restore session state if provided
if session_state and "memory" in session_state:
memory_data = session_state["memory"]
@@ -273,8 +290,110 @@ class GraphExecutor:
node_visit_counts: dict[str, int] = {} # Track visits for feedback loops
_is_retry = False # True when looping back for a retry (not a new visit)
# Restore node_visit_counts from session state if available
if session_state and "node_visit_counts" in session_state:
node_visit_counts = dict(session_state["node_visit_counts"])
if node_visit_counts:
self.logger.info(f"📥 Restored node visit counts: {node_visit_counts}")
# If resuming at a specific node (paused_at), that node was counted
# but never completed, so decrement its count
paused_at = session_state.get("paused_at")
if (
paused_at
and paused_at in node_visit_counts
and node_visit_counts[paused_at] > 0
):
old_count = node_visit_counts[paused_at]
node_visit_counts[paused_at] -= 1
self.logger.info(
f"📥 Decremented visit count for paused node '{paused_at}': "
f"{old_count} -> {node_visit_counts[paused_at]}"
)
# Determine entry point (may differ if resuming)
current_node_id = graph.get_entry_point(session_state)
# Check if resuming from checkpoint
if session_state and session_state.get("resume_from_checkpoint") and checkpoint_store:
checkpoint_id = session_state["resume_from_checkpoint"]
try:
checkpoint = await checkpoint_store.load_checkpoint(checkpoint_id)
if checkpoint:
self.logger.info(
f"🔄 Resuming from checkpoint: {checkpoint_id} "
f"(node: {checkpoint.current_node})"
)
# Restore memory from checkpoint
for key, value in checkpoint.shared_memory.items():
memory.write(key, value, validate=False)
# Start from checkpoint's next node or current node
current_node_id = (
checkpoint.next_node or checkpoint.current_node or graph.entry_node
)
# Restore execution path
path.extend(checkpoint.execution_path)
self.logger.info(
f"📥 Restored memory with {len(checkpoint.shared_memory)} keys, "
f"resuming at node: {current_node_id}"
)
else:
self.logger.warning(
f"Checkpoint {checkpoint_id} not found, resuming from normal entry point"
)
# Check if resuming from paused_at (fallback to session state)
paused_at = session_state.get("paused_at") if session_state else None
if paused_at and graph.get_node(paused_at) is not None:
current_node_id = paused_at
self.logger.info(f"🔄 Resuming from paused node: {paused_at}")
else:
current_node_id = graph.get_entry_point(session_state)
except Exception as e:
self.logger.error(
f"Failed to load checkpoint {checkpoint_id}: {e}, "
f"resuming from normal entry point"
)
# Check if resuming from paused_at (fallback to session state)
paused_at = session_state.get("paused_at") if session_state else None
if paused_at and graph.get_node(paused_at) is not None:
current_node_id = paused_at
self.logger.info(f"🔄 Resuming from paused node: {paused_at}")
else:
current_node_id = graph.get_entry_point(session_state)
else:
# Check if resuming from paused_at (session state resume)
paused_at = session_state.get("paused_at") if session_state else None
node_ids = [n.id for n in graph.nodes]
self.logger.info(f"🔍 Debug: paused_at={paused_at}, available node IDs={node_ids}")
if paused_at and graph.get_node(paused_at) is not None:
# Resume from paused_at node directly (works for any node, not just pause_nodes)
current_node_id = paused_at
# Restore execution path from session state if available
if session_state:
execution_path = session_state.get("execution_path", [])
if execution_path:
path.extend(execution_path)
self.logger.info(
f"🔄 Resuming from paused node: {paused_at} "
f"(restored path: {execution_path})"
)
else:
self.logger.info(f"🔄 Resuming from paused node: {paused_at}")
else:
self.logger.info(f"🔄 Resuming from paused node: {paused_at}")
else:
# Fall back to normal entry point logic
self.logger.warning(
f"⚠ paused_at={paused_at} is not a valid node, falling back to entry point"
)
current_node_id = graph.get_entry_point(session_state)
steps = 0
if session_state and current_node_id != graph.entry_node:
@@ -289,7 +408,6 @@ class GraphExecutor:
if self.runtime_logger:
# Extract session_id from storage_path if available (for unified sessions)
# storage_path format: base_path/sessions/{session_id}/
session_id = ""
if self._storage_path and self._storage_path.name.startswith("session_"):
session_id = self._storage_path.name
@@ -313,6 +431,45 @@ class GraphExecutor:
while steps < graph.max_steps:
steps += 1
# Check for pause request
if self._pause_requested.is_set():
self.logger.info("⏸ Pause detected - stopping at node boundary")
# Create session state for pause
saved_memory = memory.read_all()
pause_session_state: dict[str, Any] = {
"memory": saved_memory, # Include memory for resume
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
}
# Create a pause checkpoint
if checkpoint_store:
pause_checkpoint = self._create_checkpoint(
checkpoint_type="pause",
current_node=current_node_id,
execution_path=path,
memory=memory,
next_node=current_node_id,
is_clean=True,
)
await checkpoint_store.save_checkpoint(pause_checkpoint)
pause_session_state["latest_checkpoint_id"] = pause_checkpoint.checkpoint_id
pause_session_state["resume_from_checkpoint"] = (
pause_checkpoint.checkpoint_id
)
# Return with paused status
return ExecutionResult(
success=False,
output=saved_memory,
path=path,
paused_at=current_node_id,
error="Execution paused by user request",
session_state=pause_session_state,
node_visit_counts=dict(node_visit_counts),
)
# Get current node
node_spec = graph.get_node(current_node_id)
if node_spec is None:
@@ -391,6 +548,27 @@ class GraphExecutor:
description=f"Validation errors for {current_node_id}: {validation_errors}",
)
# CHECKPOINT: node_start
if (
checkpoint_store
and checkpoint_config
and checkpoint_config.should_checkpoint_node_start()
):
checkpoint = self._create_checkpoint(
checkpoint_type="node_start",
current_node=node_spec.id,
execution_path=list(path),
memory=memory,
is_clean=(sum(node_retry_counts.values()) == 0),
)
if checkpoint_config.async_checkpoint:
# Non-blocking checkpoint save
asyncio.create_task(checkpoint_store.save_checkpoint(checkpoint))
else:
# Blocking checkpoint save
await checkpoint_store.save_checkpoint(checkpoint)
# Emit node-started event (skip event_loop nodes — they emit their own)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_started(
@@ -464,6 +642,13 @@ class GraphExecutor:
if len(value_str) > 200:
value_str = value_str[:200] + "..."
self.logger.info(f" {key}: {value_str}")
# Write node outputs to memory BEFORE edge evaluation
# This enables direct key access in conditional expressions (e.g., "score > 80")
# Without this, conditional edges can only use output['key'] syntax
if result.output:
for key, value in result.output.items():
memory.write(key, value, validate=False)
else:
self.logger.error(f" ✗ Failed: {result.error}")
@@ -557,13 +742,21 @@ class GraphExecutor:
execution_quality="failed",
)
# Save memory for potential resume
saved_memory = memory.read_all()
failure_session_state = {
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
}
return ExecutionResult(
success=False,
error=(
f"Node '{node_spec.name}' failed after "
f"{max_retries} attempts: {result.error}"
),
output=memory.read_all(),
output=saved_memory,
steps_executed=steps,
total_tokens=total_tokens,
total_latency_ms=total_latency,
@@ -574,6 +767,7 @@ class GraphExecutor:
had_partial_failures=len(nodes_failed) > 0,
execution_quality="failed",
node_visit_counts=dict(node_visit_counts),
session_state=failure_session_state,
)
# Check if we just executed a pause node - if so, save state and return
@@ -696,6 +890,39 @@ class GraphExecutor:
break
next_spec = graph.get_node(next_node)
self.logger.info(f" → Next: {next_spec.name if next_spec else next_node}")
# CHECKPOINT: node_complete (after determining next node)
if (
checkpoint_store
and checkpoint_config
and checkpoint_config.should_checkpoint_node_complete()
):
checkpoint = self._create_checkpoint(
checkpoint_type="node_complete",
current_node=node_spec.id,
execution_path=list(path),
memory=memory,
next_node=next_node,
is_clean=(sum(node_retry_counts.values()) == 0),
)
if checkpoint_config.async_checkpoint:
asyncio.create_task(checkpoint_store.save_checkpoint(checkpoint))
else:
await checkpoint_store.save_checkpoint(checkpoint)
# Periodic checkpoint pruning
if (
checkpoint_store
and checkpoint_config
and checkpoint_config.should_prune_checkpoints(len(path))
):
asyncio.create_task(
checkpoint_store.prune_checkpoints(
max_age_days=checkpoint_config.checkpoint_max_age_days
)
)
current_node_id = next_node
# Update input_data for next node
@@ -753,6 +980,50 @@ class GraphExecutor:
node_visit_counts=dict(node_visit_counts),
)
except asyncio.CancelledError:
# Handle cancellation (e.g., TUI quit) - save as paused instead of failed
self.logger.info("⏸ Execution cancelled - saving state for resume")
# Save memory and state for resume
saved_memory = memory.read_all()
session_state_out: dict[str, Any] = {
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
}
# Calculate quality metrics
total_retries_count = sum(node_retry_counts.values())
nodes_failed = [nid for nid, count in node_retry_counts.items() if count > 0]
exec_quality = "degraded" if total_retries_count > 0 else "clean"
if self.runtime_logger:
await self.runtime_logger.end_run(
status="paused",
duration_ms=total_latency,
node_path=path,
execution_quality=exec_quality,
)
# Return with paused status
return ExecutionResult(
success=False,
error="Execution paused by user",
output=saved_memory,
steps_executed=steps,
total_tokens=total_tokens,
total_latency_ms=total_latency,
path=path,
paused_at=current_node_id, # Save where we were
session_state=session_state_out,
total_retries=total_retries_count,
nodes_with_failures=nodes_failed,
retry_details=dict(node_retry_counts),
had_partial_failures=len(nodes_failed) > 0,
execution_quality=exec_quality,
node_visit_counts=dict(node_visit_counts),
)
except Exception as e:
import traceback
@@ -790,9 +1061,40 @@ class GraphExecutor:
execution_quality="failed",
)
# Save memory and state for potential resume
saved_memory = memory.read_all()
session_state_out: dict[str, Any] = {
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
}
# Mark latest checkpoint for resume on failure
if checkpoint_store:
try:
checkpoints = await checkpoint_store.list_checkpoints()
if checkpoints:
# Find latest clean checkpoint
index = await checkpoint_store.load_index()
if index:
latest_clean = index.get_latest_clean_checkpoint()
if latest_clean:
session_state_out["resume_from_checkpoint"] = (
latest_clean.checkpoint_id
)
session_state_out["latest_checkpoint_id"] = (
latest_clean.checkpoint_id
)
self.logger.info(
f"💾 Marked checkpoint for resume: {latest_clean.checkpoint_id}"
)
except Exception as checkpoint_err:
self.logger.warning(f"Failed to mark checkpoint for resume: {checkpoint_err}")
return ExecutionResult(
success=False,
error=str(e),
output=saved_memory,
steps_executed=steps,
path=path,
total_retries=total_retries_count,
@@ -801,6 +1103,7 @@ class GraphExecutor:
had_partial_failures=len(nodes_failed) > 0,
execution_quality="failed",
node_visit_counts=dict(node_visit_counts),
session_state=session_state_out,
)
finally:
@@ -841,6 +1144,7 @@ class GraphExecutor:
goal=goal, # Pass Goal object for LLM-powered routers
max_tokens=max_tokens,
runtime_logger=self.runtime_logger,
pause_event=self._pause_requested, # Pass pause event for granular control
)
# Valid node types - no ambiguous "llm" type allowed
@@ -1353,3 +1657,50 @@ class GraphExecutor:
def register_function(self, node_id: str, func: Callable) -> None:
"""Register a function as a node."""
self.node_registry[node_id] = FunctionNode(func)
def request_pause(self) -> None:
"""
Request graceful pause of the current execution.
The execution will pause at the next node boundary after the current
node completes. A checkpoint will be saved at the pause point, allowing
the execution to be resumed later.
This method is safe to call from any thread.
"""
self._pause_requested.set()
self.logger.info("⏸ Pause requested - will pause at next node boundary")
def _create_checkpoint(
self,
checkpoint_type: str,
current_node: str,
execution_path: list[str],
memory: SharedMemory,
next_node: str | None = None,
is_clean: bool = True,
) -> Checkpoint:
"""
Create a checkpoint from current execution state.
Args:
checkpoint_type: Type of checkpoint (node_start, node_complete)
current_node: Current node ID
execution_path: Nodes executed so far
memory: SharedMemory instance
next_node: Next node to execute (for node_complete checkpoints)
is_clean: Whether execution was clean up to this point
Returns:
New Checkpoint instance
"""
return Checkpoint.create(
checkpoint_type=checkpoint_type,
session_id=self._storage_path.name if self._storage_path else "unknown",
current_node=current_node,
execution_path=execution_path,
shared_memory=memory.read_all(),
next_node=next_node,
is_clean=is_clean,
)
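Taken together with the pause checkpoint logic above, a hedged end-to-end sketch of pause and resume (`executor`, `graph`, and `goal` are assumed to exist; whether the internal pause flag resets between runs is not shown in this diff):
```python
import asyncio

async def run_with_pause(executor, graph, goal):
    # Kick off execution, then request a graceful pause mid-flight.
    task = asyncio.create_task(executor.execute(graph, goal, input_data={}))
    await asyncio.sleep(5)
    executor.request_pause()  # takes effect at the next node boundary

    result = await task
    if result.paused_at:
        # session_state carries memory, execution_path, node_visit_counts,
        # and resume_from_checkpoint when checkpointing is enabled.
        # (Runners may also merge result.paused_at into session_state;
        # that plumbing is not shown in this diff.)
        return await executor.execute(
            graph, goal, input_data={}, session_state=result.session_state
        )
    return result
```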
+3
View File
@@ -480,6 +480,9 @@ class NodeContext:
# Runtime logging (optional)
runtime_logger: Any = None # RuntimeLogger | None — uses Any to avoid import
# Pause control (optional) - asyncio.Event for pause requests
pause_event: Any = None # asyncio.Event | None
@dataclass
class NodeResult:
@@ -23,6 +23,7 @@ if _exports_dir.is_dir() and str(_exports_dir) not in sys.path:
del _framework_dir, _project_root, _exports_dir
from mcp.server import FastMCP # noqa: E402
from pydantic import ValidationError # noqa: E402
from framework.graph import ( # noqa: E402
Constraint,
@@ -1856,6 +1857,85 @@ def export_graph() -> str:
)
@mcp.tool()
def import_from_export(
agent_json_path: Annotated[str, "Path to the agent.json file to import"],
) -> str:
"""
Import an agent definition from an exported agent.json file into the current build session.
Reads the agent.json, parses goal/nodes/edges, and populates the current session.
This is the reverse of export_graph().
Args:
agent_json_path: Path to the agent.json file to import
Returns:
JSON summary of what was imported (goal name, node count, edge count)
"""
session = get_session()
path = Path(agent_json_path)
if not path.exists():
return json.dumps({"success": False, "error": f"File not found: {agent_json_path}"})
try:
data = json.loads(path.read_text())
except json.JSONDecodeError as e:
return json.dumps({"success": False, "error": f"Invalid JSON: {e}"})
try:
# Parse goal (same pattern as BuildSession.from_dict lines 88-99)
goal_data = data.get("goal")
if goal_data:
session.goal = Goal(
id=goal_data["id"],
name=goal_data["name"],
description=goal_data["description"],
success_criteria=[
SuccessCriterion(**sc) for sc in goal_data.get("success_criteria", [])
],
constraints=[Constraint(**c) for c in goal_data.get("constraints", [])],
)
# Parse nodes (same pattern as BuildSession.from_dict line 102)
graph_data = data.get("graph", {})
nodes_data = graph_data.get("nodes", [])
session.nodes = [NodeSpec(**n) for n in nodes_data]
# Parse edges (same pattern as BuildSession.from_dict lines 105-118)
edges_data = graph_data.get("edges", [])
session.edges = []
for e in edges_data:
condition_str = e.get("condition")
if isinstance(condition_str, str):
condition_map = {
"always": EdgeCondition.ALWAYS,
"on_success": EdgeCondition.ON_SUCCESS,
"on_failure": EdgeCondition.ON_FAILURE,
"conditional": EdgeCondition.CONDITIONAL,
"llm_decide": EdgeCondition.LLM_DECIDE,
}
e["condition"] = condition_map.get(condition_str, EdgeCondition.ON_SUCCESS)
session.edges.append(EdgeSpec(**e))
except (KeyError, TypeError, ValueError, ValidationError) as e:
return json.dumps({"success": False, "error": f"Malformed agent.json: {e}"})
# Persist updated session
_save_session(session)
return json.dumps(
{
"success": True,
"goal": session.goal.name if session.goal else None,
"nodes_count": len(session.nodes),
"edges_count": len(session.edges),
"node_ids": [n.id for n in session.nodes],
"edge_ids": [e.id for e in session.edges],
}
)
@mcp.tool()
def get_session_status() -> str:
"""Get the current status of the build session."""
+236
View File
@@ -0,0 +1,236 @@
# Observability - Structured Logging
## Configuration via Environment Variables
Control logging format using environment variables:
```bash
# JSON logging (production) - Machine-parseable, one line per log
export LOG_FORMAT=json
python -m my_agent run
# Human-readable (development) - Color-coded, easy to read
# Default if LOG_FORMAT is not set
python -m my_agent run
```
**Alternative:** Set `ENV=production` to automatically use JSON format:
```bash
export ENV=production
python -m my_agent run
```
---
## Overview
The Hive framework provides automatic structured logging with trace context propagation. Logs include correlation IDs (`trace_id`, `execution_id`) that automatically follow your agent execution flow.
**Features:**
- **Zero developer friction**: Standard `logger.info()` calls automatically get trace context
- **ContextVar-based propagation**: Thread-safe and async-safe for concurrent executions
- **Dual output modes**: JSON for production, human-readable for development
- **Automatic correlation**: `trace_id` and `execution_id` propagate through all logs
## Quick Start
Logging is automatically configured when you use `AgentRunner`. No setup required:
```python
from framework.runner import AgentRunner
runner = AgentRunner(graph=my_graph, goal=my_goal)
result = await runner.run({"input": "data"})
# Logs automatically include trace_id, execution_id, agent_id, etc.
```
## Programmatic Configuration
Configure logging explicitly in your code:
```python
from framework.observability import configure_logging
# Human-readable (development)
configure_logging(level="DEBUG", format="human")
# JSON (production)
configure_logging(level="INFO", format="json")
# Auto-detect from environment
configure_logging(level="INFO", format="auto")
```
### Configuration Options
- **level**: `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"`
- **format**:
- `"json"` - Machine-parseable JSON (one line per log entry)
- `"human"` - Human-readable with colors
- `"auto"` - Detects from `LOG_FORMAT` env var or `ENV=production`
## Log Format Examples
### JSON Format (Machine-parseable)
```json
{"timestamp": "2026-01-28T15:01:02.671126+00:00", "level": "info", "logger": "framework.runtime", "message": "Starting agent execution", "trace_id": "54e80d7b5bd6409dbc3217e5cd16a4fd", "execution_id": "b4c348ec54e80d7b5bd6409dbc3217e50", "agent_id": "sales-agent", "goal_id": "qualify-leads"}
```
**Features:**
- `trace_id` and `execution_id` are 32 hex chars (W3C/OTel-aligned, no prefixes)
- Compact single-line format (easy to stream/parse)
- All trace context fields included automatically
### Human-Readable Format (Development)
```
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Starting agent execution
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Processing input data [node_id:input-processor]
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] LLM call completed [latency_ms:1250] [tokens_used:450]
```
**Features:**
- Color-coded log levels
- Shortened IDs for readability (trace_id first 8 chars, execution_id last 8)
- Context prefix shows trace correlation
## Trace Context Fields
When the framework sets trace context, these fields are included in all logs. IDs are 32 hex (W3C/OTel-aligned, no prefixes).
- **trace_id**: Trace identifier
- **execution_id**: Run/session correlation
- **agent_id**: Agent/graph identifier
- **goal_id**: Goal being pursued
- **node_id**: Current node (when set)
## Custom Log Fields
Add custom fields using the `extra` parameter:
```python
import logging
logger = logging.getLogger("my_module")
# Add custom fields
logger.info("LLM call completed", extra={
"latency_ms": 1250,
"tokens_used": 450,
"model": "claude-3-5-sonnet-20241022",
"node_id": "web-search"
})
```
These fields appear in both JSON and human-readable formats.
## Usage in Your Code
### Standard Logging (Recommended)
Just use Python's standard logging - context is automatic:
```python
import logging
logger = logging.getLogger(__name__)
def my_function():
# This log automatically includes trace_id, execution_id, etc.
logger.info("Processing data")
try:
result = do_work()
logger.info("Work completed", extra={"result_count": len(result)})
except Exception as e:
logger.error("Work failed", exc_info=True)
```
### Framework-Managed Context
The framework automatically sets trace context at key points:
- **Runtime.start_run()**: Sets `trace_id`, `execution_id`, `goal_id`
- **GraphExecutor.execute()**: Adds `agent_id`
- **Node execution**: Adds `node_id`
Propagation is automatic via ContextVar.
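To make "automatic via ContextVar" concrete, a small sketch showing that two concurrent executions keep isolated context (uses only the helpers this module exports; IDs are dummy values):
```python
import asyncio
import logging

from framework.observability import configure_logging, set_trace_context

logger = logging.getLogger(__name__)

async def one_execution(trace_id: str) -> None:
    set_trace_context(trace_id=trace_id)  # scoped to this task's context copy
    await asyncio.sleep(0)                # yield; the other task runs in between
    logger.info("step done")              # logs only this task's trace_id

async def main() -> None:
    configure_logging(level="INFO", format="human")
    # asyncio gives each task its own copy of the ContextVar, so the
    # two trace_ids never bleed into each other's log lines.
    await asyncio.gather(one_execution("a" * 32), one_execution("b" * 32))

asyncio.run(main())
```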
## Advanced Usage
### Manual Context Management
If you need to set trace context manually (rare):
```python
from framework.observability import set_trace_context, get_trace_context
# Set context (32-hex, no prefixes)
set_trace_context(
trace_id="54e80d7b5bd6409dbc3217e5cd16a4fd",
execution_id="b4c348ec54e80d7b5bd6409dbc3217e50",
agent_id="my-agent"
)
# Get current context
context = get_trace_context()
print(context["execution_id"])
# Clear context (usually not needed)
from framework.observability import clear_trace_context
clear_trace_context()
```
### Testing
For tests, you may want to configure logging explicitly:
```python
import pytest
from framework.observability import configure_logging
@pytest.fixture(autouse=True)
def setup_logging():
configure_logging(level="DEBUG", format="human")
```
## Best Practices
1. **Production**: Use JSON format (`LOG_FORMAT=json` or `ENV=production`)
2. **Development**: Use human-readable format (default)
3. **Don't manually set context**: Let the framework manage it
4. **Use standard logging**: No special APIs needed - just `logger.info()`
5. **Add custom fields**: Use `extra` dict for additional metadata
## Troubleshooting
### Logs missing trace context
Ensure `configure_logging()` has been called (usually automatic via `AgentRunner._setup()`).
### JSON logs not appearing
Check environment variables:
```bash
echo $LOG_FORMAT
echo $ENV
```
Or explicitly set:
```python
configure_logging(format="json")
```
### Context not propagating
ContextVar automatically propagates through async calls. If context seems lost, check:
- Are you in the same async execution context?
- Has `set_trace_context()` been called for this execution?
## See Also
- [Logging Implementation](../observability/logging.py) - Source code
- [AgentRunner](../runner/runner.py) - Where logging is configured
- [Runtime Core](../runtime/core.py) - Where trace context is set
+23
View File
@@ -0,0 +1,23 @@
"""
Observability module for automatic trace correlation and structured logging.
This module provides zero-friction observability:
- Automatic trace context propagation via ContextVar
- Structured JSON logging for production
- Human-readable logging for development
- No manual ID passing required
"""
from framework.observability.logging import (
clear_trace_context,
configure_logging,
get_trace_context,
set_trace_context,
)
__all__ = [
"configure_logging",
"get_trace_context",
"set_trace_context",
"clear_trace_context",
]
+302
View File
@@ -0,0 +1,302 @@
"""
Structured logging with automatic trace context propagation.
Key Features:
- Zero developer friction: Standard logger.info() calls get automatic context
- ContextVar-based propagation: Thread-safe and async-safe
- Dual output modes: JSON for production, human-readable for development
- Correlation IDs: trace_id follows entire request flow automatically
Architecture:
    Runtime.start_run()        generates trace_id, sets context once
        ↓ (automatic propagation via ContextVar)
    GraphExecutor.execute()    adds agent_id to context
        ↓ (automatic propagation)
    Node.execute()             adds node_id to context
        ↓ (automatic propagation)
    User code: logger.info("message") gets ALL context automatically!
"""
import json
import logging
import os
import re
from contextvars import ContextVar
from datetime import UTC, datetime
from typing import Any
# Context variable for trace propagation
# ContextVar is thread-safe and async-safe - perfect for concurrent agent execution
trace_context: ContextVar[dict[str, Any] | None] = ContextVar("trace_context", default=None)
# ANSI escape code pattern (matches \033[...m or \x1b[...m)
ANSI_ESCAPE_PATTERN = re.compile(r"\x1b\[[0-9;]*m|\033\[[0-9;]*m")
def strip_ansi_codes(text: str) -> str:
"""Remove ANSI escape codes from text for clean JSON logging."""
return ANSI_ESCAPE_PATTERN.sub("", text)
class StructuredFormatter(logging.Formatter):
"""
JSON formatter for structured logging.
Produces machine-parseable log entries with:
- Standard fields (timestamp, level, logger, message)
- Trace context (trace_id, execution_id, agent_id, etc.) - AUTOMATIC
- Custom fields from extra dict
"""
def format(self, record: logging.LogRecord) -> str:
"""Format log record as JSON."""
# Get trace context for correlation - AUTOMATIC!
context = trace_context.get() or {}
# Strip ANSI codes from message for clean JSON output
message = strip_ansi_codes(record.getMessage())
# Build base log entry
log_entry = {
"timestamp": datetime.now(UTC).isoformat(),
"level": record.levelname.lower(),
"logger": record.name,
"message": message,
}
# Add trace context (trace_id, execution_id, agent_id, etc.) - AUTOMATIC!
log_entry.update(context)
# Add custom fields from extra (optional)
event = getattr(record, "event", None)
if event is not None:
if isinstance(event, str):
log_entry["event"] = strip_ansi_codes(str(event))
else:
log_entry["event"] = event
latency_ms = getattr(record, "latency_ms", None)
if latency_ms is not None:
log_entry["latency_ms"] = latency_ms
tokens_used = getattr(record, "tokens_used", None)
if tokens_used is not None:
log_entry["tokens_used"] = tokens_used
node_id = getattr(record, "node_id", None)
if node_id is not None:
log_entry["node_id"] = node_id
model = getattr(record, "model", None)
if model is not None:
log_entry["model"] = model
# Add exception info if present (strip ANSI codes from exception text too)
if record.exc_info:
exception_text = self.formatException(record.exc_info)
log_entry["exception"] = strip_ansi_codes(exception_text)
return json.dumps(log_entry)
class HumanReadableFormatter(logging.Formatter):
"""
Human-readable formatter for development.
Provides colorized logs with trace context for local debugging.
Includes trace_id prefix for correlation - AUTOMATIC!
"""
COLORS = {
"DEBUG": "\033[36m", # Cyan
"INFO": "\033[32m", # Green
"WARNING": "\033[33m", # Yellow
"ERROR": "\033[31m", # Red
"CRITICAL": "\033[35m", # Magenta
}
RESET = "\033[0m"
def format(self, record: logging.LogRecord) -> str:
"""Format log record as human-readable string."""
# Get trace context - AUTOMATIC!
context = trace_context.get() or {}
trace_id = context.get("trace_id", "")
execution_id = context.get("execution_id", "")
agent_id = context.get("agent_id", "")
# Build context prefix
prefix_parts = []
if trace_id:
prefix_parts.append(f"trace:{trace_id[:8]}")
if execution_id:
prefix_parts.append(f"exec:{execution_id[-8:]}")
if agent_id:
prefix_parts.append(f"agent:{agent_id}")
context_prefix = f"[{' | '.join(prefix_parts)}] " if prefix_parts else ""
# Get color
color = self.COLORS.get(record.levelname, "")
reset = self.RESET
# Format log level (8 chars wide for alignment)
level = f"{record.levelname:<8}"
# Add event if present
event = ""
record_event = getattr(record, "event", None)
if record_event is not None:
event = f" [{record_event}]"
# Format message: [LEVEL] [trace context] message
return f"{color}[{level}]{reset} {context_prefix}{record.getMessage()}{event}"
def configure_logging(
level: str = "INFO",
format: str = "auto", # "json", "human", or "auto"
) -> None:
"""
Configure structured logging for the application.
This should be called ONCE at application startup, typically in:
- AgentRunner._setup()
- Main entry point
- Test fixtures
Args:
level: Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
format: Output format:
- "json": Machine-parseable JSON (for production)
- "human": Human-readable with colors (for development)
- "auto": JSON if LOG_FORMAT=json or ENV=production, else human
Examples:
# Development mode (human-readable)
configure_logging(level="DEBUG", format="human")
# Production mode (JSON)
configure_logging(level="INFO", format="json")
# Auto-detect from environment
configure_logging(level="INFO", format="auto")
"""
# Auto-detect format
if format == "auto":
# Use JSON if LOG_FORMAT=json or ENV=production
log_format_env = os.getenv("LOG_FORMAT", "").lower()
env = os.getenv("ENV", "development").lower()
if log_format_env == "json" or env == "production":
format = "json"
else:
format = "human"
# Select formatter
if format == "json":
formatter = StructuredFormatter()
# Disable colors in third-party libraries when using JSON format
_disable_third_party_colors()
else:
formatter = HumanReadableFormatter()
# Configure handler
handler = logging.StreamHandler()
handler.setFormatter(formatter)
# Configure root logger
root_logger = logging.getLogger()
root_logger.handlers.clear()
root_logger.addHandler(handler)
root_logger.setLevel(level.upper())
# When in JSON mode, configure known third-party loggers to use JSON formatter
# This ensures libraries like LiteLLM, httpcore also output clean JSON
if format == "json":
third_party_loggers = [
"LiteLLM",
"httpcore",
"httpx",
"openai",
]
for logger_name in third_party_loggers:
logger = logging.getLogger(logger_name)
# Clear existing handlers so records propagate to root and use our formatter there
logger.handlers.clear()
logger.propagate = True # Still propagate to root for consistency
def _disable_third_party_colors() -> None:
"""Disable color output in third-party libraries for clean JSON logging."""
# Set NO_COLOR environment variable (common convention for disabling colors)
os.environ["NO_COLOR"] = "1"
os.environ["FORCE_COLOR"] = "0"
# Disable LiteLLM debug/verbose output colors if available
try:
import litellm
# LiteLLM respects NO_COLOR, but we can also suppress debug info
if hasattr(litellm, "suppress_debug_info"):
litellm.suppress_debug_info = True # type: ignore[attr-defined]
except (ImportError, AttributeError):
pass
def set_trace_context(**kwargs: Any) -> None:
"""
Set trace context for current execution.
Context is stored in a ContextVar and AUTOMATICALLY propagates
through async calls within the same execution context.
This is called by the framework at key points:
- Runtime.start_run(): Sets trace_id, execution_id, goal_id
- GraphExecutor.execute(): Adds agent_id
- Node execution: Adds node_id
Developers/agents NEVER call this directly - it's framework-managed.
Args:
**kwargs: Context fields (trace_id, execution_id, agent_id, etc.)
Example (framework code):
# In Runtime.start_run()
trace_id = uuid.uuid4().hex # 32 hex, W3C Trace Context compliant
execution_id = uuid.uuid4().hex # 32 hex, OTel-aligned for correlation
set_trace_context(
trace_id=trace_id,
execution_id=execution_id,
goal_id=goal_id
)
# All subsequent logs in this execution get these fields automatically!
"""
current = trace_context.get() or {}
trace_context.set({**current, **kwargs})
def get_trace_context() -> dict:
"""
Get current trace context.
Returns:
Dict with trace_id, execution_id, agent_id, etc.
Empty dict if no context set.
"""
context = trace_context.get() or {}
return context.copy()
def clear_trace_context() -> None:
"""
Clear trace context.
Useful for:
- Cleanup between test runs
- Starting a completely new execution context
- Manual context management (rare)
Note: Framework typically doesn't need to call this - ContextVar
is execution-scoped and cleans itself up automatically.
"""
trace_context.set(None)
+190 -14
@@ -33,11 +33,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
type=str,
help="Input context from JSON file",
)
run_parser.add_argument(
"--mock",
action="store_true",
help="Run in mock mode (no real LLM calls)",
)
run_parser.add_argument(
"--output",
"-o",
@@ -68,6 +63,18 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
default=None,
help="LLM model to use (any LiteLLM-compatible name)",
)
run_parser.add_argument(
"--resume-session",
type=str,
default=None,
help="Resume from a specific session ID",
)
run_parser.add_argument(
"--checkpoint",
type=str,
default=None,
help="Resume from a specific checkpoint (requires --resume-session)",
)
run_parser.set_defaults(func=cmd_run)
# info command
@@ -192,11 +199,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
help="Launch interactive TUI dashboard",
description="Browse available agents and launch the terminal dashboard.",
)
tui_parser.add_argument(
"--mock",
action="store_true",
help="Run in mock mode (no real LLM calls)",
)
tui_parser.add_argument(
"--model",
"-m",
@@ -206,6 +208,129 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
)
tui_parser.set_defaults(func=cmd_tui)
# sessions command group (checkpoint/resume management)
sessions_parser = subparsers.add_parser(
"sessions",
help="Manage agent sessions",
description="List, inspect, and manage agent execution sessions.",
)
sessions_subparsers = sessions_parser.add_subparsers(
dest="sessions_cmd",
help="Session management commands",
)
# sessions list
sessions_list_parser = sessions_subparsers.add_parser(
"list",
help="List agent sessions",
description="List all sessions for an agent.",
)
sessions_list_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_list_parser.add_argument(
"--status",
choices=["all", "active", "failed", "completed", "paused"],
default="all",
help="Filter by session status (default: all)",
)
sessions_list_parser.add_argument(
"--has-checkpoints",
action="store_true",
help="Show only sessions with checkpoints",
)
sessions_list_parser.set_defaults(func=cmd_sessions_list)
# sessions show
sessions_show_parser = sessions_subparsers.add_parser(
"show",
help="Show session details",
description="Display detailed information about a specific session.",
)
sessions_show_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_show_parser.add_argument(
"session_id",
type=str,
help="Session ID to inspect",
)
sessions_show_parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
sessions_show_parser.set_defaults(func=cmd_sessions_show)
# sessions checkpoints
sessions_checkpoints_parser = sessions_subparsers.add_parser(
"checkpoints",
help="List session checkpoints",
description="List all checkpoints for a session.",
)
sessions_checkpoints_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_checkpoints_parser.add_argument(
"session_id",
type=str,
help="Session ID",
)
sessions_checkpoints_parser.set_defaults(func=cmd_sessions_checkpoints)
# pause command
pause_parser = subparsers.add_parser(
"pause",
help="Pause running session",
description="Request graceful pause of a running agent session.",
)
pause_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
pause_parser.add_argument(
"session_id",
type=str,
help="Session ID to pause",
)
pause_parser.set_defaults(func=cmd_pause)
# resume command
resume_parser = subparsers.add_parser(
"resume",
help="Resume session from checkpoint",
description="Resume a paused or failed session from a checkpoint.",
)
resume_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
resume_parser.add_argument(
"session_id",
type=str,
help="Session ID to resume",
)
resume_parser.add_argument(
"--checkpoint",
"-c",
type=str,
help="Specific checkpoint ID to resume from (default: latest)",
)
resume_parser.add_argument(
"--tui",
action="store_true",
help="Resume in TUI dashboard mode",
)
resume_parser.set_defaults(func=cmd_resume)
def cmd_run(args: argparse.Namespace) -> int:
"""Run an exported agent."""
@@ -248,7 +373,6 @@ def cmd_run(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=True,
)
@@ -264,7 +388,11 @@ def cmd_run(args: argparse.Namespace) -> int:
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
app = AdenTUI(runner._agent_runtime)
app = AdenTUI(
runner._agent_runtime,
resume_session=getattr(args, "resume_session", None),
resume_checkpoint=getattr(args, "checkpoint", None),
)
# TUI handles execution via ChatRepl — user submits input,
# ChatRepl calls runtime.trigger_and_wait(). No auto-launch.
@@ -286,7 +414,6 @@ def cmd_run(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=False,
)
@@ -1057,7 +1184,6 @@ def cmd_tui(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(
agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=True,
)
@@ -1445,3 +1571,53 @@ def _interactive_multi(agents_dir: Path) -> int:
orchestrator.cleanup()
return 0
def cmd_sessions_list(args: argparse.Namespace) -> int:
"""List agent sessions."""
print("⚠ Sessions list command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Status filter: {args.status}")
print(f"Has checkpoints: {args.has_checkpoints}")
return 1
def cmd_sessions_show(args: argparse.Namespace) -> int:
"""Show detailed session information."""
print("⚠ Session show command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_sessions_checkpoints(args: argparse.Namespace) -> int:
"""List checkpoints for a session."""
print("⚠ Session checkpoints command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_pause(args: argparse.Namespace) -> int:
"""Pause a running session."""
print("⚠ Pause command not yet implemented")
print("This will be available once executor pause integration is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_resume(args: argparse.Namespace) -> int:
"""Resume a session from checkpoint."""
print("⚠ Resume command not yet implemented")
print("This will be available once checkpoint resume integration is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
if args.checkpoint:
print(f"Checkpoint: {args.checkpoint}")
if args.tui:
print("Mode: TUI")
return 1
+34 -24
@@ -8,8 +8,15 @@ from dataclasses import dataclass, field
from pathlib import Path
from typing import TYPE_CHECKING, Any
from framework.config import get_hive_config, get_preferred_model
from framework.graph import Goal
from framework.graph.edge import AsyncEntryPointSpec, EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.edge import (
DEFAULT_MAX_TOKENS,
AsyncEntryPointSpec,
EdgeCondition,
EdgeSpec,
GraphSpec,
)
from framework.graph.executor import ExecutionResult, GraphExecutor
from framework.graph.node import NodeSpec
from framework.llm.provider import LLMProvider, Tool
@@ -28,9 +35,6 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__)
# Configuration paths
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
def _ensure_credential_key_env() -> None:
"""Load HIVE_CREDENTIAL_KEY from shell config if not already in environment.
@@ -60,17 +64,6 @@ def _ensure_credential_key_env() -> None:
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
def get_hive_config() -> dict[str, Any]:
"""Load hive configuration from ~/.hive/configuration.json."""
if not HIVE_CONFIG_FILE.exists():
return {}
try:
with open(HIVE_CONFIG_FILE) as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return {}
def get_claude_code_token() -> str | None:
"""
Get the OAuth token from Claude Code subscription.
@@ -268,11 +261,7 @@ class AgentRunner:
@staticmethod
def _resolve_default_model() -> str:
"""Resolve the default model from ~/.hive/configuration.json."""
config = get_hive_config()
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
return "anthropic/claude-sonnet-4-20250514"
return get_preferred_model()
def __init__(
self,
@@ -308,9 +297,9 @@ class AgentRunner:
self._storage_path = storage_path
self._temp_dir = None
else:
# Use persistent storage in ~/.hive by default
# Use persistent storage in ~/.hive/agents/{agent_name}/ per RUNTIME_LOGGING.md spec
home = Path.home()
default_storage = home / ".hive" / "storage" / agent_path.name
default_storage = home / ".hive" / "agents" / agent_path.name
default_storage.mkdir(parents=True, exist_ok=True)
self._storage_path = default_storage
self._temp_dir = None
@@ -395,7 +384,7 @@ class AgentRunner:
Args:
agent_path: Path to agent folder
mock_mode: If True, use mock LLM responses
storage_path: Path for runtime storage (defaults to ~/.hive/storage/{name})
storage_path: Path for runtime storage (defaults to ~/.hive/agents/{name})
model: LLM model to use (reads from agent's default_config if None)
enable_tui: If True, forces use of AgentRuntime with EventBus
@@ -425,7 +414,11 @@ class AgentRunner:
if agent_config and hasattr(agent_config, "model"):
model = agent_config.model
max_tokens = getattr(agent_config, "max_tokens", 1024) if agent_config else 1024
if agent_config and hasattr(agent_config, "max_tokens"):
max_tokens = agent_config.max_tokens
else:
hive_config = get_hive_config()
max_tokens = hive_config.get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
# Build GraphSpec from module-level variables
graph = GraphSpec(
@@ -562,6 +555,11 @@ class AgentRunner:
def _setup(self) -> None:
"""Set up runtime, LLM, and executor."""
# Configure structured logging (auto-detects JSON vs human-readable)
from framework.observability import configure_logging
configure_logging(level="INFO", format="auto")
# Set up session context for tools (workspace_id, agent_id, session_id)
workspace_id = "default" # Could be derived from storage path
agent_id = self.graph.id or "unknown"
@@ -741,6 +739,17 @@ class AgentRunner:
# Create AgentRuntime with all entry points
log_store = RuntimeLogStore(base_path=self._storage_path / "runtime_logs")
# Enable checkpointing by default for resumable sessions
from framework.graph.checkpoint_config import CheckpointConfig
checkpoint_config = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False, # Only checkpoint after nodes complete
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
async_checkpoint=True, # Non-blocking
)
self._agent_runtime = create_agent_runtime(
graph=self.graph,
goal=self.goal,
@@ -750,6 +759,7 @@ class AgentRunner:
tools=tools,
tool_executor=tool_executor,
runtime_log_store=log_store,
checkpoint_config=checkpoint_config,
)
async def run(
@@ -0,0 +1,842 @@
# Resumable Sessions Design
## Problem Statement
Currently, when an agent encounters a failure during execution (e.g., a credential validation error, an API error, a tool failure), the entire session is lost. This creates a poor user experience, especially when:
1. The agent has completed significant work before the failure
2. The failure is recoverable (e.g., adding missing credentials)
3. The user wants to retry from the exact failure point without redoing work
## Design Goals
1. **Crash Recovery**: Sessions can resume after process crashes or errors
2. **Partial Completion**: Preserve work done by nodes that completed successfully
3. **Flexible Resume Points**: Resume from exact failure point or previous checkpoints
4. **State Consistency**: Guarantee consistent SharedMemory and conversation state
5. **Minimal Overhead**: Checkpointing shouldn't significantly impact performance
6. **User Control**: Users can inspect, modify, and resume sessions explicitly
## Architecture
### 1. Checkpoint System
#### Checkpoint Types
**Automatic Checkpoints** (saved automatically by framework):
- `node_start`: Before each node begins execution
- `node_complete`: After each node successfully completes
- `edge_transition`: Before traversing to next node
- `loop_iteration`: At each iteration in EventLoopNode (optional)
**Manual Checkpoints** (triggered by agent designer):
- `safe_point`: Explicitly marked safe points in graph
- `user_checkpoint`: Before awaiting user input in client-facing nodes
#### Checkpoint Data Structure
```python
@dataclass
class Checkpoint:
"""Single checkpoint in execution timeline."""
# Identity
checkpoint_id: str # Format: checkpoint_{timestamp}_{uuid_short}
session_id: str
checkpoint_type: str # "node_start", "node_complete", etc.
# Timestamps
created_at: str # ISO 8601
# Execution state
current_node: str | None
next_node: str | None # For edge_transition checkpoints
execution_path: list[str] # Nodes executed so far
# Memory state (snapshot)
shared_memory: dict[str, Any] # Full SharedMemory._data
# Per-node conversation state references
# (actual conversations stored separately, reference by node_id)
conversation_states: dict[str, str] # {node_id: conversation_checkpoint_id}
# Output accumulator state
accumulated_outputs: dict[str, Any]
# Execution metrics (for resuming quality tracking)
metrics_snapshot: dict[str, Any]
# Metadata
is_clean: bool # True if no failures/retries before this checkpoint
can_resume_from: bool # False if checkpoint is in unstable state
description: str # Human-readable checkpoint description
```
#### Storage Structure
```
~/.hive/agents/{agent_name}/
└── sessions/
└── session_YYYYMMDD_HHMMSS_{uuid}/
├── state.json # Session state (existing)
├── checkpoints/
│ ├── index.json # Checkpoint index/manifest
│ ├── checkpoint_1.json # Individual checkpoints
│ ├── checkpoint_2.json
│ └── checkpoint_N.json
├── conversations/ # Per-node conversation state (existing)
│ ├── node_id_1/
│ │ ├── parts/
│ │ ├── meta.json
│ │ └── cursor.json
│ └── node_id_2/...
├── data/ # Spillover artifacts (existing)
└── logs/ # L1/L2/L3 logs (existing)
```
**Checkpoint Index Format** (`checkpoints/index.json`):
```json
{
"session_id": "session_20260208_143022_abc12345",
"checkpoints": [
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"created_at": "2026-02-08T14:30:30.123Z",
"current_node": "collector",
"is_clean": true,
"can_resume_from": true,
"description": "Completed collector node successfully"
},
{
"checkpoint_id": "checkpoint_20260208_143045_abc789",
"type": "node_start",
"created_at": "2026-02-08T14:30:45.456Z",
"current_node": "analyzer",
"is_clean": true,
"can_resume_from": true,
"description": "Starting analyzer node"
}
],
"latest_checkpoint_id": "checkpoint_20260208_143045_abc789",
"total_checkpoints": 2
}
```
### 2. Resume Mechanism
#### Resume Flow
```python
# High-level resume flow
async def resume_session(
session_id: str,
checkpoint_id: str | None = None, # None = resume from latest
modifications: dict[str, Any] | None = None, # Override memory values
) -> ExecutionResult:
"""
Resume a session from a checkpoint.
Args:
session_id: Session to resume
checkpoint_id: Specific checkpoint (None = latest)
modifications: Optional memory/state modifications before resume
Returns:
ExecutionResult with resumed execution
"""
# 1. Load session state
session_state = await session_store.read_state(session_id)
# 2. Verify session is resumable
if not session_state.is_resumable:
raise ValueError(f"Session {session_id} is not resumable")
# 3. Load checkpoint
checkpoint = await checkpoint_store.load_checkpoint(
session_id,
checkpoint_id or session_state.progress.resume_from
)
# 4. Restore state
# - Restore SharedMemory from checkpoint.shared_memory
# - Restore per-node conversations from checkpoint.conversation_states
# - Restore output accumulator from checkpoint.accumulated_outputs
# - Apply modifications if provided
# 5. Resume execution from checkpoint.next_node or checkpoint.current_node
result = await executor.execute(
graph=graph,
goal=goal,
memory=restored_memory,
entry_point=checkpoint.next_node or checkpoint.current_node,
session_state=restored_session_state,
)
# 6. Update session state with resumed execution
await session_store.write_state(session_id, updated_state)
return result
```
#### Checkpoint Restoration
```python
@dataclass
class CheckpointStore:
"""Manages checkpoint storage and retrieval."""
async def save_checkpoint(
self,
session_id: str,
checkpoint: Checkpoint,
) -> None:
"""Save a checkpoint atomically."""
# 1. Write checkpoint file: checkpoints/checkpoint_{id}.json
# 2. Update index: checkpoints/index.json
# 3. Use atomic write for crash safety
async def load_checkpoint(
self,
session_id: str,
checkpoint_id: str | None = None,
) -> Checkpoint | None:
"""Load a checkpoint by ID or latest."""
# 1. Read checkpoint index
# 2. Find checkpoint by ID (or latest if None)
# 3. Load and deserialize checkpoint file
async def list_checkpoints(
self,
session_id: str,
checkpoint_type: str | None = None,
is_clean: bool | None = None,
) -> list[Checkpoint]:
"""List all checkpoints for a session with optional filters."""
async def delete_checkpoint(
self,
session_id: str,
checkpoint_id: str,
) -> bool:
"""Delete a specific checkpoint."""
async def prune_checkpoints(
self,
session_id: str,
keep_count: int = 10,
keep_clean_only: bool = False,
) -> int:
"""Prune old checkpoints, keeping most recent N."""
```
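
The atomic-write step in `save_checkpoint` is what protects against crashes mid-write. A minimal sketch, assuming plain JSON checkpoint files (the helper name is illustrative, not the final API): write to a temp file in the same directory, fsync, then `os.replace()` into place.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON to `path` atomically (temp file + rename). Illustrative sketch."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory, so os.replace() never crosses filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # Make sure bytes hit disk before the rename
        os.replace(tmp_name, path)  # Atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_name)
        raise
```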
### 3. GraphExecutor Integration
#### Modified Execution Loop
```python
# In GraphExecutor.execute()
async def execute(
self,
graph: GraphSpec,
goal: Goal,
memory: SharedMemory | None = None,
entry_point: str = "start",
session_state: dict[str, Any] | None = None,
checkpoint_config: CheckpointConfig | None = None,
) -> ExecutionResult:
"""
Execute graph with checkpointing support.
New parameters:
checkpoint_config: Configuration for checkpointing behavior
"""
# Initialize checkpoint store
checkpoint_store = CheckpointStore(storage_path / "checkpoints")
# Restore from checkpoint if session_state indicates resume
if session_state and session_state.get("resume_from"):
checkpoint = await checkpoint_store.load_checkpoint(
session_id,
session_state["resume_from"]
)
memory = self._restore_memory_from_checkpoint(checkpoint)
entry_point = checkpoint.next_node or checkpoint.current_node
current_node = entry_point
while current_node:
# CHECKPOINT: node_start
if checkpoint_config and checkpoint_config.checkpoint_on_node_start:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="node_start",
current_node=current_node,
memory=memory,
# ... other state
)
try:
# Execute node
result = await self._execute_node(current_node, memory, context)
# CHECKPOINT: node_complete
if checkpoint_config and checkpoint_config.checkpoint_on_node_complete:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="node_complete",
current_node=current_node,
memory=memory,
# ... other state
)
except Exception as e:
# On failure, mark current checkpoint as resume point
await self._mark_failure_checkpoint(
checkpoint_store,
current_node=current_node,
error=str(e),
)
raise
# Find next edge
next_node = self._find_next_node(current_node, result, memory)
# CHECKPOINT: edge_transition
if next_node and checkpoint_config and checkpoint_config.checkpoint_on_edge:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="edge_transition",
current_node=current_node,
next_node=next_node,
memory=memory,
# ... other state
)
current_node = next_node
```
### 4. EventLoopNode Integration
#### Conversation State Checkpointing
EventLoopNode already has conversation persistence via `ConversationStore`. For resumability:
```python
class EventLoopNode:
async def execute(self, ctx: NodeContext) -> NodeResult:
"""Execute with checkpoint support."""
# Try to restore from checkpoint
if ctx.checkpoint_id:
conversation = await self._restore_conversation(ctx.checkpoint_id)
output_accumulator = await OutputAccumulator.restore(self.store)
else:
# Fresh start
conversation = await self._initialize_conversation(ctx)
output_accumulator = OutputAccumulator(store=self.store)
# Event loop with periodic checkpointing
iteration = 0
while iteration < self.config.max_iterations:
# Optional: checkpoint every N iterations
if self.config.checkpoint_every_n_iterations:
if iteration % self.config.checkpoint_every_n_iterations == 0:
await self._save_loop_checkpoint(
conversation,
output_accumulator,
iteration,
)
# ... rest of event loop
iteration += 1
```
**Note**: EventLoopNode conversation state is already persisted to disk after each turn via `ConversationStore`, so it's naturally resumable. We just need to:
1. Track which conversation checkpoint to restore from
2. Ensure output accumulator state is also restored
### 5. User-Facing API
#### MCP Tools for Resume
```python
# In tools/src/aden_tools/tools/session_management/
@tool
async def list_resumable_sessions(
agent_work_dir: str,
status: str = "failed", # "failed", "paused", "cancelled"
limit: int = 20,
) -> dict:
"""
List sessions that can be resumed.
Returns:
{
"sessions": [
{
"session_id": "session_20260208_143022_abc12345",
"status": "failed",
"error": "Missing API key: OPENAI_API_KEY",
"failed_at_node": "analyzer",
"last_checkpoint": "checkpoint_20260208_143045_abc789",
"created_at": "2026-02-08T14:30:22Z",
"updated_at": "2026-02-08T14:30:45Z"
}
],
"total": 1
}
"""
@tool
async def list_session_checkpoints(
agent_work_dir: str,
session_id: str,
checkpoint_type: str = "", # Filter by type
clean_only: bool = False, # Only show clean checkpoints
) -> dict:
"""
List all checkpoints for a session.
Returns:
{
"session_id": "session_20260208_143022_abc12345",
"checkpoints": [
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"created_at": "2026-02-08T14:30:30Z",
"current_node": "collector",
"is_clean": true,
"can_resume_from": true,
"description": "Completed collector node successfully"
},
...
]
}
"""
@tool
async def inspect_checkpoint(
agent_work_dir: str,
session_id: str,
checkpoint_id: str,
include_memory: bool = False, # Include full memory state
) -> dict:
"""
Inspect a checkpoint's detailed state.
Returns:
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"current_node": "collector",
"execution_path": ["start", "collector"],
"accumulated_outputs": {
"twitter_handles": ["@user1", "@user2"]
},
"memory": {...}, # If include_memory=True
"metrics_snapshot": {
"total_retries": 2,
"nodes_with_failures": []
}
}
"""
@tool
async def resume_session(
agent_work_dir: str,
session_id: str,
checkpoint_id: str = "", # Empty = latest checkpoint
memory_modifications: str = "{}", # JSON string of memory overrides
) -> dict:
"""
Resume a session from a checkpoint.
Args:
agent_work_dir: Path to agent workspace
session_id: Session to resume
checkpoint_id: Specific checkpoint (empty = latest)
memory_modifications: JSON object with memory key overrides
Returns:
{
"session_id": "session_20260208_143022_abc12345",
"resumed_from": "checkpoint_20260208_143045_abc789",
"status": "active", # Now actively running
"message": "Session resumed successfully from checkpoint_20260208_143045_abc789"
}
"""
```
#### CLI Commands
```bash
# List resumable sessions
hive sessions list --agent twitter_outreach --status failed
# Show checkpoints for a session
hive sessions checkpoints session_20260208_143022_abc12345
# Inspect a checkpoint
hive sessions inspect session_20260208_143022_abc12345 checkpoint_20260208_143045_abc789
# Resume a session
hive sessions resume session_20260208_143022_abc12345
# Resume from specific checkpoint
hive sessions resume session_20260208_143022_abc12345 --checkpoint checkpoint_20260208_143030_xyz123
# Resume with memory modifications (e.g., after adding credentials)
hive sessions resume session_20260208_143022_abc12345 --set api_key=sk-...
```
### 6. Configuration
#### CheckpointConfig
```python
@dataclass
class CheckpointConfig:
"""Configuration for checkpoint behavior."""
# When to checkpoint
checkpoint_on_node_start: bool = True
checkpoint_on_node_complete: bool = True
checkpoint_on_edge: bool = False # Usually redundant with node_start
checkpoint_on_loop_iteration: bool = False # Can be expensive
checkpoint_every_n_iterations: int = 0 # 0 = disabled
# Pruning
max_checkpoints_per_session: int = 100
prune_after_node_count: int = 10 # Prune every N nodes
keep_clean_checkpoints_only: bool = False
# Performance
async_checkpoint: bool = True # Don't block execution on checkpoint writes
# What to include
include_conversation_snapshots: bool = True
include_full_memory: bool = True
```
#### Agent-Level Configuration
```python
# In agent.py or config.py
class MyAgent(Agent):
def get_checkpoint_config(self) -> CheckpointConfig:
"""Override to customize checkpoint behavior."""
return CheckpointConfig(
checkpoint_on_node_start=True,
checkpoint_on_node_complete=True,
checkpoint_every_n_iterations=5, # Checkpoint every 5 iterations in loops
max_checkpoints_per_session=50,
)
```
## Implementation Plan
### Phase 1: Core Checkpoint Infrastructure (Week 1)
1. **Create checkpoint schemas**
- `Checkpoint` dataclass
- `CheckpointIndex` for manifest
- Serialization/deserialization
2. **Implement CheckpointStore**
- `save_checkpoint()` with atomic writes
- `load_checkpoint()` with deserialization
- `list_checkpoints()` with filtering
- `prune_checkpoints()` for cleanup
3. **Update SessionState schema**
- Add `resume_from_checkpoint_id` field
- Add `checkpoints_enabled` flag
### Phase 2: GraphExecutor Integration (Week 2)
1. **Modify GraphExecutor**
- Add `CheckpointConfig` parameter
- Implement checkpoint saving at node boundaries
- Implement checkpoint restoration logic
- Handle memory state snapshots
2. **Update execution loop**
- Checkpoint before node execution
- Checkpoint after successful completion
- Mark failure checkpoints on errors
### Phase 3: EventLoopNode Integration (Week 3)
1. **Enhance conversation restoration**
- Link checkpoints to conversation states
- Ensure OutputAccumulator is checkpointed
- Test loop resumption from middle of execution
2. **Add optional loop iteration checkpoints**
- Configurable iteration frequency
- Balance between granularity and performance
### Phase 4: User-Facing Features (Week 4)
1. **Implement MCP tools**
- `list_resumable_sessions`
- `list_session_checkpoints`
- `inspect_checkpoint`
- `resume_session`
2. **Add CLI commands**
- `hive sessions list`
- `hive sessions checkpoints`
- `hive sessions inspect`
- `hive sessions resume`
3. **Update TUI**
- Show resumable sessions in UI
- Allow resume from TUI interface
### Phase 5: Testing & Documentation (Week 5)
1. **Write comprehensive tests**
- Unit tests for CheckpointStore
- Integration tests for resume flow
- Edge case testing (concurrent checkpoints, corruption, etc.)
2. **Performance testing**
- Measure checkpoint overhead
- Optimize async checkpoint writing
- Test with large memory states
3. **Documentation**
- Update skills with resume patterns
- Document checkpoint configuration
- Add troubleshooting guide
## Performance Considerations
### Checkpoint Overhead
**Estimated overhead per checkpoint**:
- Memory serialization: ~5-10ms for typical state (< 1MB)
- File I/O: ~10-20ms for atomic write
- Total: ~15-30ms per checkpoint
**Mitigation strategies**:
1. **Async checkpointing**: Don't block execution on writes
2. **Selective checkpointing**: Only checkpoint at important boundaries
3. **Incremental checkpoints**: Store deltas instead of full state (future)
4. **Compression**: Compress large memory states before writing
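
For strategy 1, the write can be handed to a background task so the executor never blocks on disk I/O. A minimal sketch with illustrative helper names (the real CheckpointStore would own this):

```python
import asyncio
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


async def _write_checkpoint(path: Path, payload: dict) -> None:
    # Off-load the blocking file write to the default thread pool.
    await asyncio.to_thread(path.write_text, json.dumps(payload))


def checkpoint_in_background(path: Path, payload: dict) -> asyncio.Task:
    """Fire-and-forget checkpoint write; the run never blocks on disk I/O."""
    task = asyncio.create_task(_write_checkpoint(path, payload))

    def _log_failure(t: asyncio.Task) -> None:
        if not t.cancelled() and t.exception():
            # Matches the error-handling policy below: warn, don't fail the run.
            logger.warning("Checkpoint write failed: %s", t.exception())

    task.add_done_callback(_log_failure)
    return task
```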
### Storage Size
**Typical checkpoint size**:
- Small memory state (< 100KB): ~50-100KB per checkpoint
- Medium memory state (< 1MB): ~500KB-1MB per checkpoint
- Large memory state (> 1MB): ~1-5MB per checkpoint
**Mitigation strategies**:
1. **Pruning**: Keep only N most recent checkpoints
2. **Clean-only retention**: Only keep checkpoints from clean execution
3. **Compression**: Use gzip for checkpoint files
4. **Archiving**: Move old checkpoints to archive storage
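
Compression (strategy 3) is cheap to bolt on at the store layer with stdlib `gzip`. A sketch, assuming JSON checkpoint files and the 100KB threshold suggested later under Future Enhancements:

```python
import gzip
import json
from pathlib import Path


def write_checkpoint_file(path: Path, payload: dict, threshold: int = 100_000) -> Path:
    """Gzip the checkpoint when it exceeds `threshold` bytes. Illustrative sketch."""
    raw = json.dumps(payload).encode("utf-8")
    if len(raw) > threshold:
        out = path.with_suffix(path.suffix + ".gz")  # e.g. checkpoint_1.json.gz
        out.write_bytes(gzip.compress(raw))
        return out
    path.write_bytes(raw)
    return path
```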
## Error Handling
### Checkpoint Save Failures
**Scenarios**:
- Disk full
- Permission errors
- Serialization failures
- Concurrent writes
**Handling**:
```python
try:
await checkpoint_store.save_checkpoint(session_id, checkpoint)
except CheckpointSaveError as e:
# Log warning but don't fail execution
logger.warning(f"Failed to save checkpoint: {e}")
# Continue execution without checkpoint
```
### Checkpoint Load Failures
**Scenarios**:
- Checkpoint file corrupted
- Checkpoint format incompatible
- Referenced conversation state missing
**Handling**:
```python
try:
checkpoint = await checkpoint_store.load_checkpoint(session_id, checkpoint_id)
except CheckpointLoadError as e:
# Try to find previous valid checkpoint
checkpoints = await checkpoint_store.list_checkpoints(session_id)
for cp in reversed(checkpoints):
try:
checkpoint = await checkpoint_store.load_checkpoint(session_id, cp.checkpoint_id)
logger.info(f"Fell back to checkpoint {cp.checkpoint_id}")
break
except CheckpointLoadError:
continue
else:
raise ValueError(f"No valid checkpoints found for session {session_id}")
```
### Resume Failures
**Scenarios**:
- Checkpoint state inconsistent with current graph
- Node no longer exists in updated agent code
- Memory keys missing required values
**Handling**:
1. **Validation**: Verify checkpoint compatibility before resume
2. **Graceful degradation**: Resume from earlier checkpoint if possible
3. **User notification**: Clear error messages about why resume failed
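
A sketch of the validation step (1): collect every reason the checkpoint is incompatible with the current graph before attempting a resume. How the set of node IDs is obtained from `GraphSpec` is left to the caller, since that API is not shown here.

```python
def validate_checkpoint_against_graph(checkpoint, node_ids: set[str]) -> list[str]:
    """Return human-readable reasons the checkpoint cannot be resumed (empty = OK).

    `node_ids` is the set of node IDs in the current GraphSpec; obtaining it
    depends on the GraphSpec API, so it is passed in here.
    """
    problems: list[str] = []
    resume_node = checkpoint.next_node or checkpoint.current_node
    if resume_node not in node_ids:
        problems.append(f"Resume node '{resume_node}' no longer exists in the graph")
    for visited in checkpoint.execution_path:
        if visited not in node_ids:
            problems.append(f"Executed node '{visited}' was removed; state may be stale")
    return problems
```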
## Migration Path
### Backward Compatibility
**Existing sessions** (without checkpoints):
- Can still be executed normally
- Checkpoint system is opt-in per agent
- No breaking changes to existing APIs
**Enabling checkpoints**:
```python
# Option 1: Agent-level default
class MyAgent(Agent):
checkpoint_config = CheckpointConfig(
checkpoint_on_node_complete=True,
)
# Option 2: Runtime override
runtime = create_agent_runtime(
agent=my_agent,
checkpoint_config=CheckpointConfig(...),
)
# Option 3: Per-execution
result = await executor.execute(
graph=graph,
goal=goal,
checkpoint_config=CheckpointConfig(...),
)
```
### Gradual Rollout
1. **Phase 1**: Core infrastructure, no user-facing features
2. **Phase 2**: Opt-in for specific agents via config
3. **Phase 3**: User-facing MCP tools and CLI
4. **Phase 4**: Enable by default for all new agents
5. **Phase 5**: TUI integration
## Future Enhancements
### 1. Incremental Checkpoints
Instead of full state snapshots, store only deltas:
```python
@dataclass
class IncrementalCheckpoint:
"""Checkpoint with only changed state."""
base_checkpoint_id: str # Parent checkpoint
memory_delta: dict[str, Any] # Only changed keys
added_outputs: dict[str, Any] # Only new outputs
```
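
Computing the delta against the base snapshot is straightforward when memory is a flat dict; a sketch (key deletions would additionally need a tombstone convention):

```python
from typing import Any


def compute_memory_delta(base: dict[str, Any], current: dict[str, Any]) -> dict[str, Any]:
    """Keys added or changed since `base`; pair with a tombstone set for deletions."""
    return {k: v for k, v in current.items() if k not in base or base[k] != v}
```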
### 2. Distributed Checkpointing
For long-running agents, checkpoint to cloud storage:
```python
checkpoint_config = CheckpointConfig(
storage_backend="s3", # or "gcs", "azure"
storage_url="s3://my-bucket/checkpoints/",
)
```
### 3. Checkpoint Compression
Compress large memory states:
```python
checkpoint_config = CheckpointConfig(
compress=True,
compression_threshold_bytes=100_000, # Compress if > 100KB
)
```
### 4. Smart Checkpoint Selection
Use heuristics to decide when to checkpoint:
```python
class SmartCheckpointStrategy:
def should_checkpoint(self, context: ExecutionContext) -> bool:
# Checkpoint after expensive nodes
if context.node_latency_ms > 30_000:
return True
# Checkpoint before risky operations
if context.node_id in ["api_call", "external_tool"]:
return True
# Checkpoint after significant memory changes
if context.memory_delta_size > 10:
return True
return False
```
## Security Considerations
### 1. Sensitive Data in Checkpoints
**Problem**: Checkpoints may contain sensitive data (API keys, credentials, PII)
**Mitigation**:
```python
@dataclass
class CheckpointConfig:
# Exclude sensitive keys from checkpoint
exclude_memory_keys: list[str] = field(default_factory=lambda: [
"api_key",
"credentials",
"access_token",
])
# Encrypt checkpoint files
encrypt_checkpoints: bool = True
encryption_key_source: str = "keychain" # or "env_var", "file"
```
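
The exclusion itself can be applied at snapshot time. A sketch that redacts rather than drops excluded keys, so a resumed run can still see that a value existed:

```python
from typing import Any


def snapshot_memory(memory: dict[str, Any], exclude_keys: list[str]) -> dict[str, Any]:
    """Copy memory for a checkpoint, redacting excluded keys (illustrative)."""
    excluded = set(exclude_keys)
    return {k: ("<redacted>" if k in excluded else v) for k, v in memory.items()}
```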
### 2. Checkpoint Tampering
**Problem**: Malicious modification of checkpoint files
**Mitigation**:
```python
@dataclass
class Checkpoint:
# Add cryptographic signature
signature: str # HMAC of checkpoint content
def verify_signature(self, secret_key: str) -> bool:
"""Verify checkpoint hasn't been tampered with."""
...
```
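
A possible implementation of that signature using stdlib `hmac`, sketched here; canonical serialization of the checkpoint body is the detail that needs care in practice:

```python
import hashlib
import hmac
import json


def sign_checkpoint(body: dict, secret_key: str) -> str:
    """HMAC-SHA256 over a canonical JSON serialization of the checkpoint body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret_key.encode(), canonical, hashlib.sha256).hexdigest()


def verify_signature(body: dict, signature: str, secret_key: str) -> bool:
    """Constant-time comparison so timing can't leak the expected HMAC."""
    return hmac.compare_digest(sign_checkpoint(body, secret_key), signature)
```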
## References
- [RUNTIME_LOGGING.md](./RUNTIME_LOGGING.md) - Current logging system
- [session_state.py](../schemas/session_state.py) - Session state schema
- [session_store.py](../storage/session_store.py) - Session storage
- [executor.py](../graph/executor.py) - Graph executor
- [event_loop_node.py](../graph/event_loop_node.py) - EventLoop implementation
+28 -18
@@ -19,7 +19,7 @@ This layered approach enables efficient debugging: start with L1 to identify pro
**Default since 2026-02-06**
```
~/.hive/{agent_name}/
~/.hive/agents/{agent_name}/
└── sessions/
└── session_YYYYMMDD_HHMMSS_{uuid}/
├── state.json # Session state and metadata
@@ -42,7 +42,7 @@ This layered approach enables efficient debugging: start with L1 to identify pro
**Read-only for backward compatibility**
```
~/.hive/{agent_name}/
~/.hive/agents/{agent_name}/
├── runtime_logs/
│ └── runs/
│ └── {run_id}/
@@ -197,8 +197,17 @@ class NodeStepLog:
tokens_used: int
latency_ms: int
# ... detailed execution state
# Trace context (OTel-aligned; empty if observability context not set):
trace_id: str # From set_trace_context (OTel trace)
span_id: str # 16 hex chars per step (OTel span)
parent_span_id: str # Optional; for nested span hierarchy
execution_id: str # Session/run correlation id
```
L3 entries include `trace_id`, `span_id`, and `execution_id` for correlation and **OpenTelemetry (OTel) compatibility**. When the framework sets trace context (e.g. via `Runtime.start_run()` or `StreamRuntime.start_run()`), these fields are populated automatically so L3 data can be exported to OTel backends without schema changes.
**L2: NodeDetail** also includes `trace_id` and `span_id`; **L1: RunSummaryLog** includes `trace_id` and `execution_id` for the same correlation.
---
## Querying Logs (MCP Tools)
@@ -215,7 +224,7 @@ Three MCP tools provide access to the logging system:
```python
query_runtime_logs(
agent_work_dir: str, # e.g., "~/.hive/twitter_outreach"
agent_work_dir: str, # e.g., "~/.hive/agents/twitter_outreach"
status: str = "", # "needs_attention", "success", "failure", "degraded"
limit: int = 20
) -> dict # {"runs": [...], "total": int}
@@ -362,14 +371,14 @@ query_runtime_log_raw(agent_work_dir, run_id)
```python
# 1. Find problematic runs (L1)
result = query_runtime_logs(
agent_work_dir="~/.hive/twitter_outreach",
agent_work_dir="~/.hive/agents/twitter_outreach",
status="needs_attention"
)
run_id = result["runs"][0]["run_id"]
# 2. Identify failing nodes (L2)
details = query_runtime_log_details(
agent_work_dir="~/.hive/twitter_outreach",
agent_work_dir="~/.hive/agents/twitter_outreach",
run_id=run_id,
needs_attention_only=True
)
@@ -377,7 +386,7 @@ problem_node = details["nodes"][0]["node_id"]
# 3. Analyze root cause (L3)
raw = query_runtime_log_raw(
agent_work_dir="~/.hive/twitter_outreach",
agent_work_dir="~/.hive/agents/twitter_outreach",
run_id=run_id,
node_id=problem_node
)
@@ -390,12 +399,12 @@ raw = query_runtime_log_raw(
```python
# Get recent runs
runs = query_runtime_logs("~/.hive/my_agent", limit=10)
runs = query_runtime_logs("~/.hive/agents/my_agent", limit=10)
# For each run, check specific node
for run in runs["runs"]:
node_details = query_runtime_log_details(
"~/.hive/my_agent",
"~/.hive/agents/my_agent",
run["run_id"],
node_id="problematic-node"
)
@@ -411,7 +420,7 @@ import time
while True:
result = query_runtime_logs(
agent_work_dir="~/.hive/my_agent",
agent_work_dir="~/.hive/agents/my_agent",
status="needs_attention",
limit=1
)
@@ -520,9 +529,10 @@ logger.start_run(goal_id, session_id=execution_id)
**Written:** Incrementally (append per step)
**Format:** JSONL (one JSON object per line)
Each line includes **trace context** when the framework has set it (via the observability module): `trace_id`, `span_id`, `parent_span_id` (optional), and `execution_id`. These align with OpenTelemetry/W3C TraceContext so L3 data can be exported to OTel backends without schema changes.
```jsonl
{"node_id":"intake-collector","step_index":3,"tool_calls":[{"tool":"web_search","args":{"query":"@RomuloNevesOf"}}],"tool_results":[{"status":"success","data":"..."}],"verdict":"RETRY","verdict_feedback":"Missing required output 'twitter_handles'. You found the handle but didn't call set_output.","llm_response_text":"I found the profile...","tokens_used":1234,"latency_ms":2500}
{"node_id":"intake-collector","step_index":4,"tool_calls":[{"tool":"web_search","args":{"query":"@RomuloNevesOf twitter"}}],"tool_results":[{"status":"success","data":"..."}],"verdict":"RETRY","verdict_feedback":"Still missing 'twitter_handles'.","llm_response_text":"Found more info...","tokens_used":1456,"latency_ms":2300}
{"node_id":"intake-collector","step_index":3,"trace_id":"54e80d7b5bd6409dbc3217e5cd16a4fd","span_id":"a1b2c3d4e5f67890","execution_id":"b4c348ec54e80d7b5bd6409dbc3217e50","tool_calls":[...],"verdict":"RETRY",...}
```
**Why JSONL?**
@@ -574,10 +584,10 @@ The system automatically handles both old and new formats:
```python
# MCP tools check both locations automatically
result = query_runtime_logs("~/.hive/old_agent")
result = query_runtime_logs("~/.hive/agents/old_agent")
# Returns logs from both:
# - ~/.hive/old_agent/runtime_logs/runs/*/
# - ~/.hive/old_agent/sessions/session_*/logs/
# - ~/.hive/agents/old_agent/runtime_logs/runs/*/
# - ~/.hive/agents/old_agent/sessions/session_*/logs/
```
### Deprecation Warnings
@@ -636,9 +646,9 @@ Typical session with 5 nodes, 20 steps:
**Symptom:** MCP tools return empty results
**Check:**
1. Verify storage path exists: `~/.hive/{agent_name}/`
2. Check session directories: `ls ~/.hive/{agent_name}/sessions/`
3. Verify logs directory exists: `ls ~/.hive/{agent_name}/sessions/session_*/logs/`
1. Verify storage path exists: `~/.hive/agents/{agent_name}/`
2. Check session directories: `ls ~/.hive/agents/{agent_name}/sessions/`
3. Verify logs directory exists: `ls ~/.hive/agents/{agent_name}/sessions/session_*/logs/`
4. Check file permissions
### Issue: Corrupt JSONL files
@@ -661,7 +671,7 @@ query_runtime_log_details(agent_work_dir, run_id)
**Solution:**
```bash
# Archive old sessions
cd ~/.hive/{agent_name}/sessions/
cd ~/.hive/agents/{agent_name}/sessions/
find . -name "session_2025*" -type d -exec tar -czf archive.tar.gz {} +
rm -rf session_2025*
+9
@@ -12,6 +12,7 @@ from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING, Any
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.executor import ExecutionResult
from framework.runtime.event_bus import EventBus
from framework.runtime.execution_stream import EntryPointSpec, ExecutionStream
@@ -102,6 +103,7 @@ class AgentRuntime:
tool_executor: Callable | None = None,
config: AgentRuntimeConfig | None = None,
runtime_log_store: Any = None,
checkpoint_config: CheckpointConfig | None = None,
):
"""
Initialize agent runtime.
@@ -115,11 +117,13 @@ class AgentRuntime:
tool_executor: Function to execute tools
config: Optional runtime configuration
runtime_log_store: Optional RuntimeLogStore for per-execution logging
checkpoint_config: Optional checkpoint configuration for resumable sessions
"""
self.graph = graph
self.goal = goal
self._config = config or AgentRuntimeConfig()
self._runtime_log_store = runtime_log_store
self._checkpoint_config = checkpoint_config
# Initialize storage
storage_path_obj = Path(storage_path) if isinstance(storage_path, str) else storage_path
@@ -222,6 +226,7 @@ class AgentRuntime:
result_retention_ttl_seconds=self._config.execution_result_ttl_seconds,
runtime_log_store=self._runtime_log_store,
session_store=self._session_store,
checkpoint_config=self._checkpoint_config,
)
await stream.start()
self._streams[ep_id] = stream
@@ -460,6 +465,7 @@ def create_agent_runtime(
config: AgentRuntimeConfig | None = None,
runtime_log_store: Any = None,
enable_logging: bool = True,
checkpoint_config: CheckpointConfig | None = None,
) -> AgentRuntime:
"""
Create and configure an AgentRuntime with entry points.
@@ -480,6 +486,8 @@ def create_agent_runtime(
If None and enable_logging=True, creates one automatically.
enable_logging: Whether to enable runtime logging (default: True).
Set to False to disable logging entirely.
checkpoint_config: Optional checkpoint configuration for resumable sessions.
If None, uses default checkpointing behavior.
Returns:
Configured AgentRuntime (not yet started)
@@ -500,6 +508,7 @@ def create_agent_runtime(
tool_executor=tool_executor,
config=config,
runtime_log_store=runtime_log_store,
checkpoint_config=checkpoint_config,
)
for spec in entry_points:
+9
@@ -13,6 +13,7 @@ from datetime import datetime
from pathlib import Path
from typing import Any
from framework.observability import set_trace_context
from framework.schemas.decision import Decision, DecisionType, Option, Outcome
from framework.schemas.run import Run, RunStatus
from framework.storage.backend import FileStorage
@@ -79,6 +80,14 @@ class Runtime:
The run ID
"""
run_id = f"run_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
trace_id = uuid.uuid4().hex
execution_id = uuid.uuid4().hex # 32 hex, OTel/W3C-aligned for logs
set_trace_context(
trace_id=trace_id,
execution_id=execution_id,
goal_id=goal_id,
)
self._current_run = Run(
id=run_id,
+70 -3
@@ -17,6 +17,7 @@ from dataclasses import dataclass, field
from datetime import datetime
from typing import TYPE_CHECKING, Any
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.executor import ExecutionResult, GraphExecutor
from framework.runtime.shared_state import IsolationLevel, SharedStateManager
from framework.runtime.stream_runtime import StreamRuntime, StreamRuntimeAdapter
@@ -115,6 +116,7 @@ class ExecutionStream:
result_retention_ttl_seconds: float | None = None,
runtime_log_store: Any = None,
session_store: "SessionStore | None" = None,
checkpoint_config: CheckpointConfig | None = None,
):
"""
Initialize execution stream.
@@ -133,6 +135,7 @@ class ExecutionStream:
tool_executor: Function to execute tools
runtime_log_store: Optional RuntimeLogStore for per-execution logging
session_store: Optional SessionStore for unified session storage
checkpoint_config: Optional checkpoint configuration for resumable sessions
"""
self.stream_id = stream_id
self.entry_spec = entry_spec
@@ -148,6 +151,7 @@ class ExecutionStream:
self._result_retention_max = result_retention_max
self._result_retention_ttl_seconds = result_retention_ttl_seconds
self._runtime_log_store = runtime_log_store
self._checkpoint_config = checkpoint_config
self._session_store = session_store
# Create stream-scoped runtime
@@ -357,6 +361,13 @@ class ExecutionStream:
# Create runtime adapter for this execution
runtime_adapter = StreamRuntimeAdapter(self._runtime, execution_id)
# Start run to set trace context (CRITICAL for observability)
runtime_adapter.start_run(
goal_id=self.goal.id,
goal_description=self.goal.description,
input_data=ctx.input_data,
)
# Create per-execution runtime logger
runtime_logger = None
if self._runtime_log_store:
@@ -400,6 +411,7 @@ class ExecutionStream:
goal=self.goal,
input_data=ctx.input_data,
session_state=ctx.session_state,
checkpoint_config=self._checkpoint_config,
)
# Clean up executor reference
@@ -408,6 +420,13 @@ class ExecutionStream:
# Store result with retention
self._record_execution_result(execution_id, result)
# End run to complete trace (for observability)
runtime_adapter.end_run(
success=result.success,
narrative=f"Execution {'succeeded' if result.success else 'failed'}",
output_data=result.output,
)
# Update context
ctx.completed_at = datetime.now()
ctx.status = "completed" if result.success else "failed"
@@ -437,8 +456,42 @@ class ExecutionStream:
logger.debug(f"Execution {execution_id} completed: success={result.success}")
except asyncio.CancelledError:
ctx.status = "cancelled"
raise
# Execution was cancelled
# The executor catches CancelledError and returns a paused result,
# but if cancellation happened before executor started, we won't have a result
logger.info(f"Execution {execution_id} cancelled")
# Check if we have a result (executor completed and returned)
try:
_ = result # Check if result variable exists
has_result = True
except NameError:
has_result = False
result = ExecutionResult(
success=False,
error="Execution cancelled",
)
# Update context status based on result
if has_result and result.paused_at:
ctx.status = "paused"
ctx.completed_at = datetime.now()
else:
ctx.status = "cancelled"
# Clean up executor reference
self._active_executors.pop(execution_id, None)
# Store result with retention
self._record_execution_result(execution_id, result)
# Write session state
if has_result and result.paused_at:
await self._write_session_state(execution_id, ctx, result=result)
else:
await self._write_session_state(execution_id, ctx, error="Execution cancelled")
# Don't re-raise - we've handled it and saved state
except Exception as e:
ctx.status = "failed"
@@ -456,6 +509,16 @@ class ExecutionStream:
# Write error session state
await self._write_session_state(execution_id, ctx, error=str(e))
# End run with failure (for observability)
try:
runtime_adapter.end_run(
success=False,
narrative=f"Execution failed: {str(e)}",
output_data={},
)
except Exception:
pass # Don't let end_run errors mask the original error
# Emit failure event
if self._event_bus:
await self._event_bus.emit_execution_failed(
@@ -511,7 +574,11 @@ class ExecutionStream:
else:
status = SessionStatus.FAILED
elif error:
status = SessionStatus.FAILED
# Check if this is a cancellation
if ctx.status == "cancelled" or "cancelled" in error.lower():
status = SessionStatus.CANCELLED
else:
status = SessionStatus.FAILED
else:
status = SessionStatus.ACTIVE
+22 -2
@@ -31,6 +31,9 @@ class NodeStepLog(BaseModel):
For EventLoopNode, each iteration is a step. For single-step nodes
(LLMNode, FunctionNode, RouterNode), step_index is 0.
OTel-aligned fields (trace_id, span_id, execution_id) enable correlation
and future OpenTelemetry export without schema changes.
"""
node_id: str
@@ -48,6 +51,11 @@ class NodeStepLog(BaseModel):
error: str = "" # Error message if step failed
stacktrace: str = "" # Full stack trace if exception occurred
is_partial: bool = False # True if step didn't complete normally
# OTel / trace context (from observability; empty if not set):
trace_id: str = "" # OTel trace id (e.g. from set_trace_context)
span_id: str = "" # OTel span id (16 hex chars per step)
parent_span_id: str = "" # Optional; for nested span hierarchy
execution_id: str = "" # Session/run correlation id
# ---------------------------------------------------------------------------
@@ -56,7 +64,10 @@ class NodeStepLog(BaseModel):
class NodeDetail(BaseModel):
"""Per-node completion result and attention flags."""
"""Per-node completion result and attention flags.
OTel-aligned fields (trace_id, span_id) tie L2 to the same trace as L3.
"""
node_id: str
node_name: str = ""
@@ -78,6 +89,9 @@ class NodeDetail(BaseModel):
continue_count: int = 0
needs_attention: bool = False
attention_reasons: list[str] = Field(default_factory=list)
# OTel / trace context (from observability; empty if not set):
trace_id: str = ""
span_id: str = "" # Optional node-level span for hierarchy
# ---------------------------------------------------------------------------
@@ -86,7 +100,10 @@ class NodeDetail(BaseModel):
class RunSummaryLog(BaseModel):
"""Run-level summary for a full graph execution."""
"""Run-level summary for a full graph execution.
OTel-aligned fields (trace_id, execution_id) tie L1 to the same trace as L2/L3.
"""
run_id: str
agent_id: str = ""
@@ -101,6 +118,9 @@ class RunSummaryLog(BaseModel):
started_at: str = "" # ISO timestamp
duration_ms: int = 0
execution_quality: str = "" # "clean"|"degraded"|"failed"
# OTel / trace context (from observability; empty if not set):
trace_id: str = ""
execution_id: str = ""
# ---------------------------------------------------------------------------
+8 -17
@@ -52,29 +52,20 @@ class RuntimeLogStore:
- New format (session_*): {storage_root}/sessions/{run_id}/logs/
- Old format (anything else): {base_path}/runs/{run_id}/ (deprecated)
When base_path ends with 'runtime_logs', we use the parent directory
to avoid nesting under runtime_logs/.
This allows backward compatibility for reading old logs.
"""
if run_id.startswith("session_"):
# New: sessions/{session_id}/logs/
# If base_path ends with runtime_logs, use parent (storage root)
is_runtime_logs = self._base_path.name == "runtime_logs"
root = self._base_path.parent if is_runtime_logs else self._base_path
return root / "sessions" / run_id / "logs"
else:
# Old: runs/{run_id}/ (deprecated, backward compatibility only)
import warnings
import warnings
warnings.warn(
f"Reading logs from deprecated location for run_id={run_id}. "
"New sessions use unified storage at sessions/session_*/logs/",
DeprecationWarning,
stacklevel=3,
)
return self._base_path / "runs" / run_id
warnings.warn(
f"Reading logs from deprecated location for run_id={run_id}. "
"New sessions use unified storage at sessions/session_*/logs/",
DeprecationWarning,
stacklevel=3,
)
return self._base_path / "runs" / run_id
# -------------------------------------------------------------------
# Incremental write (sync — called from locked sections)
+24 -2
@@ -26,6 +26,7 @@ import uuid
from datetime import UTC, datetime
from typing import Any
from framework.observability import get_trace_context
from framework.runtime.runtime_log_schemas import (
NodeDetail,
NodeStepLog,
@@ -64,10 +65,8 @@ class RuntimeLogger:
The run_id (same as session_id if provided)
"""
if session_id:
# Use provided session_id as run_id (unified sessions)
self._run_id = session_id
else:
# Generate run_id in old format (backward compatibility)
ts = datetime.now(UTC).strftime("%Y%m%dT%H%M%S")
short_uuid = uuid.uuid4().hex[:8]
self._run_id = f"{ts}_{short_uuid}"
@@ -118,6 +117,12 @@ class RuntimeLogger:
)
)
# OTel / trace context: from observability ContextVar (empty if not set)
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
execution_id = ctx.get("execution_id", "")
span_id = uuid.uuid4().hex[:16] # OTel 16-hex span_id per step
step_log = NodeStepLog(
node_id=node_id,
node_type=node_type,
@@ -132,6 +137,9 @@ class RuntimeLogger:
error=error,
stacktrace=stacktrace,
is_partial=is_partial,
trace_id=trace_id,
span_id=span_id,
execution_id=execution_id,
)
with self._lock:
@@ -190,6 +198,11 @@ class RuntimeLogger:
needs_attention = True
attention_reasons.append(f"Many iterations: {total_steps}")
# OTel / trace context for L2 correlation
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
span_id = uuid.uuid4().hex[:16] # Optional node-level span
detail = NodeDetail(
node_id=node_id,
node_name=node_name,
@@ -210,6 +223,8 @@ class RuntimeLogger:
continue_count=continue_count,
needs_attention=needs_attention,
attention_reasons=attention_reasons,
trace_id=trace_id,
span_id=span_id,
)
with self._lock:
@@ -274,6 +289,11 @@ class RuntimeLogger:
for nd in node_details:
attention_reasons.extend(nd.attention_reasons)
# OTel / trace context for L1 correlation
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
execution_id = ctx.get("execution_id", "")
summary = RunSummaryLog(
run_id=self._run_id,
agent_id=self._agent_id,
@@ -288,6 +308,8 @@ class RuntimeLogger:
started_at=self._started_at,
duration_ms=duration_ms,
execution_quality=execution_quality,
trace_id=trace_id,
execution_id=execution_id,
)
await self._store.save_summary(self._run_id, summary)
+11
@@ -12,6 +12,7 @@ import uuid
from datetime import datetime
from typing import TYPE_CHECKING, Any
from framework.observability import set_trace_context
from framework.schemas.decision import Decision, DecisionType, Option, Outcome
from framework.schemas.run import Run, RunStatus
from framework.storage.concurrent import ConcurrentStorage
@@ -119,6 +120,16 @@ class StreamRuntime:
"""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
run_id = f"run_{self.stream_id}_{timestamp}_{uuid.uuid4().hex[:8]}"
trace_id = uuid.uuid4().hex
otel_execution_id = uuid.uuid4().hex # 32 hex, OTel/W3C-aligned for logs
set_trace_context(
trace_id=trace_id,
execution_id=otel_execution_id,
run_id=run_id,
goal_id=goal_id,
stream_id=self.stream_id,
)
run = Run(
id=run_id,
+178
@@ -0,0 +1,178 @@
"""
Checkpoint Schema - Execution state snapshots for resumability.
Checkpoints capture the execution state at strategic points (node boundaries,
iterations) to enable crash recovery and resume-from-failure scenarios.
"""
from datetime import datetime
from typing import Any
from pydantic import BaseModel, Field
class Checkpoint(BaseModel):
"""
Single checkpoint in execution timeline.
Captures complete execution state at a specific point to enable
resuming from that exact point after failures or pauses.
"""
# Identity
checkpoint_id: str # Format: cp_{type}_{node_id}_{timestamp}
checkpoint_type: str # "node_start" | "node_complete" | "loop_iteration"
session_id: str
# Timestamps
created_at: str # ISO 8601 format
# Execution state
current_node: str | None = None
next_node: str | None = None # For edge_transition checkpoints
execution_path: list[str] = Field(default_factory=list) # Nodes executed so far
# State snapshots
shared_memory: dict[str, Any] = Field(default_factory=dict) # Full SharedMemory._data
accumulated_outputs: dict[str, Any] = Field(default_factory=dict) # Outputs accumulated so far
# Execution metrics (for resuming quality tracking)
metrics_snapshot: dict[str, Any] = Field(default_factory=dict)
# Metadata
is_clean: bool = True # True if no failures/retries before this checkpoint
description: str = "" # Human-readable checkpoint description
model_config = {"extra": "allow"}
@classmethod
def create(
cls,
checkpoint_type: str,
session_id: str,
current_node: str,
execution_path: list[str],
shared_memory: dict[str, Any],
next_node: str | None = None,
accumulated_outputs: dict[str, Any] | None = None,
metrics_snapshot: dict[str, Any] | None = None,
is_clean: bool = True,
description: str = "",
) -> "Checkpoint":
"""
Create a new checkpoint with generated ID and timestamp.
Args:
checkpoint_type: Type of checkpoint (node_start, node_complete, etc.)
session_id: Session this checkpoint belongs to
current_node: Node ID at checkpoint time
execution_path: List of node IDs executed so far
shared_memory: Full memory state snapshot
next_node: Next node to execute (for node_complete checkpoints)
accumulated_outputs: Outputs accumulated so far
metrics_snapshot: Execution metrics at checkpoint time
is_clean: Whether execution was clean up to this point
description: Human-readable description
Returns:
New Checkpoint instance
"""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
checkpoint_id = f"cp_{checkpoint_type}_{current_node}_{timestamp}"
if not description:
description = f"{checkpoint_type.replace('_', ' ').title()}: {current_node}"
return cls(
checkpoint_id=checkpoint_id,
checkpoint_type=checkpoint_type,
session_id=session_id,
created_at=datetime.now().isoformat(),
current_node=current_node,
next_node=next_node,
execution_path=execution_path,
shared_memory=shared_memory,
accumulated_outputs=accumulated_outputs or {},
metrics_snapshot=metrics_snapshot or {},
is_clean=is_clean,
description=description,
)
class CheckpointSummary(BaseModel):
"""
Lightweight checkpoint metadata for index listings.
Used in checkpoint index to provide fast scanning without
loading full checkpoint data.
"""
checkpoint_id: str
checkpoint_type: str
created_at: str
current_node: str | None = None
next_node: str | None = None
is_clean: bool = True
description: str = ""
model_config = {"extra": "allow"}
@classmethod
def from_checkpoint(cls, checkpoint: Checkpoint) -> "CheckpointSummary":
"""Create summary from full checkpoint."""
return cls(
checkpoint_id=checkpoint.checkpoint_id,
checkpoint_type=checkpoint.checkpoint_type,
created_at=checkpoint.created_at,
current_node=checkpoint.current_node,
next_node=checkpoint.next_node,
is_clean=checkpoint.is_clean,
description=checkpoint.description,
)
class CheckpointIndex(BaseModel):
"""
Manifest of all checkpoints for a session.
Provides fast lookup and filtering without loading
full checkpoint files.
"""
session_id: str
checkpoints: list[CheckpointSummary] = Field(default_factory=list)
latest_checkpoint_id: str | None = None
total_checkpoints: int = 0
model_config = {"extra": "allow"}
def add_checkpoint(self, checkpoint: Checkpoint) -> None:
"""Add a checkpoint to the index."""
summary = CheckpointSummary.from_checkpoint(checkpoint)
self.checkpoints.append(summary)
self.latest_checkpoint_id = checkpoint.checkpoint_id
self.total_checkpoints = len(self.checkpoints)
def get_checkpoint_summary(self, checkpoint_id: str) -> CheckpointSummary | None:
"""Get checkpoint summary by ID."""
for summary in self.checkpoints:
if summary.checkpoint_id == checkpoint_id:
return summary
return None
def filter_by_type(self, checkpoint_type: str) -> list[CheckpointSummary]:
"""Filter checkpoints by type."""
return [cp for cp in self.checkpoints if cp.checkpoint_type == checkpoint_type]
def filter_by_node(self, node_id: str) -> list[CheckpointSummary]:
"""Filter checkpoints by current_node."""
return [cp for cp in self.checkpoints if cp.current_node == node_id]
def get_clean_checkpoints(self) -> list[CheckpointSummary]:
"""Get all clean checkpoints (no failures before them)."""
return [cp for cp in self.checkpoints if cp.is_clean]
def get_latest_clean_checkpoint(self) -> CheckpointSummary | None:
"""Get the most recent clean checkpoint."""
clean = self.get_clean_checkpoints()
return clean[-1] if clean else None
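A brief usage sketch for this schema; the import path matches the one used elsewhere in this PR, while the session id and node names are placeholders:

```python
# Sketch: create a checkpoint and track it in an index (ids are placeholders).
from framework.schemas.checkpoint import Checkpoint, CheckpointIndex

cp = Checkpoint.create(
    checkpoint_type="node_complete",
    session_id="session_20260208_143022_ab12cd34",
    current_node="search_node",
    execution_path=["entry_node", "search_node"],
    shared_memory={"query": "observability"},
    next_node="summarize_node",
)

index = CheckpointIndex(session_id=cp.session_id)
index.add_checkpoint(cp)
assert index.latest_checkpoint_id == cp.checkpoint_id
assert index.get_latest_clean_checkpoint() is not None
```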
+14 -1
View File
@@ -91,10 +91,11 @@ class SessionState(BaseModel):
Version History:
- v1.0: Initial schema (2026-02-06)
- v1.1: Added checkpoint support (2026-02-08)
"""
# Schema version for forward/backward compatibility
schema_version: str = "1.0"
schema_version: str = "1.1"
# Identity
session_id: str # Format: session_YYYYMMDD_HHMMSS_{uuid_8char}
@@ -136,6 +137,10 @@ class SessionState(BaseModel):
# Isolation level (from ExecutionContext)
isolation_level: str = "shared"
# Checkpointing (for crash recovery and resume-from-failure)
checkpoint_enabled: bool = False
latest_checkpoint_id: str | None = None
model_config = {"extra": "allow"}
@computed_field
@@ -154,6 +159,14 @@ class SessionState(BaseModel):
"""Can this session be resumed?"""
return self.status == SessionStatus.PAUSED and self.progress.resume_from is not None
@computed_field
@property
def is_resumable_from_checkpoint(self) -> bool:
"""Can this session be resumed from a checkpoint?"""
# ANY session with checkpoints can be resumed (not just failed ones)
# This enables: pause/resume, iterative execution, continuation after completion
return self.checkpoint_enabled and self.latest_checkpoint_id is not None
@classmethod
def from_execution_result(
cls,
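With the new fields, checkpoint resumability reduces to a flag plus a pointer. A sketch of the computed property in action (module path assumed, field values illustrative):

```python
# Sketch: the checkpoint-resume condition added above (module path assumed).
from framework.schemas.session import SessionState

state = SessionState.model_construct(  # bypass validation for illustration
    session_id="session_20260208_143022_ab12cd34",
    checkpoint_enabled=True,
    latest_checkpoint_id="cp_node_complete_search_node_20260208_143100",
)
# Any session with checkpointing on and at least one checkpoint is resumable,
# regardless of status (pause/resume, iteration, post-completion continuation).
assert state.is_resumable_from_checkpoint
```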
+325
View File
@@ -0,0 +1,325 @@
"""
Checkpoint Store - Manages checkpoint storage with atomic writes.
Handles saving, loading, listing, and pruning of execution checkpoints
for session resumability.
"""
import asyncio
import logging
from datetime import datetime, timedelta
from pathlib import Path
from framework.schemas.checkpoint import Checkpoint, CheckpointIndex, CheckpointSummary
from framework.utils.io import atomic_write
logger = logging.getLogger(__name__)
class CheckpointStore:
"""
Manages checkpoint storage with atomic writes.
Stores checkpoints in a session's checkpoints/ directory with
an index for fast lookup and filtering.
Directory structure:
checkpoints/
index.json # Checkpoint manifest
cp_{type}_{node}_{timestamp}.json # Individual checkpoints
"""
def __init__(self, base_path: Path):
"""
Initialize checkpoint store.
Args:
base_path: Session directory (e.g., ~/.hive/agents/agent_name/sessions/session_ID/)
"""
self.base_path = Path(base_path)
self.checkpoints_dir = self.base_path / "checkpoints"
self.index_path = self.checkpoints_dir / "index.json"
self._index_lock = asyncio.Lock()
async def save_checkpoint(self, checkpoint: Checkpoint) -> None:
"""
Atomically save checkpoint and update index.
Uses temp file + rename for crash safety. Updates index
after checkpoint is persisted.
Args:
checkpoint: Checkpoint to save
Raises:
OSError: If file write fails
"""
def _write():
# Ensure directory exists
self.checkpoints_dir.mkdir(parents=True, exist_ok=True)
# Write checkpoint file atomically
checkpoint_path = self.checkpoints_dir / f"{checkpoint.checkpoint_id}.json"
with atomic_write(checkpoint_path) as f:
f.write(checkpoint.model_dump_json(indent=2))
logger.debug(f"Saved checkpoint {checkpoint.checkpoint_id}")
# Write checkpoint file (blocking I/O in thread)
await asyncio.to_thread(_write)
# Update index (with lock to prevent concurrent modifications)
async with self._index_lock:
await self._update_index_add(checkpoint)
async def load_checkpoint(
self,
checkpoint_id: str | None = None,
) -> Checkpoint | None:
"""
Load checkpoint by ID or latest.
Args:
checkpoint_id: Checkpoint ID to load, or None for latest
Returns:
Checkpoint object, or None if not found
"""
def _read(checkpoint_id: str) -> Checkpoint | None:
checkpoint_path = self.checkpoints_dir / f"{checkpoint_id}.json"
if not checkpoint_path.exists():
logger.warning(f"Checkpoint file not found: {checkpoint_path}")
return None
try:
return Checkpoint.model_validate_json(checkpoint_path.read_text())
except Exception as e:
logger.error(f"Failed to load checkpoint {checkpoint_id}: {e}")
return None
# Load index to get checkpoint ID if not provided
if checkpoint_id is None:
index = await self.load_index()
if not index or not index.latest_checkpoint_id:
logger.warning("No checkpoints found in index")
return None
checkpoint_id = index.latest_checkpoint_id
return await asyncio.to_thread(_read, checkpoint_id)
async def load_index(self) -> CheckpointIndex | None:
"""
Load checkpoint index.
Returns:
CheckpointIndex or None if not found
"""
def _read() -> CheckpointIndex | None:
if not self.index_path.exists():
return None
try:
return CheckpointIndex.model_validate_json(self.index_path.read_text())
except Exception as e:
logger.error(f"Failed to load checkpoint index: {e}")
return None
return await asyncio.to_thread(_read)
async def list_checkpoints(
self,
checkpoint_type: str | None = None,
is_clean: bool | None = None,
) -> list[CheckpointSummary]:
"""
List checkpoints with optional filters.
Args:
checkpoint_type: Filter by type (node_start, node_complete)
is_clean: Filter by clean status
Returns:
List of CheckpointSummary objects
"""
index = await self.load_index()
if not index:
return []
checkpoints = index.checkpoints
# Apply filters
if checkpoint_type:
checkpoints = [cp for cp in checkpoints if cp.checkpoint_type == checkpoint_type]
if is_clean is not None:
checkpoints = [cp for cp in checkpoints if cp.is_clean == is_clean]
return checkpoints
async def delete_checkpoint(self, checkpoint_id: str) -> bool:
"""
Delete a specific checkpoint.
Args:
checkpoint_id: Checkpoint ID to delete
Returns:
True if deleted, False if not found
"""
def _delete(checkpoint_id: str) -> bool:
checkpoint_path = self.checkpoints_dir / f"{checkpoint_id}.json"
if not checkpoint_path.exists():
logger.warning(f"Checkpoint file not found: {checkpoint_path}")
return False
try:
checkpoint_path.unlink()
logger.info(f"Deleted checkpoint {checkpoint_id}")
return True
except Exception as e:
logger.error(f"Failed to delete checkpoint {checkpoint_id}: {e}")
return False
# Delete checkpoint file
deleted = await asyncio.to_thread(_delete, checkpoint_id)
if deleted:
# Update index (with lock)
async with self._index_lock:
await self._update_index_remove(checkpoint_id)
return deleted
async def prune_checkpoints(
self,
max_age_days: int = 7,
) -> int:
"""
Prune checkpoints older than max_age_days.
Args:
max_age_days: Maximum age in days (default 7)
Returns:
Number of checkpoints deleted
"""
index = await self.load_index()
if not index or not index.checkpoints:
return 0
# Calculate cutoff datetime
cutoff = datetime.now() - timedelta(days=max_age_days)
# Find old checkpoints
old_checkpoints = []
for cp in index.checkpoints:
try:
created = datetime.fromisoformat(cp.created_at)
if created < cutoff:
old_checkpoints.append(cp.checkpoint_id)
except Exception as e:
logger.warning(f"Failed to parse timestamp for {cp.checkpoint_id}: {e}")
# Delete old checkpoints
deleted_count = 0
for checkpoint_id in old_checkpoints:
if await self.delete_checkpoint(checkpoint_id):
deleted_count += 1
if deleted_count > 0:
logger.info(f"Pruned {deleted_count} checkpoints older than {max_age_days} days")
return deleted_count
async def checkpoint_exists(self, checkpoint_id: str) -> bool:
"""
Check if a checkpoint exists.
Args:
checkpoint_id: Checkpoint ID
Returns:
True if checkpoint exists
"""
def _check(checkpoint_id: str) -> bool:
checkpoint_path = self.checkpoints_dir / f"{checkpoint_id}.json"
return checkpoint_path.exists()
return await asyncio.to_thread(_check, checkpoint_id)
async def _update_index_add(self, checkpoint: Checkpoint) -> None:
"""
Update index after adding a checkpoint.
Should be called with _index_lock held.
Args:
checkpoint: Checkpoint that was added
"""
def _write(index: CheckpointIndex):
# Ensure directory exists
self.checkpoints_dir.mkdir(parents=True, exist_ok=True)
# Write index atomically
with atomic_write(self.index_path) as f:
f.write(index.model_dump_json(indent=2))
# Load or create index
index = await self.load_index()
if not index:
index = CheckpointIndex(
session_id=checkpoint.session_id,
checkpoints=[],
)
# Add checkpoint to index
index.add_checkpoint(checkpoint)
# Write updated index
await asyncio.to_thread(_write, index)
logger.debug(f"Updated index with checkpoint {checkpoint.checkpoint_id}")
async def _update_index_remove(self, checkpoint_id: str) -> None:
"""
Update index after removing a checkpoint.
Should be called with _index_lock held.
Args:
checkpoint_id: Checkpoint ID that was removed
"""
def _write(index: CheckpointIndex):
with atomic_write(self.index_path) as f:
f.write(index.model_dump_json(indent=2))
# Load index
index = await self.load_index()
if not index:
return
# Remove checkpoint from index
index.checkpoints = [cp for cp in index.checkpoints if cp.checkpoint_id != checkpoint_id]
# Update totals
index.total_checkpoints = len(index.checkpoints)
# Update latest_checkpoint_id if we removed the latest
if index.latest_checkpoint_id == checkpoint_id:
index.latest_checkpoint_id = (
index.checkpoints[-1].checkpoint_id if index.checkpoints else None
)
# Write updated index
await asyncio.to_thread(_write, index)
logger.debug(f"Removed checkpoint {checkpoint_id} from index")
+1 -1
View File
@@ -37,7 +37,7 @@ class SessionStore:
Initialize session store.
Args:
base_path: Base path for storage (e.g., ~/.hive/twitter_outreach)
base_path: Base path for storage (e.g., ~/.hive/agents/twitter_outreach)
"""
self.base_path = Path(base_path)
self.sessions_dir = self.base_path / "sessions"
+101 -4
View File
@@ -6,7 +6,7 @@ import time
from textual.app import App, ComposeResult
from textual.binding import Binding
from textual.containers import Container, Horizontal, Vertical
from textual.widgets import Footer, Label
from textual.widgets import Footer, Input, Label
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import AgentEvent, EventType
@@ -208,17 +208,24 @@ class AdenTUI(App):
Binding("ctrl+c", "ctrl_c", "Interrupt", show=False, priority=True),
Binding("super+c", "ctrl_c", "Copy", show=False, priority=True),
Binding("ctrl+s", "screenshot", "Screenshot (SVG)", show=True, priority=True),
Binding("ctrl+z", "pause_execution", "Pause", show=True, priority=True),
Binding("ctrl+r", "show_sessions", "Sessions", show=True, priority=True),
Binding("tab", "focus_next", "Next Panel", show=True),
Binding("shift+tab", "focus_previous", "Previous Panel", show=False),
]
def __init__(self, runtime: AgentRuntime):
def __init__(
self,
runtime: AgentRuntime,
resume_session: str | None = None,
resume_checkpoint: str | None = None,
):
super().__init__()
self.runtime = runtime
self.log_pane = LogPane()
self.graph_view = GraphOverview(runtime)
self.chat_repl = ChatRepl(runtime)
self.chat_repl = ChatRepl(runtime, resume_session, resume_checkpoint)
self.status_bar = StatusBar(graph_id=runtime.graph.id)
self.is_ready = False
@@ -528,9 +535,99 @@ class AdenTUI(App):
except Exception as e:
self.notify(f"Screenshot failed: {e}", severity="error", timeout=5)
def action_pause_execution(self) -> None:
"""Immediately pause execution by cancelling task (bound to Ctrl+Z)."""
try:
chat_repl = self.query_one(ChatRepl)
if not chat_repl._current_exec_id:
self.notify(
"No active execution to pause",
severity="information",
timeout=3,
)
return
# Find and cancel the execution task - executor will catch and save state
task_cancelled = False
for stream in self.runtime._streams.values():
exec_id = chat_repl._current_exec_id
task = stream._execution_tasks.get(exec_id)
if task and not task.done():
task.cancel()
task_cancelled = True
self.notify(
"⏸ Execution paused - state saved",
severity="information",
timeout=3,
)
break
if not task_cancelled:
self.notify(
"Execution already completed",
severity="information",
timeout=2,
)
except Exception as e:
self.notify(
f"Error pausing execution: {e}",
severity="error",
timeout=5,
)
def action_show_sessions(self) -> None:
"""Show sessions list (bound to Ctrl+R)."""
# Send /sessions command to chat input
try:
chat_repl = self.query_one(ChatRepl)
chat_input = chat_repl.query_one("#chat-input", Input)
chat_input.value = "/sessions"
# Trigger submission
self.notify(
"💡 Type /sessions in the chat to see all sessions",
severity="information",
timeout=3,
)
except Exception:
self.notify(
"Use /sessions command to see all sessions",
severity="information",
timeout=3,
)
async def on_unmount(self) -> None:
"""Cleanup on app shutdown."""
"""Cleanup on app shutdown - cancel execution which will save state."""
self.is_ready = False
# Cancel any active execution - the executor will catch CancelledError
# and save current state as paused (we briefly await the save below)
try:
import asyncio
chat_repl = self.query_one(ChatRepl)
if chat_repl._current_exec_id:
# Find the stream with this execution
for stream in self.runtime._streams.values():
exec_id = chat_repl._current_exec_id
task = stream._execution_tasks.get(exec_id)
if task and not task.done():
# Cancel the task - executor will catch and save state
task.cancel()
try:
# Wait for executor to save state (may take a few seconds)
# Longer timeout for quit to ensure state is properly saved
await asyncio.wait_for(task, timeout=5.0)
except (TimeoutError, asyncio.CancelledError):
# Expected - task was cancelled
# If timeout, state may not be fully saved
pass
except Exception:
# Ignore other exceptions during cleanup
pass
break
except Exception:
pass
try:
if hasattr(self, "_subscription_id"):
self.runtime.unsubscribe_from_events(self._subscription_id)
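Both the Ctrl+Z handler and `on_unmount` lean on the same cancel-to-pause contract: the executor catches `asyncio.CancelledError`, persists state, and re-raises. A minimal, self-contained sketch of that contract (names illustrative):

```python
# Sketch of the cancel-to-pause contract both handlers rely on: the executor
# catches CancelledError, persists state, then re-raises. Names illustrative.
import asyncio

async def executor(state: dict) -> None:
    try:
        while True:
            state["steps"] = state.get("steps", 0) + 1
            await asyncio.sleep(0.05)  # stand-in for node execution
    except asyncio.CancelledError:
        state["status"] = "paused"  # save state before propagating
        raise

async def main() -> None:
    state: dict = {}
    task = asyncio.create_task(executor(state))
    await asyncio.sleep(0.12)
    task.cancel()  # what Ctrl+Z, /pause, and on_unmount all do
    try:
        await asyncio.wait_for(task, timeout=5.0)
    except (TimeoutError, asyncio.CancelledError):
        pass  # expected: the task was cancelled
    assert state["status"] == "paused"

asyncio.run(main())
```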
+601 -8
View File
@@ -17,6 +17,7 @@ Client-facing input:
import asyncio
import re
import threading
from pathlib import Path
from typing import Any
from textual.app import ComposeResult
@@ -69,13 +70,20 @@ class ChatRepl(Vertical):
}
"""
def __init__(self, runtime: AgentRuntime):
def __init__(
self,
runtime: AgentRuntime,
resume_session: str | None = None,
resume_checkpoint: str | None = None,
):
super().__init__()
self.runtime = runtime
self._current_exec_id: str | None = None
self._streaming_snapshot: str = ""
self._waiting_for_input: bool = False
self._input_node_id: str | None = None
self._resume_session = resume_session
self._resume_checkpoint = resume_checkpoint
# Dedicated event loop for agent execution.
# Keeps blocking runtime code (LLM calls, MCP tools) off
@@ -121,10 +129,589 @@ class ChatRepl(Vertical):
if was_at_bottom:
history.scroll_end(animate=False)
async def _handle_command(self, command: str) -> None:
"""Handle slash commands for session and checkpoint operations."""
parts = command.split(maxsplit=2)
cmd = parts[0].lower()
if cmd == "/help":
self._write_history("""[bold cyan]Available Commands:[/bold cyan]
[bold]/sessions[/bold] - List all sessions for this agent
[bold]/sessions[/bold] <session_id> - Show session details and checkpoints
[bold]/resume[/bold] - Resume latest paused/failed session
[bold]/resume[/bold] <session_id> - Resume session from where it stopped
[bold]/recover[/bold] <session_id> <cp_id> - Recover from specific checkpoint
[bold]/pause[/bold] - Pause current execution (Ctrl+Z)
[bold]/help[/bold] - Show this help message
[dim]Examples:[/dim]
/sessions [dim]# List all sessions[/dim]
/sessions session_20260208_143022 [dim]# Show session details[/dim]
/resume [dim]# Resume latest session (from state)[/dim]
/resume session_20260208_143022 [dim]# Resume specific session (from state)[/dim]
/recover session_20260208_143022 cp_xxx [dim]# Recover from specific checkpoint[/dim]
/pause [dim]# Pause (or Ctrl+Z)[/dim]
""")
elif cmd == "/sessions":
session_id = parts[1].strip() if len(parts) > 1 else None
await self._cmd_sessions(session_id)
elif cmd == "/resume":
# Resume from session state (not checkpoint-based)
if len(parts) < 2:
session_id = await self._find_latest_resumable_session()
if not session_id:
self._write_history("[bold red]No resumable sessions found[/bold red]")
self._write_history(" Tip: Use [bold]/sessions[/bold] to see all sessions")
return
else:
session_id = parts[1].strip()
await self._cmd_resume(session_id)
elif cmd == "/recover":
# Recover from specific checkpoint
if len(parts) < 3:
self._write_history(
"[bold red]Error:[/bold red] /recover requires session_id and checkpoint_id"
)
self._write_history(" Usage: [bold]/recover <session_id> <checkpoint_id>[/bold]")
self._write_history(
" Tip: Use [bold]/sessions <session_id>[/bold] to see checkpoints"
)
return
session_id = parts[1].strip()
checkpoint_id = parts[2].strip()
await self._cmd_recover(session_id, checkpoint_id)
elif cmd == "/pause":
await self._cmd_pause()
else:
self._write_history(
f"[bold red]Unknown command:[/bold red] {cmd}\n"
"Type [bold]/help[/bold] for available commands"
)
async def _cmd_sessions(self, session_id: str | None) -> None:
"""List sessions or show details of a specific session."""
try:
# Get storage path from runtime
storage_path = self.runtime._storage.base_path
if session_id:
# Show details of specific session including checkpoints
await self._show_session_details(storage_path, session_id)
else:
# List all sessions
await self._list_sessions(storage_path)
except Exception as e:
self._write_history(f"[bold red]Error:[/bold red] {e}")
self._write_history(" Could not access session data")
async def _find_latest_resumable_session(self) -> str | None:
"""Find the most recent paused or failed session."""
try:
storage_path = self.runtime._storage.base_path
sessions_dir = storage_path / "sessions"
if not sessions_dir.exists():
return None
# Get all sessions, most recent first
session_dirs = sorted(
[d for d in sessions_dir.iterdir() if d.is_dir()],
key=lambda d: d.name,
reverse=True,
)
# Find first paused, failed, or cancelled session
import json
for session_dir in session_dirs:
state_file = session_dir / "state.json"
if not state_file.exists():
continue
with open(state_file) as f:
state = json.load(f)
status = state.get("status", "").lower()
# Check if resumable (any non-completed status)
if status in ["paused", "failed", "cancelled", "active"]:
return session_dir.name
return None
except Exception:
return None
async def _list_sessions(self, storage_path: Path) -> None:
"""List all sessions for the agent."""
self._write_history("[bold cyan]Available Sessions:[/bold cyan]")
# Find all session directories
sessions_dir = storage_path / "sessions"
if not sessions_dir.exists():
self._write_history("[dim]No sessions found.[/dim]")
self._write_history(" Sessions will appear here after running the agent")
return
session_dirs = sorted(
[d for d in sessions_dir.iterdir() if d.is_dir()],
key=lambda d: d.name,
reverse=True, # Most recent first
)
if not session_dirs:
self._write_history("[dim]No sessions found.[/dim]")
return
self._write_history(f"[dim]Found {len(session_dirs)} session(s)[/dim]\n")
for session_dir in session_dirs[:10]: # Show last 10 sessions
session_id = session_dir.name
state_file = session_dir / "state.json"
if not state_file.exists():
continue
# Read session state
try:
import json
with open(state_file) as f:
state = json.load(f)
status = state.get("status", "unknown").upper()
# Status with color
if status == "COMPLETED":
status_colored = f"[green]{status}[/green]"
elif status == "FAILED":
status_colored = f"[red]{status}[/red]"
elif status == "PAUSED":
status_colored = f"[yellow]{status}[/yellow]"
elif status == "CANCELLED":
status_colored = f"[dim yellow]{status}[/dim yellow]"
else:
status_colored = f"[dim]{status}[/dim]"
# Check for checkpoints
checkpoint_dir = session_dir / "checkpoints"
checkpoint_count = 0
if checkpoint_dir.exists():
checkpoint_files = list(checkpoint_dir.glob("cp_*.json"))
checkpoint_count = len(checkpoint_files)
# Session line
self._write_history(f"📋 [bold]{session_id}[/bold]")
self._write_history(f" Status: {status_colored} Checkpoints: {checkpoint_count}")
if checkpoint_count > 0:
self._write_history(f" [dim]Resume: /resume {session_id}[/dim]")
self._write_history("") # Blank line
except Exception as e:
self._write_history(f" [dim red]Error reading: {e}[/dim red]")
async def _show_session_details(self, storage_path: Path, session_id: str) -> None:
"""Show detailed information about a specific session."""
self._write_history(f"[bold cyan]Session Details:[/bold cyan] {session_id}\n")
session_dir = storage_path / "sessions" / session_id
if not session_dir.exists():
self._write_history("[bold red]Error:[/bold red] Session not found")
self._write_history(f" Path: {session_dir}")
self._write_history(" Tip: Use [bold]/sessions[/bold] to see available sessions")
return
state_file = session_dir / "state.json"
if not state_file.exists():
self._write_history("[bold red]Error:[/bold red] Session state not found")
return
try:
import json
with open(state_file) as f:
state = json.load(f)
# Basic info
status = state.get("status", "unknown").upper()
if status == "COMPLETED":
status_colored = f"[green]{status}[/green]"
elif status == "FAILED":
status_colored = f"[red]{status}[/red]"
elif status == "PAUSED":
status_colored = f"[yellow]{status}[/yellow]"
elif status == "CANCELLED":
status_colored = f"[dim yellow]{status}[/dim yellow]"
else:
status_colored = status
self._write_history(f"Status: {status_colored}")
if "started_at" in state:
self._write_history(f"Started: {state['started_at']}")
if "completed_at" in state:
self._write_history(f"Completed: {state['completed_at']}")
# Execution path
if "execution_path" in state and state["execution_path"]:
self._write_history("\n[bold]Execution Path:[/bold]")
for node_id in state["execution_path"]:
self._write_history(f"{node_id}")
# Checkpoints
checkpoint_dir = session_dir / "checkpoints"
if checkpoint_dir.exists():
checkpoint_files = sorted(checkpoint_dir.glob("cp_*.json"))
if checkpoint_files:
self._write_history(
f"\n[bold]Available Checkpoints:[/bold] ({len(checkpoint_files)})"
)
# Load and show checkpoints
for i, cp_file in enumerate(checkpoint_files[-5:], 1): # Last 5
try:
with open(cp_file) as f:
cp_data = json.load(f)
cp_id = cp_data.get("checkpoint_id", cp_file.stem)
cp_type = cp_data.get("checkpoint_type", "unknown")
current_node = cp_data.get("current_node", "unknown")
is_clean = cp_data.get("is_clean", False)
clean_marker = "✓" if is_clean else "⚠"
self._write_history(f" {i}. {clean_marker} [cyan]{cp_id}[/cyan]")
self._write_history(f" Type: {cp_type}, Node: {current_node}")
except Exception:
pass
# Quick actions
if checkpoint_dir.exists() and list(checkpoint_dir.glob("cp_*.json")):
self._write_history("\n[bold]Quick Actions:[/bold]")
self._write_history(
f" [dim]/resume {session_id}[/dim] - Resume from latest checkpoint"
)
except Exception as e:
self._write_history(f"[bold red]Error:[/bold red] {e}")
import traceback
self._write_history(f"[dim]{traceback.format_exc()}[/dim]")
async def _cmd_resume(self, session_id: str) -> None:
"""Resume a session from its last state (session state, not checkpoint)."""
try:
storage_path = self.runtime._storage.base_path
session_dir = storage_path / "sessions" / session_id
# Verify session exists
if not session_dir.exists():
self._write_history(f"[bold red]Error:[/bold red] Session not found: {session_id}")
self._write_history(" Use [bold]/sessions[/bold] to see available sessions")
return
# Load session state
state_file = session_dir / "state.json"
if not state_file.exists():
self._write_history("[bold red]Error:[/bold red] Session state not found")
return
import json
with open(state_file) as f:
state = json.load(f)
# Resume from session state (not checkpoint)
progress = state.get("progress", {})
paused_at = progress.get("paused_at") or progress.get("resume_from")
if paused_at:
# Has paused_at - resume from there
resume_session_state = {
"paused_at": paused_at,
"memory": state.get("memory", {}),
"execution_path": progress.get("path", []),
"node_visit_counts": progress.get("node_visit_counts", {}),
}
resume_info = f"From node: [cyan]{paused_at}[/cyan]"
else:
# No paused_at - just retry with same input
resume_session_state = {}
resume_info = "Retrying with same input"
# Display resume info
self._write_history(f"[bold cyan]🔄 Resuming session[/bold cyan] {session_id}")
self._write_history(f" {resume_info}")
if paused_at:
self._write_history(" [dim](Using session state, not checkpoint)[/dim]")
# Check if already executing
if self._current_exec_id is not None:
self._write_history(
"[bold yellow]Warning:[/bold yellow] An execution is already running"
)
self._write_history(" Wait for it to complete or use /pause first")
return
# Get original input data from session state
input_data = state.get("input_data", {})
# Show indicator
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Resuming from session state...")
indicator.display = True
# Update placeholder
chat_input = self.query_one("#chat-input", Input)
chat_input.placeholder = "Commands: /pause, /sessions (agent resuming...)"
# Trigger execution with resume state
try:
entry_points = self.runtime.get_entry_points()
if not entry_points:
self._write_history("[bold red]Error:[/bold red] No entry points available")
return
# Submit execution with resume state and original input data
future = asyncio.run_coroutine_threadsafe(
self.runtime.trigger(
entry_points[0].id,
input_data=input_data,
session_state=resume_session_state,
),
self._agent_loop,
)
exec_id = await asyncio.wrap_future(future)
self._current_exec_id = exec_id
self._write_history(
f"[green]✓[/green] Resume started (execution: {exec_id[:12]}...)"
)
self._write_history(" Agent is continuing from where it stopped...")
except Exception as e:
self._write_history(f"[bold red]Error starting resume:[/bold red] {e}")
indicator.display = False
chat_input.placeholder = "Enter input for agent..."
except Exception as e:
self._write_history(f"[bold red]Error:[/bold red] {e}")
import traceback
self._write_history(f"[dim]{traceback.format_exc()}[/dim]")
async def _cmd_recover(self, session_id: str, checkpoint_id: str) -> None:
"""Recover a session from a specific checkpoint (time-travel debugging)."""
try:
storage_path = self.runtime._storage.base_path
session_dir = storage_path / "sessions" / session_id
# Verify session exists
if not session_dir.exists():
self._write_history(f"[bold red]Error:[/bold red] Session not found: {session_id}")
self._write_history(" Use [bold]/sessions[/bold] to see available sessions")
return
# Verify checkpoint exists
checkpoint_file = session_dir / "checkpoints" / f"{checkpoint_id}.json"
if not checkpoint_file.exists():
self._write_history(
f"[bold red]Error:[/bold red] Checkpoint not found: {checkpoint_id}"
)
self._write_history(
f" Use [bold]/sessions {session_id}[/bold] to see available checkpoints"
)
return
# Display recover info
self._write_history(f"[bold cyan]⏪ Recovering session[/bold cyan] {session_id}")
self._write_history(f" From checkpoint: [cyan]{checkpoint_id}[/cyan]")
self._write_history(
" [dim](Checkpoint-based recovery for time-travel debugging)[/dim]"
)
# Check if already executing
if self._current_exec_id is not None:
self._write_history(
"[bold yellow]Warning:[/bold yellow] An execution is already running"
)
self._write_history(" Wait for it to complete or use /pause first")
return
# Create session_state for checkpoint recovery
recover_session_state = {
"resume_from_checkpoint": checkpoint_id,
}
# Show indicator
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Recovering from checkpoint...")
indicator.display = True
# Update placeholder
chat_input = self.query_one("#chat-input", Input)
chat_input.placeholder = "Commands: /pause, /sessions (agent recovering...)"
# Trigger execution with checkpoint recovery
try:
entry_points = self.runtime.get_entry_points()
if not entry_points:
self._write_history("[bold red]Error:[/bold red] No entry points available")
return
# Submit execution with checkpoint recovery state
future = asyncio.run_coroutine_threadsafe(
self.runtime.trigger(
entry_points[0].id,
input_data={},
session_state=recover_session_state,
),
self._agent_loop,
)
exec_id = await asyncio.wrap_future(future)
self._current_exec_id = exec_id
self._write_history(
f"[green]✓[/green] Recovery started (execution: {exec_id[:12]}...)"
)
self._write_history(" Agent is continuing from checkpoint...")
except Exception as e:
self._write_history(f"[bold red]Error starting recovery:[/bold red] {e}")
indicator.display = False
chat_input.placeholder = "Enter input for agent..."
except Exception as e:
self._write_history(f"[bold red]Error:[/bold red] {e}")
import traceback
self._write_history(f"[dim]{traceback.format_exc()}[/dim]")
async def _cmd_pause(self) -> None:
"""Immediately pause execution by cancelling task (same as Ctrl+Z)."""
# Check if there's a current execution
if not self._current_exec_id:
self._write_history("[bold yellow]No active execution to pause[/bold yellow]")
self._write_history(" Start an execution first, then use /pause during execution")
return
# Find and cancel the execution task - executor will catch and save state
task_cancelled = False
for stream in self.runtime._streams.values():
exec_id = self._current_exec_id
task = stream._execution_tasks.get(exec_id)
if task and not task.done():
task.cancel()
task_cancelled = True
self._write_history("[bold green]⏸ Execution paused - state saved[/bold green]")
self._write_history(" Resume later with: [bold]/resume[/bold]")
break
if not task_cancelled:
self._write_history("[bold yellow]Execution already completed[/bold yellow]")
def on_mount(self) -> None:
"""Add welcome message when widget mounts."""
"""Add welcome message and check for resumable sessions."""
history = self.query_one("#chat-history", RichLog)
history.write("[bold cyan]Chat REPL Ready[/bold cyan] — Type your input below\n")
history.write(
"[bold cyan]Chat REPL Ready[/bold cyan] — "
"Type your input or use [bold]/help[/bold] for commands\n"
)
# Auto-trigger resume/recover if CLI args provided
if self._resume_session:
if self._resume_checkpoint:
# Use /recover for checkpoint-based recovery
history.write(
"\n[bold cyan]🔄 Auto-recovering from checkpoint "
"(--resume-session + --checkpoint)[/bold cyan]"
)
self.call_later(self._cmd_recover, self._resume_session, self._resume_checkpoint)
else:
# Use /resume for session state resume
history.write(
"\n[bold cyan]🔄 Auto-resuming session (--resume-session)[/bold cyan]"
)
self.call_later(self._cmd_resume, self._resume_session)
return # Skip normal startup messages
# Check for resumable sessions
self._check_and_show_resumable_sessions()
history.write(
"[dim]Quick start: /sessions to see previous sessions, "
"/pause to pause execution[/dim]\n"
)
def _check_and_show_resumable_sessions(self) -> None:
"""Check for non-terminated sessions and prompt user."""
try:
storage_path = self.runtime._storage.base_path
sessions_dir = storage_path / "sessions"
if not sessions_dir.exists():
return
# Find non-terminated sessions (paused, failed, cancelled, active)
resumable = []
session_dirs = sorted(
[d for d in sessions_dir.iterdir() if d.is_dir()],
key=lambda d: d.name,
reverse=True, # Most recent first
)
import json
for session_dir in session_dirs[:5]: # Check last 5 sessions
state_file = session_dir / "state.json"
if not state_file.exists():
continue
try:
with open(state_file) as f:
state = json.load(f)
status = state.get("status", "").lower()
# Non-terminated statuses
if status in ["paused", "failed", "cancelled", "active"]:
resumable.append(
{
"session_id": session_dir.name,
"status": status.upper(),
}
)
except Exception:
continue
if resumable:
self._write_history("\n[bold yellow]⚠ Non-terminated sessions found:[/bold yellow]")
for i, session in enumerate(resumable[:3], 1): # Show top 3
status = session["status"]
session_id = session["session_id"]
# Color code status
if status == "PAUSED":
status_colored = f"[yellow]{status}[/yellow]"
elif status == "FAILED":
status_colored = f"[red]{status}[/red]"
elif status == "CANCELLED":
status_colored = f"[dim yellow]{status}[/dim yellow]"
else:
status_colored = f"[dim]{status}[/dim]"
self._write_history(f" {i}. {session_id[:32]}... [{status_colored}]")
self._write_history("\n[bold cyan]What would you like to do?[/bold cyan]")
self._write_history(" • Type [bold]/resume[/bold] to continue the latest session")
self._write_history(
f" • Type [bold]/resume {resumable[0]['session_id']}[/bold] "
"for specific session"
)
self._write_history(" • Or just type your input to start a new session\n")
except Exception:
# Silently fail - don't block TUI startup
pass
async def on_input_submitted(self, message: Input.Submitted) -> None:
"""Handle input submission — either start new execution or inject input."""
@@ -132,15 +719,21 @@ class ChatRepl(Vertical):
if not user_input:
return
# Handle commands (starting with /) - ALWAYS process commands first
# Commands work during execution, during client-facing input, anytime
if user_input.startswith("/"):
await self._handle_command(user_input)
message.input.value = ""
return
# Client-facing input: route to the waiting node
if self._waiting_for_input and self._input_node_id:
self._write_history(f"[bold green]You:[/bold green] {user_input}")
message.input.value = ""
# Disable input while agent processes the response
# Keep input enabled for commands (but change placeholder)
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = True
chat_input.placeholder = "Enter input for agent..."
chat_input.placeholder = "Commands: /pause, /sessions (agent processing...)"
self._waiting_for_input = False
indicator = self.query_one("#processing-indicator", Label)
@@ -193,9 +786,9 @@ class ChatRepl(Vertical):
indicator.update("Thinking...")
indicator.display = True
# Disable input while the agent is working
# Keep input enabled for commands during execution
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = True
chat_input.placeholder = "Commands available: /pause, /sessions, /help"
# Submit execution to the dedicated agent loop so blocking
# runtime code (LLM, MCP tools) never touches Textual's loop.
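The dedicated-loop handoff mentioned here is worth seeing in isolation: blocking agent work runs on its own event loop in a background thread, and the UI awaits a wrapped future instead of blocking. A sketch with illustrative names:

```python
# Sketch of the dedicated-loop handoff the REPL uses: agent work runs on its
# own loop/thread; the UI awaits a wrapped future. Names are illustrative.
import asyncio
import threading

agent_loop = asyncio.new_event_loop()
threading.Thread(target=agent_loop.run_forever, daemon=True).start()

async def trigger(entry_point: str, input_data: dict) -> str:
    await asyncio.sleep(0.1)  # stand-in for runtime.trigger(...)
    return f"exec_{entry_point}_demo"

async def ui_side() -> None:
    future = asyncio.run_coroutine_threadsafe(
        trigger("entry", {"task": "demo"}), agent_loop
    )
    exec_id = await asyncio.wrap_future(future)  # non-blocking on the UI loop
    print(exec_id)

asyncio.run(ui_side())
agent_loop.call_soon_threadsafe(agent_loop.stop)
```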
+1 -1
View File
@@ -1,6 +1,6 @@
[project]
name = "framework"
version = "0.1.0"
version = "0.4.2"
description = "Goal-driven agent runtime with Builder-friendly observability"
readme = "README.md"
requires-python = ">=3.11"
@@ -0,0 +1,344 @@
"""
Regression tests for conditional edge direct key access (Issue #3599).
Verifies that node outputs are written to memory before edge evaluation,
enabling direct key access in conditional expressions (e.g., 'score > 80')
instead of requiring output['score'] > 80 syntax.
"""
import pytest
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.executor import GraphExecutor
from framework.graph.goal import Goal
from framework.graph.node import NodeContext, NodeProtocol, NodeResult, NodeSpec
from framework.runtime.core import Runtime
class SimpleRuntime(Runtime):
"""Minimal runtime for testing."""
def start_run(self, **kwargs):
return "test-run"
def end_run(self, **kwargs):
pass
def report_problem(self, **kwargs):
pass
def decide(self, **kwargs):
return "test-decision"
def record_outcome(self, **kwargs):
pass
def set_node(self, **kwargs):
pass
class ScoreNode(NodeProtocol):
"""Node that outputs a score value."""
async def execute(self, ctx: NodeContext) -> NodeResult:
return NodeResult(success=True, output={"score": 85})
class HighScoreNode(NodeProtocol):
"""Consumer node for high scores."""
async def execute(self, ctx: NodeContext) -> NodeResult:
return NodeResult(success=True, output={"result": "high_score_path"})
class MultiKeyNode(NodeProtocol):
"""Node that outputs multiple keys."""
async def execute(self, ctx: NodeContext) -> NodeResult:
return NodeResult(success=True, output={"x": 100, "y": 50})
class ConsumerNode(NodeProtocol):
"""Generic consumer node."""
async def execute(self, ctx: NodeContext) -> NodeResult:
return NodeResult(success=True, output={"processed": True})
@pytest.mark.asyncio
async def test_direct_key_access_in_conditional_edge():
"""
Verify direct key access works in conditional edges (e.g., 'score > 80').
This is the core regression test for issue #3599. Before the fix,
node outputs were only written to memory during input mapping (after
edge evaluation), causing NameError when edges tried to access keys directly.
"""
goal = Goal(
id="test-direct-key",
name="Test Direct Key Access",
description="Test that direct key access works in conditional edges",
)
nodes = [
NodeSpec(
id="score_node",
name="ScoreNode",
description="Outputs a score",
node_type="function",
output_keys=["score"],
),
NodeSpec(
id="high_score_node",
name="HighScoreNode",
description="Handles high scores",
node_type="function",
input_keys=["score"],
output_keys=["result"],
),
]
# Edge with DIRECT key access: 'score > 80' (not 'output["score"] > 80')
edges = [
EdgeSpec(
id="score_to_high",
source="score_node",
target="high_score_node",
condition=EdgeCondition.CONDITIONAL,
condition_expr="score > 80", # Direct key access
)
]
graph = GraphSpec(
id="test-graph",
goal_id="test-direct-key",
entry_node="score_node",
nodes=nodes,
edges=edges,
terminal_nodes=["high_score_node"],
)
runtime = SimpleRuntime(storage_path="/tmp/test")
executor = GraphExecutor(runtime=runtime)
executor.register_node("score_node", ScoreNode())
executor.register_node("high_score_node", HighScoreNode())
result = await executor.execute(graph, goal, {})
# Verify the edge was followed (high_score_node executed)
assert result.success, "Execution should succeed"
assert "high_score_node" in result.path, (
f"Expected high_score_node in path. "
f"Condition 'score > 80' should evaluate to True (score=85). "
f"Path: {result.path}"
)
@pytest.mark.asyncio
async def test_backward_compatibility_output_syntax():
"""
Verify backward compatibility: output['key'] syntax still works.
The fix should not break existing code that uses the explicit
output dictionary syntax in conditional expressions.
"""
goal = Goal(
id="test-backward-compat",
name="Test Backward Compatibility",
description="Test that output['key'] syntax still works",
)
nodes = [
NodeSpec(
id="score_node",
name="ScoreNode",
description="Outputs a score",
node_type="function",
output_keys=["score"],
),
NodeSpec(
id="consumer_node",
name="ConsumerNode",
description="Consumer",
node_type="function",
input_keys=["score"],
output_keys=["processed"],
),
]
# Edge with OLD syntax: output['score'] > 80
edges = [
EdgeSpec(
id="score_to_consumer",
source="score_node",
target="consumer_node",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output['score'] > 80", # Old explicit syntax
)
]
graph = GraphSpec(
id="test-graph-compat",
goal_id="test-backward-compat",
entry_node="score_node",
nodes=nodes,
edges=edges,
terminal_nodes=["consumer_node"],
)
runtime = SimpleRuntime(storage_path="/tmp/test")
executor = GraphExecutor(runtime=runtime)
executor.register_node("score_node", ScoreNode())
executor.register_node("consumer_node", ConsumerNode())
result = await executor.execute(graph, goal, {})
# Verify backward compatibility maintained
assert result.success, "Execution should succeed"
assert "consumer_node" in result.path, (
f"Expected consumer_node in path. "
f"Old syntax output['score'] > 80 should still work. "
f"Path: {result.path}"
)
@pytest.mark.asyncio
async def test_multiple_keys_in_expression():
"""
Verify multiple direct keys work in complex expressions.
Tests that expressions like 'x > y and y < 100' work correctly
when both x and y are written to memory before edge evaluation.
"""
goal = Goal(
id="test-multi-key",
name="Test Multiple Keys",
description="Test multiple keys in conditional expression",
)
nodes = [
NodeSpec(
id="multi_key_node",
name="MultiKeyNode",
description="Outputs multiple keys",
node_type="function",
output_keys=["x", "y"],
),
NodeSpec(
id="consumer_node",
name="ConsumerNode",
description="Consumer",
node_type="function",
input_keys=["x", "y"],
output_keys=["processed"],
),
]
# Complex expression with multiple direct keys
edges = [
EdgeSpec(
id="multi_to_consumer",
source="multi_key_node",
target="consumer_node",
condition=EdgeCondition.CONDITIONAL,
condition_expr="x > y and y < 100", # Multiple keys
)
]
graph = GraphSpec(
id="test-graph-multi",
goal_id="test-multi-key",
entry_node="multi_key_node",
nodes=nodes,
edges=edges,
terminal_nodes=["consumer_node"],
)
runtime = SimpleRuntime(storage_path="/tmp/test")
executor = GraphExecutor(runtime=runtime)
executor.register_node("multi_key_node", MultiKeyNode())
executor.register_node("consumer_node", ConsumerNode())
result = await executor.execute(graph, goal, {})
# Verify multiple keys work correctly
assert result.success, "Execution should succeed"
assert "consumer_node" in result.path, (
f"Expected consumer_node in path. "
f"Condition 'x > y and y < 100' should be True (x=100, y=50). "
f"Path: {result.path}"
)
@pytest.mark.asyncio
async def test_negative_case_condition_false():
"""
Verify conditions correctly evaluate to False when not met.
Tests that when a condition fails, the edge is NOT followed
and execution doesn't proceed to the target node.
"""
goal = Goal(
id="test-negative",
name="Test Negative Case",
description="Test condition evaluates to False correctly",
)
class LowScoreNode(NodeProtocol):
"""Node that outputs a LOW score."""
async def execute(self, ctx: NodeContext) -> NodeResult:
return NodeResult(success=True, output={"score": 30})
nodes = [
NodeSpec(
id="low_score_node",
name="LowScoreNode",
description="Outputs low score",
node_type="function",
output_keys=["score"],
),
NodeSpec(
id="high_score_handler",
name="HighScoreHandler",
description="Should NOT execute",
node_type="function",
input_keys=["score"],
output_keys=["result"],
),
]
# Condition should be FALSE (30 is not > 80)
edges = [
EdgeSpec(
id="low_to_high",
source="low_score_node",
target="high_score_handler",
condition=EdgeCondition.CONDITIONAL,
condition_expr="score > 80", # Should be False
)
]
graph = GraphSpec(
id="test-graph-negative",
goal_id="test-negative",
entry_node="low_score_node",
nodes=nodes,
edges=edges,
terminal_nodes=["high_score_handler"],
)
runtime = SimpleRuntime(storage_path="/tmp/test")
executor = GraphExecutor(runtime=runtime)
executor.register_node("low_score_node", LowScoreNode())
executor.register_node("high_score_handler", HighScoreNode())
result = await executor.execute(graph, goal, {})
# Verify condition correctly evaluated to False
assert result.success, "Execution should succeed"
assert "high_score_handler" not in result.path, (
f"high_score_handler should NOT be in path. "
f"Condition 'score > 80' should be False (score=30). "
f"Path: {result.path}"
)
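The tests above pin down the semantics: because node outputs are written to memory before edge evaluation, condition expressions can reference keys directly. A sketch of those semantics using plain `eval` (illustrative only; the framework's evaluator may be more restrictive):

```python
# Sketch of the semantics under test: once outputs land in shared memory
# before edge evaluation, the condition sees keys directly. eval() here is
# illustrative; the framework's evaluator may differ.
memory = {"score": 85, "x": 100, "y": 50}
namespace = {"output": dict(memory), **memory}  # supports both syntaxes

assert eval("score > 80", {"__builtins__": {}}, namespace)            # direct
assert eval("output['score'] > 80", {"__builtins__": {}}, namespace)  # legacy
assert eval("x > y and y < 100", {"__builtins__": {}}, namespace)     # multi-key
```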
+109
View File
@@ -11,6 +11,7 @@ from pathlib import Path
import pytest
from framework.observability import clear_trace_context, set_trace_context
from framework.runtime.runtime_log_schemas import (
NodeDetail,
NodeStepLog,
@@ -464,6 +465,114 @@ class TestRuntimeLogger:
assert tool_logs.steps[0].verdict == "RETRY"
assert tool_logs.steps[2].verdict == "ACCEPT"
@pytest.mark.asyncio
async def test_trace_context_populated_in_l1_l2_l3(self, tmp_path: Path):
"""With trace context set, L3/L2/L1 entries include trace_id, span_id, execution_id."""
set_trace_context(
trace_id="a1b2c3d4e5f6789012345678abcdef01",
execution_id="b2c3d4e5f6789012345678abcdef0123",
)
try:
store = RuntimeLogStore(tmp_path / "logs")
rl = RuntimeLogger(store=store, agent_id="test-agent")
run_id = rl.start_run("goal-1")
rl.log_step(
node_id="node-1",
node_type="event_loop",
step_index=0,
llm_text="Step.",
input_tokens=10,
output_tokens=5,
)
rl.log_node_complete(
node_id="node-1",
node_name="Search",
node_type="event_loop",
success=True,
exit_status="success",
)
await rl.end_run(
status="success",
duration_ms=100,
node_path=["node-1"],
execution_quality="clean",
)
# L3: tool_logs
tool_logs = await store.load_tool_logs(run_id)
assert tool_logs is not None
assert len(tool_logs.steps) == 1
step = tool_logs.steps[0]
assert step.trace_id == "a1b2c3d4e5f6789012345678abcdef01"
assert step.execution_id == "b2c3d4e5f6789012345678abcdef0123"
assert len(step.span_id) == 16
assert all(c in "0123456789abcdef" for c in step.span_id)
# L2: details
details = await store.load_details(run_id)
assert details is not None
assert len(details.nodes) == 1
nd = details.nodes[0]
assert nd.trace_id == "a1b2c3d4e5f6789012345678abcdef01"
assert len(nd.span_id) == 16
# L1: summary
summary = await store.load_summary(run_id)
assert summary is not None
assert summary.trace_id == "a1b2c3d4e5f6789012345678abcdef01"
assert summary.execution_id == "b2c3d4e5f6789012345678abcdef0123"
finally:
clear_trace_context()
@pytest.mark.asyncio
async def test_trace_context_empty_when_not_set(self, tmp_path: Path):
"""Without trace context, L3/L2/L1 trace_id and execution_id are empty."""
clear_trace_context()
store = RuntimeLogStore(tmp_path / "logs")
rl = RuntimeLogger(store=store, agent_id="test-agent")
run_id = rl.start_run("goal-1")
rl.log_step(
node_id="node-1",
node_type="event_loop",
step_index=0,
llm_text="Step.",
input_tokens=10,
output_tokens=5,
)
rl.log_node_complete(
node_id="node-1",
node_name="Search",
node_type="event_loop",
success=True,
exit_status="success",
)
await rl.end_run(
status="success",
duration_ms=100,
node_path=["node-1"],
execution_quality="clean",
)
# L3: trace_id and execution_id from context should be empty
tool_logs = await store.load_tool_logs(run_id)
assert tool_logs is not None
assert len(tool_logs.steps) == 1
assert tool_logs.steps[0].trace_id == ""
assert tool_logs.steps[0].execution_id == ""
# L2
details = await store.load_details(run_id)
assert details is not None
assert details.nodes[0].trace_id == ""
# L1
summary = await store.load_summary(run_id)
assert summary is not None
assert summary.trace_id == ""
assert summary.execution_id == ""
@pytest.mark.asyncio
async def test_multi_node_lifecycle(self, tmp_path: Path):
"""Test logging across multiple nodes in a graph run."""
+34 -13
View File
@@ -5,12 +5,31 @@ Aden Hive is a Python-based agent framework. Configuration is handled through en
## Configuration Overview
```
Environment variables (API keys, runtime flags)
Agent config.py (per-agent settings: model, tools, storage)
pyproject.toml (package metadata and dependencies)
.mcp.json (MCP server connections)
~/.hive/configuration.json (global defaults: provider, model, max_tokens)
Environment variables (API keys, runtime flags)
Agent config.py (per-agent settings: model, tools, storage)
pyproject.toml (package metadata and dependencies)
.mcp.json (MCP server connections)
```
## Global Configuration (~/.hive/configuration.json)
The `quickstart.sh` script creates this file during setup. It stores the default LLM provider, model, and max_tokens used by all agents unless overridden in an agent's own `config.py`.
```json
{
"llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 8192,
"api_key_env_var": "ANTHROPIC_API_KEY"
},
"created_at": "2026-01-15T12:00:00+00:00"
}
```
The default `max_tokens` value (8192) is defined as `DEFAULT_MAX_TOKENS` in `framework.graph.edge` and re-exported from `framework.graph`. Each agent's `RuntimeConfig` reads from this file at startup. To change defaults, either re-run `quickstart.sh` or edit the file directly.
## Environment Variables
### LLM Providers (at least one required for real execution)
@@ -61,14 +80,16 @@ Each agent package in `exports/` contains its own `config.py`:
```python
# exports/my_agent/config.py
CONFIG = {
"model": "claude-haiku-4-5-20251001", # Default LLM model
"max_tokens": 4096,
"model": "anthropic/claude-sonnet-4-5-20250929", # Default LLM model
"max_tokens": 8192, # default: DEFAULT_MAX_TOKENS from framework.graph
"temperature": 0.7,
"tools": ["web_search", "pdf_read"], # MCP tools to enable
"storage_path": "/tmp/my_agent", # Runtime data location
}
```
If `model` or `max_tokens` are omitted, the agent loads defaults from `~/.hive/configuration.json`.
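The override order described here (agent `config.py` over `~/.hive/configuration.json` over the built-in `DEFAULT_MAX_TOKENS`) can be sketched as follows; the real resolution lives in `RuntimeConfig`, so this is illustrative only:

```python
# Sketch: resolving effective LLM settings. Reading code is illustrative,
# not RuntimeConfig itself; 8192 mirrors DEFAULT_MAX_TOKENS.
import json
from pathlib import Path

def effective_llm_config(agent_config: dict) -> dict:
    path = Path.home() / ".hive" / "configuration.json"
    defaults = json.loads(path.read_text())["llm"] if path.exists() else {}
    return {"max_tokens": 8192, **defaults, **agent_config}

print(effective_llm_config({"temperature": 0.7}))
```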
### Agent Graph Specification
Agent behavior is defined in `agent.json` (or constructed in `agent.py`):
@@ -96,14 +117,14 @@ MCP (Model Context Protocol) servers are configured in `.mcp.json` at the projec
{
"mcpServers": {
"agent-builder": {
"command": "core/.venv/bin/python",
"args": ["-m", "framework.mcp.agent_builder_server"],
"cwd": "."
"command": "uv",
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
"cwd": "core"
},
"tools": {
"command": "tools/.venv/bin/python",
"args": ["-m", "aden_tools.mcp_server", "--stdio"],
"cwd": "."
"command": "uv",
"args": ["run", "mcp_server.py", "--stdio"],
"cwd": "tools"
}
}
}
@@ -152,7 +173,7 @@ Add to `.vscode/settings.json`:
1. **Never commit API keys** - Use environment variables or `.env` files
2. **`.env` is git-ignored** - Copy `.env.example` to `.env` at the project root and fill in your values
3. **Mock mode for testing** - Set `MOCK_MODE=1` to avoid LLM calls during development
3. **Use real provider keys in non-production environments** - Validate configuration with low-risk inputs before production rollout
4. **Credential isolation** - Each tool validates its own credentials at runtime
## Troubleshooting
+25 -20
View File
@@ -107,6 +107,15 @@ This installs agent-related Claude Code skills:
- `/hive-patterns` - Best practices and design patterns
- `/hive-test` - Test and validate agents
### Cursor IDE Support
Skills are also available in Cursor. To enable:
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
2. Run `MCP: Enable` to enable MCP servers
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`)
### Verify Setup
```bash
@@ -154,10 +163,12 @@ hive/ # Repository root
│ │ ├── llm/ # LLM provider integrations (Anthropic, OpenAI, etc.)
│ │ ├── mcp/ # MCP server integration
│ │ ├── runner/ # AgentRunner - loads and runs agents
│ │ ├── observability/ # Structured logging - human-readable and machine-parseable tracing
│ │ ├── runtime/ # Runtime environment
│ │ ├── schemas/ # Data schemas
│ │ ├── storage/ # File-based persistence
│ │ ├── testing/ # Testing utilities
│ │ ├── tui/ # Terminal UI dashboard
│ │ └── __init__.py
│ ├── pyproject.toml # Package metadata and dependencies
│ ├── README.md # Framework documentation
@@ -180,6 +191,9 @@ hive/ # Repository root
├── exports/ # AGENT PACKAGES (user-created, gitignored)
│ └── your_agent_name/ # Created via /hive-create
├── examples/ # Example agents
│ └── templates/ # Pre-built template agents
├── docs/ # Documentation
│ ├── getting-started.md # Quick start guide
│ ├── configuration.md # Configuration reference
@@ -196,7 +210,7 @@ hive/ # Repository root
├── CONTRIBUTING.md # Contribution guidelines
├── CHANGELOG.md # Version history
├── LICENSE # Apache 2.0 License
├── CODE_OF_CONDUCT.md # Community guidelines
├── docs/CODE_OF_CONDUCT.md # Community guidelines
└── SECURITY.md # Security policy
```
@@ -287,22 +301,19 @@ If you prefer to build agents manually:
### Running Agents
```bash
# Validate agent structure
PYTHONPATH=exports uv run python -m agent_name validate
# Browse and run agents interactively (Recommended)
hive tui
# Show agent information
PYTHONPATH=exports uv run python -m agent_name info
# Run a specific agent
hive run exports/my_agent --input '{"ticket_content": "My login is broken", "customer_id": "CUST-123"}'
# Run agent with input
PYTHONPATH=exports uv run python -m agent_name run --input '{
"ticket_content": "My login is broken",
"customer_id": "CUST-123"
}'
# Run with TUI dashboard
hive run exports/my_agent --tui
# Run in mock mode (no LLM calls)
PYTHONPATH=exports uv run python -m agent_name run --mock --input '{...}'
```
> **Using Python directly:** `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
---
## Testing Agents
@@ -615,16 +626,10 @@ echo 'ANTHROPIC_API_KEY=your-key-here' >> .env
### Debugging Agent Execution
```python
# Add debug logging to your agent
import logging
logging.basicConfig(level=logging.DEBUG)
```bash
# Run with verbose output
PYTHONPATH=exports uv run python -m agent_name run --input '{...}' --verbose
hive run exports/my_agent --verbose --input '{"task": "..."}'
# Use mock mode to test without LLM calls
PYTHONPATH=exports uv run python -m agent_name run --mock --input '{...}'
```
---
+69 -49
View File
@@ -18,6 +18,8 @@ This will:
- Check Python version (requires 3.11+)
- Install the core framework package (`framework`)
- Install the tools package (`aden_tools`)
- Initialize encrypted credential store (`~/.hive/credentials`)
- Configure default LLM provider
- Fix package compatibility issues (openai + litellm)
- Verify all installations
@@ -110,23 +112,38 @@ uv run python -c "import litellm; print('✓ litellm OK')"
- Internet connection (for LLM API calls)
- For Windows users: WSL 2 is recommended for full compatibility.
### API Keys (Optional)
### API Keys
For running agents with real LLMs:
```bash
export ANTHROPIC_API_KEY="your-key-here"
```
Windows (PowerShell):
```powershell
$env:ANTHROPIC_API_KEY="your-key-here"
```
We recommend using `quickstart.sh` to set up LLM API credentials and `/hive-credentials` for tool credentials.
## Running Agents
All agent commands must be run from the project root with `PYTHONPATH` set:
The `hive` CLI is the primary interface for running agents:
```bash
# Browse and run agents interactively (Recommended)
hive tui
# Run a specific agent
hive run exports/my_agent --input '{"task": "Your input here"}'
# Run with TUI dashboard
hive run exports/my_agent --tui
```
### CLI Command Reference
| Command | Description |
|---------|-------------|
| `hive tui` | Browse agents and launch TUI dashboard |
| `hive run <path>` | Execute an agent (`--tui`, `--model`, `--mock`, `--quiet`, `--verbose`) |
| `hive shell [path]` | Interactive REPL (`--multi`, `--no-approve`) |
| `hive info <path>` | Show agent details |
| `hive validate <path>` | Validate agent structure |
| `hive list [dir]` | List available agents |
| `hive dispatch [dir]` | Multi-agent orchestration |
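As a quick illustration, a typical inspect-then-run sequence with these commands might look like this (the agent path is a placeholder):
```bash
# Check the agent before running it
hive validate exports/my_agent
hive info exports/my_agent
# See what else is available
hive list exports
# Drop into an interactive REPL against one agent
hive shell exports/my_agent --no-approve
```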
### Using Python directly (alternative)
```bash
# From /hive/ directory
@@ -140,24 +157,6 @@ $env:PYTHONPATH="core;exports"
python -m agent_name COMMAND
```
### Example: Support Ticket Agent
```bash
# Validate agent structure
PYTHONPATH=exports uv run python -m your_agent_name validate
# Show agent information
PYTHONPATH=exports uv run python -m your_agent_name info
# Run agent with input
PYTHONPATH=exports uv run python -m your_agent_name run --input '{
"task": "Your input here"
}'
# Run in mock mode (no LLM calls)
PYTHONPATH=exports uv run python -m your_agent_name run --mock --input '{...}'
```
## Building and Running New Agents
Build and run an agent using Claude Code CLI with the agent building skills:
@@ -176,6 +175,15 @@ This verifies agent-related Claude Code skills are available:
- `/hive-patterns` - Best practices
- `/hive-test` - Test and validate agents
### Cursor IDE Support
Skills are also available in Cursor. To enable:
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
2. Run `MCP: Enable` to enable MCP servers
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`)
### 2. Build an Agent
```
@@ -353,13 +361,18 @@ hive/
│ ├── .venv/ # Created by quickstart.sh
│ └── pyproject.toml
├── exports/                 # Agent packages (user-created, gitignored)
│   └── your_agent_name/     # Created via /hive-create
└── examples/
    └── templates/           # Pre-built template agents
```
## Separate Virtual Environments
The project uses **separate virtual environments** for `core` and `tools` packages to:
Hive primarily uses **uv** to create and manage separate virtual environments for `core` and `tools`.
The project uses separate virtual environments to:
- Isolate dependencies and avoid conflicts
- Allow independent development and testing of each package
@@ -367,11 +380,18 @@ The project uses **separate virtual environments** for `core` and `tools` packag
### How It Works
When you run `./quickstart.sh` or `uv sync` in each directory:
When you run `./quickstart.sh`, `uv` sets up:
1. **core/.venv/** - Contains the `framework` package and its dependencies (anthropic, litellm, mcp, etc.)
2. **tools/.venv/** - Contains the `aden_tools` package and its dependencies (beautifulsoup4, pandas, etc.)
If you need to refresh environments manually, use `uv`:
```bash
cd core && uv sync
cd ../tools && uv sync
```
### Cross-Package Imports
The `core` and `tools` packages are **intentionally independent**:
@@ -380,38 +400,34 @@ The `core` and `tools` packages are **intentionally independent**:
- **Communication via MCP**: Tools are exposed to agents through MCP servers, not direct Python imports
- **Runtime integration**: The agent runner loads tools via the MCP protocol at runtime
If you need to use both packages in a single script (e.g., for testing), you have two options:
If you need to use both packages in a single script (e.g., for testing), prefer `uv run` with `PYTHONPATH`:
```bash
# Option 1: Install both in a shared environment
uv venv
source .venv/bin/activate
uv pip install -e core/ -e tools/
# Option 2: Use PYTHONPATH (for quick testing)
PYTHONPATH=tools/src uv run python your_script.py
```
### MCP Server Configuration
The `.mcp.json` at project root configures MCP servers to use their respective virtual environments:
The `.mcp.json` at project root configures MCP servers to run through `uv run` in each package directory:
```json
{
"mcpServers": {
"agent-builder": {
"command": "core/.venv/bin/python",
"args": ["-m", "framework.mcp.agent_builder_server"]
"command": "uv",
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
"cwd": "core"
},
"tools": {
"command": "tools/.venv/bin/python",
"args": ["-m", "aden_tools.mcp_server", "--stdio"]
"command": "uv",
"args": ["run", "mcp_server.py", "--stdio"],
"cwd": "tools"
}
}
}
```
This ensures each MCP server runs with its correct dependencies.
This ensures each MCP server runs with the correct project environment managed by `uv`.
### Why PYTHONPATH is Required
@@ -456,7 +472,11 @@ claude> /hive-test
### 5. Run Agent
```bash
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
# Interactive dashboard
hive tui
# Or run directly
hive run exports/your_agent_name --input '{"task": "..."}'
```
## IDE Setup
+22 -23
@@ -88,20 +88,24 @@ hive/
│ │ ├── runtime/ # Runtime environment
│ │ ├── schemas/ # Data schemas
│ │ ├── storage/ # File-based persistence
│   │   ├── testing/         # Testing utilities
│ │ └── tui/ # Terminal UI dashboard
│ └── pyproject.toml # Package metadata
├── tools/ # MCP Tools Package
│ ├── mcp_server.py # MCP server entry point
│ └── src/aden_tools/ # Tools for agent capabilities
│       ├── mcp_server.py    # HTTP MCP server
│       └── tools/           # Individual tool implementations
│           ├── web_search_tool/
│           ├── web_scrape_tool/
│           └── file_system_toolkits/
├── exports/ # Agent Packages (user-generated, not in repo)
│ └── your_agent/ # Your agents created via /hive
├── examples/
│ └── templates/ # Pre-built template agents
├── .claude/ # Claude Code Skills
│ └── skills/
│ ├── hive/
@@ -116,19 +120,15 @@ hive/
## Running an Agent
```bash
# Validate agent structure
PYTHONPATH=exports uv run python -m my_agent validate
# Browse and run agents interactively (Recommended)
hive tui
# Show agent information
PYTHONPATH=exports uv run python -m my_agent info
# Run a specific agent
hive run exports/my_agent --input '{"task": "Your input here"}'
# Run agent with input
PYTHONPATH=exports uv run python -m my_agent run --input '{
"task": "Your input here"
}'
# Run with TUI dashboard
hive run exports/my_agent --tui
# Run in mock mode (no LLM calls)
PYTHONPATH=exports uv run python -m my_agent run --mock --input '{...}'
```
## API Keys Setup
@@ -164,11 +164,12 @@ PYTHONPATH=exports uv run python -m my_agent test --type success
## Next Steps
1. **Detailed Setup**: See [environment-setup.md](./environment-setup.md)
2. **Developer Guide**: See [developer-guide.md](./developer-guide.md)
3. **Build Agents**: Use `/hive` skill in Claude Code
4. **Custom Tools**: Learn to integrate MCP servers
5. **Join Community**: [Discord](https://discord.com/invite/MXE49hrKDk)
1. **TUI Dashboard**: Run `hive tui` to explore agents interactively
2. **Detailed Setup**: See [environment-setup.md](./environment-setup.md)
3. **Developer Guide**: See [developer-guide.md](./developer-guide.md)
4. **Build Agents**: Use `/hive` skill in Claude Code
5. **Custom Tools**: Learn to integrate MCP servers
6. **Join Community**: [Discord](https://discord.com/invite/MXE49hrKDk)
## Troubleshooting
@@ -194,8 +195,6 @@ uv pip install -e .
# Verify API key is set
echo $ANTHROPIC_API_KEY
# Run in mock mode to test without API
PYTHONPATH=exports uv run python -m my_agent run --mock --input '{...}'
```
### Package Installation Issues
+49
@@ -0,0 +1,49 @@
# Evolution
## Evolution Is the Mechanism; Adaptiveness Is the Result
Agents don't just fail; they fail inevitably. Real-world variables—private LinkedIn profiles, shifting API schemas, or LLM hallucinations—are impossible to predict in a vacuum. The first version of any agent is merely a "happy path" draft.
Evolution is how Hive handles this. When an agent fails, the framework captures what went wrong — which node failed, which success criteria weren't met, what the agent tried and why it didn't work. Then a coding agent (Claude Code, Cursor, or similar) uses that failure data to generate an improved version of the agent. The new version gets deployed, runs, encounters new edge cases, and the cycle continues.
Over generations, the agent gets more reliable. Not because someone sat down and anticipated every possible failure, but because each failure teaches the next version something specific.
## How It Works
The evolution loop has four stages:
**1. Execute** — The worker agent runs against real inputs. Sessions produce outcomes, decisions, and metrics.
**2. Evaluate** — The framework checks outcomes against the goal's success criteria and constraints. Did the agent produce the desired result? Which criteria were satisfied and which weren't? Were any constraints violated?
**3. Diagnose** — Failure data is structured and specific. It's not just "the agent failed" — it's "node `draft_message` failed to produce personalized content because the research node returned insufficient data about the prospect's recent activity." The decision log, problem reports, and execution trace provide the full picture.
**4. Regenerate** — A coding agent receives the diagnosis and the current agent code. It modifies the graph — adding nodes, adjusting prompts, changing edge conditions, adding tools — to address the specific failure. The new version is deployed and the cycle restarts.
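To make the cycle concrete, here is a minimal, runnable sketch of the four stages. Every name in it (`run_session`, `evaluate`, `regenerate`, `EvalReport`) is a hypothetical stand-in for illustration, not a framework API:
```python
# Illustrative only: hypothetical stand-ins, not framework APIs.
from dataclasses import dataclass, field


@dataclass
class EvalReport:
    unmet_criteria: list[str] = field(default_factory=list)  # which criteria failed
    violations: list[str] = field(default_factory=list)      # constraints broken


def run_session(agent_version: str, item: str) -> dict:
    # 1. Execute: stand-in for running the worker agent on one input
    return {"output": f"{agent_version}: {item}", "decision_log": []}


def evaluate(result: dict) -> EvalReport:
    # 2. Evaluate: stand-in for scoring the outcome against success criteria
    return EvalReport(unmet_criteria=["personalized"] if "v1" in result["output"] else [])


def regenerate(agent_version: str, report: EvalReport) -> str:
    # 3 + 4. Diagnose and Regenerate: stand-in for the coding agent's targeted rewrite
    return agent_version + "+fix:" + ",".join(report.unmet_criteria)


agent_version = "agent-v1"
for item in ["input-a", "input-b"]:
    report = evaluate(run_session(agent_version, item))
    if report.unmet_criteria or report.violations:
        agent_version = regenerate(agent_version, report)
```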
## Adaptiveness ≠ Intelligence or Intent
An important distinction: evolution makes agents more adaptive, but not more intelligent in any general sense. The agent isn't learning to reason better — it's being rewritten to handle more situations correctly.
This is closer to how biological evolution works than how learning works. A species doesn't "learn" to survive winter — individuals that happen to have thicker fur survive, and that trait gets selected for. Similarly, agent versions that handle more edge cases correctly survive in production, and the patterns that made them successful get carried forward.
The practical implication: don't expect evolution to make an agent smarter about problems it's never seen. Evolution improves reliability on the *kinds* of problems the agent has already encountered. For genuinely novel situations, that's what human-in-the-loop is for — and every time a human steps in, that interaction becomes potential fuel for the next evolution cycle.
## What Gets Evolved
Evolution can change almost anything about an agent:
**Prompts** — The most common fix. A node's system prompt gets refined based on the specific ways the LLM misunderstood its instructions.
**Graph structure** — Adding a validation node before a critical step, splitting a node that's trying to do too much, adding a fallback path for a common failure mode.
**Edge conditions** — Adjusting routing logic based on observed patterns. If low-confidence research results consistently lead to bad drafts, add a conditional edge that routes them back for another research pass.
**Tool selection** — Swapping in a better tool, adding a new one, or removing one that causes more problems than it solves.
**Constraints and criteria** — Tightening or loosening based on what's actually achievable and what matters in practice.
## The Role of Decision Logging
Evolution depends on good data. The runtime captures every decision an agent makes: what it was trying to do, what options it considered, what it chose, and what happened as a result. This isn't overhead — it's the signal that makes evolution possible.
Without decision logging, failure analysis is guesswork. With it, the coding agent can trace a failure back to its root cause and make a targeted fix rather than a blind change.
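As a purely hypothetical illustration of the shape such a record might take (the real schema lives in the framework):
```python
# Hypothetical decision record; every field name here is invented for illustration.
decision = {
    "node_id": "draft_message",
    "intent": "produce a personalized outreach draft",
    "options_considered": ["reuse cached research", "route back for deeper research"],
    "choice": "reuse cached research",
    "outcome": "draft rejected: no reference to the prospect's recent activity",
}
```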
+101
@@ -0,0 +1,101 @@
# Goals & Outcome-Driven Development
## The Core Idea
Business processes are outcome-driven. A sales team doesn't follow a rigid script — they adapt their approach until the deal closes. A support agent doesn't execute a flowchart — they resolve the customer's issue. The outcome is what matters, not the specific steps taken to get there.
Hive is built on this principle. Instead of hardcoding agent workflows step by step, you define the outcome you want, and the framework figures out how to get there. We call this **Outcome-Driven Development (ODD)**.
## Task-Driven vs Goal-Driven vs Outcome-Driven
These three paradigms represent different levels of abstraction for building agents:
**Task-Driven Development (TDD)** asks: *"Is the code correct?"*
You define explicit steps. The agent follows them. Success means the steps ran without errors. The problem: an agent can execute every step perfectly and still produce a useless result. The steps become the goal, not the actual outcome.
**Goal-Driven Development (GDD)** asks: *"Are we solving the right problem?"*
You define what you want to achieve. The agent plans and executes toward that goal. Better than TDD because it captures intent. But goals can be vague — "improve customer satisfaction" doesn't tell you when you're done.
**Outcome-Driven Development (ODD)** asks: *"Did the system produce the desired result?"*
You define measurable success criteria, hard constraints, and the context the agent needs. The agent is evaluated against the actual outcome, not whether it followed the right steps or aimed at the right goal. This is what Hive implements.
## Goals as First-Class Citizens
In Hive, a `Goal` is not a string description. It's a structured object with three components:
### Success Criteria
Each goal has weighted success criteria that define what "done" looks like. These aren't binary pass/fail checks — they're multi-dimensional measures of quality.
```python
Goal(
id="twitter-outreach",
name="Personalized Twitter Outreach",
success_criteria=[
SuccessCriterion(
id="personalized",
description="Messages reference specific details from the prospect's profile",
metric="llm_judge",
weight=0.4
),
SuccessCriterion(
id="compliant",
description="Messages follow brand voice guidelines",
metric="llm_judge",
weight=0.3
),
SuccessCriterion(
id="actionable",
description="Each message includes a clear call to action",
metric="output_contains",
target="CTA",
weight=0.3
),
],
...
)
```
Metrics can be `output_contains`, `output_equals`, `llm_judge`, or `custom`. Weights let you express what matters most — a perfectly compliant message that isn't personalized still falls short.
### Constraints
Constraints define what must **not** happen. They're the guardrails.
```python
constraints=[
Constraint(
id="no_spam",
description="Never send more than 3 messages to the same person per week",
constraint_type="hard", # Violation = immediate escalation
category="safety"
),
Constraint(
id="budget_limit",
description="Total LLM cost must not exceed $5 per run",
constraint_type="soft", # Violation = warning, not a hard stop
category="cost"
),
]
```
Hard constraints are non-negotiable — violating one triggers escalation or failure. Soft constraints are preferences that the agent should respect but can bend when necessary. Constraint categories include `time`, `cost`, `safety`, `scope`, and `quality`.
### Context
Goals carry context — domain knowledge, preferences, background information that the agent needs to make good decisions. This context is injected into every LLM call the agent makes, so the agent is always reasoning with the full picture.
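For example, context can ride along on the goal as plain key-value data (a sketch in the style of the `Goal` examples above; the context keys here are invented):
```python
Goal(
    id="twitter-outreach",
    name="Personalized Twitter Outreach",
    context={
        "brand_voice": "friendly, concise, no emojis",
        "product_summary": "a scheduling tool for small teams",
        "audience": "founders at companies under 50 people",
    },
    ...
)
```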
## Why This Matters
When you define goals with weighted criteria and constraints, three things happen:
1. **The agent can self-correct.** Goals are injected into every LLM call, so the agent is always reasoning against its success criteria. Within a [graph execution](./graph.md), nodes use these criteria to decide whether to accept their output, retry, or escalate — self-correction in real time.
2. **Evolution has a target.** When an agent fails, the framework knows *which criteria* it fell short on, which gives the coding agent specific information to improve the next generation (see [Evolution](./evolution.md)).
3. **Humans stay in control.** Constraints define the boundaries. The agent has freedom to find creative solutions within those boundaries, but it can't cross the lines you've drawn.
The goal lifecycle flows through `DRAFT → READY → ACTIVE → COMPLETED / FAILED / SUSPENDED`, giving you visibility into where each objective stands at any point during execution.
+78
@@ -0,0 +1,78 @@
# The Agent Graph
## Why a Graph
Real business processes aren't linear. A sales outreach might go: research a prospect, draft a message, realize the research is thin, go back and dig deeper, draft again, get human approval, send. There are loops, branches, fallbacks, and decision points.
Hive models this as a directed graph. Nodes do work, edges connect them, and shared memory lets them pass data. The framework walks this structure — running nodes, following edges, managing retries — until the agent reaches its goal or exhausts its step budget.
Edges can loop back, creating feedback cycles where an agent retries a step or takes a different path. That's intentional. A graph that only moves forward can't self-correct.
## Nodes
A node is a unit of work. Each node reads inputs from shared memory, does something, and writes outputs back. There are a handful of node types, each suited to a different kind of work:
**`event_loop`** — The workhorse. This is a multi-turn LLM loop: the model reasons about the current state, calls tools, observes results, and keeps going until it has produced the required outputs. Most of the interesting agent behavior happens in these nodes. They handle long-running tasks, manage their own context window, and can recover from crashes mid-conversation.
**`function`** — A plain Python function. No LLM involved. Use these for anything deterministic: data transformation, API calls with known parameters, validation logic, or any step where you don't want a language model making judgment calls.
**`router`** — A decision point that directs execution down different paths. Can be rule-based ("if confidence is high, go left; otherwise, go right") or LLM-powered ("given the goal and what we know so far, which path makes sense?").
**`human_input`** — A pause point where the agent stops and asks a human for input before continuing. See [Human-in-the-Loop](#human-in-the-loop) below.
There are also simpler LLM node types (`llm_tool_use` for a single LLM call with tools, `llm_generate` for pure text generation) for steps that don't need the full event loop.
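As a rough sketch of what a node definition looks like (the field names follow the agent JSON later in this changeset, where nodes are declared as `NodeSpec` instances; treat the exact constructor as approximate):
```python
# Sketch of an event_loop node; fields mirror the agent JSON in this changeset.
NodeSpec(
    id="research",
    name="Research",
    node_type="event_loop",
    input_keys=["research_brief"],          # read from shared memory
    output_keys=["findings", "sources"],    # written back via set_output
    system_prompt="Search the web and compile findings with source URLs.",
    tools=["web_search", "web_scrape"],
)
```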
### Self-Correction Within a Node
The most important behavior in an `event_loop` node is the ability to self-correct. After each iteration, the node evaluates its own output: did it produce what was needed? If yes, it's done. If not, it tries again — but this time it sees what went wrong and adjusts.
This is the **reflexion pattern**: try, evaluate, learn from the result, try again. It's cheaper and more effective than starting over. An agent that takes three attempts to get something right is still more useful than one that fails on the first try and gives up.
Within a single node, the outcomes are:
- **Accept** — Output meets the bar. Move on.
- **Retry** — Not good enough, but recoverable. Try again with feedback.
- **Escalate** — Something is fundamentally broken. Hand off to error handling.
This is self-correction *within a session* — the agent adapting in real time. It's different from [evolution](./evolution.md), which improves the agent *across sessions* by rewriting its code between generations. Both matter: reflexion handles the bumps in a single run, evolution handles the patterns that keep recurring across many runs.
## Edges
Edges control flow between nodes. Each edge has a condition:
- **On success** — follow this edge if the source node succeeded
- **On failure** — follow if the source failed (this is how you wire up fallback paths and error recovery)
- **Conditional** — follow if an expression is true (e.g., route high-confidence results one way, low-confidence results another)
- **LLM-decided** — let the LLM choose which path based on the [goal](./goals_outcome.md) and current context
Edges also handle data plumbing between nodes — mapping one node's outputs to another node's expected inputs, so each node has a clean interface without needing to know where its data came from.
When a node has multiple outgoing edges, the framework can run those branches in parallel and reconverge when they're all done. This is useful for tasks like researching a prospect from multiple sources simultaneously.
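Concretely, the deep-research template later in this changeset expresses its review feedback loop as a conditional edge; the sketch below reuses that `EdgeSpec` shape:
```python
# Conditional edge: route back for more research when the review node asks for it.
EdgeSpec(
    id="review-to-research-feedback",
    source="review",
    target="research",
    condition=EdgeCondition.CONDITIONAL,
    condition_expr="str(needs_more_research).lower() == 'true'",
    priority=2,  # checked before the lower-priority edge to the report node
)
```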
## Shared Memory
Shared memory is how nodes communicate. It's a key-value store scoped to a single [session](./worker_agent.md). Every node declares which keys it reads and which it writes, and the framework enforces those boundaries — a node can't quietly access data it hasn't declared.
Data flows through the graph in a natural way: input arrives at the start, each node reads what it needs and writes what it produces, and edges map outputs to inputs as data moves between nodes. At the end, the full memory state is the execution result.
## Human-in-the-Loop
Human-in-the-loop (HITL) nodes are where the agent pauses and asks a person for input. This isn't a blunt "stop everything" — the framework supports structured questions: open-ended text, multiple choice, yes/no approvals, and multi-field forms.
When the agent hits a HITL node, it saves its entire state and presents the questions. The session can sit paused for minutes, hours, or days. When the human responds, execution picks up exactly where it left off.
This is what makes Hive agents supervisable in production. You place HITL nodes at critical decision points — before sending a message, before making a purchase, before any action that's hard to undo. The agent handles the routine work autonomously; humans weigh in on the decisions that matter. And every time a human provides input, that decision becomes data the [evolution](./evolution.md) process can learn from.
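As a purely illustrative sketch, a structured HITL question might carry a shape like this (the field names are invented; the framework defines the real schema):
```python
# Hypothetical multiple-choice approval question at a HITL pause point.
question = {
    "type": "multiple_choice",
    "prompt": "The draft references the prospect's latest post. Send it?",
    "choices": ["Send as-is", "Edit first", "Skip this prospect"],
}
```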
## The Shape of an Agent
A typical agent graph looks something like this:
```
intake → research → draft → [human review] → send → done
            ↑                                  |
            └─────────── on failure ──────────┘
```
An entry node where work begins. A chain of nodes that do the real work. HITL nodes at approval gates. Failure edges that loop back for another attempt. Terminal nodes where execution ends.
The framework tracks everything as it walks the graph: which nodes ran, how many retries each needed, how much the LLM calls cost, how long each step took. This metadata feeds into the [worker agent runtime](./worker_agent.md) for monitoring and into the [evolution](./evolution.md) process for improvement.
+51
@@ -0,0 +1,51 @@
# The Worker Agent
## What a Worker Agent Is
A worker agent is a specialized AI agent built to perform a specific business process. It's not a general-purpose assistant — it's purpose-built, like hiring someone for a defined role. A sales outreach agent knows how to research prospects, craft personalized messages, and follow up. A support triage agent knows how to categorize tickets, pull customer context, and route to the right team.
In Hive, a **Coding Agent** (like Claude Code or Cursor) generates worker agents from a natural language goal description. You describe what you want the agent to do, and the coding agent produces the graph, nodes, edges, and configuration. The worker agent is the thing that actually runs.
## Sessions
A session is a single execution of a worker agent against a specific input. If your outreach agent processes 50 prospects, that's 50 sessions.
Each session is isolated — it has its own shared memory, its own execution state, and its own history. This matters because sessions can be long-running. An agent might start researching a prospect, pause for human approval, wait hours or days, and then resume to send the message. The session preserves everything across that gap.
Sessions also make debugging straightforward. Every decision the agent made, every tool it called, every retry it attempted — it's all captured in the session. When something goes wrong, you can trace exactly what happened.
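The template agents in this changeset expose exactly this shape: one `run(...)` call per session. A minimal driver, assuming the deep-research template's `default_agent` is importable on your path:
```python
import asyncio

# Import path assumed from the templates in this changeset; adjust to your layout.
from deep_research_agent.agent import default_agent

# One session per input: each run gets its own shared memory and history.
for topic in ["vector databases", "agent evaluation"]:
    result = asyncio.run(default_agent.run({"topic": topic}))
    print(topic, "->", result.success)
```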
## Iterations
Within a session, nodes (especially `event_loop` nodes) work in iterations. An iteration is one turn of the loop: the LLM reasons about the current state, possibly calls tools, observes results, and produces output. Then the judge evaluates: is this good enough?
If not, the node iterates again. The LLM sees what went wrong and adjusts its approach. This is how agents self-correct without human intervention — through rapid iteration within a single node, not by restarting the whole process.
Iterations have limits. You set a maximum per node to prevent runaway loops. If a node can't produce acceptable output within its iteration budget, it fails and the graph's error-handling edges take over.
## Headless Execution
A lot of business processes need to run continuously — monitoring inboxes, processing incoming leads, watching for events. These agents run **headless**: no UI, no human sitting at a terminal, just the agent doing its job in the background.
Headless doesn't mean unsupervised. HITL (human-in-the-loop) nodes still pause execution and wait for human input when the agent hits a decision it shouldn't make alone. The difference is that instead of a live conversation, the agent sends a notification, waits for a response through whatever channel you've configured, and resumes when the human weighs in.
This is the operational model Hive is designed for: agents that run 24/7 as part of your business infrastructure, with humans stepping in only when needed. The goal is to automate the routine and escalate the exceptions.
## The Runtime
The worker agent runtime manages the lifecycle: starting sessions, executing the graph, handling pauses and resumes, tracking costs, and collecting metrics. It coordinates everything the agent needs — LLM access, tool execution, shared memory, credential management — so individual nodes can focus on their specific job.
Key things the runtime handles:
**Cost tracking** — Every LLM call is metered. You set budget constraints on the goal, and the runtime enforces them. An agent can't silently burn through your API credits.
**Decision logging** — Every meaningful choice the agent makes is recorded: what it was trying to do, what options it considered, what it chose, and what happened. This isn't just for debugging — it's the raw material that evolution uses to improve future generations.
**Event streaming** — The runtime emits events as the agent works. You can wire these up to dashboards, logs, or alerting systems to monitor agents in real time.
**Crash recovery** — If execution is interrupted (process crash, deployment, anything), the runtime can resume from the last checkpoint. Conversation state and memory are persisted, so the agent picks up where it left off rather than starting over.
## The Big Picture
The worker agent model is Hive's answer to a simple question: how do you run AI agents like you'd run a team?
You hire for a role (define the goal), you onboard them with context (provide tools, credentials, domain knowledge), you set expectations (success criteria and constraints), you let them work independently (headless execution), and you check in when something unusual comes up (HITL). When they're not performing well, you don't debug them line by line — you evolve them (see [Evolution](./evolution.md)).
+30 -1
@@ -1,4 +1,27 @@
# TUI Text Selection and Copy Guide
# TUI Dashboard Guide
## Launching the TUI
There are two ways to launch the TUI dashboard:
```bash
# Browse and select an agent interactively
hive tui
# Launch the TUI for a specific agent
hive run exports/my_agent --tui
```
`hive tui` scans both `exports/` and `examples/templates/` for available agents, then presents a selection menu.
## Dashboard Panels
The TUI dashboard is divided into four areas:
- **Status Bar** - Shows the current agent name, execution state, and model in use
- **Graph Overview** - Live visualization of the agent's node graph with highlighted active node
- **Log Pane** - Scrollable event log streaming node transitions, LLM calls, and tool outputs
- **Chat REPL** - Input area for interacting with client-facing nodes (`ask_user()` prompts appear here)
## Keybindings
@@ -28,3 +51,9 @@ The log pane uses `auto_scroll=False`. New output only scrolls to the bottom whe
## Screenshots
`Ctrl+S` saves an SVG screenshot to the `screenshots/` directory with a timestamped filename. Open the SVG in any browser to view it.
## Tips
- Use `--mock` mode to explore agent execution without spending API credits: `hive run exports/my_agent --tui --mock`
- Override the default model with `--model`: `hive run exports/my_agent --model gpt-4o`
- Screenshots are saved as SVG files to `screenshots/` and can be opened in any browser
+12 -3
@@ -11,6 +11,7 @@ template_name/
├── __init__.py # Package exports
├── __main__.py # CLI entry point
├── agent.py # Goal, edges, graph spec, agent class
├── agent.json # Agent definition (used by build-from-template)
├── config.py # Runtime configuration
├── nodes/
│ └── __init__.py # Node definitions (NodeSpec instances)
@@ -19,20 +20,28 @@ template_name/
## How to use a template
### Option 1: Build from template (recommended)
Use the `/hive-create` skill and select "From a template" to interactively pick a template, customize the goal/nodes/graph, and export a new agent.
### Option 2: Manual copy
```bash
# 1. Copy to your exports directory
cp -r examples/templates/marketing_agent exports/my_marketing_agent
cp -r examples/templates/deep_research_agent exports/my_research_agent
# 2. Update the module references in __main__.py and __init__.py
# 3. Customize goal, nodes, edges, and prompts
# 4. Run it
uv run python -m exports.my_marketing_agent --input '{"product_description": "..."}'
uv run python -m exports.my_research_agent --input '{"topic": "..."}'
```
## Available templates
| Template | Description |
|----------|-------------|
| [marketing_agent](marketing_agent/) | Multi-channel marketing content generator with audience analysis, content generation, and editorial review nodes |
| [deep_research_agent](deep_research_agent/) | Interactive research agent that searches diverse sources, evaluates findings with user checkpoints, and produces a cited HTML report |
| [tech_news_reporter](tech_news_reporter/) | Researches the latest technology and AI news from the web and produces a well-organized report |
| [twitter_outreach](twitter_outreach/) | Researches a Twitter/X profile, crafts a personalized outreach email, and sends it after user approval |
@@ -0,0 +1,22 @@
# Deep Research Agent
A template agent designed to perform comprehensive research on a specific topic and generate a structured report.
## Usage
Run the agent using the following command:
### Linux / Mac
```bash
PYTHONPATH=core:examples/templates python -m deep_research_agent run --mock --topic "Artificial Intelligence"
```
### Windows
```powershell
$env:PYTHONPATH="core;examples\templates"
python -m deep_research_agent run --mock --topic "Artificial Intelligence"
```
## Options
- `-t, --topic`: The research topic (required).
- `--mock`: Run without calling real LLM APIs (simulated execution).
- `--help`: Show all available options.
@@ -34,18 +34,17 @@ def cli():
@cli.command()
@click.option("--topic", "-t", type=str, required=True, help="Research topic")
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--quiet", "-q", is_flag=True, help="Only output result JSON")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def run(topic, mock, quiet, verbose, debug):
def run(topic, quiet, verbose, debug):
"""Execute research on a topic."""
if not quiet:
setup_logging(verbose=verbose, debug=debug)
context = {"topic": topic}
result = asyncio.run(default_agent.run(context, mock_mode=mock))
result = asyncio.run(default_agent.run(context))
output_data = {
"success": result.success,
@@ -60,10 +59,9 @@ def run(topic, mock, quiet, verbose, debug):
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def tui(mock, verbose, debug):
def tui(verbose, debug):
"""Launch the TUI dashboard for interactive research."""
setup_logging(verbose=verbose, debug=debug)
@@ -90,20 +88,18 @@ def tui(mock, verbose, debug):
agent._event_bus = EventBus()
agent._tool_registry = ToolRegistry()
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock:
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
tools = list(agent._tool_registry.get_tools().values())
tool_executor = agent._tool_registry.get_executor()
@@ -211,17 +207,8 @@ async def _interactive_shell(verbose=False):
if result.success:
output = result.output
if "report_content" in output:
click.echo("\n--- Report ---\n")
click.echo(output["report_content"])
click.echo("\n")
if "references" in output:
click.echo("--- References ---\n")
for ref in output.get("references", []):
click.echo(
f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}"
)
click.echo("\n")
status = output.get("delivery_status", "unknown")
click.echo(f"\nResearch complete (status: {status})\n")
else:
click.echo(f"\nResearch failed: {result.error}\n")
@@ -0,0 +1,276 @@
{
"agent": {
"id": "deep_research_agent",
"name": "Deep Research Agent",
"version": "1.0.0",
"description": "Interactive research agent that rigorously investigates topics through multi-source search, quality evaluation, and synthesis - with TUI conversation at key checkpoints for user guidance and feedback."
},
"graph": {
"id": "deep-research-agent-graph",
"goal_id": "rigorous-interactive-research",
"version": "1.0.0",
"entry_node": "intake",
"entry_points": {
"start": "intake"
},
"pause_nodes": [],
"terminal_nodes": [
"report"
],
"nodes": [
{
"id": "intake",
"name": "Research Intake",
"description": "Discuss the research topic with the user, clarify scope, and confirm direction",
"node_type": "event_loop",
"input_keys": [
"topic"
],
"output_keys": [
"research_brief"
],
"nullable_output_keys": [],
"input_schema": {},
"output_schema": {},
"system_prompt": "You are a research intake specialist. The user wants to research a topic.\nHave a brief conversation to clarify what they need.\n\n**STEP 1 \u2014 Read and respond (text only, NO tool calls):**\n1. Read the topic provided\n2. If it's vague, ask 1-2 clarifying questions (scope, angle, depth)\n3. If it's already clear, confirm your understanding and ask the user to confirm\n\nKeep it short. Don't over-ask.\n\nAfter your message, call ask_user() to wait for the user's response.\n\n**STEP 2 \u2014 After the user confirms, call set_output:**\n- set_output(\"research_brief\", \"A clear paragraph describing exactly what to research, what questions to answer, what scope to cover, and how deep to go.\")",
"tools": [],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 1,
"output_model": null,
"max_validation_retries": 2,
"client_facing": true
},
{
"id": "research",
"name": "Research",
"description": "Search the web, fetch source content, and compile findings",
"node_type": "event_loop",
"input_keys": [
"research_brief",
"feedback"
],
"output_keys": [
"findings",
"sources",
"gaps"
],
"nullable_output_keys": [
"feedback"
],
"input_schema": {},
"output_schema": {},
"system_prompt": "You are a research agent. Given a research brief, find and analyze sources.\n\nIf feedback is provided, this is a follow-up round \u2014 focus on the gaps identified.\n\nWork in phases:\n1. **Search**: Use web_search with 3-5 diverse queries covering different angles.\n Prioritize authoritative sources (.edu, .gov, established publications).\n2. **Fetch**: Use web_scrape on the most promising URLs (aim for 5-8 sources).\n Skip URLs that fail. Extract the substantive content.\n3. **Analyze**: Review what you've collected. Identify key findings, themes,\n and any contradictions between sources.\n\nImportant:\n- Work in batches of 3-4 tool calls at a time to manage context\n- After each batch, assess whether you have enough material\n- Prefer quality over quantity \u2014 5 good sources beat 15 thin ones\n- Track which URL each finding comes from (you'll need citations later)\n\nWhen done, use set_output:\n- set_output(\"findings\", \"Structured summary: key findings with source URLs for each claim. Include themes, contradictions, and confidence levels.\")\n- set_output(\"sources\", [{\"url\": \"...\", \"title\": \"...\", \"summary\": \"...\"}])\n- set_output(\"gaps\", \"What aspects of the research brief are NOT well-covered yet, if any.\")",
"tools": [
"web_search",
"web_scrape",
"load_data",
"save_data",
"list_data_files"
],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 3,
"output_model": null,
"max_validation_retries": 2,
"client_facing": false
},
{
"id": "review",
"name": "Review Findings",
"description": "Present findings to user and decide whether to research more or write the report",
"node_type": "event_loop",
"input_keys": [
"findings",
"sources",
"gaps",
"research_brief"
],
"output_keys": [
"needs_more_research",
"feedback"
],
"nullable_output_keys": [],
"input_schema": {},
"output_schema": {},
"system_prompt": "Present the research findings to the user clearly and concisely.\n\n**STEP 1 \u2014 Present (your first message, text only, NO tool calls):**\n1. **Summary** (2-3 sentences of what was found)\n2. **Key Findings** (bulleted, with confidence levels)\n3. **Sources Used** (count and quality assessment)\n4. **Gaps** (what's still unclear or under-covered)\n\nEnd by asking: Are they satisfied, or do they want deeper research? Should we proceed to writing the final report?\n\nAfter your presentation, call ask_user() to wait for the user's response.\n\n**STEP 2 \u2014 After the user responds, call set_output:**\n- set_output(\"needs_more_research\", \"true\") \u2014 if they want more\n- set_output(\"needs_more_research\", \"false\") \u2014 if they're satisfied\n- set_output(\"feedback\", \"What the user wants explored further, or empty string\")",
"tools": [],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 3,
"output_model": null,
"max_validation_retries": 2,
"client_facing": true
},
{
"id": "report",
"name": "Write & Deliver Report",
"description": "Write a cited HTML report from the findings and present it to the user",
"node_type": "event_loop",
"input_keys": [
"findings",
"sources",
"research_brief"
],
"output_keys": [
"delivery_status"
],
"nullable_output_keys": [],
"input_schema": {},
"output_schema": {},
"system_prompt": "Write a comprehensive research report as an HTML file and present it to the user.\n\n**STEP 1 \u2014 Write the HTML report (tool calls, NO text to user yet):**\n\n1. Compose a complete, self-contained HTML document with embedded CSS styling.\n Use a clean, readable design: max-width container, pleasant typography,\n numbered citation links, a table of contents, and a references section.\n\n Report structure inside the HTML:\n - Title & date\n - Executive Summary (2-3 paragraphs)\n - Table of Contents\n - Findings (organized by theme, with [n] citation links)\n - Analysis (synthesis, implications, areas of debate)\n - Conclusion (key takeaways, confidence assessment)\n - References (numbered list with clickable URLs)\n\n Requirements:\n - Every factual claim must cite its source with [n] notation\n - Be objective \u2014 present multiple viewpoints where sources disagree\n - Distinguish well-supported conclusions from speculation\n - Answer the original research questions from the brief\n\n2. Save the HTML file:\n save_data(filename=\"report.html\", data=<your_html>)\n\n3. Get the clickable link:\n serve_file_to_user(filename=\"report.html\", label=\"Research Report\")\n\n**STEP 2 \u2014 Present the link to the user (text only, NO tool calls):**\n\nTell the user the report is ready and include the file:// URI from\nserve_file_to_user so they can click it to open. Give a brief summary\nof what the report covers. Ask if they have questions.\n\nAfter presenting the link, call ask_user() to wait for the user's response.\n\n**STEP 3 \u2014 After the user responds:**\n- Answer follow-up questions from the research material\n- Call ask_user() again if they might have more questions\n- When the user is satisfied: set_output(\"delivery_status\", \"completed\")",
"tools": [
"save_data",
"serve_file_to_user",
"load_data",
"list_data_files"
],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 1,
"output_model": null,
"max_validation_retries": 2,
"client_facing": true
}
],
"edges": [
{
"id": "intake-to-research",
"source": "intake",
"target": "research",
"condition": "on_success",
"condition_expr": null,
"priority": 1,
"input_mapping": {}
},
{
"id": "research-to-review",
"source": "research",
"target": "review",
"condition": "on_success",
"condition_expr": null,
"priority": 1,
"input_mapping": {}
},
{
"id": "review-to-research-feedback",
"source": "review",
"target": "research",
"condition": "conditional",
"condition_expr": "str(needs_more_research).lower() == 'true'",
"priority": 2,
"input_mapping": {}
},
{
"id": "review-to-report",
"source": "review",
"target": "report",
"condition": "conditional",
"condition_expr": "str(needs_more_research).lower() != 'true'",
"priority": 1,
"input_mapping": {}
}
],
"max_steps": 100,
"max_retries_per_node": 3,
"description": "Interactive research agent that rigorously investigates topics through multi-source search, quality evaluation, and synthesis - with TUI conversation at key checkpoints for user guidance and feedback.",
"created_at": "2026-02-06T00:00:00.000000"
},
"goal": {
"id": "rigorous-interactive-research",
"name": "Rigorous Interactive Research",
"description": "Research any topic by searching diverse sources, analyzing findings, and producing a cited report \u2014 with user checkpoints to guide direction.",
"status": "draft",
"success_criteria": [
{
"id": "source-diversity",
"description": "Use multiple diverse, authoritative sources",
"metric": "source_count",
"target": ">=5",
"weight": 0.25,
"met": false
},
{
"id": "citation-coverage",
"description": "Every factual claim in the report cites its source",
"metric": "citation_coverage",
"target": "100%",
"weight": 0.25,
"met": false
},
{
"id": "user-satisfaction",
"description": "User reviews findings before report generation",
"metric": "user_approval",
"target": "true",
"weight": 0.25,
"met": false
},
{
"id": "report-completeness",
"description": "Final report answers the original research questions",
"metric": "question_coverage",
"target": "90%",
"weight": 0.25,
"met": false
}
],
"constraints": [
{
"id": "no-hallucination",
"description": "Only include information found in fetched sources",
"constraint_type": "quality",
"category": "accuracy",
"check": ""
},
{
"id": "source-attribution",
"description": "Every claim must cite its source with a numbered reference",
"constraint_type": "quality",
"category": "accuracy",
"check": ""
},
{
"id": "user-checkpoint",
"description": "Present findings to the user before writing the final report",
"constraint_type": "functional",
"category": "interaction",
"check": ""
}
],
"context": {},
"required_capabilities": [],
"input_schema": {},
"output_schema": {},
"version": "1.0.0",
"parent_version": null,
"evolution_reason": null,
"created_at": "2026-02-06 00:00:00.000000",
"updated_at": "2026-02-06 00:00:00.000000"
},
"required_tools": [
"list_data_files",
"load_data",
"save_data",
"serve_file_to_user",
"web_scrape",
"web_search"
],
"metadata": {
"created_at": "2026-02-06T00:00:00.000000",
"node_count": 4,
"edge_count": 4
}
}
+17 -19
@@ -102,23 +102,23 @@ edges = [
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
# review -> research (feedback loop)
# review -> research (feedback loop, checked first)
EdgeSpec(
id="review-to-research-feedback",
source="review",
target="research",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == True",
priority=1,
condition_expr="str(needs_more_research).lower() == 'true'",
priority=2,
),
# review -> report (user satisfied)
# review -> report (complementary condition — proceed to report when no more research needed)
EdgeSpec(
id="review-to-report",
source="review",
target="report",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == False",
priority=2,
condition_expr="str(needs_more_research).lower() != 'true'",
priority=1,
),
]
@@ -173,11 +173,11 @@ class DeepResearchAgent:
},
)
def _setup(self, mock_mode=False) -> GraphExecutor:
def _setup(self) -> GraphExecutor:
"""Set up the executor with all components."""
from pathlib import Path
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
self._event_bus = EventBus()
@@ -187,13 +187,11 @@ class DeepResearchAgent:
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock_mode:
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
@@ -213,10 +211,10 @@ class DeepResearchAgent:
return self._executor
async def start(self, mock_mode=False) -> None:
async def start(self) -> None:
"""Set up the agent (initialize executor and tools)."""
if self._executor is None:
self._setup(mock_mode=mock_mode)
self._setup()
async def stop(self) -> None:
"""Clean up resources."""
@@ -244,10 +242,10 @@ class DeepResearchAgent:
)
async def run(
self, context: dict, mock_mode=False, session_state=None
self, context: dict, session_state=None
) -> ExecutionResult:
"""Run the agent (convenience method for single execution)."""
await self.start(mock_mode=mock_mode)
await self.start()
try:
result = await self.trigger_and_wait(
"start", context, session_state=session_state
@@ -1,33 +1,8 @@
"""Runtime configuration."""
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
from dataclasses import dataclass
from framework.config import RuntimeConfig
default_config = RuntimeConfig()
@@ -1,8 +1,8 @@
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, and write_to_file"
}
@@ -23,6 +23,8 @@ Have a brief conversation to clarify what they need.
Keep it short. Don't over-ask.
After your message, call ask_user() to wait for the user's response.
**STEP 2 — After the user confirms, call set_output:**
- set_output("research_brief", "A clear paragraph describing exactly what to research, \
what questions to answer, what scope to cover, and how deep to go.")
@@ -93,6 +95,8 @@ Present the research findings to the user clearly and concisely.
End by asking: Are they satisfied, or do they want deeper research? \
Should we proceed to writing the final report?
After your presentation, call ask_user() to wait for the user's response.
**STEP 2 — After the user responds, call set_output:**
- set_output("needs_more_research", "true") — if they want more
- set_output("needs_more_research", "false") — if they're satisfied
@@ -147,8 +151,11 @@ Tell the user the report is ready and include the file:// URI from
serve_file_to_user so they can click it to open. Give a brief summary
of what the report covers. Ask if they have questions.
After presenting the link, call ask_user() to wait for the user's response.
**STEP 3 — After the user responds:**
- Answer follow-up questions from the research material
- Call ask_user() again if they might have more questions
- When the user is satisfied: set_output("delivery_status", "completed")
""",
tools=["save_data", "serve_file_to_user", "load_data", "list_data_files"],
@@ -33,18 +33,17 @@ def cli():
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--quiet", "-q", is_flag=True, help="Only output result JSON")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def run(mock, quiet, verbose, debug):
def run(quiet, verbose, debug):
"""Execute the news reporter agent."""
if not quiet:
setup_logging(verbose=verbose, debug=debug)
context = {}
result = asyncio.run(default_agent.run(context, mock_mode=mock))
result = asyncio.run(default_agent.run(context))
output_data = {
"success": result.success,
@@ -59,10 +58,9 @@ def run(mock, quiet, verbose, debug):
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def tui(mock, verbose, debug):
def tui(verbose, debug):
"""Launch the TUI dashboard for interactive news reporting."""
setup_logging(verbose=verbose, debug=debug)
@@ -88,20 +86,18 @@ def tui(mock, verbose, debug):
agent._event_bus = EventBus()
agent._tool_registry = ToolRegistry()
storage_path = Path.home() / ".hive" / "tech_news_reporter"
storage_path = Path.home() / ".hive" / "agents" / "tech_news_reporter"
storage_path.mkdir(parents=True, exist_ok=True)
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock:
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
tools = list(agent._tool_registry.get_tools().values())
tool_executor = agent._tool_registry.get_executor()
+10 -12
@@ -157,7 +157,7 @@ class TechNewsReporterAgent:
},
)
def _setup(self, mock_mode=False) -> GraphExecutor:
def _setup(self) -> GraphExecutor:
"""Set up the executor with all components."""
from pathlib import Path
@@ -171,13 +171,11 @@ class TechNewsReporterAgent:
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock_mode:
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
@@ -197,10 +195,10 @@ class TechNewsReporterAgent:
return self._executor
async def start(self, mock_mode=False) -> None:
async def start(self) -> None:
"""Set up the agent (initialize executor and tools)."""
if self._executor is None:
self._setup(mock_mode=mock_mode)
self._setup()
async def stop(self) -> None:
"""Clean up resources."""
@@ -228,10 +226,10 @@ class TechNewsReporterAgent:
)
async def run(
self, context: dict, mock_mode=False, session_state=None
self, context: dict, session_state=None
) -> ExecutionResult:
"""Run the agent (convenience method for single execution)."""
await self.start(mock_mode=mock_mode)
await self.start()
try:
result = await self.trigger_and_wait(
"start", context, session_state=session_state
@@ -1,33 +1,8 @@
"""Runtime configuration."""
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
from dataclasses import dataclass
from framework.config import RuntimeConfig
default_config = RuntimeConfig()
@@ -1,8 +1,8 @@
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, save_data, and serve_file_to_user"
}
@@ -18,8 +18,8 @@ PYTHONPATH=core:exports uv run python -m twitter_outreach validate
# Show agent info
PYTHONPATH=core:exports uv run python -m twitter_outreach info
# Run in mock mode (no API calls)
PYTHONPATH=core:exports uv run python -m twitter_outreach run --mock
# Run the workflow
PYTHONPATH=core:exports uv run python -m twitter_outreach run
# Launch the TUI
PYTHONPATH=core:exports uv run python -m twitter_outreach tui
@@ -33,16 +33,15 @@ def cli():
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--quiet", "-q", is_flag=True, help="Only output result JSON")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def run(mock, quiet, verbose, debug):
def run(quiet, verbose, debug):
"""Execute the outreach workflow."""
if not quiet:
setup_logging(verbose=verbose, debug=debug)
result = asyncio.run(default_agent.run({}, mock_mode=mock))
result = asyncio.run(default_agent.run({}))
output_data = {
"success": result.success,
@@ -57,10 +56,9 @@ def run(mock, quiet, verbose, debug):
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def tui(mock, verbose, debug):
def tui(verbose, debug):
"""Launch the TUI dashboard for interactive outreach."""
setup_logging(verbose=verbose, debug=debug)
@@ -93,13 +91,11 @@ def tui(mock, verbose, debug):
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock:
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
tools = list(agent._tool_registry.get_tools().values())
tool_executor = agent._tool_registry.get_executor()
+10 -12
@@ -172,7 +172,7 @@ class TwitterOutreachAgent:
},
)
def _setup(self, mock_mode=False) -> GraphExecutor:
def _setup(self) -> GraphExecutor:
"""Set up the executor with all components."""
from pathlib import Path
@@ -186,13 +186,11 @@ class TwitterOutreachAgent:
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock_mode:
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
@@ -212,10 +210,10 @@ class TwitterOutreachAgent:
return self._executor
async def start(self, mock_mode=False) -> None:
async def start(self) -> None:
"""Set up the agent (initialize executor and tools)."""
if self._executor is None:
self._setup(mock_mode=mock_mode)
self._setup()
async def stop(self) -> None:
"""Clean up resources."""
@@ -243,10 +241,10 @@ class TwitterOutreachAgent:
)
async def run(
self, context: dict, mock_mode=False, session_state=None
self, context: dict, session_state=None
) -> ExecutionResult:
"""Run the agent (convenience method for single execution)."""
await self.start(mock_mode=mock_mode)
await self.start()
try:
result = await self.trigger_and_wait(
"start", context, session_state=session_state
+2 -27
@@ -1,33 +1,8 @@
"""Runtime configuration."""
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
from dataclasses import dataclass
from framework.config import RuntimeConfig
default_config = RuntimeConfig()
@@ -1,8 +1,8 @@
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, and send_email"
}
@@ -1,92 +0,0 @@
# Issue: Remove LLM Dependency from Agent Builder MCP Server
## Summary
The `agent_builder_server.py` MCP server has a hardcoded dependency on `AnthropicProvider` for test generation, which:
1. Breaks when users don't have an Anthropic API key
2. Is redundant since the calling agent (Claude) can write tests directly
3. Violates the principle that MCP servers should be provider-agnostic utilities
## Affected Code
**File:** `core/framework/mcp/agent_builder_server.py`
**Lines:** 2350-2351, 2413-2414
```python
# Line 2350-2351 (generate_constraint_tests)
from framework.llm import AnthropicProvider
llm = AnthropicProvider()
# Line 2413-2414 (generate_success_tests)
from framework.llm import AnthropicProvider
llm = AnthropicProvider()
```
**Introduced by:** bryan (commit e2945b6c, 2026-01-20)
## Problem
When a user configures their agent to use a non-Anthropic LLM provider (e.g., `LiteLLMProvider` with Cerebras, OpenAI, or other backends), the MCP test generation tools fail with:
```
{"error": "Failed to initialize LLM: Anthropic API key required. Set ANTHROPIC_API_KEY env var or pass api_key."}
```
This happens even though:
- The user has valid credentials for their chosen provider
- The calling Claude agent is fully capable of writing tests
- MCP is an open standard that shouldn't mandate specific LLM providers
## Root Cause
The test generation functions (`generate_constraint_tests`, `generate_success_tests`) embed an LLM call to generate Python test code from goal definitions. This design:
1. **Duplicates capability** - The outer Claude agent already writes code; delegating to an inner LLM is redundant
2. **Creates provider lock-in** - Hardcoding `AnthropicProvider` breaks multi-provider workflows
3. **Adds complexity** - Requires managing credentials in two places (outer agent + MCP server)
## Proposed Solution
**Option A: Remove LLM dependency entirely (Recommended)**
Refactor the MCP server to provide only test execution utilities (sketched below):
- `run_tests` - Execute pytest and return structured results
- `list_tests` - Scan test files in agent directory
- `debug_test` - Re-run single test with verbose output
Test *generation* becomes the responsibility of the calling agent, which:
- Already has LLM capability
- Already knows the goal/constraints
- Can write tests directly using `Write` tool
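A minimal sketch of that execution-only surface, assuming the server keeps a FastMCP-style registration like the other tool servers; `run_tests`, its arguments, and its return shape are illustrative, not the existing API:
```python
# Hypothetical Option A sketch: the MCP server only *executes* tests.
import subprocess

from fastmcp import FastMCP

mcp = FastMCP("agent-builder")

@mcp.tool()
def run_tests(agent_path: str) -> dict:
    """Run pytest for an agent directory and return a structured summary."""
    proc = subprocess.run(
        ["pytest", agent_path, "-q", "--tb=short"],
        capture_output=True,
        text=True,
    )
    return {
        "passed": proc.returncode == 0,
        "exit_code": proc.returncode,
        "output": proc.stdout[-4000:],  # tail only, to keep the tool payload small
    }
```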
**Option B: Make LLM provider configurable**
If LLM-based generation must stay in the MCP server:
```python
# Accept model parameter, use LiteLLM for provider-agnostic support
from framework.llm.litellm import LiteLLMProvider
def generate_constraint_tests(goal_id, goal_json, agent_path, model="gpt-4o-mini"):
llm = LiteLLMProvider(model=model)
# ...
```
## Impact
- Users with non-Anthropic setups cannot use `generate_constraint_tests` or `generate_success_tests`
- Workaround: Write tests manually (as done in this session)
- Skills documentation (`testing-agent`) mandates the MCP tools, but they don't work universally
## Recommendation
Implement **Option A**. The MCP server should be a thin utility layer for test execution, not a code generator. This:
- Eliminates provider dependency
- Simplifies the codebase
- Aligns with MCP's role as a protocol, not an LLM wrapper
## Related Files
- `core/framework/mcp/agent_builder_server.py` - Main file to modify
- `.claude/skills/hive-test/SKILL.md` - Update documentation if tools change
- `core/framework/testing/` - Test generation utilities that could be removed
-7
@@ -1,7 +0,0 @@
{
"extraPaths": ["core", "tools/src"],
"pythonVersion": "3.11",
"typeCheckingMode": "basic",
"include": ["core", "tools/src", "exports"],
"exclude": ["**/node_modules", "**/__pycache__", "**/.*"]
}
+306 -54
@@ -303,9 +303,9 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then
)
declare -A DEFAULT_MODELS=(
["anthropic"]="claude-sonnet-4-5-20250929"
["openai"]="gpt-4o"
["gemini"]="gemini-3.0-flash-preview"
["anthropic"]="claude-opus-4-6"
["openai"]="gpt-5.2"
["gemini"]="gemini-3-flash-preview"
["groq"]="moonshotai/kimi-k2-instruct-0905"
["cerebras"]="zai-glm-4.7"
["mistral"]="mistral-large-latest"
@@ -313,6 +313,65 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then
["deepseek"]="deepseek-chat"
)
# Model choices per provider: composite-key associative arrays
# Keys: "provider:index" -> value
declare -A MODEL_CHOICES_ID=(
["anthropic:0"]="claude-opus-4-6"
["anthropic:1"]="claude-sonnet-4-5-20250929"
["anthropic:2"]="claude-sonnet-4-20250514"
["anthropic:3"]="claude-haiku-4-5-20251001"
["openai:0"]="gpt-5.2"
["openai:1"]="gpt-5-mini"
["openai:2"]="gpt-5-nano"
["gemini:0"]="gemini-3-flash-preview"
["gemini:1"]="gemini-3-pro-preview"
["groq:0"]="moonshotai/kimi-k2-instruct-0905"
["groq:1"]="openai/gpt-oss-120b"
["cerebras:0"]="zai-glm-4.7"
["cerebras:1"]="qwen3-235b-a22b-instruct-2507"
)
declare -A MODEL_CHOICES_LABEL=(
["anthropic:0"]="Opus 4.6 - Most capable (recommended)"
["anthropic:1"]="Sonnet 4.5 - Best balance"
["anthropic:2"]="Sonnet 4 - Fast + capable"
["anthropic:3"]="Haiku 4.5 - Fast + cheap"
["openai:0"]="GPT-5.2 - Most capable (recommended)"
["openai:1"]="GPT-5 Mini - Fast + cheap"
["openai:2"]="GPT-5 Nano - Fastest"
["gemini:0"]="Gemini 3 Flash - Fast (recommended)"
["gemini:1"]="Gemini 3 Pro - Best quality"
["groq:0"]="Kimi K2 - Best quality (recommended)"
["groq:1"]="GPT-OSS 120B - Fast reasoning"
["cerebras:0"]="ZAI-GLM 4.7 - Best quality (recommended)"
["cerebras:1"]="Qwen3 235B - Frontier reasoning"
)
# NOTE: 8192 should match DEFAULT_MAX_TOKENS in core/framework/graph/edge.py
declare -A MODEL_CHOICES_MAXTOKENS=(
["anthropic:0"]=8192
["anthropic:1"]=8192
["anthropic:2"]=8192
["anthropic:3"]=8192
["openai:0"]=16384
["openai:1"]=16384
["openai:2"]=16384
["gemini:0"]=8192
["gemini:1"]=8192
["groq:0"]=8192
["groq:1"]=8192
["cerebras:0"]=8192
["cerebras:1"]=8192
)
declare -A MODEL_CHOICES_COUNT=(
["anthropic"]=4
["openai"]=3
["gemini"]=2
["groq"]=2
["cerebras"]=2
)
# Helper functions for Bash 4+
get_provider_name() {
echo "${PROVIDER_NAMES[$1]}"
@@ -325,6 +384,22 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then
get_default_model() {
echo "${DEFAULT_MODELS[$1]}"
}
get_model_choice_count() {
echo "${MODEL_CHOICES_COUNT[$1]:-0}"
}
get_model_choice_id() {
echo "${MODEL_CHOICES_ID[$1:$2]}"
}
get_model_choice_label() {
echo "${MODEL_CHOICES_LABEL[$1:$2]}"
}
get_model_choice_maxtokens() {
echo "${MODEL_CHOICES_MAXTOKENS[$1:$2]}"
}
else
# Bash 3.2 - use parallel indexed arrays
PROVIDER_ENV_VARS=(ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY GOOGLE_API_KEY GROQ_API_KEY CEREBRAS_API_KEY MISTRAL_API_KEY TOGETHER_API_KEY DEEPSEEK_API_KEY)
@@ -333,7 +408,7 @@ else
# Default models by provider id (parallel arrays)
MODEL_PROVIDER_IDS=(anthropic openai gemini groq cerebras mistral together_ai deepseek)
MODEL_DEFAULTS=("claude-sonnet-4-5-20250929" "gpt-4o" "gemini-3.0-flash-preview" "moonshotai/kimi-k2-instruct-0905" "zai-glm-4.7" "mistral-large-latest" "meta-llama/Llama-3.3-70B-Instruct-Turbo" "deepseek-chat")
MODEL_DEFAULTS=("claude-opus-4-6" "gpt-5.2" "gemini-3-flash-preview" "moonshotai/kimi-k2-instruct-0905" "zai-glm-4.7" "mistral-large-latest" "meta-llama/Llama-3.3-70B-Instruct-Turbo" "deepseek-chat")
# Helper: get provider display name for an env var
get_provider_name() {
@@ -373,18 +448,188 @@ else
i=$((i + 1))
done
}
# Model choices per provider - flat parallel arrays with provider offsets
# Provider order: anthropic(4), openai(3), gemini(2), groq(2), cerebras(2)
MC_PROVIDERS=(anthropic anthropic anthropic anthropic openai openai openai gemini gemini groq groq cerebras cerebras)
MC_IDS=("claude-opus-4-6" "claude-sonnet-4-5-20250929" "claude-sonnet-4-20250514" "claude-haiku-4-5-20251001" "gpt-5.2" "gpt-5-mini" "gpt-5-nano" "gemini-3-flash-preview" "gemini-3-pro-preview" "moonshotai/kimi-k2-instruct-0905" "openai/gpt-oss-120b" "zai-glm-4.7" "qwen3-235b-a22b-instruct-2507")
MC_LABELS=("Opus 4.6 - Most capable (recommended)" "Sonnet 4.5 - Best balance" "Sonnet 4 - Fast + capable" "Haiku 4.5 - Fast + cheap" "GPT-5.2 - Most capable (recommended)" "GPT-5 Mini - Fast + cheap" "GPT-5 Nano - Fastest" "Gemini 3 Flash - Fast (recommended)" "Gemini 3 Pro - Best quality" "Kimi K2 - Best quality (recommended)" "GPT-OSS 120B - Fast reasoning" "ZAI-GLM 4.7 - Best quality (recommended)" "Qwen3 235B - Frontier reasoning")
# NOTE: 8192 should match DEFAULT_MAX_TOKENS in core/framework/graph/edge.py
MC_MAXTOKENS=(8192 8192 8192 8192 16384 16384 16384 8192 8192 8192 8192 8192 8192)
# Helper: get number of model choices for a provider
get_model_choice_count() {
local provider_id="$1"
local count=0
local i=0
while [ $i -lt ${#MC_PROVIDERS[@]} ]; do
if [ "${MC_PROVIDERS[$i]}" = "$provider_id" ]; then
count=$((count + 1))
fi
i=$((i + 1))
done
echo "$count"
}
# Helper: get model choice id by provider and index (0-based within provider)
get_model_choice_id() {
local provider_id="$1"
local idx="$2"
local count=0
local i=0
while [ $i -lt ${#MC_PROVIDERS[@]} ]; do
if [ "${MC_PROVIDERS[$i]}" = "$provider_id" ]; then
if [ $count -eq "$idx" ]; then
echo "${MC_IDS[$i]}"
return
fi
count=$((count + 1))
fi
i=$((i + 1))
done
}
# Helper: get model choice label by provider and index
get_model_choice_label() {
local provider_id="$1"
local idx="$2"
local count=0
local i=0
while [ $i -lt ${#MC_PROVIDERS[@]} ]; do
if [ "${MC_PROVIDERS[$i]}" = "$provider_id" ]; then
if [ $count -eq "$idx" ]; then
echo "${MC_LABELS[$i]}"
return
fi
count=$((count + 1))
fi
i=$((i + 1))
done
}
# Helper: get model choice max_tokens by provider and index
get_model_choice_maxtokens() {
local provider_id="$1"
local idx="$2"
local count=0
local i=0
while [ $i -lt ${#MC_PROVIDERS[@]} ]; do
if [ "${MC_PROVIDERS[$i]}" = "$provider_id" ]; then
if [ $count -eq "$idx" ]; then
echo "${MC_MAXTOKENS[$i]}"
return
fi
count=$((count + 1))
fi
i=$((i + 1))
done
}
fi
# Configuration directory
HIVE_CONFIG_DIR="$HOME/.hive"
HIVE_CONFIG_FILE="$HIVE_CONFIG_DIR/configuration.json"
# Detect user's shell rc file
detect_shell_rc() {
local shell_name
shell_name=$(basename "$SHELL")
case "$shell_name" in
zsh)
if [ -f "$HOME/.zshrc" ]; then
echo "$HOME/.zshrc"
else
echo "$HOME/.zshenv"
fi
;;
bash)
if [ -f "$HOME/.bashrc" ]; then
echo "$HOME/.bashrc"
elif [ -f "$HOME/.bash_profile" ]; then
echo "$HOME/.bash_profile"
else
echo "$HOME/.profile"
fi
;;
*)
# Fallback to .profile for other shells
echo "$HOME/.profile"
;;
esac
}
SHELL_RC_FILE=$(detect_shell_rc)
SHELL_NAME=$(basename "$SHELL")
# Prompt the user to choose a model for their selected provider.
# Sets SELECTED_MODEL and SELECTED_MAX_TOKENS.
prompt_model_selection() {
local provider_id="$1"
local count
count="$(get_model_choice_count "$provider_id")"
if [ "$count" -eq 0 ]; then
# No curated choices for this provider (e.g. Mistral, DeepSeek)
SELECTED_MODEL="$(get_default_model "$provider_id")"
SELECTED_MAX_TOKENS=8192
return
fi
if [ "$count" -eq 1 ]; then
# Only one choice — auto-select
SELECTED_MODEL="$(get_model_choice_id "$provider_id" 0)"
SELECTED_MAX_TOKENS="$(get_model_choice_maxtokens "$provider_id" 0)"
return
fi
# Multiple choices — show menu
echo ""
echo -e "${BOLD}Select a model:${NC}"
echo ""
local i=0
while [ $i -lt "$count" ]; do
local label
label="$(get_model_choice_label "$provider_id" "$i")"
local mid
mid="$(get_model_choice_id "$provider_id" "$i")"
local num=$((i + 1))
echo -e " ${CYAN}$num)${NC} $label ${DIM}($mid)${NC}"
i=$((i + 1))
done
echo ""
local choice
while true; do
read -r -p "Enter choice [1]: " choice
choice="${choice:-1}"
if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -ge 1 ] && [ "$choice" -le "$count" ]; then
local idx=$((choice - 1))
SELECTED_MODEL="$(get_model_choice_id "$provider_id" "$idx")"
SELECTED_MAX_TOKENS="$(get_model_choice_maxtokens "$provider_id" "$idx")"
echo ""
echo -e "${GREEN}${NC} Model: ${DIM}$SELECTED_MODEL${NC}"
return
fi
echo -e "${RED}Invalid choice. Please enter 1-$count${NC}"
done
}
# Function to save configuration
save_configuration() {
local provider_id="$1"
local env_var="$2"
local model
model="$(get_default_model "$provider_id")"
local model="$3"
local max_tokens="$4"
# Fallbacks if not provided
if [ -z "$model" ]; then
model="$(get_default_model "$provider_id")"
fi
if [ -z "$max_tokens" ]; then
max_tokens=8192
fi
mkdir -p "$HIVE_CONFIG_DIR"
@@ -394,6 +639,7 @@ config = {
'llm': {
'provider': '$provider_id',
'model': '$model',
'max_tokens': $max_tokens,
'api_key_env_var': '$env_var'
},
'created_at': '$(date -u +"%Y-%m-%dT%H:%M:%S+00:00")'
@@ -404,18 +650,11 @@ print(json.dumps(config, indent=2))
" 2>/dev/null
}
# Check for .env files (temporarily disable set -e for robustness on Bash 3.2)
# Source shell rc file to pick up existing env vars (temporarily disable set -e)
set +e
if [ -f "$SCRIPT_DIR/.env" ]; then
set -a
source "$SCRIPT_DIR/.env" 2>/dev/null
set +a
fi
if [ -f "$HOME/.env" ]; then
set -a
source "$HOME/.env" 2>/dev/null
set +a
if [ -f "$SHELL_RC_FILE" ]; then
# Extract only export statements to avoid running shell config commands
eval "$(grep -E '^export [A-Z_]+=' "$SHELL_RC_FILE" 2>/dev/null)"
fi
set -e
@@ -424,6 +663,8 @@ FOUND_PROVIDERS=() # Display names for UI
FOUND_ENV_VARS=() # Corresponding env var names
SELECTED_PROVIDER_ID="" # Will hold the chosen provider ID
SELECTED_ENV_VAR="" # Will hold the chosen env var
SELECTED_MODEL="" # Will hold the chosen model ID
SELECTED_MAX_TOKENS=8192 # Will hold the chosen max_tokens
if [ "$USE_ASSOC_ARRAYS" = true ]; then
# Bash 4+ - iterate over associative array keys
@@ -461,6 +702,8 @@ if [ ${#FOUND_PROVIDERS[@]} -gt 0 ]; then
echo ""
echo -e "${GREEN}${NC} Using ${FOUND_PROVIDERS[0]}"
prompt_model_selection "$SELECTED_PROVIDER_ID"
fi
else
# Multiple providers found, let user pick one
@@ -473,28 +716,34 @@ if [ ${#FOUND_PROVIDERS[@]} -gt 0 ]; then
echo -e " ${CYAN}$i)${NC} $provider"
i=$((i + 1))
done
echo -e " ${CYAN}$i)${NC} Other"
max_choice=$i
echo ""
while true; do
read -r -p "Enter choice (1-${#FOUND_PROVIDERS[@]}): " choice
if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -ge 1 ] && [ "$choice" -le "${#FOUND_PROVIDERS[@]}" ]; then
read -r -p "Enter choice (1-$max_choice): " choice
if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -ge 1 ] && [ "$choice" -le "$max_choice" ]; then
if [ "$choice" -eq "$max_choice" ]; then
# Fall through to the manual provider selection below
break
fi
idx=$((choice - 1))
SELECTED_ENV_VAR="${FOUND_ENV_VARS[$idx]}"
SELECTED_PROVIDER_ID="$(get_provider_id "$SELECTED_ENV_VAR")"
echo ""
echo -e "${GREEN}${NC} Selected: ${FOUND_PROVIDERS[$idx]}"
prompt_model_selection "$SELECTED_PROVIDER_ID"
break
fi
echo -e "${RED}Invalid choice. Please enter 1-${#FOUND_PROVIDERS[@]}${NC}"
echo -e "${RED}Invalid choice. Please enter 1-$max_choice${NC}"
done
fi
fi
if [ -z "$SELECTED_PROVIDER_ID" ]; then
echo "No API keys found. Let's configure one."
echo ""
prompt_choice "Select your LLM provider:" \
"Anthropic (Claude) - Recommended" \
"OpenAI (GPT)" \
@@ -540,7 +789,7 @@ if [ -z "$SELECTED_PROVIDER_ID" ]; then
echo -e "${YELLOW}Skipped.${NC} An LLM API key is required to test and use worker agents."
echo -e "Add your API key later by running:"
echo ""
echo -e " ${CYAN}echo 'ANTHROPIC_API_KEY=your-key' >> .env${NC}"
echo -e " ${CYAN}echo 'export ANTHROPIC_API_KEY=\"your-key\"' >> $SHELL_RC_FILE${NC}"
echo ""
SELECTED_ENV_VAR=""
SELECTED_PROVIDER_ID=""
@@ -554,26 +803,32 @@ if [ -z "$SELECTED_PROVIDER_ID" ]; then
read -r -p "Paste your $PROVIDER_NAME API key (or press Enter to skip): " API_KEY
if [ -n "$API_KEY" ]; then
# Save to .env
echo "" >> "$SCRIPT_DIR/.env"
echo "$SELECTED_ENV_VAR=$API_KEY" >> "$SCRIPT_DIR/.env"
# Save to shell rc file
echo "" >> "$SHELL_RC_FILE"
echo "# Hive Agent Framework - $PROVIDER_NAME API key" >> "$SHELL_RC_FILE"
echo "export $SELECTED_ENV_VAR=\"$API_KEY\"" >> "$SHELL_RC_FILE"
export "$SELECTED_ENV_VAR=$API_KEY"
echo ""
echo -e "${GREEN}${NC} API key saved to .env"
echo -e "${GREEN}${NC} API key saved to $SHELL_RC_FILE"
else
echo ""
echo -e "${YELLOW}Skipped.${NC} Add your API key to .env when ready."
echo -e "${YELLOW}Skipped.${NC} Add your API key to $SHELL_RC_FILE when ready."
SELECTED_ENV_VAR=""
SELECTED_PROVIDER_ID=""
fi
fi
fi
# Prompt for model if not already selected (manual provider path)
if [ -n "$SELECTED_PROVIDER_ID" ] && [ -z "$SELECTED_MODEL" ]; then
prompt_model_selection "$SELECTED_PROVIDER_ID"
fi
# Save configuration if a provider was selected
if [ -n "$SELECTED_PROVIDER_ID" ]; then
echo ""
echo -n " Saving configuration... "
save_configuration "$SELECTED_PROVIDER_ID" "$SELECTED_ENV_VAR" > /dev/null
save_configuration "$SELECTED_PROVIDER_ID" "$SELECTED_ENV_VAR" "$SELECTED_MODEL" "$SELECTED_MAX_TOKENS" > /dev/null
echo -e "${GREEN}${NC}"
echo -e " ${DIM}~/.hive/configuration.json${NC}"
fi
@@ -591,7 +846,7 @@ echo ""
HIVE_CRED_DIR="$HOME/.hive/credentials"
# Check if HIVE_CREDENTIAL_KEY already exists (from env or .env)
# Check if HIVE_CREDENTIAL_KEY already exists (from env or shell rc)
if [ -n "$HIVE_CREDENTIAL_KEY" ]; then
echo -e "${GREEN} ✓ HIVE_CREDENTIAL_KEY already set${NC}"
else
@@ -606,16 +861,13 @@ else
else
echo -e "${GREEN}ok${NC}"
# Save to .env file
if [ ! -f "$SCRIPT_DIR/.env" ]; then
touch "$SCRIPT_DIR/.env"
fi
echo "" >> "$SCRIPT_DIR/.env"
echo "# Encryption key for Hive credential store (~/.hive/credentials)" >> "$SCRIPT_DIR/.env"
echo "HIVE_CREDENTIAL_KEY=$GENERATED_KEY" >> "$SCRIPT_DIR/.env"
# Save to shell rc file
echo "" >> "$SHELL_RC_FILE"
echo "# Encryption key for Hive credential store (~/.hive/credentials)" >> "$SHELL_RC_FILE"
echo "export HIVE_CREDENTIAL_KEY=\"$GENERATED_KEY\"" >> "$SHELL_RC_FILE"
export HIVE_CREDENTIAL_KEY="$GENERATED_KEY"
echo -e "${GREEN} ✓ Encryption key saved to .env${NC}"
echo -e "${GREEN} ✓ Encryption key saved to $SHELL_RC_FILE${NC}"
fi
fi
@@ -758,7 +1010,9 @@ echo ""
# Show configured provider
if [ -n "$SELECTED_PROVIDER_ID" ]; then
SELECTED_MODEL="$(get_default_model "$SELECTED_PROVIDER_ID")"
if [ -z "$SELECTED_MODEL" ]; then
SELECTED_MODEL="$(get_default_model "$SELECTED_PROVIDER_ID")"
fi
echo -e "${BOLD}Default LLM:${NC}"
echo -e " ${CYAN}$SELECTED_PROVIDER_ID${NC}${DIM}$SELECTED_MODEL${NC}"
echo ""
@@ -772,11 +1026,6 @@ if [ -n "$HIVE_CREDENTIAL_KEY" ]; then
echo ""
fi
echo -e "${BOLD}Run an Agent:${NC}"
echo ""
echo -e " Launch the interactive dashboard to browse and run agents:"
echo -e " ${CYAN}hive tui${NC}"
echo ""
echo -e "${BOLD}Build a New Agent:${NC}"
echo ""
echo -e " 1. Open Claude Code in this directory:"
@@ -788,15 +1037,18 @@ echo ""
echo -e " 3. Test an existing agent:"
echo -e " ${CYAN}/hive-test${NC}"
echo ""
echo -e "${BOLD}Skills:${NC}"
if [ -d "$SCRIPT_DIR/.claude/skills" ]; then
for skill_dir in "$SCRIPT_DIR/.claude/skills"/*/; do
skill_name=$(basename "$skill_dir")
echo -e " ${CYAN}/$skill_name${NC}"
done
echo -e "${BOLD}Run an Agent:${NC}"
echo ""
echo -e " Launch the interactive dashboard to browse and run agents:"
echo -e " You can start a example agent or an agent built by yourself:"
echo -e " ${CYAN}hive tui${NC}"
echo ""
# Show shell sourcing reminder if we added environment variables
if [ -n "$SELECTED_PROVIDER_ID" ] || [ -n "$HIVE_CREDENTIAL_KEY" ]; then
echo -e "${BOLD}Note:${NC} To use the new environment variables in this shell, run:"
echo -e " ${CYAN}source $SHELL_RC_FILE${NC}"
echo ""
fi
echo ""
echo -e "${BOLD}Examples:${NC} ${CYAN}exports/${NC}"
echo ""
echo -e "${DIM}Run ./quickstart.sh again to reconfigure.${NC}"
echo ""
echo ""
@@ -36,6 +36,7 @@ Credential categories:
- llm.py: LLM provider credentials (anthropic, openai, etc.)
- search.py: Search tool credentials (brave_search, google_search, etc.)
- email.py: Email provider credentials (resend, google/gmail)
- apollo.py: Apollo.io API credentials
- github.py: GitHub API credentials
- hubspot.py: HubSpot CRM credentials
- slack.py: Slack workspace credentials
@@ -49,6 +50,7 @@ To add a new credential:
3. If new category, import and merge it in this __init__.py
"""
from .apollo import APOLLO_CREDENTIALS
from .base import CredentialError, CredentialSpec
from .browser import get_aden_auth_url, get_aden_setup_url, open_browser
from .email import EMAIL_CREDENTIALS
@@ -71,6 +73,7 @@ CREDENTIAL_SPECS = {
**LLM_CREDENTIALS,
**SEARCH_CREDENTIALS,
**EMAIL_CREDENTIALS,
**APOLLO_CREDENTIALS,
**GITHUB_CREDENTIALS,
**HUBSPOT_CREDENTIALS,
**SLACK_CREDENTIALS,
@@ -104,4 +107,5 @@ __all__ = [
"GITHUB_CREDENTIALS",
"HUBSPOT_CREDENTIALS",
"SLACK_CREDENTIALS",
"APOLLO_CREDENTIALS",
]
@@ -0,0 +1,43 @@
"""
Apollo.io tool credentials.
Contains credentials for Apollo.io API integration.
"""
from .base import CredentialSpec
APOLLO_CREDENTIALS = {
"apollo": CredentialSpec(
env_var="APOLLO_API_KEY",
tools=[
"apollo_enrich_person",
"apollo_enrich_company",
"apollo_search_people",
"apollo_search_companies",
],
required=True,
startup_required=False,
help_url="https://apolloio.github.io/apollo-api-docs/",
description="Apollo.io API key for contact and company data enrichment",
# Auth method support
aden_supported=False,
direct_api_key_supported=True,
api_key_instructions="""To get an Apollo.io API key:
1. Sign up or log in at https://app.apollo.io/
2. Go to Settings > Integrations > API
3. Click "Connect" to generate your API key
4. Copy the API key
Note: Apollo uses export credits for enrichment:
- Free plan: 10 credits/month
- Basic ($49/user/mo): 1,000 credits/month
- Professional ($79/user/mo): 2,000 credits/month
- Overage: $0.20/credit""",
# Health check configuration
health_check_endpoint="https://api.apollo.io/v1/auth/health",
health_check_method="GET",
# Credential store mapping
credential_id="apollo",
credential_key="api_key",
),
}
@@ -353,7 +353,18 @@ class CredentialStoreAdapter:
cls,
specs: dict[str, CredentialSpec] | None = None,
) -> CredentialStoreAdapter:
"""Create adapter with encrypted storage primary and env var fallback."""
"""Create adapter with encrypted storage primary and env var fallback.
When ADEN_API_KEY is set, builds the store with AdenSyncProvider and
AdenCachedStorage so that OAuth credentials (Google, HubSpot, Slack)
auto-refresh via the Aden server. Non-Aden credentials (brave_search,
anthropic, resend) still resolve from environment variables.
When ADEN_API_KEY is not set, behaves identically to before.
"""
import logging
import os
from framework.credentials import CredentialStore
from framework.credentials.storage import (
CompositeStorage,
@@ -361,6 +372,8 @@ class CredentialStoreAdapter:
EnvVarStorage,
)
log = logging.getLogger(__name__)
if specs is None:
from . import CREDENTIAL_SPECS
@@ -368,17 +381,69 @@ class CredentialStoreAdapter:
env_mapping = {name: spec.env_var for name, spec in specs.items()}
# --- Aden sync branch ---
# Note: we don't use CredentialStore.with_aden_sync() here because it
# only wraps EncryptedFileStorage. We need CompositeStorage (encrypted
# + env var fallback) so non-Aden credentials like brave_search still
# resolve from environment variables.
aden_api_key = os.environ.get("ADEN_API_KEY")
if aden_api_key:
try:
from framework.credentials.aden import (
AdenCachedStorage,
AdenClientConfig,
AdenCredentialClient,
AdenSyncProvider,
)
# Local storage: encrypted primary + env var fallback
encrypted = EncryptedFileStorage()
env = EnvVarStorage(env_mapping)
local_composite = CompositeStorage(primary=encrypted, fallbacks=[env])
# Aden components
client = AdenCredentialClient(
AdenClientConfig(
base_url=os.environ.get("ADEN_API_URL", "https://api.adenhq.com"),
)
)
provider = AdenSyncProvider(client=client)
# AdenCachedStorage wraps composite, giving Aden priority
cached_storage = AdenCachedStorage(
local_storage=local_composite,
aden_provider=provider,
cache_ttl_seconds=300,
)
store = CredentialStore(
storage=cached_storage,
providers=[provider],
auto_refresh=True,
)
# Initial sync: populate local cache from Aden
try:
synced = provider.sync_all(store)
log.info("Aden credential sync complete: %d credentials synced", synced)
except Exception as e:
log.warning("Aden initial sync failed (will retry on access): %s", e)
return cls(store=store, specs=specs)
except Exception as e:
log.warning(
"Aden credential sync unavailable, falling back to default storage: %s", e
)
# --- Default branch (no ADEN_API_KEY or Aden setup failed) ---
try:
encrypted = EncryptedFileStorage()
env = EnvVarStorage(env_mapping)
composite = CompositeStorage(primary=encrypted, fallbacks=[env])
store = CredentialStore(storage=composite)
except Exception as e:
import logging
logging.getLogger(__name__).warning(
"Encrypted credential storage unavailable, falling back to env vars: %s", e
)
log.warning("Encrypted credential storage unavailable, falling back to env vars: %s", e)
store = CredentialStore.with_env_storage(env_mapping)
return cls(store=store, specs=specs)
+6
@@ -21,6 +21,7 @@ if TYPE_CHECKING:
from aden_tools.credentials import CredentialStoreAdapter
# Import register_tools from each tool module
from .apollo_tool import register_tools as register_apollo
from .csv_tool import register_tools as register_csv
from .email_tool import register_tools as register_email
from .example_tool import register_tools as register_example
@@ -76,6 +77,7 @@ def register_all_tools(
# email supports multiple providers (Resend) with auto-detection
register_email(mcp, credentials=credentials)
register_hubspot(mcp, credentials=credentials)
register_apollo(mcp, credentials=credentials)
register_slack(mcp, credentials=credentials)
# Register file system toolkits
@@ -112,6 +114,10 @@ def register_all_tools(
"csv_append",
"csv_info",
"csv_sql",
"apollo_enrich_person",
"apollo_enrich_company",
"apollo_search_people",
"apollo_search_companies",
"github_list_repos",
"github_get_repo",
"github_search_repos",
@@ -0,0 +1,42 @@
# Apollo.io Tool
B2B contact and company data enrichment via the Apollo.io API.
## Tools
| Tool | Description |
|------|-------------|
| `apollo_enrich_person` | Enrich a contact by email, LinkedIn URL, or name+domain |
| `apollo_enrich_company` | Enrich a company by domain |
| `apollo_search_people` | Search contacts with filters (titles, seniorities, locations, etc.) |
| `apollo_search_companies` | Search companies with filters (industries, employee counts, etc.) |
## Authentication
Requires an Apollo.io API key, passed via the `APOLLO_API_KEY` environment variable or the credential store.
**How to get an API key:**
1. Sign up or log in at https://app.apollo.io/
2. Go to Settings > Integrations > API
3. Click "Connect" to generate your API key
4. Copy the API key
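The registered tools resolve the key from the credential store first and only fall back to the environment variable when no store is configured; a condensed sketch of that lookup, mirroring `_get_api_key` in `apollo_tool.py` (`resolve_apollo_key` is an illustrative name):
```python
import os

def resolve_apollo_key(credentials=None) -> str | None:
    # Credential store takes priority when present; otherwise read the env var.
    if credentials is not None:
        return credentials.get("apollo")
    return os.getenv("APOLLO_API_KEY")
```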
## Pricing
| Plan | Price | Export Credits/month |
|------|-------|---------------------|
| Free | $0 | 10 |
| Basic | $49/user/mo | 1,000 |
| Professional | $79/user/mo | 2,000 |
| Overage | $0.20/credit | - |
## Error Handling
Returns error dicts for common failure modes:
- `401` - Invalid API key
- `403` - Insufficient credits or permissions
- `404` - Resource not found
- `422` - Invalid parameters
- `429` - Rate limit exceeded
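Because every tool returns a plain dict, callers can branch on the `error` key and the `match_found` flag; a minimal sketch (`summarize_enrichment` is an illustrative helper, not part of the tool):
```python
def summarize_enrichment(result: dict) -> str:
    # result is the dict returned by apollo_enrich_person / apollo_enrich_company
    if "error" in result:
        return f"Apollo call failed: {result['error']}"
    if not result.get("match_found", True):
        return result["message"]  # graceful "no match" rather than an error
    person = result["person"]
    return f"{person['name']} - {person['title']}"
```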
@@ -0,0 +1,13 @@
"""
Apollo.io Tool - Contact and company data enrichment via Apollo API.
Supports API key authentication for:
- Person enrichment by email or LinkedIn
- Company enrichment by domain
- People search with filters
- Company search with filters
"""
from .apollo_tool import register_tools
__all__ = ["register_tools"]
@@ -0,0 +1,581 @@
"""
Apollo.io Tool - Contact and company data enrichment via Apollo API.
Supports:
- API key authentication (APOLLO_API_KEY)
Use Cases:
- Enrich contacts by email or LinkedIn URL
- Enrich companies by domain
- Search for people by titles, seniorities, locations
- Search for companies by industries, employee counts, technologies
API Reference: https://apolloio.github.io/apollo-api-docs/
"""
from __future__ import annotations
import os
from typing import TYPE_CHECKING, Any
import httpx
from fastmcp import FastMCP
if TYPE_CHECKING:
from aden_tools.credentials import CredentialStoreAdapter
APOLLO_API_BASE = "https://api.apollo.io/api/v1"
class _ApolloClient:
"""Internal client wrapping Apollo.io API calls."""
def __init__(self, api_key: str):
self._api_key = api_key
@property
def _headers(self) -> dict[str, str]:
return {
"Content-Type": "application/json",
"Accept": "application/json",
"Cache-Control": "no-cache",
"X-Api-Key": self._api_key,
}
def _handle_response(self, response: httpx.Response) -> dict[str, Any]:
"""Handle common HTTP error codes."""
if response.status_code == 401:
return {"error": "Invalid Apollo API key"}
if response.status_code == 403:
return {
"error": "Insufficient credits or permissions. Check your Apollo plan.",
"help": "Apollo uses export credits for enrichment. Visit https://app.apollo.io/#/settings/plans",
}
if response.status_code == 404:
return {"error": "Resource not found"}
if response.status_code == 422:
try:
detail = response.json().get("error", response.text)
except Exception:
detail = response.text
return {"error": f"Invalid parameters: {detail}"}
if response.status_code == 429:
return {"error": "Apollo rate limit exceeded. Try again later."}
if response.status_code >= 400:
try:
detail = response.json().get("error", response.text)
except Exception:
detail = response.text
return {"error": f"Apollo API error (HTTP {response.status_code}): {detail}"}
return response.json()
def enrich_person(
self,
email: str | None = None,
linkedin_url: str | None = None,
first_name: str | None = None,
last_name: str | None = None,
name: str | None = None,
domain: str | None = None,
reveal_personal_emails: bool = False,
reveal_phone_number: bool = False,
) -> dict[str, Any]:
"""Enrich a person by email, LinkedIn URL, or name and domain."""
body: dict[str, Any] = {
"reveal_personal_emails": reveal_personal_emails,
"reveal_phone_number": reveal_phone_number,
}
if email:
body["email"] = email
if linkedin_url:
body["linkedin_url"] = linkedin_url
if first_name:
body["first_name"] = first_name
if last_name:
body["last_name"] = last_name
if name:
body["name"] = name
if domain:
body["domain"] = domain
response = httpx.post(
f"{APOLLO_API_BASE}/people/match",
headers=self._headers,
params=body if not email and not linkedin_url else None,
json=body,
timeout=30.0,
)
result = self._handle_response(response)
# Handle "not found" gracefully
if "error" not in result and result.get("person") is None:
return {"match_found": False, "message": "No matching person found"}
if "error" not in result:
person = result.get("person", {})
return {
"match_found": True,
"person": {
"id": person.get("id"),
"first_name": person.get("first_name"),
"last_name": person.get("last_name"),
"name": person.get("name"),
"title": person.get("title"),
"email": person.get("email"),
"email_status": person.get("email_status"),
"phone_numbers": person.get("phone_numbers", []),
"linkedin_url": person.get("linkedin_url"),
"twitter_url": person.get("twitter_url"),
"city": person.get("city"),
"state": person.get("state"),
"country": person.get("country"),
"organization": {
"id": person.get("organization", {}).get("id"),
"name": person.get("organization", {}).get("name"),
"domain": person.get("organization", {}).get("primary_domain"),
"industry": person.get("organization", {}).get("industry"),
"employee_count": person.get("organization", {}).get(
"estimated_num_employees"
),
},
},
}
return result
def enrich_company(self, domain: str) -> dict[str, Any]:
"""Enrich a company by domain."""
body: dict[str, Any] = {
"domain": domain,
}
response = httpx.post(
f"{APOLLO_API_BASE}/organizations/enrich",
headers=self._headers,
json=body,
timeout=30.0,
)
result = self._handle_response(response)
# Handle "not found" gracefully
if "error" not in result and result.get("organization") is None:
return {"match_found": False, "message": "No matching company found"}
if "error" not in result:
org = result.get("organization", {})
return {
"match_found": True,
"organization": {
"id": org.get("id"),
"name": org.get("name"),
"domain": org.get("primary_domain"),
"website_url": org.get("website_url"),
"linkedin_url": org.get("linkedin_url"),
"twitter_url": org.get("twitter_url"),
"facebook_url": org.get("facebook_url"),
"industry": org.get("industry"),
"keywords": org.get("keywords", []),
"employee_count": org.get("estimated_num_employees"),
"employee_count_range": org.get("employee_count_range"),
"annual_revenue": org.get("annual_revenue"),
"annual_revenue_printed": org.get("annual_revenue_printed"),
"total_funding": org.get("total_funding"),
"total_funding_printed": org.get("total_funding_printed"),
"latest_funding_round_date": org.get("latest_funding_round_date"),
"latest_funding_stage": org.get("latest_funding_stage"),
"founded_year": org.get("founded_year"),
"phone": org.get("phone"),
"city": org.get("city"),
"state": org.get("state"),
"country": org.get("country"),
"street_address": org.get("street_address"),
"technologies": org.get("technologies", []),
"short_description": org.get("short_description"),
},
}
return result
def search_people(
self,
titles: list[str] | None = None,
seniorities: list[str] | None = None,
locations: list[str] | None = None,
company_sizes: list[str] | None = None,
industries: list[str] | None = None,
technologies: list[str] | None = None,
limit: int = 10,
) -> dict[str, Any]:
"""Search for people with filters."""
body: dict[str, Any] = {
"per_page": min(limit, 100),
"page": 1,
}
if titles:
body["person_titles"] = titles
if seniorities:
body["person_seniorities"] = seniorities
if locations:
body["person_locations"] = locations
if company_sizes:
body["organization_num_employees_ranges"] = company_sizes
if industries:
body["organization_industry_tag_ids"] = industries
if technologies:
body["currently_using_any_of_technology_uids"] = technologies
response = httpx.post(
f"{APOLLO_API_BASE}/mixed_people/search",
headers=self._headers,
json=body,
timeout=30.0,
)
result = self._handle_response(response)
if "error" not in result:
people = result.get("people", [])
return {
"total": result.get("pagination", {}).get("total_entries", len(people)),
"page": result.get("pagination", {}).get("page", 1),
"per_page": result.get("pagination", {}).get("per_page", limit),
"results": [
{
"id": p.get("id"),
"first_name": p.get("first_name"),
"last_name": p.get("last_name"),
"name": p.get("name"),
"title": p.get("title"),
"email": p.get("email"),
"email_status": p.get("email_status"),
"linkedin_url": p.get("linkedin_url"),
"city": p.get("city"),
"state": p.get("state"),
"country": p.get("country"),
"seniority": p.get("seniority"),
"organization": {
"id": p.get("organization", {}).get("id")
if p.get("organization")
else None,
"name": p.get("organization", {}).get("name")
if p.get("organization")
else None,
"domain": p.get("organization", {}).get("primary_domain")
if p.get("organization")
else None,
},
}
for p in people
],
}
return result
def search_companies(
self,
industries: list[str] | None = None,
employee_counts: list[str] | None = None,
locations: list[str] | None = None,
technologies: list[str] | None = None,
limit: int = 10,
) -> dict[str, Any]:
"""Search for companies with filters."""
body: dict[str, Any] = {
"per_page": min(limit, 100),
"page": 1,
}
if industries:
body["organization_industry_tag_ids"] = industries
if employee_counts:
body["organization_num_employees_ranges"] = employee_counts
if locations:
body["organization_locations"] = locations
if technologies:
body["currently_using_any_of_technology_uids"] = technologies
response = httpx.post(
f"{APOLLO_API_BASE}/mixed_companies/search",
headers=self._headers,
json=body,
timeout=30.0,
)
result = self._handle_response(response)
if "error" not in result:
orgs = result.get("organizations", [])
return {
"total": result.get("pagination", {}).get("total_entries", len(orgs)),
"page": result.get("pagination", {}).get("page", 1),
"per_page": result.get("pagination", {}).get("per_page", limit),
"results": [
{
"id": o.get("id"),
"name": o.get("name"),
"domain": o.get("primary_domain"),
"website_url": o.get("website_url"),
"linkedin_url": o.get("linkedin_url"),
"industry": o.get("industry"),
"employee_count": o.get("estimated_num_employees"),
"employee_count_range": o.get("employee_count_range"),
"annual_revenue_printed": o.get("annual_revenue_printed"),
"city": o.get("city"),
"state": o.get("state"),
"country": o.get("country"),
"short_description": o.get("short_description"),
}
for o in orgs
],
}
return result
def register_tools(
mcp: FastMCP,
credentials: CredentialStoreAdapter | None = None,
) -> None:
"""Register Apollo.io data enrichment tools with the MCP server."""
def _get_api_key() -> str | None:
"""Get Apollo API key from credential manager or environment."""
if credentials is not None:
api_key = credentials.get("apollo")
# Defensive check: ensure we get a string, not a complex object
if api_key is not None and not isinstance(api_key, str):
raise TypeError(
f"Expected string from credentials.get('apollo'), got {type(api_key).__name__}"
)
return api_key
return os.getenv("APOLLO_API_KEY")
def _get_client() -> _ApolloClient | dict[str, str]:
"""Get an Apollo client, or return an error dict if no credentials."""
api_key = _get_api_key()
if not api_key:
return {
"error": "Apollo credentials not configured",
"help": (
"Set APOLLO_API_KEY environment variable "
"or configure via credential store. "
"Get your API key at https://app.apollo.io/#/settings/integrations/api"
),
}
return _ApolloClient(api_key)
# --- Person Enrichment ---
@mcp.tool()
def apollo_enrich_person(
email: str | None = None,
linkedin_url: str | None = None,
first_name: str | None = None,
last_name: str | None = None,
name: str | None = None,
domain: str | None = None,
reveal_personal_emails: bool = False,
reveal_phone_number: bool = False,
) -> dict:
"""
Enrich a person's information by email, LinkedIn URL, or name and domain.
Args:
email: Person's email address
linkedin_url: Person's LinkedIn profile URL
first_name: Person's first name (use with last_name and domain)
last_name: Person's last name (use with first_name and domain)
name: Person's full name (use with domain)
domain: Person's company domain (e.g., "acme.com")
reveal_personal_emails: Whether to reveal personal email addresses (default: False)
reveal_phone_number: Whether to reveal phone numbers (default: False)
Returns:
Dict with person details including:
- Full name, title
- Email and email status
- Phone numbers (if revealed)
- Location (city, state, country)
- LinkedIn/Twitter URLs
- Company info (name, industry, size)
Or error dict if enrichment fails
Example:
apollo_enrich_person(email="john@acme.com")
apollo_enrich_person(name="John Doe", domain="acme.com")
"""
client = _get_client()
if isinstance(client, dict):
return client
# Validate that we have enough info to match
has_email_or_linkedin = bool(email or linkedin_url)
has_name_and_domain = bool((first_name and last_name and domain) or (name and domain))
if not has_email_or_linkedin and not has_name_and_domain:
return {
"error": (
"Invalid search criteria. Provide either (email), (linkedin_url), "
"or (name/first_name+last_name AND domain)."
)
}
try:
return client.enrich_person(
email=email,
linkedin_url=linkedin_url,
first_name=first_name,
last_name=last_name,
name=name,
domain=domain,
reveal_personal_emails=reveal_personal_emails,
reveal_phone_number=reveal_phone_number,
)
except httpx.TimeoutException:
return {"error": "Request timed out"}
except httpx.RequestError as e:
return {"error": f"Network error: {e}"}
# --- Company Enrichment ---
@mcp.tool()
def apollo_enrich_company(domain: str) -> dict:
"""
Enrich a company by domain.
Args:
domain: Company domain (e.g., "acme.com")
Returns:
Dict with company firmographics including:
- name, domain, website URL
- Industry, keywords
- Employee count and range
- Annual revenue, funding info
- Founded year, location
- Technologies used
Or error dict if enrichment fails
Example:
apollo_enrich_company(domain="openai.com")
"""
client = _get_client()
if isinstance(client, dict):
return client
try:
return client.enrich_company(domain)
except httpx.TimeoutException:
return {"error": "Request timed out"}
except httpx.RequestError as e:
return {"error": f"Network error: {e}"}
# --- People Search ---
@mcp.tool()
def apollo_search_people(
titles: list[str] | None = None,
seniorities: list[str] | None = None,
locations: list[str] | None = None,
company_sizes: list[str] | None = None,
industries: list[str] | None = None,
technologies: list[str] | None = None,
limit: int = 10,
) -> dict:
"""
Search for contacts with filters.
Args:
titles: Job titles to search for
(e.g., ["VP Sales", "Director of Marketing"])
seniorities: Seniority levels
(e.g., ["vp", "director", "c_suite", "manager", "senior"])
locations: Geographic locations
(e.g., ["San Francisco, CA", "New York, NY"])
company_sizes: Company employee count ranges
(e.g., ["1-10", "11-50", "51-200", "201-500", "501-1000", "1001-5000"])
industries: Industry tags
(e.g., ["technology", "finance", "healthcare"])
technologies: Technologies used by company
(e.g., ["salesforce", "hubspot", "aws"])
limit: Maximum results (1-100, default 10)
Returns:
Dict with:
- total: Total matching results
- results: List of matching contacts with email and company info
Or error dict if search fails
Example:
apollo_search_people(
titles=["VP Sales", "Head of Sales"],
seniorities=["vp", "director"],
company_sizes=["51-200", "201-500"],
limit=25
)
"""
client = _get_client()
if isinstance(client, dict):
return client
try:
return client.search_people(
titles=titles,
seniorities=seniorities,
locations=locations,
company_sizes=company_sizes,
industries=industries,
technologies=technologies,
limit=limit,
)
except httpx.TimeoutException:
return {"error": "Request timed out"}
except httpx.RequestError as e:
return {"error": f"Network error: {e}"}
# --- Company Search ---
@mcp.tool()
def apollo_search_companies(
industries: list[str] | None = None,
employee_counts: list[str] | None = None,
locations: list[str] | None = None,
technologies: list[str] | None = None,
limit: int = 10,
) -> dict:
"""
Search for companies with filters.
Args:
industries: Industry tags
(e.g., ["technology", "finance", "healthcare"])
employee_counts: Employee count ranges
(e.g., ["1-10", "11-50", "51-200", "201-500", "501-1000"])
locations: Geographic locations
(e.g., ["San Francisco, CA", "United States"])
technologies: Technologies used
(e.g., ["salesforce", "hubspot", "aws", "kubernetes"])
limit: Maximum results (1-100, default 10)
Returns:
Dict with:
- total: Total matching results
- results: List of matching companies with firmographic data
Or error dict if search fails
Example:
apollo_search_companies(
industries=["technology"],
employee_counts=["51-200", "201-500"],
technologies=["kubernetes"],
limit=20
)
"""
client = _get_client()
if isinstance(client, dict):
return client
try:
return client.search_companies(
industries=industries,
employee_counts=employee_counts,
locations=locations,
technologies=technologies,
limit=limit,
)
except httpx.TimeoutException:
return {"error": "Request timed out"}
except httpx.RequestError as e:
return {"error": f"Network error: {e}"}
+129
@@ -1,5 +1,7 @@
"""Tests for CredentialStoreAdapter."""
from unittest.mock import MagicMock, patch
import pytest
from aden_tools.credentials import (
@@ -484,3 +486,130 @@ class TestSpecCompleteness:
assert spec.credential_group == "", (
f"Credential '{name}' has unexpected credential_group='{spec.credential_group}'"
)
class TestCredentialStoreAdapterAdenSync:
"""Tests for Aden sync branch in CredentialStoreAdapter.default()."""
def _patch_encrypted_storage(self, tmp_path):
"""Patch EncryptedFileStorage to use a temp directory."""
from framework.credentials.storage import EncryptedFileStorage
original_init = EncryptedFileStorage.__init__
def patched_init(self_inner, base_path=None, **kwargs):
original_init(self_inner, base_path=str(tmp_path / "creds"), **kwargs)
return patch.object(EncryptedFileStorage, "__init__", patched_init)
def test_default_with_aden_key_creates_aden_store(self, monkeypatch, tmp_path):
"""When ADEN_API_KEY is set, default() wires up AdenSyncProvider."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
mock_client = MagicMock()
mock_client.list_integrations.return_value = []
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
# Verify AdenSyncProvider is registered
provider = adapter.store.get_provider("aden_sync")
assert provider is not None
def test_default_without_aden_key_uses_env_fallback(self, monkeypatch, tmp_path):
"""When ADEN_API_KEY is not set, default() uses env-only storage."""
monkeypatch.delenv("ADEN_API_KEY", raising=False)
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-brave-key")
with self._patch_encrypted_storage(tmp_path):
adapter = CredentialStoreAdapter.default()
# No Aden provider should be registered
assert adapter.store.get_provider("aden_sync") is None
# Env vars still work
assert adapter.get("brave_search") == "test-brave-key"
def test_default_aden_non_aden_cred_falls_through_to_env(self, monkeypatch, tmp_path):
"""Non-Aden credentials (e.g. brave_search) resolve from env vars even with Aden."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-from-env")
mock_client = MagicMock()
mock_client.list_integrations.return_value = []
# Aden returns None for brave_search (404 → None)
mock_client.get_credential.return_value = None
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
assert adapter.get("brave_search") == "brave-from-env"
def test_default_aden_sync_failure_falls_back_gracefully(self, monkeypatch, tmp_path):
"""If Aden initial sync fails, adapter is still created and env vars work."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-fallback")
mock_client = MagicMock()
mock_client.list_integrations.side_effect = Exception("Connection refused")
mock_client.get_credential.return_value = None
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
# Adapter was created despite sync failure
assert adapter is not None
assert adapter.get("brave_search") == "brave-fallback"
def test_default_aden_import_error_falls_back(self, monkeypatch, tmp_path):
"""If Aden imports fail (e.g. missing httpx), fall back to default storage."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-fallback")
import builtins
real_import = builtins.__import__
def mock_import(name, *args, **kwargs):
if name == "framework.credentials.aden":
raise ImportError(f"No module named '{name}'")
return real_import(name, *args, **kwargs)
with (
self._patch_encrypted_storage(tmp_path),
patch.object(builtins, "__import__", side_effect=mock_import),
):
adapter = CredentialStoreAdapter.default()
# Fell back to default — env vars still work, no Aden provider
assert adapter.store.get_provider("aden_sync") is None
assert adapter.get("brave_search") == "brave-fallback"
+675
@@ -0,0 +1,675 @@
"""
Tests for Apollo.io data enrichment tool.
Covers:
- _ApolloClient methods (enrich_person, enrich_company, search_people, search_companies)
- Error handling (401, 403, 404, 422, 429, 500, timeout)
- Credential retrieval (CredentialStoreAdapter vs env var)
- All 4 MCP tool functions
- "Not found" graceful handling
"""
from __future__ import annotations
from unittest.mock import MagicMock, patch
import httpx
import pytest
from aden_tools.tools.apollo_tool.apollo_tool import (
APOLLO_API_BASE,
_ApolloClient,
register_tools,
)
# --- _ApolloClient tests ---
class TestApolloClient:
def setup_method(self):
self.client = _ApolloClient("test-api-key")
def test_headers(self):
headers = self.client._headers
assert headers["Content-Type"] == "application/json"
assert headers["Accept"] == "application/json"
# API key is passed in X-Api-Key header
assert headers["X-Api-Key"] == "test-api-key"
def test_handle_response_success(self):
response = MagicMock()
response.status_code = 200
response.json.return_value = {"person": {"id": "123"}}
assert self.client._handle_response(response) == {"person": {"id": "123"}}
@pytest.mark.parametrize(
"status_code,expected_substring",
[
(401, "Invalid Apollo API key"),
(403, "Insufficient credits"),
(404, "not found"),
(422, "Invalid parameters"),
(429, "rate limit"),
],
)
def test_handle_response_errors(self, status_code, expected_substring):
response = MagicMock()
response.status_code = status_code
response.json.return_value = {"error": "Test error"}
response.text = "Test error"
result = self.client._handle_response(response)
assert "error" in result
assert expected_substring in result["error"]
def test_handle_response_generic_error(self):
response = MagicMock()
response.status_code = 500
response.json.return_value = {"error": "Internal Server Error"}
result = self.client._handle_response(response)
assert "error" in result
assert "500" in result["error"]
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_person_by_email(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"person": {
"id": "p123",
"first_name": "John",
"last_name": "Doe",
"name": "John Doe",
"title": "VP Sales",
"email": "john@acme.com",
"email_status": "verified",
"phone_numbers": [{"sanitized_number": "+1234567890"}],
"linkedin_url": "https://linkedin.com/in/johndoe",
"twitter_url": None,
"city": "San Francisco",
"state": "California",
"country": "United States",
"organization": {
"id": "o456",
"name": "Acme Inc",
"primary_domain": "acme.com",
"industry": "Technology",
"estimated_num_employees": 250,
},
}
}
mock_post.return_value = mock_response
result = self.client.enrich_person(email="john@acme.com")
mock_post.assert_called_once_with(
f"{APOLLO_API_BASE}/people/match",
headers=self.client._headers,
params=None,
json={
"email": "john@acme.com",
"reveal_personal_emails": False,
"reveal_phone_number": False,
},
timeout=30.0,
)
assert result["match_found"] is True
assert result["person"]["first_name"] == "John"
assert result["person"]["title"] == "VP Sales"
assert result["person"]["organization"]["name"] == "Acme Inc"
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_person_by_linkedin(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"person": {
"id": "p456",
"first_name": "Jane",
"last_name": "Smith",
"name": "Jane Smith",
"title": "CTO",
"email": "jane@startup.io",
"linkedin_url": "https://linkedin.com/in/janesmith",
"organization": {},
}
}
mock_post.return_value = mock_response
result = self.client.enrich_person(linkedin_url="https://linkedin.com/in/janesmith")
call_json = mock_post.call_args.kwargs["json"]
assert call_json["linkedin_url"] == "https://linkedin.com/in/janesmith"
assert result["match_found"] is True
assert result["person"]["title"] == "CTO"
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_person_by_name_and_domain(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"person": {"id": "p123"}}
mock_post.return_value = mock_response
self.client.enrich_person(name="John Doe", domain="acme.com")
call_json = mock_post.call_args.kwargs["json"]
assert call_json["name"] == "John Doe"
assert call_json["domain"] == "acme.com"
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_person_with_reveal_flags(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"person": {"id": "p123"}}
mock_post.return_value = mock_response
self.client.enrich_person(
email="john@acme.com",
reveal_personal_emails=True,
reveal_phone_number=True,
)
call_json = mock_post.call_args.kwargs["json"]
assert call_json["reveal_personal_emails"] is True
assert call_json["reveal_phone_number"] is True
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_person_with_optional_params(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"person": {"id": "p789"}}
mock_post.return_value = mock_response
self.client.enrich_person(
email="john@acme.com",
first_name="John",
last_name="Doe",
domain="acme.com",
)
call_json = mock_post.call_args.kwargs["json"]
assert call_json["email"] == "john@acme.com"
assert call_json["first_name"] == "John"
assert call_json["last_name"] == "Doe"
assert call_json["domain"] == "acme.com"
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_person_not_found(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"person": None}
mock_post.return_value = mock_response
result = self.client.enrich_person(email="nobody@nowhere.xyz")
assert result["match_found"] is False
assert "No matching person found" in result["message"]
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_company(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"organization": {
"id": "o123",
"name": "OpenAI",
"primary_domain": "openai.com",
"website_url": "https://openai.com",
"linkedin_url": "https://linkedin.com/company/openai",
"industry": "Artificial Intelligence",
"keywords": ["ai", "machine learning", "gpt"],
"estimated_num_employees": 1500,
"employee_count_range": "1001-5000",
"annual_revenue": 1000000000,
"annual_revenue_printed": "$1B",
"total_funding": 11000000000,
"total_funding_printed": "$11B",
"latest_funding_round_date": "2023-01-23",
"latest_funding_stage": "Series D",
"founded_year": 2015,
"phone": "+1-415-123-4567",
"city": "San Francisco",
"state": "California",
"country": "United States",
"street_address": "123 Mission St",
"technologies": ["python", "kubernetes", "aws"],
"short_description": "AI research and deployment company",
}
}
mock_post.return_value = mock_response
result = self.client.enrich_company("openai.com")
mock_post.assert_called_once_with(
f"{APOLLO_API_BASE}/organizations/enrich",
headers=self.client._headers,
json={"domain": "openai.com"},
timeout=30.0,
)
assert result["match_found"] is True
assert result["organization"]["name"] == "OpenAI"
assert result["organization"]["industry"] == "Artificial Intelligence"
assert result["organization"]["employee_count"] == 1500
assert "python" in result["organization"]["technologies"]
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_company_not_found(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"organization": None}
mock_post.return_value = mock_response
result = self.client.enrich_company("notarealcompany12345.xyz")
assert result["match_found"] is False
assert "No matching company found" in result["message"]
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_search_people(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"pagination": {"total_entries": 150, "page": 1, "per_page": 10},
"people": [
{
"id": "p1",
"first_name": "Alice",
"last_name": "Johnson",
"name": "Alice Johnson",
"title": "VP Sales",
"email": "alice@company.com",
"email_status": "verified",
"linkedin_url": "https://linkedin.com/in/alicejohnson",
"city": "New York",
"state": "New York",
"country": "United States",
"seniority": "vp",
"organization": {
"id": "o1",
"name": "Company Inc",
"primary_domain": "company.com",
},
},
{
"id": "p2",
"first_name": "Bob",
"last_name": "Smith",
"name": "Bob Smith",
"title": "Director of Sales",
"email": "bob@another.com",
"email_status": "verified",
"linkedin_url": "https://linkedin.com/in/bobsmith",
"city": "Chicago",
"state": "Illinois",
"country": "United States",
"seniority": "director",
"organization": None,
},
],
}
mock_post.return_value = mock_response
result = self.client.search_people(
titles=["VP Sales", "Director of Sales"],
seniorities=["vp", "director"],
company_sizes=["51-200", "201-500"],
limit=10,
)
mock_post.assert_called_once()
call_json = mock_post.call_args.kwargs["json"]
assert call_json["person_titles"] == ["VP Sales", "Director of Sales"]
assert call_json["person_seniorities"] == ["vp", "director"]
assert call_json["organization_num_employees_ranges"] == ["51-200", "201-500"]
assert call_json["per_page"] == 10
assert result["total"] == 150
assert len(result["results"]) == 2
assert result["results"][0]["title"] == "VP Sales"
assert result["results"][0]["organization"]["name"] == "Company Inc"
        # Bob's raw record has organization=None; the client should still normalize it to a dict with null fields
        assert result["results"][1]["organization"]["name"] is None

    @patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_search_people_limit_capped(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"pagination": {}, "people": []}
mock_post.return_value = mock_response
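        # oversized limits should be clamped to Apollo's per-page ceiling of 100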
self.client.search_people(limit=200)
call_json = mock_post.call_args.kwargs["json"]
assert call_json["per_page"] == 100
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_search_companies(self, mock_post):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"pagination": {"total_entries": 50, "page": 1, "per_page": 10},
"organizations": [
{
"id": "o1",
"name": "Tech Startup",
"primary_domain": "techstartup.io",
"website_url": "https://techstartup.io",
"linkedin_url": "https://linkedin.com/company/techstartup",
"industry": "Technology",
"estimated_num_employees": 75,
"employee_count_range": "51-200",
"annual_revenue_printed": "$10M",
"city": "Austin",
"state": "Texas",
"country": "United States",
"short_description": "A tech startup",
},
],
}
mock_post.return_value = mock_response
result = self.client.search_companies(
industries=["technology"],
employee_counts=["51-200"],
technologies=["kubernetes"],
limit=10,
)
mock_post.assert_called_once()
call_json = mock_post.call_args.kwargs["json"]
assert call_json["organization_industry_tag_ids"] == ["technology"]
assert call_json["organization_num_employees_ranges"] == ["51-200"]
assert call_json["currently_using_any_of_technology_uids"] == ["kubernetes"]
assert result["total"] == 50
assert len(result["results"]) == 1
assert result["results"][0]["name"] == "Tech Startup"
assert result["results"][0]["industry"] == "Technology"
# --- MCP tool registration and credential tests ---
class TestToolRegistration:
def test_register_tools_registers_all_tools(self):
mcp = MagicMock()
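        # make the @mcp.tool() decorator a pass-through so register_tools can run against a mock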
mcp.tool.return_value = lambda fn: fn
register_tools(mcp)
        assert mcp.tool.call_count == 4

    def test_no_credentials_returns_error(self):
mcp = MagicMock()
registered_fns = []
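        # list.append() returns None, so "append(fn) or fn" records each tool and still hands it back to the decorator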
mcp.tool.return_value = lambda fn: registered_fns.append(fn) or fn
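        # clear=True guarantees no ambient APOLLO_API_KEY can satisfy the credential lookup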
with patch.dict("os.environ", {}, clear=True):
register_tools(mcp, credentials=None)
enrich_fn = next(fn for fn in registered_fns if fn.__name__ == "apollo_enrich_person")
result = enrich_fn(email="test@test.com")
assert "error" in result
assert "not configured" in result["error"]
def test_credentials_from_credential_manager(self):
mcp = MagicMock()
registered_fns = []
mcp.tool.return_value = lambda fn: registered_fns.append(fn) or fn
cred_manager = MagicMock()
cred_manager.get.return_value = "test-api-key"
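        # register_tools should resolve the API key through the credential manager under the "apollo" name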
register_tools(mcp, credentials=cred_manager)
enrich_fn = next(fn for fn in registered_fns if fn.__name__ == "apollo_enrich_company")
with patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post") as mock_post:
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"organization": {"id": "123", "name": "Test"}}
mock_post.return_value = mock_response
result = enrich_fn(domain="test.com")
cred_manager.get.assert_called_with("apollo")
assert result["match_found"] is True
def test_credentials_from_env_var(self):
mcp = MagicMock()
registered_fns = []
mcp.tool.return_value = lambda fn: registered_fns.append(fn) or fn
register_tools(mcp, credentials=None)
enrich_fn = next(fn for fn in registered_fns if fn.__name__ == "apollo_enrich_company")
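        # with no credential manager supplied, the tool should fall back to the APOLLO_API_KEY environment variable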
with (
patch.dict("os.environ", {"APOLLO_API_KEY": "env-api-key"}),
patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post") as mock_post,
):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"organization": {"id": "123", "name": "Test"}}
mock_post.return_value = mock_response
result = enrich_fn(domain="test.com")
assert result["match_found"] is True
# Verify API key was used in X-Api-Key header
call_headers = mock_post.call_args.kwargs["headers"]
assert call_headers["X-Api-Key"] == "env-api-key"
# --- Individual tool function tests ---
class TestEnrichPersonTool:
def setup_method(self):
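        # register the Apollo tools against a mock MCP server, capturing each tool function for direct invocation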
self.mcp = MagicMock()
self.fns = []
self.mcp.tool.return_value = lambda fn: self.fns.append(fn) or fn
cred = MagicMock()
cred.get.return_value = "test-key"
        register_tools(self.mcp, credentials=cred)

    def _fn(self, name):
        return next(f for f in self.fns if f.__name__ == name)

    def test_enrich_person_requires_email_or_linkedin(self):
result = self._fn("apollo_enrich_person")()
assert "error" in result
assert "Invalid search criteria" in result["error"]
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_person_success(self, mock_post):
mock_post.return_value = MagicMock(
status_code=200,
json=MagicMock(
return_value={
"person": {
"id": "p1",
"first_name": "John",
"last_name": "Doe",
"title": "CEO",
"organization": {},
}
}
),
)
result = self._fn("apollo_enrich_person")(email="john@acme.com")
assert result["match_found"] is True
assert result["person"]["title"] == "CEO"
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_person_timeout(self, mock_post):
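        # a transport-level timeout should surface as a structured error dict rather than an exception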
mock_post.side_effect = httpx.TimeoutException("timed out")
result = self._fn("apollo_enrich_person")(email="test@test.com")
assert "error" in result
assert "timed out" in result["error"]
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_person_network_error(self, mock_post):
mock_post.side_effect = httpx.RequestError("connection failed")
result = self._fn("apollo_enrich_person")(email="test@test.com")
assert "error" in result
assert "Network error" in result["error"]
class TestEnrichCompanyTool:
def setup_method(self):
self.mcp = MagicMock()
self.fns = []
self.mcp.tool.return_value = lambda fn: self.fns.append(fn) or fn
cred = MagicMock()
cred.get.return_value = "test-key"
        register_tools(self.mcp, credentials=cred)

    def _fn(self, name):
        return next(f for f in self.fns if f.__name__ == name)

    @patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_company_success(self, mock_post):
mock_post.return_value = MagicMock(
status_code=200,
json=MagicMock(
return_value={
"organization": {
"id": "o1",
"name": "Acme Inc",
"industry": "Technology",
"estimated_num_employees": 500,
}
}
),
)
result = self._fn("apollo_enrich_company")(domain="acme.com")
assert result["match_found"] is True
assert result["organization"]["name"] == "Acme Inc"
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_enrich_company_not_found(self, mock_post):
mock_post.return_value = MagicMock(
status_code=200, json=MagicMock(return_value={"organization": None})
)
result = self._fn("apollo_enrich_company")(domain="notreal.xyz")
assert result["match_found"] is False
class TestSearchPeopleTool:
def setup_method(self):
self.mcp = MagicMock()
self.fns = []
self.mcp.tool.return_value = lambda fn: self.fns.append(fn) or fn
cred = MagicMock()
cred.get.return_value = "test-key"
        register_tools(self.mcp, credentials=cred)

    def _fn(self, name):
        return next(f for f in self.fns if f.__name__ == name)

    @patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_search_people_success(self, mock_post):
mock_post.return_value = MagicMock(
status_code=200,
json=MagicMock(
return_value={
"pagination": {"total_entries": 100},
"people": [{"id": "p1", "name": "Alice", "title": "VP Sales"}],
}
),
)
result = self._fn("apollo_search_people")(titles=["VP Sales"])
assert result["total"] == 100
assert len(result["results"]) == 1
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_search_people_with_all_filters(self, mock_post):
mock_post.return_value = MagicMock(
status_code=200, json=MagicMock(return_value={"pagination": {}, "people": []})
)
self._fn("apollo_search_people")(
titles=["CEO"],
seniorities=["c_suite"],
locations=["San Francisco"],
company_sizes=["51-200"],
industries=["technology"],
technologies=["salesforce"],
limit=25,
)
call_json = mock_post.call_args.kwargs["json"]
assert call_json["person_titles"] == ["CEO"]
assert call_json["person_seniorities"] == ["c_suite"]
assert call_json["person_locations"] == ["San Francisco"]
assert call_json["organization_num_employees_ranges"] == ["51-200"]
class TestSearchCompaniesTool:
def setup_method(self):
self.mcp = MagicMock()
self.fns = []
self.mcp.tool.return_value = lambda fn: self.fns.append(fn) or fn
cred = MagicMock()
cred.get.return_value = "test-key"
        register_tools(self.mcp, credentials=cred)

    def _fn(self, name):
        return next(f for f in self.fns if f.__name__ == name)

    @patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_search_companies_success(self, mock_post):
mock_post.return_value = MagicMock(
status_code=200,
json=MagicMock(
return_value={
"pagination": {"total_entries": 50},
"organizations": [{"id": "o1", "name": "Tech Corp", "industry": "Technology"}],
}
),
)
result = self._fn("apollo_search_companies")(industries=["technology"])
assert result["total"] == 50
assert len(result["results"]) == 1
assert result["results"][0]["industry"] == "Technology"
@patch("aden_tools.tools.apollo_tool.apollo_tool.httpx.post")
def test_search_companies_with_all_filters(self, mock_post):
mock_post.return_value = MagicMock(
status_code=200, json=MagicMock(return_value={"pagination": {}, "organizations": []})
)
self._fn("apollo_search_companies")(
industries=["finance"],
employee_counts=["201-500"],
locations=["New York"],
technologies=["aws"],
limit=15,
)
call_json = mock_post.call_args.kwargs["json"]
assert call_json["organization_industry_tag_ids"] == ["finance"]
assert call_json["organization_num_employees_ranges"] == ["201-500"]
assert call_json["organization_locations"] == ["New York"]
assert call_json["currently_using_any_of_technology_uids"] == ["aws"]
assert call_json["per_page"] == 15
# --- Credential spec tests ---
class TestCredentialSpec:
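    # the spec maps the "apollo" credential to its env var and to the four tools that require it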
def test_apollo_credential_spec_exists(self):
from aden_tools.credentials import CREDENTIAL_SPECS
assert "apollo" in CREDENTIAL_SPECS
def test_apollo_spec_env_var(self):
from aden_tools.credentials import CREDENTIAL_SPECS
spec = CREDENTIAL_SPECS["apollo"]
        assert spec.env_var == "APOLLO_API_KEY"

    def test_apollo_spec_tools(self):
from aden_tools.credentials import CREDENTIAL_SPECS
spec = CREDENTIAL_SPECS["apollo"]
assert "apollo_enrich_person" in spec.tools
assert "apollo_enrich_company" in spec.tools
assert "apollo_search_people" in spec.tools
assert "apollo_search_companies" in spec.tools
        assert len(spec.tools) == 4

Generated
+1 -1
@@ -754,7 +754,7 @@ wheels = [
 [[package]]
 name = "framework"
-version = "0.1.0"
+version = "0.4.2"
 source = { editable = "core" }
 dependencies = [
     { name = "anthropic" },