Compare commits


19 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| bryan | fb203b5bdf | update oauth to refresh token | 2026-02-06 19:43:30 -08:00 |
| Timothy @aden | 7e40d6950a | Merge pull request #3871 from TimothyZhang7/main (fix(micro-fix): uv paths in templates) | 2026-02-06 17:07:19 -08:00 |
| Timothy | 590bfa92cb | chore: fix mcp server default config | 2026-02-06 17:04:03 -08:00 |
| Timothy | f0e89a1720 | fix: mcp server config with uv | 2026-02-06 17:01:42 -08:00 |
| Timothy @aden | 575563b1e8 | Merge pull request #3870 from adenhq/feat/multi-level-logging (fix: hardening hive cli setup) | 2026-02-06 16:37:37 -08:00 |
| RichardTang-Aden | 2f57ca10f7 | Merge pull request #3862 from adenhq/feat/hive-tui ((micro-fix): documentation update) | 2026-02-06 16:19:46 -08:00 |
| RichardTang-Aden | 75c2d541c4 | Merge branch 'main' into feat/hive-tui | 2026-02-06 16:19:30 -08:00 |
| Richard Tang | b666f8b50b | docs: minor doc update | 2026-02-06 16:16:56 -08:00 |
| RichardTang-Aden | 09f9322676 | Merge pull request #3863 from RichardTang-Aden/fix-remove-old-mock-mode (Fix remove old mock mode) | 2026-02-06 16:02:01 -08:00 |
| Richard Tang | f9a864ef93 | fix: remove mock mode in the template | 2026-02-06 15:59:48 -08:00 |
| Richard Tang | 27f28afe9c | fix: remove --mock in the codebase + documentation | 2026-02-06 15:59:22 -08:00 |
| Timothy @aden | 8f85722fef | Merge pull request #3715 from adenhq/feat/multi-level-logging (Feat/multi level logging) | 2026-02-06 15:59:16 -08:00 |
| bryan | 5588445a01 | documentation update | 2026-02-06 15:59:01 -08:00 |
| Timothy @aden | cee632f50c | Merge pull request #3855 from adenhq/feat/hive-tui (update tui to support menu, highlight/copy, update quickstart) | 2026-02-06 15:24:10 -08:00 |
| RichardTang-Aden | 51e81d80fc | Merge pull request #3853 from adenhq/docs-key-concepts (Docs key concepts) | 2026-02-06 12:45:16 -08:00 |
| Richard Tang | cd014e41e4 | docs: update links in the README.md | 2026-02-06 12:44:34 -08:00 |
| Richard Tang | 830f11c47d | docs: add key concept section | 2026-02-06 12:41:22 -08:00 |
| Timothy @aden | b22be7a6cb | Merge pull request #3818 from TimothyZhang7/main ((micro-fix)(skills): cursor skill symlinks to claude skill) | 2026-02-06 09:32:23 -08:00 |
| Timothy | 433967f0cf | fix: cursor skill symlinks to claude skill | 2026-02-05 18:11:24 -08:00 |
37 changed files with 718 additions and 207 deletions
+24 -3
@@ -28,8 +28,8 @@ metadata:
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="python",
args='["mcp_server.py", "--stdio"]',
command="uv",
args='["run", "python", "mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
@@ -363,6 +363,24 @@ mcp__agent-builder__export_graph()
- NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
- NOT: `{"first-node-id"}` (WRONG - this is a set)
**IMPORTANT mcp_servers.json format:**
```json
{
"hive-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../tools",
"description": "Hive tools MCP server"
}
}
```
- NO `"mcpServers"` wrapper (that's Claude Desktop format, NOT hive format)
- `cwd` MUST be `"../../tools"` (relative from `exports/AGENT_NAME/` to `tools/`)
- `command` MUST be `"uv"` with `"args": ["run", "python", ...]` (NOT bare `"python"` which fails on Mac)
**Use the example agent** at `.claude/skills/hive-create/examples/deep_research_agent/` as a template for file structure and patterns. It demonstrates: STEP 1/STEP 2 prompts, client-facing nodes, feedback loops, nullable_output_keys, and data tools.
**AFTER writing all files, tell the user:**
@@ -407,7 +425,9 @@ cd /home/timothy/oss/hive && PYTHONPATH=exports uv run python -m AGENT_NAME vali
│ │
│ 2. RUN YOUR AGENT: │
│ │
PYTHONPATH=core:exports python -m AGENT_NAME tui
hive tui
│ │
│ Then select your agent from the list and press Enter. │
│ │
│ 3. DEBUG ANY ISSUES: │
│ │
@@ -517,3 +537,4 @@ result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
8. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
9. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
10. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
11. **Wrong mcp_servers.json format** - Use flat format (no `"mcpServers"` wrapper), `cwd` must be `"../../tools"`, and `command` must be `"uv"` with args `["run", "python", ...]`
@@ -1,8 +1,8 @@
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, and write_to_file"
}
-1
@@ -1 +0,0 @@
../../.claude/skills/agent-workflow
@@ -1 +0,0 @@
../../.claude/skills/building-agents-construction
-1
@@ -1 +0,0 @@
../../.claude/skills/building-agents-core
-1
@@ -1 +0,0 @@
../../.claude/skills/building-agents-patterns
+1
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-test
-1
@@ -1 +0,0 @@
../../.claude/skills/testing-agent
+43 -35
@@ -109,6 +109,8 @@ This sets up:
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
- **credential store** - Encrypted API key storage (`~/.hive/credentials`)
- **LLM provider** - Interactive default model configuration
- All required Python dependencies
### Build Your First Agent
@@ -118,10 +120,13 @@ This sets up:
claude> /hive
# Test your agent
claude> /hive-test
claude> /hive-debugger
# Run your agent
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
# (in a separate terminal) Launch the interactive dashboard
hive tui
# Or run directly
hive run exports/your_agent_name --input '{"key": "value"}'
```
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
@@ -137,18 +142,19 @@ Skills are also available in Cursor. To enable:
## Features
- **Goal-Driven Development** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **Adaptiveness** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **Dynamic Node Connections** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **Human-in-the-Loop** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Interactive TUI Dashboard** - Terminal-based dashboard with live graph view, event log, and chat interface for agent interaction
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
## Why Aden
Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe [outcomes](docs/key_concepts/goals_outcome.md), and the system builds itself**—delivering an outcome-driven, [adaptive](docs/key_concepts/evolution.md) experience with an easy-to-use set of tools and integrations.
```mermaid
flowchart LR
@@ -195,49 +201,51 @@ flowchart LR
### How It Works
1. **Define Your Goal** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the agent graph, connection code, and test cases
3. **Workers Execute** → SDK-wrapped nodes run with full observability and tool access
1. **[Define Your Goal](docs/key_concepts/goals_outcome.md)** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the [agent graph](docs/key_concepts/graph.md), connection code, and test cases
3. **[Workers Execute](docs/key_concepts/worker_agent.md)** → SDK-wrapped nodes run with full observability and tool access
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **Adaptiveness** → On failure, the system evolves the graph and redeploys automatically
5. **[Adaptiveness](docs/key_concepts/evolution.md)** → On failure, the system evolves the graph and redeploys automatically
## Run pre-built Agents (Coming Soon)
## Run Agents
### Run a sample agent
Aden Hive provides a list of featured agents that you can use and build on top of.
### Run an agent shared by others
Put the agent in `exports/` and run `PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'`
For building and running goal-driven agents with the framework:
The `hive` CLI is the primary interface for running agents.
```bash
# One-time setup
./quickstart.sh
# Browse and run agents interactively (Recommended)
hive tui
# This sets up:
# - framework package (core runtime)
# - aden_tools package (MCP tools)
# - All Python dependencies
# Run a specific agent directly
hive run exports/my_agent --input '{"task": "Your input here"}'
# Build new agents using Agent Skills
claude> /hive
# Run a specific agent with the TUI dashboard
hive run exports/my_agent --tui
# Run agents
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
# Interactive REPL
hive shell
```
The TUI scans both `exports/` and `examples/templates/` for available agents.
> **Using Python directly (alternative):** You can also run agents with `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
See [environment-setup.md](docs/environment-setup.md) for complete setup instructions.
## Documentation
- **[Developer Guide](docs/developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [TUI Guide](docs/tui-selection-guide.md) - Interactive dashboard usage
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
### Key Concepts
- [Goals & Outcome-Driven Development](docs/key_concepts/goals_outcome.md) - Why Hive is outcome-driven and how goals define success
- [The Agent Graph](docs/key_concepts/graph.md) - Nodes, edges, shared memory, and how agents execute
- [The Worker Agent](docs/key_concepts/worker_agent.md) - Sessions, iterations, headless execution, and the runtime
- [Evolution](docs/key_concepts/evolution.md) - How agents improve across generations through failure data
## Roadmap
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [roadmap.md](docs/roadmap.md) for details.
@@ -382,7 +390,7 @@ Yes! Hive supports local models through LiteLLM. Simply use the model name forma
**Q: What makes Hive different from other agent frameworks?**
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys. This self-improving loop is unique to Aden.
Hive generates your entire agent system from natural language [goals](docs/key_concepts/goals_outcome.md) using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, [evolves the agent graph](docs/key_concepts/evolution.md), and redeploys. This self-improving loop is unique to Aden.
**Q: Is Hive open-source?**
@@ -402,7 +410,7 @@ Yes. Hive is explicitly designed for production environments with features like
**Q: Does Hive support human-in-the-loop workflows?**
Yes, Hive fully supports human-in-the-loop workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
Yes, Hive fully supports [human-in-the-loop](docs/key_concepts/graph.md#human-in-the-loop) workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What monitoring and debugging tools does Hive provide?**
@@ -430,7 +438,7 @@ Contributions are welcome! Fork the repository, create your feature branch, impl
**Q: When will my team start seeing results from Aden's adaptive agents?**
Aden's adaptation loop begins working from the first execution. When an agent fails, the framework captures the failure data, helping developers evolve the agent graph through the coding agent. How quickly this translates to measurable results depends on the complexity of your use case, the quality of your goal definitions, and the volume of executions generating feedback.
Aden's [adaptation loop](docs/key_concepts/evolution.md) begins working from the first execution. When an agent fails, the framework captures the failure data, helping developers evolve the agent graph through the coding agent. How quickly this translates to measurable results depends on the complexity of your use case, the quality of your [goal definitions](docs/key_concepts/goals_outcome.md), and the volume of executions generating feedback.
**Q: How does Hive compare to other agent frameworks?**
+2 -2
@@ -4,8 +4,8 @@
"name": "tools",
"description": "Aden tools including web search, file operations, and PDF reading",
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../tools",
"env": {
"BRAVE_SEARCH_API_KEY": "${BRAVE_SEARCH_API_KEY}"
-13
@@ -33,11 +33,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
type=str,
help="Input context from JSON file",
)
run_parser.add_argument(
"--mock",
action="store_true",
help="Run in mock mode (no real LLM calls)",
)
run_parser.add_argument(
"--output",
"-o",
@@ -192,11 +187,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
help="Launch interactive TUI dashboard",
description="Browse available agents and launch the terminal dashboard.",
)
tui_parser.add_argument(
"--mock",
action="store_true",
help="Run in mock mode (no real LLM calls)",
)
tui_parser.add_argument(
"--model",
"-m",
@@ -248,7 +238,6 @@ def cmd_run(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=True,
)
@@ -286,7 +275,6 @@ def cmd_run(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=False,
)
@@ -1057,7 +1045,6 @@ def cmd_tui(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(
agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=True,
)
+1 -1
@@ -152,7 +152,7 @@ Add to `.vscode/settings.json`:
1. **Never commit API keys** - Use environment variables or `.env` files
2. **`.env` is git-ignored** - Copy `.env.example` to `.env` at the project root and fill in your values
3. **Mock mode for testing** - Set `MOCK_MODE=1` to avoid LLM calls during development
3. **Use real provider keys in non-production environments** - Validate configuration with low-risk inputs before a production rollout
4. **Credential isolation** - Each tool validates its own credentials at runtime
## Troubleshooting
+14 -19
@@ -158,6 +158,7 @@ hive/ # Repository root
│ │ ├── schemas/ # Data schemas
│ │ ├── storage/ # File-based persistence
│ │ ├── testing/ # Testing utilities
│ │ ├── tui/ # Terminal UI dashboard
│ │ └── __init__.py
│ ├── pyproject.toml # Package metadata and dependencies
│ ├── README.md # Framework documentation
@@ -180,6 +181,9 @@ hive/ # Repository root
├── exports/ # AGENT PACKAGES (user-created, gitignored)
│ └── your_agent_name/ # Created via /hive-create
├── examples/ # Example agents
│ └── templates/ # Pre-built template agents
├── docs/ # Documentation
│ ├── getting-started.md # Quick start guide
│ ├── configuration.md # Configuration reference
@@ -287,22 +291,19 @@ If you prefer to build agents manually:
### Running Agents
```bash
# Validate agent structure
PYTHONPATH=exports uv run python -m agent_name validate
# Browse and run agents interactively (Recommended)
hive tui
# Show agent information
PYTHONPATH=exports uv run python -m agent_name info
# Run a specific agent
hive run exports/my_agent --input '{"ticket_content": "My login is broken", "customer_id": "CUST-123"}'
# Run agent with input
PYTHONPATH=exports uv run python -m agent_name run --input '{
"ticket_content": "My login is broken",
"customer_id": "CUST-123"
}'
# Run with TUI dashboard
hive run exports/my_agent --tui
# Run in mock mode (no LLM calls)
PYTHONPATH=exports uv run python -m agent_name run --mock --input '{...}'
```
> **Using Python directly:** `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
---
## Testing Agents
@@ -615,16 +616,10 @@ echo 'ANTHROPIC_API_KEY=your-key-here' >> .env
### Debugging Agent Execution
```python
# Add debug logging to your agent
import logging
logging.basicConfig(level=logging.DEBUG)
```bash
# Run with verbose output
PYTHONPATH=exports uv run python -m agent_name run --input '{...}' --verbose
hive run exports/my_agent --verbose --input '{"task": "..."}'
# Use mock mode to test without LLM calls
PYTHONPATH=exports uv run python -m agent_name run --mock --input '{...}'
```
---
+38 -22
@@ -18,6 +18,8 @@ This will:
- Check Python version (requires 3.11+)
- Install the core framework package (`framework`)
- Install the tools package (`aden_tools`)
- Initialize encrypted credential store (`~/.hive/credentials`)
- Configure default LLM provider
- Fix package compatibility issues (openai + litellm)
- Verify all installations
@@ -126,7 +128,32 @@ $env:ANTHROPIC_API_KEY="your-key-here"
## Running Agents
All agent commands must be run from the project root with `PYTHONPATH` set:
The `hive` CLI is the primary interface for running agents:
```bash
# Browse and run agents interactively (Recommended)
hive tui
# Run a specific agent
hive run exports/my_agent --input '{"task": "Your input here"}'
# Run with TUI dashboard
hive run exports/my_agent --tui
```
### CLI Command Reference
| Command | Description |
|---------|-------------|
| `hive tui` | Browse agents and launch TUI dashboard |
| `hive run <path>` | Execute an agent (`--tui`, `--model`, `--quiet`, `--verbose`) |
| `hive shell [path]` | Interactive REPL (`--multi`, `--no-approve`) |
| `hive info <path>` | Show agent details |
| `hive validate <path>` | Validate agent structure |
| `hive list [dir]` | List available agents |
| `hive dispatch [dir]` | Multi-agent orchestration |
### Using Python directly (alternative)
```bash
# From /hive/ directory
@@ -140,24 +167,6 @@ $env:PYTHONPATH="core;exports"
python -m agent_name COMMAND
```
### Example: Support Ticket Agent
```bash
# Validate agent structure
PYTHONPATH=exports uv run python -m your_agent_name validate
# Show agent information
PYTHONPATH=exports uv run python -m your_agent_name info
# Run agent with input
PYTHONPATH=exports uv run python -m your_agent_name run --input '{
"task": "Your input here"
}'
# Run in mock mode (no LLM calls)
PYTHONPATH=exports uv run python -m your_agent_name run --mock --input '{...}'
```
## Building and Running New Agents
Build and run an agent using the Claude Code CLI with the agent-building skills:
@@ -353,8 +362,11 @@ hive/
│ ├── .venv/ # Created by quickstart.sh
│ └── pyproject.toml
└── exports/                 # Agent packages (user-created, gitignored)
    └── your_agent_name/     # Created via /hive-create
├── exports/                 # Agent packages (user-created, gitignored)
│   └── your_agent_name/     # Created via /hive-create
└── examples/
    └── templates/           # Pre-built template agents
```
## Separate Virtual Environments
@@ -456,7 +468,11 @@ claude> /hive-test
### 5. Run Agent
```bash
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
# Interactive dashboard
hive tui
# Or run directly
hive run exports/your_agent_name --input '{"task": "..."}'
```
## IDE Setup
+17 -18
@@ -88,7 +88,8 @@ hive/
│ │ ├── runtime/ # Runtime environment
│ │ ├── schemas/ # Data schemas
│ │ ├── storage/ # File-based persistence
│   │   └── testing/         # Testing utilities
│   │   ├── testing/         # Testing utilities
│   │   └── tui/             # Terminal UI dashboard
│ └── pyproject.toml # Package metadata
├── tools/ # MCP Tools Package
@@ -102,6 +103,9 @@ hive/
├── exports/ # Agent Packages (user-generated, not in repo)
│ └── your_agent/ # Your agents created via /hive
├── examples/
│ └── templates/ # Pre-built template agents
├── .claude/ # Claude Code Skills
│ └── skills/
│ ├── hive/
@@ -116,19 +120,15 @@ hive/
## Running an Agent
```bash
# Validate agent structure
PYTHONPATH=exports uv run python -m my_agent validate
# Browse and run agents interactively (Recommended)
hive tui
# Show agent information
PYTHONPATH=exports uv run python -m my_agent info
# Run a specific agent
hive run exports/my_agent --input '{"task": "Your input here"}'
# Run agent with input
PYTHONPATH=exports uv run python -m my_agent run --input '{
"task": "Your input here"
}'
# Run with TUI dashboard
hive run exports/my_agent --tui
# Run in mock mode (no LLM calls)
PYTHONPATH=exports uv run python -m my_agent run --mock --input '{...}'
```
## API Keys Setup
@@ -164,11 +164,12 @@ PYTHONPATH=exports uv run python -m my_agent test --type success
## Next Steps
1. **Detailed Setup**: See [environment-setup.md](./environment-setup.md)
2. **Developer Guide**: See [developer-guide.md](./developer-guide.md)
3. **Build Agents**: Use `/hive` skill in Claude Code
4. **Custom Tools**: Learn to integrate MCP servers
5. **Join Community**: [Discord](https://discord.com/invite/MXE49hrKDk)
1. **TUI Dashboard**: Run `hive tui` to explore agents interactively
2. **Detailed Setup**: See [environment-setup.md](./environment-setup.md)
3. **Developer Guide**: See [developer-guide.md](./developer-guide.md)
4. **Build Agents**: Use `/hive` skill in Claude Code
5. **Custom Tools**: Learn to integrate MCP servers
6. **Join Community**: [Discord](https://discord.com/invite/MXE49hrKDk)
## Troubleshooting
@@ -194,8 +195,6 @@ uv pip install -e .
# Verify API key is set
echo $ANTHROPIC_API_KEY
# Run in mock mode to test without API
PYTHONPATH=exports uv run python -m my_agent run --mock --input '{...}'
```
### Package Installation Issues
+49
@@ -0,0 +1,49 @@
# Evolution
## Evolution Is the Mechanism; Adaptiveness Is the Result
Agents don't just fail; they fail inevitably. Real-world variables—private LinkedIn profiles, shifting API schemas, or LLM hallucinations—are impossible to predict in a vacuum. The first version of any agent is merely a "happy path" draft.
Evolution is how Hive handles this. When an agent fails, the framework captures what went wrong — which node failed, which success criteria weren't met, what the agent tried and why it didn't work. Then a coding agent (Claude Code, Cursor, or similar) uses that failure data to generate an improved version of the agent. The new version gets deployed, runs, encounters new edge cases, and the cycle continues.
Over generations, the agent gets more reliable. Not because someone sat down and anticipated every possible failure, but because each failure teaches the next version something specific.
## How It Works
The evolution loop has four stages:
**1. Execute** — The worker agent runs against real inputs. Sessions produce outcomes, decisions, and metrics.
**2. Evaluate** — The framework checks outcomes against the goal's success criteria and constraints. Did the agent produce the desired result? Which criteria were satisfied and which weren't? Were any constraints violated?
**3. Diagnose** — Failure data is structured and specific. It's not just "the agent failed" — it's "node `draft_message` failed to produce personalized content because the research node returned insufficient data about the prospect's recent activity." The decision log, problem reports, and execution trace provide the full picture.
**4. Regenerate** — A coding agent receives the diagnosis and the current agent code. It modifies the graph — adding nodes, adjusting prompts, changing edge conditions, adding tools — to address the specific failure. The new version is deployed and the cycle restarts.
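
To make the control flow concrete, here is a minimal self-contained sketch of the loop in plain Python. Every name in it (`execute`, `evaluate`, `regenerate`, the dict-shaped agent) is an illustrative stand-in, not the framework's actual API:

```python
from dataclasses import dataclass

# Illustrative stand-ins only; none of these names are framework APIs.

@dataclass
class Report:
    success: bool
    failed_criteria: list[str]

def execute(agent: dict, inputs: dict) -> dict:
    # Pretend to run the agent graph; a real run would produce a full trace.
    return {"output": agent["prompt"] + " -> result"}

def evaluate(outcome: dict, goal: dict) -> Report:
    # Check the outcome against the goal's success criteria.
    missing = [c for c in goal["criteria"] if c not in outcome["output"]]
    return Report(success=not missing, failed_criteria=missing)

def regenerate(agent: dict, diagnosis: str) -> dict:
    # Stand-in for the coding agent: patch the prompt to address the failure.
    return {"prompt": agent["prompt"] + f" (must include: {diagnosis})"}

def evolve(agent: dict, goal: dict, inputs: dict, max_generations: int = 5) -> dict:
    for _ in range(max_generations):
        outcome = execute(agent, inputs)               # 1. Execute
        report = evaluate(outcome, goal)               # 2. Evaluate
        if report.success:
            break                                      # criteria met; stop evolving
        diagnosis = ", ".join(report.failed_criteria)  # 3. Diagnose
        agent = regenerate(agent, diagnosis)           # 4. Regenerate and redeploy
    return agent

print(evolve({"prompt": "draft outreach"}, {"criteria": ["CTA"]}, {})["prompt"])
```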
## Adaptiveness ≠ Intelligence or Intent
An important distinction: evolution makes agents more adaptive, but not more intelligent in any general sense. The agent isn't learning to reason better — it's being rewritten to handle more situations correctly.
This is closer to how biological evolution works than how learning works. A species doesn't "learn" to survive winter — individuals that happen to have thicker fur survive, and that trait gets selected for. Similarly, agent versions that handle more edge cases correctly survive in production, and the patterns that made them successful get carried forward.
The practical implication: don't expect evolution to make an agent smarter about problems it's never seen. Evolution improves reliability on the *kinds* of problems the agent has already encountered. For genuinely novel situations, that's what human-in-the-loop is for — and every time a human steps in, that interaction becomes potential fuel for the next evolution cycle.
## What Gets Evolved
Evolution can change almost anything about an agent:
**Prompts** — The most common fix. A node's system prompt gets refined based on the specific ways the LLM misunderstood its instructions.
**Graph structure** — Adding a validation node before a critical step, splitting a node that's trying to do too much, adding a fallback path for a common failure mode.
**Edge conditions** — Adjusting routing logic based on observed patterns. If low-confidence research results consistently lead to bad drafts, add a conditional edge that routes them back for another research pass.
**Tool selection** — Swapping in a better tool, adding a new one, or removing one that causes more problems than it solves.
**Constraints and criteria** — Tightening or loosening based on what's actually achievable and what matters in practice.
## The Role of Decision Logging
Evolution depends on good data. The runtime captures every decision an agent makes: what it was trying to do, what options it considered, what it chose, and what happened as a result. This isn't overhead — it's the signal that makes evolution possible.
Without decision logging, failure analysis is guesswork. With it, the coding agent can trace a failure back to its root cause and make a targeted fix rather than a blind change.
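
As an illustration, a decision record might carry fields like these; this is a hypothetical shape, not the framework's real schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical shape for one logged decision; field names are illustrative.

@dataclass
class DecisionRecord:
    node_id: str        # where the decision was made
    intent: str         # what the agent was trying to do
    options: list[str]  # what it considered
    chosen: str         # what it picked
    result: str         # what happened as a consequence
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

log: list[DecisionRecord] = []
log.append(DecisionRecord(
    node_id="draft_message",
    intent="personalize opening line",
    options=["use recent tweet", "use bio", "generic opener"],
    chosen="use bio",
    result="judge scored personalization 0.4 (below threshold)",
))
```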
+101
@@ -0,0 +1,101 @@
# Goals & Outcome-Driven Development
## The Core Idea
Business processes are outcome-driven. A sales team doesn't follow a rigid script — they adapt their approach until the deal closes. A support agent doesn't execute a flowchart — they resolve the customer's issue. The outcome is what matters, not the specific steps taken to get there.
Hive is built on this principle. Instead of hardcoding agent workflows step by step, you define the outcome you want, and the framework figures out how to get there. We call this **Outcome-Driven Development (ODD)**.
## Task-Driven vs Goal-Driven vs Outcome-Driven
These three paradigms represent different levels of abstraction for building agents:
**Task-Driven Development (TDD)** asks: *"Is the code correct?"*
You define explicit steps. The agent follows them. Success means the steps ran without errors. The problem: an agent can execute every step perfectly and still produce a useless result. The steps become the goal, not the actual outcome.
**Goal-Driven Development (GDD)** asks: *"Are we solving the right problem?"*
You define what you want to achieve. The agent plans and executes toward that goal. Better than TDD because it captures intent. But goals can be vague — "improve customer satisfaction" doesn't tell you when you're done.
**Outcome-Driven Development (ODD)** asks: *"Did the system produce the desired result?"*
You define measurable success criteria, hard constraints, and the context the agent needs. The agent is evaluated against the actual outcome, not whether it followed the right steps or aimed at the right goal. This is what Hive implements.
## Goals as First-Class Citizens
In Hive, a `Goal` is not a string description. It's a structured object with three components:
### Success Criteria
Each goal has weighted success criteria that define what "done" looks like. These aren't binary pass/fail checks — they're multi-dimensional measures of quality.
```python
Goal(
id="twitter-outreach",
name="Personalized Twitter Outreach",
success_criteria=[
SuccessCriterion(
id="personalized",
description="Messages reference specific details from the prospect's profile",
metric="llm_judge",
weight=0.4
),
SuccessCriterion(
id="compliant",
description="Messages follow brand voice guidelines",
metric="llm_judge",
weight=0.3
),
SuccessCriterion(
id="actionable",
description="Each message includes a clear call to action",
metric="output_contains",
target="CTA",
weight=0.3
),
],
...
)
```
Metrics can be `output_contains`, `output_equals`, `llm_judge`, or `custom`. Weights let you express what matters most — a perfectly compliant message that isn't personalized still falls short.
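
To see how the weights play out, here is a minimal sketch (plain Python, not the framework's evaluator) that aggregates hypothetical per-criterion scores for the goal above:

```python
# Hypothetical per-criterion scores in [0, 1], e.g. from an LLM judge or an
# output_contains check. Names match the twitter-outreach goal defined above.
scores = {"personalized": 0.9, "compliant": 1.0, "actionable": 0.0}
weights = {"personalized": 0.4, "compliant": 0.3, "actionable": 0.3}

# Weighted aggregate: a compliant, personalized message with no CTA still
# only reaches 0.66, so the missing criterion stays visible in the score.
overall = sum(weights[k] * scores[k] for k in weights)
print(f"outcome score: {overall:.2f}")  # outcome score: 0.66
```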
### Constraints
Constraints define what must **not** happen. They're the guardrails.
```python
constraints=[
Constraint(
id="no_spam",
description="Never send more than 3 messages to the same person per week",
constraint_type="hard", # Violation = immediate escalation
category="safety"
),
Constraint(
id="budget_limit",
description="Total LLM cost must not exceed $5 per run",
constraint_type="soft", # Violation = warning, not a hard stop
category="cost"
),
]
```
Hard constraints are non-negotiable — violating one triggers escalation or failure. Soft constraints are preferences that the agent should respect but can bend when necessary. Constraint categories include `time`, `cost`, `safety`, `scope`, and `quality`.
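
A minimal sketch of how that hard/soft distinction could be enforced (illustrative only; the framework's actual enforcement hooks are not shown in this changeset):

```python
# Hypothetical constraint check; semantics follow the description above:
# hard -> escalate/fail immediately, soft -> warn and keep going.

def check_constraints(violations: list[dict]) -> None:
    for v in violations:
        if v["constraint_type"] == "hard":
            raise RuntimeError(f"hard constraint violated: {v['id']}")  # escalate
        print(f"warning: soft constraint bent: {v['id']}")              # continue

check_constraints([{"id": "budget_limit", "constraint_type": "soft"}])
```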
### Context
Goals carry context — domain knowledge, preferences, background information that the agent needs to make good decisions. This context is injected into every LLM call the agent makes, so the agent is always reasoning with the full picture.
## Why This Matters
When you define goals with weighted criteria and constraints, three things happen:
1. **The agent can self-correct.** Goals are injected into every LLM call, so the agent is always reasoning against its success criteria. Within a [graph execution](./graph.md), nodes use these criteria to decide whether to accept their output, retry, or escalate — self-correction in real time.
2. **Evolution has a target.** When an agent fails, the framework knows *which criteria* it fell short on, which gives the coding agent specific information to improve the next generation (see [Evolution](./evolution.md)).
3. **Humans stay in control.** Constraints define the boundaries. The agent has freedom to find creative solutions within those boundaries, but it can't cross the lines you've drawn.
The goal lifecycle flows through `DRAFT → READY → ACTIVE → COMPLETED / FAILED / SUSPENDED`, giving you visibility into where each objective stands at any point during execution.
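
Sketched as a state machine, under the assumption that `SUSPENDED` can return to `ACTIVE` (the transition table below is illustrative, not taken from the framework):

```python
from enum import Enum

class GoalState(Enum):
    DRAFT = "draft"
    READY = "ready"
    ACTIVE = "active"
    COMPLETED = "completed"
    FAILED = "failed"
    SUSPENDED = "suspended"

# Assumed legal moves; the framework may allow others.
ALLOWED = {
    GoalState.DRAFT: {GoalState.READY},
    GoalState.READY: {GoalState.ACTIVE},
    GoalState.ACTIVE: {GoalState.COMPLETED, GoalState.FAILED, GoalState.SUSPENDED},
    GoalState.SUSPENDED: {GoalState.ACTIVE},
}

def transition(current: GoalState, nxt: GoalState) -> GoalState:
    if nxt not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt

state = transition(GoalState.DRAFT, GoalState.READY)
```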
+78
@@ -0,0 +1,78 @@
# The Agent Graph
## Why a Graph
Real business processes aren't linear. A sales outreach might go: research a prospect, draft a message, realize the research is thin, go back and dig deeper, draft again, get human approval, send. There are loops, branches, fallbacks, and decision points.
Hive models this as a directed graph. Nodes do work, edges connect them, and shared memory lets them pass data. The framework walks this structure — running nodes, following edges, managing retries — until the agent reaches its goal or exhausts its step budget.
Edges can loop back, creating feedback cycles where an agent retries a step or takes a different path. That's intentional. A graph that only moves forward can't self-correct.
## Nodes
A node is a unit of work. Each node reads inputs from shared memory, does something, and writes outputs back. There are a handful of node types, each suited to a different kind of work:
**`event_loop`** — The workhorse. This is a multi-turn LLM loop: the model reasons about the current state, calls tools, observes results, and keeps going until it has produced the required outputs. Most of the interesting agent behavior happens in these nodes. They handle long-running tasks, manage their own context window, and can recover from crashes mid-conversation.
**`function`** — A plain Python function. No LLM involved. Use these for anything deterministic: data transformation, API calls with known parameters, validation logic, or any step where you don't want a language model making judgment calls.
**`router`** — A decision point that directs execution down different paths. Can be rule-based ("if confidence is high, go left; otherwise, go right") or LLM-powered ("given the goal and what we know so far, which path makes sense?").
**`human_input`** — A pause point where the agent stops and asks a human for input before continuing. See [Human-in-the-Loop](#human-in-the-loop) below.
There are also simpler LLM node types (`llm_tool_use` for a single LLM call with tools, `llm_generate` for pure text generation) for steps that don't need the full event loop.
### Self-Correction Within a Node
The most important behavior in an `event_loop` node is the ability to self-correct. After each iteration, the node evaluates its own output: did it produce what was needed? If yes, it's done. If not, it tries again — but this time it sees what went wrong and adjusts.
This is the **reflexion pattern**: try, evaluate, learn from the result, try again. It's cheaper and more effective than starting over. An agent that takes three attempts to get something right is still more useful than one that fails on the first try and gives up.
Within a single node, the outcomes are:
- **Accept** — Output meets the bar. Move on.
- **Retry** — Not good enough, but recoverable. Try again with feedback.
- **Escalate** — Something is fundamentally broken. Hand off to error handling.
This is self-correction *within a session* — the agent adapting in real time. It's different from [evolution](./evolution.md), which improves the agent *across sessions* by rewriting its code between generations. Both matter: reflexion handles the bumps in a single run, evolution handles the patterns that keep recurring across many runs.
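
A minimal sketch of the reflexion loop, with `attempt` and `judge` as hypothetical stand-ins for one LLM iteration and its evaluation; it also shows the per-node iteration budget:

```python
# Illustrative stand-ins, not framework APIs.

def run_node(task: str, max_iterations: int = 3) -> str:
    feedback = ""
    for _ in range(max_iterations):
        output = attempt(task, feedback)       # try
        verdict, feedback = judge(output)      # evaluate
        if verdict == "accept":
            return output                      # good enough; move on
        if verdict == "escalate":
            raise RuntimeError(f"escalating: {feedback}")
    raise RuntimeError("iteration budget exhausted")  # error edges take over

def attempt(task: str, feedback: str) -> str:
    # Stand-in for an LLM turn; a real node would call the model with tools.
    return f"{task} [revised: {feedback}]" if feedback else task

def judge(output: str) -> tuple[str, str]:
    # Stand-in judge: accept once the output shows a revision.
    return ("accept", "") if "revised" in output else ("retry", "add a revision")

print(run_node("draft reply"))
```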
## Edges
Edges control flow between nodes. Each edge has a condition:
- **On success** — follow this edge if the source node succeeded
- **On failure** — follow if the source failed (this is how you wire up fallback paths and error recovery)
- **Conditional** — follow if an expression is true (e.g., route high-confidence results one way, low-confidence results another)
- **LLM-decided** — let the LLM choose which path based on the [goal](./goals_outcome.md) and current context
Edges also handle data plumbing between nodes — mapping one node's outputs to another node's expected inputs, so each node has a clean interface without needing to know where its data came from.
When a node has multiple outgoing edges, the framework can run those branches in parallel and reconverge when they're all done. This is useful for tasks like researching a prospect from multiple sources simultaneously.
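
A hedged sketch of edge selection (the edge dicts and helper are hypothetical, and LLM-decided edges are omitted); note how multiple matched targets correspond to parallel branches:

```python
# Hypothetical edge shapes; not the framework's real data model.

def next_nodes(edges: list[dict], source_ok: bool, memory: dict) -> list[str]:
    chosen = []
    for e in edges:
        kind = e["condition"]
        if kind == "on_success" and source_ok:
            chosen.append(e["target"])
        elif kind == "on_failure" and not source_ok:
            chosen.append(e["target"])
        elif kind == "conditional" and e["predicate"](memory):
            chosen.append(e["target"])
    return chosen  # multiple targets -> branches may run in parallel

edges = [
    {"condition": "on_success", "target": "draft"},
    {"condition": "on_failure", "target": "research"},
    {"condition": "conditional", "target": "human_review",
     "predicate": lambda m: m.get("confidence", 1.0) < 0.5},
]
print(next_nodes(edges, source_ok=True, memory={"confidence": 0.3}))
# ['draft', 'human_review']
```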
## Shared Memory
Shared memory is how nodes communicate. It's a key-value store scoped to a single [session](./worker_agent.md). Every node declares which keys it reads and which it writes, and the framework enforces those boundaries — a node can't quietly access data it hasn't declared.
Data flows through the graph in a natural way: input arrives at the start, each node reads what it needs and writes what it produces, and edges map outputs to inputs as data moves between nodes. At the end, the full memory state is the execution result.
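
A small sketch of that enforcement, assuming a simple key-value store; this is illustrative, not the framework's actual shared-memory class:

```python
# Declared-key enforcement: a node can only touch keys it has declared.

class SharedMemory:
    def __init__(self) -> None:
        self._data: dict[str, object] = {}

    def read(self, node: str, key: str, declared: set[str]) -> object:
        if key not in declared:
            raise PermissionError(f"{node} did not declare read key '{key}'")
        return self._data[key]

    def write(self, node: str, key: str, value: object, declared: set[str]) -> None:
        if key not in declared:
            raise PermissionError(f"{node} did not declare write key '{key}'")
        self._data[key] = value

mem = SharedMemory()
mem.write("research", "profile", {"name": "Ada"}, declared={"profile"})
print(mem.read("draft", "profile", declared={"profile"}))
```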
## Human-in-the-Loop
Human-in-the-loop (HITL) nodes are where the agent pauses and asks a person for input. This isn't a blunt "stop everything" — the framework supports structured questions: open-ended text, multiple choice, yes/no approvals, and multi-field forms.
When the agent hits a HITL node, it saves its entire state and presents the questions. The session can sit paused for minutes, hours, or days. When the human responds, execution picks up exactly where it left off.
This is what makes Hive agents supervisable in production. You place HITL nodes at critical decision points — before sending a message, before making a purchase, before any action that's hard to undo. The agent handles the routine work autonomously; humans weigh in on the decisions that matter. And every time a human provides input, that decision becomes data the [evolution](./evolution.md) process can learn from.
## The Shape of an Agent
A typical agent graph looks something like this:
```
intake → research → draft → [human review] → send → done
            ↑                                 │
            └──── on failure ─────────────────┘
```
An entry node where work begins. A chain of nodes that do the real work. HITL nodes at approval gates. Failure edges that loop back for another attempt. Terminal nodes where execution ends.
The framework tracks everything as it walks the graph: which nodes ran, how many retries each needed, how much the LLM calls cost, how long each step took. This metadata feeds into the [worker agent runtime](./worker_agent.md) for monitoring and into the [evolution](./evolution.md) process for improvement.
+51
@@ -0,0 +1,51 @@
# The Worker Agent
## What a Worker Agent Is
A worker agent is a specialized AI agent built to perform a specific business process. It's not a general-purpose assistant — it's purpose-built, like hiring someone for a defined role. A sales outreach agent knows how to research prospects, craft personalized messages, and follow up. A support triage agent knows how to categorize tickets, pull customer context, and route to the right team.
In Hive, a **Coding Agent** (like Claude Code or Cursor) generates worker agents from a natural language goal description. You describe what you want the agent to do, and the coding agent produces the graph, nodes, edges, and configuration. The worker agent is the thing that actually runs.
## Sessions
A session is a single execution of a worker agent against a specific input. If your outreach agent processes 50 prospects, that's 50 sessions.
Each session is isolated — it has its own shared memory, its own execution state, and its own history. This matters because sessions can be long-running. An agent might start researching a prospect, pause for human approval, wait hours or days, and then resume to send the message. The session preserves everything across that gap.
Sessions also make debugging straightforward. Every decision the agent made, every tool it called, every retry it attempted — it's all captured in the session. When something goes wrong, you can trace exactly what happened.
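
One way to picture a session is as a single serializable record; the fields below are assumptions for illustration, not the framework's schema:

```python
import json
from dataclasses import dataclass, field

# Hypothetical session record; the point is that everything needed to
# resume lives in one isolated, serializable object.

@dataclass
class Session:
    session_id: str
    input_data: dict
    memory: dict = field(default_factory=dict)        # this session's shared memory
    history: list[str] = field(default_factory=list)  # decision/tool trace
    status: str = "running"                           # running | paused | done

    def pause(self, path: str) -> None:
        self.status = "paused"
        with open(path, "w") as f:
            json.dump(self.__dict__, f)               # survives hours or days

    @classmethod
    def resume(cls, path: str) -> "Session":
        with open(path) as f:
            state = json.load(f)
        state["status"] = "running"
        return cls(**state)                           # picks up where it left off
```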
## Iterations
Within a session, nodes (especially `event_loop` nodes) work in iterations. An iteration is one turn of the loop: the LLM reasons about the current state, possibly calls tools, observes results, and produces output. Then the judge evaluates: is this good enough?
If not, the node iterates again. The LLM sees what went wrong and adjusts its approach. This is how agents self-correct without human intervention — through rapid iteration within a single node, not by restarting the whole process.
Iterations have limits. You set a maximum per node to prevent runaway loops. If a node can't produce acceptable output within its iteration budget, it fails and the graph's error-handling edges take over.
## Headless Execution
A lot of business processes need to run continuously — monitoring inboxes, processing incoming leads, watching for events. These agents run **headless**: no UI, no human sitting at a terminal, just the agent doing its job in the background.
Headless doesn't mean unsupervised. HITL (human-in-the-loop) nodes still pause execution and wait for human input when the agent hits a decision it shouldn't make alone. The difference is that instead of a live conversation, the agent sends a notification, waits for a response through whatever channel you've configured, and resumes when the human weighs in.
This is the operational model Hive is designed for: agents that run 24/7 as part of your business infrastructure, with humans stepping in only when needed. The goal is to automate the routine and escalate the exceptions.
## The Runtime
The worker agent runtime manages the lifecycle: starting sessions, executing the graph, handling pauses and resumes, tracking costs, and collecting metrics. It coordinates everything the agent needs — LLM access, tool execution, shared memory, credential management — so individual nodes can focus on their specific job.
Key things the runtime handles:
**Cost tracking** — Every LLM call is metered. You set budget constraints on the goal, and the runtime enforces them. An agent can't silently burn through your API credits.
**Decision logging** — Every meaningful choice the agent makes is recorded: what it was trying to do, what options it considered, what it chose, and what happened. This isn't just for debugging — it's the raw material that evolution uses to improve future generations.
**Event streaming** — The runtime emits events as the agent works. You can wire these up to dashboards, logs, or alerting systems to monitor agents in real time.
**Crash recovery** — If execution is interrupted (process crash, deployment, anything), the runtime can resume from the last checkpoint. Conversation state and memory are persisted, so the agent picks up where it left off rather than starting over.
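
As a concrete example of one of these responsibilities, here is a small budget-meter sketch for the cost-tracking side; the per-token prices and the exception-based stop are assumptions, not framework behavior:

```python
# Hedged sketch of budget enforcement during a run.

class CostMeter:
    def __init__(self, budget_usd: float) -> None:
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               usd_per_1k_in: float = 0.003, usd_per_1k_out: float = 0.015) -> None:
        self.spent_usd += input_tokens / 1000 * usd_per_1k_in
        self.spent_usd += output_tokens / 1000 * usd_per_1k_out
        if self.spent_usd > self.budget_usd:
            raise RuntimeError(f"budget exceeded: ${self.spent_usd:.2f} spent")

meter = CostMeter(budget_usd=5.00)
meter.charge(input_tokens=1200, output_tokens=800)  # metered per LLM call
print(f"${meter.spent_usd:.4f} of ${meter.budget_usd:.2f}")
```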
## The Big Picture
The worker agent model is Hive's answer to a simple question: how do you run AI agents like you'd run a team?
You hire for a role (define the goal), you onboard them with context (provide tools, credentials, domain knowledge), you set expectations (success criteria and constraints), you let them work independently (headless execution), and you check in when something unusual comes up (HITL). When they're not performing well, you don't debug them line by line — you evolve them (see [Evolution](./evolution.md)).
+30 -1
@@ -1,4 +1,27 @@
# TUI Text Selection and Copy Guide
# TUI Dashboard Guide
## Launching the TUI
There are two ways to launch the TUI dashboard:
```bash
# Browse and select an agent interactively
hive tui
# Launch the TUI for a specific agent
hive run exports/my_agent --tui
```
`hive tui` scans both `exports/` and `examples/templates/` for available agents, then presents a selection menu.
## Dashboard Panels
The TUI dashboard is divided into four areas:
- **Status Bar** - Shows the current agent name, execution state, and model in use
- **Graph Overview** - Live visualization of the agent's node graph with highlighted active node
- **Log Pane** - Scrollable event log streaming node transitions, LLM calls, and tool outputs
- **Chat REPL** - Input area for interacting with client-facing nodes (`ask_user()` prompts appear here)
## Keybindings
@@ -28,3 +51,9 @@ The log pane uses `auto_scroll=False`. New output only scrolls to the bottom whe
## Screenshots
`Ctrl+S` saves an SVG screenshot to the `screenshots/` directory with a timestamped filename. Open the SVG in any browser to view it.
## Tips
- Launch a specific agent straight into the dashboard with `hive run exports/my_agent --tui`
- Override the default model with `--model`: `hive run exports/my_agent --model gpt-4o`
- Screenshots are saved as SVG files to `screenshots/` and can be opened in any browser
@@ -34,18 +34,17 @@ def cli():
@cli.command()
@click.option("--topic", "-t", type=str, required=True, help="Research topic")
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--quiet", "-q", is_flag=True, help="Only output result JSON")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def run(topic, mock, quiet, verbose, debug):
def run(topic, quiet, verbose, debug):
"""Execute research on a topic."""
if not quiet:
setup_logging(verbose=verbose, debug=debug)
context = {"topic": topic}
result = asyncio.run(default_agent.run(context, mock_mode=mock))
result = asyncio.run(default_agent.run(context))
output_data = {
"success": result.success,
@@ -60,10 +59,9 @@ def run(topic, mock, quiet, verbose, debug):
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def tui(mock, verbose, debug):
def tui(verbose, debug):
"""Launch the TUI dashboard for interactive research."""
setup_logging(verbose=verbose, debug=debug)
@@ -97,13 +95,11 @@ def tui(mock, verbose, debug):
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock:
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
tools = list(agent._tool_registry.get_tools().values())
tool_executor = agent._tool_registry.get_executor()
+10 -12
@@ -173,7 +173,7 @@ class DeepResearchAgent:
},
)
def _setup(self, mock_mode=False) -> GraphExecutor:
def _setup(self) -> GraphExecutor:
"""Set up the executor with all components."""
from pathlib import Path
@@ -187,13 +187,11 @@ class DeepResearchAgent:
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock_mode:
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
@@ -213,10 +211,10 @@ class DeepResearchAgent:
return self._executor
async def start(self, mock_mode=False) -> None:
async def start(self) -> None:
"""Set up the agent (initialize executor and tools)."""
if self._executor is None:
self._setup(mock_mode=mock_mode)
self._setup()
async def stop(self) -> None:
"""Clean up resources."""
@@ -244,10 +242,10 @@ class DeepResearchAgent:
)
async def run(
self, context: dict, mock_mode=False, session_state=None
self, context: dict, session_state=None
) -> ExecutionResult:
"""Run the agent (convenience method for single execution)."""
await self.start(mock_mode=mock_mode)
await self.start()
try:
result = await self.trigger_and_wait(
"start", context, session_state=session_state
@@ -1,8 +1,8 @@
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, and write_to_file"
}
@@ -33,18 +33,17 @@ def cli():
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--quiet", "-q", is_flag=True, help="Only output result JSON")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def run(mock, quiet, verbose, debug):
def run(quiet, verbose, debug):
"""Execute the news reporter agent."""
if not quiet:
setup_logging(verbose=verbose, debug=debug)
context = {}
result = asyncio.run(default_agent.run(context, mock_mode=mock))
result = asyncio.run(default_agent.run(context))
output_data = {
"success": result.success,
@@ -59,10 +58,9 @@ def run(mock, quiet, verbose, debug):
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def tui(mock, verbose, debug):
def tui(verbose, debug):
"""Launch the TUI dashboard for interactive news reporting."""
setup_logging(verbose=verbose, debug=debug)
@@ -95,13 +93,11 @@ def tui(mock, verbose, debug):
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock:
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
tools = list(agent._tool_registry.get_tools().values())
tool_executor = agent._tool_registry.get_executor()
+10 -12
@@ -157,7 +157,7 @@ class TechNewsReporterAgent:
},
)
def _setup(self, mock_mode=False) -> GraphExecutor:
def _setup(self) -> GraphExecutor:
"""Set up the executor with all components."""
from pathlib import Path
@@ -171,13 +171,11 @@ class TechNewsReporterAgent:
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock_mode:
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
@@ -197,10 +195,10 @@ class TechNewsReporterAgent:
return self._executor
async def start(self, mock_mode=False) -> None:
async def start(self) -> None:
"""Set up the agent (initialize executor and tools)."""
if self._executor is None:
self._setup(mock_mode=mock_mode)
self._setup()
async def stop(self) -> None:
"""Clean up resources."""
@@ -228,10 +226,10 @@ class TechNewsReporterAgent:
)
async def run(
self, context: dict, mock_mode=False, session_state=None
self, context: dict, session_state=None
) -> ExecutionResult:
"""Run the agent (convenience method for single execution)."""
await self.start(mock_mode=mock_mode)
await self.start()
try:
result = await self.trigger_and_wait(
"start", context, session_state=session_state
@@ -1,8 +1,8 @@
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, save_data, and serve_file_to_user"
}
@@ -18,8 +18,8 @@ PYTHONPATH=core:exports uv run python -m twitter_outreach validate
# Show agent info
PYTHONPATH=core:exports uv run python -m twitter_outreach info
# Run in mock mode (no API calls)
PYTHONPATH=core:exports uv run python -m twitter_outreach run --mock
# Run the workflow
PYTHONPATH=core:exports uv run python -m twitter_outreach run
# Launch the TUI
PYTHONPATH=core:exports uv run python -m twitter_outreach tui
@@ -33,16 +33,15 @@ def cli():
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--quiet", "-q", is_flag=True, help="Only output result JSON")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def run(mock, quiet, verbose, debug):
def run(quiet, verbose, debug):
"""Execute the outreach workflow."""
if not quiet:
setup_logging(verbose=verbose, debug=debug)
result = asyncio.run(default_agent.run({}, mock_mode=mock))
result = asyncio.run(default_agent.run({}))
output_data = {
"success": result.success,
@@ -57,10 +56,9 @@ def run(mock, quiet, verbose, debug):
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def tui(mock, verbose, debug):
def tui(verbose, debug):
"""Launch the TUI dashboard for interactive outreach."""
setup_logging(verbose=verbose, debug=debug)
@@ -93,13 +91,11 @@ def tui(mock, verbose, debug):
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock:
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
tools = list(agent._tool_registry.get_tools().values())
tool_executor = agent._tool_registry.get_executor()
+10 -12
@@ -172,7 +172,7 @@ class TwitterOutreachAgent:
},
)
def _setup(self, mock_mode=False) -> GraphExecutor:
def _setup(self) -> GraphExecutor:
"""Set up the executor with all components."""
from pathlib import Path
@@ -186,13 +186,11 @@ class TwitterOutreachAgent:
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock_mode:
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
@@ -212,10 +210,10 @@ class TwitterOutreachAgent:
return self._executor
async def start(self, mock_mode=False) -> None:
async def start(self) -> None:
"""Set up the agent (initialize executor and tools)."""
if self._executor is None:
self._setup(mock_mode=mock_mode)
self._setup()
async def stop(self) -> None:
"""Clean up resources."""
@@ -243,10 +241,10 @@ class TwitterOutreachAgent:
)
async def run(
self, context: dict, mock_mode=False, session_state=None
self, context: dict, session_state=None
) -> ExecutionResult:
"""Run the agent (convenience method for single execution)."""
await self.start(mock_mode=mock_mode)
await self.start()
try:
result = await self.trigger_and_wait(
"start", context, session_state=session_state
@@ -1,8 +1,8 @@
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, and send_email"
}
@@ -353,7 +353,18 @@ class CredentialStoreAdapter:
cls,
specs: dict[str, CredentialSpec] | None = None,
) -> CredentialStoreAdapter:
"""Create adapter with encrypted storage primary and env var fallback."""
"""Create adapter with encrypted storage primary and env var fallback.
When ADEN_API_KEY is set, builds the store with AdenSyncProvider and
AdenCachedStorage so that OAuth credentials (Google, HubSpot, Slack)
auto-refresh via the Aden server. Non-Aden credentials (brave_search,
anthropic, resend) still resolve from environment variables.
When ADEN_API_KEY is not set, behaves identically to before.
"""
import logging
import os
from framework.credentials import CredentialStore
from framework.credentials.storage import (
CompositeStorage,
@@ -361,6 +372,8 @@ class CredentialStoreAdapter:
EnvVarStorage,
)
log = logging.getLogger(__name__)
if specs is None:
from . import CREDENTIAL_SPECS
@@ -368,17 +381,69 @@ class CredentialStoreAdapter:
env_mapping = {name: spec.env_var for name, spec in specs.items()}
# --- Aden sync branch ---
# Note: we don't use CredentialStore.with_aden_sync() here because it
# only wraps EncryptedFileStorage. We need CompositeStorage (encrypted
# + env var fallback) so non-Aden credentials like brave_search still
# resolve from environment variables.
aden_api_key = os.environ.get("ADEN_API_KEY")
if aden_api_key:
try:
from framework.credentials.aden import (
AdenCachedStorage,
AdenClientConfig,
AdenCredentialClient,
AdenSyncProvider,
)
# Local storage: encrypted primary + env var fallback
encrypted = EncryptedFileStorage()
env = EnvVarStorage(env_mapping)
local_composite = CompositeStorage(primary=encrypted, fallbacks=[env])
# Aden components
client = AdenCredentialClient(
AdenClientConfig(
base_url=os.environ.get("ADEN_API_URL", "https://api.adenhq.com"),
)
)
provider = AdenSyncProvider(client=client)
# AdenCachedStorage wraps composite, giving Aden priority
cached_storage = AdenCachedStorage(
local_storage=local_composite,
aden_provider=provider,
cache_ttl_seconds=300,
)
store = CredentialStore(
storage=cached_storage,
providers=[provider],
auto_refresh=True,
)
# Initial sync: populate local cache from Aden
try:
synced = provider.sync_all(store)
log.info("Aden credential sync complete: %d credentials synced", synced)
except Exception as e:
log.warning("Aden initial sync failed (will retry on access): %s", e)
return cls(store=store, specs=specs)
except Exception as e:
log.warning(
"Aden credential sync unavailable, falling back to default storage: %s", e
)
# --- Default branch (no ADEN_API_KEY or Aden setup failed) ---
try:
encrypted = EncryptedFileStorage()
env = EnvVarStorage(env_mapping)
composite = CompositeStorage(primary=encrypted, fallbacks=[env])
store = CredentialStore(storage=composite)
except Exception as e:
import logging
logging.getLogger(__name__).warning(
"Encrypted credential storage unavailable, falling back to env vars: %s", e
)
log.warning("Encrypted credential storage unavailable, falling back to env vars: %s", e)
store = CredentialStore.with_env_storage(env_mapping)
return cls(store=store, specs=specs)
+129
@@ -1,5 +1,7 @@
"""Tests for CredentialStoreAdapter."""
from unittest.mock import MagicMock, patch
import pytest
from aden_tools.credentials import (
@@ -484,3 +486,130 @@ class TestSpecCompleteness:
assert spec.credential_group == "", (
f"Credential '{name}' has unexpected credential_group='{spec.credential_group}'"
)
class TestCredentialStoreAdapterAdenSync:
"""Tests for Aden sync branch in CredentialStoreAdapter.default()."""
def _patch_encrypted_storage(self, tmp_path):
"""Patch EncryptedFileStorage to use a temp directory."""
from framework.credentials.storage import EncryptedFileStorage
original_init = EncryptedFileStorage.__init__
def patched_init(self_inner, base_path=None, **kwargs):
original_init(self_inner, base_path=str(tmp_path / "creds"), **kwargs)
return patch.object(EncryptedFileStorage, "__init__", patched_init)
def test_default_with_aden_key_creates_aden_store(self, monkeypatch, tmp_path):
"""When ADEN_API_KEY is set, default() wires up AdenSyncProvider."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
mock_client = MagicMock()
mock_client.list_integrations.return_value = []
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
# Verify AdenSyncProvider is registered
provider = adapter.store.get_provider("aden_sync")
assert provider is not None
def test_default_without_aden_key_uses_env_fallback(self, monkeypatch, tmp_path):
"""When ADEN_API_KEY is not set, default() uses env-only storage."""
monkeypatch.delenv("ADEN_API_KEY", raising=False)
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-brave-key")
with self._patch_encrypted_storage(tmp_path):
adapter = CredentialStoreAdapter.default()
# No Aden provider should be registered
assert adapter.store.get_provider("aden_sync") is None
# Env vars still work
assert adapter.get("brave_search") == "test-brave-key"
def test_default_aden_non_aden_cred_falls_through_to_env(self, monkeypatch, tmp_path):
"""Non-Aden credentials (e.g. brave_search) resolve from env vars even with Aden."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-from-env")
mock_client = MagicMock()
mock_client.list_integrations.return_value = []
# Aden returns None for brave_search (404 → None)
mock_client.get_credential.return_value = None
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
assert adapter.get("brave_search") == "brave-from-env"
def test_default_aden_sync_failure_falls_back_gracefully(self, monkeypatch, tmp_path):
"""If Aden initial sync fails, adapter is still created and env vars work."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-fallback")
mock_client = MagicMock()
mock_client.list_integrations.side_effect = Exception("Connection refused")
mock_client.get_credential.return_value = None
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
# Adapter was created despite sync failure
assert adapter is not None
assert adapter.get("brave_search") == "brave-fallback"
def test_default_aden_import_error_falls_back(self, monkeypatch, tmp_path):
"""If Aden imports fail (e.g. missing httpx), fall back to default storage."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-fallback")
import builtins
real_import = builtins.__import__
def mock_import(name, *args, **kwargs):
if name == "framework.credentials.aden":
raise ImportError(f"No module named '{name}'")
return real_import(name, *args, **kwargs)
with (
self._patch_encrypted_storage(tmp_path),
patch.object(builtins, "__import__", side_effect=mock_import),
):
adapter = CredentialStoreAdapter.default()
# Fell back to default — env vars still work, no Aden provider
assert adapter.store.get_provider("aden_sync") is None
assert adapter.get("brave_search") == "brave-fallback"