docs: add instruction for running dummy agents and remove old documentation
@@ -8,11 +8,12 @@ This guide covers everything you need to know to develop with the Aden Agent Fra
2. [Initial Setup](#initial-setup)
3. [Project Structure](#project-structure)
4. [Building Agents](#building-agents)
5. [Running Agents](#running-agents)
6. [Testing Agents](#testing-agents)
7. [Code Style & Conventions](#code-style--conventions)
8. [Git Workflow](#git-workflow)
9. [Common Tasks](#common-tasks)
10. [Troubleshooting](#troubleshooting)

---
@@ -40,121 +41,22 @@ Aden Agent Framework is a Python-based system for building goal-driven, self-imp
## Initial Setup

### Prerequisites

See [environment-setup.md](./environment-setup.md) for the full setup guide, including Windows, Alpine Linux, and troubleshooting.

Ensure you have installed:

- **Python 3.11+** - [Download](https://www.python.org/downloads/) (3.12 or 3.13 recommended)
- **uv** - Python package manager ([Install](https://docs.astral.sh/uv/getting-started/installation/))
- **git** - Version control
- **Claude Code** - [Install](https://docs.anthropic.com/claude/docs/claude-code) (optional)
- **Codex CLI** - [Install](https://github.com/openai/codex) (optional)

Verify installation:

```bash
python --version  # Should be 3.11+
uv --version      # Should be latest
git --version     # Any recent version
```

### Quick Start

```bash
# 1. Clone the repository
git clone https://github.com/adenhq/hive.git
cd hive

# 2. Run automated setup
./quickstart.sh
```

The setup script performs these actions:

1. Checks the Python version (3.11+)
2. Installs the `framework` package from `/core` (editable mode)
3. Installs the `aden_tools` package from `/tools` (editable mode)
4. Prompts for a default LLM provider, including Hive LLM and OpenRouter
5. Fixes package compatibility (upgrades openai for litellm)
6. Verifies all installations

### API Keys (Optional)

For running agents with real LLMs:

```bash
# Add to your shell profile (~/.bashrc, ~/.zshrc, etc.)
export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"         # Optional
export OPENROUTER_API_KEY="your-key-here"     # Optional, for OpenRouter models
export HIVE_API_KEY="your-key-here"           # Optional, for Hive LLM
export BRAVE_SEARCH_API_KEY="your-key-here"   # Optional, for web search tool
```
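If you are unsure which keys your shell actually exports, a quick check from Python helps — a minimal sketch (the variable names match the list above; the helper itself is illustrative, not part of the framework):

```python
import os

# Provider keys documented above; BRAVE_SEARCH_API_KEY is tool-specific.
PROVIDER_KEYS = [
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
    "HIVE_API_KEY",
]

def available_providers(env=None):
    """Return the provider key names that are set and non-empty."""
    env = os.environ if env is None else env
    return [k for k in PROVIDER_KEYS if env.get(k, "").strip()]

if __name__ == "__main__":
    found = available_providers()
    print(f"Configured providers: {found or 'none'}")
```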
|
||||
|
||||
Get API keys:

- **Anthropic**: [console.anthropic.com](https://console.anthropic.com/)
- **OpenAI**: [platform.openai.com](https://platform.openai.com/)
- **OpenRouter**: [openrouter.ai/keys](https://openrouter.ai/keys)
- **Hive LLM**: [Hive Discord](https://discord.com/invite/hQdU7QDkgR)
- **Brave Search**: [brave.com/search/api](https://brave.com/search/api/)

For OpenRouter and Hive LLM configuration snippets, see [configuration.md](./configuration.md).

### Install Claude Code Skills

```bash
# Install the building-agents and testing-agent skills
./quickstart.sh
```

This sets up the MCP tools and workflows for building agents.

### Cursor IDE Support

MCP tools are also available in Cursor. To enable them:

1. Open the Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
2. Run `MCP: Enable` to enable MCP servers
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
4. Open Agent chat and verify the MCP tools are available

### Codex CLI Support

Hive supports [OpenAI Codex CLI](https://github.com/openai/codex) (v0.101.0+).

Configuration files are tracked in git:

- `.codex/config.toml` — MCP server config

To use Codex with Hive:

1. Run `codex` in the repo root
2. Start the configured MCP-assisted workflow

### Opencode Support

To enable Opencode integration:

1. Ensure the `.opencode/` directory exists (create it if needed)
2. Configure MCP servers in `.opencode/mcp.json`
3. Restart Opencode to load the MCP servers
4. Switch to the Hive agent

**Tools:** The Hive agent accesses `coder-tools` and the standard `tools` server via the MCP protocol over stdio.

### Verify Setup

```bash
# Verify package imports
uv run python -c "import framework; print('✓ framework OK')"
uv run python -c "import aden_tools; print('✓ aden_tools OK')"
uv run python -c "import litellm; print('✓ litellm OK')"

# Run an agent (after building one with coder-tools)
PYTHONPATH=exports uv run python -m your_agent_name validate
```

---
@@ -181,23 +83,29 @@ hive/ # Repository root
│
├── core/                          # CORE FRAMEWORK PACKAGE
│   ├── framework/                 # Main package code
│   │   ├── agents/                # Agent definitions and helpers
│   │   ├── builder/               # Agent builder utilities
│   │   ├── credentials/           # Credential management
│   │   ├── debugger/              # Debugging tools
│   │   ├── graph/                 # GraphExecutor - executes node graphs
│   │   ├── llm/                   # LLM provider integrations (Anthropic, OpenAI, OpenRouter, Hive, etc.)
│   │   ├── mcp/                   # MCP server integration
│   │   ├── monitoring/            # Runtime monitoring
│   │   ├── observability/         # Structured logging - human-readable and machine-parseable tracing
│   │   ├── runner/                # AgentRunner - loads and runs agents
│   │   ├── runtime/               # Runtime environment
│   │   ├── schemas/               # Data schemas
│   │   ├── server/                # HTTP API server
│   │   ├── skills/                # Skill definitions
│   │   ├── storage/               # File-based persistence
│   │   ├── testing/               # Testing utilities
│   │   ├── tools/                 # Built-in tool implementations
│   │   ├── tui/                   # Terminal UI dashboard
│   │   ├── utils/                 # Shared utilities
│   │   └── __init__.py
│   ├── tests/                     # Unit and E2E tests (including dummy agents)
│   ├── pyproject.toml             # Package metadata and dependencies
│   ├── README.md                  # Framework documentation
│   └── docs/                      # Protocol documentation
│       └── MCP_INTEGRATION_GUIDE.md  # MCP server integration guide
│
├── tools/                         # TOOLS PACKAGE (MCP tools)
│   ├── src/
@@ -320,7 +228,11 @@ If you prefer to build agents manually:
}
```

---

## Running Agents

### Using the `hive` CLI

```bash
# Browse and run agents interactively (Recommended)
hive tui

# Run a specific agent
hive run exports/my_agent --input '{"task": "Your input here"}'

# Run with TUI dashboard
hive run exports/my_agent --tui
```

### CLI Command Reference

| Command                | Description                                                              |
| ---------------------- | ------------------------------------------------------------------------ |
| `hive tui`             | Browse agents and launch TUI dashboard                                   |
| `hive run <path>`      | Execute an agent (`--tui`, `--model`, `--mock`, `--quiet`, `--verbose`)  |
| `hive shell [path]`    | Interactive REPL (`--multi`, `--no-approve`)                             |
| `hive info <path>`     | Show agent details                                                       |
| `hive validate <path>` | Validate agent structure                                                 |
| `hive list [dir]`      | List available agents                                                    |
| `hive dispatch [dir]`  | Multi-agent orchestration                                                |

### Using Python Directly

```bash
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
```

---
## Testing Agents

### Agent Tests

```bash
# Run tests for an agent
PYTHONPATH=exports uv run python -m agent_name test
```

This generates and runs:

- **Constraint tests** - Verify the agent respects its constraints
- **Success tests** - Verify the agent achieves its success criteria
- **Integration tests** - End-to-end workflows

### Manual Testing

```bash
# Run all tests for an agent
PYTHONPATH=exports uv run python -m agent_name test

# Run a specific test type
PYTHONPATH=exports uv run python -m agent_name test --type constraint

# Run tests in parallel
PYTHONPATH=exports uv run python -m agent_name test --parallel 4

# Stop on the first failure
PYTHONPATH=exports uv run python -m agent_name test --fail-fast
```

### Framework Tests

```bash
# Run all unit tests (core + tools)
make test

# Run linting and format checks
make check
```

### Dummy Agent Tests (E2E)

The repository includes end-to-end dummy agent tests under `core/tests/dummy_agents/` that run real LLM calls against deterministic graph structures. These are **not** part of CI — run them manually to verify the executor works with real providers.

```bash
cd core && uv run python tests/dummy_agents/run_all.py
```

The script detects available LLM credentials and prompts you to pick a provider. For verbose output:

```bash
cd core && uv run python tests/dummy_agents/run_all.py --verbose
```

See [environment-setup.md](./environment-setup.md#testing-with-dummy-agents) for the full list of covered agents and details.

### Writing Custom Tests
@@ -542,8 +482,6 @@ chore(deps): update React to 18.2.0
---

## Common Tasks

### Adding Python Dependencies
@@ -660,30 +598,7 @@ hive run exports/my_agent --verbose --input '{"task": "..."}'
## Troubleshooting

### Port Already in Use

```bash
# Find the process using the port
lsof -i :3000
lsof -i :4000

# Kill the process
kill -9 <PID>
```

### Environment Variables Not Loading

```bash
# Verify the .env file exists at the project root
cat .env

# Or check the shell environment
echo $ANTHROPIC_API_KEY

# Create .env if needed, then add your API keys
```

See [environment-setup.md](./environment-setup.md#troubleshooting) for common setup issues (module-not-found errors, broken installations, PEP 668, etc.).

---
@@ -693,7 +608,3 @@ echo $ANTHROPIC_API_KEY
- **Issues**: Search [existing issues](https://github.com/adenhq/hive/issues)
- **Discord**: Join our [community](https://discord.com/invite/MXE49hrKDk)
- **Code Review**: Tag a maintainer on your PR

---

_Happy coding!_ 🐝
@@ -66,40 +66,6 @@ source .venv/bin/activate
./quickstart.sh
```

## Manual Setup (Alternative)

If you prefer to set up manually, or the script fails:

### 1. Sync Workspace Dependencies

```bash
# From the repository root - this creates a single .venv at the root
uv sync
```

> **Note:** The `uv sync` command uses the workspace configuration in `pyproject.toml` to install both the `core` (framework) and `tools` (aden_tools) packages together. This is the recommended approach over individual `pip install -e` commands, which may fail due to circular dependencies.

### 2. Activate the Virtual Environment

```bash
# Linux/macOS
source .venv/bin/activate

# Windows (PowerShell)
.venv\Scripts\Activate.ps1
```

### 3. Verify Installation

```bash
uv run python -c "import framework; print('✓ framework OK')"
uv run python -c "import aden_tools; print('✓ aden_tools OK')"
uv run python -c "import litellm; print('✓ litellm OK')"
```

> **Windows Tip:** If the verification commands fail on Windows, disable "App Execution Aliases" in Windows Settings → Apps → App Execution Aliases.

## Requirements

### Python Version
@@ -119,47 +85,6 @@ uv run python -c "import litellm; print('✓ litellm OK')"
We recommend using `quickstart.sh` for LLM API credential setup, and the credentials UI/tooling for tool credentials.

## Running Agents

The `hive` CLI is the primary interface for running agents:

```bash
# Browse and run agents interactively (Recommended)
hive tui

# Run a specific agent
hive run exports/my_agent --input '{"task": "Your input here"}'

# Run with TUI dashboard
hive run exports/my_agent --tui
```

### CLI Command Reference

| Command                | Description                                                              |
| ---------------------- | ------------------------------------------------------------------------ |
| `hive tui`             | Browse agents and launch TUI dashboard                                   |
| `hive run <path>`      | Execute an agent (`--tui`, `--model`, `--mock`, `--quiet`, `--verbose`)  |
| `hive shell [path]`    | Interactive REPL (`--multi`, `--no-approve`)                             |
| `hive info <path>`     | Show agent details                                                       |
| `hive validate <path>` | Validate agent structure                                                 |
| `hive list [dir]`      | List available agents                                                    |
| `hive dispatch [dir]`  | Multi-agent orchestration                                                |

### Using Python Directly (Alternative)

```bash
# From the hive/ repository root
PYTHONPATH=exports uv run python -m agent_name COMMAND
```

Windows (PowerShell):

```powershell
$env:PYTHONPATH="core;exports"
python -m agent_name COMMAND
```

## Building and Running New Agents

Build and run an agent using the Claude Code CLI with the agent-building skills:
@@ -454,30 +379,53 @@ hive tui
hive run exports/your_agent_name --input '{"task": "..."}'
```

## IDE Setup

### VSCode

Add to `.vscode/settings.json`:

```json
{
  "python.analysis.extraPaths": [
    "${workspaceFolder}/core",
    "${workspaceFolder}/exports"
  ],
  "python.autoComplete.extraPaths": [
    "${workspaceFolder}/core",
    "${workspaceFolder}/exports"
  ]
}
```

### PyCharm

1. Open Project Settings → Project Structure
2. Mark `core` as a Sources Root
3. Mark `exports` as a Sources Root

## Testing with Dummy Agents

The repository includes a suite of dummy agents under `core/tests/dummy_agents/` for end-to-end testing against real LLM providers. These are **not** part of CI — they make real API calls and are meant to be run manually to verify the executor works correctly.

### Running the Tests

```bash
cd core && uv run python tests/dummy_agents/run_all.py
```

The script auto-detects available LLM credentials and prompts you to pick a provider. You need at least one of:

- `ANTHROPIC_API_KEY`
- `OPENAI_API_KEY`
- `GEMINI_API_KEY`
- `ZAI_API_KEY`
- A Claude Code, Codex, or Kimi subscription

For verbose output with live LLM logs, tool calls, and node-traversal details:

```bash
cd core && uv run python tests/dummy_agents/run_all.py --verbose
```

### What's Covered

| Agent          | Tests | Coverage                                                                     |
| -------------- | ----- | ----------------------------------------------------------------------------- |
| echo           | 2     | Single-node lifecycle, basic `set_output`                                      |
| pipeline       | 4     | Multi-node traversal, `input_mapping`, conversation modes                      |
| branch         | 3     | Conditional edges, LLM-driven routing                                          |
| parallel_merge | 4     | Fan-out/fan-in, failure strategies                                             |
| retry          | 4     | Retry mechanics, exhaustion, `ON_FAILURE` edges                                |
| feedback_loop  | 3     | Feedback cycles, `max_node_visits`                                             |
| worker         | 4     | Real MCP tools (`example_tool`, `get_current_time`, `save_data`/`load_data`)   |

Typical runtime is 1–3 minutes, depending on provider latency.

### Running Individual Test Files

You can also run a specific dummy agent test with pytest directly:

```bash
cd core && uv run pytest tests/dummy_agents/test_echo.py -v
```

> **Note:** Individual pytest runs require the LLM provider to be configured via the `conftest.py` fixture. The `run_all.py` script handles this automatically.

## Environment Variables
@@ -501,54 +449,6 @@ export HIVE_CREDENTIAL_KEY="your-fernet-key"
export AGENT_STORAGE_PATH="/custom/storage"
```

## Opencode Setup

[Opencode](https://github.com/opencode-ai/opencode) is fully supported as a coding agent.

### Automatic Setup

Run the quickstart script in the root directory:

```bash
./quickstart.sh
```

## Codex Setup

[OpenAI Codex CLI](https://github.com/openai/codex) (v0.101.0+) is supported with project-level config:

- `.codex/config.toml` — MCP server configuration

These files are tracked in git and available on clone. To use Codex with Hive:

1. Run `codex` in the repo root
2. Start the configured MCP-assisted workflow

Quick verification:

```bash
test -f .codex/config.toml && echo "OK: Codex config" || echo "MISSING: .codex/config.toml"
```

## Additional Resources

- **Framework Documentation:** [core/README.md](../core/README.md)
- **Tools Documentation:** [tools/README.md](../tools/README.md)
- **Example Agents:** [examples/](../examples/)
- **Agent Building Guide:** [docs/developer-guide.md](./developer-guide.md)
- **Testing Guide:** [core/README.md](../core/README.md)

## Contributing

When contributing agent packages:

1. Place agents in `exports/agent_name/`
2. Follow the standard agent structure (see existing agents)
3. Include a README.md with usage instructions
4. Add tests if using the test workflow
5. Document required environment variables

## Support

- **Issues:** https://github.com/adenhq/hive/issues
@@ -1,75 +0,0 @@
# Hive Queen Bee: Native agent-building agent

## Problem

Building a Hive agent today requires manual assembly of 7+ files (`agent.py`, `config.py`, `nodes/__init__.py`, `__init__.py`, `__main__.py`, `mcp_servers.json`, tests) with precise framework conventions — correct imports, entry_points format, conversation_mode values, STEP 1/STEP 2 prompt patterns, nullable_output_keys, and more. A single missing re-export in `__init__.py` silently breaks `AgentRunner.load()`. This is the #1 friction point for new users and a recurring source of bugs even for experienced ones.

There is no tool that understands the framework deeply enough to produce correct agents. General-purpose coding assistants hallucinate tool names, use wrong import paths (`from core.framework...`), create too many thin nodes, forget module-level exports, and produce agents that fail validation.

## Proposal

Build **Hive Coder** (codename "Queen Bee") — a framework-native coding agent that lives inside the framework itself and builds complete, validated agent packages from natural language.

### Design principles

1. **Single-node, forever-alive** — One continuous EventLoopNode conversation handles the full lifecycle (understand, qualify, design, implement, verify, iterate). No artificial phase boundaries that destroy context.

2. **Meta-agent capabilities** — Not just a file writer. It can discover available MCP tools at runtime, inspect sessions/checkpoints of agents it builds, run their test suites, and debug failures.

3. **Self-verifying** — Runs three validation steps after every build: class validation (graph structure), `AgentRunner.load()` (package export contract), and pytest. Fixes its own errors, up to 3 attempts.

4. **Honest qualification** — Assesses framework fit before building. If a use case is a poor fit (needs sub-second latency, pure CRUD, massive data pipelines), it says so instead of producing a bad agent.

5. **Reference-grounded** — Ships with embedded reference docs (framework guide, file templates, anti-patterns) that it reads before writing code. No reliance on training data for framework specifics.

### Components

#### `hive_coder` agent (`core/framework/agents/hive_coder/`)

| File | Purpose |
|------|---------|
| `agent.py` | Goal, single-node graph, `HiveCoderAgent` class |
| `nodes/__init__.py` | `coder` EventLoopNode with a comprehensive system prompt |
| `config.py` | RuntimeConfig with `~/.hive/configuration.json` auto-detection |
| `__main__.py` | Click CLI (`run`, `tui`, `info`, `validate`, `shell`) |
| `reference/framework_guide.md` | Node types, edges, patterns, async entry points |
| `reference/file_templates.md` | Complete code templates for every agent file |
| `reference/anti_patterns.md` | 22 common mistakes with explanations |

#### Coder Tools MCP Server (`tools/coder_tools_server.py`)

A dedicated tool server providing:

- **File I/O**: `read_file` (with line numbers, offset/limit), `write_file` (auto-mkdir), `edit_file` (9-strategy fuzzy matching ported from opencode), `list_directory`, `search_files` (regex)
- **Shell**: `run_command` (timeout, cwd, output truncation)
- **Git**: `undo_changes` (snapshot-based rollback)
- **Meta-agent**: `discover_mcp_tools`, `list_agents`, `list_agent_sessions`, `list_agent_checkpoints`, `get_agent_checkpoint`, `run_agent_tests`

All file operations are sandboxed to a configurable project root.
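The sandboxing check can be as simple as path resolution against the configured root — a sketch of the idea (not the server's actual code):

```python
from pathlib import Path

def resolve_sandboxed(root: Path, user_path: str) -> Path:
    """Resolve user_path inside root, rejecting escapes like '../' or absolute paths."""
    resolved_root = root.resolve()
    candidate = (resolved_root / user_path).resolve()
    if not candidate.is_relative_to(resolved_root):  # Python 3.9+
        raise PermissionError(f"{user_path!r} escapes the project root")
    return candidate
```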
#### Framework changes

- `hive code` CLI command — direct launch shortcut
- `hive tui` — discovers framework agents as a source
- `AgentRuntime` — cron expression support (`croniter`) for async entry points
- `prompt_composer` — appends the current datetime to system prompts
- `NodeSpec.max_node_visits` — default changed from 1 to 0 (unbounded), matching forever-alive as the standard pattern
- TUI graph view — cron display and hours in the countdown
- Graceful `CredentialError` handling in TUI launch

## Acceptance criteria

- [ ] `hive code` launches Hive Coder in the TUI
- [ ] `hive tui` lists framework agents alongside `exports/` and `examples/`
- [ ] Given "build me a research agent that searches the web and summarizes findings", Hive Coder produces a valid package in `exports/` that passes `AgentRunner.load()`
- [ ] Tool discovery works: the agent calls `discover_mcp_tools()` before designing and never fabricates tool names
- [ ] Self-verification: the agent runs all 3 validation steps and fixes errors before presenting
- [ ] Cron timers fire on schedule (unit tested)
- [ ] The `max_node_visits=0` default does not break existing agents or tests
- [ ] Reference docs are accurate and match current framework behavior

## Non-goals

- Multi-agent orchestration (queen spawning worker agents at runtime) — future work
- GUI/web interface — TUI only for v1
- Auto-publishing to a registry — agents are local packages
@@ -1,288 +0,0 @@
# Plan: Multi-Graph Sessions with Guardian Pattern

## Context

The target experience: hive_coder builds an agent (e.g., email automation), loads it into the same runtime session, and acts as its guardian. The email agent runs autonomously while hive_coder watches for failures. On error, hive_coder asks the user for help if they're around, attempts an autonomous fix if they're away, and escalates catastrophic failures for post-mortem.

This requires multiple agent graphs sharing a single `AgentRuntime` session — shared memory and data, but isolated conversations. The existing runtime already has most of the primitives: `ExecutionStream` accepts its own `graph`, `trigger_type="event"` subscribes entry points to the EventBus, and `_get_primary_session_state()` bridges memory across streams.

## Architecture Overview

```
AgentRuntime (shared EventBus, shared state.json, shared data/)
├── hive_coder graph
│   ├── Stream "default"  → coder node (client_facing, manual)
│   └── Stream "guardian" → guardian node (event-driven, subscribes to EXECUTION_FAILED)
└── email_agent graph
    └── Stream "email_agent::default" → intake node (client_facing, manual)
```

The guardian entry point on hive_coder fires when email_agent emits `EXECUTION_FAILED`. It receives the failure event in its input, reads shared memory for context, and decides: ask the user (if present), auto-fix (if away), or escalate (if catastrophic).

## Gap 1: Event Scoping — `graph_id` on Events

**Problem**: EventBus events carry `stream_id` and `node_id` but no `graph_id`. The guardian needs to subscribe to events from a specific graph (email_agent), not a specific stream name.

**Solution**: Add `graph_id: str | None = None` to `AgentEvent` and `filter_graph` to `Subscription`.

### `core/framework/runtime/event_bus.py`

- `AgentEvent` dataclass: add a `graph_id: str | None = None` field; include it in `to_dict()`
- `Subscription` dataclass: add `filter_graph: str | None = None`
- `subscribe()`: accept a `filter_graph` param, pass it to `Subscription`
- `_matches()`: check `filter_graph` against `event.graph_id`

### `core/framework/runtime/execution_stream.py`

- `__init__()`: accept `graph_id: str | None = None`, store as `self.graph_id`
- When emitting events via `_event_bus.publish()`: set `event.graph_id = self.graph_id`
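To make the filtering concrete, here is a minimal self-contained sketch of the proposed matching logic (simplified; the real `AgentEvent` and `Subscription` carry more fields):

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class AgentEvent:
    event_type: str
    stream_id: str | None = None
    node_id: str | None = None
    graph_id: str | None = None  # NEW: which graph emitted this event

@dataclass
class Subscription:
    event_type: str
    filter_graph: str | None = None  # NEW: only match events from this graph

def matches(sub: Subscription, event: AgentEvent) -> bool:
    """Simplified version of Subscription._matches() with graph scoping."""
    if sub.event_type != event.event_type:
        return False
    if sub.filter_graph is not None and event.graph_id != sub.filter_graph:
        return False
    return True
```

With this, the guardian subscribes with `filter_graph="email_agent"` and ignores failures coming from any other graph.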
## Gap 2: Multi-Graph Runtime — `add_graph()` / `remove_graph()`

**Problem**: `AgentRuntime.__init__` takes a single `GraphSpec`. We need to add and remove graphs dynamically at runtime.

**Solution**: Keep the primary graph on `__init__`. Add methods to register secondary graphs that create their own `ExecutionStream` instances backed by a different graph.

### `core/framework/runtime/agent_runtime.py`

New instance state:

```python
self._graph_id: str = graph_id or "primary"        # ID for the primary graph
self._graphs: dict[str, _GraphRegistration] = {}   # graph_id -> registration
self._active_graph_id: str = self._graph_id        # TUI focus
```

Where `_GraphRegistration` is a simple dataclass:

```python
@dataclass
class _GraphRegistration:
    graph: GraphSpec
    goal: Goal
    entry_points: dict[str, EntryPointSpec]
    streams: dict[str, ExecutionStream]
    storage_subpath: str            # relative to session root, e.g. "graphs/email_agent"
    event_subscriptions: list[str]  # EventBus subscription IDs
    timer_tasks: list[asyncio.Task]
```

New methods:

- `add_graph(graph_id, graph, goal, entry_points, storage_subpath=None)` — creates streams for the graph using graph-scoped storage, sets up event/timer triggers, and stamps `graph_id` on all streams. Can be called while running.
- `remove_graph(graph_id)` — stops streams, cancels timers, unsubscribes events, removes the registration. Cannot remove the primary graph.
- `list_graphs() -> list[str]` — returns all graph IDs
- `active_graph_id` property with setter — the TUI uses this to control which graph's events are displayed

Update existing methods:

- `start()`: stamp `self._graph_id` on primary graph streams (via `ExecutionStream.graph_id`)
- `inject_input(node_id, content)`: search the active graph's streams first, then all others
- `_get_primary_session_state()`: search across ALL graphs' streams (not just the primary's)
- `stop()`: stop all secondary graph streams/timers/subscriptions too
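The registration bookkeeping, reduced to its core (a sketch — the real methods also wire up streams, timers, and event subscriptions):

```python
class GraphRegistry:
    """Minimal model of AgentRuntime's multi-graph bookkeeping."""

    def __init__(self, primary_id="primary"):
        self._primary_id = primary_id
        self._graphs = {primary_id: {}}        # graph_id -> registration data
        self.active_graph_id = primary_id      # which graph the TUI displays

    def add_graph(self, graph_id, registration):
        if graph_id in self._graphs:
            raise ValueError(f"graph {graph_id!r} already registered")
        self._graphs[graph_id] = registration

    def remove_graph(self, graph_id):
        if graph_id == self._primary_id:
            raise ValueError("cannot remove the primary graph")
        del self._graphs[graph_id]

    def list_graphs(self):
        return list(self._graphs)
```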
### Storage Layout

```
~/.hive/agents/hive_coder/sessions/{session_id}/
  state.json               ← SHARED across all graphs
  data/                    ← SHARED data directory
  conversations/coder/     ← hive_coder conversations
  graphs/
    email_agent/           ← secondary graph storage root
      conversations/
        intake/
      checkpoints/
```

Secondary graph executors get `storage_path = {session_root}/graphs/{graph_id}/`, while `state.json` and `data/` remain at the session root. The `resume_session_id` mechanism in `_get_primary_session_state()` already handles this — secondary executions find the primary session's `state.json`.

**Concurrent state.json writes**: For the guardian pattern (sequential: email_agent fails → guardian triggers), no file lock is needed. But since both could technically write concurrently, add a simple `fcntl.flock()` wrapper around `_write_progress()` in the executor. A small, defensive change.
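A sketch of what that defensive wrapper might look like (Unix-only, since `fcntl` is POSIX; the function name mirrors `_write_progress` but is illustrative):

```python
import fcntl
import json
from pathlib import Path

def write_progress_locked(state_path: Path, state: dict) -> None:
    """Serialize state to state.json under an exclusive advisory lock."""
    with open(state_path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # blocks until any other writer releases
        try:
            f.seek(0)
            f.truncate()                # rewrite the file from scratch
            json.dump(state, f)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```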
## Gap 3: Guardian Pattern — User Presence + Autonomous Recovery

**Problem**: When email_agent fails, hive_coder's guardian entry point must decide: ask the user, or auto-fix.

**Solution**: User presence is a runtime-level signal. The guardian's system prompt and event data give it enough context to decide.

### User Presence Tracking

Add to `AgentRuntime`:

```python
self._last_user_input_time: float = 0.0  # monotonic timestamp
```

Updated in `inject_input()` (called whenever the user types in the TUI). Exposed as:

```python
@property
def user_idle_seconds(self) -> float:
    if self._last_user_input_time == 0:
        return float('inf')
    return time.monotonic() - self._last_user_input_time
```

The guardian node's system prompt instructs the LLM: "If user_idle_seconds < 120, ask the user for guidance via the client-facing interaction. If the user is away, attempt an autonomous fix."

This is NOT framework logic — it's prompt-driven. The guardian node is a regular `event_loop` node with `client_facing=True` and tools for code editing + agent lifecycle. The LLM decides the strategy based on presence info injected as context.
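The strategy the prompt encodes, written out as plain logic for clarity (in the actual design this decision is made by the LLM, not by framework code):

```python
def guardian_strategy(user_idle_seconds: float, catastrophic: bool = False) -> str:
    """Mirror of the prompt's decision rule: ask, fix, or escalate."""
    if catastrophic:
        return "escalate"    # save a post-mortem entry for the user
    if user_idle_seconds < 120:
        return "ask_user"    # user is present: request guidance
    return "auto_fix"        # user is away: attempt an autonomous repair
```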
|
||||
|
||||
### Escalation Model

Escalation = save a structured log entry. No special framework support needed. The guardian node uses `save_data("escalation_log.jsonl", ...)` via the existing data tools. The LLM writes:

```json
{"timestamp": "...", "severity": "catastrophic", "agent": "email_agent", "error": "...", "attempted_fixes": [...], "recommended_action": "..."}
```

Post-mortem: the user opens `/data escalation_log.jsonl`, or the TUI shows a notification linking to it.

## Gap 4: Graph Lifecycle Tools — Stop/Reload/Restart

**Problem**: hive_coder needs to programmatically stop a broken agent, fix its code, reload it, and restart it.

**Solution**: MCP tools accessible to the active agent. Uses `ContextVar` to access the runtime (same pattern as `data_dir`).

### `core/framework/tools/session_graph_tools.py` (NEW)

```python
async def load_agent(agent_path: str) -> str:
    """Load an agent graph into the running session."""


async def unload_agent(graph_id: str) -> str:
    """Stop and remove an agent graph from the session."""


async def start_agent(graph_id: str, entry_point: str = "default", input_data: str = "{}") -> str:
    """Trigger an entry point on a loaded agent graph."""


async def restart_agent(graph_id: str) -> str:
    """Unload and re-load an agent (picks up code changes)."""


async def list_agents() -> str:
    """List all agent graphs in the current session with their status."""


async def get_user_presence() -> str:
    """Return user idle time and presence status."""
```

These tools call `runtime.add_graph()`, `runtime.remove_graph()`, `runtime.trigger()`, etc.

### Registration

These tools are registered via `ToolRegistry` with `CONTEXT_PARAM` for `runtime` (injected by the executor, same as `data_dir`). Only available when the runtime is multi-graph capable (set by `cmd_code()`).

## Gap 5: TUI Integration — Graph Switching + Background Notifications

### `core/framework/tui/app.py`

- `_route_event()`: check `event.graph_id` against `runtime.active_graph_id`
  - Events from the active graph: route normally (streaming, chat, etc.)
  - `CLIENT_INPUT_REQUESTED` from a background graph: show notification bar
  - `EXECUTION_FAILED` from a background graph: show error notification
  - `EXECUTION_COMPLETED` from a background graph: show brief completion notice
  - Other background events: silent (visible in logs)
- `action_switch_graph(graph_id)`: update `runtime.active_graph_id`, refresh graph view, show header

### `core/framework/tui/widgets/chat_repl.py`

- Track `_input_graph_id: str | None` alongside `_input_node_id`
- `handle_input_requested(node_id, graph_id)`: if from a background graph, show a notification instead of enabling input
- `_submit_input()`: pass `graph_id` so `inject_input()` can route correctly
- New TUI commands:
  - `/graphs` — list loaded graphs and their status
  - `/graph <id>` — switch active graph focus
  - `/load <path>` — load an agent graph into the session
  - `/unload <id>` — remove a graph from the session
- On graph switch: flush streaming state, render graph header separator

### `core/framework/tui/widgets/graph_view.py`

- `switch_graph(graph_id)` — re-render the graph visualization for the new active graph
- When multi-graph is active: show a tab-like header listing all loaded graphs

## Gap 6: CLI + Runner Integration

### `core/framework/runner/cli.py`

- `cmd_code()` creates the hive_coder runtime with `graph_id="hive_coder"`
- Registers `session_graph_tools` with the tool config so hive_coder's LLM can call them
- Sets the `runtime._multi_graph_capable = True` flag

### `core/framework/runner/runner.py`

- New method: `setup_as_secondary(runtime, graph_id)` — configures this runner to join an existing `AgentRuntime` as a secondary graph. Uses the existing `AgentRunner.load()` to parse agent.json, then calls `runtime.add_graph()` with the parsed graph/goal/entry_points.

## Gap 7: Reliable Mid-Node Resume

**Problem**: When an EventLoopNode is interrupted (crash, Ctrl+Z, context switch), resume doesn't restore execution to exactly where it stopped. Several pieces of in-node state are lost, which changes behavior post-resume. In multi-graph sessions with parallel execution and frequent context switching, these gaps compound.

### What's already restored correctly

- **Conversation history**: all messages are persisted to disk immediately via `FileConversationStore._persist()` — one file per message in `parts/NNNNNNNNNN.json`
- **OutputAccumulator values**: written through to `cursor.json` on every `accumulator.set()` call
- **Iteration counter**: written to `cursor.json` at the end of each iteration (step 6g)
- **Orphaned tool calls**: `_repair_orphaned_tool_calls()` patches in-flight tool calls with error messages so the LLM knows to retry

### What's lost — and fixes

#### 1. `user_interaction_count` (CRITICAL)

Resets to 0 on resume. This counter controls client-facing blocking semantics: before the first interaction, `set_output`-only turns don't prevent blocking (the LLM must present to the user first). After resume, a node that had 3 user interactions behaves as if the user never interacted.

**Fix**: Persist `user_interaction_count` to `cursor.json` alongside `iteration` and `outputs`. Write it in `_write_cursor()` (step 6g), restore it in `_restore()`.

**Files**: `core/framework/graph/event_loop_node.py`

#### 2. Accumulator outputs not in SharedMemory

The `OutputAccumulator` writes to `cursor.json` (durable) but only writes to `SharedMemory` when the judge ACCEPTs. On crash, the CancelledError handler captures `memory.read_all()` — which doesn't include the accumulator's WIP values. On resume, edge conditions checking those memory keys see `None`.

**Fix**: In the executor's `CancelledError` handler, read the interrupted node's `cursor.json` and write any accumulator outputs to `memory` before building `session_state_out`. This ensures resume memory includes WIP output values.

**Files**: `core/framework/graph/executor.py` (CancelledError handler, ~line 1289)

#### 3. Stall/doom-loop detection counters

`recent_responses` and `recent_tool_fingerprints` reset to empty lists, so a previously near-stalled node gets a fresh detection budget.

**Fix**: Persist these to `cursor.json`. They're small (the last N strings). Write in `_write_cursor()`, restore in `_restore()`.

**Files**: `core/framework/graph/event_loop_node.py`

#### 4. `continuous_conversation` at executor level

In continuous mode, the executor's `continuous_conversation` variable is `None` on resume. The node's `_restore()` recovers messages from disk, but the executor doesn't pre-populate this variable until the node returns.

**Fix**: After a resumed node completes, set `continuous_conversation = result.conversation` (this already happens in the normal path at line 1155 — verify it also runs on the resume path).

**Files**: `core/framework/graph/executor.py`

### Multi-graph specific: independent resume per graph

Each graph in a multi-graph session has its own storage subdirectory (`graphs/{graph_id}/`) with its own `conversations/`, `checkpoints/`, and `cursor.json` files. Resume is already per-executor, so each graph resumes independently. The shared `state.json` at the session root captures the union of all graphs' memory — the `fcntl.flock()` wrapper on `_write_progress()` (Gap 2) ensures concurrent writes don't corrupt it.

### Implementation

These fixes are a prerequisite for multi-graph and should be done as **Phase 0**, before the EventBus changes:

1. Persist `user_interaction_count` + stall/doom counters to `cursor.json`
2. Restore them in `_restore()`
3. Flush accumulator outputs to SharedMemory in the executor's CancelledError handler
4. Verify `continuous_conversation` is set on the resume path

## Implementation Phases

### Phase 0: Reliable Mid-Node Resume (prerequisite)

1. `event_loop_node.py` — persist `user_interaction_count`, `recent_responses`, `recent_tool_fingerprints` to `cursor.json` via `_write_cursor()`; restore in `_restore()`
2. `executor.py` — in the CancelledError handler, read the interrupted node's `cursor.json` accumulator outputs and write them to `memory` before building `session_state_out`
3. `executor.py` — verify `continuous_conversation` is populated on the resume path

### Phase 1: EventBus Foundation

1. `event_bus.py` — `graph_id` on `AgentEvent`, `filter_graph` on `Subscription` + `_matches()`
2. `execution_stream.py` — accept and stamp `graph_id` on emitted events

### Phase 2: Multi-Graph Runtime

3. `agent_runtime.py` — `_GraphRegistration` dataclass, `add_graph()`, `remove_graph()`, `list_graphs()`, `active_graph_id` property
4. `agent_runtime.py` — update `inject_input()`, `_get_primary_session_state()`, `stop()` for multi-graph
5. `agent_runtime.py` — user presence tracking (`_last_user_input_time`, `user_idle_seconds`)
6. Storage path logic: secondary graphs get `{session_root}/graphs/{graph_id}/`

### Phase 3: Graph Lifecycle Tools

7. `core/framework/tools/session_graph_tools.py` — `load_agent`, `unload_agent`, `start_agent`, `restart_agent`, `list_agents`, `get_user_presence`
8. `runner.py` — `setup_as_secondary()` method

### Phase 4: TUI Integration

9. `app.py` — `graph_id` event filtering, background notifications, `action_switch_graph`
10. `chat_repl.py` — `/graphs`, `/graph`, `/load`, `/unload` commands, graph_id tracking
11. `graph_view.py` — multi-graph header, `switch_graph()`

### Phase 5: hive_coder Integration

12. `cli.py` — `cmd_code()` sets up a multi-graph capable runtime and registers the graph tools
13. hive_coder's agent config — add a guardian entry point with `trigger_type="event"` subscribing to `EXECUTION_FAILED`
14. Guardian node system prompt — presence-aware triage logic (ask user / auto-fix / escalate)

## Backward Compatibility

- Single-graph `hive run exports/my_agent` is unchanged: `graph_id` defaults to `None`, no secondary graphs are loaded, events carry `graph_id=None`, and the TUI shows no graph-switching UI
- All new fields are optional with `None` defaults
- `_get_primary_session_state()` preserves its existing behavior when no secondary graphs exist

## Verification

1. **Unit**: `add_graph()` creates streams with the correct `graph_id`, events carry `graph_id`, `filter_graph` works in subscriptions, `inject_input()` routes to the correct graph
2. **Integration**: load hive_coder + email_agent; email_agent fails → guardian fires → reads shared memory → decides action
3. **TUI**: `/graphs` shows both, `/graph` switches, a background failure notification appears, input routing works across graphs
4. **Backward compat**: `hive run exports/deep_research_agent --tui` works unchanged
5. **Lifecycle**: `restart_agent` picks up code changes, `unload_agent` cleans up streams and subscriptions