chore: resolve merge conflicts with main

y0sif
2026-02-14 21:38:44 +02:00
302 changed files with 42521 additions and 5333 deletions
+9
View File
@@ -0,0 +1,9 @@
{
"mcpServers": {
"agent-builder": {
"command": "uv",
"args": ["run", "--directory", "core", "-m", "framework.mcp.agent_builder_server"],
"disabled": false
}
}
}
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-concepts
---
use hive-concepts skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-create
---
use hive-create skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-credentials
---
use hive-credentials skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-patterns
---
use hive-patterns skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-test
---
use hive-test skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive
---
use hive skill
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+34
View File
@@ -0,0 +1,34 @@
{
"permissions": {
"allow": [
"mcp__agent-builder__create_session",
"mcp__agent-builder__set_goal",
"mcp__agent-builder__add_node",
"mcp__agent-builder__add_edge",
"mcp__agent-builder__configure_loop",
"mcp__agent-builder__add_mcp_server",
"mcp__agent-builder__validate_graph",
"mcp__agent-builder__export_graph",
"mcp__agent-builder__load_session_by_id",
"Bash(git status:*)",
"Bash(gh run view:*)",
"Bash(uv run:*)",
"Bash(env:*)",
"mcp__agent-builder__test_node",
"mcp__agent-builder__list_mcp_tools",
"Bash(python -m py_compile:*)",
"Bash(python -m pytest:*)",
"Bash(source:*)",
"mcp__agent-builder__update_node",
"mcp__agent-builder__check_missing_credentials",
"mcp__agent-builder__list_stored_credentials",
"Bash(find:*)",
"mcp__agent-builder__run_tests",
"Bash(PYTHONPATH=core:exports:tools/src uv run pytest:*)",
"mcp__agent-builder__list_agent_sessions",
"mcp__agent-builder__generate_constraint_tests",
"mcp__agent-builder__generate_success_tests"
]
},
"enabledMcpjsonServers": ["agent-builder", "tools"]
}
@@ -1,415 +0,0 @@
---
name: building-agents-construction
description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: procedural
part_of: building-agents
requires: building-agents-core
---
# Agent Construction - EXECUTE THESE STEPS
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.**
When this skill is loaded, IMMEDIATELY begin executing Step 1. Do not explain what you will do - just do it.
---
## STEP 1: Initialize Build Environment
**EXECUTE THESE TOOL CALLS NOW:**
1. Register the hive-tools MCP server:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="python",
args='["mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
```
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
```
mcp__agent-builder__create_session(name="AGENT_NAME")
```
3. Discover available tools:
```
mcp__agent-builder__list_mcp_tools()
```
4. Create the package directory:
```
mkdir -p exports/AGENT_NAME/nodes
```
**AFTER completing these calls**, tell the user:
> ✅ Build environment initialized
>
> - Session created
> - Available tools: [list the tools from step 3]
>
> Proceeding to define the agent goal...
**THEN immediately proceed to STEP 2.**
---
## STEP 2: Define and Approve Goal
**PROPOSE a goal to the user.** Based on what they asked for, propose:
- Goal ID (kebab-case)
- Goal name
- Goal description
- 3-5 success criteria (each with: id, description, metric, target, weight)
- 2-4 constraints (each with: id, description, constraint_type, category)
**FORMAT your proposal as a clear summary, then ask for approval:**
> **Proposed Goal: [Name]**
>
> [Description]
>
> **Success Criteria:**
>
> 1. [criterion 1]
> 2. [criterion 2]
> ...
>
> **Constraints:**
>
> 1. [constraint 1]
> 2. [constraint 2]
> ...
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve this goal definition?",
"header": "Goal",
"options": [
{"label": "Approve", "description": "Goal looks good, proceed"},
{"label": "Modify", "description": "I want to change something"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details (a sketch follows below), then proceed to STEP 3
- If **Modify**: Ask what they want to change, update proposal, ask again
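A hedged sketch of the `set_goal` call (parameter names are illustrative and should be verified against the tool's schema; list-valued fields follow the JSON-string convention used by the other builder tools):
```
mcp__agent-builder__set_goal(
    goal_id="rigorous-research",
    name="Rigorous Research",
    description="Research topics and produce cited reports.",
    success_criteria='[{"id": "source-diversity", "description": "Use diverse sources", "metric": "source_count", "target": ">=5", "weight": 0.25}]',
    constraints='[{"id": "no-hallucination", "description": "Only include sourced information", "constraint_type": "quality", "category": "accuracy"}]'
)
```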
---
## STEP 3: Design Node Workflow
**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist.
**DESIGN the workflow** as a series of nodes. For each node, determine:
- node_id (kebab-case)
- name
- description
- node_type: `"event_loop"` (recommended for all LLM work) or `"function"` (deterministic, no LLM)
- input_keys (what data this node receives)
- output_keys (what data this node produces)
- tools (ONLY tools that exist - empty list if no tools needed)
- system_prompt (should mention `set_output` for producing structured outputs)
- client_facing: True if this node interacts with the user
- nullable_output_keys (for mutually exclusive outputs)
- max_node_visits (>1 if this node is a feedback loop target)
**PRESENT the workflow to the user:**
> **Proposed Workflow: [N] nodes**
>
> 1. **[node-id]** - [description]
>
> - Type: event_loop [client-facing] / function
> - Input: [keys]
> - Output: [keys]
> - Tools: [tools or "none"]
>
> 2. **[node-id]** - [description]
> ...
>
> **Flow:** node1 → node2 → node3 → ...
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve this workflow design?",
"header": "Workflow",
"options": [
{"label": "Approve", "description": "Workflow looks good, proceed to build nodes"},
{"label": "Modify", "description": "I want to change the workflow"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Proceed to STEP 4
- If **Modify**: Ask what they want to change, update design, ask again
---
## STEP 4: Build Nodes One by One
**FOR EACH node in the approved workflow:**
1. **Call** `mcp__agent-builder__add_node(...)` with the node details
- input_keys and output_keys must be JSON strings: `'["key1", "key2"]'`
- tools must be a JSON string: `'["tool1"]'` or `'[]'`
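For example (illustrative values; the real call uses the fields designed in STEP 3):
```
mcp__agent-builder__add_node(
    node_id="research",
    name="Research",
    description="Search the web and compile findings",
    node_type="event_loop",
    input_keys='["research_brief"]',
    output_keys='["findings", "sources"]',
    tools='["web_search", "web_scrape"]',
    system_prompt="You are a research agent. Use set_output(key, value) to store results.",
    client_facing=False
)
```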
2. **Call** `mcp__agent-builder__test_node(...)` to validate:
```
mcp__agent-builder__test_node(
node_id="the-node-id",
test_input='{"key": "test value"}',
mock_llm_response='{"output_key": "test output"}'
)
```
3. **Check result:**
- If valid: Tell user "✅ Node [id] validated" and continue to next node
- If invalid: Show errors, fix the node, re-validate
4. **Show progress** after each node:
```
mcp__agent-builder__get_session_status()
```
> ✅ Node [X] of [Y] complete: [node-id]
**AFTER all nodes are added and validated**, proceed to STEP 5.
---
## STEP 5: Connect Edges
**DETERMINE the edges** based on the workflow flow. For each connection:
- edge_id (kebab-case)
- source (node that outputs)
- target (node that receives)
- condition: `"on_success"`, `"always"`, `"on_failure"`, or `"conditional"`
- condition_expr (Python expression using `output.get(...)`, only if conditional)
- priority (positive = forward edge evaluated first, negative = feedback edge)
**FOR EACH edge, call:**
```
mcp__agent-builder__add_edge(
edge_id="source-to-target",
source="source-node-id",
target="target-node-id",
condition="on_success",
condition_expr="",
priority=1
)
```
**AFTER all edges are added, validate the graph:**
```
mcp__agent-builder__validate_graph()
```
- If valid: Tell user "✅ Graph structure validated" and proceed to STEP 6
- If invalid: Show errors, fix edges, re-validate
---
## STEP 6: Generate Agent Package
**EXPORT the graph data:**
```
mcp__agent-builder__export_graph()
```
This returns JSON with all the goal, nodes, edges, and MCP server configurations.
**THEN write the Python package files** using the exported data. Create these files in `exports/AGENT_NAME/`:
1. `config.py` - Runtime configuration with model settings
2. `nodes/__init__.py` - All NodeSpec definitions
3. `agent.py` - Goal, edges, graph config, and agent class
4. `__init__.py` - Package exports
5. `__main__.py` - CLI interface
6. `mcp_servers.json` - MCP server configurations
7. `README.md` - Usage documentation
**IMPORTANT entry_points format:**
- MUST be: `{"start": "first-node-id"}`
- NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
- NOT: `{"first-node-id"}` (WRONG - this is a set)
**Use the example agent** at `.claude/skills/building-agents-construction/examples/deep_research_agent/` as a template for file structure and patterns. It demonstrates: STEP 1/STEP 2 prompts, client-facing nodes, feedback loops, nullable_output_keys, and data tools.
**AFTER writing all files, tell the user:**
> ✅ Agent package created: `exports/AGENT_NAME/`
>
> **Files generated:**
>
> - `__init__.py` - Package exports
> - `agent.py` - Goal, nodes, edges, agent class
> - `config.py` - Runtime configuration
> - `__main__.py` - CLI interface
> - `nodes/__init__.py` - Node definitions
> - `mcp_servers.json` - MCP server config
> - `README.md` - Usage documentation
>
> **Test your agent:**
>
> ```bash
> cd /home/timothy/oss/hive
> PYTHONPATH=exports uv run python -m AGENT_NAME validate
> PYTHONPATH=exports uv run python -m AGENT_NAME info
> ```
---
## STEP 7: Verify and Test
**RUN validation:**
```bash
cd /home/timothy/oss/hive && PYTHONPATH=exports uv run python -m AGENT_NAME validate
```
- If valid: Agent is complete!
- If errors: Fix the issues and re-run
**SHOW final session summary:**
```
mcp__agent-builder__get_session_status()
```
**TELL the user the agent is ready** and suggest next steps:
- Run with mock mode to test without API calls
- Use `/testing-agent` skill for comprehensive testing
- Use `/setup-credentials` if the agent needs API keys
---
## REFERENCE: Node Types
| Type | tools param | Use when |
|------|-------------|----------|
| `event_loop` | `'["tool1"]'` or `'[]'` | LLM-powered work with or without tools |
| `function` | N/A | Deterministic Python operations, no LLM |
---
## REFERENCE: NodeSpec New Fields
| Field | Default | Description |
|-------|---------|-------------|
| `client_facing` | `False` | Streams output to user, blocks for input between turns |
| `nullable_output_keys` | `[]` | Output keys that may remain unset (mutually exclusive outputs) |
| `max_node_visits` | `1` | Max executions per run. Set >1 for feedback loop targets. 0=unlimited |
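A minimal sketch combining the three fields (values are illustrative; the import path is assumed):
```python
from framework.graph import NodeSpec  # import path assumed

review_node = NodeSpec(
    id="review",
    name="Review",
    description="Present findings and collect a decision",
    node_type="event_loop",
    client_facing=True,                 # streams to user, blocks for input
    input_keys=["findings"],
    output_keys=["needs_more_research", "feedback"],
    nullable_output_keys=["feedback"],  # may stay unset when no revision is requested
    max_node_visits=3,                  # feedback-loop target may re-execute
    system_prompt="...",
)
```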
---
## REFERENCE: Edge Conditions & Priority
| Condition | When edge is followed |
|-----------|--------------------------------------|
| `on_success` | Source node completed successfully |
| `on_failure` | Source node failed |
| `always` | Always, regardless of success/failure |
| `conditional` | When condition_expr evaluates to True |
**Priority:** Positive = forward edge (evaluated first). Negative = feedback edge (loops back to earlier node). Multiple ON_SUCCESS edges from same source = parallel execution (fan-out).
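For example, a hedged forward/feedback pair (node ids are illustrative):
```
mcp__agent-builder__add_edge(
    edge_id="review-to-report",
    source="review",
    target="report",
    condition="conditional",
    condition_expr="output.get('needs_more_research') == False",
    priority=1
)
mcp__agent-builder__add_edge(
    edge_id="review-to-research",
    source="review",
    target="research",
    condition="conditional",
    condition_expr="output.get('needs_more_research') == True",
    priority=-1
)
```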
---
## REFERENCE: System Prompt Best Practice
For **internal** event_loop nodes (not client-facing), instruct the LLM to use `set_output`:
```
Use set_output(key, value) to store your results. For example:
- set_output("search_results", <your results as a JSON string>)
Do NOT return raw JSON. Use the set_output tool to produce outputs.
```
For **client-facing** event_loop nodes, use the STEP 1/STEP 2 pattern:
```
**STEP 1 — Respond to the user (text only, NO tool calls):**
[Present information, ask questions, etc.]
**STEP 2 — After the user responds, call set_output:**
- set_output("key", "value based on user's response")
```
This prevents the LLM from calling `set_output` before the user has had a chance to respond. The "NO tool calls" instruction in STEP 1 ensures the node blocks for user input before proceeding.
---
## EventLoopNode Runtime
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. Both direct `GraphExecutor` execution and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically. No manual `node_registry` setup is needed.
```python
# Direct execution (llm, tools, and tool_executor come from your provider/registry setup)
from pathlib import Path
from framework.graph.executor import GraphExecutor
from framework.runtime.core import Runtime
storage_path = Path.home() / ".hive" / "my_agent"
storage_path.mkdir(parents=True, exist_ok=True)
runtime = Runtime(storage_path)
executor = GraphExecutor(
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
storage_path=storage_path,
)
result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
```
**DO NOT pass `runtime=None` to `GraphExecutor`** — it will crash with `'NoneType' object has no attribute 'start_run'`.
---
## COMMON MISTAKES TO AVOID
1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
3. **Skipping validation** - Always validate nodes and graph before proceeding
4. **Not waiting for approval** - Always ask user before major steps
5. **Displaying this file** - Execute the steps, don't show documentation
6. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
7. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
8. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
9. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
@@ -1,46 +0,0 @@
"""Runtime configuration."""
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 8192
api_key: str | None = None
api_base: str | None = None
default_config = RuntimeConfig()
@dataclass
class AgentMetadata:
name: str = "Deep Research Agent"
version: str = "1.0.0"
description: str = (
"Interactive research agent that rigorously investigates topics through "
"multi-source search, quality evaluation, and synthesis - with TUI conversation "
"at key checkpoints for user guidance and feedback."
)
metadata = AgentMetadata()
@@ -1,12 +1,12 @@
---
name: building-agents-core
name: hive-concepts
description: Core concepts for goal-driven agents - architecture, node types (event_loop, function), tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: foundational
part_of: building-agents
part_of: hive
---
# Building Agents - Core Concepts
@@ -251,6 +251,7 @@ The judge controls when a node's loop exits:
Controls loop behavior:
- `max_iterations` (default 50) — prevents infinite loops
- `max_tool_calls_per_turn` (default 10) — limits tool calls per LLM response
- `tool_call_overflow_margin` (default 0.5) — wiggle room before discarding extra tool calls (50% means hard cutoff at 150% of limit)
- `stall_detection_threshold` (default 3) — detects repeated identical responses
- `max_history_tokens` (default 32000) — triggers conversation compaction
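These settings map onto the `loop_config` dict that generated agents pass to `GraphSpec`; the values below are the defaults listed above:
```python
loop_config = {
    "max_iterations": 50,
    "max_tool_calls_per_turn": 10,
    "tool_call_overflow_margin": 0.5,
    "stall_detection_threshold": 3,
    "max_history_tokens": 32000,
}
```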
@@ -258,9 +259,12 @@ Controls loop behavior:
When tool results exceed the context window, the framework automatically saves them to a spillover directory and truncates with a hint. Nodes that produce or consume large data should include the data tools:
- `save_data(filename, data, data_dir)` — Write data to a file in the data directory
- `load_data(filename, data_dir, offset=0, limit=50)` — Read data with line-based pagination
- `list_data_files(data_dir)` — List available data files
- `save_data(filename, data)` — Write data to a file in the data directory
- `load_data(filename, offset=0, limit=50)` — Read data with line-based pagination
- `list_data_files()` — List available data files
- `serve_file_to_user(filename, label="")` — Get a clickable file:// URI for the user
Note: `data_dir` is a framework-injected context parameter — the LLM never sees or passes it. `GraphExecutor.execute()` sets it per-execution via `contextvars`, so data tools and spillover always share the same session-scoped directory.
These are real MCP tools (not synthetic). Add them to nodes that handle large tool results:
@@ -346,15 +350,15 @@ Before writing a node with `tools=[...]`:
## When to Use This Skill
Use building-agents-core when:
Use hive-concepts when:
- Starting a new agent project and need to understand fundamentals
- Need to understand agent architecture before building
- Want to validate tool availability before proceeding
- Learning about node types, edges, and graph execution
**Next Steps:**
- Ready to build? → Use `building-agents-construction` skill
- Need patterns and examples? → Use `building-agents-patterns` skill
- Ready to build? → Use `hive-create` skill
- Need patterns and examples? → Use `hive-patterns` skill
## MCP Tools for Validation
@@ -389,7 +393,7 @@ mcp__agent-builder__configure_loop(
## Related Skills
- **building-agents-construction** - Step-by-step building process
- **building-agents-patterns** - Best practices: judges, feedback edges, fan-out, context management
- **agent-workflow** - Complete workflow orchestrator
- **testing-agent** - Test and validate completed agents
- **hive-create** - Step-by-step building process
- **hive-patterns** - Best practices: judges, feedback edges, fan-out, context management
- **hive** - Complete workflow orchestrator
- **hive-test** - Test and validate completed agents
File diff suppressed because it is too large
@@ -70,7 +70,9 @@ def tui(mock, verbose, debug):
try:
from framework.tui.app import AdenTUI
except ImportError:
click.echo("TUI requires the 'textual' package. Install with: pip install textual")
click.echo(
"TUI requires the 'textual' package. Install with: pip install textual"
)
sys.exit(1)
from pathlib import Path
@@ -88,6 +90,9 @@ def tui(mock, verbose, debug):
agent._event_bus = EventBus()
agent._tool_registry = ToolRegistry()
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
@@ -104,9 +109,6 @@ def tui(mock, verbose, debug):
tool_executor = agent._tool_registry.get_executor()
graph = agent._build_graph()
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
runtime = create_agent_runtime(
graph=graph,
goal=agent.goal,
@@ -216,7 +218,9 @@ async def _interactive_shell(verbose=False):
if "references" in output:
click.echo("--- References ---\n")
for ref in output.get("references", []):
click.echo(f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}")
click.echo(
f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}"
)
click.echo("\n")
else:
click.echo(f"\nResearch failed: {result.error}\n")
@@ -227,6 +231,7 @@ async def _interactive_shell(verbose=False):
except Exception as e:
click.echo(f"Error: {e}", err=True)
import traceback
traceback.print_exc()
finally:
await agent.stop()
@@ -0,0 +1,358 @@
"""Agent graph construction for Deep Research Agent."""
from pathlib import Path
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult
from framework.graph.checkpoint_config import CheckpointConfig
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
from .config import default_config, metadata
from .nodes import (
intake_node,
research_node,
review_node,
report_node,
)
# Goal definition
goal = Goal(
id="rigorous-interactive-research",
name="Rigorous Interactive Research",
description=(
"Research any topic by searching diverse sources, analyzing findings, "
"and producing a cited report — with user checkpoints to guide direction."
),
success_criteria=[
SuccessCriterion(
id="source-diversity",
description="Use multiple diverse, authoritative sources",
metric="source_count",
target=">=5",
weight=0.25,
),
SuccessCriterion(
id="citation-coverage",
description="Every factual claim in the report cites its source",
metric="citation_coverage",
target="100%",
weight=0.25,
),
SuccessCriterion(
id="user-satisfaction",
description="User reviews findings before report generation",
metric="user_approval",
target="true",
weight=0.25,
),
SuccessCriterion(
id="report-completeness",
description="Final report answers the original research questions",
metric="question_coverage",
target="90%",
weight=0.25,
),
],
constraints=[
Constraint(
id="no-hallucination",
description="Only include information found in fetched sources",
constraint_type="quality",
category="accuracy",
),
Constraint(
id="source-attribution",
description="Every claim must cite its source with a numbered reference",
constraint_type="quality",
category="accuracy",
),
Constraint(
id="user-checkpoint",
description="Present findings to the user before writing the final report",
constraint_type="functional",
category="interaction",
),
],
)
# Node list
nodes = [
intake_node,
research_node,
review_node,
report_node,
]
# Edge definitions
edges = [
# intake -> research
EdgeSpec(
id="intake-to-research",
source="intake",
target="research",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
# research -> review
EdgeSpec(
id="research-to-review",
source="research",
target="review",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
# review -> research (feedback loop)
EdgeSpec(
id="review-to-research-feedback",
source="review",
target="research",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == True",
priority=1,
),
# review -> report (user satisfied)
EdgeSpec(
id="review-to-report",
source="review",
target="report",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == False",
priority=2,
),
# report -> research (user wants deeper research on current topic)
EdgeSpec(
id="report-to-research",
source="report",
target="research",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() == 'more_research'",
priority=2,
),
# report -> intake (user wants a new topic — default when not more_research)
EdgeSpec(
id="report-to-intake",
source="report",
target="intake",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() != 'more_research'",
priority=1,
),
]
# Graph configuration
entry_node = "intake"
entry_points = {"start": "intake"}
pause_nodes = []
terminal_nodes = []
class DeepResearchAgent:
"""
Deep Research Agent 4-node pipeline with user checkpoints.
Flow: intake -> research -> review -> report
^ |
+-- feedback loop (if user wants more)
Uses AgentRuntime for proper session management:
- Session-scoped storage (sessions/{session_id}/)
- Checkpointing for resume capability
- Runtime logging
- Data folder for save_data/load_data
"""
def __init__(self, config=None):
self.config = config or default_config
self.goal = goal
self.nodes = nodes
self.edges = edges
self.entry_node = entry_node
self.entry_points = entry_points
self.pause_nodes = pause_nodes
self.terminal_nodes = terminal_nodes
self._graph: GraphSpec | None = None
self._agent_runtime: AgentRuntime | None = None
self._tool_registry: ToolRegistry | None = None
self._storage_path: Path | None = None
def _build_graph(self) -> GraphSpec:
"""Build the GraphSpec."""
return GraphSpec(
id="deep-research-agent-graph",
goal_id=self.goal.id,
version="1.0.0",
entry_node=self.entry_node,
entry_points=self.entry_points,
terminal_nodes=self.terminal_nodes,
pause_nodes=self.pause_nodes,
nodes=self.nodes,
edges=self.edges,
default_model=self.config.model,
max_tokens=self.config.max_tokens,
loop_config={
"max_iterations": 100,
"max_tool_calls_per_turn": 20,
"max_history_tokens": 32000,
},
conversation_mode="continuous",
identity_prompt=(
"You are a rigorous research agent. You search for information "
"from diverse, authoritative sources, analyze findings critically, "
"and produce well-cited reports. You never fabricate information — "
"every claim must trace back to a source you actually retrieved."
),
)
def _setup(self, mock_mode=False) -> None:
"""Set up the agent runtime with sessions, checkpoints, and logging."""
self._storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
self._storage_path.mkdir(parents=True, exist_ok=True)
self._tool_registry = ToolRegistry()
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock_mode:
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
self._graph = self._build_graph()
checkpoint_config = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
async_checkpoint=True,
)
entry_point_specs = [
EntryPointSpec(
id="default",
name="Default",
entry_node=self.entry_node,
trigger_type="manual",
isolation_level="shared",
)
]
self._agent_runtime = create_agent_runtime(
graph=self._graph,
goal=self.goal,
storage_path=self._storage_path,
entry_points=entry_point_specs,
llm=llm,
tools=tools,
tool_executor=tool_executor,
checkpoint_config=checkpoint_config,
)
async def start(self, mock_mode=False) -> None:
"""Set up and start the agent runtime."""
if self._agent_runtime is None:
self._setup(mock_mode=mock_mode)
if not self._agent_runtime.is_running:
await self._agent_runtime.start()
async def stop(self) -> None:
"""Stop the agent runtime and clean up."""
if self._agent_runtime and self._agent_runtime.is_running:
await self._agent_runtime.stop()
self._agent_runtime = None
async def trigger_and_wait(
self,
entry_point: str = "default",
input_data: dict | None = None,
timeout: float | None = None,
session_state: dict | None = None,
) -> ExecutionResult | None:
"""Execute the graph and wait for completion."""
if self._agent_runtime is None:
raise RuntimeError("Agent not started. Call start() first.")
return await self._agent_runtime.trigger_and_wait(
entry_point_id=entry_point,
input_data=input_data or {},
session_state=session_state,
)
async def run(
self, context: dict, mock_mode=False, session_state=None
) -> ExecutionResult:
"""Run the agent (convenience method for single execution)."""
await self.start(mock_mode=mock_mode)
try:
result = await self.trigger_and_wait(
"default", context, session_state=session_state
)
return result or ExecutionResult(success=False, error="Execution timeout")
finally:
await self.stop()
def info(self):
"""Get agent information."""
return {
"name": metadata.name,
"version": metadata.version,
"description": metadata.description,
"goal": {
"name": self.goal.name,
"description": self.goal.description,
},
"nodes": [n.id for n in self.nodes],
"edges": [e.id for e in self.edges],
"entry_node": self.entry_node,
"entry_points": self.entry_points,
"pause_nodes": self.pause_nodes,
"terminal_nodes": self.terminal_nodes,
"client_facing_nodes": [n.id for n in self.nodes if n.client_facing],
}
def validate(self):
"""Validate agent structure."""
errors = []
warnings = []
node_ids = {node.id for node in self.nodes}
for edge in self.edges:
if edge.source not in node_ids:
errors.append(f"Edge {edge.id}: source '{edge.source}' not found")
if edge.target not in node_ids:
errors.append(f"Edge {edge.id}: target '{edge.target}' not found")
if self.entry_node not in node_ids:
errors.append(f"Entry node '{self.entry_node}' not found")
for terminal in self.terminal_nodes:
if terminal not in node_ids:
errors.append(f"Terminal node '{terminal}' not found")
for ep_id, node_id in self.entry_points.items():
if node_id not in node_ids:
errors.append(
f"Entry point '{ep_id}' references unknown node '{node_id}'"
)
return {
"valid": len(errors) == 0,
"errors": errors,
"warnings": warnings,
}
# Create default instance
default_agent = DeepResearchAgent()
@@ -0,0 +1,26 @@
"""Runtime configuration."""
from dataclasses import dataclass
from framework.config import RuntimeConfig
default_config = RuntimeConfig()
@dataclass
class AgentMetadata:
name: str = "Deep Research Agent"
version: str = "1.0.0"
description: str = (
"Interactive research agent that rigorously investigates topics through "
"multi-source search, quality evaluation, and synthesis - with TUI conversation "
"at key checkpoints for user guidance and feedback."
)
intro_message: str = (
"Hi! I'm your deep research assistant. Tell me a topic and I'll investigate it "
"thoroughly — searching multiple sources, evaluating quality, and synthesizing "
"a comprehensive report. What would you like me to research?"
)
metadata = AgentMetadata()
@@ -1,8 +1,8 @@
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, and write_to_file"
}
@@ -10,8 +10,13 @@ intake_node = NodeSpec(
description="Discuss the research topic with the user, clarify scope, and confirm direction",
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["topic"],
output_keys=["research_brief"],
success_criteria=(
"The research brief is specific and actionable: it states the topic, "
"the key questions to answer, the desired scope, and depth."
),
system_prompt="""\
You are a research intake specialist. The user wants to research a topic.
Have a brief conversation to clarify what they need.
@@ -38,10 +43,14 @@ research_node = NodeSpec(
name="Research",
description="Search the web, fetch source content, and compile findings",
node_type="event_loop",
max_node_visits=3,
max_node_visits=0,
input_keys=["research_brief", "feedback"],
output_keys=["findings", "sources", "gaps"],
nullable_output_keys=["feedback"],
success_criteria=(
"Findings reference at least 3 distinct sources with URLs. "
"Key claims are substantiated by fetched content, not generated."
),
system_prompt="""\
You are a research agent. Given a research brief, find and analyze sources.
@@ -56,18 +65,26 @@ Work in phases:
and any contradictions between sources.
Important:
- Work in batches of 3-4 tool calls at a time to manage context
- Work in batches of 3-4 tool calls at a time — never more than 10 per turn
- After each batch, assess whether you have enough material
- Prefer quality over quantity — 5 good sources beat 15 thin ones
- Track which URL each finding comes from (you'll need citations later)
- Call set_output for each key in a SEPARATE turn (not in the same turn as other tool calls)
When done, use set_output:
When done, use set_output (one key at a time, separate turns):
- set_output("findings", "Structured summary: key findings with source URLs for each claim. \
Include themes, contradictions, and confidence levels.")
- set_output("sources", [{"url": "...", "title": "...", "summary": "..."}])
- set_output("gaps", "What aspects of the research brief are NOT well-covered yet, if any.")
""",
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
tools=[
"web_search",
"web_scrape",
"load_data",
"save_data",
"append_data",
"list_data_files",
],
)
# Node 3: Review (client-facing)
@@ -78,9 +95,13 @@ review_node = NodeSpec(
description="Present findings to user and decide whether to research more or write the report",
node_type="event_loop",
client_facing=True,
max_node_visits=3,
max_node_visits=0,
input_keys=["findings", "sources", "gaps", "research_brief"],
output_keys=["needs_more_research", "feedback"],
success_criteria=(
"The user has been presented with findings and has explicitly indicated "
"whether they want more research or are ready for the report."
),
system_prompt="""\
Present the research findings to the user clearly and concisely.
@@ -102,41 +123,77 @@ Should we proceed to writing the final report?
)
# Node 4: Report (client-facing)
# Writes the final report and presents it to the user.
# Writes an HTML report, serves the link to the user, and answers follow-ups.
report_node = NodeSpec(
id="report",
name="Write & Deliver Report",
description="Write a cited report from the findings and present it to the user",
description="Write a cited HTML report from the findings and present it to the user",
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["findings", "sources", "research_brief"],
output_keys=["delivery_status"],
output_keys=["delivery_status", "next_action"],
success_criteria=(
"An HTML report has been saved, the file link has been presented to the user, "
"and the user has indicated what they want to do next."
),
system_prompt="""\
Write a comprehensive research report and present it to the user.
Write a research report as an HTML file and present it to the user.
**STEP 1 — Write and present the report (text only, NO tool calls):**
IMPORTANT: save_data requires TWO separate arguments: filename and data.
Call it like: save_data(filename="report.html", data="<html>...</html>")
Do NOT use _raw, do NOT nest arguments inside a JSON string.
**STEP 1 — Write and save the HTML report (tool calls, NO text to user yet):**
Build a clean HTML document. Keep the HTML concise — aim for clarity over length.
Use minimal embedded CSS (a few lines of style, not a full framework).
Report structure:
1. **Executive Summary** (2-3 paragraphs)
2. **Findings** (organized by theme, with [n] citations)
3. **Analysis** (synthesis, implications, areas of debate)
4. **Conclusion** (key takeaways, confidence assessment)
5. **References** (numbered list of sources cited)
- Title & date
- Executive Summary (2-3 paragraphs)
- Key Findings (organized by theme, with [n] citation links)
- Analysis (synthesis, implications)
- Conclusion (key takeaways)
- References (numbered list with clickable URLs)
Requirements:
- Every factual claim must cite its source with [n] notation
- Be objective — present multiple viewpoints where sources disagree
- Distinguish well-supported conclusions from speculation
- Answer the original research questions from the brief
End by asking the user if they have questions or want to save the report.
Save the HTML:
save_data(filename="report.html", data="<html>...</html>")
**STEP 2 — After the user responds:**
- Answer follow-up questions from the research material
- If they want to save, use write_to_file tool
- When the user is satisfied: set_output("delivery_status", "completed")
Then get the clickable link:
serve_file_to_user(filename="report.html", label="Research Report")
If save_data fails, simplify and shorten the HTML, then retry.
**STEP 2 — Present the link to the user (text only, NO tool calls):**
Tell the user the report is ready and include the file:// URI from
serve_file_to_user so they can click it to open. Give a brief summary
of what the report covers. Ask if they have questions or want to continue.
**STEP 3 — After the user responds:**
- Answer any follow-up questions from the research material
- When the user is ready to move on, ask what they'd like to do next:
- Research a new topic?
- Dig deeper into the current topic?
- Then call set_output:
- set_output("delivery_status", "completed")
- set_output("next_action", "new_topic") if they want a new topic
- set_output("next_action", "more_research") if they want deeper research
""",
tools=["write_to_file"],
tools=[
"save_data",
"append_data",
"edit_data",
"serve_file_to_user",
"load_data",
"list_data_files",
],
)
__all__ = [
@@ -1,10 +1,10 @@
---
name: setup-credentials
name: hive-credentials
description: Set up and install credentials for an agent. Detects missing credentials from agent config, collects them from the user, and stores them securely in the local encrypted store at ~/.hive/credentials.
license: Apache-2.0
metadata:
author: hive
version: "2.2"
version: "2.3"
type: utility
---
@@ -31,95 +31,50 @@ Determine which agent needs credentials. The user will either:
Locate the agent's directory under `exports/{agent_name}/`.
### Step 2: Detect Required Credentials (Bash-First)
### Step 2: Detect Missing Credentials
Use bash commands to determine what the agent needs and what's already configured. This avoids Python import issues and works even when `HIVE_CREDENTIAL_KEY` is not set.
Use the `check_missing_credentials` MCP tool to detect what the agent needs and what's already configured. This tool loads the agent, inspects its required tools and node types, maps them to credentials via `CREDENTIAL_SPECS`, and checks both the encrypted store and environment variables.
#### Step 2a: Read Agent Requirements
Extract `required_tools` and node types from the agent config:
```bash
# Get required tools
jq -r '.required_tools[]?' exports/{agent_name}/agent.json 2>/dev/null
# Get node types from graph nodes
jq -r '.graph.nodes[]?.node_type' exports/{agent_name}/agent.json 2>/dev/null | sort -u
```
check_missing_credentials(agent_path="exports/{agent_name}")
```
Map the extracted tools and node types to credentials by reading the spec files directly:
The tool returns a JSON response:
```bash
# Read all credential specs — each file defines tools, node_types, env_var, and credential_id
cat tools/src/aden_tools/credentials/llm.py tools/src/aden_tools/credentials/search.py tools/src/aden_tools/credentials/email.py tools/src/aden_tools/credentials/integrations.py
```json
{
"agent": "exports/{agent_name}",
"missing": [
{
"credential_name": "brave_search",
"env_var": "BRAVE_SEARCH_API_KEY",
"description": "Brave Search API key for web search",
"help_url": "https://brave.com/search/api/",
"tools": ["web_search"]
}
],
"available": [
{
"credential_name": "anthropic",
"env_var": "ANTHROPIC_API_KEY",
"source": "encrypted_store"
}
],
"total_missing": 1,
"ready": false
}
```
For each `CredentialSpec`, match its `tools` and `node_types` lists against the agent's required tools and node types. Extract the `env_var`, `credential_id`, and `credential_group` for every match. This is the list of needed credentials.
#### Step 2b: Check Existing Credential Sources
For each needed credential, check three sources. A credential is "found" if it exists in ANY of them:
**1. Encrypted store metadata index** (unencrypted JSON — no decryption key needed):
```bash
cat ~/.hive/credentials/metadata/index.json 2>/dev/null | jq -r '.credentials | keys[]'
```
If a credential ID appears in this list, it is stored in the encrypted store.
**2. Environment variables:**
```bash
# Check each needed env var, e.g.:
printenv ANTHROPIC_API_KEY > /dev/null 2>&1 && echo "ANTHROPIC_API_KEY: set" || echo "ANTHROPIC_API_KEY: not set"
printenv BRAVE_SEARCH_API_KEY > /dev/null 2>&1 && echo "BRAVE_SEARCH_API_KEY: set" || echo "BRAVE_SEARCH_API_KEY: not set"
```
**3. Project `.env` file:**
```bash
# Check each needed env var, e.g.:
grep -q '^ANTHROPIC_API_KEY=' .env 2>/dev/null && echo "ANTHROPIC_API_KEY: in .env" || echo "ANTHROPIC_API_KEY: not in .env"
grep -q '^BRAVE_SEARCH_API_KEY=' .env 2>/dev/null && echo "BRAVE_SEARCH_API_KEY: in .env" || echo "BRAVE_SEARCH_API_KEY: not in .env"
```
#### Step 2c: HIVE_CREDENTIAL_KEY Check
If any credentials were found in the encrypted store metadata index, verify the encryption key is available. The key is typically persisted to shell config by a previous setup-credentials run.
Check both the current session AND shell config files:
```bash
# Check 1: Current session
printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
# Check 2: Shell config files (where setup-credentials persists it)
# Note: check each file individually to avoid non-zero exit when one doesn't exist
for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
```
Decision logic:
- **In current session** — no action needed, credentials in the store are usable
- **In shell config but NOT in current session** — the key is persisted but this shell hasn't sourced it. Run `source ~/.zshrc` (or `~/.bashrc`), then re-check. Credentials in the store are usable after sourcing.
- **Not in session AND not in shell config** — the key was never persisted. Warn the user that credentials in the store cannot be decrypted. Help fix the key situation (recover/re-persist); do NOT re-collect credential values that are already stored.
#### Step 2d: Compute Missing & Group
Diff the "needed" credentials against the "found" credentials to get the truly missing list.
Group related credentials by their `credential_group` field from the spec files. Credentials that share the same non-empty `credential_group` value should be presented as a single setup step rather than asking for each one individually.
**If nothing is missing and there's no HIVE_CREDENTIAL_KEY issue:** Report all credentials as configured and skip Steps 3-5. Example:
**If `ready` is true (nothing missing):** Report all credentials as configured and skip Steps 3-5. Example:
```
All required credentials are already configured:
✓ anthropic (ANTHROPIC_API_KEY) — found in encrypted store
✓ brave_search (BRAVE_SEARCH_API_KEY) — found in environment
✓ anthropic (ANTHROPIC_API_KEY)
✓ brave_search (BRAVE_SEARCH_API_KEY)
Your agent is ready to run!
```
**If credentials are missing:** Continue to Step 3 with only the missing ones.
**If credentials are missing:** Continue to Step 3 with the `missing` list.
### Step 3: Present Auth Options for Each Missing Credential
@@ -153,7 +108,7 @@ Present the available options using AskUserQuestion:
Choose how to configure HUBSPOT_ACCESS_TOKEN:
1) Aden Platform (OAuth) (Recommended)
Secure OAuth2 flow via integration.adenhq.com
Secure OAuth2 flow via hive.adenhq.com
- Quick setup with automatic token refresh
- No need to manage API keys manually
@@ -170,6 +125,28 @@ Choose how to configure HUBSPOT_ACCESS_TOKEN:
### Step 4: Execute Auth Flow Based on User Choice
#### Prerequisite: Ensure HIVE_CREDENTIAL_KEY Is Available
Before storing any credentials, verify `HIVE_CREDENTIAL_KEY` is set (needed to encrypt/decrypt the local store). Check both the current session and shell config:
```bash
# Check current session
printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
# Check shell config files
for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
```
- **In current session** — proceed to store credentials
- **In shell config but NOT in current session** — run `source ~/.zshrc` (or `~/.bashrc`) first, then proceed
- **Not set anywhere** — `EncryptedFileStorage` will auto-generate one. After storing, tell the user to persist it: `export HIVE_CREDENTIAL_KEY="{generated_key}"` in their shell profile
> **⚠️ IMPORTANT: After adding `HIVE_CREDENTIAL_KEY` to the user's shell config, always display:**
> ```
> ⚠️ Environment variables were added to your shell config.
> Open a NEW TERMINAL for them to take effect outside this session.
> ```
#### Option 1: Aden Platform (OAuth)
This is the recommended flow for supported integrations (HubSpot, etc.).
@@ -195,7 +172,7 @@ If not set, guide user to get one from Aden (this is where they do OAuth):
from aden_tools.credentials import open_browser, get_aden_setup_url
# Open browser to Aden - user will sign up and connect integrations there
url = get_aden_setup_url() # https://integration.adenhq.com/setup
url = get_aden_setup_url() # https://hive.adenhq.com
success, msg = open_browser(url)
print("Please sign in to Aden and connect your integrations (HubSpot, etc.).")
@@ -231,6 +208,12 @@ if success:
print(f"Run: {source_cmd}")
```
> **⚠️ IMPORTANT: After adding `ADEN_API_KEY` to the user's shell config, always display:**
> ```
> ⚠️ Environment variables were added to your shell config.
> Open a NEW TERMINAL for them to take effect outside this session.
> ```
Also save to `~/.hive/configuration.json` for the framework:
```python
@@ -272,7 +255,7 @@ print(f"Synced credentials: {synced}")
# If the required credential wasn't synced, the user hasn't authorized it on Aden yet
if "hubspot" not in synced:
print("HubSpot not found in your Aden account.")
print("Please visit https://integration.adenhq.com to connect HubSpot, then try again.")
print("Please visit https://hive.adenhq.com to connect HubSpot, then try again.")
```
For more control over the sync process:
@@ -442,29 +425,38 @@ config_path.write_text(json.dumps(config, indent=2))
### Step 6: Verify All Credentials
Run validation again to confirm everything is set:
Use the `verify_credentials` MCP tool to confirm everything is properly configured:
```python
runner = AgentRunner.load("exports/{agent_name}")
validation = runner.validate()
assert not validation.missing_credentials, "Still missing credentials!"
```
verify_credentials(agent_path="exports/{agent_name}")
```
Report the result to the user.
The tool returns:
```json
{
"agent": "exports/{agent_name}",
"ready": true,
"missing_credentials": [],
"warnings": [],
"errors": []
}
```
If `ready` is true, report success. If `missing_credentials` is non-empty, identify what failed and loop back to Step 3 for the remaining credentials.
## Health Check Reference
Health checks validate credentials by making lightweight API calls:
| Credential | Endpoint | What It Checks |
| --------------- | --------------------------------------- | ---------------------------------- |
| `anthropic` | `POST /v1/messages` | API key validity |
| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity |
| `google_search` | `GET /customsearch/v1?q=test&num=1` | API key + CSE ID validity |
| `github` | `GET /user` | Token validity, user identity |
| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes |
| `linear` | `POST /graphql` (viewer query) | API key validity, workspace access |
| `resend` | `GET /domains` | API key validity |
| Credential | Endpoint | What It Checks |
| --------------- | --------------------------------------- | --------------------------------- |
| `anthropic` | `POST /v1/messages` | API key validity |
| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity |
| `google_search` | `GET /customsearch/v1?q=test&num=1` | API key + CSE ID validity |
| `github` | `GET /user` | Token validity, user identity |
| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes |
| `resend` | `GET /domains` | API key validity |
```python
from aden_tools.credentials import check_credential_health, HealthCheckResult
@@ -480,9 +472,14 @@ result: HealthCheckResult = check_credential_health("hubspot", token_value)
The local encrypted store requires `HIVE_CREDENTIAL_KEY` to encrypt/decrypt credentials.
- If the user doesn't have one, `EncryptedFileStorage` will auto-generate one and log it
- The user MUST persist this key (e.g., in `~/.bashrc` or a secrets manager)
- The user MUST persist this key (e.g., in `~/.bashrc`/`~/.zshrc` or a secrets manager)
- Without this key, stored credentials cannot be decrypted
- This is the ONLY secret that should live in `~/.bashrc` or environment config
**Shell config rule:** Only TWO keys belong in shell config (`~/.zshrc`/`~/.bashrc`):
- `HIVE_CREDENTIAL_KEY` — encryption key for the credential store
- `ADEN_API_KEY` — Aden platform auth key (needed before the store can sync)
All other API keys (Brave, Google, HubSpot, etc.) must go in the encrypted store only. **Never offer to add them to shell config.**
If `HIVE_CREDENTIAL_KEY` is not set:
@@ -495,6 +492,7 @@ If `HIVE_CREDENTIAL_KEY` is not set:
- **NEVER** log, print, or echo credential values in tool output
- **NEVER** store credentials in plaintext files, git-tracked files, or agent configs
- **NEVER** hardcode credentials in source code
- **NEVER** offer to save API keys to shell config (`~/.zshrc`/`~/.bashrc`) — the **only** keys that belong in shell config are `HIVE_CREDENTIAL_KEY` and `ADEN_API_KEY`. All other credentials (Brave, Google, HubSpot, GitHub, Resend, etc.) go in the encrypted store only.
- **ALWAYS** use `SecretStr` from Pydantic when handling credential values in Python
- **ALWAYS** use the local encrypted store (`~/.hive/credentials`) for persistence
- **ALWAYS** run health checks before storing credentials (when possible)
@@ -561,60 +559,27 @@ token = store.get_key("hubspot", "access_token")
## Example Session
```
User: /setup-credentials for my research-agent
User: /hive-credentials for my research-agent
Agent: Let me check what credentials your research-agent needs.
[Reads agent config]
$ jq -r '.required_tools[]?' exports/research-agent/agent.json
web_search
google_search
$ jq -r '.graph.nodes[]?.node_type' exports/research-agent/agent.json | sort -u
event_loop
[Maps tools/nodes to credentials using lookup table]
Needed: anthropic, brave_search, google_search, google_cse
[Checks encrypted store metadata index]
$ cat ~/.hive/credentials/metadata/index.json | jq -r '.credentials | keys[]'
anthropic
brave_search
[Checks environment variables]
$ printenv ANTHROPIC_API_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
not set
$ printenv BRAVE_SEARCH_API_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
not set
$ printenv GOOGLE_API_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
not set
$ printenv GOOGLE_CSE_ID > /dev/null 2>&1 && echo "set" || echo "not set"
not set
[Checks HIVE_CREDENTIAL_KEY since credentials found in store]
$ printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
session: not set
$ for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
/Users/user/.zshrc
[Key is in shell config but not current session — sourcing it]
$ source ~/.zshrc
[Computes missing credentials]
Found:
✓ anthropic (ANTHROPIC_API_KEY) — in encrypted store
✓ brave_search (BRAVE_SEARCH_API_KEY) — in encrypted store
Missing:
✗ google_search (GOOGLE_API_KEY)
✗ google_cse (GOOGLE_CSE_ID)
[Calls check_missing_credentials(agent_path="exports/research-agent")]
→ Returns:
available: anthropic (encrypted_store), brave_search (encrypted_store)
missing: google_search (GOOGLE_API_KEY), google_cse (GOOGLE_CSE_ID)
ready: false
Agent: 2 of 4 required credentials are already configured. Only Google Custom
Search needs setup (2 values as a single group).
Search needs setup (2 values).
--- Setting up Google Custom Search (google_search + google_cse) ---
This requires two values that work together.
[Checks HIVE_CREDENTIAL_KEY before storing]
$ printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
set
First, the Google API Key:
1. Go to https://console.cloud.google.com/apis/credentials
2. Create a new project (or select an existing one)
@@ -641,10 +606,35 @@ Now, the Custom Search Engine ID:
✓ Google Custom Search credentials valid
[Calls verify_credentials(agent_path="exports/research-agent")]
→ Returns: ready: true, missing_credentials: []
All credentials are now configured:
✓ anthropic (ANTHROPIC_API_KEY) — already in encrypted store
✓ brave_search (BRAVE_SEARCH_API_KEY) — already in encrypted store
✓ google_search (GOOGLE_API_KEY) — stored in encrypted store
✓ google_cse (GOOGLE_CSE_ID) — stored in encrypted store
Your agent is ready to run!
┌─────────────────────────────────────────────────────────────────────────────┐
│ ✅ CREDENTIALS CONFIGURED │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ OPEN A NEW TERMINAL before running commands below. │
│ Environment variables were saved to your shell config but │
│ only take effect in new terminal sessions. │
│ │
│ NEXT STEPS: │
│ │
│ 1. RUN YOUR AGENT: │
│ │
│ hive tui │
│ │
│ 2. IF YOU ENCOUNTER ISSUES, USE THE DEBUGGER: │
│ │
│ /hive-debugger │
│ │
│ The debugger analyzes runtime logs, identifies retry loops, tool │
│ failures, stalled execution, and provides actionable fix suggestions. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
File diff suppressed because it is too large
@@ -1,19 +1,19 @@
---
name: building-agents-patterns
name: hive-patterns
description: Best practices, patterns, and examples for building goal-driven agents. Includes client-facing interaction, feedback edges, judge patterns, fan-out/fan-in, context management, and anti-patterns.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: reference
part_of: building-agents
part_of: hive
---
# Building Agents - Patterns & Best Practices
Design patterns, examples, and best practices for building robust goal-driven agents.
**Prerequisites:** Complete agent structure using `building-agents-construction`.
**Prerequisites:** Complete agent structure using `hive-create`.
## Practical Example: Hybrid Workflow
@@ -97,6 +97,7 @@ research_node = NodeSpec(
```
**How it works:**
- Client-facing nodes stream LLM text to the user and block for input after each response
- User input is injected via `node.inject_event(text)`
- When the LLM calls `set_output` to produce structured outputs, the judge evaluates and ACCEPTs
@@ -107,13 +108,13 @@ research_node = NodeSpec(
### When to Use client_facing
| Scenario | client_facing | Why |
|----------|:---:|-----|
| Gathering user requirements | Yes | Need user input |
| Human review/approval checkpoint | Yes | Need human decision |
| Data processing (scanning, scoring) | No | Runs autonomously |
| Report generation | No | No user input needed |
| Final confirmation before action | Yes | Need explicit approval |
| Scenario | client_facing | Why |
| ----------------------------------- | :-----------: | ---------------------- |
| Gathering user requirements | Yes | Need user input |
| Human review/approval checkpoint | Yes | Need human decision |
| Data processing (scanning, scoring) | No | Runs autonomously |
| Report generation | No | No user input needed |
| Final confirmation before action | Yes | Need explicit approval |
> **Legacy Note:** The `pause_nodes` / `entry_points` pattern still works for backward compatibility, but `client_facing=True` is preferred for new agents.
@@ -158,22 +159,24 @@ EdgeSpec(
```
**Key concepts:**
- `nullable_output_keys`: Lists output keys that may remain unset. The node sets exactly one of the mutually exclusive keys per execution.
- `max_node_visits`: Must be >1 on the feedback target (extractor) so it can re-execute. Default is 1.
- `priority`: Positive = forward edge (evaluated first). Negative = feedback edge. The executor tries forward edges first; if none match, falls back to feedback edges.
### Routing Decision Table
| Pattern | Old Approach | New Approach |
|---------|-------------|--------------|
| Conditional branching | `router` node | Conditional edges with `condition_expr` |
| Binary approve/reject | `pause_nodes` + resume | `client_facing=True` + `nullable_output_keys` |
| Loop-back on rejection | Manual entry_points | Feedback edge with `priority=-1` |
| Multi-way routing | Router with routes dict | Multiple conditional edges with priorities |
| Pattern | Old Approach | New Approach |
| ---------------------- | ----------------------- | --------------------------------------------- |
| Conditional branching | `router` node | Conditional edges with `condition_expr` |
| Binary approve/reject | `pause_nodes` + resume | `client_facing=True` + `nullable_output_keys` |
| Loop-back on rejection | Manual entry_points | Feedback edge with `priority=-1` |
| Multi-way routing | Router with routes dict | Multiple conditional edges with priorities |
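For the "multi-way routing" row, a sketch with hypothetical node names and expressions, assuming conditional edges are evaluated in priority order with an unconditional edge as fallback:
```python
# Three conditional edges replace an old-style router node
EdgeSpec(id="triage-billing", source="triage", target="billing",
         condition_expr="category == 'billing'", priority=1)
EdgeSpec(id="triage-support", source="triage", target="support",
         condition_expr="category == 'support'", priority=2)
EdgeSpec(id="triage-general", source="triage", target="general", priority=3)
```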
## Judge Patterns
**Core Principle: The judge is the SOLE mechanism for acceptance decisions.** Never add ad-hoc framework gating to compensate for LLM behavior. If the LLM calls `set_output` prematurely, fix the system prompt or use a custom judge. Anti-patterns to avoid:
- Output rollback logic
- `_user_has_responded` flags
- Premature set_output rejection
@@ -184,6 +187,7 @@ Judges control when an event_loop node's loop exits. Choose based on validation
### Implicit Judge (Default)
When no judge is configured, the implicit judge ACCEPTs when:
- The LLM finishes its response with no tool calls
- All required output keys have been set via `set_output`
@@ -219,11 +223,11 @@ class SchemaJudge:
### When to Use Which Judge
| Judge | Use When | Example |
|-------|----------|---------|
| Judge | Use When | Example |
| --------------- | ------------------------------------- | ---------------------- |
| Implicit (None) | Output keys are sufficient validation | Simple data extraction |
| SchemaJudge | Need structural validation of outputs | API response parsing |
| Custom | Domain-specific validation logic | Score must be 0.0-1.0 |
| SchemaJudge | Need structural validation of outputs | API response parsing |
| Custom | Domain-specific validation logic | Score must be 0.0-1.0 |
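A minimal custom judge sketch for the "score must be 0.0-1.0" case. The exact judge protocol is not shown in this skill, so the `judge()` signature below (returning an accept flag plus feedback) is an assumption; mirror whatever interface `SchemaJudge` implements in your framework:
```python
class ScoreRangeJudge:
    """Accept only when outputs contain a numeric score in [0.0, 1.0]."""

    def judge(self, outputs: dict) -> tuple[bool, str]:
        # Assumed protocol: (True, "") means ACCEPT; (False, reason) means RETRY
        score = outputs.get("score")
        if isinstance(score, (int, float)) and 0.0 <= score <= 1.0:
            return True, ""
        return False, f"score must be a number in [0.0, 1.0], got {score!r}"
```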
## Fan-Out / Fan-In (Parallel Execution)
@@ -244,6 +248,7 @@ EdgeSpec(id="scorer-to-extractor", source="scorer", target="extractor",
```
**Requirements:**
- Parallel event_loop nodes must have **disjoint output_keys** (no key written by both)
- Only one parallel branch may contain a `client_facing` node
- Fan-in node receives outputs from all completed branches in shared memory
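A sketch of the wiring, with hypothetical node ids; note the disjoint `output_keys` requirement in the comments:
```python
# Fan-out: two edges leave the same source node
EdgeSpec(id="scorer-to-risk", source="scorer", target="risk_branch", priority=1)
EdgeSpec(id="scorer-to-quality", source="scorer", target="quality_branch", priority=1)

# risk_branch writes output_keys=["risk_report"]; quality_branch writes
# output_keys=["quality_report"]; disjoint, so neither overwrites the other

# Fan-in: both branches converge on the same target
EdgeSpec(id="risk-to-merge", source="risk_branch", target="merge", priority=1)
EdgeSpec(id="quality-to-merge", source="quality_branch", target="merge", priority=1)
```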
@@ -253,6 +258,7 @@ EdgeSpec(id="scorer-to-extractor", source="scorer", target="extractor",
### Tiered Compaction
EventLoopNode automatically manages context window usage with tiered compaction:
1. **Pruning** — Old tool results replaced with compact placeholders (zero-cost, no LLM call)
2. **Normal compaction** — LLM summarizes older messages
3. **Aggressive compaction** — Keeps only recent messages + summary
@@ -265,17 +271,20 @@ The framework automatically truncates large tool results and saves full content
For explicit data management, use the data tools (real MCP tools, not synthetic):
```python
# save_data, load_data, list_data_files are real MCP tools
# Each takes a data_dir parameter since the MCP server is shared
# save_data, load_data, list_data_files, serve_file_to_user are real MCP tools
# data_dir is auto-injected by the framework — the LLM never sees it
# Saving large results
save_data(filename="sources.json", data=large_json_string, data_dir="/path/to/spillover")
save_data(filename="sources.json", data=large_json_string)
# Reading with pagination (line-based offset/limit)
load_data(filename="sources.json", data_dir="/path/to/spillover", offset=0, limit=50)
load_data(filename="sources.json", offset=0, limit=50)
# Listing available files
list_data_files(data_dir="/path/to/spillover")
list_data_files()
# Serving a file to the user as a clickable link
serve_file_to_user(filename="report.html", label="Research Report")
```
Add data tools to nodes that handle large tool results:
@@ -287,7 +296,7 @@ research_node = NodeSpec(
)
```
The `data_dir` is passed by the framework (from the node's spillover directory). The LLM sees `data_dir` in truncation messages and uses it when calling `load_data`.
`data_dir` is a framework context parameter — auto-injected at call time. `GraphExecutor.execute()` sets it per-execution via `ToolRegistry.set_execution_context(data_dir=...)` (using `contextvars` for concurrency safety), ensuring it matches the session-scoped spillover directory.
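A sketch of how that injection can work. `set_execution_context` is named in the text above; the `ContextVar` plumbing and the `wants_data_dir` flag below are illustrative assumptions, not the framework's actual internals:
```python
import contextvars

_data_dir: contextvars.ContextVar[str | None] = contextvars.ContextVar(
    "data_dir", default=None
)

def set_execution_context(data_dir: str) -> None:
    # Scoped to the current task's context, so concurrent executions
    # each see their own session's spillover directory
    _data_dir.set(data_dir)

def invoke_tool(tool_fn, wants_data_dir: bool, **kwargs):
    # The framework adds data_dir at call time; the LLM never supplies it
    if wants_data_dir:
        kwargs["data_dir"] = _data_dir.get()
    return tool_fn(**kwargs)
```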
## Anti-Patterns
@@ -304,18 +313,19 @@ The `data_dir` is passed by the framework (from the node's spillover directory).
A common mistake is splitting work into too many small single-purpose nodes. Each node boundary forces outputs to be serialized, loses in-context information, and adds edge complexity.
| Bad (8 thin nodes) | Good (4 rich nodes) |
|---------------------|---------------------|
| parse-query | intake (client-facing) |
| search-sources | research (search + fetch + analyze) |
| fetch-content | review (client-facing) |
| evaluate-sources | report (write + deliver) |
| synthesize-findings | |
| write-report | |
| quality-check | |
| save-report | |
| Bad (8 thin nodes) | Good (4 rich nodes) |
| ------------------- | ----------------------------------- |
| parse-query | intake (client-facing) |
| search-sources | research (search + fetch + analyze) |
| fetch-content | review (client-facing) |
| evaluate-sources | report (write + deliver) |
| synthesize-findings | |
| write-report | |
| quality-check | |
| save-report | |
**Why fewer nodes are better:**
- The LLM retains full context of its work within a single node
- A research node that searches, fetches, and analyzes keeps all source material in its conversation history
- Fewer edges mean a simpler graph and fewer failure points
@@ -324,6 +334,7 @@ A common mistake is splitting work into too many small single-purpose nodes. Eac
### MCP Tools - Correct Usage
**MCP tools OK for:**
- `test_node` — Validate node configuration with mock inputs
- `validate_graph` — Check graph structure
- `configure_loop` — Set event loop parameters
@@ -356,7 +367,7 @@ When agent is complete, transition to testing phase:
### Pre-Testing Checklist
- [ ] Agent structure validates: `uv run python -m agent_name validate`
- [ ] All nodes defined in nodes/__init__.py
- [ ] All nodes defined in nodes/__init__.py
- [ ] All edges connect valid nodes with correct priorities
- [ ] Feedback edge targets have `max_node_visits > 1`
- [ ] Client-facing nodes have meaningful system prompts
@@ -364,10 +375,10 @@ When agent is complete, transition to testing phase:
## Related Skills
- **building-agents-core** — Fundamental concepts (node types, edges, event loop architecture)
- **building-agents-construction** — Step-by-step building process
- **testing-agent** — Test and validate agents
- **agent-workflow** — Complete workflow orchestrator
- **hive-concepts** — Fundamental concepts (node types, edges, event loop architecture)
- **hive-create** — Step-by-step building process
- **hive-test** — Test and validate agents
- **hive** — Complete workflow orchestrator
---
+940
View File
@@ -0,0 +1,940 @@
---
name: hive-test
description: Iterative agent testing with session recovery. Execute, analyze, fix, resume from checkpoints. Use when testing an agent, debugging test failures, or verifying fixes without re-running from scratch.
---
# Agent Testing
Test agents iteratively: execute, analyze failures, fix, resume from checkpoint, repeat.
## When to Use
- Testing a newly built agent against its goal
- Debugging a failing agent iteratively
- Verifying fixes without re-running expensive early nodes
- Running final regression tests before deployment
## Prerequisites
1. Agent package at `exports/{agent_name}/` (built with `/hive-create`)
2. Credentials configured (`/hive-credentials`)
3. `ANTHROPIC_API_KEY` set (or appropriate LLM provider key)
**Path distinction** (critical — don't confuse these):
- `exports/{agent_name}/` — agent source code (edit here)
- `~/.hive/agents/{agent_name}/` — runtime data: sessions, checkpoints, logs (read here)
---
## The Iterative Test Loop
This is the core workflow. Don't re-run the entire agent when a late node fails — analyze, fix, and resume from the last clean checkpoint.
```
┌──────────────────────────────────────┐
│ PHASE 1: Generate Test Scenarios │
│ Goal → synthetic test inputs + tests │
└──────────────┬───────────────────────┘
               ↓
┌──────────────────────────────────────┐
│ PHASE 2: Execute │◄────────────────┐
│ Run agent (CLI or pytest) │ │
└──────────────┬───────────────────────┘ │
↓ │
Pass? ──yes──► PHASE 6: Final Verification │
│ │
no │
↓ │
┌──────────────────────────────────────┐ │
│ PHASE 3: Analyze │ │
│ Session + runtime logs + checkpoints │ │
└──────────────┬───────────────────────┘ │
↓ │
┌──────────────────────────────────────┐ │
│ PHASE 4: Fix │ │
│ Prompt / code / graph / goal │ │
└──────────────┬───────────────────────┘ │
↓ │
┌──────────────────────────────────────┐ │
│ PHASE 5: Recover & Resume │─────────────────┘
│ Checkpoint resume OR fresh re-run │
└──────────────────────────────────────┘
```
---
### Phase 1: Generate Test Scenarios
Create synthetic tests from the agent's goal, constraints, and success criteria.
#### Step 1a: Read the goal
```python
# Read goal from agent.py
Read(file_path="exports/{agent_name}/agent.py")
# Extract the Goal definition and convert to JSON string
```
#### Step 1b: Get test guidelines
```python
# Get constraint test guidelines
generate_constraint_tests(
goal_id="your-goal-id",
goal_json='{"id": "...", "constraints": [...]}',
agent_path="exports/{agent_name}"
)
# Get success criteria test guidelines
generate_success_tests(
goal_id="your-goal-id",
goal_json='{"id": "...", "success_criteria": [...]}',
node_names="intake,research,review,report",
tool_names="web_search,web_scrape",
agent_path="exports/{agent_name}"
)
```
These return `file_header`, `test_template`, `constraints_formatted`/`success_criteria_formatted`, and `test_guidelines`. They do NOT generate test code — you write the tests.
#### Step 1c: Write tests
```python
Write(
file_path=result["output_file"],
content=result["file_header"] + "\n\n" + your_test_code
)
```
#### Test writing rules
- Every test MUST be `async` with `@pytest.mark.asyncio`
- Every test MUST accept `runner, auto_responder, mock_mode` fixtures
- Use `await auto_responder.start()` before running, `await auto_responder.stop()` in `finally`
- Use `await runner.run(input_dict)` — this goes through AgentRunner → AgentRuntime → ExecutionStream
- Access output via `result.output.get("key")` — NEVER `result.output["key"]`
- `result.success=True` means no exception, NOT goal achieved — always check output
- Write 8-15 tests total, not 30+
- Each real test costs ~3 seconds + LLM tokens
- NEVER use `default_agent.run()` — it bypasses the runtime (no sessions, no logs, client-facing nodes hang)
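Putting the rules together, a minimal compliant test; the input and output keys are placeholders, and the Test Patterns section below shows more variants:
```python
import pytest

@pytest.mark.asyncio
async def test_minimal(runner, auto_responder, mock_mode):
    """Smallest test shape that satisfies every rule above."""
    await auto_responder.start()
    try:
        result = await runner.run({"query": "example input"})
    finally:
        await auto_responder.stop()
    assert result.success, f"Agent failed: {result.error}"
    output = result.output or {}
    assert output.get("report"), "No report produced"  # .get(), never ["key"]
```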
#### Step 1d: Check existing tests
Before generating, check if tests already exist:
```python
list_tests(
goal_id="your-goal-id",
agent_path="exports/{agent_name}"
)
```
---
### Phase 2: Execute
Two execution paths, use the right one for your situation.
#### Iterative debugging (for complex agents)
Run the agent via CLI. This creates sessions with checkpoints at `~/.hive/agents/{agent_name}/sessions/`:
```bash
uv run hive run exports/{agent_name} --input '{"query": "test topic"}'
```
Sessions and checkpoints are saved automatically.
**Client-facing nodes**: Agents with `client_facing=True` nodes (interactive conversation) work in headless mode when run from a real terminal — the agent streams output to stdout and reads user input from stdin via a `>>> ` prompt. In non-interactive shells (like Claude Code's Bash tool), client-facing nodes will hang because there is no stdin. For testing interactive agents from Claude Code, use `run_tests` with mock mode or have the user run the agent manually in their terminal.
#### Automated regression (for CI or final verification)
Use the `run_tests` MCP tool to run all pytest tests:
```python
run_tests(
goal_id="your-goal-id",
agent_path="exports/{agent_name}"
)
```
Returns structured results:
```json
{
"overall_passed": false,
"summary": {"total": 12, "passed": 10, "failed": 2, "pass_rate": "83.3%"},
"test_results": [{"test_name": "test_success_source_diversity", "status": "failed"}],
"failures": [{"test_name": "test_success_source_diversity", "details": "..."}]
}
```
**Options:**
```python
# Run only constraint tests
run_tests(goal_id, agent_path, test_types='["constraint"]')
# Stop on first failure
run_tests(goal_id, agent_path, fail_fast=True)
# Parallel execution
run_tests(goal_id, agent_path, parallel=4)
```
**Note:** `run_tests` uses `AgentRunner` with `tmp_path` storage, so sessions are isolated per test run. For checkpoint-based recovery with persistent sessions, use CLI execution. Use `run_tests` for quick regression checks and final verification.
---
### Phase 3: Analyze Failures
When a test fails, drill down systematically. Don't guess — use the tools.
#### Step 3a: Get error category
```python
debug_test(
goal_id="your-goal-id",
test_name="test_success_source_diversity",
agent_path="exports/{agent_name}"
)
```
Returns error category (`IMPLEMENTATION_ERROR`, `ASSERTION_FAILURE`, `TIMEOUT`, `IMPORT_ERROR`, `API_ERROR`) plus full traceback and suggestions.
#### Step 3b: Find the failed session
```python
list_agent_sessions(
agent_work_dir="~/.hive/agents/{agent_name}",
status="failed",
limit=5
)
```
Returns session list with IDs, timestamps, current_node (where it failed), execution_quality.
#### Step 3c: Inspect session state
```python
get_agent_session_state(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345"
)
```
Returns execution path, which node was current, step count, timestamps — but excludes memory values (to avoid context bloat). Shows `memory_keys` and `memory_size` instead.
#### Step 3d: Examine runtime logs (L2/L3)
```python
# L2: Per-node success/failure, retry counts
query_runtime_log_details(
agent_work_dir="~/.hive/agents/{agent_name}",
run_id="session_20260209_143022_abc12345",
needs_attention_only=True
)
# L3: Exact LLM responses, tool call inputs/outputs
query_runtime_log_raw(
agent_work_dir="~/.hive/agents/{agent_name}",
run_id="session_20260209_143022_abc12345",
node_id="research"
)
```
#### Step 3e: Inspect memory data
```python
# See what data a node actually produced
get_agent_session_memory(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345",
key="research_results"
)
```
#### Step 3f: Find recovery points
```python
list_agent_checkpoints(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345",
is_clean="true"
)
```
Returns checkpoint summaries with IDs, types (`node_start`, `node_complete`), which node, and `is_clean` flag. Clean checkpoints are safe resume points.
#### Step 3g: Compare checkpoints (optional)
To understand what changed between two points in execution:
```python
compare_agent_checkpoints(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345",
checkpoint_id_before="cp_node_complete_research_143030",
checkpoint_id_after="cp_node_complete_review_143115"
)
```
Returns memory diff (added/removed/changed keys) and execution path diff.
---
### Phase 4: Fix Based on Root Cause
Use the analysis from Phase 3 to determine what to fix and where.
| Root Cause | What to Fix | Where to Edit |
|------------|------------|---------------|
| **Prompt issue** — LLM produces wrong output format, misses instructions | Node `system_prompt` | `exports/{agent}/nodes/__init__.py` |
| **Code bug** — TypeError, KeyError, logic error in Python | Agent code | `exports/{agent}/agent.py`, `nodes/__init__.py` |
| **Graph issue** — wrong routing, missing edge, bad condition_expr | Edges, node config | `exports/{agent}/agent.py` |
| **Tool issue** — MCP tool fails, wrong config, missing credential | Tool config | `exports/{agent}/mcp_servers.json`, `/hive-credentials` |
| **Goal issue** — success criteria too strict/vague, wrong constraints | Goal definition | `exports/{agent}/agent.py` (goal section) |
| **Test issue** — test expectations don't match actual agent behavior | Test code | `exports/{agent}/tests/test_*.py` |
#### Fix strategies by error category
**IMPLEMENTATION_ERROR** (TypeError, AttributeError, KeyError):
```python
# Read the failing code
Read(file_path="exports/{agent_name}/nodes/__init__.py")
# Fix the bug
Edit(
file_path="exports/{agent_name}/nodes/__init__.py",
old_string="results.get('videos')",
new_string="(results or {}).get('videos', [])"
)
```
**ASSERTION_FAILURE** (test assertions fail but agent ran successfully):
- Check if the agent's output is actually wrong → fix the prompt
- Check if the test's expectations are unrealistic → fix the test
- Use `get_agent_session_memory` to see what the agent actually produced
**TIMEOUT / STALL** (agent runs too long):
- Check `node_visit_counts` for feedback loops hitting max_node_visits
- Check L3 logs for tool calls that hang
- Reduce `max_iterations` in loop_config or fix the prompt to converge faster
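For the loop budget fix, a sketch; the parameter names here are assumptions, so verify the actual `configure_loop` schema with `list_mcp_tools` first:
```python
# Hypothetical parameters; check the tool's real schema before calling
configure_loop(
    node_id="research",
    max_iterations=10,
)
```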
**API_ERROR** (connection, rate limit, auth):
- Verify credentials with `/hive-credentials`
- Check MCP server configuration
---
### Phase 5: Recover & Resume
After fixing the agent, decide whether to resume or re-run.
#### When to resume from checkpoint
Resume when ALL of these are true:
- The fix is to a node that comes AFTER existing clean checkpoints
- Clean checkpoints exist (from a CLI execution with checkpointing)
- The early nodes are expensive (web scraping, API calls, long LLM chains)
```bash
# Resume from the last clean checkpoint before the failing node
uv run hive run exports/{agent_name} \
--resume-session session_20260209_143022_abc12345 \
--checkpoint cp_node_complete_research_143030
```
This skips all nodes before the checkpoint and only re-runs the fixed node onward.
#### When to re-run from scratch
Re-run when ANY of these are true:
- The fix is to the entry node or an early node
- No checkpoints exist (e.g., agent was run via `run_tests`)
- The agent is fast (2-3 nodes, completes in seconds)
- You changed the graph structure (added/removed nodes/edges)
```bash
uv run hive run exports/{agent_name} --input '{"query": "test topic"}'
```
#### Inspecting a checkpoint before resuming
```python
get_agent_checkpoint(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="session_20260209_143022_abc12345",
checkpoint_id="cp_node_complete_research_143030"
)
```
Returns the full checkpoint: shared_memory snapshot, execution_path, current_node, next_node, is_clean.
#### Loop back to Phase 2
After resuming or re-running, check if the fix worked. If not, go back to Phase 3.
---
### Phase 6: Final Verification
Once the iterative fix loop converges (the agent produces correct output), run the full automated test suite:
```python
run_tests(
goal_id="your-goal-id",
agent_path="exports/{agent_name}"
)
```
All tests should pass. If not, repeat the loop for remaining failures.
---
## Credential Requirements
**CRITICAL: Testing requires ALL credentials the agent depends on.** This includes both the LLM API key AND any tool-specific credentials (HubSpot, Brave Search, etc.).
### Prerequisites
Before running agent tests, you MUST collect ALL required credentials from the user.
**Step 1: LLM API Key (always required)**
```bash
export ANTHROPIC_API_KEY="your-key-here"
```
**Step 2: Tool-specific credentials (depends on agent's tools)**
Inspect the agent's `mcp_servers.json` and tool configuration to determine which tools the agent uses, then check for all required credentials:
```python
from aden_tools.credentials import CredentialManager, CREDENTIAL_SPECS
creds = CredentialManager()
# Determine which tools the agent uses (from agent.json or mcp_servers.json)
agent_tools = [...] # e.g., ["hubspot_search_contacts", "web_search", ...]
# Find all missing credentials for those tools
missing = creds.get_missing_for_tools(agent_tools)
```
Common tool credentials:
| Tool | Env Var | Help URL |
|------|---------|----------|
| HubSpot CRM | `HUBSPOT_ACCESS_TOKEN` | https://developers.hubspot.com/docs/api/private-apps |
| Brave Search | `BRAVE_SEARCH_API_KEY` | https://brave.com/search/api/ |
| Google Search | `GOOGLE_SEARCH_API_KEY` + `GOOGLE_SEARCH_CX` | https://developers.google.com/custom-search |
**Why ALL credentials are required:**
- Tests need to execute the agent's LLM nodes to validate behavior
- Tools with missing credentials will return error dicts instead of real data
- Mock mode bypasses everything, providing no confidence in real-world performance
### Mock Mode Limitations
Mock mode (`--mock` flag or `MOCK_MODE=1`) is **ONLY for structure validation**:
- Validates graph structure (nodes, edges, connections)
- Validates that `AgentRunner.load()` succeeds and the agent is importable
- Does NOT execute event_loop agents — MockLLMProvider never calls `set_output`, so event_loop nodes loop forever
- Does NOT test LLM reasoning, content quality, or constraint validation
- Does NOT test real API integrations or tool use
**Bottom line:** If you're testing whether an agent achieves its goal, you MUST use real credentials.
### Enforcing Credentials in Tests
When writing tests, **ALWAYS include credential checks**:
```python
import os
import pytest
from aden_tools.credentials import CredentialManager
pytestmark = pytest.mark.skipif(
not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"),
reason="API key required for real testing. Set ANTHROPIC_API_KEY or use MOCK_MODE=1."
)
@pytest.fixture(scope="session", autouse=True)
def check_credentials():
"""Ensure ALL required credentials are set for real testing."""
creds = CredentialManager()
mock_mode = os.environ.get("MOCK_MODE")
if not creds.is_available("anthropic"):
if mock_mode:
print("\nRunning in MOCK MODE - structure validation only")
else:
pytest.fail(
"\nANTHROPIC_API_KEY not set!\n"
"Set API key: export ANTHROPIC_API_KEY='your-key-here'\n"
"Or run structure validation: MOCK_MODE=1 pytest exports/{agent}/tests/"
)
if not mock_mode:
agent_tools = [] # Update per agent
missing = creds.get_missing_for_tools(agent_tools)
if missing:
lines = ["\nMissing tool credentials!"]
for name in missing:
spec = creds.specs.get(name)
if spec:
lines.append(f" {spec.env_var} - {spec.description}")
pytest.fail("\n".join(lines))
```
### User Communication
When the user asks to test an agent, **ALWAYS check for ALL credentials first**:
1. **Identify the agent's tools** from `mcp_servers.json`
2. **Check ALL required credentials** using `CredentialManager`
3. **Ask the user to provide any missing credentials** before proceeding
4. Collect ALL missing credentials in a single prompt — not one at a time
---
## Safe Test Patterns
### OutputCleaner
The framework automatically validates and cleans node outputs using a fast LLM at edge traversal time. Tests should still use safe patterns because OutputCleaner may not catch all issues.
### Safe Access (REQUIRED)
```python
# UNSAFE - will crash on missing keys
approval = result.output["approval_decision"]
category = result.output["analysis"]["category"]
# SAFE - use .get() with defaults
output = result.output or {}
approval = output.get("approval_decision", "UNKNOWN")
# SAFE - type check before operations
analysis = output.get("analysis", {})
if isinstance(analysis, dict):
category = analysis.get("category", "unknown")
# SAFE - handle JSON parsing trap (LLM response as string)
import json
recommendation = output.get("recommendation", "{}")
if isinstance(recommendation, str):
try:
parsed = json.loads(recommendation)
if isinstance(parsed, dict):
approval = parsed.get("approval_decision", "UNKNOWN")
except json.JSONDecodeError:
approval = "UNKNOWN"
elif isinstance(recommendation, dict):
approval = recommendation.get("approval_decision", "UNKNOWN")
# SAFE - type check before iteration
items = output.get("items", [])
if isinstance(items, list):
for item in items:
...
```
### Helper Functions for conftest.py
```python
import json
import re
def _parse_json_from_output(result, key):
    """Parse JSON from agent output (framework may store full LLM response as string)."""
    output = result.output or {}  # output may be None; guard before .get()
    response_text = output.get(key, "")
    json_text = re.sub(r'```json\s*|\s*```', '', str(response_text)).strip()
    try:
        return json.loads(json_text)
    except (json.JSONDecodeError, AttributeError, TypeError):
        return output.get(key)
def safe_get_nested(result, key_path, default=None):
"""Safely get nested value from result.output."""
output = result.output or {}
current = output
for key in key_path:
if isinstance(current, dict):
current = current.get(key)
elif isinstance(current, str):
try:
json_text = re.sub(r'```json\s*|\s*```', '', current).strip()
parsed = json.loads(json_text)
if isinstance(parsed, dict):
current = parsed.get(key)
else:
return default
except json.JSONDecodeError:
return default
else:
return default
return current if current is not None else default
# Make available in tests
pytest.parse_json_from_output = _parse_json_from_output
pytest.safe_get_nested = safe_get_nested
```
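Usage in a test, via the pytest-attached helpers; the output keys are placeholders:
```python
parsed = pytest.parse_json_from_output(result, "recommendation")
category = pytest.safe_get_nested(result, ["analysis", "category"], default="unknown")
assert category != "unknown", "analysis.category missing from output"
```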
### ExecutionResult Fields
**`result.success=True` means NO exception, NOT goal achieved**
```python
# WRONG
assert result.success
# RIGHT
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
approval = output.get("approval_decision")
assert approval == "APPROVED", f"Expected APPROVED, got {approval}"
```
All fields:
- `success: bool` — Completed without exception (NOT goal achieved!)
- `output: dict` — Complete memory snapshot (may contain raw strings)
- `error: str | None` — Error message if failed
- `steps_executed: int` — Number of nodes executed
- `total_tokens: int` — Cumulative token usage
- `total_latency_ms: int` — Total execution time
- `path: list[str]` — Node IDs traversed (may repeat in feedback loops)
- `paused_at: str | None` — Node ID if paused
- `session_state: dict` — State for resuming
- `node_visit_counts: dict[str, int]` — Visit counts per node (feedback loop testing)
- `execution_quality: str` — "clean", "degraded", or "failed"
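Reading a few of these fields after a run, as a quick sketch:
```python
print(result.execution_quality)    # "clean", "degraded", or "failed"
print(result.path)                 # e.g. ["intake", "research", "review", "report"]
for node_id, count in (result.node_visit_counts or {}).items():
    print(f"{node_id}: visited {count}x")
```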
### Test Count Guidance
**Write 8-15 tests, not 30+**
- 2-3 tests per success criterion
- 1 happy path test
- 1 boundary/edge case test
- 1 error handling test (optional)
Each real test costs ~3 seconds + LLM tokens. 12 tests = ~36 seconds, $0.12.
---
## Test Patterns
### Happy Path
```python
@pytest.mark.asyncio
async def test_happy_path(runner, auto_responder, mock_mode):
"""Test normal successful execution."""
await auto_responder.start()
try:
result = await runner.run({"query": "python tutorials"})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
assert output.get("report"), "No report produced"
```
### Boundary Condition
```python
@pytest.mark.asyncio
async def test_minimum_sources(runner, auto_responder, mock_mode):
"""Test at minimum source threshold."""
await auto_responder.start()
try:
result = await runner.run({"query": "niche topic"})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
sources = output.get("sources", [])
if isinstance(sources, list):
assert len(sources) >= 3, f"Expected >= 3 sources, got {len(sources)}"
```
### Error Handling
```python
@pytest.mark.asyncio
async def test_empty_input(runner, auto_responder, mock_mode):
"""Test graceful handling of empty input."""
await auto_responder.start()
try:
result = await runner.run({"query": ""})
finally:
await auto_responder.stop()
# Agent should either fail gracefully or produce an error message
output = result.output or {}
assert not result.success or output.get("error"), "Should handle empty input"
```
### Feedback Loop
```python
@pytest.mark.asyncio
async def test_feedback_loop_terminates(runner, auto_responder, mock_mode):
"""Test that feedback loops don't run forever."""
await auto_responder.start()
try:
result = await runner.run({"query": "test"})
finally:
await auto_responder.stop()
visits = result.node_visit_counts or {}
for node_id, count in visits.items():
assert count <= 5, f"Node {node_id} visited {count} times — possible infinite loop"
```
---
## MCP Tool Reference
### Phase 1: Test Generation
```python
# Check existing tests
list_tests(goal_id, agent_path)
# Get constraint test guidelines (returns templates, NOT generated tests)
generate_constraint_tests(goal_id, goal_json, agent_path)
# Returns: output_file, file_header, test_template, constraints_formatted, test_guidelines
# Get success criteria test guidelines
generate_success_tests(goal_id, goal_json, node_names, tool_names, agent_path)
# Returns: output_file, file_header, test_template, success_criteria_formatted, test_guidelines
```
### Phase 2: Execution
```python
# Automated regression (no checkpoints, fresh runs)
run_tests(goal_id, agent_path, test_types='["all"]', parallel=-1, fail_fast=False)
# Run only specific test types
run_tests(goal_id, agent_path, test_types='["constraint"]')
run_tests(goal_id, agent_path, test_types='["success"]')
```
```bash
# Iterative debugging with checkpoints (via CLI)
uv run hive run exports/{agent_name} --input '{"query": "test"}'
```
### Phase 3: Analysis
```python
# Debug a specific failed test
debug_test(goal_id, test_name, agent_path)
# Find failed sessions
list_agent_sessions(agent_work_dir, status="failed", limit=5)
# Inspect session state (excludes memory values)
get_agent_session_state(agent_work_dir, session_id)
# Inspect memory data
get_agent_session_memory(agent_work_dir, session_id, key="research_results")
# Runtime logs: L1 summaries
query_runtime_logs(agent_work_dir, status="needs_attention")
# Runtime logs: L2 per-node details
query_runtime_log_details(agent_work_dir, run_id, needs_attention_only=True)
# Runtime logs: L3 tool/LLM raw data
query_runtime_log_raw(agent_work_dir, run_id, node_id="research")
# Find clean checkpoints
list_agent_checkpoints(agent_work_dir, session_id, is_clean="true")
# Compare checkpoints (memory diff)
compare_agent_checkpoints(agent_work_dir, session_id, cp_before, cp_after)
```
### Phase 5: Recovery
```python
# Inspect checkpoint before resuming
get_agent_checkpoint(agent_work_dir, session_id, checkpoint_id)
# Empty checkpoint_id = latest checkpoint
```
```bash
# Resume from checkpoint via CLI (headless)
uv run hive run exports/{agent_name} \
--resume-session {session_id} --checkpoint {checkpoint_id}
```
---
## Anti-Patterns
| Don't | Do Instead |
|-------|-----------|
| Use `default_agent.run()` in tests | Use `runner.run()` with `auto_responder` fixtures (goes through AgentRuntime) |
| Re-run entire agent when a late node fails | Resume from last clean checkpoint |
| Treat `result.success` as goal achieved | Check `result.output` for actual criteria |
| Access `result.output["key"]` directly | Use `result.output.get("key")` |
| Fix random things hoping tests pass | Analyze L2/L3 logs to find root cause first |
| Write 30+ tests | Write 8-15 focused tests |
| Skip credential check | Use `/hive-credentials` before testing |
| Confuse `exports/` with `~/.hive/agents/` | Code in `exports/`, runtime data in `~/.hive/` |
| Use `run_tests` for iterative debugging | Use headless CLI with checkpoints for iterative debugging |
| Use headless CLI for final regression | Use `run_tests` for automated regression |
| Use `--tui` from Claude Code | Use headless `run` command — TUI hangs in non-interactive shells |
| Test client-facing nodes from Claude Code | Use mock mode, or have the user run the agent in their terminal |
| Run tests without reading goal first | Always understand the goal before writing tests |
| Skip Phase 3 analysis and guess | Use session + log tools to identify root cause |
---
## Example Walkthrough: Deep Research Agent
A complete iteration showing the test loop for an agent with nodes: `intake → research → review → report`.
### Phase 1: Generate tests
```python
# Read the goal
Read(file_path="exports/deep_research_agent/agent.py")
# Get success criteria test guidelines
result = generate_success_tests(
goal_id="rigorous-interactive-research",
goal_json='{"id": "rigorous-interactive-research", "success_criteria": [{"id": "source-diversity", "target": ">=5"}, {"id": "citation-coverage", "target": "100%"}, {"id": "report-completeness", "target": "90%"}]}',
node_names="intake,research,review,report",
tool_names="web_search,web_scrape",
agent_path="exports/deep_research_agent"
)
# Write tests
Write(
file_path=result["output_file"],
content=result["file_header"] + "\n\n" + test_code
)
```
### Phase 2: First execution
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent",
fail_fast=True
)
```
Result: `test_success_source_diversity` fails — agent only found 2 sources instead of 5.
### Phase 3: Analyze
```python
# Debug the failing test
debug_test(
goal_id="rigorous-interactive-research",
test_name="test_success_source_diversity",
agent_path="exports/deep_research_agent"
)
# → ASSERTION_FAILURE: Expected >= 5 sources, got 2
# Find the session
list_agent_sessions(
agent_work_dir="~/.hive/agents/deep_research_agent",
status="completed",
limit=1
)
# → session_20260209_150000_abc12345
# See what the research node produced
get_agent_session_memory(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_150000_abc12345",
key="research_results"
)
# → Only 2 web_search calls made, each returned 1 source
# Check the LLM's behavior in the research node
query_runtime_log_raw(
agent_work_dir="~/.hive/agents/deep_research_agent",
run_id="session_20260209_150000_abc12345",
node_id="research"
)
# → LLM called web_search only twice, then called set_output
```
Root cause: The research node's prompt doesn't tell the LLM to search for at least 5 diverse sources. It stops after the first couple of searches.
### Phase 4: Fix the prompt
```python
Read(file_path="exports/deep_research_agent/nodes/__init__.py")
Edit(
file_path="exports/deep_research_agent/nodes/__init__.py",
old_string='system_prompt="Search for information on the user\'s topic."',
new_string='system_prompt="Search for information on the user\'s topic. You MUST find at least 5 diverse, authoritative sources. Use multiple different search queries to ensure source diversity. Do not stop searching until you have at least 5 distinct sources."'
)
```
### Phase 5: Resume from checkpoint
For this example, the fix is to the `research` node. If we had run via CLI with checkpointing, we could resume from the checkpoint after `intake` to skip re-running intake:
```python
# Check if clean checkpoint exists after intake
list_agent_checkpoints(
    agent_work_dir="~/.hive/agents/deep_research_agent",
    session_id="session_20260209_150000_abc12345",
    is_clean="true"
)
# → cp_node_complete_intake_150005
```
```bash
# Resume from after intake, re-run research with fixed prompt
uv run hive run exports/deep_research_agent \
  --resume-session session_20260209_150000_abc12345 \
  --checkpoint cp_node_complete_intake_150005
```
Or for this simple case (intake is fast), just re-run:
```bash
uv run hive run exports/deep_research_agent --input '{"topic": "test"}'
```
### Phase 6: Final verification
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent"
)
# → All 12 tests pass
```
---
## Test File Structure
```
exports/{agent_name}/
├── agent.py ← Agent to test (goal, nodes, edges)
├── nodes/__init__.py ← Node implementations (prompts, config)
├── config.py ← Agent configuration
├── mcp_servers.json ← Tool server config
└── tests/
├── conftest.py ← Shared fixtures + safe access helpers
├── test_constraints.py ← Constraint tests
├── test_success_criteria.py ← Success criteria tests
└── test_edge_cases.py ← Edge case tests
```
## Integration with Other Skills
| Scenario | From | To | Action |
|----------|------|----|--------|
| Agent built, ready to test | `/hive-create` | `/hive-test` | Generate tests, start loop |
| Prompt fix needed | `/hive-test` Phase 4 | Direct edit | Edit `nodes/__init__.py`, resume |
| Goal definition wrong | `/hive-test` Phase 4 | `/hive-create` | Update goal, may need rebuild |
| Missing credentials | `/hive-test` Phase 3 | `/hive-credentials` | Set up credentials |
| Complex runtime failure | `/hive-test` Phase 3 | `/hive-debugger` | Deep L1/L2/L3 analysis |
| All tests pass | `/hive-test` Phase 6 | Done | Agent validated |
@@ -0,0 +1,333 @@
# Example: Iterative Testing of a Research Agent
This example walks through the full iterative test loop for a research agent that searches the web, reviews findings, and produces a cited report.
## Agent Structure
```
exports/deep_research_agent/
├── agent.py # Goal + graph: intake → research → review → report
├── nodes/__init__.py # Node definitions (system_prompt, input/output keys)
├── config.py # Model config
├── mcp_servers.json # Tools: web_search, web_scrape
└── tests/ # Test files (we'll create these)
```
**Goal:** "Rigorous Interactive Research" — find 5+ diverse sources, cite every claim, produce a complete report.
---
## Phase 1: Generate Tests
### Read the goal
```python
Read(file_path="exports/deep_research_agent/agent.py")
# Extract: goal_id="rigorous-interactive-research"
# success_criteria: source-diversity (>=5), citation-coverage (100%), report-completeness (90%)
# constraints: no-hallucination, source-attribution
```
### Get test guidelines
```python
result = generate_success_tests(
goal_id="rigorous-interactive-research",
goal_json='{"id": "rigorous-interactive-research", "success_criteria": [{"id": "source-diversity", "description": "Use multiple diverse sources", "target": ">=5"}, {"id": "citation-coverage", "description": "Every claim cites its source", "target": "100%"}, {"id": "report-completeness", "description": "Report answers the research questions", "target": "90%"}]}',
node_names="intake,research,review,report",
tool_names="web_search,web_scrape",
agent_path="exports/deep_research_agent"
)
```
### Write tests
```python
Write(
file_path="exports/deep_research_agent/tests/test_success_criteria.py",
content=result["file_header"] + '''
@pytest.mark.asyncio
async def test_success_source_diversity(runner, auto_responder, mock_mode):
"""At least 5 diverse sources are found."""
await auto_responder.start()
try:
result = await runner.run({"query": "impact of remote work on productivity"})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
sources = output.get("sources", [])
if isinstance(sources, list):
assert len(sources) >= 5, f"Expected >= 5 sources, got {len(sources)}"
@pytest.mark.asyncio
async def test_success_citation_coverage(runner, auto_responder, mock_mode):
"""Every factual claim in the report cites its source."""
await auto_responder.start()
try:
result = await runner.run({"query": "climate change effects on agriculture"})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
report = output.get("report", "")
# Check that report contains numbered references
assert "[1]" in str(report) or "[source" in str(report).lower(), "Report lacks citations"
@pytest.mark.asyncio
async def test_success_report_completeness(runner, auto_responder, mock_mode):
"""Report addresses the original research question."""
query = "pros and cons of nuclear energy"
await auto_responder.start()
try:
result = await runner.run({"query": query})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
report = output.get("report", "")
assert len(str(report)) > 200, f"Report too short: {len(str(report))} chars"
@pytest.mark.asyncio
async def test_empty_query_handling(runner, auto_responder, mock_mode):
"""Agent handles empty input gracefully."""
await auto_responder.start()
try:
result = await runner.run({"query": ""})
finally:
await auto_responder.stop()
output = result.output or {}
assert not result.success or output.get("error"), "Should handle empty query"
@pytest.mark.asyncio
async def test_feedback_loop_terminates(runner, auto_responder, mock_mode):
"""Feedback loop between review and research terminates."""
await auto_responder.start()
try:
result = await runner.run({"query": "quantum computing basics"})
finally:
await auto_responder.stop()
visits = result.node_visit_counts or {}
for node_id, count in visits.items():
assert count <= 5, f"Node {node_id} visited {count} times"
'''
)
```
---
## Phase 2: First Execution
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent",
fail_fast=True
)
```
**Result:**
```json
{
"overall_passed": false,
"summary": {"total": 5, "passed": 3, "failed": 2, "pass_rate": "60.0%"},
"failures": [
{"test_name": "test_success_source_diversity", "details": "AssertionError: Expected >= 5 sources, got 2"},
{"test_name": "test_success_citation_coverage", "details": "AssertionError: Report lacks citations"}
]
}
```
---
## Phase 3: Analyze (Iteration 1)
### Debug the first failure
```python
debug_test(
goal_id="rigorous-interactive-research",
test_name="test_success_source_diversity",
agent_path="exports/deep_research_agent"
)
# Category: ASSERTION_FAILURE — Expected >= 5 sources, got 2
```
### Find the session and inspect memory
```python
list_agent_sessions(
agent_work_dir="~/.hive/agents/deep_research_agent",
status="completed",
limit=1
)
# → session_20260209_150000_abc12345
get_agent_session_memory(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_150000_abc12345",
key="research_results"
)
# → Only 2 sources found. LLM stopped searching after 2 queries.
```
### Check LLM behavior in the research node
```python
query_runtime_log_raw(
agent_work_dir="~/.hive/agents/deep_research_agent",
run_id="session_20260209_150000_abc12345",
node_id="research"
)
# → LLM called web_search twice, got results, immediately called set_output.
# → Prompt doesn't instruct it to find at least 5 sources.
```
**Root cause:** The research node's system_prompt doesn't specify minimum source requirements.
---
## Phase 4: Fix (Iteration 1)
```python
Read(file_path="exports/deep_research_agent/nodes/__init__.py")
# Fix the research node prompt
Edit(
file_path="exports/deep_research_agent/nodes/__init__.py",
old_string='system_prompt="Search for information on the user\'s topic using web search."',
new_string='system_prompt="Search for information on the user\'s topic using web search. You MUST find at least 5 diverse, authoritative sources. Use multiple different search queries with varied keywords. Do NOT call set_output until you have gathered at least 5 distinct sources from different domains."'
)
```
---
## Phase 5: Recover & Resume (Iteration 1)
The fix is to the `research` node. Since this was a `run_tests` execution (no checkpoints), we re-run from scratch:
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent",
fail_fast=True
)
```
**Result:**
```json
{
"overall_passed": false,
"summary": {"total": 5, "passed": 4, "failed": 1, "pass_rate": "80.0%"},
"failures": [
{"test_name": "test_success_citation_coverage", "details": "AssertionError: Report lacks citations"}
]
}
```
Source diversity now passes. Citation coverage still fails.
---
## Phase 3: Analyze (Iteration 2)
```python
debug_test(
goal_id="rigorous-interactive-research",
test_name="test_success_citation_coverage",
agent_path="exports/deep_research_agent"
)
# Category: ASSERTION_FAILURE — Report lacks citations
# Check what the report node produced
list_agent_sessions(
agent_work_dir="~/.hive/agents/deep_research_agent",
status="completed",
limit=1
)
# → session_20260209_151500_def67890
get_agent_session_memory(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_151500_def67890",
key="report"
)
# → Report text exists but uses no numbered references.
# → Sources are in memory but report node doesn't cite them.
```
**Root cause:** The report node's prompt doesn't instruct the LLM to include numbered citations.
---
## Phase 4: Fix (Iteration 2)
```python
Edit(
file_path="exports/deep_research_agent/nodes/__init__.py",
old_string='system_prompt="Write a comprehensive report based on the research findings."',
new_string='system_prompt="Write a comprehensive report based on the research findings. You MUST include numbered citations [1], [2], etc. for every factual claim. At the end, include a References section listing all sources with their URLs. Every claim must be traceable to a specific source."'
)
```
---
## Phase 5: Resume (Iteration 2)
The fix is to the `report` node (the last node). To demonstrate checkpoint recovery, run via CLI:
```bash
# Run via CLI to get checkpoints
uv run hive run exports/deep_research_agent --input '{"topic": "climate change effects"}'
```
```python
# After it runs, find the clean checkpoint before report
list_agent_checkpoints(
    agent_work_dir="~/.hive/agents/deep_research_agent",
    session_id="session_20260209_152000_ghi34567",
    is_clean="true"
)
# → cp_node_complete_review_152100 (after review, before report)
```
```bash
# Resume — skips intake, research, review entirely
uv run hive run exports/deep_research_agent \
  --resume-session session_20260209_152000_ghi34567 \
  --checkpoint cp_node_complete_review_152100
```
Only the `report` node re-runs with the fixed prompt, using research data from the checkpoint.
---
## Phase 6: Final Verification
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent"
)
```
**Result:**
```json
{
"overall_passed": true,
"summary": {"total": 5, "passed": 5, "failed": 0, "pass_rate": "100.0%"}
}
```
All tests pass.
---
## Summary
| Iteration | Failure | Root Cause | Fix | Recovery |
|-----------|---------|------------|-----|----------|
| 1 | Source diversity (2 < 5) | Research prompt too vague | Added "at least 5 sources" to prompt | Re-run (no checkpoints) |
| 2 | No citations in report | Report prompt lacks citation instructions | Added citation requirements | Checkpoint resume (skipped 3 nodes) |
**Key takeaways:**
- Phase 3 analysis (session memory + L3 logs) identified root causes without guessing
- Checkpoint recovery in iteration 2 saved time by skipping 3 expensive nodes
- Final `run_tests` confirms all scenarios pass end-to-end
@@ -1,32 +1,53 @@
---
name: agent-workflow
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates building-agents-* and testing-agent skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
name: hive
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates hive-* skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: workflow-orchestrator
orchestrates:
- building-agents-core
- building-agents-construction
- building-agents-patterns
- testing-agent
- setup-credentials
- hive-concepts
- hive-create
- hive-patterns
- hive-test
- hive-credentials
- hive-debugger
---
# Agent Development Workflow
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT explore the codebase or read source files. ROUTE to the correct skill IMMEDIATELY.**
When this skill is loaded, **ALWAYS use the AskUserQuestion tool** to present options:
```
Use AskUserQuestion with these options:
- "Build a new agent" → Then invoke /hive-create
- "Test an existing agent" → Then invoke /hive-test
- "Learn agent concepts" → Then invoke /hive-concepts
- "Optimize agent design" → Then invoke /hive-patterns
- "Set up credentials" → Then invoke /hive-credentials
- "Debug a failing agent" → Then invoke /hive-debugger
- "Other" (please describe what you want to achieve)
```
**DO NOT:** Read source files, explore the codebase, search for code, or do any investigation before routing. The sub-skills handle all of that.
---
Complete Standard Operating Procedure (SOP) for building production-ready goal-driven agents.
## Overview
This workflow orchestrates specialized skills to take you from initial concept to production-ready agent:
1. **Understand Concepts** → `/building-agents-core` (optional)
2. **Build Structure** → `/building-agents-construction`
3. **Optimize Design** → `/building-agents-patterns` (optional)
4. **Setup Credentials** → `/setup-credentials` (if agent uses tools requiring API keys)
5. **Test & Validate** → `/testing-agent`
1. **Understand Concepts** → `/hive-concepts` (optional)
2. **Build Structure** → `/hive-create`
3. **Optimize Design** → `/hive-patterns` (optional)
4. **Setup Credentials** → `/hive-credentials` (if agent uses tools requiring API keys)
5. **Test & Validate** → `/hive-test`
6. **Debug Issues** → `/hive-debugger` (if agent fails at runtime)
## When to Use This Workflow
@@ -37,26 +58,26 @@ Use this meta-skill when:
- Want consistent, repeatable agent builds
**Skip this workflow** if:
- You only need to test an existing agent → use `/testing-agent` directly
- You only need to test an existing agent → use `/hive-test` directly
- You know exactly which phase you're in → use specific skill directly
## Quick Decision Tree
```
"Need to understand agent concepts" → building-agents-core
"Build a new agent" → building-agents-construction
"Optimize my agent design" → building-agents-patterns
"Need client-facing nodes or feedback loops" → building-agents-patterns
"Set up API keys for my agent" → setup-credentials
"Test my agent" → testing-agent
"Need to understand agent concepts" → hive-concepts
"Build a new agent" → hive-create
"Optimize my agent design" → hive-patterns
"Need client-facing nodes or feedback loops" → hive-patterns
"Set up API keys for my agent" → hive-credentials
"Test my agent" → hive-test
"My agent is failing/stuck/has errors" → hive-debugger
"Not sure what I need" → Read phases below, then decide
"Agent has structure but needs implementation" → See agent directory STATUS.md
```
## Phase 0: Understand Concepts (Optional)
**Duration**: 5-10 minutes
**Skill**: `/building-agents-core`
**Skill**: `/hive-concepts`
**Input**: Questions about agent architecture
### When to Use
@@ -77,9 +98,8 @@ Use this meta-skill when:
## Phase 1: Build Agent Structure
**Duration**: 15-30 minutes
**Skill**: `/building-agents-construction`
**Input**: User requirements ("Build an agent that...")
**Skill**: `/hive-create`
**Input**: User requirements ("Build an agent that...") or a template to start from
### What This Phase Does
@@ -121,7 +141,7 @@ You're ready for Phase 2 when:
### Common Outputs
The building-agents-construction skill produces:
The hive-create skill produces:
```
exports/agent_name/
├── __init__.py (package exports)
@@ -141,15 +161,14 @@ exports/agent_name/
→ You may need to add Python functions or MCP tools (not covered by current skills)
**If want to optimize design:**
→ Proceed to Phase 1.5 (building-agents-patterns)
→ Proceed to Phase 1.5 (hive-patterns)
**If ready to test:**
→ Proceed to Phase 2
## Phase 1.5: Optimize Design (Optional)
**Duration**: 10-15 minutes
**Skill**: `/building-agents-patterns`
**Skill**: `/hive-patterns`
**Input**: Completed agent structure
### When to Use
@@ -173,22 +192,21 @@ exports/agent_name/
## Phase 2: Test & Validate
**Duration**: 20-40 minutes
**Skill**: `/testing-agent`
**Skill**: `/hive-test`
**Input**: Working agent from Phase 1
### What This Phase Does
Creates comprehensive test suite:
- Constraint tests (verify hard requirements)
- Success criteria tests (measure goal achievement)
- Edge case tests (handle failures gracefully)
- Integration tests (end-to-end workflows)
Guides the creation and execution of a comprehensive test suite:
- Constraint tests
- Success criteria tests
- Edge case tests
- Integration tests
### Process
1. **Analyze agent** - Read goal, constraints, success criteria
2. **Generate tests** - Create pytest files in `exports/agent_name/tests/`
2. **Generate tests** - The calling agent writes pytest files in `exports/agent_name/tests/` using hive-test guidelines and templates
3. **User approval** - Review and approve each test
4. **Run evaluation** - Execute tests and collect results
5. **Debug failures** - Identify and fix issues
@@ -251,9 +269,9 @@ You're done when:
```
User: "Build an agent that monitors files"
→ Use /building-agents-construction
→ Use /hive-create
→ Agent structure created
→ Use /testing-agent
→ Use /hive-test
→ Tests created and passing
→ Done: Production-ready agent
```
@@ -262,19 +280,32 @@ User: "Build an agent that monitors files"
```
User: "Build an agent (first time)"
→ Use /building-agents-core (understand concepts)
→ Use /building-agents-construction (build structure)
→ Use /building-agents-patterns (optimize design)
→ Use /testing-agent (validate)
→ Use /hive-concepts (understand concepts)
→ Use /hive-create (build structure)
→ Use /hive-patterns (optimize design)
→ Use /hive-test (validate)
→ Done: Production-ready agent
```
### Pattern 1c: Build from Template
```
User: "Build an agent based on the deep research template"
→ Use /hive-create
→ Select "From a template" path
→ Pick template, name new agent
→ Review/modify goal, nodes, graph
→ Agent exported with customizations
→ Use /hive-test
→ Done: Customized agent
```
### Pattern 2: Test Existing Agent
```
User: "Test my agent at exports/my_agent"
→ Skip Phase 1
→ Use /testing-agent directly
→ Use /hive-test directly
→ Tests created
→ Done: Validated agent
```
@@ -283,10 +314,10 @@ User: "Test my agent at exports/my_agent"
```
User: "Build an agent"
→ Use /building-agents-construction (Phase 1)
→ Use /hive-create (Phase 1)
→ Implementation needed (see STATUS.md)
→ [User implements functions]
→ Use /testing-agent (Phase 2)
→ Use /hive-test (Phase 2)
→ Tests reveal bugs
→ [Fix bugs manually]
→ Re-run tests
@@ -297,45 +328,57 @@ User: "Build an agent"
```
User: "Build an agent with human review and feedback loops"
→ Use /building-agents-core (learn event loop, client-facing nodes)
→ Use /building-agents-construction (build structure with feedback edges)
→ Use /building-agents-patterns (implement client-facing + feedback patterns)
→ Use /testing-agent (validate review flows and edge routing)
→ Use /hive-concepts (learn event loop, client-facing nodes)
→ Use /hive-create (build structure with feedback edges)
→ Use /hive-patterns (implement client-facing + feedback patterns)
→ Use /hive-test (validate review flows and edge routing)
→ Done: Agent with HITL checkpoints and review loops
```
## Skill Dependencies
```
agent-workflow (meta-skill)
hive (meta-skill)
├── building-agents-core (foundational)
├── hive-concepts (foundational)
│ ├── Architecture concepts (event loop, judges)
│ ├── Node types (event_loop, function)
│ ├── Edge routing and priority
│ ├── Tool discovery procedures
│ └── Workflow overview
├── building-agents-construction (procedural)
├── hive-create (procedural)
│ ├── Creates package structure
│ ├── Defines goal
│ ├── Adds nodes (event_loop, function)
│ ├── Connects edges with priority routing
│ ├── Finalizes agent class
│ └── Requires: building-agents-core
│ └── Requires: hive-concepts
├── building-agents-patterns (reference)
├── hive-patterns (reference)
│ ├── Client-facing interaction patterns
│ ├── Feedback edges and review loops
│ ├── Judge patterns (implicit, SchemaJudge)
│ ├── Fan-out/fan-in parallel execution
│ └── Context management and anti-patterns
└── testing-agent
├── Reads agent goal
├── Generates tests
├── Runs evaluation
└── Reports results
├── hive-credentials (utility)
├── Detects missing credentials
├── Offers auth method choices (Aden OAuth, direct API key)
├── Stores securely in ~/.hive/credentials
└── Validates with health checks
├── hive-test (validation)
│ ├── Reads agent goal
│ ├── Generates tests
│ ├── Runs evaluation
│ └── Reports results
└── hive-debugger (troubleshooting)
├── Monitors runtime logs (L1/L2/L3)
├── Identifies retry loops, tool failures
├── Categorizes issues (10 categories)
└── Provides fix recommendations
```
## Troubleshooting
@@ -351,7 +394,7 @@ agent-workflow (meta-skill)
- Check for STATUS.md or IMPLEMENTATION_GUIDE.md in agent directory
- Implementation may be needed (Python functions or MCP tools)
- This is expected - building-agents-construction creates structure, not implementation
- This is expected - hive-create creates structure, not implementation
- See implementation guide for completion options
### "Tests are failing"
@@ -359,9 +402,16 @@ agent-workflow (meta-skill)
- Review test output for specific failures
- Check agent goal and success criteria
- Verify constraints are met
- Use `/testing-agent` to debug and iterate
- Use `/hive-test` to debug and iterate
- Fix agent code and re-run tests
### "Agent is failing at runtime"
- Use `/hive-debugger` to analyze runtime logs
- The debugger identifies retry loops, tool failures, and stalled execution
- Get actionable fix recommendations with code changes
- Monitor the agent in real-time during TUI sessions
### "Not sure which phase I'm in"
Run these checks:
@@ -420,10 +470,10 @@ You're done with the workflow when:
## Additional Resources
- **building-agents-core**: See `.claude/skills/building-agents-core/SKILL.md`
- **building-agents-construction**: See `.claude/skills/building-agents-construction/SKILL.md`
- **building-agents-patterns**: See `.claude/skills/building-agents-patterns/SKILL.md`
- **testing-agent**: See `.claude/skills/testing-agent/SKILL.md`
- **hive-concepts**: See `.claude/skills/hive-concepts/SKILL.md`
- **hive-create**: See `.claude/skills/hive-create/SKILL.md`
- **hive-patterns**: See `.claude/skills/hive-patterns/SKILL.md`
- **hive-test**: See `.claude/skills/hive-test/SKILL.md`
- **Agent framework docs**: See `core/README.md`
- **Example agents**: See `exports/` directory
@@ -431,36 +481,46 @@ You're done with the workflow when:
This workflow provides a proven path from concept to production-ready agent:
1. **Learn** with `/building-agents-core` → Understand fundamentals (optional)
2. **Build** with `/building-agents-construction` → Get validated structure
3. **Optimize** with `/building-agents-patterns` → Apply best practices (optional)
4. **Test** with `/testing-agent` → Get verified functionality
1. **Learn** with `/hive-concepts` → Understand fundamentals (optional)
2. **Build** with `/hive-create` → Get validated structure
3. **Optimize** with `/hive-patterns` → Apply best practices (optional)
4. **Configure** with `/hive-credentials` → Set up API keys (if needed)
5. **Test** with `/hive-test` → Get verified functionality
6. **Debug** with `/hive-debugger` → Fix runtime issues (if needed)
The workflow is **flexible** - skip phases as needed, iterate freely, and adapt to your specific requirements. The goal is **production-ready agents** built with **consistent, repeatable processes**.
## Skill Selection Guide
**Choose building-agents-core when:**
**Choose hive-concepts when:**
- First time building agents
- Need to understand event loop architecture
- Validating tool availability
- Learning about node types, edges, and judges
**Choose building-agents-construction when:**
**Choose hive-create when:**
- Actually building an agent
- Have clear requirements
- Ready to write code
- Want step-by-step guidance
- Want to start from an existing template and customize it
**Choose building-agents-patterns when:**
**Choose hive-patterns when:**
- Agent structure complete
- Need client-facing nodes or feedback edges
- Implementing review loops or fan-out/fan-in
- Want judge patterns or context management
- Want best practices
**Choose testing-agent when:**
**Choose hive-test when:**
- Agent structure complete
- Ready to validate functionality
- Need comprehensive test coverage
- Testing feedback loops, output keys, or fan-out
**Choose hive-debugger when:**
- Agent is failing or stuck at runtime
- Seeing retry loops or escalations
- Tool calls are failing
- Need to understand why a node isn't completing
- Want real-time monitoring of agent execution
@@ -1,6 +1,6 @@
# Example: File Monitor Agent
This example shows the complete agent-workflow in action for building a file monitoring agent.
This example shows the complete /hive workflow in action for building a file monitoring agent.
## Initial Request
@@ -12,7 +12,7 @@ User: "Build an agent that monitors ~/Downloads and copies new files to ~/Docume
### Step 1: Create Structure
Agent invokes `/building-agents` skill and:
Agent invokes `/hive-create` skill and:
1. Creates `exports/file_monitor_agent/` package
2. Writes skeleton files (__init__.py, __main__.py, agent.py, etc.)
@@ -107,7 +107,7 @@ exports/file_monitor_agent/
### Step 1: Analyze Agent
Agent invokes `/testing-agent` skill and:
Agent invokes `/hive-test` skill and:
1. Reads goal from `exports/file_monitor_agent/agent.py`
2. Identifies 4 success criteria to test
File diff suppressed because it is too large Load Diff
@@ -1,351 +0,0 @@
# Example: Testing a YouTube Research Agent
This example walks through testing a YouTube research agent that finds relevant videos based on a topic.
## Prerequisites
- Agent built with building-agents skill at `exports/youtube-research/`
- Goal defined with success criteria and constraints
## Step 1: Load the Goal
First, load the goal that was defined during the Goal stage:
```json
{
  "id": "youtube-research",
  "name": "YouTube Research Agent",
  "description": "Find relevant YouTube videos on a given topic",
  "success_criteria": [
    {
      "id": "find_videos",
      "description": "Find 3-5 relevant videos",
      "metric": "video_count",
      "target": "3-5",
      "weight": 1.0
    },
    {
      "id": "relevance",
      "description": "Videos must be relevant to the topic",
      "metric": "relevance_score",
      "target": ">0.8",
      "weight": 0.8
    }
  ],
  "constraints": [
    {
      "id": "api_limits",
      "description": "Must not exceed YouTube API rate limits",
      "constraint_type": "hard",
      "category": "technical"
    },
    {
      "id": "content_safety",
      "description": "Must filter out inappropriate content",
      "constraint_type": "hard",
      "category": "safety"
    }
  ]
}
```
## Step 2: Get Constraint Test Guidelines
During the Goal stage (or early Eval), get test guidelines for constraints:
```python
result = generate_constraint_tests(
    goal_id="youtube-research",
    goal_json='<goal JSON above>',
    agent_path="exports/youtube-research"
)
```
**The result contains guidelines (not generated tests):**
- `output_file`: Where to write tests
- `file_header`: Imports and fixtures to use
- `test_template`: Format for test functions
- `constraints_formatted`: The constraints to test
- `test_guidelines`: Rules for writing tests
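For orientation, the returned guidelines might look roughly like this (every value below is illustrative, not actual tool output):

```python
# Hypothetical shape of the generate_constraint_tests result
result = {
    "output_file": "exports/youtube-research/tests/test_constraints.py",
    "file_header": "import os\nimport pytest\nfrom exports.youtube_research import default_agent",
    "test_template": "@pytest.mark.asyncio\nasync def test_constraint_<id>():\n    ...",
    "constraints_formatted": "- api_limits (hard, technical)\n- content_safety (hard, safety)",
    "test_guidelines": ["Write one test per constraint", "Skip when no API key is available"],
}
```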
## Step 3: Write Constraint Tests
Using the guidelines, write tests directly with the Write tool:
```python
# Write constraint tests using the provided file_header and guidelines
Write(
    file_path="exports/youtube-research/tests/test_constraints.py",
    content='''
"""Constraint tests for youtube-research agent."""
import os

import pytest

from exports.youtube_research import default_agent

pytestmark = pytest.mark.skipif(
    not os.environ.get("ANTHROPIC_API_KEY") and not os.environ.get("MOCK_MODE"),
    reason="API key required for real testing.",
)


@pytest.mark.asyncio
async def test_constraint_api_limits_respected():
    """Verify API rate limits are not exceeded."""
    import asyncio

    mock_mode = bool(os.environ.get("MOCK_MODE"))
    for i in range(10):
        result = await default_agent.run({"topic": f"test_{i}"}, mock_mode=mock_mode)
        await asyncio.sleep(0.1)  # non-blocking pause between calls
        # Should complete without rate limit errors
        assert "rate limit" not in str(result).lower()


@pytest.mark.asyncio
async def test_constraint_content_safety_filter():
    """Verify inappropriate content is filtered."""
    mock_mode = bool(os.environ.get("MOCK_MODE"))
    result = await default_agent.run({"topic": "general topic"}, mock_mode=mock_mode)
    for video in result.videos:
        assert video.safe_for_work is True
        assert video.age_restricted is False
'''
)
```
## Step 4: Get Success Criteria Test Guidelines
After the agent is built, get success criteria test guidelines:
```python
result = generate_success_tests(
    goal_id="youtube-research",
    goal_json='<goal JSON>',
    node_names="search_node,filter_node,rank_node,format_node",
    tool_names="youtube_search,video_details,channel_info",
    agent_path="exports/youtube-research"
)
```
## Step 5: Write Success Criteria Tests
Using the guidelines, write success criteria tests:
```python
Write(
    file_path="exports/youtube-research/tests/test_success_criteria.py",
    content='''
"""Success criteria tests for youtube-research agent."""
import os

import pytest

from exports.youtube_research import default_agent

pytestmark = pytest.mark.skipif(
    not os.environ.get("ANTHROPIC_API_KEY") and not os.environ.get("MOCK_MODE"),
    reason="API key required for real testing.",
)


@pytest.mark.asyncio
async def test_find_videos_happy_path():
    """Test finding videos for a common topic."""
    mock_mode = bool(os.environ.get("MOCK_MODE"))
    result = await default_agent.run({"topic": "machine learning"}, mock_mode=mock_mode)
    assert result.success
    assert 3 <= len(result.videos) <= 5
    assert all(v.title for v in result.videos)
    assert all(v.video_id for v in result.videos)


@pytest.mark.asyncio
async def test_find_videos_minimum_boundary():
    """Test at minimum threshold (3 videos)."""
    mock_mode = bool(os.environ.get("MOCK_MODE"))
    result = await default_agent.run({"topic": "niche topic xyz"}, mock_mode=mock_mode)
    assert len(result.videos) >= 3


@pytest.mark.asyncio
async def test_relevance_score_threshold():
    """Test relevance scoring meets threshold."""
    mock_mode = bool(os.environ.get("MOCK_MODE"))
    result = await default_agent.run({"topic": "python programming"}, mock_mode=mock_mode)
    for video in result.videos:
        assert video.relevance_score > 0.8


@pytest.mark.asyncio
async def test_find_videos_no_results_graceful():
    """Test graceful handling of no results."""
    mock_mode = bool(os.environ.get("MOCK_MODE"))
    result = await default_agent.run({"topic": "xyznonexistent123"}, mock_mode=mock_mode)
    # Should not crash, return empty or message
    assert result.videos == [] or result.message
'''
)
```
## Step 6: Run All Tests
Execute all tests:
```python
result = run_tests(
    goal_id="youtube-research",
    agent_path="exports/youtube-research",
    test_types='["all"]',
    parallel=4
)
```
**Results:**
```json
{
  "goal_id": "youtube-research",
  "overall_passed": false,
  "summary": {
    "total": 6,
    "passed": 5,
    "failed": 1,
    "pass_rate": "83.3%"
  },
  "duration_ms": 4521,
  "results": [
    {"test_id": "test_constraint_api_001", "passed": true, "duration_ms": 1234},
    {"test_id": "test_constraint_content_001", "passed": true, "duration_ms": 456},
    {"test_id": "test_success_001", "passed": true, "duration_ms": 789},
    {"test_id": "test_success_002", "passed": true, "duration_ms": 654},
    {"test_id": "test_success_003", "passed": true, "duration_ms": 543},
    {"test_id": "test_success_004", "passed": false, "duration_ms": 845,
     "error_category": "IMPLEMENTATION_ERROR",
     "error_message": "TypeError: 'NoneType' object has no attribute 'videos'"}
  ]
}
```
## Step 7: Debug the Failed Test
```python
result = debug_test(
    goal_id="youtube-research",
    test_name="test_find_videos_no_results_graceful",
    agent_path="exports/youtube-research"
)
```
**Debug Output:**
```json
{
  "test_id": "test_success_004",
  "test_name": "test_find_videos_no_results_graceful",
  "input": {"topic": "xyznonexistent123"},
  "expected": "Empty list or message",
  "actual": {"error": "TypeError: 'NoneType' object has no attribute 'videos'"},
  "passed": false,
  "error_message": "TypeError: 'NoneType' object has no attribute 'videos'",
  "error_category": "IMPLEMENTATION_ERROR",
  "stack_trace": "Traceback (most recent call last):\n File \"filter_node.py\", line 42\n for video in result.videos:\nTypeError: 'NoneType' object has no attribute 'videos'",
  "logs": [
    {"timestamp": "2026-01-20T10:00:01", "node": "search_node", "level": "INFO", "msg": "Searching for: xyznonexistent123"},
    {"timestamp": "2026-01-20T10:00:02", "node": "search_node", "level": "WARNING", "msg": "No results found"},
    {"timestamp": "2026-01-20T10:00:02", "node": "filter_node", "level": "ERROR", "msg": "NoneType error"}
  ],
  "runtime_data": {
    "execution_path": ["start", "search_node", "filter_node"],
    "node_outputs": {
      "search_node": null
    }
  },
  "suggested_fix": "Add null check in filter_node before accessing .videos attribute",
  "iteration_guidance": {
    "stage": "Agent",
    "action": "Fix the code in nodes/edges",
    "restart_required": false,
    "description": "The goal is correct, but filter_node doesn't handle null results from search_node."
  }
}
```
## Step 8: Iterate Based on Category
Since this is an **IMPLEMENTATION_ERROR**, we:
1. **Don't restart** the Goal → Agent → Eval flow
2. **Fix the agent** using building-agents skill:
- Modify `filter_node` to handle null results
3. **Re-run Eval** (tests only)
### Fix in building-agents:
```python
# Update the filter_node to handle null
add_node(
    node_id="filter_node",
    name="Filter Node",
    description="Filter and rank videos",
    node_type="function",
    input_keys=["search_results"],
    output_keys=["filtered_videos"],
    system_prompt="""
    Filter videos by relevance.
    IMPORTANT: Handle case where search_results is None or empty.
    Return empty list if no results.
    """
)
```
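The system prompt change steers the node's behavior, but if `filter_node` were implemented as a plain Python function, the equivalent guard might look like this (a sketch; the function name, signature, and result shape are assumptions, not the agent's actual code):

```python
def filter_node(search_results):
    """Filter and rank videos, tolerating an empty or missing search result."""
    # Guard against None from search_node before touching .videos
    if search_results is None or not getattr(search_results, "videos", None):
        return []  # an empty list lets downstream nodes proceed gracefully
    return [v for v in search_results.videos if v.relevance_score > 0.8]
```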
### Re-export and re-test:
```python
# Re-export the fixed agent
export_graph(path="exports/youtube-research")

# Re-run tests
result = run_tests(
    goal_id="youtube-research",
    agent_path="exports/youtube-research",
    test_types='["all"]'
)
```
**Updated Results:**
```json
{
  "goal_id": "youtube-research",
  "overall_passed": true,
  "summary": {
    "total": 6,
    "passed": 6,
    "failed": 0,
    "pass_rate": "100.0%"
  }
}
```
## Summary
1. **Got guidelines** for constraint tests during Goal stage
2. **Wrote** constraint tests using Write tool
3. **Got guidelines** for success criteria tests during Eval stage
4. **Wrote** success criteria tests using Write tool
5. **Ran** tests in parallel
6. **Debugged** the one failure
7. **Categorized** as IMPLEMENTATION_ERROR
8. **Fixed** the agent (not the goal)
9. **Re-ran** Eval only (didn't restart full flow)
10. **Passed** all tests
The agent is now validated and ready for production use.
+7
View File
@@ -0,0 +1,7 @@
# Project-level Codex config for Hive.
# Keep this file minimal: MCP connectivity + skill discovery.
[mcp_servers.agent-builder]
command = "uv"
args = ["run", "--directory", "core", "-m", "framework.mcp.agent_builder_server"]
cwd = "."
-1
View File
@@ -1 +0,0 @@
../../.claude/skills/agent-workflow
@@ -1 +0,0 @@
../../.claude/skills/building-agents-construction
-1
View File
@@ -1 +0,0 @@
../../.claude/skills/building-agents-core
-1
View File
@@ -1 +0,0 @@
../../.claude/skills/building-agents-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
-1
View File
@@ -1 +0,0 @@
../../.claude/skills/testing-agent
+3
View File
@@ -74,3 +74,6 @@ exports/*
docs/github-issues/*
core/tests/*dumps/*
screenshots/*
-5
View File
@@ -4,11 +4,6 @@
"command": "uv",
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
"cwd": "core"
},
"tools": {
"command": "uv",
"args": ["run", "mcp_server.py", "--stdio"],
"cwd": "tools"
}
}
}
+30
View File
@@ -0,0 +1,30 @@
{
  "mcpServers": {
    "agent-builder": {
      "command": "uv",
      "args": [
        "run",
        "python",
        "-m",
        "framework.mcp.agent_builder_server"
      ],
      "cwd": "core",
      "env": {
        "PYTHONPATH": "../tools/src"
      }
    },
    "tools": {
      "command": "uv",
      "args": [
        "run",
        "python",
        "mcp_server.py",
        "--stdio"
      ],
      "cwd": "tools",
      "env": {
        "PYTHONPATH": "src"
      }
    }
  }
}
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-debugger
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/triage-issue
-41
View File
@@ -1,41 +0,0 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Initial project structure
- React frontend (honeycomb) with Vite and TypeScript
- Node.js backend (hive) with Express and TypeScript
- Docker Compose configuration for local development
- Configuration system via `config.yaml`
- GitHub Actions CI/CD workflows
- Comprehensive documentation
### Changed
- N/A
### Deprecated
- N/A
### Removed
- N/A
### Fixed
- tools: Fixed web_scrape tool attempting to parse non-HTML content (PDF, JSON) as HTML (#487)
### Security
- N/A
## [0.1.0] - 2025-01-13
### Added
- Initial release
[Unreleased]: https://github.com/adenhq/hive/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/adenhq/hive/releases/tag/v0.1.0
+19 -10
View File
@@ -1,10 +1,10 @@
# Contributing to Aden Agent Framework
Thank you for your interest in contributing to the Aden Agent Framework! This document provides guidelines and information for contributors. We're especially looking for help building tools, integrations([check #2805](https://github.com/adenhq/hive/issues/2805)), and example agents for the framework. If you're interested in extending its functionality, this is the perfect place to start.
Thank you for your interest in contributing to the Aden Agent Framework! This document provides guidelines and information for contributors. We're especially looking for help building tools, integrations ([check #2805](https://github.com/adenhq/hive/issues/2805)), and example agents for the framework. If you're interested in extending its functionality, this is the perfect place to start.
## Code of Conduct
By participating in this project, you agree to abide by our [Code of Conduct](CODE_OF_CONDUCT.md).
By participating in this project, you agree to abide by our [Code of Conduct](docs/CODE_OF_CONDUCT.md).
## Issue Assignment Policy
@@ -35,15 +35,22 @@ You may submit PRs without prior assignment for:
1. Fork the repository
2. Clone your fork: `git clone https://github.com/YOUR_USERNAME/hive.git`
3. Create a feature branch: `git checkout -b feature/your-feature-name`
4. Make your changes
5. Run checks and tests:
3. Add the upstream repository: `git remote add upstream https://github.com/adenhq/hive.git`
4. Sync with upstream to ensure you're starting from the latest code:
   ```bash
   git fetch upstream
   git checkout main
   git merge upstream/main
   ```
5. Create a feature branch: `git checkout -b feature/your-feature-name`
6. Make your changes
7. Run checks and tests:
   ```bash
   make check  # Lint and format checks (ruff check + ruff format --check on core/ and tools/)
   make test   # Core tests (cd core && pytest tests/ -v)
   ```
6. Commit your changes following our commit conventions
7. Push to your fork and submit a Pull Request
8. Commit your changes following our commit conventions
9. Push to your fork and submit a Pull Request
## Development Setup
@@ -92,8 +99,7 @@ docs(readme): update installation instructions
2. Update documentation if needed
3. Add tests for new functionality
4. Ensure `make check` and `make test` pass
5. Update the CHANGELOG.md if applicable
6. Request review from maintainers
5. Request review from maintainers
### PR Title Format
@@ -138,6 +144,9 @@ make test
# Or run tests directly
cd core && pytest tests/ -v
# Run tools package tests (when contributing to tools/)
cd tools && uv run pytest tests/ -v
# Run tests for a specific agent
PYTHONPATH=exports uv run python -m agent_name test
```
@@ -152,4 +161,4 @@ By submitting a Pull Request, you agree that your contributions will be licensed
Feel free to open an issue for questions or join our [Discord community](https://discord.com/invite/MXE49hrKDk).
Thank you for contributing!
Thank you for contributing!
+3 -1
View File
@@ -4,9 +4,11 @@ help: ## Show this help
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
		awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-15s\033[0m %s\n", $$1, $$2}'
lint: ## Run ruff linter (with auto-fix)
lint: ## Run ruff linter and formatter (with auto-fix)
	cd core && ruff check --fix .
	cd tools && ruff check --fix .
	cd core && ruff format .
	cd tools && ruff format .
format: ## Run ruff formatter
	cd core && ruff format .
-47
View File
@@ -1,47 +0,0 @@
## Summary
- **Added HubSpot integration** — new HubSpot MCP tool with search, get, create, and update operations for contacts, companies, and deals. Includes OAuth2 provider for HubSpot credentials and credential store adapter for the tools layer.
- **Replaced web_scrape tool with Playwright + stealth** — swapped httpx/BeautifulSoup for a headless Chromium browser using `playwright` (async API) and `playwright-stealth`, enabling JS-rendered page scraping and bot detection evasion
- **Added empty response retry logic** — LLM provider now detects empty responses (e.g. Gemini returning 200 with no content on rate limit) and retries with exponential backoff, preventing hallucinated output from the cleanup LLM
- **Added context-aware input compaction** — LLM nodes now estimate input token count before calling the model and progressively truncate the largest values if they exceed the context window budget
- **Increased rate limit retries to 10** with verbose `[retry]` and `[compaction]` logging that includes model name, finish reason, and attempt count
- **Interactive quickstart onboarding** — `quickstart.sh` rewritten as bee-themed interactive wizard that detects existing API keys (including Claude Code subscription), lets user pick ONE default LLM provider, and saves configuration to `~/.hive/configuration.json`
- **Fixed lint errors** across `hubspot_tool.py` (line length) and `agent_builder_server.py` (unused variable)
## Changed files
### HubSpot Integration
- `tools/src/aden_tools/tools/hubspot_tool/` — New MCP tool: contacts, companies, and deals CRUD
- `tools/src/aden_tools/tools/__init__.py` — Registered HubSpot tools
- `tools/src/aden_tools/credentials/integrations.py` — HubSpot credential integration
- `tools/src/aden_tools/credentials/__init__.py` — Updated credential exports
- `core/framework/credentials/oauth2/hubspot_provider.py` — HubSpot OAuth2 provider
- `core/framework/credentials/oauth2/__init__.py` — Registered HubSpot OAuth2 provider
- `core/framework/runner/runner.py` — Updated runner for credential support
### Web Scrape Rewrite
- `tools/src/aden_tools/tools/web_scrape_tool/web_scrape_tool.py` — Playwright async rewrite
- `tools/src/aden_tools/tools/web_scrape_tool/README.md` — Updated docs
- `tools/pyproject.toml` — Added `playwright`, `playwright-stealth` deps
- `tools/Dockerfile` — Added `playwright install chromium --with-deps`
### LLM Reliability
- `core/framework/llm/litellm.py` — Empty response retry + max retries 10 + verbose logging
- `core/framework/graph/node.py` — Input compaction via `_compact_inputs()`, `_estimate_tokens()`, `_get_context_limit()`
### Quickstart & Setup
- `quickstart.sh` — Interactive bee-themed onboarding wizard with single provider selection
- `~/.hive/configuration.json` — New user config file for default LLM provider/model
### Fixes
- `core/framework/mcp/agent_builder_server.py` — Removed unused variable
- `tools/src/aden_tools/tools/hubspot_tool/hubspot_tool.py` — Fixed E501 line length violations
## Test plan
- [ ] Run `make lint` — passes clean
- [ ] Run `./quickstart.sh` and verify interactive flow works, config saved to `~/.hive/configuration.json`
- [ ] Run `pytest tests/tools/test_web_scrape_tool.py -v`
- [ ] Run agent against a JS-heavy site and verify `web_scrape` returns rendered content
- [ ] Set `HUBSPOT_ACCESS_TOKEN` and verify HubSpot tool CRUD operations work
- [ ] Trigger rate limit and verify `[retry]` logs appear with correct attempt counts
- [ ] Run agent with large inputs and verify `[compaction]` logs show truncation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
+107 -102
View File
@@ -1,5 +1,5 @@
<p align="center">
<img width="100%" alt="Hive Banner" src="https://storage.googleapis.com/aden-prod-assets/website/aden-title-card.png" />
<img width="100%" alt="Hive Banner" src="https://github.com/user-attachments/assets/a027429b-5d3c-4d34-88e4-0feaeaabbab3" />
</p>
<p align="center">
@@ -13,16 +13,19 @@
<a href="docs/i18n/ko.md">한국어</a>
</p>
[![Apache 2.0 License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/adenhq/hive/blob/main/LICENSE)
[![Y Combinator](https://img.shields.io/badge/Y%20Combinator-Aden-orange)](https://www.ycombinator.com/companies/aden)
[![Discord](https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb)](https://discord.com/invite/MXE49hrKDk)
[![Twitter Follow](https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5)](https://x.com/aden_hq)
[![LinkedIn](https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff)](https://www.linkedin.com/company/teamaden/)
<p align="center">
<a href="https://github.com/adenhq/hive/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache 2.0 License" /></a>
<a href="https://www.ycombinator.com/companies/aden"><img src="https://img.shields.io/badge/Y%20Combinator-Aden-orange" alt="Y Combinator" /></a>
<a href="https://discord.com/invite/MXE49hrKDk"><img src="https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb" alt="Discord" /></a>
<a href="https://x.com/aden_hq"><img src="https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5" alt="Twitter Follow" /></a>
<a href="https://www.linkedin.com/company/teamaden/"><img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff" alt="LinkedIn" /></a>
<img src="https://img.shields.io/badge/MCP-102_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
<p align="center">
<img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
<img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
<img src="https://img.shields.io/badge/Goal--Driven-Development-purple?style=flat-square" alt="Goal-Driven" />
<img src="https://img.shields.io/badge/Headless-Development-purple?style=flat-square" alt="Headless" />
<img src="https://img.shields.io/badge/Human--in--the--Loop-orange?style=flat-square" alt="HITL" />
<img src="https://img.shields.io/badge/Production--Ready-red?style=flat-square" alt="Production" />
</p>
@@ -30,15 +33,16 @@
<img src="https://img.shields.io/badge/OpenAI-supported-412991?style=flat-square&logo=openai" alt="OpenAI" />
<img src="https://img.shields.io/badge/Anthropic-supported-d4a574?style=flat-square" alt="Anthropic" />
<img src="https://img.shields.io/badge/Google_Gemini-supported-4285F4?style=flat-square&logo=google" alt="Gemini" />
<img src="https://img.shields.io/badge/MCP-19_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
## Overview
Build reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Build autonomous, reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.
https://github.com/user-attachments/assets/846c0cc7-ffd6-47fa-b4b7-495494857a55
## Who Is Hive For?
Hive is designed for developers and teams who want to build **production-grade AI agents** without manually wiring complex workflows.
@@ -58,40 +62,26 @@ Hive may not be the best fit if you're only experimenting with simple agent ch
Use Hive when you need:
- Long-running, autonomous agents
- Multi-agent coordination
- Strong guardrails, process, and controls
- Continuous improvement based on failures
- Strong monitoring, safety, and budget controls
- Multi-agent coordination
- A framework that evolves with your goals
## What is Aden
<p align="center">
<img width="100%" alt="Aden Architecture" src="docs/assets/aden-architecture-diagram.jpg" />
</p>
Aden is a platform for building, deploying, operating, and adapting AI agents:
- **Build** - A Coding Agent generates specialized Worker Agents (Sales, Marketing, Ops) from natural language goals
- **Deploy** - Headless deployment with CI/CD integration and full API lifecycle management
- **Operate** - Real-time monitoring, observability, and runtime guardrails keep agents reliable
- **Adapt** - Continuous evaluation, supervision, and adaptation ensure agents improve over time
- **Infra** - Shared memory, LLM integrations, tools, and skills power every agent
## Quick Links
- **[Documentation](https://docs.adenhq.com/)** - Complete guides and API reference
- **[Self-Hosting Guide](https://docs.adenhq.com/getting-started/quickstart)** - Deploy Hive on your infrastructure
- **[Changelog](https://github.com/adenhq/hive/releases)** - Latest updates and releases
<!-- - **[Roadmap](https://adenhq.com/roadmap)** - Upcoming features and plans -->
- **[Roadmap](docs/roadmap.md)** - Upcoming features and plans
- **[Report Issues](https://github.com/adenhq/hive/issues)** - Bug reports and feature requests
- **[Contributing](CONTRIBUTING.md)** - How to contribute and submit PRs
## Quick Start
## Prerequisites
### Prerequisites
- Python 3.11+ for agent development
- Claude Code or Cursor for utilizing agent skills
- Claude Code, Codex CLI, or Cursor for utilizing agent skills
> **Note for Windows Users:** It is strongly recommended to use **WSL (Windows Subsystem for Linux)** or **Git Bash** to run this framework. Some core automation scripts may not execute correctly in standard Command Prompt or PowerShell.
@@ -107,45 +97,85 @@ cd hive
```
This sets up:
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
- All required Python dependencies
- **credential store** - Encrypted API key storage (`~/.hive/credentials`)
- **LLM provider** - Interactive default model configuration
- All required Python dependencies with `uv`
### Build Your First Agent
```bash
# Build an agent using Claude Code
claude> /building-agents-construction
claude> /hive
# Test your agent
claude> /testing-agent
claude> /hive-debugger
# Run your agent
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
# (at separate terminal) Launch the interactive dashboard
hive tui
# Or run directly
hive run exports/your_agent_name --input '{"key": "value"}'
```
## Coding Agent Support
### Codex CLI
Hive includes native support for [OpenAI Codex CLI](https://github.com/openai/codex) (v0.101.0+).
1. **Config:** `.codex/config.toml` with `agent-builder` MCP server (tracked in git)
2. **Skills:** `.agents/skills/` symlinks to Hive skills (tracked in git)
3. **Launch:** Run `codex` in the repo root, then type `use hive`
Example:
```
codex> use hive
```
**[📖 Complete Setup Guide](ENVIRONMENT_SETUP.md)** - Detailed instructions for agent development
### Opencode
Hive includes native support for [Opencode](https://github.com/opencode-ai/opencode).
### Cursor IDE Support
1. **Setup:** Run the quickstart script
2. **Launch:** Open Opencode in the project root.
3. **Activate:** Type `/hive` in the chat to switch to the Hive Agent.
4. **Verify:** Ask the agent *"List your tools"* to confirm the connection.
Skills are also available in Cursor. To enable:
The agent has access to all Hive skills and can scaffold agents, add tools, and debug workflows directly from the chat.
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
2. Run `MCP: Enable` to enable MCP servers
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
4. Type `/` in Agent chat and search for skills (e.g., `/building-agents-construction`)
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
### Antigravity IDE Support
Skills and MCP servers are also available in [Antigravity IDE](https://antigravity.google/) (Google's AI-powered IDE). **Easiest:** open a terminal in the hive repo folder and run (use `./` — the script is inside the repo):
```bash
./scripts/setup-antigravity-mcp.sh
```
**Important:** Always restart/refresh Antigravity IDE after running the setup script—MCP servers only load on startup. After restart, **agent-builder** and **tools** MCP servers should connect. Skills are under `.agent/skills/` (symlinks to `.claude/skills/`). See [docs/antigravity-setup.md](docs/antigravity-setup.md) for manual setup and troubleshooting.
## Features
- **Goal-Driven Development** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **Adaptiveness** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **Dynamic Node Connections** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **Human-in-the-Loop** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Interactive TUI Dashboard** - Terminal-based dashboard with live graph view, event log, and chat interface for agent interaction
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
## Integration
<a href="https://github.com/adenhq/hive/tree/main/tools/src/aden_tools/tools"><img width="100%" alt="Integration" src="https://github.com/user-attachments/assets/a1573f93-cf02-4bb8-b3d5-b305b05b1e51" /></a>
Hive is built to be model-agnostic and system-agnostic.
- **LLM flexibility** - Hive Framework is designed to support various types of LLMs, including hosted and local models through LiteLLM-compatible providers.
- **Business system connectivity** - Hive Framework is designed to connect to all kinds of business systems as tools, such as CRM, support, messaging, data, file, and internal APIs via MCP.
## Why Aden
Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
@@ -182,67 +212,60 @@ flowchart LR
style V6 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
```
### The Aden Advantage
### The Hive Advantage
| Traditional Frameworks | Aden |
| Traditional Frameworks | Hive |
| -------------------------- | -------------------------------------- |
| Hardcode agent workflows | Describe goals in natural language |
| Manual graph definition | Auto-generated agent graphs |
| Reactive error handling | Outcome-evaluation and adaptiveness |
| Reactive error handling | Outcome-evaluation and adaptiveness |
| Static tool configurations | Dynamic SDK-wrapped nodes |
| Separate monitoring setup | Built-in real-time observability |
| DIY budget management | Integrated cost controls & degradation |
### How It Works
1. **Define Your Goal** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the agent graph, connection code, and test cases
3. **Workers Execute** → SDK-wrapped nodes run with full observability and tool access
1. **[Define Your Goal](docs/key_concepts/goals_outcome.md)** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the [agent graph](docs/key_concepts/graph.md), connection code, and test cases
3. **[Workers Execute](docs/key_concepts/worker_agent.md)** → SDK-wrapped nodes run with full observability and tool access
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **Adaptiveness** → On failure, the system evolves the graph and redeploys automatically
5. **[Adaptiveness](docs/key_concepts/evolution.md)** → On failure, the system evolves the graph and redeploys automatically
## Run pre-built Agents (Coming Soon)
## Run Agents
### Run a sample agent
Aden Hive provides a list of featured agents that you can use and build on top of.
### Run an agent shared by others
Put the agent in `exports/` and run `PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'`
For building and running goal-driven agents with the framework:
The `hive` CLI is the primary interface for running agents.
```bash
# One-time setup
./quickstart.sh
# Browse and run agents interactively (Recommended)
hive tui
# This sets up:
# - framework package (core runtime)
# - aden_tools package (MCP tools)
# - All Python dependencies
# Run a specific agent directly
hive run exports/my_agent --input '{"task": "Your input here"}'
# Build new agents using Claude Code skills
claude> /building-agents-construction
# Run a specific agent with the TUI dashboard
hive run exports/my_agent --tui
# Test agents
claude> /testing-agent
# Run agents
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
# Interactive REPL
hive shell
```
See [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) for complete setup instructions.
The TUI scans both `exports/` and `examples/templates/` for available agents.
> **Using Python directly (alternative):** You can also run agents with `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
See [environment-setup.md](docs/environment-setup.md) for complete setup instructions.
## Documentation
- **[Developer Guide](DEVELOPER.md)** - Comprehensive guide for developers
- **[Developer Guide](docs/developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [TUI Guide](docs/tui-selection-guide.md) - Interactive dashboard usage
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
## Roadmap
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [ROADMAP.md](ROADMAP.md) for details.
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [roadmap.md](docs/roadmap.md) for details.
```mermaid
flowchart TD
@@ -310,6 +333,7 @@ subgraph Expansion
j2["Cursor"]
j3["Opencode"]
j4["Antigravity"]
j5["Codex CLI"]
end
subgraph plat["Platform"]
k1["JavaScript/TypeScript SDK"]
@@ -332,11 +356,12 @@ end
classDef done fill:#9e9e9e,color:#fff,stroke:#757575
```
## Contributing
We welcome contributions from the community! Were especially looking for help building tools, integrations, and example agents for the framework ([check #2805](https://github.com/adenhq/hive/issues/2805)). If youre interested in extending its functionality, this is the perfect place to start. Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
1. Find or create an issue and get assigned
2. Fork the repository
@@ -369,10 +394,6 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS
## Frequently Asked Questions (FAQ)
**Q: Does Hive depend on LangChain or other agent frameworks?**
No. Hive is built from the ground up with no dependencies on LangChain, CrewAI, or other agent frameworks. The framework is designed to be lean and flexible, generating agent graphs dynamically rather than relying on predefined components.
**Q: What LLM providers does Hive support?**
Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, and many more. Simply set the appropriate API key environment variable and specify the model name.
@@ -383,37 +404,25 @@ Yes! Hive supports local models through LiteLLM. Simply use the model name forma
**Q: What makes Hive different from other agent frameworks?**
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys. This self-improving loop is unique to Aden.
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, [evolves the agent graph](docs/key_concepts/evolution.md), and redeploys. This self-improving loop is unique to Aden.
**Q: Is Hive open-source?**
Yes, Hive is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
**Q: Does Hive collect data from users?**
Hive collects telemetry data for monitoring and observability purposes, including token usage, latency metrics, and cost tracking. Content capture (prompts and responses) is configurable and stored with team-scoped data isolation. All data stays within your infrastructure when self-hosted.
**Q: What deployment options does Hive support?**
Hive supports self-hosted deployments via Python packages. See the [Environment Setup Guide](ENVIRONMENT_SETUP.md) for installation instructions. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.
**Q: Can Hive handle complex, production-scale use cases?**
Yes. Hive is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.
**Q: Does Hive support human-in-the-loop workflows?**
Yes, Hive fully supports human-in-the-loop workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What monitoring and debugging tools does Hive provide?**
Hive includes comprehensive observability features: real-time WebSocket streaming for live agent execution monitoring, TimescaleDB-powered analytics for cost and performance metrics, health check endpoints for Kubernetes integration, and MCP tools for agent execution, including file operations, web search, data processing, and more.
Yes, Hive fully supports [human-in-the-loop](docs/key_concepts/graph.md#human-in-the-loop) workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What programming languages does Hive support?**
The Hive framework is built in Python. A JavaScript/TypeScript SDK is on the roadmap.
**Q: Can Aden agents interact with external tools and APIs?**
**Q: Can Hive agents interact with external tools and APIs?**
Yes. Aden's SDK-wrapped nodes provide built-in tool access, and the framework supports flexible tool ecosystems. Agents can integrate with external APIs, databases, and services through the node architecture.
@@ -423,7 +432,7 @@ Hive provides granular budget controls including spending limits, throttles, and
**Q: Where can I find examples and documentation?**
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [DEVELOPER.md](DEVELOPER.md) guide.
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [developer guide](docs/developer-guide.md).
**Q: How can I contribute to Aden?**
@@ -437,10 +446,6 @@ Aden's adaptation loop begins working from the first execution. When an agent fa
Hive focuses on generating agents that run real business processes, rather than generic agents. This vision emphasizes outcome-driven design, adaptability, and an easy-to-use set of tools and integrations.
**Q: Does Aden offer enterprise support?**
For enterprise inquiries, contact the Aden team through [adenhq.com](https://adenhq.com) or join our [Discord community](https://discord.com/invite/MXE49hrKDk) for support and discussions.
---
<p align="center">
+1
View File
@@ -1,4 +1,5 @@
exports/
docs/
.agent-builder-sessions/
.pytest_cache/
**/__pycache__/
+1 -1
View File
@@ -145,7 +145,7 @@ uv run python -m framework test-debug <agent_path> <test_name>
uv run python -m framework test-list <goal_id>
```
For detailed testing workflows, see the [testing-agent skill](../.claude/skills/testing-agent/SKILL.md).
For detailed testing workflows, see the [hive-test skill](../.claude/skills/hive-test/SKILL.md).
### Analyzing Agent Behavior with Builder
+2 -2
View File
@@ -4,8 +4,8 @@
"name": "tools",
"description": "Aden tools including web search, file operations, and PDF reading",
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../tools",
"env": {
"BRAVE_SEARCH_API_KEY": "${BRAVE_SEARCH_API_KEY}"
+7
View File
@@ -44,6 +44,13 @@ def _configure_paths():
        if exports_str not in sys.path:
            sys.path.insert(0, exports_str)

    # Add examples/templates/ to sys.path so template agents are importable
    templates_dir = project_root / "examples" / "templates"
    if templates_dir.is_dir():
        templates_str = str(templates_dir)
        if templates_str not in sys.path:
            sys.path.insert(0, templates_str)

    # Ensure core/ is also in sys.path (for non-editable-install scenarios)
    core_str = str(project_root / "core")
    if (project_root / "core").is_dir() and core_str not in sys.path:
+74
View File
@@ -0,0 +1,74 @@
"""Shared Hive configuration utilities.
Centralises reading of ~/.hive/configuration.json so that the runner
and every agent template share one implementation instead of copy-pasting
helper functions.
"""
import json
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from framework.graph.edge import DEFAULT_MAX_TOKENS
# ---------------------------------------------------------------------------
# Low-level config file access
# ---------------------------------------------------------------------------
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
def get_hive_config() -> dict[str, Any]:
    """Load hive configuration from ~/.hive/configuration.json."""
    if not HIVE_CONFIG_FILE.exists():
        return {}
    try:
        with open(HIVE_CONFIG_FILE, encoding="utf-8-sig") as f:
            return json.load(f)
    except (json.JSONDecodeError, OSError):
        return {}
# ---------------------------------------------------------------------------
# Derived helpers
# ---------------------------------------------------------------------------
def get_preferred_model() -> str:
    """Return the user's preferred LLM model string (e.g. 'anthropic/claude-sonnet-4-20250514')."""
    llm = get_hive_config().get("llm", {})
    if llm.get("provider") and llm.get("model"):
        return f"{llm['provider']}/{llm['model']}"
    return "anthropic/claude-sonnet-4-20250514"


def get_max_tokens() -> int:
    """Return the configured max_tokens, falling back to DEFAULT_MAX_TOKENS."""
    return get_hive_config().get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)


def get_api_key() -> str | None:
    """Return the API key from the environment variable specified in configuration."""
    llm = get_hive_config().get("llm", {})
    api_key_env_var = llm.get("api_key_env_var")
    if api_key_env_var:
        return os.environ.get(api_key_env_var)
    return None
# ---------------------------------------------------------------------------
# RuntimeConfig shared across agent templates
# ---------------------------------------------------------------------------
@dataclass
class RuntimeConfig:
    """Agent runtime configuration loaded from ~/.hive/configuration.json."""

    model: str = field(default_factory=get_preferred_model)
    temperature: float = 0.7
    max_tokens: int = field(default_factory=get_max_tokens)
    api_key: str | None = field(default_factory=get_api_key)
    api_base: str | None = None
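A quick usage sketch follows (the `framework.hive_config` import path is an assumption; adjust to wherever this module actually lands):

```python
from framework.hive_config import RuntimeConfig, get_preferred_model

config = RuntimeConfig()      # reads ~/.hive/configuration.json, with safe fallbacks
print(config.model)           # e.g. "anthropic/claude-sonnet-4-20250514"
print(get_preferred_model())  # same string, without constructing a RuntimeConfig
```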
+4
View File
@@ -68,6 +68,7 @@ from .storage import (
)
from .store import CredentialStore
from .template import TemplateResolver
from .validation import ensure_credential_key_env, validate_agent_credentials
# Aden sync components (lazy import to avoid httpx dependency when not needed)
# Usage: from core.framework.credentials.aden import AdenSyncProvider
@@ -111,6 +112,9 @@ __all__ = [
"CredentialRefreshError",
"CredentialValidationError",
"CredentialDecryptionError",
# Validation
"ensure_credential_key_env",
"validate_agent_credentials",
# Aden sync (optional - requires httpx)
"AdenSyncProvider",
"AdenCredentialClient",
+19 -4
View File
@@ -143,19 +143,34 @@ class AdenCredentialResponse:
    def from_dict(
        cls, data: dict[str, Any], integration_id: str | None = None
    ) -> AdenCredentialResponse:
        """Create from API response dictionary."""
        """Create from API response dictionary or normalized credential dict."""
        expires_at = None
        if data.get("expires_at"):
            expires_at = datetime.fromisoformat(data["expires_at"].replace("Z", "+00:00"))

        resolved_integration_id = (
            integration_id
            or data.get("integration_id")
            or data.get("alias")
            or data.get("provider", "")
        )
        resolved_integration_type = data.get("integration_type") or data.get("provider", "")

        metadata = data.get("metadata")
        if metadata is None and data.get("email"):
            metadata = {"email": data.get("email")}
        if metadata is None:
            metadata = {}

        return cls(
            integration_id=integration_id or data.get("alias", data.get("provider", "")),
            integration_type=data.get("provider", ""),
            integration_id=resolved_integration_id,
            integration_type=resolved_integration_type,
            access_token=data["access_token"],
            token_type=data.get("token_type", "Bearer"),
            expires_at=expires_at,
            scopes=data.get("scopes", []),
            metadata={"email": data.get("email")} if data.get("email") else {},
            metadata=metadata,
        )
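For illustration, the new fallback chain behaves like this (the payload values below are made up):

```python
# No "integration_id" key, so "alias" wins; metadata is synthesized from "email"
resp = AdenCredentialResponse.from_dict({
    "alias": "hubspot-prod",
    "provider": "hubspot",
    "access_token": "tok_123",
    "email": "ops@example.com",
})
assert resp.integration_id == "hubspot-prod"
assert resp.integration_type == "hubspot"
assert resp.metadata == {"email": "ops@example.com"}
```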
+133
View File
@@ -0,0 +1,133 @@
"""Credential validation utilities.
Provides reusable credential validation for agents, whether run through
the AgentRunner or directly via GraphExecutor.
"""
from __future__ import annotations
import logging
import os
logger = logging.getLogger(__name__)
def ensure_credential_key_env() -> None:
    """Load HIVE_CREDENTIAL_KEY from shell config if not already in environment.

    The setup-credentials skill writes the encryption key to ~/.zshrc or ~/.bashrc.
    If the user hasn't sourced their config in the current shell, this reads it
    directly so the runner (and any MCP subprocesses it spawns) can unlock the
    encrypted credential store.

    Only HIVE_CREDENTIAL_KEY is loaded this way; all other secrets (API keys, etc.)
    come from the credential store itself.
    """
    if os.environ.get("HIVE_CREDENTIAL_KEY"):
        return

    try:
        from aden_tools.credentials.shell_config import check_env_var_in_shell_config

        found, value = check_env_var_in_shell_config("HIVE_CREDENTIAL_KEY")
        if found and value:
            os.environ["HIVE_CREDENTIAL_KEY"] = value
            logger.debug("Loaded HIVE_CREDENTIAL_KEY from shell config")
    except ImportError:
        pass
def validate_agent_credentials(nodes: list) -> None:
    """Check that required credentials are available before running an agent.

    Scans node specs for required tools and node types, then checks whether
    the corresponding credentials exist in the credential store.
    Raises CredentialError with actionable guidance if any are missing.

    Args:
        nodes: List of NodeSpec objects from the agent graph.
    """
    required_tools: set[str] = set()
    for node in nodes:
        if node.tools:
            required_tools.update(node.tools)
    node_types: set[str] = {node.node_type for node in nodes}

    try:
        from aden_tools.credentials import CREDENTIAL_SPECS
        from framework.credentials import CredentialStore
        from framework.credentials.storage import (
            CompositeStorage,
            EncryptedFileStorage,
            EnvVarStorage,
        )
    except ImportError:
        return  # aden_tools not installed, skip check

    # Build credential store
    env_mapping = {
        (spec.credential_id or name): spec.env_var for name, spec in CREDENTIAL_SPECS.items()
    }
    storages: list = [EnvVarStorage(env_mapping=env_mapping)]
    if os.environ.get("HIVE_CREDENTIAL_KEY"):
        storages.insert(0, EncryptedFileStorage())
    if len(storages) == 1:
        storage = storages[0]
    else:
        storage = CompositeStorage(primary=storages[0], fallbacks=storages[1:])
    store = CredentialStore(storage=storage)

    # Build reverse mappings
    tool_to_cred: dict[str, str] = {}
    node_type_to_cred: dict[str, str] = {}
    for cred_name, spec in CREDENTIAL_SPECS.items():
        for tool_name in spec.tools:
            tool_to_cred[tool_name] = cred_name
        for nt in spec.node_types:
            node_type_to_cred[nt] = cred_name

    missing: list[str] = []
    checked: set[str] = set()

    # Check tool credentials
    for tool_name in sorted(required_tools):
        cred_name = tool_to_cred.get(tool_name)
        if cred_name is None or cred_name in checked:
            continue
        checked.add(cred_name)
        spec = CREDENTIAL_SPECS[cred_name]
        cred_id = spec.credential_id or cred_name
        if spec.required and not store.is_available(cred_id):
            affected = sorted(t for t in required_tools if t in spec.tools)
            entry = f" {spec.env_var} for {', '.join(affected)}"
            if spec.help_url:
                entry += f"\n Get it at: {spec.help_url}"
            missing.append(entry)

    # Check node type credentials (e.g., ANTHROPIC_API_KEY for LLM nodes)
    for nt in sorted(node_types):
        cred_name = node_type_to_cred.get(nt)
        if cred_name is None or cred_name in checked:
            continue
        checked.add(cred_name)
        spec = CREDENTIAL_SPECS[cred_name]
        cred_id = spec.credential_id or cred_name
        if spec.required and not store.is_available(cred_id):
            affected_types = sorted(t for t in node_types if t in spec.node_types)
            entry = f" {spec.env_var} for {', '.join(affected_types)} nodes"
            if spec.help_url:
                entry += f"\n Get it at: {spec.help_url}"
            missing.append(entry)

    if missing:
        from framework.credentials.models import CredentialError

        lines = ["Missing required credentials:\n"]
        lines.extend(missing)
        lines.append(
            "\nTo fix: run /hive-credentials in Claude Code."
            "\nIf you've already set up credentials, restart your terminal to load them."
        )
        raise CredentialError("\n".join(lines))
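A minimal sketch of how a runner might use these helpers before execution (the imports match the exports added to `framework.credentials` above; the `graph.nodes` attribute is an assumption):

```python
from framework.credentials import (
    ensure_credential_key_env,
    validate_agent_credentials,
)

ensure_credential_key_env()              # pull HIVE_CREDENTIAL_KEY from shell config if needed
validate_agent_credentials(graph.nodes)  # raises CredentialError with fix guidance
```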
+2 -1
View File
@@ -9,7 +9,7 @@ from framework.graph.client_io import (
from framework.graph.code_sandbox import CodeSandbox, safe_eval, safe_exec
from framework.graph.context_handoff import ContextHandoff, HandoffContext
from framework.graph.conversation import ConversationStore, Message, NodeConversation
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.edge import DEFAULT_MAX_TOKENS, EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.event_loop_node import (
EventLoopNode,
JudgeProtocol,
@@ -58,6 +58,7 @@ __all__ = [
"EdgeSpec",
"EdgeCondition",
"GraphSpec",
"DEFAULT_MAX_TOKENS",
# Executor (fixed graph)
"GraphExecutor",
# Plan (flexible execution)
+85
View File
@@ -0,0 +1,85 @@
"""
Checkpoint Configuration - Controls checkpoint behavior during execution.
"""
from dataclasses import dataclass
@dataclass
class CheckpointConfig:
"""
Configuration for checkpoint behavior during graph execution.
Controls when checkpoints are created, how they're stored,
and when they're pruned.
"""
# Enable/disable checkpointing
enabled: bool = True
# When to checkpoint
checkpoint_on_node_start: bool = True
checkpoint_on_node_complete: bool = True
# Pruning (time-based)
checkpoint_max_age_days: int = 7 # Prune checkpoints older than 1 week
prune_every_n_nodes: int = 10 # Check for pruning every N nodes
# Performance
async_checkpoint: bool = True # Don't block execution on checkpoint writes
# What to include in checkpoints
include_full_memory: bool = True
include_metrics: bool = True
def should_checkpoint_node_start(self) -> bool:
"""Check if should checkpoint before node execution."""
return self.enabled and self.checkpoint_on_node_start
def should_checkpoint_node_complete(self) -> bool:
"""Check if should checkpoint after node execution."""
return self.enabled and self.checkpoint_on_node_complete
def should_prune_checkpoints(self, nodes_executed: int) -> bool:
"""
Check if should prune checkpoints based on execution progress.
Args:
nodes_executed: Number of nodes executed so far
Returns:
True if should check for old checkpoints and prune them
"""
return (
self.enabled
and self.prune_every_n_nodes > 0
and nodes_executed % self.prune_every_n_nodes == 0
)
# Default configuration for most agents
DEFAULT_CHECKPOINT_CONFIG = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=True,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
prune_every_n_nodes=10,
async_checkpoint=True,
)
# Minimal configuration (only checkpoint at node completion)
MINIMAL_CHECKPOINT_CONFIG = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
prune_every_n_nodes=20,
async_checkpoint=True,
)
# Disabled configuration (no checkpointing)
DISABLED_CHECKPOINT_CONFIG = CheckpointConfig(
enabled=False,
)
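
To see the cadence these presets imply, a small sketch using a trimmed copy of the dataclass above (only the fields the pruning check reads):

```
from dataclasses import dataclass

@dataclass
class CheckpointConfig:  # trimmed copy of the dataclass above
    enabled: bool = True
    prune_every_n_nodes: int = 10

    def should_prune_checkpoints(self, nodes_executed: int) -> bool:
        # Fires every N executed nodes while checkpointing is enabled.
        return (
            self.enabled
            and self.prune_every_n_nodes > 0
            and nodes_executed % self.prune_every_n_nodes == 0
        )

config = CheckpointConfig()
print([n for n in range(1, 31) if config.should_prune_checkpoints(n)])
# [10, 20, 30] -- the default config checks for stale checkpoints every
# 10 nodes; MINIMAL_CHECKPOINT_CONFIG would check every 20.
```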
+123 -10
View File
@@ -27,6 +27,9 @@ class Message:
tool_use_id: str | None = None
tool_calls: list[dict[str, Any]] | None = None
is_error: bool = False
# Phase-aware compaction metadata (continuous mode)
phase_id: str | None = None
is_transition_marker: bool = False
def to_llm_dict(self) -> dict[str, Any]:
"""Convert to OpenAI-format message dict."""
@@ -60,6 +63,10 @@ class Message:
d["tool_calls"] = self.tool_calls
if self.is_error:
d["is_error"] = self.is_error
if self.phase_id is not None:
d["phase_id"] = self.phase_id
if self.is_transition_marker:
d["is_transition_marker"] = self.is_transition_marker
return d
@classmethod
@@ -72,6 +79,8 @@ class Message:
tool_use_id=data.get("tool_use_id"),
tool_calls=data.get("tool_calls"),
is_error=data.get("is_error", False),
phase_id=data.get("phase_id"),
is_transition_marker=data.get("is_transition_marker", False),
)
@@ -188,6 +197,7 @@ class NodeConversation:
self._next_seq: int = 0
self._meta_persisted: bool = False
self._last_api_input_tokens: int | None = None
self._current_phase: str | None = None
# --- Properties --------------------------------------------------------
@@ -195,6 +205,33 @@ class NodeConversation:
def system_prompt(self) -> str:
return self._system_prompt
def update_system_prompt(self, new_prompt: str) -> None:
"""Update the system prompt.
Used in continuous conversation mode at phase transitions to swap
Layer 3 (focus) while preserving the conversation history.
"""
self._system_prompt = new_prompt
def set_current_phase(self, phase_id: str) -> None:
"""Set the current phase ID. Subsequent messages will be stamped with it."""
self._current_phase = phase_id
async def switch_store(self, new_store: ConversationStore) -> None:
"""Switch to a new persistence store at a phase transition.
Subsequent messages are written to *new_store*. Meta (system
prompt, config) is re-persisted on the next write so the new
store's ``meta.json`` reflects the updated prompt.
"""
self._store = new_store
self._meta_persisted = False
await new_store.write_cursor({"next_seq": self._next_seq})
@property
def current_phase(self) -> str | None:
return self._current_phase
@property
def messages(self) -> list[Message]:
"""Return a defensive copy of the message list."""
@@ -216,8 +253,19 @@ class NodeConversation:
# --- Add messages ------------------------------------------------------
async def add_user_message(self, content: str) -> Message:
msg = Message(seq=self._next_seq, role="user", content=content)
async def add_user_message(
self,
content: str,
*,
is_transition_marker: bool = False,
) -> Message:
msg = Message(
seq=self._next_seq,
role="user",
content=content,
phase_id=self._current_phase,
is_transition_marker=is_transition_marker,
)
self._messages.append(msg)
self._next_seq += 1
await self._persist(msg)
@@ -233,6 +281,7 @@ class NodeConversation:
role="assistant",
content=content,
tool_calls=tool_calls,
phase_id=self._current_phase,
)
self._messages.append(msg)
self._next_seq += 1
@@ -251,6 +300,7 @@ class NodeConversation:
content=content,
tool_use_id=tool_use_id,
is_error=is_error,
phase_id=self._current_phase,
)
self._messages.append(msg)
self._next_seq += 1
@@ -380,6 +430,11 @@ class NodeConversation:
spillover filename reference (if any). Message structure (role,
seq, tool_use_id) stays valid for the LLM API.
Phase-aware behavior (continuous mode): when messages have ``phase_id``
metadata, all messages in the current phase are protected regardless of
token budget. Transition markers are never pruned. Older phases' tool
results are pruned more aggressively.
Error tool results are never pruned; they prevent re-calling
failing tools.
@@ -388,13 +443,18 @@ class NodeConversation:
if not self._messages:
return 0
# Phase 1: Walk backward, classify tool results as protected vs pruneable
# Walk backward, classify tool results as protected vs pruneable
protected_tokens = 0
pruneable: list[int] = [] # indices into self._messages
pruneable_tokens = 0
for i in range(len(self._messages) - 1, -1, -1):
msg = self._messages[i]
# Transition markers are never pruned (any role)
if msg.is_transition_marker:
continue
if msg.role != "tool":
continue
if msg.is_error:
@@ -402,6 +462,10 @@ class NodeConversation:
if msg.content.startswith("[Pruned tool result"):
continue # already pruned
# Phase-aware: protect current phase messages
if self._current_phase and msg.phase_id == self._current_phase:
continue
est = len(msg.content) // 4
if protected_tokens < protect_tokens:
protected_tokens += est
@@ -409,11 +473,11 @@ class NodeConversation:
pruneable.append(i)
pruneable_tokens += est
# Phase 2: Only prune if enough to be worthwhile
# Only prune if enough to be worthwhile
if pruneable_tokens < min_prune_tokens:
return 0
# Phase 3: Replace content with compact placeholder
# Replace content with compact placeholder
count = 0
for i in pruneable:
msg = self._messages[i]
@@ -436,6 +500,8 @@ class NodeConversation:
tool_use_id=msg.tool_use_id,
tool_calls=msg.tool_calls,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
)
count += 1
@@ -446,22 +512,38 @@ class NodeConversation:
self._last_api_input_tokens = None
return count
async def compact(self, summary: str, keep_recent: int = 2) -> None:
async def compact(
self,
summary: str,
keep_recent: int = 2,
phase_graduated: bool = False,
) -> None:
"""Replace old messages with a summary, optionally keeping recent ones.
Args:
summary: Caller-provided summary text.
keep_recent: Number of recent messages to preserve (default 2).
Clamped to [0, len(messages) - 1].
phase_graduated: When True and messages have phase_id metadata,
split at phase boundaries instead of using keep_recent.
Keeps current + previous phase intact; compacts older phases.
"""
if not self._messages:
return
# Clamp: must discard at least 1 message
keep_recent = max(0, min(keep_recent, len(self._messages) - 1))
total = len(self._messages)
split = total - keep_recent if keep_recent > 0 else total
# Phase-graduated: find the split point based on phase boundaries.
# Keeps current phase + previous phase intact, compacts older phases.
if phase_graduated and self._current_phase:
split = self._find_phase_graduated_split()
else:
split = None
if split is None:
# Fallback: use keep_recent (non-phase or single-phase conversation)
keep_recent = max(0, min(keep_recent, total - 1))
split = total - keep_recent if keep_recent > 0 else total
# Advance split past orphaned tool results at the boundary.
# Tool-role messages reference a tool_use from the preceding
@@ -470,6 +552,10 @@ class NodeConversation:
while split < total and self._messages[split].role == "tool":
split += 1
# Nothing to compact
if split == 0:
return
old_messages = list(self._messages[:split])
recent_messages = list(self._messages[split:])
@@ -504,6 +590,33 @@ class NodeConversation:
self._messages = [summary_msg] + recent_messages
self._last_api_input_tokens = None # reset; next LLM call will recalibrate
def _find_phase_graduated_split(self) -> int | None:
"""Find split point that preserves current + previous phase.
Returns the index of the first message in the protected set,
or None if phase graduation doesn't apply (< 3 phases).
"""
# Collect distinct phases in order of first appearance
phases_seen: list[str] = []
for msg in self._messages:
if msg.phase_id and msg.phase_id not in phases_seen:
phases_seen.append(msg.phase_id)
# Need at least 3 phases for graduation to be meaningful
# (current + previous are protected, older get compacted)
if len(phases_seen) < 3:
return None
# Protect: current phase + previous phase
protected_phases = {phases_seen[-1], phases_seen[-2]}
# Find split: first message belonging to a protected phase
for i, msg in enumerate(self._messages):
if msg.phase_id in protected_phases:
return i
return None
async def clear(self) -> None:
"""Remove all messages, keep system prompt, preserve ``_next_seq``."""
if self._store:
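
The phase-graduated split is easiest to see on plain data. A sketch of the rule with `(role, phase_id)` tuples standing in for `Message` objects (illustrative only):

```
# Fewer than 3 distinct phases -> None (compact() falls back to keep_recent).
# With 3+ phases, the first message of the previous phase is the split point:
# everything before it gets summarized, current + previous phases survive.
def find_phase_graduated_split(msgs: list[tuple[str, str | None]]) -> int | None:
    phases: list[str] = []
    for _, phase in msgs:
        if phase and phase not in phases:
            phases.append(phase)
    if len(phases) < 3:
        return None
    protected = {phases[-1], phases[-2]}
    for i, (_, phase) in enumerate(msgs):
        if phase in protected:
            return i
    return None

msgs = [
    ("user", "scope"), ("assistant", "scope"),
    ("user", "research"), ("assistant", "research"),
    ("user", "write"), ("assistant", "write"),
]
print(find_phase_graduated_split(msgs))  # 2 -> the "scope" phase is compacted
```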
+177
View File
@@ -0,0 +1,177 @@
"""Level 2 Conversation-Aware Judge.
When a node has `success_criteria` set, the implicit judge upgrades:
after Level 0 passes (all output keys set), a fast LLM call evaluates
whether the conversation actually meets the criteria.
This prevents nodes from "checking boxes" (setting output keys) without
doing quality work. The LLM reads the recent conversation and assesses
whether the phase's goal was genuinely accomplished.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass
from typing import Any
from framework.graph.conversation import NodeConversation
from framework.llm.provider import LLMProvider
logger = logging.getLogger(__name__)
@dataclass
class PhaseVerdict:
"""Result of Level 2 conversation-aware evaluation."""
action: str # "ACCEPT" or "RETRY"
confidence: float = 0.8
feedback: str = ""
async def evaluate_phase_completion(
llm: LLMProvider,
conversation: NodeConversation,
phase_name: str,
phase_description: str,
success_criteria: str,
accumulator_state: dict[str, Any],
max_history_tokens: int = 8_192,
) -> PhaseVerdict:
"""Level 2 judge: read the conversation and evaluate quality.
Only called after Level 0 passes (all output keys set).
Args:
llm: LLM provider for evaluation
conversation: The current conversation to evaluate
phase_name: Name of the current phase/node
phase_description: Description of the phase
success_criteria: Natural-language criteria for phase completion
accumulator_state: Current output key values
max_history_tokens: Main conversation token budget (judge gets 20%)
Returns:
PhaseVerdict with action and optional feedback
"""
# Build a compact view of the recent conversation
recent_messages = _extract_recent_context(conversation, max_messages=10)
outputs_summary = _format_outputs(accumulator_state)
system_prompt = (
"You are a quality judge evaluating whether a phase of work is complete. "
"Be concise. Evaluate based on the success criteria, not on style."
)
user_prompt = f"""Evaluate this phase:
PHASE: {phase_name}
DESCRIPTION: {phase_description}
SUCCESS CRITERIA:
{success_criteria}
OUTPUTS SET:
{outputs_summary}
RECENT CONVERSATION:
{recent_messages}
Has this phase accomplished its goal based on the success criteria?
Respond in exactly this format:
ACTION: ACCEPT or RETRY
CONFIDENCE: 0.X
FEEDBACK: (reason if RETRY, empty if ACCEPT)"""
try:
response = llm.complete(
messages=[{"role": "user", "content": user_prompt}],
system=system_prompt,
max_tokens=max(1024, max_history_tokens // 5),
max_retries=1,
)
if not response.content or not response.content.strip():
logger.debug("Level 2 judge: empty response, accepting by default")
return PhaseVerdict(action="ACCEPT", confidence=0.5, feedback="")
return _parse_verdict(response.content)
except Exception as e:
logger.warning(f"Level 2 judge failed, accepting by default: {e}")
# On failure, don't block — Level 0 already passed
return PhaseVerdict(action="ACCEPT", confidence=0.5, feedback="")
def _extract_recent_context(conversation: NodeConversation, max_messages: int = 10) -> str:
"""Extract recent conversation messages for evaluation."""
messages = conversation.messages
recent = messages[-max_messages:] if len(messages) > max_messages else messages
parts = []
for msg in recent:
role = msg.role.upper()
content = msg.content or ""
# Truncate long tool results
if msg.role == "tool" and len(content) > 200:
content = content[:200] + "..."
if content.strip():
parts.append(f"[{role}]: {content.strip()}")
return "\n".join(parts) if parts else "(no messages)"
def _format_outputs(accumulator_state: dict[str, Any]) -> str:
"""Format output key values for evaluation.
Lists and dicts get structural formatting so the judge can assess
quantity and structure, not just a truncated stringification.
"""
if not accumulator_state:
return "(none)"
parts = []
for key, value in accumulator_state.items():
if isinstance(value, list):
# Show count + brief per-item preview so the judge can
# verify quantity without the full serialization.
items_preview = []
for i, item in enumerate(value[:8]):
item_str = str(item)
if len(item_str) > 150:
item_str = item_str[:150] + "..."
items_preview.append(f" [{i}]: {item_str}")
val_str = f"list ({len(value)} items):\n" + "\n".join(items_preview)
if len(value) > 8:
val_str += f"\n ... and {len(value) - 8} more"
elif isinstance(value, dict):
val_str = str(value)
if len(val_str) > 400:
val_str = val_str[:400] + "..."
else:
val_str = str(value)
if len(val_str) > 300:
val_str = val_str[:300] + "..."
parts.append(f" {key}: {val_str}")
return "\n".join(parts)
def _parse_verdict(response: str) -> PhaseVerdict:
"""Parse LLM response into PhaseVerdict."""
action = "ACCEPT"
confidence = 0.8
feedback = ""
for line in response.strip().split("\n"):
line = line.strip()
if line.startswith("ACTION:"):
action_str = line.split(":", 1)[1].strip().upper()
if action_str in ("ACCEPT", "RETRY"):
action = action_str
elif line.startswith("CONFIDENCE:"):
try:
confidence = float(line.split(":", 1)[1].strip())
except ValueError:
pass
elif line.startswith("FEEDBACK:"):
feedback = line.split(":", 1)[1].strip()
return PhaseVerdict(action=action, confidence=confidence, feedback=feedback)
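
The verdict format is deliberately line-oriented so parsing never needs JSON. A self-contained sketch using a trimmed copy of `_parse_verdict` (returning a plain tuple for illustration):

```
def parse_verdict(response: str) -> tuple[str, float, str]:
    action, confidence, feedback = "ACCEPT", 0.8, ""
    for line in response.strip().split("\n"):
        line = line.strip()
        if line.startswith("ACTION:"):
            value = line.split(":", 1)[1].strip().upper()
            if value in ("ACCEPT", "RETRY"):
                action = value
        elif line.startswith("CONFIDENCE:"):
            try:
                confidence = float(line.split(":", 1)[1].strip())
            except ValueError:
                pass  # keep the default on a malformed number
        elif line.startswith("FEEDBACK:"):
            feedback = line.split(":", 1)[1].strip()
    return action, confidence, feedback

print(parse_verdict("ACTION: RETRY\nCONFIDENCE: 0.6\nFEEDBACK: only 3 sources"))
# ('RETRY', 0.6, 'only 3 sources') -- unknown lines are ignored, so a chatty
# model that adds extra prose still parses to a usable verdict.
```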
+62 -17
View File
@@ -21,13 +21,20 @@ allowing the LLM to evaluate whether proceeding along an edge makes sense
given the current goal, context, and execution state.
"""
import json
import logging
import re
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
from pydantic import BaseModel, Field, model_validator
from framework.graph.safe_eval import safe_eval
logger = logging.getLogger(__name__)
DEFAULT_MAX_TOKENS = 8192
class EdgeCondition(StrEnum):
"""When an edge should be traversed."""
@@ -156,6 +163,7 @@ class EdgeSpec(BaseModel):
memory: dict[str, Any],
) -> bool:
"""Evaluate a conditional expression."""
if not self.condition_expr:
return True
@@ -172,12 +180,24 @@ class EdgeSpec(BaseModel):
try:
# Safe evaluation using AST-based whitelist
return bool(safe_eval(self.condition_expr, context))
result = bool(safe_eval(self.condition_expr, context))
# Log the evaluation for visibility
# Extract the variable names used in the expression for debugging
expr_vars = {
k: repr(context[k])
for k in context
if k not in ("output", "memory", "result", "true", "false")
and k in self.condition_expr
}
logger.info(
" Edge %s: condition '%s'%s (vars: %s)",
self.id,
self.condition_expr,
result,
expr_vars or "none matched",
)
return result
except Exception as e:
# Log the error for debugging
import logging
logger = logging.getLogger(__name__)
logger.warning(f" ⚠ Condition evaluation failed: {self.condition_expr}")
logger.warning(f" Error: {e}")
logger.warning(f" Available context keys: {list(context.keys())}")
@@ -199,8 +219,6 @@ class EdgeSpec(BaseModel):
The LLM evaluates whether proceeding to the target node
is the best next step toward achieving the goal.
"""
import json
# Build context for LLM
prompt = f"""You are evaluating whether to proceed along an edge in an agent workflow.
@@ -236,8 +254,6 @@ Respond with ONLY a JSON object:
)
# Parse response
import re
json_match = re.search(r"\{[^{}]*\}", response.content, re.DOTALL)
if json_match:
data = json.loads(json_match.group())
@@ -245,9 +261,6 @@ Respond with ONLY a JSON object:
reasoning = data.get("reasoning", "")
# Log the decision (using basic print for now)
import logging
logger = logging.getLogger(__name__)
logger.info(f" 🤔 LLM routing decision: {'PROCEED' if proceed else 'SKIP'}")
logger.info(f" Reason: {reasoning}")
@@ -255,9 +268,6 @@ Respond with ONLY a JSON object:
except Exception as e:
# Fallback: proceed on success
import logging
logger = logging.getLogger(__name__)
logger.warning(f" ⚠ LLM routing failed, defaulting to on_success: {e}")
return source_success
@@ -408,7 +418,7 @@ class GraphSpec(BaseModel):
# Default LLM settings
default_model: str = "claude-haiku-4-5-20251001"
max_tokens: int = 1024
max_tokens: int = Field(default=None) # resolved by _resolve_max_tokens validator
# Cleanup LLM for JSON extraction fallback (fast/cheap model preferred)
# If not set, uses CEREBRAS_API_KEY -> cerebras/llama-3.3-70b or
@@ -419,12 +429,47 @@ class GraphSpec(BaseModel):
max_steps: int = Field(default=100, description="Maximum node executions before timeout")
max_retries_per_node: int = 3
# EventLoopNode configuration (from configure_loop)
loop_config: dict[str, Any] = Field(
default_factory=dict,
description="EventLoopNode configuration (max_iterations, max_tool_calls_per_turn, etc.)",
)
# Conversation mode
conversation_mode: str = Field(
default="continuous",
description=(
"How conversations flow between event_loop nodes. "
"'continuous' (default): one conversation threads through all "
"event_loop nodes with cumulative tools and layered prompt composition. "
"'isolated': each node gets a fresh conversation."
),
)
identity_prompt: str | None = Field(
default=None,
description=(
"Agent-level identity prompt (Layer 1 of the onion model). "
"In continuous mode, this is the static identity that persists "
"unchanged across all node transitions. In isolated mode, ignored."
),
)
# Metadata
description: str = ""
created_by: str = "" # "human" or "builder_agent"
model_config = {"extra": "allow"}
@model_validator(mode="before")
@classmethod
def _resolve_max_tokens(cls, values: Any) -> Any:
"""Resolve max_tokens from the global config store when not explicitly set."""
if isinstance(values, dict) and values.get("max_tokens") is None:
from framework.config import get_max_tokens
values["max_tokens"] = get_max_tokens()
return values
def get_node(self, node_id: str) -> Any | None:
"""Get a node by ID."""
for node in self.nodes:
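
The `max_tokens` resolution relies on a `mode="before"` validator filling the field before Pydantic validates it, so the `int` annotation never sees `None`. A minimal sketch of the pattern (here `get_default` stands in for `framework.config.get_max_tokens`, an assumption about its behavior):

```
from typing import Any
from pydantic import BaseModel, Field, model_validator

def get_default() -> int:
    return 8192  # stand-in for the global config store lookup

class Spec(BaseModel):
    max_tokens: int = Field(default=None)  # filled by the validator below

    @model_validator(mode="before")
    @classmethod
    def _resolve(cls, values: Any) -> Any:
        # Runs before field validation, so None never reaches the int field.
        if isinstance(values, dict) and values.get("max_tokens") is None:
            values["max_tokens"] = get_default()
        return values

print(Spec().max_tokens)                 # 8192, resolved from the default
print(Spec(max_tokens=2048).max_tokens)  # 2048, explicit value wins
```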
File diff suppressed because it is too large
File diff suppressed because it is too large
+179 -7
View File
@@ -238,6 +238,16 @@ class NodeSpec(BaseModel):
description="If True, this node streams output to the end user and can request input.",
)
# Phase completion criteria for conversation-aware judge (Level 2)
success_criteria: str | None = Field(
default=None,
description=(
"Natural-language criteria for phase completion. When set, the "
"implicit judge upgrades to Level 2: after output keys are satisfied, "
"a fast LLM evaluates whether the conversation meets these criteria."
),
)
model_config = {"extra": "allow", "arbitrary_types_allowed": True}
@@ -477,6 +487,17 @@ class NodeContext:
attempt: int = 1
max_attempts: int = 3
# Runtime logging (optional)
runtime_logger: Any = None # RuntimeLogger | None — uses Any to avoid import
# Pause control (optional) - asyncio.Event for pause requests
pause_event: Any = None # asyncio.Event | None
# Continuous conversation mode
continuous_mode: bool = False # True when graph has conversation_mode="continuous"
inherited_conversation: Any = None # NodeConversation | None (from prior node)
cumulative_output_keys: list[str] = field(default_factory=list) # All output keys from path
@dataclass
class NodeResult:
@@ -505,6 +526,9 @@ class NodeResult:
# Pydantic validation errors (if any)
validation_errors: list[str] = field(default_factory=list)
# Continuous conversation mode: return conversation for threading to next node
conversation: Any = None # NodeConversation | None
def to_summary(self, node_spec: Any = None) -> str:
"""
Generate a human-readable summary of this node's execution and output.
@@ -854,6 +878,8 @@ Keep the same JSON structure but with shorter content values.
)
start = time.time()
_step_index = 0
_captured_tool_calls: list[dict] = []
try:
# Build messages
@@ -893,6 +919,16 @@ Keep the same JSON structure but with shorter content values.
if len(str(result.content)) > 150:
result_str += "..."
logger.info(f" ✓ Tool result: {result_str}")
# Capture for runtime logging
_captured_tool_calls.append(
{
"tool_use_id": tool_use.id,
"tool_name": tool_use.name,
"tool_input": tool_use.input,
"content": result.content,
"is_error": result.is_error,
}
)
return result
response = ctx.llm.complete_with_tools(
@@ -1072,6 +1108,29 @@ Keep the same JSON structure but with shorter content values.
f"Pydantic validation failed after "
f"{max_validation_retries} retries: {err}"
)
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type=ctx.node_spec.node_type,
step_index=_step_index,
llm_text=response.content,
tool_calls=_captured_tool_calls,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type=ctx.node_spec.node_type,
success=False,
error=error_msg,
total_steps=_step_index + 1,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
)
return NodeResult(
success=False,
error=error_msg,
@@ -1093,7 +1152,7 @@ Keep the same JSON structure but with shorter content values.
decision_id=decision_id,
success=True,
result=response.content,
tokens_used=response.input_tokens + response.output_tokens,
tokens_used=total_input_tokens + total_output_tokens,
latency_ms=latency_ms,
)
@@ -1161,14 +1220,38 @@ Keep the same JSON structure but with shorter content values.
)
# Return failure instead of writing garbage to all keys
_extraction_error = (
f"Output extraction failed: {e}. LLM returned non-JSON response. "
f"Expected keys: {ctx.node_spec.output_keys}"
)
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type=ctx.node_spec.node_type,
step_index=_step_index,
llm_text=response.content,
tool_calls=_captured_tool_calls,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type=ctx.node_spec.node_type,
success=False,
error=_extraction_error,
total_steps=_step_index + 1,
tokens_used=response.input_tokens + response.output_tokens,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
latency_ms=latency_ms,
)
return NodeResult(
success=False,
error=(
f"Output extraction failed: {e}. LLM returned non-JSON response. "
f"Expected keys: {ctx.node_spec.output_keys}"
),
error=_extraction_error,
output={},
tokens_used=response.input_tokens + response.output_tokens,
tokens_used=total_input_tokens + total_output_tokens,
latency_ms=latency_ms,
)
# JSON extraction failed completely - still strip code blocks
@@ -1184,10 +1267,33 @@ Keep the same JSON structure but with shorter content values.
ctx.memory.write(key, stripped_content, validate=False)
output[key] = stripped_content
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type=ctx.node_spec.node_type,
step_index=_step_index,
llm_text=response.content,
tool_calls=_captured_tool_calls,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type=ctx.node_spec.node_type,
success=True,
total_steps=_step_index + 1,
tokens_used=response.input_tokens + response.output_tokens,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
latency_ms=latency_ms,
)
return NodeResult(
success=True,
output=output,
tokens_used=response.input_tokens + response.output_tokens,
tokens_used=total_input_tokens + total_output_tokens,
latency_ms=latency_ms,
)
@@ -1199,6 +1305,15 @@ Keep the same JSON structure but with shorter content values.
error=str(e),
latency_ms=latency_ms,
)
if ctx.runtime_logger:
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type=ctx.node_spec.node_type,
success=False,
error=str(e),
latency_ms=latency_ms,
)
return NodeResult(success=False, error=str(e), latency_ms=latency_ms)
def _parse_output(self, content: str, node_spec: NodeSpec) -> dict[str, Any]:
@@ -1591,6 +1706,9 @@ class RouterNode(NodeProtocol):
async def execute(self, ctx: NodeContext) -> NodeResult:
"""Execute routing logic."""
import time as _time
start = _time.time()
ctx.runtime.set_node(ctx.node_id)
# Build options from routes
@@ -1635,10 +1753,30 @@ class RouterNode(NodeProtocol):
summary=f"Routing to {chosen_route[1]}",
)
latency_ms = int((_time.time() - start) * 1000)
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type="router",
step_index=0,
llm_text=f"Route: {chosen_route[0]} -> {chosen_route[1]}",
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type="router",
success=True,
total_steps=1,
latency_ms=latency_ms,
)
return NodeResult(
success=True,
next_node=chosen_route[1],
route_reason=f"Chose route: {chosen_route[0]}",
latency_ms=latency_ms,
)
async def _llm_route(
@@ -1800,6 +1938,22 @@ class FunctionNode(NodeProtocol):
else:
output = {"result": result}
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type="function",
step_index=0,
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type="function",
success=True,
total_steps=1,
latency_ms=latency_ms,
)
return NodeResult(success=True, output=output, latency_ms=latency_ms)
except Exception as e:
@@ -1810,4 +1964,22 @@ class FunctionNode(NodeProtocol):
error=str(e),
latency_ms=latency_ms,
)
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type="function",
step_index=0,
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type="function",
success=False,
error=str(e),
total_steps=1,
latency_ms=latency_ms,
)
return NodeResult(success=False, error=str(e), latency_ms=latency_ms)
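
`ctx.runtime_logger` is typed `Any`, so the call sites above only require an object exposing `log_step` and `log_node_complete` with these keyword parameters. A hedged duck-type sketch (the real `RuntimeLogger` lives elsewhere in the framework; this stub just prints):

```
class PrintingRuntimeLogger:
    """Stand-in logger satisfying the call sites above."""

    def log_step(self, *, node_id, node_type, step_index, llm_text=None,
                 tool_calls=None, input_tokens=0, output_tokens=0,
                 latency_ms=0):
        print(f"[step {step_index}] {node_id} ({node_type}) {latency_ms}ms")

    def log_node_complete(self, *, node_id, node_name, node_type, success,
                          error=None, total_steps=0, tokens_used=0,
                          input_tokens=0, output_tokens=0, latency_ms=0):
        status = "ok" if success else f"failed: {error}"
        print(f"[done] {node_name} ({node_id}): {status}, {total_steps} steps")

log = PrintingRuntimeLogger()
log.log_step(node_id="router", node_type="router", step_index=0, latency_ms=3)
log.log_node_complete(node_id="router", node_name="Route", node_type="router",
                      success=True, total_steps=1, latency_ms=3)
```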
+185
View File
@@ -0,0 +1,185 @@
"""Prompt composition for continuous agent mode.
Composes the three-layer system prompt (onion model) and generates
transition markers inserted into the conversation at phase boundaries.
Layer 1 - Identity (static, defined at agent level, never changes):
"You are a thorough research agent. You prefer clarity over jargon..."
Layer 2 - Narrative (auto-generated from conversation/memory state):
"We've finished scoping the project. The user wants to focus on..."
Layer 3 - Focus (per-node system_prompt, reframed as focus directive):
"Your current attention: synthesize findings into a report..."
"""
from __future__ import annotations
import logging
from pathlib import Path
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from framework.graph.edge import GraphSpec
from framework.graph.node import NodeSpec, SharedMemory
logger = logging.getLogger(__name__)
def compose_system_prompt(
identity_prompt: str | None,
focus_prompt: str | None,
narrative: str | None = None,
) -> str:
"""Compose the three-layer system prompt.
Args:
identity_prompt: Layer 1 static agent identity (from GraphSpec).
focus_prompt: Layer 3 per-node focus directive (from NodeSpec.system_prompt).
narrative: Layer 2 auto-generated from conversation state.
Returns:
Composed system prompt with all layers present.
"""
parts: list[str] = []
# Layer 1: Identity (always first, anchors the personality)
if identity_prompt:
parts.append(identity_prompt)
# Layer 2: Narrative (what's happened so far)
if narrative:
parts.append(f"\n--- Context (what has happened so far) ---\n{narrative}")
# Layer 3: Focus (current phase directive)
if focus_prompt:
parts.append(f"\n--- Current Focus ---\n{focus_prompt}")
return "\n".join(parts) if parts else ""
def build_narrative(
memory: SharedMemory,
execution_path: list[str],
graph: GraphSpec,
) -> str:
"""Build Layer 2 (narrative) from structured state.
Deterministic; no LLM call. Reads SharedMemory and the execution path
to describe what has happened so far. Cheap and fast.
Args:
memory: Current shared memory state.
execution_path: List of node IDs visited so far.
graph: Graph spec (for node names/descriptions).
Returns:
Narrative string describing the session state.
"""
parts: list[str] = []
# Describe execution path
if execution_path:
phase_descriptions: list[str] = []
for node_id in execution_path:
node_spec = graph.get_node(node_id)
if node_spec:
phase_descriptions.append(f"- {node_spec.name}: {node_spec.description}")
else:
phase_descriptions.append(f"- {node_id}")
parts.append("Phases completed:\n" + "\n".join(phase_descriptions))
# Describe key memory values (skip very long values)
all_memory = memory.read_all()
if all_memory:
memory_lines: list[str] = []
for key, value in all_memory.items():
if value is None:
continue
val_str = str(value)
if len(val_str) > 200:
val_str = val_str[:200] + "..."
memory_lines.append(f"- {key}: {val_str}")
if memory_lines:
parts.append("Current state:\n" + "\n".join(memory_lines))
return "\n\n".join(parts) if parts else ""
def build_transition_marker(
previous_node: NodeSpec,
next_node: NodeSpec,
memory: SharedMemory,
cumulative_tool_names: list[str],
data_dir: Path | str | None = None,
) -> str:
"""Build a 'State of the World' transition marker.
Inserted into the conversation as a user message at phase boundaries.
Gives the LLM full situational awareness: what happened, what's stored,
what tools are available, and what to focus on next.
Args:
previous_node: NodeSpec of the phase just completed.
next_node: NodeSpec of the phase about to start.
memory: Current shared memory state.
cumulative_tool_names: All tools available (cumulative set).
data_dir: Path to spillover data directory.
Returns:
Transition marker message text.
"""
sections: list[str] = []
# Header
sections.append(f"--- PHASE TRANSITION: {previous_node.name} -> {next_node.name} ---")
# What just completed
sections.append(f"\nCompleted: {previous_node.name}")
sections.append(f" {previous_node.description}")
# Outputs in memory
all_memory = memory.read_all()
if all_memory:
memory_lines: list[str] = []
for key, value in all_memory.items():
if value is None:
continue
val_str = str(value)
if len(val_str) > 300:
val_str = val_str[:300] + "..."
memory_lines.append(f" {key}: {val_str}")
if memory_lines:
sections.append("\nOutputs available:\n" + "\n".join(memory_lines))
# Files in data directory
if data_dir:
data_path = Path(data_dir)
if data_path.exists():
files = sorted(data_path.iterdir())
if files:
file_lines = [
f" {f.name} ({f.stat().st_size:,} bytes)" for f in files if f.is_file()
]
if file_lines:
sections.append(
"\nData files (use load_data to access):\n" + "\n".join(file_lines)
)
# Available tools
if cumulative_tool_names:
sections.append("\nAvailable tools: " + ", ".join(sorted(cumulative_tool_names)))
# Next phase
sections.append(f"\nNow entering: {next_node.name}")
sections.append(f" {next_node.description}")
# Reflection prompt (engineered metacognition)
sections.append(
"\nBefore proceeding, briefly reflect: what went well in the "
"previous phase? Are there any gaps or surprises worth noting?"
)
sections.append("\n--- END TRANSITION ---")
return "\n".join(sections)
+1 -1
View File
@@ -145,7 +145,7 @@ class SafeEvalVisitor(ast.NodeVisitor):
def visit_Attribute(self, node: ast.Attribute) -> Any:
# value.attr
# STIRCT CHECK: No access to private attributes (starting with _)
# STRICT CHECK: No access to private attributes (starting with _)
if node.attr.startswith("_"):
raise ValueError(f"Access to private attribute '{node.attr}' is not allowed")
+2
View File
@@ -70,6 +70,7 @@ class AnthropicProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Generate a completion from Claude (via LiteLLM)."""
return self._provider.complete(
@@ -79,6 +80,7 @@ class AnthropicProvider(LLMProvider):
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
max_retries=max_retries,
)
def complete_with_tools(
+19 -10
View File
@@ -150,10 +150,13 @@ class LiteLLMProvider(LLMProvider):
"LiteLLM is not installed. Please install it with: uv pip install litellm"
)
def _completion_with_rate_limit_retry(self, **kwargs: Any) -> Any:
def _completion_with_rate_limit_retry(
self, max_retries: int | None = None, **kwargs: Any
) -> Any:
"""Call litellm.completion with retry on 429 rate limit errors and empty responses."""
model = kwargs.get("model", self.model)
for attempt in range(RATE_LIMIT_MAX_RETRIES + 1):
retries = max_retries if max_retries is not None else RATE_LIMIT_MAX_RETRIES
for attempt in range(retries + 1):
try:
response = litellm.completion(**kwargs) # type: ignore[union-attr]
@@ -194,9 +197,9 @@ class LiteLLMProvider(LLMProvider):
f"Full request dumped to: {dump_path}"
)
if attempt == RATE_LIMIT_MAX_RETRIES:
if attempt == retries:
logger.error(
f"[retry] GAVE UP on {model} after {RATE_LIMIT_MAX_RETRIES + 1} "
f"[retry] GAVE UP on {model} after {retries + 1} "
f"attempts — empty response "
f"(finish_reason={finish_reason}, "
f"choices={len(response.choices) if response.choices else 0})"
@@ -209,7 +212,7 @@ class LiteLLMProvider(LLMProvider):
f"choices={len(response.choices) if response.choices else 0}) — "
f"likely rate limited or quota exceeded. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
f"(attempt {attempt + 1}/{retries})"
)
time.sleep(wait)
continue
@@ -225,9 +228,9 @@ class LiteLLMProvider(LLMProvider):
error_type="rate_limit",
attempt=attempt,
)
if attempt == RATE_LIMIT_MAX_RETRIES:
if attempt == retries:
logger.error(
f"[retry] GAVE UP on {model} after {RATE_LIMIT_MAX_RETRIES + 1} "
f"[retry] GAVE UP on {model} after {retries + 1} "
f"attempts — rate limit error: {e!s}. "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}"
@@ -239,7 +242,7 @@ class LiteLLMProvider(LLMProvider):
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
f"(attempt {attempt + 1}/{retries})"
)
time.sleep(wait)
# unreachable, but satisfies type checker
@@ -253,6 +256,7 @@ class LiteLLMProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Generate a completion using LiteLLM."""
# Prepare messages with system prompt
@@ -293,12 +297,17 @@ class LiteLLMProvider(LLMProvider):
kwargs["response_format"] = response_format
# Make the call
response = self._completion_with_rate_limit_retry(**kwargs)
response = self._completion_with_rate_limit_retry(max_retries=max_retries, **kwargs)
# Extract content
content = response.choices[0].message.content or ""
# Get usage info
# Get usage info.
# NOTE: completion_tokens includes reasoning/thinking tokens for models
# that use them (o1, gpt-5-mini, etc.). LiteLLM does not reliably expose
# usage.completion_tokens_details.reasoning_tokens across all providers.
# This means output_tokens may be inflated for reasoning models.
# Compaction is unaffected — it uses prompt_tokens (input-side only).
usage = response.usage
input_tokens = usage.prompt_tokens if usage else 0
output_tokens = usage.completion_tokens if usage else 0
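
Callers opt into fewer retries by passing `max_retries` down through `complete()`; the Level 2 judge above uses `max_retries=1` so a flaky fast model fails fast instead of looping through the provider-wide default. A sketch of the override semantics with a stub provider (no network; names are illustrative):

```
class StubProvider:
    DEFAULT_RETRIES = 5  # stands in for RATE_LIMIT_MAX_RETRIES

    def complete(self, messages, system=None, max_tokens=1024,
                 response_format=None, json_mode=False, max_retries=None):
        retries = max_retries if max_retries is not None else self.DEFAULT_RETRIES
        return f"would attempt up to {retries + 1} calls"

llm = StubProvider()
print(llm.complete([{"role": "user", "content": "hi"}]))                 # 6 calls
print(llm.complete([{"role": "user", "content": "hi"}], max_retries=1))  # 2 calls
```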
+1
View File
@@ -120,6 +120,7 @@ class MockLLMProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""
Generate a mock completion without calling a real LLM.
+3
View File
@@ -65,6 +65,7 @@ class LLMProvider(ABC):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""
Generate a completion from the LLM.
@@ -79,6 +80,8 @@ class LLMProvider(ABC):
- {"type": "json_schema", "json_schema": {"name": "...", "schema": {...}}}
for strict JSON schema enforcement
json_mode: If True, request structured JSON output from the LLM
max_retries: Override retry count for rate-limit/empty-response retries.
None uses the provider default.
Returns:
LLMResponse with content and metadata
+576 -43
View File
@@ -8,21 +8,41 @@ Usage:
"""
import json
import logging
import os
import sys
from datetime import datetime
from pathlib import Path
from typing import Annotated
from mcp.server import FastMCP
# Project root resolution. This file lives at core/framework/mcp/agent_builder_server.py,
# so the project root (where exports/ lives) is four parents up.
_PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent.parent
from framework.graph import Constraint, EdgeCondition, EdgeSpec, Goal, NodeSpec, SuccessCriterion
from framework.graph.plan import Plan
# Ensure exports/ is on sys.path so AgentRunner can import agent modules.
_exports_dir = _PROJECT_ROOT / "exports"
if _exports_dir.is_dir() and str(_exports_dir) not in sys.path:
sys.path.insert(0, str(_exports_dir))
del _exports_dir
from mcp.server import FastMCP # noqa: E402
from pydantic import ValidationError # noqa: E402
from framework.graph import ( # noqa: E402
Constraint,
EdgeCondition,
EdgeSpec,
Goal,
NodeSpec,
SuccessCriterion,
)
from framework.graph.plan import Plan # noqa: E402
# Testing framework imports
from framework.testing.prompts import (
from framework.testing.prompts import ( # noqa: E402
PYTEST_TEST_FILE_HEADER,
)
from framework.utils.io import atomic_write
from framework.utils.io import atomic_write # noqa: E402
# Initialize MCP server
mcp = FastMCP("agent-builder")
@@ -159,8 +179,8 @@ def _load_active_session() -> BuildSession | None:
if session_id:
return _load_session(session_id)
except Exception:
pass
except Exception as e:
logging.warning("Failed to load active session: %s", e)
return None
@@ -525,6 +545,9 @@ def _validate_agent_path(agent_path: str) -> tuple[Path | None, str | None]:
"""
Validate and normalize agent_path.
Resolves relative paths against _PROJECT_ROOT since the MCP server's
cwd (core/) differs from the user's cwd (project root).
Returns:
(Path, None) if valid
(None, error_json) if invalid
@@ -539,6 +562,12 @@ def _validate_agent_path(agent_path: str) -> tuple[Path | None, str | None]:
path = Path(agent_path)
# Resolve relative paths against project root (not MCP server's cwd)
if not path.is_absolute() and not path.exists():
resolved = _PROJECT_ROOT / path
if resolved.exists():
path = resolved
if not path.exists():
return None, json.dumps(
{
@@ -569,7 +598,11 @@ def add_node(
str, "JSON object mapping conditions to target node IDs for router nodes"
] = "{}",
client_facing: Annotated[
bool, "If True, node streams output to user and blocks for input between turns"
bool,
"If True, an ask_user() tool is injected so the LLM can explicitly request user input. "
"The node blocks ONLY when ask_user() is called — text-only turns stream freely. "
"Set True for nodes that interact with users (intake, review, approval). "
"Nodes that do autonomous work (research, data processing, API calls) MUST be False.",
] = False,
nullable_output_keys: Annotated[
str, "JSON array of output keys that may remain unset (for mutually exclusive outputs)"
@@ -650,6 +683,14 @@ def add_node(
"EventLoopNode supports tool use, streaming, and judge-based evaluation."
)
# Warn about client_facing on nodes with tools (likely autonomous work)
if node_type == "event_loop" and client_facing and tools_list:
warnings.append(
f"Node '{node_id}' is client_facing=True but has tools {tools_list}. "
"Nodes with tools typically do autonomous work and should be "
"client_facing=False. Only set True if this node needs user approval."
)
# nullable_output_keys must be a subset of output_keys
if nullable_output_keys_list:
invalid_nullable = [k for k in nullable_output_keys_list if k not in output_keys_list]
@@ -1360,6 +1401,17 @@ def validate_graph() -> str:
f"Node '{dn['node_id']}' uses deprecated type '{dn['type']}'. Use 'event_loop' instead."
)
# Warn if all event_loop nodes are client_facing (common misconfiguration)
el_nodes = [n for n in session.nodes if n.node_type == "event_loop"]
cf_el_nodes = [n for n in el_nodes if n.client_facing]
if len(el_nodes) > 1 and len(cf_el_nodes) == len(el_nodes):
warnings.append(
f"ALL {len(el_nodes)} event_loop nodes are client_facing=True. "
"This injects ask_user() on every node. Only nodes that need user "
"interaction (intake, review, approval) should be client_facing. Set "
"client_facing=False on autonomous processing nodes."
)
# Collect summary info
event_loop_nodes = [n.id for n in session.nodes if n.node_type == "event_loop"]
client_facing_nodes = [n.id for n in session.nodes if n.client_facing]
@@ -1817,6 +1869,85 @@ def export_graph() -> str:
)
@mcp.tool()
def import_from_export(
agent_json_path: Annotated[str, "Path to the agent.json file to import"],
) -> str:
"""
Import an agent definition from an exported agent.json file into the current build session.
Reads the agent.json, parses goal/nodes/edges, and populates the current session.
This is the reverse of export_graph().
Args:
agent_json_path: Path to the agent.json file to import
Returns:
JSON summary of what was imported (goal name, node count, edge count)
"""
session = get_session()
path = Path(agent_json_path)
if not path.exists():
return json.dumps({"success": False, "error": f"File not found: {agent_json_path}"})
try:
data = json.loads(path.read_text())
except json.JSONDecodeError as e:
return json.dumps({"success": False, "error": f"Invalid JSON: {e}"})
try:
# Parse goal (same pattern as BuildSession.from_dict lines 88-99)
goal_data = data.get("goal")
if goal_data:
session.goal = Goal(
id=goal_data["id"],
name=goal_data["name"],
description=goal_data["description"],
success_criteria=[
SuccessCriterion(**sc) for sc in goal_data.get("success_criteria", [])
],
constraints=[Constraint(**c) for c in goal_data.get("constraints", [])],
)
# Parse nodes (same pattern as BuildSession.from_dict line 102)
graph_data = data.get("graph", {})
nodes_data = graph_data.get("nodes", [])
session.nodes = [NodeSpec(**n) for n in nodes_data]
# Parse edges (same pattern as BuildSession.from_dict lines 105-118)
edges_data = graph_data.get("edges", [])
session.edges = []
for e in edges_data:
condition_str = e.get("condition")
if isinstance(condition_str, str):
condition_map = {
"always": EdgeCondition.ALWAYS,
"on_success": EdgeCondition.ON_SUCCESS,
"on_failure": EdgeCondition.ON_FAILURE,
"conditional": EdgeCondition.CONDITIONAL,
"llm_decide": EdgeCondition.LLM_DECIDE,
}
e["condition"] = condition_map.get(condition_str, EdgeCondition.ON_SUCCESS)
session.edges.append(EdgeSpec(**e))
except (KeyError, TypeError, ValueError, ValidationError) as e:
return json.dumps({"success": False, "error": f"Malformed agent.json: {e}"})
# Persist updated session
_save_session(session)
return json.dumps(
{
"success": True,
"goal": session.goal.name if session.goal else None,
"nodes_count": len(session.nodes),
"edges_count": len(session.edges),
"node_ids": [n.id for n in session.nodes],
"edge_ids": [e.id for e in session.edges],
}
)
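
The import path normalizes edge conditions from plain strings, falling back to `on_success` for anything unrecognized. A sketch of that mapping with a trimmed copy of the enum:

```
from enum import StrEnum

class EdgeCondition(StrEnum):  # trimmed copy for illustration
    ALWAYS = "always"
    ON_SUCCESS = "on_success"
    ON_FAILURE = "on_failure"
    CONDITIONAL = "conditional"
    LLM_DECIDE = "llm_decide"

condition_map = {c.value: c for c in EdgeCondition}
print(condition_map.get("on_failure", EdgeCondition.ON_SUCCESS))  # on_failure
print(condition_map.get("bogus", EdgeCondition.ON_SUCCESS))       # on_success
```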
@mcp.tool()
def get_session_status() -> str:
"""Get the current status of the build session."""
@@ -1853,12 +1984,19 @@ def configure_loop(
max_history_tokens: Annotated[
int, "Maximum conversation history tokens before compaction (default 32000)"
] = 32000,
tool_call_overflow_margin: Annotated[
float,
"Overflow margin for max_tool_calls_per_turn. "
"Tool calls are only discarded when count exceeds "
"max_tool_calls_per_turn * (1 + margin). Default 0.5 (50% wiggle room)",
] = 0.5,
) -> str:
"""Configure event loop parameters for EventLoopNode execution.
These settings control how EventLoopNodes behave at runtime:
- max_iterations: prevents infinite loops
- max_tool_calls_per_turn: limits tool calls per LLM response
- tool_call_overflow_margin: wiggle room before tool calls are discarded (default 50%)
- stall_detection_threshold: detects when LLM repeats itself
- max_history_tokens: triggers conversation compaction
"""
@@ -1867,6 +2005,7 @@ def configure_loop(
session.loop_config = {
"max_iterations": max_iterations,
"max_tool_calls_per_turn": max_tool_calls_per_turn,
"tool_call_overflow_margin": tool_call_overflow_margin,
"stall_detection_threshold": stall_detection_threshold,
"max_history_tokens": max_history_tokens,
}
@@ -2189,7 +2328,7 @@ def test_node(
)
else:
cf_note = (
"Node is client-facing: will block for user input between turns. "
"Node is client-facing: has ask_user() tool, blocks when LLM calls it. "
if node_spec.client_facing
else ""
)
@@ -2892,18 +3031,15 @@ def _format_success_criteria(criteria: list[SuccessCriterion]) -> str:
# Test template for Claude to use when writing tests
CONSTRAINT_TEST_TEMPLATE = '''@pytest.mark.asyncio
async def test_constraint_{constraint_id}_{scenario}(mock_mode):
async def test_constraint_{constraint_id}_{scenario}(runner, auto_responder, mock_mode):
"""Test: {description}"""
result = await default_agent.run({{"key": "value"}}, mock_mode=mock_mode)
# IMPORTANT: result is an ExecutionResult object with these attributes:
# - result.success: bool - whether the agent succeeded
# - result.output: dict - the agent's output data (access data here!)
# - result.error: str or None - error message if failed
await auto_responder.start()
try:
result = await runner.run({{"key": "value"}})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {{result.error}}"
# Access output data via result.output
output_data = result.output or {{}}
# Add constraint-specific assertions here
@@ -2911,18 +3047,15 @@ async def test_constraint_{constraint_id}_{scenario}(mock_mode):
'''
SUCCESS_TEST_TEMPLATE = '''@pytest.mark.asyncio
async def test_success_{criteria_id}_{scenario}(mock_mode):
async def test_success_{criteria_id}_{scenario}(runner, auto_responder, mock_mode):
"""Test: {description}"""
result = await default_agent.run({{"key": "value"}}, mock_mode=mock_mode)
# IMPORTANT: result is an ExecutionResult object with these attributes:
# - result.success: bool - whether the agent succeeded
# - result.output: dict - the agent's output data (access data here!)
# - result.error: str or None - error message if failed
await auto_responder.start()
try:
result = await runner.run({{"key": "value"}})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {{result.error}}"
# Access output data via result.output
output_data = result.output or {{}}
# Add success criteria-specific assertions here
@@ -2978,7 +3111,6 @@ def generate_constraint_tests(
test_type="Constraint",
agent_name=agent_module,
description=f"Tests for constraints defined in goal: {goal.name}",
agent_module=agent_module,
)
# Return guidelines + data for Claude to write tests directly
@@ -2994,14 +3126,22 @@ def generate_constraint_tests(
"max_tests": 5,
"naming_convention": "test_constraint_<constraint_id>_<scenario>",
"required_decorator": "@pytest.mark.asyncio",
"required_fixture": "mock_mode",
"agent_call_pattern": "await default_agent.run(input_dict, mock_mode=mock_mode)",
"required_fixtures": "runner, auto_responder, mock_mode",
"agent_call_pattern": "await runner.run(input_dict)",
"auto_responder_pattern": (
"await auto_responder.start()\n"
"try:\n"
" result = await runner.run(input_dict)\n"
"finally:\n"
" await auto_responder.stop()"
),
"result_type": "ExecutionResult with .success, .output (dict), .error",
"critical_rules": [
"Every test function MUST be async with @pytest.mark.asyncio",
"Every test MUST accept mock_mode as a parameter",
"Use await default_agent.run(input, mock_mode=mock_mode)",
"default_agent is already imported - do NOT add imports",
"Every test MUST accept runner, auto_responder, and mock_mode fixtures",
"Use await runner.run(input) -- NOT default_agent.run()",
"Start auto_responder before running, stop in finally block",
"runner and auto_responder are from conftest.py -- do NOT import them",
"NEVER call result.get() - use result.output.get() instead",
"Always check result.success before accessing result.output",
],
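
A hedged example of a test written to these rules: the `runner` and `auto_responder` fixtures come from the generated `conftest.py`, and the constraint and input keys below are invented placeholders:

```
import pytest

@pytest.mark.asyncio
async def test_constraint_no_raw_emails_basic(runner, auto_responder, mock_mode):
    """Test: agent output contains no raw email addresses (hypothetical)."""
    await auto_responder.start()
    try:
        result = await runner.run({"topic": "quarterly report"})
    finally:
        await auto_responder.stop()  # always stop, even on failure
    assert result.success, f"Agent failed: {result.error}"
    output_data = result.output or {}
    assert "@" not in str(output_data.get("report", ""))
```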
@@ -3065,7 +3205,6 @@ def generate_success_tests(
test_type="Success criteria",
agent_name=agent_module,
description=f"Tests for success criteria defined in goal: {goal.name}",
agent_module=agent_module,
)
# Return guidelines + data for Claude to write tests directly
@@ -3087,14 +3226,22 @@ def generate_success_tests(
"max_tests": 12,
"naming_convention": "test_success_<criteria_id>_<scenario>",
"required_decorator": "@pytest.mark.asyncio",
"required_fixture": "mock_mode",
"agent_call_pattern": "await default_agent.run(input_dict, mock_mode=mock_mode)",
"required_fixtures": "runner, auto_responder, mock_mode",
"agent_call_pattern": "await runner.run(input_dict)",
"auto_responder_pattern": (
"await auto_responder.start()\n"
"try:\n"
" result = await runner.run(input_dict)\n"
"finally:\n"
" await auto_responder.stop()"
),
"result_type": "ExecutionResult with .success, .output (dict), .error",
"critical_rules": [
"Every test function MUST be async with @pytest.mark.asyncio",
"Every test MUST accept mock_mode as a parameter",
"Use await default_agent.run(input, mock_mode=mock_mode)",
"default_agent is already imported - do NOT add imports",
"Every test MUST accept runner, auto_responder, and mock_mode fixtures",
"Use await runner.run(input) -- NOT default_agent.run()",
"Start auto_responder before running, stop in finally block",
"runner and auto_responder are from conftest.py -- do NOT import them",
"NEVER call result.get() - use result.output.get() instead",
"Always check result.success before accessing result.output",
],
@@ -3191,11 +3338,13 @@ def run_tests(
# Add short traceback and quiet summary
cmd.append("--tb=short")
# Set PYTHONPATH to project root so agents can import from core.framework
# Set PYTHONPATH so framework and agent packages are importable
env = os.environ.copy()
pythonpath = env.get("PYTHONPATH", "")
project_root = Path(__file__).parent.parent.parent.parent.resolve()
env["PYTHONPATH"] = f"{project_root}:{pythonpath}"
core_path = project_root / "core"
exports_path = project_root / "exports"
env["PYTHONPATH"] = f"{core_path}:{exports_path}:{project_root}:{pythonpath}"
# Run pytest
try:
@@ -3665,7 +3814,11 @@ def check_missing_credentials(
from framework.runner import AgentRunner
runner = AgentRunner.load(agent_path)
path, err = _validate_agent_path(agent_path)
if err:
return err
runner = AgentRunner.load(str(path))
runner.validate()
store = _get_credential_store()
@@ -3865,7 +4018,11 @@ def verify_credentials(
try:
from framework.runner import AgentRunner
runner = AgentRunner.load(agent_path)
path, err = _validate_agent_path(agent_path)
if err:
return err
runner = AgentRunner.load(str(path))
validation = runner.validate()
return json.dumps(
@@ -3882,6 +4039,382 @@ def verify_credentials(
return json.dumps({"error": str(e)})
# =============================================================================
# SESSION & CHECKPOINT TOOLS (read-only, no build session required)
# =============================================================================
_MAX_DIFF_VALUE_LEN = 500
def _read_session_json(path: Path) -> dict | None:
"""Read a JSON file, returning None on failure."""
if not path.exists():
return None
try:
return json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
return None
def _scan_agent_sessions(agent_work_dir: Path) -> list[tuple[str, Path]]:
"""Find session directories with state.json, sorted most-recent-first."""
sessions: list[tuple[str, Path]] = []
sessions_dir = agent_work_dir / "sessions"
if not sessions_dir.exists():
return sessions
for session_dir in sessions_dir.iterdir():
if session_dir.is_dir() and session_dir.name.startswith("session_"):
state_path = session_dir / "state.json"
if state_path.exists():
sessions.append((session_dir.name, state_path))
sessions.sort(key=lambda t: t[0], reverse=True)
return sessions
def _truncate_value(value: object, max_len: int = _MAX_DIFF_VALUE_LEN) -> object:
"""Truncate a value's JSON representation if too long."""
s = json.dumps(value, default=str)
if len(s) <= max_len:
return value
return {"_truncated": True, "_preview": s[:max_len] + "...", "_length": len(s)}
@mcp.tool()
def list_agent_sessions(
agent_work_dir: Annotated[
str,
"Path to the agent's working directory (e.g., ~/.hive/agents/my_agent)",
],
status: Annotated[
str,
"Filter by status: 'active', 'paused', 'completed', 'failed', 'cancelled'. Empty for all.",
] = "",
limit: Annotated[int, "Maximum number of results (default 20)"] = 20,
offset: Annotated[int, "Number of sessions to skip for pagination"] = 0,
) -> str:
"""
List sessions for an agent with optional status filter.
Use this to discover which sessions exist, find resumable sessions,
or identify failed sessions for debugging. Combines well with
query_runtime_logs for correlating session state with log data.
"""
work_dir = Path(agent_work_dir)
all_sessions = _scan_agent_sessions(work_dir)
if not all_sessions:
return json.dumps({"sessions": [], "total": 0, "offset": offset, "limit": limit})
summaries = []
for session_id, state_path in all_sessions:
data = _read_session_json(state_path)
if data is None:
continue
session_status = data.get("status", "")
if status and session_status != status:
continue
timestamps = data.get("timestamps", {})
progress = data.get("progress", {})
checkpoint_dir = state_path.parent / "checkpoints"
summaries.append(
{
"session_id": session_id,
"status": session_status,
"goal_id": data.get("goal_id", ""),
"started_at": timestamps.get("started_at", ""),
"updated_at": timestamps.get("updated_at", ""),
"completed_at": timestamps.get("completed_at"),
"is_resumable": data.get("is_resumable", False),
"is_resumable_from_checkpoint": data.get("is_resumable_from_checkpoint", False),
"current_node": progress.get("current_node"),
"paused_at": progress.get("paused_at"),
"steps_executed": progress.get("steps_executed", 0),
"execution_quality": progress.get("execution_quality", ""),
"has_checkpoints": checkpoint_dir.exists()
and any(checkpoint_dir.glob("cp_*.json")),
}
)
total = len(summaries)
page = summaries[offset : offset + limit]
return json.dumps(
{"sessions": page, "total": total, "offset": offset, "limit": limit}, indent=2
)
@mcp.tool()
def get_agent_session_state(
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
session_id: Annotated[str, "The session ID (e.g., 'session_20260208_143022_abc12345')"],
) -> str:
"""
Load full session state for a specific session.
Returns complete session data including status, progress, result,
metrics, and checkpoint info. Memory values are excluded to prevent
context bloat -- use get_agent_session_memory to retrieve memory contents.
"""
state_path = Path(agent_work_dir) / "sessions" / session_id / "state.json"
data = _read_session_json(state_path)
if data is None:
return json.dumps({"error": f"Session not found: {session_id}"})
memory = data.get("memory", {})
data["memory_keys"] = list(memory.keys()) if isinstance(memory, dict) else []
data["memory_size"] = len(memory) if isinstance(memory, dict) else 0
data.pop("memory", None)
return json.dumps(data, indent=2, default=str)
@mcp.tool()
def get_agent_session_memory(
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
session_id: Annotated[str, "The session ID"],
key: Annotated[str, "Specific memory key to retrieve. Empty for all."] = "",
) -> str:
"""
Get memory contents from a session.
Memory stores intermediate results passed between nodes. Use this
to inspect what data was produced during execution.
If key is provided, returns only that memory key's value.
If key is empty, returns all memory keys and their values.
"""
state_path = Path(agent_work_dir) / "sessions" / session_id / "state.json"
data = _read_session_json(state_path)
if data is None:
return json.dumps({"error": f"Session not found: {session_id}"})
memory = data.get("memory", {})
if not isinstance(memory, dict):
memory = {}
if key:
if key not in memory:
return json.dumps(
{
"error": f"Memory key not found: '{key}'",
"available_keys": list(memory.keys()),
}
)
value = memory[key]
return json.dumps(
{
"session_id": session_id,
"key": key,
"value": value,
"value_type": type(value).__name__,
},
indent=2,
default=str,
)
return json.dumps(
{"session_id": session_id, "memory": memory, "total_keys": len(memory)},
indent=2,
default=str,
)
@mcp.tool()
def list_agent_checkpoints(
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
session_id: Annotated[str, "The session ID to list checkpoints for"],
checkpoint_type: Annotated[
str,
"Filter by type: 'node_start', 'node_complete', 'loop_iteration'. Empty for all.",
] = "",
is_clean: Annotated[str, "Filter by clean status: 'true', 'false', or empty for all."] = "",
) -> str:
"""
List checkpoints for a specific session.
Checkpoints capture execution state at node boundaries for
crash recovery and resume. Use with get_agent_checkpoint for
detailed checkpoint inspection.
"""
session_dir = Path(agent_work_dir) / "sessions" / session_id
checkpoint_dir = session_dir / "checkpoints"
if not session_dir.exists():
return json.dumps({"error": f"Session not found: {session_id}"})
if not checkpoint_dir.exists():
return json.dumps(
{
"session_id": session_id,
"checkpoints": [],
"total": 0,
"latest_checkpoint_id": None,
}
)
# Try index.json first
index_data = _read_session_json(checkpoint_dir / "index.json")
if index_data and "checkpoints" in index_data:
checkpoints = index_data["checkpoints"]
else:
# Fallback: scan individual checkpoint files
checkpoints = []
for cp_file in sorted(checkpoint_dir.glob("cp_*.json")):
cp_data = _read_session_json(cp_file)
if cp_data:
checkpoints.append(
{
"checkpoint_id": cp_data.get("checkpoint_id", cp_file.stem),
"checkpoint_type": cp_data.get("checkpoint_type", ""),
"created_at": cp_data.get("created_at", ""),
"current_node": cp_data.get("current_node"),
"next_node": cp_data.get("next_node"),
"is_clean": cp_data.get("is_clean", True),
"description": cp_data.get("description", ""),
}
)
# Apply filters
if checkpoint_type:
checkpoints = [c for c in checkpoints if c.get("checkpoint_type") == checkpoint_type]
if is_clean:
clean_val = is_clean.lower() == "true"
checkpoints = [c for c in checkpoints if c.get("is_clean") == clean_val]
latest_id = None
if index_data:
latest_id = index_data.get("latest_checkpoint_id")
elif checkpoints:
latest_id = checkpoints[-1].get("checkpoint_id")
return json.dumps(
{
"session_id": session_id,
"checkpoints": checkpoints,
"total": len(checkpoints),
"latest_checkpoint_id": latest_id,
},
indent=2,
)
@mcp.tool()
def get_agent_checkpoint(
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
session_id: Annotated[str, "The session ID"],
checkpoint_id: Annotated[str, "Specific checkpoint ID, or empty for latest"] = "",
) -> str:
"""
Load a specific checkpoint with full state data.
Returns the complete checkpoint including shared memory snapshot,
execution path, accumulated outputs, and metrics. If checkpoint_id
is empty, loads the latest checkpoint.
"""
session_dir = Path(agent_work_dir) / "sessions" / session_id
checkpoint_dir = session_dir / "checkpoints"
if not checkpoint_dir.exists():
return json.dumps({"error": f"No checkpoints found for session: {session_id}"})
if not checkpoint_id:
index_data = _read_session_json(checkpoint_dir / "index.json")
if index_data and index_data.get("latest_checkpoint_id"):
checkpoint_id = index_data["latest_checkpoint_id"]
else:
cp_files = sorted(checkpoint_dir.glob("cp_*.json"))
if not cp_files:
return json.dumps({"error": f"No checkpoints found for session: {session_id}"})
checkpoint_id = cp_files[-1].stem
cp_path = checkpoint_dir / f"{checkpoint_id}.json"
data = _read_session_json(cp_path)
if data is None:
return json.dumps({"error": f"Checkpoint not found: {checkpoint_id}"})
return json.dumps(data, indent=2, default=str)
@mcp.tool()
def compare_agent_checkpoints(
agent_work_dir: Annotated[str, "Path to the agent's working directory"],
session_id: Annotated[str, "The session ID"],
checkpoint_id_before: Annotated[str, "The earlier checkpoint ID"],
checkpoint_id_after: Annotated[str, "The later checkpoint ID"],
) -> str:
"""
Compare memory state between two checkpoints.
Shows what memory keys were added, removed, or changed between
two points in execution. Useful for understanding how data flows
through the agent graph.
"""
checkpoint_dir = Path(agent_work_dir) / "sessions" / session_id / "checkpoints"
before = _read_session_json(checkpoint_dir / f"{checkpoint_id_before}.json")
if before is None:
return json.dumps({"error": f"Checkpoint not found: {checkpoint_id_before}"})
after = _read_session_json(checkpoint_dir / f"{checkpoint_id_after}.json")
if after is None:
return json.dumps({"error": f"Checkpoint not found: {checkpoint_id_after}"})
mem_before = before.get("shared_memory", {})
mem_after = after.get("shared_memory", {})
keys_before = set(mem_before.keys())
keys_after = set(mem_after.keys())
added = {k: _truncate_value(mem_after[k]) for k in keys_after - keys_before}
removed = list(keys_before - keys_after)
unchanged = []
changed = {}
for k in keys_before & keys_after:
if mem_before[k] == mem_after[k]:
unchanged.append(k)
else:
changed[k] = {
"before": _truncate_value(mem_before[k]),
"after": _truncate_value(mem_after[k]),
}
path_before = before.get("execution_path", [])
path_after = after.get("execution_path", [])
new_nodes = path_after[len(path_before) :]
return json.dumps(
{
"session_id": session_id,
"before": {
"checkpoint_id": checkpoint_id_before,
"current_node": before.get("current_node"),
"created_at": before.get("created_at", ""),
},
"after": {
"checkpoint_id": checkpoint_id_after,
"current_node": after.get("current_node"),
"created_at": after.get("created_at", ""),
},
"memory_diff": {
"added": added,
"removed": removed,
"changed": changed,
"unchanged": unchanged,
},
"execution_path_diff": {
"new_nodes": new_nodes,
"path_before": path_before,
"path_after": path_after,
},
},
indent=2,
default=str,
)
# =============================================================================
# MAIN
# =============================================================================
+236
View File
@@ -0,0 +1,236 @@
# Observability - Structured Logging
## Configuration via Environment Variables
Control logging format using environment variables:
```bash
# JSON logging (production) - Machine-parseable, one line per log
export LOG_FORMAT=json
python -m my_agent run
# Human-readable (development) - Color-coded, easy to read
# Default if LOG_FORMAT is not set
python -m my_agent run
```
**Alternative:** Set `ENV=production` to automatically use JSON format:
```bash
export ENV=production
python -m my_agent run
```
---
## Overview
The Hive framework provides automatic structured logging with trace context propagation. Logs include correlation IDs (`trace_id`, `execution_id`) that automatically follow your agent execution flow.
**Features:**
- **Zero developer friction**: Standard `logger.info()` calls automatically get trace context
- **ContextVar-based propagation**: Thread-safe and async-safe for concurrent executions
- **Dual output modes**: JSON for production, human-readable for development
- **Automatic correlation**: `trace_id` and `execution_id` propagate through all logs
## Quick Start
Logging is automatically configured when you use `AgentRunner`. No setup required:
```python
from framework.runner import AgentRunner
runner = AgentRunner(graph=my_graph, goal=my_goal)
result = await runner.run({"input": "data"})
# Logs automatically include trace_id, execution_id, agent_id, etc.
```
## Programmatic Configuration
Configure logging explicitly in your code:
```python
from framework.observability import configure_logging
# Human-readable (development)
configure_logging(level="DEBUG", format="human")
# JSON (production)
configure_logging(level="INFO", format="json")
# Auto-detect from environment
configure_logging(level="INFO", format="auto")
```
### Configuration Options
- **level**: `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"`
- **format**:
- `"json"` - Machine-parseable JSON (one line per log entry)
- `"human"` - Human-readable with colors
- `"auto"` - Detects from `LOG_FORMAT` env var or `ENV=production`
## Log Format Examples
### JSON Format (Machine-parseable)
```json
{"timestamp": "2026-01-28T15:01:02.671126+00:00", "level": "info", "logger": "framework.runtime", "message": "Starting agent execution", "trace_id": "54e80d7b5bd6409dbc3217e5cd16a4fd", "execution_id": "b4c348ec54e80d7b5bd6409dbc3217e50", "agent_id": "sales-agent", "goal_id": "qualify-leads"}
```
**Features:**
- `trace_id` and `execution_id` are 32 hex chars (W3C/OTel-aligned, no prefixes)
- Compact single-line format (easy to stream/parse)
- All trace context fields included automatically
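Because each entry is a single JSON object per line, correlating a run is a one-liner in most languages. A small sketch, assuming logs were captured to a file named `agent.log` (the file name is an assumption):

```python
import json

# Print every log entry belonging to one execution (capture file assumed).
with open("agent.log") as f:
    for line in f:
        entry = json.loads(line)
        if entry.get("execution_id") == "b4c348ec54e80d7b5bd6409dbc3217e50":
            print(entry["timestamp"], entry["level"], entry["message"])
```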
### Human-Readable Format (Development)
```
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Starting agent execution
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Processing input data [node_id:input-processor]
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] LLM call completed [latency_ms:1250] [tokens_used:450]
```
**Features:**
- Color-coded log levels
- Shortened IDs for readability (first 8 chars of the trace_id, last 8 of the execution_id)
- Context prefix shows trace correlation
## Trace Context Fields
When the framework sets trace context, these fields are included in all logs. IDs are 32 hex chars (W3C/OTel-aligned, no prefixes).
- **trace_id**: Trace identifier
- **execution_id**: Run/session correlation
- **agent_id**: Agent/graph identifier
- **goal_id**: Goal being pursued
- **node_id**: Current node (when set)
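An illustrative entry with node-level context set (values reuse the example IDs above):

```json
{"timestamp": "2026-01-28T15:01:03.104211+00:00", "level": "info", "logger": "framework.graph", "message": "Processing input data", "trace_id": "54e80d7b5bd6409dbc3217e5cd16a4fd", "execution_id": "b4c348ec54e80d7b5bd6409dbc3217e50", "agent_id": "sales-agent", "goal_id": "qualify-leads", "node_id": "input-processor"}
```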
## Custom Log Fields
Add custom fields using the `extra` parameter:
```python
import logging
logger = logging.getLogger("my_module")
# Add custom fields
logger.info("LLM call completed", extra={
"latency_ms": 1250,
"tokens_used": 450,
"model": "claude-3-5-sonnet-20241022",
"node_id": "web-search"
})
```
These fields appear in both JSON and human-readable formats.
## Usage in Your Code
### Standard Logging (Recommended)
Just use Python's standard logging - context is automatic:
```python
import logging
logger = logging.getLogger(__name__)
def my_function():
# This log automatically includes trace_id, execution_id, etc.
logger.info("Processing data")
try:
result = do_work()
logger.info("Work completed", extra={"result_count": len(result)})
except Exception as e:
logger.error("Work failed", exc_info=True)
```
### Framework-Managed Context
The framework automatically sets trace context at key points:
- **Runtime.start_run()**: Sets `trace_id`, `execution_id`, `goal_id`
- **GraphExecutor.execute()**: Adds `agent_id`
- **Node execution**: Adds `node_id`
Propagation is automatic via ContextVar.
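For intuition, here is roughly what the framework does at run start (a simplified sketch, not the actual `Runtime` source - `begin_execution` is a hypothetical name, and you should not need to call this yourself):

```python
import uuid
from framework.observability import set_trace_context

def begin_execution(goal_id: str) -> None:
    # Hypothetical sketch of Runtime.start_run(): set the IDs once;
    # every logger.info() downstream picks them up via ContextVar.
    set_trace_context(
        trace_id=uuid.uuid4().hex,      # 32 hex chars, W3C-aligned
        execution_id=uuid.uuid4().hex,  # 32 hex chars
        goal_id=goal_id,
    )
```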
## Advanced Usage
### Manual Context Management
If you need to set trace context manually (rare):
```python
from framework.observability import set_trace_context, get_trace_context
# Set context (32-hex, no prefixes)
set_trace_context(
trace_id="54e80d7b5bd6409dbc3217e5cd16a4fd",
execution_id="b4c348ec54e80d7b5bd6409dbc3217e50",
agent_id="my-agent"
)
# Get current context
context = get_trace_context()
print(context["execution_id"])
# Clear context (usually not needed)
from framework.observability import clear_trace_context
clear_trace_context()
```
### Testing
For tests, you may want to configure logging explicitly:
```python
import pytest
from framework.observability import configure_logging
@pytest.fixture(autouse=True)
def setup_logging():
configure_logging(level="DEBUG", format="human")
```
## Best Practices
1. **Production**: Use JSON format (`LOG_FORMAT=json` or `ENV=production`)
2. **Development**: Use human-readable format (default)
3. **Don't manually set context**: Let the framework manage it
4. **Use standard logging**: No special APIs needed - just `logger.info()`
5. **Add custom fields**: Use `extra` dict for additional metadata
## Troubleshooting
### Logs missing trace context
Ensure `configure_logging()` has been called (usually automatic via `AgentRunner._setup()`).
### JSON logs not appearing
Check environment variables:
```bash
echo $LOG_FORMAT
echo $ENV
```
Or explicitly set:
```python
configure_logging(format="json")
```
### Context not propagating
ContextVar automatically propagates through async calls. If context seems lost, check:
- Are you in the same async execution context?
- Has `set_trace_context()` been called for this execution?
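A quick diagnostic is to dump the current context; an empty dict means nothing has been set in this execution context:

```python
from framework.observability import get_trace_context

print(get_trace_context())  # {} when no trace context is set
```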
## See Also
- [Logging Implementation](../observability/logging.py) - Source code
- [AgentRunner](../runner/runner.py) - Where logging is configured
- [Runtime Core](../runtime/core.py) - Where trace context is set
+23
View File
@@ -0,0 +1,23 @@
"""
Observability module for automatic trace correlation and structured logging.
This module provides zero-friction observability:
- Automatic trace context propagation via ContextVar
- Structured JSON logging for production
- Human-readable logging for development
- No manual ID passing required
"""
from framework.observability.logging import (
clear_trace_context,
configure_logging,
get_trace_context,
set_trace_context,
)
__all__ = [
"configure_logging",
"get_trace_context",
"set_trace_context",
"clear_trace_context",
]
+302
View File
@@ -0,0 +1,302 @@
"""
Structured logging with automatic trace context propagation.
Key Features:
- Zero developer friction: Standard logger.info() calls get automatic context
- ContextVar-based propagation: Thread-safe and async-safe
- Dual output modes: JSON for production, human-readable for development
- Correlation IDs: trace_id follows entire request flow automatically
Architecture:
    Runtime.start_run()               -> generates trace_id, sets context once
         (automatic propagation via ContextVar)
    GraphExecutor.execute()           -> adds agent_id to context
         (automatic propagation)
    Node.execute()                    -> adds node_id to context
         (automatic propagation)
    User code: logger.info("message") -> gets ALL context automatically!
"""
import json
import logging
import os
import re
from contextvars import ContextVar
from datetime import UTC, datetime
from typing import Any
# Context variable for trace propagation
# ContextVar is thread-safe and async-safe - perfect for concurrent agent execution
trace_context: ContextVar[dict[str, Any] | None] = ContextVar("trace_context", default=None)
# ANSI escape code pattern (\x1b and \033 are the same ESC byte, so one alternative suffices)
ANSI_ESCAPE_PATTERN = re.compile(r"\x1b\[[0-9;]*m")
def strip_ansi_codes(text: str) -> str:
"""Remove ANSI escape codes from text for clean JSON logging."""
return ANSI_ESCAPE_PATTERN.sub("", text)
class StructuredFormatter(logging.Formatter):
"""
JSON formatter for structured logging.
Produces machine-parseable log entries with:
- Standard fields (timestamp, level, logger, message)
- Trace context (trace_id, execution_id, agent_id, etc.) - AUTOMATIC
- Custom fields from extra dict
"""
def format(self, record: logging.LogRecord) -> str:
"""Format log record as JSON."""
# Get trace context for correlation - AUTOMATIC!
context = trace_context.get() or {}
# Strip ANSI codes from message for clean JSON output
message = strip_ansi_codes(record.getMessage())
# Build base log entry
log_entry = {
"timestamp": datetime.now(UTC).isoformat(),
"level": record.levelname.lower(),
"logger": record.name,
"message": message,
}
# Add trace context (trace_id, execution_id, agent_id, etc.) - AUTOMATIC!
log_entry.update(context)
# Add custom fields from extra (optional)
event = getattr(record, "event", None)
if event is not None:
if isinstance(event, str):
log_entry["event"] = strip_ansi_codes(str(event))
else:
log_entry["event"] = event
latency_ms = getattr(record, "latency_ms", None)
if latency_ms is not None:
log_entry["latency_ms"] = latency_ms
tokens_used = getattr(record, "tokens_used", None)
if tokens_used is not None:
log_entry["tokens_used"] = tokens_used
node_id = getattr(record, "node_id", None)
if node_id is not None:
log_entry["node_id"] = node_id
model = getattr(record, "model", None)
if model is not None:
log_entry["model"] = model
# Add exception info if present (strip ANSI codes from exception text too)
if record.exc_info:
exception_text = self.formatException(record.exc_info)
log_entry["exception"] = strip_ansi_codes(exception_text)
return json.dumps(log_entry)
class HumanReadableFormatter(logging.Formatter):
"""
Human-readable formatter for development.
Provides colorized logs with trace context for local debugging.
Includes trace_id prefix for correlation - AUTOMATIC!
"""
COLORS = {
"DEBUG": "\033[36m", # Cyan
"INFO": "\033[32m", # Green
"WARNING": "\033[33m", # Yellow
"ERROR": "\033[31m", # Red
"CRITICAL": "\033[35m", # Magenta
}
RESET = "\033[0m"
def format(self, record: logging.LogRecord) -> str:
"""Format log record as human-readable string."""
# Get trace context - AUTOMATIC!
context = trace_context.get() or {}
trace_id = context.get("trace_id", "")
execution_id = context.get("execution_id", "")
agent_id = context.get("agent_id", "")
# Build context prefix
prefix_parts = []
if trace_id:
prefix_parts.append(f"trace:{trace_id[:8]}")
if execution_id:
prefix_parts.append(f"exec:{execution_id[-8:]}")
if agent_id:
prefix_parts.append(f"agent:{agent_id}")
context_prefix = f"[{' | '.join(prefix_parts)}] " if prefix_parts else ""
# Get color
color = self.COLORS.get(record.levelname, "")
reset = self.RESET
# Format log level (8 chars wide for alignment)
level = f"{record.levelname:<8}"
# Add event if present
event = ""
record_event = getattr(record, "event", None)
if record_event is not None:
event = f" [{record_event}]"
# Format message: [LEVEL] [trace context] message
return f"{color}[{level}]{reset} {context_prefix}{record.getMessage()}{event}"
def configure_logging(
level: str = "INFO",
format: str = "auto", # "json", "human", or "auto"
) -> None:
"""
Configure structured logging for the application.
This should be called ONCE at application startup, typically in:
- AgentRunner._setup()
- Main entry point
- Test fixtures
Args:
level: Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
format: Output format:
- "json": Machine-parseable JSON (for production)
- "human": Human-readable with colors (for development)
- "auto": JSON if LOG_FORMAT=json or ENV=production, else human
Examples:
# Development mode (human-readable)
configure_logging(level="DEBUG", format="human")
# Production mode (JSON)
configure_logging(level="INFO", format="json")
# Auto-detect from environment
configure_logging(level="INFO", format="auto")
"""
# Auto-detect format
if format == "auto":
# Use JSON if LOG_FORMAT=json or ENV=production
log_format_env = os.getenv("LOG_FORMAT", "").lower()
env = os.getenv("ENV", "development").lower()
if log_format_env == "json" or env == "production":
format = "json"
else:
format = "human"
# Select formatter
if format == "json":
formatter = StructuredFormatter()
# Disable colors in third-party libraries when using JSON format
_disable_third_party_colors()
else:
formatter = HumanReadableFormatter()
# Configure handler
handler = logging.StreamHandler()
handler.setFormatter(formatter)
# Configure root logger
root_logger = logging.getLogger()
root_logger.handlers.clear()
root_logger.addHandler(handler)
root_logger.setLevel(level.upper())
# When in JSON mode, configure known third-party loggers to use JSON formatter
# This ensures libraries like LiteLLM, httpcore also output clean JSON
if format == "json":
third_party_loggers = [
"LiteLLM",
"httpcore",
"httpx",
"openai",
]
for logger_name in third_party_loggers:
logger = logging.getLogger(logger_name)
# Clear existing handlers so records propagate to root and use our formatter there
logger.handlers.clear()
logger.propagate = True # Still propagate to root for consistency
def _disable_third_party_colors() -> None:
"""Disable color output in third-party libraries for clean JSON logging."""
# Set NO_COLOR environment variable (common convention for disabling colors)
os.environ["NO_COLOR"] = "1"
os.environ["FORCE_COLOR"] = "0"
# Disable LiteLLM debug/verbose output colors if available
try:
import litellm
# LiteLLM respects NO_COLOR, but we can also suppress debug info
if hasattr(litellm, "suppress_debug_info"):
litellm.suppress_debug_info = True # type: ignore[attr-defined]
except (ImportError, AttributeError):
pass
def set_trace_context(**kwargs: Any) -> None:
"""
Set trace context for current execution.
Context is stored in a ContextVar and AUTOMATICALLY propagates
through async calls within the same execution context.
This is called by the framework at key points:
- Runtime.start_run(): Sets trace_id, execution_id, goal_id
- GraphExecutor.execute(): Adds agent_id
- Node execution: Adds node_id
Developers/agents NEVER call this directly - it's framework-managed.
Args:
**kwargs: Context fields (trace_id, execution_id, agent_id, etc.)
Example (framework code):
# In Runtime.start_run()
trace_id = uuid.uuid4().hex # 32 hex, W3C Trace Context compliant
execution_id = uuid.uuid4().hex # 32 hex, OTel-aligned for correlation
set_trace_context(
trace_id=trace_id,
execution_id=execution_id,
goal_id=goal_id
)
# All subsequent logs in this execution get these fields automatically!
"""
current = trace_context.get() or {}
trace_context.set({**current, **kwargs})
def get_trace_context() -> dict:
"""
Get current trace context.
Returns:
Dict with trace_id, execution_id, agent_id, etc.
Empty dict if no context set.
"""
context = trace_context.get() or {}
return context.copy()
def clear_trace_context() -> None:
"""
Clear trace context.
Useful for:
- Cleanup between test runs
- Starting a completely new execution context
- Manual context management (rare)
Note: Framework typically doesn't need to call this - ContextVar
is execution-scoped and cleans itself up automatically.
"""
trace_context.set(None)
+625 -35
View File
@@ -33,11 +33,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
type=str,
help="Input context from JSON file",
)
run_parser.add_argument(
"--mock",
action="store_true",
help="Run in mock mode (no real LLM calls)",
)
run_parser.add_argument(
"--output",
"-o",
@@ -68,6 +63,18 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
default=None,
help="LLM model to use (any LiteLLM-compatible name)",
)
run_parser.add_argument(
"--resume-session",
type=str,
default=None,
help="Resume from a specific session ID",
)
run_parser.add_argument(
"--checkpoint",
type=str,
default=None,
help="Resume from a specific checkpoint (requires --resume-session)",
)
run_parser.set_defaults(func=cmd_run)
# info command
@@ -186,11 +193,206 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
)
shell_parser.set_defaults(func=cmd_shell)
# tui command (interactive agent dashboard)
tui_parser = subparsers.add_parser(
"tui",
help="Launch interactive TUI dashboard",
description="Browse available agents and launch the terminal dashboard.",
)
tui_parser.add_argument(
"--model",
"-m",
type=str,
default=None,
help="LLM model to use (any LiteLLM-compatible name)",
)
tui_parser.set_defaults(func=cmd_tui)
# sessions command group (checkpoint/resume management)
sessions_parser = subparsers.add_parser(
"sessions",
help="Manage agent sessions",
description="List, inspect, and manage agent execution sessions.",
)
sessions_subparsers = sessions_parser.add_subparsers(
dest="sessions_cmd",
help="Session management commands",
)
# sessions list
sessions_list_parser = sessions_subparsers.add_parser(
"list",
help="List agent sessions",
description="List all sessions for an agent.",
)
sessions_list_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_list_parser.add_argument(
"--status",
choices=["all", "active", "failed", "completed", "paused"],
default="all",
help="Filter by session status (default: all)",
)
sessions_list_parser.add_argument(
"--has-checkpoints",
action="store_true",
help="Show only sessions with checkpoints",
)
sessions_list_parser.set_defaults(func=cmd_sessions_list)
# sessions show
sessions_show_parser = sessions_subparsers.add_parser(
"show",
help="Show session details",
description="Display detailed information about a specific session.",
)
sessions_show_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_show_parser.add_argument(
"session_id",
type=str,
help="Session ID to inspect",
)
sessions_show_parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
sessions_show_parser.set_defaults(func=cmd_sessions_show)
# sessions checkpoints
sessions_checkpoints_parser = sessions_subparsers.add_parser(
"checkpoints",
help="List session checkpoints",
description="List all checkpoints for a session.",
)
sessions_checkpoints_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_checkpoints_parser.add_argument(
"session_id",
type=str,
help="Session ID",
)
sessions_checkpoints_parser.set_defaults(func=cmd_sessions_checkpoints)
# pause command
pause_parser = subparsers.add_parser(
"pause",
help="Pause running session",
description="Request graceful pause of a running agent session.",
)
pause_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
pause_parser.add_argument(
"session_id",
type=str,
help="Session ID to pause",
)
pause_parser.set_defaults(func=cmd_pause)
# resume command
resume_parser = subparsers.add_parser(
"resume",
help="Resume session from checkpoint",
description="Resume a paused or failed session from a checkpoint.",
)
resume_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
resume_parser.add_argument(
"session_id",
type=str,
help="Session ID to resume",
)
resume_parser.add_argument(
"--checkpoint",
"-c",
type=str,
help="Specific checkpoint ID to resume from (default: latest)",
)
resume_parser.add_argument(
"--tui",
action="store_true",
help="Resume in TUI dashboard mode",
)
resume_parser.set_defaults(func=cmd_resume)
def _load_resume_state(
agent_path: str, session_id: str, checkpoint_id: str | None = None
) -> dict | None:
"""Load session or checkpoint state for headless resume.
Args:
agent_path: Path to the agent folder (e.g., exports/my_agent)
session_id: Session ID to resume from
checkpoint_id: Optional checkpoint ID within the session
Returns:
session_state dict for executor, or None if not found
"""
agent_name = Path(agent_path).name
agent_work_dir = Path.home() / ".hive" / "agents" / agent_name
session_dir = agent_work_dir / "sessions" / session_id
if not session_dir.exists():
return None
if checkpoint_id:
# Checkpoint-based resume: load checkpoint and extract state
cp_path = session_dir / "checkpoints" / f"{checkpoint_id}.json"
if not cp_path.exists():
return None
try:
cp_data = json.loads(cp_path.read_text())
except (json.JSONDecodeError, OSError):
return None
return {
"resume_session_id": session_id,
"memory": cp_data.get("shared_memory", {}),
"paused_at": cp_data.get("next_node") or cp_data.get("current_node"),
"execution_path": cp_data.get("execution_path", []),
"node_visit_counts": {},
}
else:
# Session state resume: load state.json
state_path = session_dir / "state.json"
if not state_path.exists():
return None
try:
state_data = json.loads(state_path.read_text())
except (json.JSONDecodeError, OSError):
return None
progress = state_data.get("progress", {})
paused_at = progress.get("paused_at") or progress.get("resume_from")
return {
"resume_session_id": session_id,
"memory": state_data.get("memory", {}),
"paused_at": paused_at,
"execution_path": progress.get("path", []),
"node_visit_counts": progress.get("node_visit_counts", {}),
}
def cmd_run(args: argparse.Namespace) -> int:
"""Run an exported agent."""
import logging
from framework.credentials.models import CredentialError
from framework.runner import AgentRunner
# Set logging level (quiet by default for cleaner output)
@@ -228,10 +430,11 @@ def cmd_run(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=True,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
except Exception as e:
print(f"Error loading agent: {e}")
return
@@ -244,7 +447,11 @@ def cmd_run(args: argparse.Namespace) -> int:
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
app = AdenTUI(runner._agent_runtime)
app = AdenTUI(
runner._agent_runtime,
resume_session=getattr(args, "resume_session", None),
resume_checkpoint=getattr(args, "checkpoint", None),
)
# TUI handles execution via ChatRepl — user submits input,
# ChatRepl calls runtime.trigger_and_wait(). No auto-launch.
@@ -266,14 +473,36 @@ def cmd_run(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=False,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return 1
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
# Load session/checkpoint state for resume (headless mode)
session_state = None
resume_session = getattr(args, "resume_session", None)
checkpoint = getattr(args, "checkpoint", None)
if resume_session:
session_state = _load_resume_state(args.agent_path, resume_session, checkpoint)
if session_state is None:
print(
f"Error: Could not load session state for {resume_session}",
file=sys.stderr,
)
return 1
if not args.quiet:
resume_node = session_state.get("paused_at", "unknown")
if checkpoint:
print(f"Resuming from checkpoint: {checkpoint}")
else:
print(f"Resuming session: {resume_session}")
print(f"Resume point: {resume_node}")
print()
# Auto-inject user_id if the agent expects it but it's not provided
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
if "user_id" in entry_input_keys and context.get("user_id") is None:
@@ -293,7 +522,7 @@ def cmd_run(args: argparse.Namespace) -> int:
print("=" * 60)
print()
result = asyncio.run(runner.run(context))
result = asyncio.run(runner.run(context, session_state=session_state))
# Format output
output = {
@@ -373,10 +602,14 @@ def cmd_run(args: argparse.Namespace) -> int:
def cmd_info(args: argparse.Namespace) -> int:
"""Show agent information."""
from framework.credentials.models import CredentialError
from framework.runner import AgentRunner
try:
runner = AgentRunner.load(args.agent_path)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return 1
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
@@ -436,10 +669,14 @@ def cmd_info(args: argparse.Namespace) -> int:
def cmd_validate(args: argparse.Namespace) -> int:
"""Validate an exported agent."""
from framework.credentials.models import CredentialError
from framework.runner import AgentRunner
try:
runner = AgentRunner.load(args.agent_path)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return 1
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
@@ -756,6 +993,7 @@ def cmd_shell(args: argparse.Namespace) -> int:
"""Start an interactive agent session."""
import logging
from framework.credentials.models import CredentialError
from framework.runner import AgentRunner
# Configure logging to show runtime visibility
@@ -780,6 +1018,9 @@ def cmd_shell(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(agent_path)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return 1
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
@@ -985,8 +1226,218 @@ def cmd_shell(args: argparse.Namespace) -> int:
return 0
def cmd_tui(args: argparse.Namespace) -> int:
"""Browse agents and launch the interactive TUI dashboard."""
import logging
from framework.credentials.models import CredentialError
from framework.runner import AgentRunner
from framework.tui.app import AdenTUI
logging.basicConfig(level=logging.WARNING, format="%(message)s")
exports_dir = Path("exports")
examples_dir = Path("examples/templates")
has_exports = _has_agents(exports_dir)
has_examples = _has_agents(examples_dir)
if not has_exports and not has_examples:
print("No agents found in exports/ or examples/templates/", file=sys.stderr)
return 1
# Determine which directory to browse
if has_exports and has_examples:
print("\nAgent sources:\n")
print(" 1. Your Agents (exports/)")
print(" 2. Sample Agents (examples/templates/)")
print()
try:
choice = input("Select source (number): ").strip()
if choice == "1":
agents_dir = exports_dir
elif choice == "2":
agents_dir = examples_dir
else:
print("Invalid selection")
return 1
except (EOFError, KeyboardInterrupt):
print()
return 1
elif has_exports:
agents_dir = exports_dir
else:
agents_dir = examples_dir
# Let user pick an agent
agent_path = _select_agent(agents_dir)
if not agent_path:
return 1
# Launch TUI (same pattern as cmd_run --tui)
async def run_with_tui():
try:
runner = AgentRunner.load(
agent_path,
model=args.model,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
except Exception as e:
print(f"Error loading agent: {e}")
return
if runner._agent_runtime is None:
runner._setup()
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
app = AdenTUI(runner._agent_runtime)
try:
await app.run_async()
except Exception as e:
import traceback
traceback.print_exc()
print(f"TUI error: {e}")
await runner.cleanup_async()
asyncio.run(run_with_tui())
print("TUI session ended.")
return 0
def _extract_python_agent_metadata(agent_path: Path) -> tuple[str, str]:
"""Extract name and description from a Python-based agent's config.py.
Uses AST parsing to safely extract values without executing code.
Returns (name, description) tuple, with fallbacks if parsing fails.
"""
import ast
config_path = agent_path / "config.py"
fallback_name = agent_path.name.replace("_", " ").title()
fallback_desc = "(Python-based agent)"
if not config_path.exists():
return fallback_name, fallback_desc
try:
with open(config_path) as f:
tree = ast.parse(f.read())
# Find AgentMetadata class definition
for node in ast.walk(tree):
if isinstance(node, ast.ClassDef) and node.name == "AgentMetadata":
name = fallback_name
desc = fallback_desc
# Extract default values from class body
for item in node.body:
if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name):
field_name = item.target.id
if item.value:
# Handle simple string constants
if isinstance(item.value, ast.Constant):
if field_name == "name":
name = item.value.value
elif field_name == "description":
desc = item.value.value
# f-strings cannot be evaluated statically - skip and keep fallback
elif isinstance(item.value, ast.JoinedStr):
pass
elif isinstance(item.value, ast.BinOp):
# String concatenation with + - try to evaluate
try:
result = _eval_string_binop(item.value)
if result and field_name == "name":
name = result
elif result and field_name == "description":
desc = result
except Exception:
pass
return name, desc
return fallback_name, fallback_desc
except Exception:
return fallback_name, fallback_desc
def _eval_string_binop(node) -> str | None:
"""Recursively evaluate a BinOp of string constants."""
import ast
if isinstance(node, ast.Constant) and isinstance(node.value, str):
return node.value
elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
left = _eval_string_binop(node.left)
right = _eval_string_binop(node.right)
if left is not None and right is not None:
return left + right
return None
def _is_valid_agent_dir(path: Path) -> bool:
"""Check if a directory contains a valid agent (agent.json or agent.py)."""
if not path.is_dir():
return False
return (path / "agent.json").exists() or (path / "agent.py").exists()
def _has_agents(directory: Path) -> bool:
"""Check if a directory contains any valid agents (folders with agent.json or agent.py)."""
if not directory.exists():
return False
return any(_is_valid_agent_dir(p) for p in directory.iterdir())
def _getch() -> str:
"""Read a single character from stdin without waiting for Enter."""
try:
if sys.platform == "win32":
import msvcrt
ch = msvcrt.getch()
return ch.decode("utf-8", errors="ignore")
else:
import termios
import tty
fd = sys.stdin.fileno()
old_settings = termios.tcgetattr(fd)
try:
tty.setraw(fd)
ch = sys.stdin.read(1)
finally:
termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
return ch
except Exception:
return ""
def _read_key() -> str:
"""Read a key, handling arrow key escape sequences."""
ch = _getch()
if ch == "\x1b": # Escape sequence start
ch2 = _getch()
if ch2 == "[":
ch3 = _getch()
if ch3 == "C": # Right arrow
return "RIGHT"
elif ch3 == "D": # Left arrow
return "LEFT"
return ch
def _select_agent(agents_dir: Path) -> str | None:
"""Let user select an agent from available agents."""
"""Let user select an agent from available agents with pagination."""
AGENTS_PER_PAGE = 10
if not agents_dir.exists():
print(f"Directory not found: {agents_dir}", file=sys.stderr)
# fixes issue #696, creates an exports folder if it does not exist
@@ -996,37 +1447,126 @@ def _select_agent(agents_dir: Path) -> str | None:
agents = []
for path in agents_dir.iterdir():
if path.is_dir() and (path / "agent.json").exists():
if _is_valid_agent_dir(path):
agents.append(path)
if not agents:
print(f"No agents found in {agents_dir}", file=sys.stderr)
return None
print(f"\nAvailable agents in {agents_dir}:\n")
for i, agent_path in enumerate(agents, 1):
# Pagination setup
page = 0
total_pages = (len(agents) + AGENTS_PER_PAGE - 1) // AGENTS_PER_PAGE
while True:
start_idx = page * AGENTS_PER_PAGE
end_idx = min(start_idx + AGENTS_PER_PAGE, len(agents))
page_agents = agents[start_idx:end_idx]
# Show page header with indicator
if total_pages > 1:
print(f"\nAvailable agents in {agents_dir} (Page {page + 1}/{total_pages}):\n")
else:
print(f"\nAvailable agents in {agents_dir}:\n")
# Display agents for current page (with global numbering)
for i, agent_path in enumerate(page_agents, start_idx + 1):
try:
agent_json = agent_path / "agent.json"
if agent_json.exists():
with open(agent_json) as f:
data = json.load(f)
agent_meta = data.get("agent", {})
name = agent_meta.get("name", agent_path.name)
desc = agent_meta.get("description", "")
else:
# Python-based agent - extract from config.py
name, desc = _extract_python_agent_metadata(agent_path)
desc = desc[:50] + "..." if len(desc) > 50 else desc
print(f" {i}. {name}")
print(f" {desc}")
except Exception as e:
print(f" {i}. {agent_path.name} (error: {e})")
# Build navigation options
nav_options = []
if total_pages > 1:
nav_options.append("←/→ or p/n=navigate")
nav_options.append("q=quit")
print()
if total_pages > 1:
print(f" [{', '.join(nav_options)}]")
print()
# Show prompt
print("Select agent (number), use arrows to navigate, or q to quit: ", end="", flush=True)
try:
from framework.runner import AgentRunner
key = _read_key()
runner = AgentRunner.load(agent_path)
info = runner.info()
desc = info.description[:50] + "..." if len(info.description) > 50 else info.description
print(f" {i}. {info.name}")
print(f" {desc}")
runner.cleanup()
except Exception as e:
print(f" {i}. {agent_path.name} (error: {e})")
if key == "RIGHT" and page < total_pages - 1:
page += 1
print() # Newline before redrawing
elif key == "LEFT" and page > 0:
page -= 1
print()
elif key == "q":
print()
return None
elif key in ("n", ">") and page < total_pages - 1:
page += 1
print()
elif key in ("p", "<") and page > 0:
page -= 1
print()
elif key.isdigit():
# Build number with support for backspace
buffer = key
print(key, end="", flush=True)
print()
try:
choice = input("Select agent (number): ").strip()
idx = int(choice) - 1
if 0 <= idx < len(agents):
return str(agents[idx])
print("Invalid selection")
return None
except (ValueError, EOFError, KeyboardInterrupt):
return None
while True:
ch = _getch()
if ch in ("\r", "\n"):
# Enter pressed - submit
print()
break
elif ch in ("\x7f", "\x08"):
# Backspace (DEL or BS)
if buffer:
buffer = buffer[:-1]
# Erase character: move back, print space, move back
print("\b \b", end="", flush=True)
elif ch.isdigit():
buffer += ch
print(ch, end="", flush=True)
elif ch == "\x1b":
# Escape - cancel input
print()
buffer = ""
break
elif ch == "\x03":
# Ctrl+C
print()
return None
# Ignore other characters
if buffer:
try:
idx = int(buffer) - 1
if 0 <= idx < len(agents):
return str(agents[idx])
print("Invalid selection")
except ValueError:
print("Invalid input")
elif key == "\r" or key == "\n":
print() # Just pressed enter, redraw
else:
print()
print("Invalid input")
except (EOFError, KeyboardInterrupt):
print()
return None
def _interactive_multi(agents_dir: Path) -> int:
@@ -1042,7 +1582,7 @@ def _interactive_multi(agents_dir: Path) -> int:
# Register all agents
for path in agents_dir.iterdir():
if path.is_dir() and (path / "agent.json").exists():
if _is_valid_agent_dir(path):
try:
orchestrator.register(path.name, path)
agent_count += 1
@@ -1128,3 +1668,53 @@ def _interactive_multi(agents_dir: Path) -> int:
orchestrator.cleanup()
return 0
def cmd_sessions_list(args: argparse.Namespace) -> int:
"""List agent sessions."""
print("⚠ Sessions list command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Status filter: {args.status}")
print(f"Has checkpoints: {args.has_checkpoints}")
return 1
def cmd_sessions_show(args: argparse.Namespace) -> int:
"""Show detailed session information."""
print("⚠ Session show command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_sessions_checkpoints(args: argparse.Namespace) -> int:
"""List checkpoints for a session."""
print("⚠ Session checkpoints command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_pause(args: argparse.Namespace) -> int:
"""Pause a running session."""
print("⚠ Pause command not yet implemented")
print("This will be available once executor pause integration is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_resume(args: argparse.Namespace) -> int:
"""Resume a session from checkpoint."""
print("⚠ Resume command not yet implemented")
print("This will be available once checkpoint resume integration is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
if args.checkpoint:
print(f"Checkpoint: {args.checkpoint}")
if args.tui:
print("Mode: TUI")
return 1
+158 -175
View File
@@ -8,17 +8,26 @@ from dataclasses import dataclass, field
from pathlib import Path
from typing import TYPE_CHECKING, Any
from framework.config import get_hive_config, get_preferred_model
from framework.credentials.validation import (
ensure_credential_key_env as _ensure_credential_key_env,
validate_agent_credentials,
)
from framework.graph import Goal
from framework.graph.edge import AsyncEntryPointSpec, EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.executor import ExecutionResult, GraphExecutor
from framework.graph.edge import (
DEFAULT_MAX_TOKENS,
AsyncEntryPointSpec,
EdgeCondition,
EdgeSpec,
GraphSpec,
)
from framework.graph.executor import ExecutionResult
from framework.graph.node import NodeSpec
from framework.llm.provider import LLMProvider, Tool
from framework.runner.tool_registry import ToolRegistry
# Multi-entry-point runtime imports
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.core import Runtime
from framework.runtime.execution_stream import EntryPointSpec
from framework.runtime.runtime_log_store import RuntimeLogStore
if TYPE_CHECKING:
from framework.runner.protocol import AgentMessage, CapabilityResponse
@@ -26,49 +35,9 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__)
# Configuration paths
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
def _ensure_credential_key_env() -> None:
"""Load HIVE_CREDENTIAL_KEY from shell config if not already in environment.
The setup-credentials skill writes the encryption key to ~/.zshrc or ~/.bashrc.
If the user hasn't sourced their config in the current shell, this reads it
directly so the runner (and any MCP subprocesses it spawns) can unlock the
encrypted credential store.
Only HIVE_CREDENTIAL_KEY is loaded this way; all other secrets (API keys, etc.)
come from the credential store itself.
"""
if os.environ.get("HIVE_CREDENTIAL_KEY"):
return
try:
from aden_tools.credentials.shell_config import check_env_var_in_shell_config
found, value = check_env_var_in_shell_config("HIVE_CREDENTIAL_KEY")
if found and value:
os.environ["HIVE_CREDENTIAL_KEY"] = value
logger.debug("Loaded HIVE_CREDENTIAL_KEY from shell config")
except ImportError:
pass
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
def get_hive_config() -> dict[str, Any]:
"""Load hive configuration from ~/.hive/configuration.json."""
if not HIVE_CONFIG_FILE.exists():
return {}
try:
with open(HIVE_CONFIG_FILE) as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return {}
def get_claude_code_token() -> str | None:
"""
Get the OAuth token from Claude Code subscription.
@@ -266,11 +235,7 @@ class AgentRunner:
@staticmethod
def _resolve_default_model() -> str:
"""Resolve the default model from ~/.hive/configuration.json."""
config = get_hive_config()
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
return "anthropic/claude-sonnet-4-20250514"
return get_preferred_model()
def __init__(
self,
@@ -280,7 +245,7 @@ class AgentRunner:
mock_mode: bool = False,
storage_path: Path | None = None,
model: str | None = None,
enable_tui: bool = False,
intro_message: str = "",
):
"""
Initialize the runner (use AgentRunner.load() instead).
@@ -292,23 +257,23 @@ class AgentRunner:
mock_mode: If True, use mock LLM responses
storage_path: Path for runtime storage (defaults to temp)
model: Model to use (reads from agent config or ~/.hive/configuration.json if None)
enable_tui: If True, forces use of AgentRuntime with EventBus
intro_message: Optional greeting shown to user on TUI load
"""
self.agent_path = agent_path
self.graph = graph
self.goal = goal
self.mock_mode = mock_mode
self.model = model or self._resolve_default_model()
self.enable_tui = enable_tui
self.intro_message = intro_message
# Set up storage
if storage_path:
self._storage_path = storage_path
self._temp_dir = None
else:
# Use persistent storage in ~/.hive by default
# Use persistent storage in ~/.hive/agents/{agent_name}/ per RUNTIME_LOGGING.md spec
home = Path.home()
default_storage = home / ".hive" / "storage" / agent_path.name
default_storage = home / ".hive" / "agents" / agent_path.name
default_storage.mkdir(parents=True, exist_ok=True)
self._storage_path = default_storage
self._temp_dir = None
@@ -319,15 +284,17 @@ class AgentRunner:
# Initialize components
self._tool_registry = ToolRegistry()
self._runtime: Runtime | None = None
self._llm: LLMProvider | None = None
self._executor: GraphExecutor | None = None
self._approval_callback: Callable | None = None
# Multi-entry-point support (AgentRuntime)
# AgentRuntime — unified execution path for all agents
self._agent_runtime: AgentRuntime | None = None
self._uses_async_entry_points = self.graph.has_async_entry_points()
# Validate credentials before spawning MCP servers.
# Fails fast with actionable guidance — no MCP noise on screen.
self._validate_credentials()
# Auto-discover tools from tools.py
tools_path = agent_path / "tools.py"
if tools_path.exists():
@@ -338,6 +305,13 @@ class AgentRunner:
if mcp_config_path.exists():
self._load_mcp_servers_from_config(mcp_config_path)
def _validate_credentials(self) -> None:
"""Check that required credentials are available before spawning MCP servers.
Raises CredentialError with actionable guidance if any are missing.
"""
validate_agent_credentials(self.graph.nodes)
@staticmethod
def _import_agent_module(agent_path: Path):
"""Import an agent package from its directory path.
@@ -381,7 +355,6 @@ class AgentRunner:
mock_mode: bool = False,
storage_path: Path | None = None,
model: str | None = None,
enable_tui: bool = False,
) -> "AgentRunner":
"""
Load an agent from an export folder.
@@ -393,9 +366,8 @@ class AgentRunner:
Args:
agent_path: Path to agent folder
mock_mode: If True, use mock LLM responses
storage_path: Path for runtime storage (defaults to ~/.hive/storage/{name})
storage_path: Path for runtime storage (defaults to ~/.hive/agents/{name})
model: LLM model to use (reads from agent's default_config if None)
enable_tui: If True, forces use of AgentRuntime with EventBus
Returns:
AgentRunner instance ready to run
@@ -423,7 +395,17 @@ class AgentRunner:
if agent_config and hasattr(agent_config, "model"):
model = agent_config.model
max_tokens = getattr(agent_config, "max_tokens", 1024) if agent_config else 1024
if agent_config and hasattr(agent_config, "max_tokens"):
max_tokens = agent_config.max_tokens
else:
hive_config = get_hive_config()
max_tokens = hive_config.get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
# Read intro_message from agent metadata (shown on TUI load)
agent_metadata = getattr(agent_module, "metadata", None)
intro_message = ""
if agent_metadata and hasattr(agent_metadata, "intro_message"):
intro_message = agent_metadata.intro_message
# Build GraphSpec from module-level variables
graph = GraphSpec(
@@ -437,6 +419,7 @@ class AgentRunner:
nodes=nodes,
edges=edges,
max_tokens=max_tokens,
loop_config=getattr(agent_module, "loop_config", {}),
)
return cls(
@@ -446,7 +429,7 @@ class AgentRunner:
mock_mode=mock_mode,
storage_path=storage_path,
model=model,
enable_tui=enable_tui,
intro_message=intro_message,
)
# Fallback: load from agent.json (legacy JSON-based agents)
@@ -464,7 +447,6 @@ class AgentRunner:
mock_mode=mock_mode,
storage_path=storage_path,
model=model,
enable_tui=enable_tui,
)
def register_tool(
@@ -554,12 +536,14 @@ class AgentRunner:
callback: Function to call for approval (receives node info, returns bool)
"""
self._approval_callback = callback
# If executor already exists, update it
if self._executor is not None:
self._executor.approval_callback = callback
def _setup(self) -> None:
"""Set up runtime, LLM, and executor."""
# Configure structured logging (auto-detects JSON vs human-readable)
from framework.observability import configure_logging
configure_logging(level="INFO", format="auto")
# Set up session context for tools (workspace_id, agent_id, session_id)
workspace_id = "default" # Could be derived from storage path
agent_id = self.graph.id or "unknown"
@@ -600,7 +584,10 @@ class AgentRunner:
self._llm = LiteLLMProvider(model=self.model, api_key=api_key)
else:
# Fall back to environment variable
api_key_env = self._get_api_key_env_var(self.model)
# First check api_key_env_var from config (set by quickstart)
api_key_env = llm_config.get("api_key_env_var") or self._get_api_key_env_var(
self.model
)
if api_key_env and os.environ.get(api_key_env):
self._llm = LiteLLMProvider(model=self.model)
else:
@@ -616,16 +603,11 @@ class AgentRunner:
print(f"Warning: {api_key_env} not set. LLM calls will fail.")
print(f"Set it with: export {api_key_env}=your-api-key")
# Get tools for executor/runtime
# Get tools for runtime
tools = list(self._tool_registry.get_tools().values())
tool_executor = self._tool_registry.get_executor()
if self._uses_async_entry_points or self.enable_tui:
# Multi-entry-point mode or TUI mode: use AgentRuntime
self._setup_agent_runtime(tools, tool_executor)
else:
# Single-entry-point mode: use legacy GraphExecutor
self._setup_legacy_executor(tools, tool_executor)
self._setup_agent_runtime(tools, tool_executor)
def _get_api_key_env_var(self, model: str) -> str | None:
"""Get the environment variable name for the API key based on model name."""
@@ -640,7 +622,7 @@ class AgentRunner:
elif model_lower.startswith("anthropic/") or model_lower.startswith("claude"):
return "ANTHROPIC_API_KEY"
elif model_lower.startswith("gemini/") or model_lower.startswith("google/"):
return "GOOGLE_API_KEY"
return "GEMINI_API_KEY"
elif model_lower.startswith("mistral/"):
return "MISTRAL_API_KEY"
elif model_lower.startswith("groq/"):
@@ -686,20 +668,6 @@ class AgentRunner:
except Exception:
return None
def _setup_legacy_executor(self, tools: list, tool_executor: Callable | None) -> None:
"""Set up legacy single-entry-point execution using GraphExecutor."""
# Create runtime
self._runtime = Runtime(storage_path=self._storage_path)
# Create executor
self._executor = GraphExecutor(
runtime=self._runtime,
llm=self._llm,
tools=tools,
tool_executor=tool_executor,
approval_callback=self._approval_callback,
)
def _setup_agent_runtime(self, tools: list, tool_executor: Callable | None) -> None:
"""Set up multi-entry-point execution using AgentRuntime."""
# Convert AsyncEntryPointSpec to EntryPointSpec for AgentRuntime
@@ -717,9 +685,9 @@ class AgentRunner:
)
entry_points.append(ep)
# If TUI enabled but no entry points (single-entry agent), create default
if not entry_points and self.enable_tui and self.graph.entry_node:
logger.info("Creating default entry point for TUI")
# Single-entry agent with no async entry points: create a default entry point
if not entry_points and self.graph.entry_node:
logger.info("Creating default entry point for single-entry agent")
entry_points.append(
EntryPointSpec(
id="default",
@@ -731,6 +699,19 @@ class AgentRunner:
)
# Create AgentRuntime with all entry points
log_store = RuntimeLogStore(base_path=self._storage_path / "runtime_logs")
# Enable checkpointing by default for resumable sessions
from framework.graph.checkpoint_config import CheckpointConfig
checkpoint_config = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False, # Only checkpoint after nodes complete
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
async_checkpoint=True, # Non-blocking
)
self._agent_runtime = create_agent_runtime(
graph=self.graph,
goal=self.goal,
@@ -739,8 +720,13 @@ class AgentRunner:
llm=self._llm,
tools=tools,
tool_executor=tool_executor,
runtime_log_store=log_store,
checkpoint_config=checkpoint_config,
)
# Pass intro_message through for TUI display
self._agent_runtime.intro_message = self.intro_message
async def run(
self,
input_data: dict | None = None,
@@ -780,32 +766,9 @@ class AgentRunner:
error=error_msg,
)
if self._uses_async_entry_points or self.enable_tui:
# Multi-entry-point mode: use AgentRuntime
return await self._run_with_agent_runtime(
input_data=input_data or {},
entry_point_id=entry_point_id,
)
else:
# Legacy single-entry-point mode
return await self._run_with_executor(
input_data=input_data or {},
session_state=session_state,
)
async def _run_with_executor(
self,
input_data: dict,
session_state: dict | None = None,
) -> ExecutionResult:
"""Run using legacy GraphExecutor (single entry point)."""
if self._executor is None:
self._setup()
return await self._executor.execute(
graph=self.graph,
goal=self.goal,
input_data=input_data,
return await self._run_with_agent_runtime(
input_data=input_data or {},
entry_point_id=entry_point_id,
session_state=session_state,
)
@@ -813,8 +776,11 @@ class AgentRunner:
self,
input_data: dict,
entry_point_id: str | None = None,
session_state: dict | None = None,
) -> ExecutionResult:
"""Run using AgentRuntime (multi-entry-point)."""
"""Run using AgentRuntime."""
import sys
if self._agent_runtime is None:
self._setup()
@@ -822,6 +788,52 @@ class AgentRunner:
if not self._agent_runtime.is_running:
await self._agent_runtime.start()
# Set up stdin-based I/O for client-facing nodes in headless mode.
# When a client_facing EventLoopNode calls ask_user(), it emits
# CLIENT_INPUT_REQUESTED on the event bus and blocks. We subscribe
# a handler that prints the prompt and reads from stdin, then injects
# the user's response back into the node to unblock it.
has_client_facing = any(n.client_facing for n in self.graph.nodes)
sub_ids: list[str] = []
if has_client_facing and sys.stdin.isatty():
from framework.runtime.event_bus import EventType
runtime = self._agent_runtime
async def _handle_client_output(event):
"""Print agent output to stdout as it streams."""
content = event.data.get("content", "")
if content:
print(content, end="", flush=True)
async def _handle_input_requested(event):
"""Read user input from stdin and inject it into the node."""
import asyncio
node_id = event.node_id
try:
loop = asyncio.get_event_loop()
user_input = await loop.run_in_executor(None, input, "\n>>> ")
except EOFError:
user_input = ""
# Inject into the waiting EventLoopNode via runtime
await runtime.inject_input(node_id, user_input)
sub_ids.append(
runtime.subscribe_to_events(
event_types=[EventType.CLIENT_OUTPUT_DELTA],
handler=_handle_client_output,
)
)
sub_ids.append(
runtime.subscribe_to_events(
event_types=[EventType.CLIENT_INPUT_REQUESTED],
handler=_handle_input_requested,
)
)
# Determine entry point
if entry_point_id is None:
# Use first entry point or "default" if no entry points defined
@@ -831,44 +843,38 @@ class AgentRunner:
else:
entry_point_id = "default"
try:
    # Trigger and wait for result
    result = await self._agent_runtime.trigger_and_wait(
        entry_point_id=entry_point_id,
        input_data=input_data,
        session_state=session_state,
    )
    # Return result or create error result
    if result is not None:
        return result
    else:
        return ExecutionResult(
            success=False,
            error="Execution timed out or failed to complete",
        )
finally:
    # Clean up subscriptions
    for sub_id in sub_ids:
        self._agent_runtime.unsubscribe_from_events(sub_id)
# === Runtime API ===
async def start(self) -> None:
"""
Start the agent runtime (for multi-entry-point agents).
This starts all registered entry points and allows concurrent execution.
For single-entry-point agents, this is a no-op.
"""
if not self._uses_async_entry_points:
return
"""Start the agent runtime."""
if self._agent_runtime is None:
self._setup()
await self._agent_runtime.start()
async def stop(self) -> None:
"""
Stop the agent runtime (for multi-entry-point agents).
For single-entry-point agents, this is a no-op.
"""
"""Stop the agent runtime."""
if self._agent_runtime is not None:
await self._agent_runtime.stop()
@@ -881,7 +887,7 @@ class AgentRunner:
"""
Trigger execution at a specific entry point (non-blocking).
Returns execution ID for tracking.
Args:
entry_point_id: Which entry point to trigger
@@ -890,16 +896,7 @@ class AgentRunner:
Returns:
Execution ID for tracking
"""
if self._agent_runtime is None:
self._setup()
@@ -916,19 +913,9 @@ class AgentRunner:
"""
Get goal progress across all execution streams.
Returns:
Dict with overall_progress, criteria_status, constraint_violations, etc.
"""
if self._agent_runtime is None:
self._setup()
@@ -936,14 +923,11 @@ class AgentRunner:
def get_entry_points(self) -> list[EntryPointSpec]:
"""
Get all registered entry points.
Returns:
List of EntryPointSpec objects
"""
if self._agent_runtime is None:
self._setup()
@@ -1367,7 +1351,7 @@ Respond with JSON only:
self._temp_dir = None
async def cleanup_async(self) -> None:
"""Clean up resources (asynchronous - for multi-entry-point agents)."""
"""Clean up resources (asynchronous)."""
# Stop agent runtime if running
if self._agent_runtime is not None and self._agent_runtime.is_running:
await self._agent_runtime.stop()
@@ -1378,8 +1362,7 @@ Respond with JSON only:
async def __aenter__(self) -> "AgentRunner":
"""Context manager entry."""
self._setup()
if self._agent_runtime is not None:
await self._agent_runtime.start()
return self
+35 -5
View File
@@ -1,5 +1,6 @@
"""Tool discovery and registration for agent runner."""
import contextvars
import importlib.util
import inspect
import json
@@ -13,6 +14,13 @@ from framework.llm.provider import Tool, ToolResult, ToolUse
logger = logging.getLogger(__name__)
# Per-execution context overrides. Each asyncio task (and thus each
# concurrent graph execution) gets its own copy, so there are no races
# when multiple ExecutionStreams run in parallel.
_execution_context: contextvars.ContextVar[dict[str, Any] | None] = contextvars.ContextVar(
"_execution_context", default=None
)
@dataclass
class RegisteredTool:
@@ -36,7 +44,7 @@ class ToolRegistry:
# Framework-internal context keys injected into tool calls.
# Stripped from LLM-facing schemas (the LLM doesn't know these values)
# and auto-injected at call time for tools that accept them.
CONTEXT_PARAMS = frozenset({"workspace_id", "agent_id", "session_id"})
CONTEXT_PARAMS = frozenset({"workspace_id", "agent_id", "session_id", "data_dir"})
def __init__(self):
self._tools: dict[str, RegisteredTool] = {}
@@ -262,6 +270,24 @@ class ToolRegistry:
"""
self._session_context.update(context)
@staticmethod
def set_execution_context(**context) -> contextvars.Token:
"""Set per-execution context overrides (concurrency-safe via contextvars).
Values set here take precedence over session context. Each asyncio
task gets its own copy, so concurrent executions don't interfere.
Returns a token that must be passed to :meth:`reset_execution_context`
to restore the previous state.
"""
current = _execution_context.get() or {}
return _execution_context.set({**current, **context})
@staticmethod
def reset_execution_context(token: contextvars.Token) -> None:
"""Restore execution context to its previous state."""
_execution_context.reset(token)
def load_mcp_config(self, config_path: Path) -> None:
"""
Load and register MCP servers from a config file.
@@ -359,11 +385,15 @@ class ToolRegistry:
):
def executor(inputs: dict) -> Any:
try:
# Build base context: session < execution (execution wins)
base_context = dict(registry_ref._session_context)
exec_ctx = _execution_context.get()
if exec_ctx:
base_context.update(exec_ctx)
# Only inject context params the tool accepts
filtered_context = {
    k: v for k, v in base_context.items() if k in tool_params
}
merged_inputs = {**filtered_context, **inputs}
result = client_ref.call_tool(tool_name, merged_inputs)
+172
View File
@@ -0,0 +1,172 @@
# Agent Runtime
Unified execution system for all Hive agents. Every agent — single-entry or multi-entry, headless or TUI — runs through the same runtime stack.
## Topology
```
AgentRunner.load(agent_path)
|
AgentRunner
(factory + public API)
|
_setup_agent_runtime()
|
AgentRuntime
(lifecycle + orchestration)
/ | \
Stream A Stream B Stream C ← one per entry point
| | |
GraphExecutor GraphExecutor GraphExecutor
| | |
Node → Node → Node (graph traversal)
```
Single-entry agents get a `"default"` entry point automatically. There is no separate code path.
## Components
| Component | File | Role |
|---|---|---|
| `AgentRunner` | `runner/runner.py` | Load agents, configure tools/LLM, expose high-level API |
| `AgentRuntime` | `runtime/agent_runtime.py` | Lifecycle management, entry point routing, event bus |
| `ExecutionStream` | `runtime/execution_stream.py` | Per-entry-point execution queue, session persistence |
| `GraphExecutor` | `graph/executor.py` | Node traversal, tool dispatch, checkpointing |
| `EventBus` | `runtime/event_bus.py` | Pub/sub for execution events (streaming, I/O) |
| `SharedStateManager` | `runtime/shared_state.py` | Cross-stream state with isolation levels |
| `OutcomeAggregator` | `runtime/outcome_aggregator.py` | Goal progress tracking across streams |
| `SessionStore` | `storage/session_store.py` | Session state persistence (`sessions/{id}/state.json`) |
## Programming Interface
### AgentRunner (high-level)
```python
from framework.runner import AgentRunner
# Load and run
runner = AgentRunner.load("exports/my_agent", model="anthropic/claude-sonnet-4-20250514")
result = await runner.run({"query": "hello"})
# Resume from paused session
result = await runner.run({"query": "continue"}, session_state=saved_state)
# Lifecycle
await runner.start() # Start the runtime
await runner.stop() # Stop the runtime
exec_id = await runner.trigger("default", {}) # Non-blocking trigger
progress = await runner.get_goal_progress() # Goal evaluation
entry_points = runner.get_entry_points() # List entry points
# Context manager
async with AgentRunner.load("exports/my_agent") as runner:
result = await runner.run({"query": "hello"})
# Cleanup
runner.cleanup() # Synchronous
await runner.cleanup_async() # Asynchronous
```
### AgentRuntime (lower-level)
```python
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
# Create runtime with entry points
runtime = create_agent_runtime(
graph=graph,
goal=goal,
storage_path=Path("~/.hive/agents/my_agent"),
entry_points=[
EntryPointSpec(id="default", name="Default", entry_node="start", trigger_type="manual"),
],
llm=llm,
tools=tools,
tool_executor=tool_executor,
checkpoint_config=checkpoint_config,
)
# Lifecycle
await runtime.start()
await runtime.stop()
# Execution
exec_id = await runtime.trigger("default", {"query": "hello"}) # Non-blocking
result = await runtime.trigger_and_wait("default", {"query": "hello"}) # Blocking
result = await runtime.trigger_and_wait("default", {}, session_state=state) # Resume
# Client-facing node I/O
await runtime.inject_input(node_id="chat", content="user response")
# Events
sub_id = runtime.subscribe_to_events(
event_types=[EventType.CLIENT_OUTPUT_DELTA],
handler=my_handler,
)
runtime.unsubscribe_from_events(sub_id)
# Inspection
runtime.is_running # bool
runtime.event_bus # EventBus
runtime.state_manager # SharedStateManager
runtime.get_stats() # Runtime statistics
```
## Execution Flow
1. `AgentRunner.run()` calls `AgentRuntime.trigger_and_wait()`
2. `AgentRuntime` routes to the `ExecutionStream` for the entry point
3. `ExecutionStream` creates a `GraphExecutor` and calls `execute()`
4. `GraphExecutor` traverses nodes, dispatches tools, manages checkpoints
5. `ExecutionResult` flows back up through the stack
6. `ExecutionStream` writes session state to disk
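This flow is observable from the outside via the event bus. A minimal sketch using only the documented `subscribe_to_events` / `trigger_and_wait` API (`observed_run` and `on_completed` are illustrative names, not framework API):
```python
from framework.runtime.event_bus import EventType

async def observed_run(runtime, input_data: dict):
    """Trigger the default entry point and log when the execution finishes."""
    async def on_completed(event):
        print("EXECUTION_COMPLETED:", event.data)

    sub_id = runtime.subscribe_to_events(
        event_types=[EventType.EXECUTION_COMPLETED],
        handler=on_completed,
    )
    try:
        # Steps 1-5: routed through the entry point's stream to GraphExecutor
        return await runtime.trigger_and_wait("default", input_data)
    finally:
        # Clean up the subscription once the result has flowed back up
        runtime.unsubscribe_from_events(sub_id)
```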
## Session Resume
All execution paths support session resume:
```python
# First run (agent pauses at a client-facing node)
result = await runner.run({"query": "start task"})
# result.paused_at = "review-node"
# result.session_state = {"memory": {...}, "paused_at": "review-node", ...}
# Resume
result = await runner.run({"input": "approved"}, session_state=result.session_state)
```
Session state flows: `AgentRunner.run()` → `AgentRuntime.trigger_and_wait()` → `ExecutionStream.execute()` → `GraphExecutor.execute()`.
Checkpoints are saved at node boundaries (`sessions/{id}/checkpoints/`) for crash recovery.
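Putting resume into a driver loop is straightforward. A sketch, assuming `result.paused_at` is `None` once the run completes (the blocking `input()` call keeps the example short):
```python
async def drive_to_completion(runner, first_input: dict):
    """Re-run a paused session until the agent no longer waits for input."""
    result = await runner.run(first_input)
    while result.paused_at is not None:
        reply = input(f"[{result.paused_at}] > ")  # blocking read, fine for a script
        result = await runner.run({"input": reply}, session_state=result.session_state)
    return result
```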
## Event Bus
The `EventBus` provides real-time execution visibility:
| Event | When |
|---|---|
| `NODE_STARTED` | Node begins execution |
| `NODE_COMPLETED` | Node finishes |
| `TOOL_CALL_STARTED` | Tool invocation begins |
| `TOOL_CALL_COMPLETED` | Tool invocation finishes |
| `CLIENT_OUTPUT_DELTA` | Agent streams text to user |
| `CLIENT_INPUT_REQUESTED` | Agent needs user input |
| `EXECUTION_COMPLETED` | Full execution finishes |
In headless mode, `AgentRunner` subscribes to `CLIENT_OUTPUT_DELTA` and `CLIENT_INPUT_REQUESTED` to print output and read stdin. In TUI mode, `AdenTUI` subscribes to route events to UI widgets.
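A stripped-down version of that headless bridge, assuming a started `runtime` in scope (handler names are illustrative):
```python
import asyncio
from framework.runtime.event_bus import EventType

async def _print_output(event):
    content = event.data.get("content", "")
    if content:
        print(content, end="", flush=True)

async def _answer_input(event):
    # Read stdin off the event loop so other streams keep running.
    loop = asyncio.get_running_loop()
    reply = await loop.run_in_executor(None, input, "\n>>> ")
    await runtime.inject_input(event.node_id, reply)

runtime.subscribe_to_events(event_types=[EventType.CLIENT_OUTPUT_DELTA], handler=_print_output)
runtime.subscribe_to_events(event_types=[EventType.CLIENT_INPUT_REQUESTED], handler=_answer_input)
```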
## Storage Layout
```
~/.hive/agents/{agent_name}/
sessions/
session_YYYYMMDD_HHMMSS_{uuid}/
state.json # Session state (status, memory, progress)
checkpoints/ # Node-boundary snapshots
logs/
summary.json # Execution summary
details.jsonl # Detailed event log
tool_logs.jsonl # Tool call log
runtime_logs/ # Cross-session runtime logs
```
@@ -0,0 +1,842 @@
# Resumable Sessions Design
## Problem Statement
Currently, when an agent encounters a failure during execution (e.g., credential validation, API errors, tool failures), the entire session is lost. This creates a poor user experience, especially when:
1. The agent has completed significant work before the failure
2. The failure is recoverable (e.g., adding missing credentials)
3. The user wants to retry from the exact failure point without redoing work
## Design Goals
1. **Crash Recovery**: Sessions can resume after process crashes or errors
2. **Partial Completion**: Preserve work done by nodes that completed successfully
3. **Flexible Resume Points**: Resume from exact failure point or previous checkpoints
4. **State Consistency**: Guarantee consistent SharedMemory and conversation state
5. **Minimal Overhead**: Checkpointing shouldn't significantly impact performance
6. **User Control**: Users can inspect, modify, and resume sessions explicitly
## Architecture
### 1. Checkpoint System
#### Checkpoint Types
**Automatic Checkpoints** (saved automatically by framework):
- `node_start`: Before each node begins execution
- `node_complete`: After each node successfully completes
- `edge_transition`: Before traversing to next node
- `loop_iteration`: At each iteration in EventLoopNode (optional)
**Manual Checkpoints** (triggered by agent designer):
- `safe_point`: Explicitly marked safe points in graph
- `user_checkpoint`: Before awaiting user input in client-facing nodes
#### Checkpoint Data Structure
```python
@dataclass
class Checkpoint:
"""Single checkpoint in execution timeline."""
# Identity
checkpoint_id: str # Format: checkpoint_{timestamp}_{uuid_short}
session_id: str
checkpoint_type: str # "node_start", "node_complete", etc.
# Timestamps
created_at: str # ISO 8601
# Execution state
current_node: str | None
next_node: str | None # For edge_transition checkpoints
execution_path: list[str] # Nodes executed so far
# Memory state (snapshot)
shared_memory: dict[str, Any] # Full SharedMemory._data
# Per-node conversation state references
# (actual conversations stored separately, reference by node_id)
conversation_states: dict[str, str] # {node_id: conversation_checkpoint_id}
# Output accumulator state
accumulated_outputs: dict[str, Any]
# Execution metrics (for resuming quality tracking)
metrics_snapshot: dict[str, Any]
# Metadata
is_clean: bool # True if no failures/retries before this checkpoint
can_resume_from: bool # False if checkpoint is in unstable state
description: str # Human-readable checkpoint description
```
#### Storage Structure
```
~/.hive/agents/{agent_name}/
└── sessions/
└── session_YYYYMMDD_HHMMSS_{uuid}/
├── state.json # Session state (existing)
├── checkpoints/
│ ├── index.json # Checkpoint index/manifest
│ ├── checkpoint_1.json # Individual checkpoints
│ ├── checkpoint_2.json
│ └── checkpoint_N.json
├── conversations/ # Per-node conversation state (existing)
│ ├── node_id_1/
│ │ ├── parts/
│ │ ├── meta.json
│ │ └── cursor.json
│ └── node_id_2/...
├── data/ # Spillover artifacts (existing)
└── logs/ # L1/L2/L3 logs (existing)
```
**Checkpoint Index Format** (`checkpoints/index.json`):
```json
{
"session_id": "session_20260208_143022_abc12345",
"checkpoints": [
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"created_at": "2026-02-08T14:30:30.123Z",
"current_node": "collector",
"is_clean": true,
"can_resume_from": true,
"description": "Completed collector node successfully"
},
{
"checkpoint_id": "checkpoint_20260208_143045_abc789",
"type": "node_start",
"created_at": "2026-02-08T14:30:45.456Z",
"current_node": "analyzer",
"is_clean": true,
"can_resume_from": true,
"description": "Starting analyzer node"
}
],
"latest_checkpoint_id": "checkpoint_20260208_143045_abc789",
"total_checkpoints": 2
}
```
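Reading the manifest back is a few lines. A hypothetical helper (field names follow the index format above; the function itself is not framework API):
```python
import json
from pathlib import Path

def latest_resumable_checkpoint(session_dir: Path) -> str | None:
    """Return the newest checkpoint id that is safe to resume from."""
    index = json.loads((session_dir / "checkpoints" / "index.json").read_text())
    # Entries are appended in chronological order, so scan newest-first.
    for entry in reversed(index["checkpoints"]):
        if entry.get("can_resume_from"):
            return entry["checkpoint_id"]
    return None
```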
### 2. Resume Mechanism
#### Resume Flow
```python
# High-level resume flow
async def resume_session(
session_id: str,
checkpoint_id: str | None = None, # None = resume from latest
modifications: dict[str, Any] | None = None, # Override memory values
) -> ExecutionResult:
"""
Resume a session from a checkpoint.
Args:
session_id: Session to resume
checkpoint_id: Specific checkpoint (None = latest)
modifications: Optional memory/state modifications before resume
Returns:
ExecutionResult with resumed execution
"""
# 1. Load session state
session_state = await session_store.read_state(session_id)
# 2. Verify session is resumable
if not session_state.is_resumable:
raise ValueError(f"Session {session_id} is not resumable")
# 3. Load checkpoint
checkpoint = await checkpoint_store.load_checkpoint(
session_id,
checkpoint_id or session_state.progress.resume_from
)
# 4. Restore state
# - Restore SharedMemory from checkpoint.shared_memory
# - Restore per-node conversations from checkpoint.conversation_states
# - Restore output accumulator from checkpoint.accumulated_outputs
# - Apply modifications if provided
# 5. Resume execution from checkpoint.next_node or checkpoint.current_node
result = await executor.execute(
graph=graph,
goal=goal,
memory=restored_memory,
entry_point=checkpoint.next_node or checkpoint.current_node,
session_state=restored_session_state,
)
# 6. Update session state with resumed execution
await session_store.write_state(session_id, updated_state)
return result
```
#### Checkpoint Restoration
```python
@dataclass
class CheckpointStore:
"""Manages checkpoint storage and retrieval."""
async def save_checkpoint(
self,
session_id: str,
checkpoint: Checkpoint,
) -> None:
"""Save a checkpoint atomically."""
# 1. Write checkpoint file: checkpoints/checkpoint_{id}.json
# 2. Update index: checkpoints/index.json
# 3. Use atomic write for crash safety
async def load_checkpoint(
self,
session_id: str,
checkpoint_id: str | None = None,
) -> Checkpoint | None:
"""Load a checkpoint by ID or latest."""
# 1. Read checkpoint index
# 2. Find checkpoint by ID (or latest if None)
# 3. Load and deserialize checkpoint file
async def list_checkpoints(
self,
session_id: str,
checkpoint_type: str | None = None,
is_clean: bool | None = None,
) -> list[Checkpoint]:
"""List all checkpoints for a session with optional filters."""
async def delete_checkpoint(
self,
session_id: str,
checkpoint_id: str,
) -> bool:
"""Delete a specific checkpoint."""
async def prune_checkpoints(
self,
session_id: str,
keep_count: int = 10,
keep_clean_only: bool = False,
) -> int:
"""Prune old checkpoints, keeping most recent N."""
```
### 3. GraphExecutor Integration
#### Modified Execution Loop
```python
# In GraphExecutor.execute()
async def execute(
self,
graph: GraphSpec,
goal: Goal,
memory: SharedMemory | None = None,
entry_point: str = "start",
session_state: dict[str, Any] | None = None,
checkpoint_config: CheckpointConfig | None = None,
) -> ExecutionResult:
"""
Execute graph with checkpointing support.
New parameters:
checkpoint_config: Configuration for checkpointing behavior
"""
# Initialize checkpoint store
checkpoint_store = CheckpointStore(storage_path / "checkpoints")
# Restore from checkpoint if session_state indicates resume
if session_state and session_state.get("resume_from"):
checkpoint = await checkpoint_store.load_checkpoint(
session_id,
session_state["resume_from"]
)
memory = self._restore_memory_from_checkpoint(checkpoint)
entry_point = checkpoint.next_node or checkpoint.current_node
current_node = entry_point
while current_node:
# CHECKPOINT: node_start
if checkpoint_config and checkpoint_config.checkpoint_on_node_start:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="node_start",
current_node=current_node,
memory=memory,
# ... other state
)
try:
# Execute node
result = await self._execute_node(current_node, memory, context)
# CHECKPOINT: node_complete
if checkpoint_config and checkpoint_config.checkpoint_on_node_complete:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="node_complete",
current_node=current_node,
memory=memory,
# ... other state
)
except Exception as e:
# On failure, mark current checkpoint as resume point
await self._mark_failure_checkpoint(
checkpoint_store,
current_node=current_node,
error=str(e),
)
raise
# Find next edge
next_node = self._find_next_node(current_node, result, memory)
# CHECKPOINT: edge_transition
if next_node and checkpoint_config and checkpoint_config.checkpoint_on_edge:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="edge_transition",
current_node=current_node,
next_node=next_node,
memory=memory,
# ... other state
)
current_node = next_node
```
### 4. EventLoopNode Integration
#### Conversation State Checkpointing
EventLoopNode already has conversation persistence via `ConversationStore`. For resumability:
```python
class EventLoopNode:
async def execute(self, ctx: NodeContext) -> NodeResult:
"""Execute with checkpoint support."""
# Try to restore from checkpoint
if ctx.checkpoint_id:
conversation = await self._restore_conversation(ctx.checkpoint_id)
output_accumulator = await OutputAccumulator.restore(self.store)
else:
# Fresh start
conversation = await self._initialize_conversation(ctx)
output_accumulator = OutputAccumulator(store=self.store)
# Event loop with periodic checkpointing
iteration = 0
while iteration < self.config.max_iterations:
# Optional: checkpoint every N iterations
if self.config.checkpoint_every_n_iterations:
if iteration % self.config.checkpoint_every_n_iterations == 0:
await self._save_loop_checkpoint(
conversation,
output_accumulator,
iteration,
)
# ... rest of event loop
iteration += 1
```
**Note**: EventLoopNode conversation state is already persisted to disk after each turn via `ConversationStore`, so it's naturally resumable. We just need to:
1. Track which conversation checkpoint to restore from
2. Ensure output accumulator state is also restored
### 5. User-Facing API
#### MCP Tools for Resume
```python
# In tools/src/aden_tools/tools/session_management/
@tool
async def list_resumable_sessions(
agent_work_dir: str,
status: str = "failed", # "failed", "paused", "cancelled"
limit: int = 20,
) -> dict:
"""
List sessions that can be resumed.
Returns:
{
"sessions": [
{
"session_id": "session_20260208_143022_abc12345",
"status": "failed",
"error": "Missing API key: OPENAI_API_KEY",
"failed_at_node": "analyzer",
"last_checkpoint": "checkpoint_20260208_143045_abc789",
"created_at": "2026-02-08T14:30:22Z",
"updated_at": "2026-02-08T14:30:45Z"
}
],
"total": 1
}
"""
@tool
async def list_session_checkpoints(
agent_work_dir: str,
session_id: str,
checkpoint_type: str = "", # Filter by type
clean_only: bool = False, # Only show clean checkpoints
) -> dict:
"""
List all checkpoints for a session.
Returns:
{
"session_id": "session_20260208_143022_abc12345",
"checkpoints": [
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"created_at": "2026-02-08T14:30:30Z",
"current_node": "collector",
"is_clean": true,
"can_resume_from": true,
"description": "Completed collector node successfully"
},
...
]
}
"""
@tool
async def inspect_checkpoint(
agent_work_dir: str,
session_id: str,
checkpoint_id: str,
include_memory: bool = False, # Include full memory state
) -> dict:
"""
Inspect a checkpoint's detailed state.
Returns:
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"current_node": "collector",
"execution_path": ["start", "collector"],
"accumulated_outputs": {
"twitter_handles": ["@user1", "@user2"]
},
"memory": {...}, # If include_memory=True
"metrics_snapshot": {
"total_retries": 2,
"nodes_with_failures": []
}
}
"""
@tool
async def resume_session(
agent_work_dir: str,
session_id: str,
checkpoint_id: str = "", # Empty = latest checkpoint
memory_modifications: str = "{}", # JSON string of memory overrides
) -> dict:
"""
Resume a session from a checkpoint.
Args:
agent_work_dir: Path to agent workspace
session_id: Session to resume
checkpoint_id: Specific checkpoint (empty = latest)
memory_modifications: JSON object with memory key overrides
Returns:
{
"session_id": "session_20260208_143022_abc12345",
"resumed_from": "checkpoint_20260208_143045_abc789",
"status": "active", # Now actively running
"message": "Session resumed successfully from checkpoint_20260208_143045_abc789"
}
"""
```
#### CLI Commands
```bash
# List resumable sessions
hive sessions list --agent deep_research_agent --status failed
# Show checkpoints for a session
hive sessions checkpoints session_20260208_143022_abc12345
# Inspect a checkpoint
hive sessions inspect session_20260208_143022_abc12345 checkpoint_20260208_143045_abc789
# Resume a session
hive sessions resume session_20260208_143022_abc12345
# Resume from specific checkpoint
hive sessions resume session_20260208_143022_abc12345 --checkpoint checkpoint_20260208_143030_xyz123
# Resume with memory modifications (e.g., after adding credentials)
hive sessions resume session_20260208_143022_abc12345 --set api_key=sk-...
```
### 6. Configuration
#### CheckpointConfig
```python
@dataclass
class CheckpointConfig:
"""Configuration for checkpoint behavior."""
# When to checkpoint
checkpoint_on_node_start: bool = True
checkpoint_on_node_complete: bool = True
checkpoint_on_edge: bool = False # Usually redundant with node_start
checkpoint_on_loop_iteration: bool = False # Can be expensive
checkpoint_every_n_iterations: int = 0 # 0 = disabled
# Pruning
max_checkpoints_per_session: int = 100
prune_after_node_count: int = 10 # Prune every N nodes
keep_clean_checkpoints_only: bool = False
# Performance
async_checkpoint: bool = True # Don't block execution on checkpoint writes
# What to include
include_conversation_snapshots: bool = True
include_full_memory: bool = True
```
#### Agent-Level Configuration
```python
# In agent.py or config.py
class MyAgent(Agent):
def get_checkpoint_config(self) -> CheckpointConfig:
"""Override to customize checkpoint behavior."""
return CheckpointConfig(
checkpoint_on_node_start=True,
checkpoint_on_node_complete=True,
checkpoint_every_n_iterations=5, # Checkpoint every 5 iterations in loops
max_checkpoints_per_session=50,
)
```
## Implementation Plan
### Phase 1: Core Checkpoint Infrastructure (Week 1)
1. **Create checkpoint schemas**
- `Checkpoint` dataclass
- `CheckpointIndex` for manifest
- Serialization/deserialization
2. **Implement CheckpointStore**
- `save_checkpoint()` with atomic writes
- `load_checkpoint()` with deserialization
- `list_checkpoints()` with filtering
- `prune_checkpoints()` for cleanup
3. **Update SessionState schema**
- Add `resume_from_checkpoint_id` field
- Add `checkpoints_enabled` flag
### Phase 2: GraphExecutor Integration (Week 2)
1. **Modify GraphExecutor**
- Add `CheckpointConfig` parameter
- Implement checkpoint saving at node boundaries
- Implement checkpoint restoration logic
- Handle memory state snapshots
2. **Update execution loop**
- Checkpoint before node execution
- Checkpoint after successful completion
- Mark failure checkpoints on errors
### Phase 3: EventLoopNode Integration (Week 3)
1. **Enhance conversation restoration**
- Link checkpoints to conversation states
- Ensure OutputAccumulator is checkpointed
- Test loop resumption from middle of execution
2. **Add optional loop iteration checkpoints**
- Configurable iteration frequency
- Balance between granularity and performance
### Phase 4: User-Facing Features (Week 4)
1. **Implement MCP tools**
- `list_resumable_sessions`
- `list_session_checkpoints`
- `inspect_checkpoint`
- `resume_session`
2. **Add CLI commands**
- `hive sessions list`
- `hive sessions checkpoints`
- `hive sessions inspect`
- `hive sessions resume`
3. **Update TUI**
- Show resumable sessions in UI
- Allow resume from TUI interface
### Phase 5: Testing & Documentation (Week 5)
1. **Write comprehensive tests**
- Unit tests for CheckpointStore
- Integration tests for resume flow
- Edge case testing (concurrent checkpoints, corruption, etc.)
2. **Performance testing**
- Measure checkpoint overhead
- Optimize async checkpoint writing
- Test with large memory states
3. **Documentation**
- Update skills with resume patterns
- Document checkpoint configuration
- Add troubleshooting guide
## Performance Considerations
### Checkpoint Overhead
**Estimated overhead per checkpoint**:
- Memory serialization: ~5-10ms for typical state (< 1MB)
- File I/O: ~10-20ms for atomic write
- Total: ~15-30ms per checkpoint
**Mitigation strategies**:
1. **Async checkpointing**: Don't block execution on writes (see the sketch after this list)
2. **Selective checkpointing**: Only checkpoint at important boundaries
3. **Incremental checkpoints**: Store deltas instead of full state (future)
4. **Compression**: Compress large memory states before writing
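The async-checkpointing strategy amounts to scheduling the write as a background task and demoting failures to warnings. A sketch, assuming the async `save_checkpoint` defined earlier:
```python
import asyncio
import logging

logger = logging.getLogger(__name__)

def schedule_checkpoint(store, session_id: str, checkpoint) -> asyncio.Task:
    """Write the checkpoint off the critical path; failures are logged, not raised."""
    task = asyncio.create_task(store.save_checkpoint(session_id, checkpoint))

    def _log_failure(t: asyncio.Task) -> None:
        if not t.cancelled() and t.exception() is not None:
            logger.warning("Checkpoint write failed: %s", t.exception())

    task.add_done_callback(_log_failure)
    return task
```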
### Storage Size
**Typical checkpoint size**:
- Small memory state (< 100KB): ~50-100KB per checkpoint
- Medium memory state (< 1MB): ~500KB-1MB per checkpoint
- Large memory state (> 1MB): ~1-5MB per checkpoint
**Mitigation strategies**:
1. **Pruning**: Keep only N most recent checkpoints
2. **Clean-only retention**: Only keep checkpoints from clean execution
3. **Compression**: Use gzip for checkpoint files
4. **Archiving**: Move old checkpoints to archive storage
## Error Handling
### Checkpoint Save Failures
**Scenarios**:
- Disk full
- Permission errors
- Serialization failures
- Concurrent writes
**Handling**:
```python
try:
await checkpoint_store.save_checkpoint(session_id, checkpoint)
except CheckpointSaveError as e:
# Log warning but don't fail execution
logger.warning(f"Failed to save checkpoint: {e}")
# Continue execution without checkpoint
```
### Checkpoint Load Failures
**Scenarios**:
- Checkpoint file corrupted
- Checkpoint format incompatible
- Referenced conversation state missing
**Handling**:
```python
try:
checkpoint = await checkpoint_store.load_checkpoint(session_id, checkpoint_id)
except CheckpointLoadError as e:
# Try to find previous valid checkpoint
checkpoints = await checkpoint_store.list_checkpoints(session_id)
for cp in reversed(checkpoints):
try:
checkpoint = await checkpoint_store.load_checkpoint(session_id, cp.checkpoint_id)
logger.info(f"Fell back to checkpoint {cp.checkpoint_id}")
break
except CheckpointLoadError:
continue
else:
raise ValueError(f"No valid checkpoints found for session {session_id}")
```
### Resume Failures
**Scenarios**:
- Checkpoint state inconsistent with current graph
- Node no longer exists in updated agent code
- Memory keys missing required values
**Handling**:
1. **Validation**: Verify checkpoint compatibility before resume (sketched after this list)
2. **Graceful degradation**: Resume from earlier checkpoint if possible
3. **User notification**: Clear error messages about why resume failed
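Validation can be as simple as diffing the checkpoint's recorded path against the current graph. An illustrative check (`validate_checkpoint` and the `node.id` attribute are assumptions, not framework API):
```python
def validate_checkpoint(graph, checkpoint) -> list[str]:
    """Return human-readable reasons why resuming would fail, if any."""
    known_nodes = {node.id for node in graph.nodes}
    problems = [
        f"Node '{node_id}' no longer exists in the graph"
        for node_id in checkpoint.execution_path
        if node_id not in known_nodes
    ]
    resume_node = checkpoint.next_node or checkpoint.current_node
    if resume_node and resume_node not in known_nodes:
        problems.append(f"Resume point '{resume_node}' not found in the graph")
    return problems
```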
## Migration Path
### Backward Compatibility
**Existing sessions** (without checkpoints):
- Can still be executed normally
- Checkpoint system is opt-in per agent
- No breaking changes to existing APIs
**Enabling checkpoints**:
```python
# Option 1: Agent-level default
class MyAgent(Agent):
checkpoint_config = CheckpointConfig(
checkpoint_on_node_complete=True,
)
# Option 2: Runtime override
runtime = create_agent_runtime(
agent=my_agent,
checkpoint_config=CheckpointConfig(...),
)
# Option 3: Per-execution
result = await executor.execute(
graph=graph,
goal=goal,
checkpoint_config=CheckpointConfig(...),
)
```
### Gradual Rollout
1. **Phase 1**: Core infrastructure, no user-facing features
2. **Phase 2**: Opt-in for specific agents via config
3. **Phase 3**: User-facing MCP tools and CLI
4. **Phase 4**: Enable by default for all new agents
5. **Phase 5**: TUI integration
## Future Enhancements
### 1. Incremental Checkpoints
Instead of full state snapshots, store only deltas:
```python
@dataclass
class IncrementalCheckpoint:
"""Checkpoint with only changed state."""
base_checkpoint_id: str # Parent checkpoint
memory_delta: dict[str, Any] # Only changed keys
added_outputs: dict[str, Any] # Only new outputs
```
### 2. Distributed Checkpointing
For long-running agents, checkpoint to cloud storage:
```python
checkpoint_config = CheckpointConfig(
storage_backend="s3", # or "gcs", "azure"
storage_url="s3://my-bucket/checkpoints/",
)
```
### 3. Checkpoint Compression
Compress large memory states:
```python
checkpoint_config = CheckpointConfig(
compress=True,
compression_threshold_bytes=100_000, # Compress if > 100KB
)
```
### 4. Smart Checkpoint Selection
Use heuristics to decide when to checkpoint:
```python
class SmartCheckpointStrategy:
def should_checkpoint(self, context: ExecutionContext) -> bool:
# Checkpoint after expensive nodes
if context.node_latency_ms > 30_000:
return True
# Checkpoint before risky operations
if context.node_id in ["api_call", "external_tool"]:
return True
# Checkpoint after significant memory changes
if context.memory_delta_size > 10:
return True
return False
```
## Security Considerations
### 1. Sensitive Data in Checkpoints
**Problem**: Checkpoints may contain sensitive data (API keys, credentials, PII)
**Mitigation**:
```python
@dataclass
class CheckpointConfig:
# Exclude sensitive keys from checkpoint
exclude_memory_keys: list[str] = field(default_factory=lambda: [
"api_key",
"credentials",
"access_token",
])
# Encrypt checkpoint files
encrypt_checkpoints: bool = True
encryption_key_source: str = "keychain" # or "env_var", "file"
```
### 2. Checkpoint Tampering
**Problem**: Malicious modification of checkpoint files
**Mitigation**:
```python
@dataclass
class Checkpoint:
# Add cryptographic signature
signature: str # HMAC of checkpoint content
def verify_signature(self, secret_key: str) -> bool:
"""Verify checkpoint hasn't been tampered with."""
...
```
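One way to realize that signature (illustrative; the canonical JSON serialization of the checkpoint body is an assumption):
```python
import hashlib
import hmac
import json

def compute_signature(checkpoint_payload: dict, secret_key: str) -> str:
    """HMAC-SHA256 over a canonical JSON rendering of the checkpoint."""
    body = json.dumps(checkpoint_payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret_key.encode(), body, hashlib.sha256).hexdigest()

def verify_signature(checkpoint_payload: dict, signature: str, secret_key: str) -> bool:
    """Constant-time comparison to avoid timing side channels."""
    return hmac.compare_digest(compute_signature(checkpoint_payload, secret_key), signature)
```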
## References
- [RUNTIME_LOGGING.md](./RUNTIME_LOGGING.md) - Current logging system
- [session_state.py](../schemas/session_state.py) - Session state schema
- [session_store.py](../storage/session_store.py) - Session storage
- [executor.py](../graph/executor.py) - Graph executor
- [event_loop_node.py](../graph/event_loop_node.py) - EventLoop implementation
+698
View File
@@ -0,0 +1,698 @@
# Runtime Logging System
## Overview
The Hive framework uses a **three-level observability system** for tracking agent execution at different granularities:
- **L1 (Summary)**: High-level run outcomes - success/failure, execution quality, attention flags
- **L2 (Details)**: Per-node completion details - retries, verdicts, latency, attention reasons
- **L3 (Tool Logs)**: Step-by-step execution - tool calls, LLM responses, judge feedback
This layered approach enables efficient debugging: start with L1 to identify problematic runs, drill into L2 to find failing nodes, and analyze L3 for root cause details.
---
## Storage Architecture
### Current Structure (Unified Sessions)
**Default since 2026-02-06**
```
~/.hive/agents/{agent_name}/
└── sessions/
└── session_YYYYMMDD_HHMMSS_{uuid}/
├── state.json # Session state and metadata
├── logs/ # Runtime logs (L1/L2/L3)
│ ├── summary.json # L1: Run outcome
│ ├── details.jsonl # L2: Per-node results
│ └── tool_logs.jsonl # L3: Step-by-step execution
├── conversations/ # Per-node EventLoop state
└── data/ # Spillover artifacts
```
**Key characteristics:**
- All session data colocated in one directory
- Consistent ID format: `session_YYYYMMDD_HHMMSS_{short_uuid}`
- Logs written incrementally (JSONL for L2/L3)
- Single source of truth: `state.json`
### Legacy Structure (Deprecated)
**Read-only for backward compatibility**
```
~/.hive/agents/{agent_name}/
├── runtime_logs/
│ └── runs/
│ └── {run_id}/
│ ├── summary.json # L1
│ ├── details.jsonl # L2
│ └── tool_logs.jsonl # L3
├── sessions/
│ └── exec_{stream_id}_{uuid}/
│ ├── conversations/
│ └── data/
├── runs/ # Deprecated
│ └── run_start_*.json
└── summaries/ # Deprecated
└── run_start_*.json
```
**Migration status:**
- ✅ New sessions write to unified structure only
- ✅ Old sessions remain readable
- ❌ No new writes to `runs/`, `summaries/`, `runtime_logs/runs/`
- ⚠️ Deprecation warnings emitted when reading old locations
---
## Components
### RuntimeLogger
**Location:** `core/framework/runtime/runtime_logger.py`
**Responsibilities:**
- Receives execution events from GraphExecutor
- Tracks per-node execution details
- Aggregates attention flags
- Coordinates with RuntimeLogStore
**Key methods:**
```python
def start_run(goal_id: str, session_id: str = "") -> str:
"""Initialize a new run. Uses session_id as run_id if provided."""
def log_step(node_id: str, step_index: int, tool_calls: list, ...):
"""Record one LLM step (L3). Appends to tool_logs.jsonl immediately."""
def log_node_complete(node_id: str, exit_status: str, ...):
"""Record node completion (L2). Appends to details.jsonl immediately."""
async def end_run(status: str):
"""Finalize run, aggregate L2→L1, write summary.json."""
```
**Attention flag triggers:**
```python
# From runtime_logger.py:190-203
needs_attention = any([
retry_count > 3,
escalate_count > 2,
latency_ms > 60000,
tokens_used > 100000,
total_steps > 20,
])
```
### RuntimeLogStore
**Location:** `core/framework/runtime/runtime_log_store.py`
**Responsibilities:**
- Manages log file I/O
- Handles both old and new storage paths
- Provides incremental append for L2/L3 (crash-safe)
- Atomic writes for L1
**Storage path resolution:**
```python
def _get_run_dir(run_id: str) -> Path:
"""Determine log directory based on run_id format.
- session_* → {storage_root}/sessions/{run_id}/logs/
- Other → {base_path}/runtime_logs/runs/{run_id}/ (deprecated)
"""
```
**Key methods:**
```python
def ensure_run_dir(run_id: str):
"""Create log directory immediately at start_run()."""
def append_step(run_id: str, step: NodeStepLog):
"""Append L3 entry to tool_logs.jsonl. Thread-safe sync write."""
def append_node_detail(run_id: str, detail: NodeDetail):
"""Append L2 entry to details.jsonl. Thread-safe sync write."""
async def save_summary(run_id: str, summary: RunSummaryLog):
"""Write L1 summary.json atomically at end_run()."""
```
**File format:**
- **L1 (summary.json)**: Standard JSON, written once at end
- **L2 (details.jsonl)**: JSONL (one object per line), appended per node
- **L3 (tool_logs.jsonl)**: JSONL (one object per line), appended per step
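The crash-safety of the JSONL files comes from append-plus-flush on every record. A sketch of the idea (not the store's actual implementation):
```python
import json
from pathlib import Path

def append_jsonl(path: Path, record: dict) -> None:
    """Append one JSON object per line; a crash loses at most the line in flight."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
```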
### Runtime Log Schemas
**Location:** `core/framework/runtime/runtime_log_schemas.py`
**L1: RunSummaryLog**
```python
@dataclass
class RunSummaryLog:
run_id: str
goal_id: str
status: str # "success", "failure", "degraded", "in_progress"
started_at: str # ISO 8601
ended_at: str | None
needs_attention: bool
attention_summary: AttentionSummary
total_nodes_executed: int
nodes_with_failures: list[str]
execution_quality: str # "clean", "degraded", "failed"
total_latency_ms: int
# ... additional metrics
```
**L2: NodeDetail**
```python
@dataclass
class NodeDetail:
node_id: str
exit_status: str # "success", "escalate", "no_valid_edge"
retry_count: int
verdict_counts: dict[str, int] # {ACCEPT: 1, RETRY: 3, ...}
total_steps: int
latency_ms: int
needs_attention: bool
attention_reasons: list[str]
# ... tool error tracking, token counts
```
**L3: NodeStepLog**
```python
@dataclass
class NodeStepLog:
node_id: str
step_index: int
tool_calls: list[dict]
tool_results: list[dict]
verdict: str # "ACCEPT", "RETRY", "ESCALATE", "CONTINUE"
verdict_feedback: str
llm_response_text: str
tokens_used: int
latency_ms: int
# ... detailed execution state
# Trace context (OTel-aligned; empty if observability context not set):
trace_id: str # From set_trace_context (OTel trace)
span_id: str # 16 hex chars per step (OTel span)
parent_span_id: str # Optional; for nested span hierarchy
execution_id: str # Session/run correlation id
```
L3 entries include `trace_id`, `span_id`, and `execution_id` for correlation and **OpenTelemetry (OTel) compatibility**. When the framework sets trace context (e.g. via `Runtime.start_run()` or `StreamRuntime.start_run()`), these fields are populated automatically so L3 data can be exported to OTel backends without schema changes.
**L2: NodeDetail** also includes `trace_id` and `span_id`; **L1: RunSummaryLog** includes `trace_id` and `execution_id` for the same correlation.
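With those fields in place, correlating one trace across L3 is a line-by-line filter. An illustrative helper (field names per the schemas above):
```python
import json
from pathlib import Path

def steps_for_trace(tool_logs: Path, trace_id: str) -> list[dict]:
    """Collect all L3 steps belonging to one OTel trace."""
    steps = []
    for line in tool_logs.read_text(encoding="utf-8").splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip corrupt lines, as the MCP query tools do
        if record.get("trace_id") == trace_id:
            steps.append(record)
    return steps
```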
---
## Querying Logs (MCP Tools)
### Tools Location
**MCP Server:** `tools/src/aden_tools/tools/runtime_logs_tool/runtime_logs_tool.py`
Three MCP tools provide access to the logging system:
### L1: query_runtime_logs
**Purpose:** Find problematic runs
```python
query_runtime_logs(
agent_work_dir: str, # e.g., "~/.hive/agents/deep_research_agent"
status: str = "", # "needs_attention", "success", "failure", "degraded"
limit: int = 20
) -> dict # {"runs": [...], "total": int}
```
**Returns:**
```json
{
"runs": [
{
"run_id": "session_20260206_115718_e22339c5",
"status": "degraded",
"needs_attention": true,
"attention_summary": {
"total_attention_flags": 3,
"categories": ["missing_outputs", "retry_loops"]
},
"started_at": "2026-02-06T11:57:18Z"
}
],
"total": 1
}
```
**Common queries:**
```python
# Find all problematic runs
query_runtime_logs(agent_work_dir, status="needs_attention")
# Get recent runs regardless of status
query_runtime_logs(agent_work_dir, limit=10)
# Check for failures
query_runtime_logs(agent_work_dir, status="failure")
```
### L2: query_runtime_log_details
**Purpose:** Identify which nodes failed
```python
query_runtime_log_details(
agent_work_dir: str,
run_id: str, # From L1 query
needs_attention_only: bool = False,
node_id: str = "" # Filter to specific node
) -> dict # {"run_id": str, "nodes": [...]}
```
**Returns:**
```json
{
"run_id": "session_20260206_115718_e22339c5",
"nodes": [
{
"node_id": "intake-collector",
"exit_status": "escalate",
"retry_count": 5,
"verdict_counts": {"RETRY": 5, "ESCALATE": 1},
"attention_reasons": ["high_retry_count", "missing_outputs"],
"total_steps": 8,
"latency_ms": 12500,
"needs_attention": true
}
]
}
```
**Common queries:**
```python
# Get all problematic nodes
query_runtime_log_details(agent_work_dir, run_id, needs_attention_only=True)
# Analyze specific node across run
query_runtime_log_details(agent_work_dir, run_id, node_id="intake-collector")
# Full node breakdown
query_runtime_log_details(agent_work_dir, run_id)
```
### L3: query_runtime_log_raw
**Purpose:** Root cause analysis
```python
query_runtime_log_raw(
agent_work_dir: str,
run_id: str,
step_index: int = -1, # Specific step or -1 for all
node_id: str = "" # Filter to specific node
) -> dict # {"run_id": str, "steps": [...]}
```
**Returns:**
```json
{
"run_id": "session_20260206_115718_e22339c5",
"steps": [
{
"node_id": "intake-collector",
"step_index": 3,
"tool_calls": [
{
"tool": "web_search",
"args": {"query": "@RomuloNevesOf"}
}
],
"tool_results": [
{
"status": "success",
"data": "..."
}
],
"verdict": "RETRY",
"verdict_feedback": "Missing required output 'twitter_handles'. You found the handle but didn't call set_output.",
"llm_response_text": "I found the Twitter profile...",
"tokens_used": 1234,
"latency_ms": 2500
}
]
}
```
**Common queries:**
```python
# All steps for a problematic node
query_runtime_log_raw(agent_work_dir, run_id, node_id="intake-collector")
# Specific step analysis
query_runtime_log_raw(agent_work_dir, run_id, step_index=5)
# Full execution trace
query_runtime_log_raw(agent_work_dir, run_id)
```
---
## Usage Patterns
### Pattern 1: Top-Down Investigation
**Use case:** Debug a failing agent
```python
# 1. Find problematic runs (L1)
result = query_runtime_logs(
agent_work_dir="~/.hive/agents/deep_research_agent",
status="needs_attention"
)
run_id = result["runs"][0]["run_id"]
# 2. Identify failing nodes (L2)
details = query_runtime_log_details(
agent_work_dir="~/.hive/agents/deep_research_agent",
run_id=run_id,
needs_attention_only=True
)
problem_node = details["nodes"][0]["node_id"]
# 3. Analyze root cause (L3)
raw = query_runtime_log_raw(
agent_work_dir="~/.hive/agents/deep_research_agent",
run_id=run_id,
node_id=problem_node
)
# Examine verdict_feedback, tool_results, etc.
```
### Pattern 2: Node-Specific Debugging
**Use case:** Investigate why a specific node keeps failing
```python
# Get recent runs
runs = query_runtime_logs("~/.hive/agents/my_agent", limit=10)
# For each run, check specific node
for run in runs["runs"]:
node_details = query_runtime_log_details(
"~/.hive/agents/my_agent",
run["run_id"],
node_id="problematic-node"
)
# Analyze retry patterns, error types
```
### Pattern 3: Real-Time Monitoring
**Use case:** Watch for issues during development
```python
import time
while True:
result = query_runtime_logs(
agent_work_dir="~/.hive/agents/my_agent",
status="needs_attention",
limit=1
)
if result["total"] > 0:
new_issue = result["runs"][0]
print(f"⚠️ New issue detected: {new_issue['run_id']}")
# Alert or drill into L2/L3
time.sleep(10) # Poll every 10 seconds
```
---
## Integration Points
### GraphExecutor → RuntimeLogger
**Location:** `core/framework/graph/executor.py`
```python
# Executor creates logger and passes session_id
logger = RuntimeLogger(store, agent_id)
run_id = logger.start_run(goal_id, session_id=execution_id)
# During execution
logger.log_step(node_id, step_index, tool_calls, ...)
logger.log_node_complete(node_id, exit_status, ...)
# At completion
await logger.end_run(status="success")
```
### EventLoopNode → RuntimeLogger
**Location:** `core/framework/graph/event_loop_node.py`
```python
# EventLoopNode logs each step
self._logger.log_step(
node_id=self.id,
step_index=step_count,
tool_calls=current_tool_calls,
tool_results=current_tool_results,
verdict=verdict,
verdict_feedback=feedback,
...
)
```
### AgentRuntime → RuntimeLogger
**Location:** `core/framework/runtime/agent_runtime.py`
```python
# Runtime initializes logger with storage path
log_store = RuntimeLogStore(base_path / "runtime_logs")
logger = RuntimeLogger(log_store, agent_id)
# Passes session_id from ExecutionStream
logger.start_run(goal_id, session_id=execution_id)
```
---
## File Format Details
### L1: summary.json
**Written:** Once at end_run()
**Format:** Standard JSON
```json
{
"run_id": "session_20260206_115718_e22339c5",
"goal_id": "deep-research",
"status": "degraded",
"started_at": "2026-02-06T11:57:18.593081",
"ended_at": "2026-02-06T11:58:45.123456",
"needs_attention": true,
"attention_summary": {
"total_attention_flags": 3,
"categories": ["missing_outputs", "retry_loops"],
"nodes_with_attention": ["intake-collector"]
},
"total_nodes_executed": 4,
"nodes_with_failures": ["intake-collector"],
"execution_quality": "degraded",
"total_latency_ms": 86530,
"total_retries": 5
}
```
### L2: details.jsonl
**Written:** Incrementally (append per node completion)
**Format:** JSONL (one JSON object per line)
```jsonl
{"node_id":"intake-collector","exit_status":"escalate","retry_count":5,"verdict_counts":{"RETRY":5,"ESCALATE":1},"total_steps":8,"latency_ms":12500,"needs_attention":true,"attention_reasons":["high_retry_count","missing_outputs"],"tool_error_count":0,"tokens_used":9876}
{"node_id":"profile-analyzer","exit_status":"success","retry_count":0,"verdict_counts":{"ACCEPT":1},"total_steps":2,"latency_ms":5432,"needs_attention":false,"attention_reasons":[],"tool_error_count":0,"tokens_used":3456}
```
### L3: tool_logs.jsonl
**Written:** Incrementally (append per step)
**Format:** JSONL (one JSON object per line)
Each line includes **trace context** when the framework has set it (via the observability module): `trace_id`, `span_id`, `parent_span_id` (optional), and `execution_id`. These align with OpenTelemetry/W3C TraceContext so L3 data can be exported to OTel backends without schema changes.
```jsonl
{"node_id":"intake-collector","step_index":3,"trace_id":"54e80d7b5bd6409dbc3217e5cd16a4fd","span_id":"a1b2c3d4e5f67890","execution_id":"b4c348ec54e80d7b5bd6409dbc3217e50","tool_calls":[...],"verdict":"RETRY",...}
```
**Why JSONL?**
- Incremental append during execution (crash-safe)
- No need to parse entire file to add one line
- Data persisted immediately, not buffered
- Easy to stream/process line-by-line
---
## Attention Flags System
### Automatic Detection
The runtime logger automatically flags issues based on execution metrics:
| Trigger | Threshold | Attention Reason | Category |
|---------|-----------|------------------|----------|
| High retries | `retry_count > 3` | `high_retry_count` | Retry Loops |
| Escalations | `escalate_count > 2` | `escalation_pattern` | Guard Failures |
| High latency | `latency_ms > 60000` | `high_latency` | High Latency |
| Token usage | `tokens_used > 100000` | `high_token_usage` | Memory/Context |
| Stalled steps | `total_steps > 20` | `excessive_steps` | Stalled Execution |
| Tool errors | `tool_error_count > 0` | `tool_failures` | Tool Errors |
| Missing outputs | `exit_status != "success"` | `missing_outputs` | Missing Outputs |
### Attention Categories
Used by `/hive-debugger` skill for issue categorization:
1. **Missing Outputs**: Node didn't set required output keys
2. **Tool Errors**: Tool calls failed (API errors, timeouts)
3. **Retry Loops**: Judge repeatedly rejecting outputs
4. **Guard Failures**: Output validation failed
5. **Stalled Execution**: EventLoopNode not making progress
6. **High Latency**: Slow tool calls or LLM responses
7. **Client-Facing Issues**: Premature set_output before user input
8. **Edge Routing Errors**: No edges match current state
9. **Memory/Context Issues**: Conversation history too long
10. **Constraint Violations**: Agent violated goal-level rules
---
## Migration Guide
### Reading Old Logs
The system automatically handles both old and new formats:
```python
# MCP tools check both locations automatically
result = query_runtime_logs("~/.hive/agents/old_agent")
# Returns logs from both:
# - ~/.hive/agents/old_agent/runtime_logs/runs/*/
# - ~/.hive/agents/old_agent/sessions/session_*/logs/
```
### Deprecation Warnings
When reading from old locations, deprecation warnings are emitted:
```
DeprecationWarning: Reading logs from deprecated location for run_id=20260101T120000_abc12345.
New sessions use unified storage at sessions/session_*/logs/
```
### Migration Script (Optional)
For migrating existing old logs to new format, see:
- `EXECUTION_STORAGE_REDESIGN.md` - Migration strategy
- Future: `scripts/migrate_to_unified_sessions.py`
---
## Performance Characteristics
### Write Performance
- **L3 append**: ~1-2ms per step (sync I/O, thread-safe)
- **L2 append**: ~1-2ms per node (sync I/O, thread-safe)
- **L1 write**: ~5-10ms at end_run (atomic, async)
**Overhead:** < 5% of total execution time for typical agents
### Read Performance
- **L1 summary**: ~1-5ms (single JSON file)
- **L2 details**: ~10-50ms (JSONL, depends on node count)
- **L3 raw logs**: ~50-500ms (JSONL, depends on step count)
**Optimization:** Use filters (node_id, step_index) to reduce data read
### Storage Size
Typical session with 5 nodes, 20 steps:
- **L1 (summary.json)**: ~2-5 KB
- **L2 (details.jsonl)**: ~5-10 KB (1-2 KB per node)
- **L3 (tool_logs.jsonl)**: ~50-200 KB (2-10 KB per step)
**Total per session:** ~60-215 KB
**Compression:** Consider compressing and archiving sessions older than 90 days
---
## Troubleshooting
### Issue: Logs not appearing
**Symptom:** MCP tools return empty results
**Check:**
1. Verify storage path exists: `~/.hive/agents/{agent_name}/`
2. Check session directories: `ls ~/.hive/agents/{agent_name}/sessions/`
3. Verify logs directory exists: `ls ~/.hive/agents/{agent_name}/sessions/session_*/logs/`
4. Check file permissions
### Issue: Corrupt JSONL files
**Symptom:** Partial data or JSON decode errors
**Cause:** Process crash during write (rare, but possible)
**Recovery:**
```python
# MCP tools skip corrupt lines automatically
query_runtime_log_details(agent_work_dir, run_id)
# Logs warning but continues with valid lines
```
### Issue: High disk usage
**Symptom:** Storage growing too large
**Solution:**
```bash
# Archive old sessions
cd ~/.hive/agents/{agent_name}/sessions/
find . -name "session_2025*" -type d -exec tar -czf archive.tar.gz {} +
rm -rf session_2025*
# Or set up automatic cleanup (future feature)
```
---
## References
**Implementation:**
- `core/framework/runtime/runtime_logger.py` - Logger implementation
- `core/framework/runtime/runtime_log_store.py` - Storage layer
- `core/framework/runtime/runtime_log_schemas.py` - Data schemas
- `tools/src/aden_tools/tools/runtime_logs_tool/runtime_logs_tool.py` - MCP query tools
**Documentation:**
- `EXECUTION_STORAGE_REDESIGN.md` - Unified session storage design
- `/.claude/skills/hive-debugger/SKILL.md` - Interactive debugging skill
**Related:**
- `core/framework/schemas/session_state.py` - Session state schema
- `core/framework/storage/session_store.py` - Session state storage
- `core/framework/graph/executor.py` - GraphExecutor integration
+121 -2
View File
@@ -8,16 +8,18 @@ while preserving the goal-driven approach.
import asyncio
import logging
from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path
from typing import TYPE_CHECKING, Any
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.executor import ExecutionResult
from framework.runtime.event_bus import EventBus
from framework.runtime.execution_stream import EntryPointSpec, ExecutionStream
from framework.runtime.outcome_aggregator import OutcomeAggregator
from framework.runtime.shared_state import SharedStateManager
from framework.storage.concurrent import ConcurrentStorage
from framework.storage.session_store import SessionStore
if TYPE_CHECKING:
from framework.graph.edge import GraphSpec
@@ -37,6 +39,11 @@ class AgentRuntimeConfig:
max_history: int = 1000
execution_result_max: int = 1000
execution_result_ttl_seconds: float | None = None
# Webhook server config (only starts if webhook_routes is non-empty)
webhook_host: str = "127.0.0.1"
webhook_port: int = 8080
webhook_routes: list[dict] = field(default_factory=list)
# Each dict: {"source_id": str, "path": str, "methods": ["POST"], "secret": str|None}
class AgentRuntime:
@@ -100,6 +107,8 @@ class AgentRuntime:
tools: list["Tool"] | None = None,
tool_executor: Callable | None = None,
config: AgentRuntimeConfig | None = None,
runtime_log_store: Any = None,
checkpoint_config: CheckpointConfig | None = None,
):
"""
Initialize agent runtime.
@@ -112,18 +121,26 @@ class AgentRuntime:
tools: Available tools
tool_executor: Function to execute tools
config: Optional runtime configuration
runtime_log_store: Optional RuntimeLogStore for per-execution logging
checkpoint_config: Optional checkpoint configuration for resumable sessions
"""
self.graph = graph
self.goal = goal
self._config = config or AgentRuntimeConfig()
self._runtime_log_store = runtime_log_store
self._checkpoint_config = checkpoint_config
# Initialize storage
storage_path_obj = Path(storage_path) if isinstance(storage_path, str) else storage_path
self._storage = ConcurrentStorage(
base_path=storage_path_obj,
cache_ttl=self._config.cache_ttl,
batch_interval=self._config.batch_interval,
)
# Initialize SessionStore for unified sessions (always enabled)
self._session_store = SessionStore(storage_path_obj)
# Initialize shared components
self._state_manager = SharedStateManager()
self._event_bus = EventBus(max_history=self._config.max_history)
@@ -138,10 +155,18 @@ class AgentRuntime:
self._entry_points: dict[str, EntryPointSpec] = {}
self._streams: dict[str, ExecutionStream] = {}
# Webhook server (created on start if webhook_routes configured)
self._webhook_server: Any = None
# Event-driven entry point subscriptions
self._event_subscriptions: list[str] = []
# State
self._running = False
self._lock = asyncio.Lock()
# Optional greeting shown to user on TUI load (set by AgentRunner)
self.intro_message: str = ""
def register_entry_point(self, spec: EntryPointSpec) -> None:
"""
Register a named entry point for the agent.
@@ -212,10 +237,70 @@ class AgentRuntime:
tool_executor=self._tool_executor,
result_retention_max=self._config.execution_result_max,
result_retention_ttl_seconds=self._config.execution_result_ttl_seconds,
runtime_log_store=self._runtime_log_store,
session_store=self._session_store,
checkpoint_config=self._checkpoint_config,
)
await stream.start()
self._streams[ep_id] = stream
# Start webhook server if routes are configured
if self._config.webhook_routes:
from framework.runtime.webhook_server import (
WebhookRoute,
WebhookServer,
WebhookServerConfig,
)
wh_config = WebhookServerConfig(
host=self._config.webhook_host,
port=self._config.webhook_port,
)
self._webhook_server = WebhookServer(self._event_bus, wh_config)
for rc in self._config.webhook_routes:
route = WebhookRoute(
source_id=rc["source_id"],
path=rc["path"],
methods=rc.get("methods", ["POST"]),
secret=rc.get("secret"),
)
self._webhook_server.add_route(route)
await self._webhook_server.start()
# Subscribe event-driven entry points to EventBus
from framework.runtime.event_bus import EventType as _ET
for ep_id, spec in self._entry_points.items():
if spec.trigger_type != "event":
continue
tc = spec.trigger_config
event_types = [_ET(et) for et in tc.get("event_types", [])]
if not event_types:
logger.warning(
f"Entry point '{ep_id}' has trigger_type='event' "
"but no event_types in trigger_config"
)
continue
# Capture ep_id in closure
def _make_handler(entry_point_id: str):
async def _on_event(event):
if self._running and entry_point_id in self._streams:
await self.trigger(entry_point_id, {"event": event.to_dict()})
return _on_event
sub_id = self._event_bus.subscribe(
event_types=event_types,
handler=_make_handler(ep_id),
filter_stream=tc.get("filter_stream"),
filter_node=tc.get("filter_node"),
)
self._event_subscriptions.append(sub_id)
self._running = True
logger.info(f"AgentRuntime started with {len(self._streams)} streams")
@@ -225,6 +310,16 @@ class AgentRuntime:
return
async with self._lock:
# Unsubscribe event-driven entry points
for sub_id in self._event_subscriptions:
self._event_bus.unsubscribe(sub_id)
self._event_subscriptions.clear()
# Stop webhook server
if self._webhook_server:
await self._webhook_server.stop()
self._webhook_server = None
# Stop all streams
for stream in self._streams.values():
await stream.stop()
@@ -430,6 +525,11 @@ class AgentRuntime:
"""Access the outcome aggregator."""
return self._outcome_aggregator
@property
def webhook_server(self) -> Any:
"""Access the webhook server (None if no webhook entry points)."""
return self._webhook_server
@property
def is_running(self) -> bool:
"""Check if runtime is running."""
@@ -448,11 +548,15 @@ def create_agent_runtime(
tools: list["Tool"] | None = None,
tool_executor: Callable | None = None,
config: AgentRuntimeConfig | None = None,
runtime_log_store: Any = None,
enable_logging: bool = True,
checkpoint_config: CheckpointConfig | None = None,
) -> AgentRuntime:
"""
Create and configure an AgentRuntime with entry points.
Convenience factory that creates runtime and registers entry points.
Runtime logging is enabled by default for observability.
Args:
graph: Graph specification
@@ -463,10 +567,23 @@ def create_agent_runtime(
tools: Available tools
tool_executor: Tool executor function
config: Runtime configuration
runtime_log_store: Optional RuntimeLogStore for per-execution logging.
If None and enable_logging=True, creates one automatically.
enable_logging: Whether to enable runtime logging (default: True).
Set to False to disable logging entirely.
checkpoint_config: Optional checkpoint configuration for resumable sessions.
If None, uses default checkpointing behavior.
Returns:
Configured AgentRuntime (not yet started)
"""
# Auto-create runtime log store if logging is enabled and not provided
if enable_logging and runtime_log_store is None:
from framework.runtime.runtime_log_store import RuntimeLogStore
storage_path_obj = Path(storage_path) if isinstance(storage_path, str) else storage_path
runtime_log_store = RuntimeLogStore(storage_path_obj / "runtime_logs")
runtime = AgentRuntime(
graph=graph,
goal=goal,
@@ -475,6 +592,8 @@ def create_agent_runtime(
tools=tools,
tool_executor=tool_executor,
config=config,
runtime_log_store=runtime_log_store,
checkpoint_config=checkpoint_config,
)
for spec in entry_points:

Some files were not shown because too many files have changed in this diff.