Compare commits
133 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| fe775a36c0 | |||
| 2df9adcb43 | |||
| c756cbf6d5 | |||
| d0ac67c9d3 | |||
| 6ee47e243d | |||
| c1844b7a9d | |||
| 99a29e79e5 | |||
| 589a66ef26 | |||
| 3f960763cb | |||
| 15f8f3783c | |||
| a2b045c7e3 | |||
| 055cef2fdc | |||
| 6c6c69cbc3 | |||
| 6fe0062e6e | |||
| 26b8b2f448 | |||
| 7e40d6950a | |||
| 590bfa92cb | |||
| f0e89a1720 | |||
| 575563b1e8 | |||
| 82ea0e47ce | |||
| 2f57ca10f7 | |||
| 75c2d541c4 | |||
| b666f8b50b | |||
| 09f9322676 | |||
| f9a864ef93 | |||
| 27f28afe9c | |||
| 8f85722fef | |||
| 5588445a01 | |||
| 40529b5722 | |||
| cee632f50c | |||
| 3453e3aa05 | |||
| 8de637c421 | |||
| 6c75de862c | |||
| 2971134882 | |||
| 6e79860b43 | |||
| 74d0287ec5 | |||
| 51e81d80fc | |||
| cd014e41e4 | |||
| 830f11c47d | |||
| a73239dd98 | |||
| d68783a612 | |||
| a28ea40a7d | |||
| b22be7a6cb | |||
| 5b00445c05 | |||
| 5179677e8f | |||
| 2c25b2eae7 | |||
| f6705fe2d3 | |||
| c2771fed20 | |||
| fc781eccd9 | |||
| d5a25ae081 | |||
| 23b6fb6391 | |||
| 433967f0cf | |||
| 2a876c2a10 | |||
| ff0adeaba7 | |||
| 846edbf256 | |||
| c68dd48f6d | |||
| 8b828dd139 | |||
| 50c0a5da9e | |||
| 2f0e5c42f1 | |||
| 903288468a | |||
| 9e3bba6f59 | |||
| bc16f0752f | |||
| 86badd70fa | |||
| ce5379516c | |||
| a50078bbf2 | |||
| 2cef168442 | |||
| 0a1a9e3545 | |||
| 3c8682d80c | |||
| ecc5a1608f | |||
| bc81b55600 | |||
| 28b628c1b4 | |||
| 148264ac73 | |||
| 4046e4e379 | |||
| 28298d9af2 | |||
| 221712128d | |||
| e9fc36f2d3 | |||
| 305b880b1d | |||
| 34782a6b85 | |||
| d25d94e71b | |||
| 51f1b449cd | |||
| 804e47dde4 | |||
| 582c810d15 | |||
| cede629718 | |||
| 10941dc7fc | |||
| c1c16878e4 | |||
| 80a41b434b | |||
| 9a8e117f1d | |||
| 878603033a | |||
| 1c6f17e8db | |||
| 8f32ef8064 | |||
| 7519c73f2a | |||
| e12bc96e21 | |||
| bf402aaa18 | |||
| 2355d3d729 | |||
| a093a59cb0 | |||
| d7917988c3 | |||
| ae566a2027 | |||
| b15473d3f3 | |||
| 265bf885ec | |||
| e318281989 | |||
| 3e2a11d60d | |||
| 4b9f73310e | |||
| b17c26116d | |||
| 3114af75e4 | |||
| 7a6d10639b | |||
| 6ff29ea6aa | |||
| a23f01973a | |||
| 0aaa3a3eca | |||
| 82f05d1102 | |||
| 8ff6d9c8bd | |||
| 23e249144d | |||
| 25014bfa89 | |||
| 78ea585779 | |||
| ac13c11f89 | |||
| fd1826a267 | |||
| b99b6c5cd3 | |||
| f4737dcfe7 | |||
| ca7f6d3514 | |||
| b033c56ae5 | |||
| 694feaffd2 | |||
| eb68e2143b | |||
| 960a4549ef | |||
| 25989d9f90 | |||
| 684da96a83 | |||
| abae7979cb | |||
| 49bce57fcf | |||
| d32308b6d2 | |||
| 604d16e353 | |||
| db577785d6 | |||
| c9ae3a0541 | |||
| ed95dab9f3 | |||
| a6536cef94 | |||
| 3ccc81e81c |
@@ -1,361 +0,0 @@
|
||||
---
|
||||
name: building-agents-construction
|
||||
description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
|
||||
license: Apache-2.0
|
||||
metadata:
|
||||
author: hive
|
||||
version: "2.0"
|
||||
type: procedural
|
||||
part_of: building-agents
|
||||
requires: building-agents-core
|
||||
---
|
||||
|
||||
# Agent Construction - EXECUTE THESE STEPS
|
||||
|
||||
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.**
|
||||
|
||||
When this skill is loaded, IMMEDIATELY begin executing Step 1. Do not explain what you will do - just do it.
|
||||
|
||||
---
|
||||
|
||||
## STEP 1: Initialize Build Environment
|
||||
|
||||
**EXECUTE THESE TOOL CALLS NOW:**
|
||||
|
||||
1. Register the hive-tools MCP server:
|
||||
|
||||
```
|
||||
mcp__agent-builder__add_mcp_server(
|
||||
name="hive-tools",
|
||||
transport="stdio",
|
||||
command="python",
|
||||
args='["mcp_server.py", "--stdio"]',
|
||||
cwd="tools",
|
||||
description="Hive tools MCP server"
|
||||
)
|
||||
```
|
||||
|
||||
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
|
||||
|
||||
```
|
||||
mcp__agent-builder__create_session(name="AGENT_NAME")
|
||||
```
|
||||
|
||||
3. Discover available tools:
|
||||
|
||||
```
|
||||
mcp__agent-builder__list_mcp_tools()
|
||||
```
|
||||
|
||||
4. Create the package directory:
|
||||
|
||||
```
|
||||
mkdir -p exports/AGENT_NAME/nodes
|
||||
```
|
||||
|
||||
**AFTER completing these calls**, tell the user:
|
||||
|
||||
> ✅ Build environment initialized
|
||||
>
|
||||
> - Session created
|
||||
> - Available tools: [list the tools from step 3]
|
||||
>
|
||||
> Proceeding to define the agent goal...
|
||||
|
||||
**THEN immediately proceed to STEP 2.**
|
||||
|
||||
---
|
||||
|
||||
## STEP 2: Define and Approve Goal
|
||||
|
||||
**PROPOSE a goal to the user.** Based on what they asked for, propose:
|
||||
|
||||
- Goal ID (kebab-case)
|
||||
- Goal name
|
||||
- Goal description
|
||||
- 3-5 success criteria (each with: id, description, metric, target, weight)
|
||||
- 2-4 constraints (each with: id, description, constraint_type, category)
|
||||
|
||||
**FORMAT your proposal as a clear summary, then ask for approval:**
|
||||
|
||||
> **Proposed Goal: [Name]**
|
||||
>
|
||||
> [Description]
|
||||
>
|
||||
> **Success Criteria:**
|
||||
>
|
||||
> 1. [criterion 1]
|
||||
> 2. [criterion 2]
|
||||
> ...
|
||||
>
|
||||
> **Constraints:**
|
||||
>
|
||||
> 1. [constraint 1]
|
||||
> 2. [constraint 2]
|
||||
> ...
|
||||
|
||||
**THEN call AskUserQuestion:**
|
||||
|
||||
```
|
||||
AskUserQuestion(questions=[{
|
||||
"question": "Do you approve this goal definition?",
|
||||
"header": "Goal",
|
||||
"options": [
|
||||
{"label": "Approve", "description": "Goal looks good, proceed"},
|
||||
{"label": "Modify", "description": "I want to change something"}
|
||||
],
|
||||
"multiSelect": false
|
||||
}])
|
||||
```
|
||||
|
||||
**WAIT for user response.**
|
||||
|
||||
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 3
|
||||
- If **Modify**: Ask what they want to change, update proposal, ask again
|
||||
|
||||
---
|
||||
|
||||
## STEP 3: Design Node Workflow
|
||||
|
||||
**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist.
|
||||
|
||||
**DESIGN the workflow** as a series of nodes. For each node, determine:
|
||||
|
||||
- node_id (kebab-case)
|
||||
- name
|
||||
- description
|
||||
- node_type: `"llm_generate"` (no tools) or `"llm_tool_use"` (uses tools)
|
||||
- input_keys (what data this node receives)
|
||||
- output_keys (what data this node produces)
|
||||
- tools (ONLY tools that exist - empty list for llm_generate)
|
||||
- system_prompt
|
||||
|
||||
**PRESENT the workflow to the user:**
|
||||
|
||||
> **Proposed Workflow: [N] nodes**
|
||||
>
|
||||
> 1. **[node-id]** - [description]
|
||||
>
|
||||
> - Type: [llm_generate/llm_tool_use]
|
||||
> - Input: [keys]
|
||||
> - Output: [keys]
|
||||
> - Tools: [tools or "none"]
|
||||
>
|
||||
> 2. **[node-id]** - [description]
|
||||
> ...
|
||||
>
|
||||
> **Flow:** node1 → node2 → node3 → ...
|
||||
|
||||
**THEN call AskUserQuestion:**
|
||||
|
||||
```
|
||||
AskUserQuestion(questions=[{
|
||||
"question": "Do you approve this workflow design?",
|
||||
"header": "Workflow",
|
||||
"options": [
|
||||
{"label": "Approve", "description": "Workflow looks good, proceed to build nodes"},
|
||||
{"label": "Modify", "description": "I want to change the workflow"}
|
||||
],
|
||||
"multiSelect": false
|
||||
}])
|
||||
```
|
||||
|
||||
**WAIT for user response.**
|
||||
|
||||
- If **Approve**: Proceed to STEP 4
|
||||
- If **Modify**: Ask what they want to change, update design, ask again
|
||||
|
||||
---
|
||||
|
||||
## STEP 4: Build Nodes One by One
|
||||
|
||||
**FOR EACH node in the approved workflow:**
|
||||
|
||||
1. **Call** `mcp__agent-builder__add_node(...)` with the node details
|
||||
|
||||
- input_keys and output_keys must be JSON strings: `'["key1", "key2"]'`
|
||||
- tools must be a JSON string: `'["tool1"]'` or `'[]'`
|
||||
|
||||
2. **Call** `mcp__agent-builder__test_node(...)` to validate:
|
||||
|
||||
```
|
||||
mcp__agent-builder__test_node(
|
||||
node_id="the-node-id",
|
||||
test_input='{"key": "test value"}',
|
||||
mock_llm_response='{"output_key": "test output"}'
|
||||
)
|
||||
```
|
||||
|
||||
3. **Check result:**
|
||||
|
||||
- If valid: Tell user "✅ Node [id] validated" and continue to next node
|
||||
- If invalid: Show errors, fix the node, re-validate
|
||||
|
||||
4. **Show progress** after each node:
|
||||
|
||||
```
|
||||
mcp__agent-builder__get_session_status()
|
||||
```
|
||||
|
||||
> ✅ Node [X] of [Y] complete: [node-id]
|
||||
|
||||
**AFTER all nodes are added and validated**, proceed to STEP 5.
|
||||
|
||||
---
|
||||
|
||||
## STEP 5: Connect Edges
|
||||
|
||||
**DETERMINE the edges** based on the workflow flow. For each connection:
|
||||
|
||||
- edge_id (kebab-case)
|
||||
- source (node that outputs)
|
||||
- target (node that receives)
|
||||
- condition: `"on_success"`, `"always"`, `"on_failure"`, or `"conditional"`
|
||||
- condition_expr (Python expression, only if conditional)
|
||||
- priority (integer, lower = higher priority)
|
||||
|
||||
**FOR EACH edge, call:**
|
||||
|
||||
```
|
||||
mcp__agent-builder__add_edge(
|
||||
edge_id="source-to-target",
|
||||
source="source-node-id",
|
||||
target="target-node-id",
|
||||
condition="on_success",
|
||||
condition_expr="",
|
||||
priority=1
|
||||
)
|
||||
```
|
||||
|
||||
**AFTER all edges are added, validate the graph:**
|
||||
|
||||
```
|
||||
mcp__agent-builder__validate_graph()
|
||||
```
|
||||
|
||||
- If valid: Tell user "✅ Graph structure validated" and proceed to STEP 6
|
||||
- If invalid: Show errors, fix edges, re-validate
|
||||
|
||||
---
|
||||
|
||||
## STEP 6: Generate Agent Package
|
||||
|
||||
**EXPORT the graph data:**
|
||||
|
||||
```
|
||||
mcp__agent-builder__export_graph()
|
||||
```
|
||||
|
||||
This returns JSON with all the goal, nodes, edges, and MCP server configurations.
|
||||
|
||||
**THEN write the Python package files** using the exported data. Create these files in `exports/AGENT_NAME/`:
|
||||
|
||||
1. `config.py` - Runtime configuration with model settings
|
||||
2. `nodes/__init__.py` - All NodeSpec definitions
|
||||
3. `agent.py` - Goal, edges, graph config, and agent class
|
||||
4. `__init__.py` - Package exports
|
||||
5. `__main__.py` - CLI interface
|
||||
6. `mcp_servers.json` - MCP server configurations
|
||||
7. `README.md` - Usage documentation
|
||||
|
||||
**IMPORTANT entry_points format:**
|
||||
|
||||
- MUST be: `{"start": "first-node-id"}`
|
||||
- NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
|
||||
- NOT: `{"first-node-id"}` (WRONG - this is a set)
|
||||
|
||||
**Use the example agent** at `.claude/skills/building-agents-construction/examples/online_research_agent/` as a template for file structure and patterns.
|
||||
|
||||
**AFTER writing all files, tell the user:**
|
||||
|
||||
> ✅ Agent package created: `exports/AGENT_NAME/`
|
||||
>
|
||||
> **Files generated:**
|
||||
>
|
||||
> - `__init__.py` - Package exports
|
||||
> - `agent.py` - Goal, nodes, edges, agent class
|
||||
> - `config.py` - Runtime configuration
|
||||
> - `__main__.py` - CLI interface
|
||||
> - `nodes/__init__.py` - Node definitions
|
||||
> - `mcp_servers.json` - MCP server config
|
||||
> - `README.md` - Usage documentation
|
||||
>
|
||||
> **Test your agent:**
|
||||
>
|
||||
> ```bash
|
||||
> cd /home/timothy/oss/hive
|
||||
> PYTHONPATH=core:exports python -m AGENT_NAME validate
|
||||
> PYTHONPATH=core:exports python -m AGENT_NAME info
|
||||
> ```
|
||||
|
||||
---
|
||||
|
||||
## STEP 7: Verify and Test
|
||||
|
||||
**RUN validation:**
|
||||
|
||||
```bash
|
||||
cd /home/timothy/oss/hive && PYTHONPATH=core:exports python -m AGENT_NAME validate
|
||||
```
|
||||
|
||||
- If valid: Agent is complete!
|
||||
- If errors: Fix the issues and re-run
|
||||
|
||||
**SHOW final session summary:**
|
||||
|
||||
```
|
||||
mcp__agent-builder__get_session_status()
|
||||
```
|
||||
|
||||
**TELL the user the agent is ready** and suggest next steps:
|
||||
|
||||
- Run with mock mode to test without API calls
|
||||
- Use `/testing-agent` skill for comprehensive testing
|
||||
- Use `/setup-credentials` if the agent needs API keys
|
||||
|
||||
---
|
||||
|
||||
## REFERENCE: Node Types
|
||||
|
||||
| Type | tools param | Use when |
|
||||
| -------------- | ---------------------- | ---------------------------------------------- |
|
||||
| `llm_generate` | `'[]'` | Pure reasoning, JSON output, no external calls |
|
||||
| `llm_tool_use` | `'["tool1", "tool2"]'` | Needs to call MCP tools |
|
||||
|
||||
---
|
||||
|
||||
## REFERENCE: Edge Conditions
|
||||
|
||||
| Condition | When edge is followed |
|
||||
| ------------- | ------------------------------------- |
|
||||
| `on_success` | Source node completed successfully |
|
||||
| `on_failure` | Source node failed |
|
||||
| `always` | Always, regardless of success/failure |
|
||||
| `conditional` | When condition_expr evaluates to True |
|
||||
|
||||
---
|
||||
|
||||
## REFERENCE: System Prompt Best Practice
|
||||
|
||||
For nodes with JSON output, include this in the system_prompt:
|
||||
|
||||
```
|
||||
CRITICAL: Return ONLY raw JSON. NO markdown, NO code blocks.
|
||||
Just the JSON object starting with { and ending with }.
|
||||
|
||||
Return this exact structure:
|
||||
{
|
||||
"key1": "...",
|
||||
"key2": "..."
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## COMMON MISTAKES TO AVOID
|
||||
|
||||
1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
|
||||
2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
|
||||
3. **Skipping validation** - Always validate nodes and graph before proceeding
|
||||
4. **Not waiting for approval** - Always ask user before major steps
|
||||
5. **Displaying this file** - Execute the steps, don't show documentation
|
||||
@@ -1,80 +0,0 @@
|
||||
# Online Research Agent
|
||||
|
||||
Deep-dive research agent that searches 10+ sources and produces comprehensive narrative reports with citations.
|
||||
|
||||
## Features
|
||||
|
||||
- Generates multiple search queries from a topic
|
||||
- Searches and fetches 15+ web sources
|
||||
- Evaluates and ranks sources by relevance
|
||||
- Synthesizes findings into themes
|
||||
- Writes narrative report with numbered citations
|
||||
- Quality checks for uncited claims
|
||||
- Saves report to local markdown file
|
||||
|
||||
## Usage
|
||||
|
||||
### CLI
|
||||
|
||||
```bash
|
||||
# Show agent info
|
||||
python -m online_research_agent info
|
||||
|
||||
# Validate structure
|
||||
python -m online_research_agent validate
|
||||
|
||||
# Run research on a topic
|
||||
python -m online_research_agent run --topic "impact of AI on healthcare"
|
||||
|
||||
# Interactive shell
|
||||
python -m online_research_agent shell
|
||||
```
|
||||
|
||||
### Python API
|
||||
|
||||
```python
|
||||
from online_research_agent import default_agent
|
||||
|
||||
# Simple usage
|
||||
result = await default_agent.run({"topic": "climate change solutions"})
|
||||
|
||||
# Check output
|
||||
if result.success:
|
||||
print(f"Report saved to: {result.output['file_path']}")
|
||||
print(result.output['final_report'])
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
```
|
||||
parse-query → search-sources → fetch-content → evaluate-sources
|
||||
↓
|
||||
write-report ← synthesize-findings
|
||||
↓
|
||||
quality-check → save-report
|
||||
```
|
||||
|
||||
## Output
|
||||
|
||||
Reports are saved to `./research_reports/` as markdown files with:
|
||||
|
||||
1. Executive Summary
|
||||
2. Introduction
|
||||
3. Key Findings (by theme)
|
||||
4. Analysis
|
||||
5. Conclusion
|
||||
6. References
|
||||
|
||||
## Requirements
|
||||
|
||||
- Python 3.11+
|
||||
- LLM provider API key (Groq, Cerebras, etc.)
|
||||
- Internet access for web search/fetch
|
||||
|
||||
## Configuration
|
||||
|
||||
Edit `config.py` to change:
|
||||
|
||||
- `model`: LLM model (default: groq/moonshotai/kimi-k2-instruct-0905)
|
||||
- `temperature`: Generation temperature (default: 0.7)
|
||||
- `max_tokens`: Max tokens per response (default: 16384)
|
||||
-23
@@ -1,23 +0,0 @@
|
||||
"""
|
||||
Online Research Agent - Deep-dive research with narrative reports.
|
||||
|
||||
Research any topic by searching multiple sources, synthesizing information,
|
||||
and producing a well-structured narrative report with citations.
|
||||
"""
|
||||
|
||||
from .agent import OnlineResearchAgent, default_agent, goal, nodes, edges
|
||||
from .config import RuntimeConfig, AgentMetadata, default_config, metadata
|
||||
|
||||
__version__ = "1.0.0"
|
||||
|
||||
__all__ = [
|
||||
"OnlineResearchAgent",
|
||||
"default_agent",
|
||||
"goal",
|
||||
"nodes",
|
||||
"edges",
|
||||
"RuntimeConfig",
|
||||
"AgentMetadata",
|
||||
"default_config",
|
||||
"metadata",
|
||||
]
|
||||
@@ -1,419 +0,0 @@
|
||||
"""Agent graph construction for Online Research Agent."""
|
||||
|
||||
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
|
||||
from framework.graph.edge import GraphSpec
|
||||
from framework.graph.executor import ExecutionResult
|
||||
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
|
||||
from framework.runtime.execution_stream import EntryPointSpec
|
||||
from framework.llm import LiteLLMProvider
|
||||
from framework.runner.tool_registry import ToolRegistry
|
||||
|
||||
from .config import default_config, metadata
|
||||
from .nodes import (
|
||||
parse_query_node,
|
||||
search_sources_node,
|
||||
fetch_content_node,
|
||||
evaluate_sources_node,
|
||||
synthesize_findings_node,
|
||||
write_report_node,
|
||||
quality_check_node,
|
||||
save_report_node,
|
||||
)
|
||||
|
||||
# Goal definition
|
||||
goal = Goal(
|
||||
id="comprehensive-online-research",
|
||||
name="Comprehensive Online Research",
|
||||
description="Research any topic by searching multiple sources, synthesizing information, and producing a well-structured narrative report with citations.",
|
||||
success_criteria=[
|
||||
SuccessCriterion(
|
||||
id="source-coverage",
|
||||
description="Query 10+ diverse sources",
|
||||
metric="source_count",
|
||||
target=">=10",
|
||||
weight=0.20,
|
||||
),
|
||||
SuccessCriterion(
|
||||
id="relevance",
|
||||
description="All sources directly address the query",
|
||||
metric="relevance_score",
|
||||
target="90%",
|
||||
weight=0.25,
|
||||
),
|
||||
SuccessCriterion(
|
||||
id="synthesis",
|
||||
description="Synthesize findings into coherent narrative",
|
||||
metric="coherence_score",
|
||||
target="85%",
|
||||
weight=0.25,
|
||||
),
|
||||
SuccessCriterion(
|
||||
id="citations",
|
||||
description="Include citations for all claims",
|
||||
metric="citation_coverage",
|
||||
target="100%",
|
||||
weight=0.15,
|
||||
),
|
||||
SuccessCriterion(
|
||||
id="actionable",
|
||||
description="Report answers the user's question",
|
||||
metric="answer_completeness",
|
||||
target="90%",
|
||||
weight=0.15,
|
||||
),
|
||||
],
|
||||
constraints=[
|
||||
Constraint(
|
||||
id="no-hallucination",
|
||||
description="Only include information found in sources",
|
||||
constraint_type="quality",
|
||||
category="accuracy",
|
||||
),
|
||||
Constraint(
|
||||
id="source-attribution",
|
||||
description="Every factual claim must cite its source",
|
||||
constraint_type="quality",
|
||||
category="accuracy",
|
||||
),
|
||||
Constraint(
|
||||
id="recency-preference",
|
||||
description="Prefer recent sources when relevant",
|
||||
constraint_type="quality",
|
||||
category="relevance",
|
||||
),
|
||||
Constraint(
|
||||
id="no-paywalled",
|
||||
description="Avoid sources that require payment to access",
|
||||
constraint_type="functional",
|
||||
category="accessibility",
|
||||
),
|
||||
],
|
||||
)
|
||||
|
||||
# Node list
|
||||
nodes = [
|
||||
parse_query_node,
|
||||
search_sources_node,
|
||||
fetch_content_node,
|
||||
evaluate_sources_node,
|
||||
synthesize_findings_node,
|
||||
write_report_node,
|
||||
quality_check_node,
|
||||
save_report_node,
|
||||
]
|
||||
|
||||
# Edge definitions
|
||||
edges = [
|
||||
EdgeSpec(
|
||||
id="parse-to-search",
|
||||
source="parse-query",
|
||||
target="search-sources",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
priority=1,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="search-to-fetch",
|
||||
source="search-sources",
|
||||
target="fetch-content",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
priority=1,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="fetch-to-evaluate",
|
||||
source="fetch-content",
|
||||
target="evaluate-sources",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
priority=1,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="evaluate-to-synthesize",
|
||||
source="evaluate-sources",
|
||||
target="synthesize-findings",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
priority=1,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="synthesize-to-write",
|
||||
source="synthesize-findings",
|
||||
target="write-report",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
priority=1,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="write-to-quality",
|
||||
source="write-report",
|
||||
target="quality-check",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
priority=1,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="quality-to-save",
|
||||
source="quality-check",
|
||||
target="save-report",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
priority=1,
|
||||
),
|
||||
]
|
||||
|
||||
# Graph configuration
|
||||
entry_node = "parse-query"
|
||||
entry_points = {"start": "parse-query"}
|
||||
pause_nodes = []
|
||||
terminal_nodes = ["save-report"]
|
||||
|
||||
|
||||
class OnlineResearchAgent:
|
||||
"""
|
||||
Online Research Agent - Deep-dive research with narrative reports.
|
||||
|
||||
Uses AgentRuntime for multi-entrypoint support with HITL pause/resume.
|
||||
"""
|
||||
|
||||
def __init__(self, config=None):
|
||||
self.config = config or default_config
|
||||
self.goal = goal
|
||||
self.nodes = nodes
|
||||
self.edges = edges
|
||||
self.entry_node = entry_node
|
||||
self.entry_points = entry_points
|
||||
self.pause_nodes = pause_nodes
|
||||
self.terminal_nodes = terminal_nodes
|
||||
self._runtime: AgentRuntime | None = None
|
||||
self._graph: GraphSpec | None = None
|
||||
|
||||
def _build_entry_point_specs(self) -> list[EntryPointSpec]:
|
||||
"""Convert entry_points dict to EntryPointSpec list."""
|
||||
specs = []
|
||||
for ep_id, node_id in self.entry_points.items():
|
||||
if ep_id == "start":
|
||||
trigger_type = "manual"
|
||||
name = "Start"
|
||||
elif "_resume" in ep_id:
|
||||
trigger_type = "resume"
|
||||
name = f"Resume from {ep_id.replace('_resume', '')}"
|
||||
else:
|
||||
trigger_type = "manual"
|
||||
name = ep_id.replace("-", " ").title()
|
||||
|
||||
specs.append(
|
||||
EntryPointSpec(
|
||||
id=ep_id,
|
||||
name=name,
|
||||
entry_node=node_id,
|
||||
trigger_type=trigger_type,
|
||||
isolation_level="shared",
|
||||
)
|
||||
)
|
||||
return specs
|
||||
|
||||
def _create_runtime(self, mock_mode=False) -> AgentRuntime:
|
||||
"""Create AgentRuntime instance."""
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
# Persistent storage in ~/.hive for telemetry and run history
|
||||
storage_path = Path.home() / ".hive" / "online_research_agent"
|
||||
storage_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
tool_registry = ToolRegistry()
|
||||
|
||||
# Load MCP servers (always load, needed for tool validation)
|
||||
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
|
||||
if mcp_config_path.exists():
|
||||
tool_registry.load_mcp_config(mcp_config_path)
|
||||
|
||||
llm = None
|
||||
if not mock_mode:
|
||||
# LiteLLMProvider uses environment variables for API keys
|
||||
llm = LiteLLMProvider(
|
||||
model=self.config.model,
|
||||
api_key=self.config.api_key,
|
||||
api_base=self.config.api_base,
|
||||
)
|
||||
|
||||
self._graph = GraphSpec(
|
||||
id="online-research-agent-graph",
|
||||
goal_id=self.goal.id,
|
||||
version="1.0.0",
|
||||
entry_node=self.entry_node,
|
||||
entry_points=self.entry_points,
|
||||
terminal_nodes=self.terminal_nodes,
|
||||
pause_nodes=self.pause_nodes,
|
||||
nodes=self.nodes,
|
||||
edges=self.edges,
|
||||
default_model=self.config.model,
|
||||
max_tokens=self.config.max_tokens,
|
||||
)
|
||||
|
||||
# Create AgentRuntime with all entry points
|
||||
self._runtime = create_agent_runtime(
|
||||
graph=self._graph,
|
||||
goal=self.goal,
|
||||
storage_path=storage_path,
|
||||
entry_points=self._build_entry_point_specs(),
|
||||
llm=llm,
|
||||
tools=list(tool_registry.get_tools().values()),
|
||||
tool_executor=tool_registry.get_executor(),
|
||||
)
|
||||
|
||||
return self._runtime
|
||||
|
||||
async def start(self, mock_mode=False) -> None:
|
||||
"""Start the agent runtime."""
|
||||
if self._runtime is None:
|
||||
self._create_runtime(mock_mode=mock_mode)
|
||||
await self._runtime.start()
|
||||
|
||||
async def stop(self) -> None:
|
||||
"""Stop the agent runtime."""
|
||||
if self._runtime is not None:
|
||||
await self._runtime.stop()
|
||||
|
||||
async def trigger(
|
||||
self,
|
||||
entry_point: str,
|
||||
input_data: dict,
|
||||
correlation_id: str | None = None,
|
||||
session_state: dict | None = None,
|
||||
) -> str:
|
||||
"""
|
||||
Trigger execution at a specific entry point (non-blocking).
|
||||
|
||||
Args:
|
||||
entry_point: Entry point ID (e.g., "start", "pause-node_resume")
|
||||
input_data: Input data for the execution
|
||||
correlation_id: Optional ID to correlate related executions
|
||||
session_state: Optional session state to resume from (with paused_at, memory)
|
||||
|
||||
Returns:
|
||||
Execution ID for tracking
|
||||
"""
|
||||
if self._runtime is None or not self._runtime.is_running:
|
||||
raise RuntimeError("Agent runtime not started. Call start() first.")
|
||||
return await self._runtime.trigger(
|
||||
entry_point, input_data, correlation_id, session_state=session_state
|
||||
)
|
||||
|
||||
async def trigger_and_wait(
|
||||
self,
|
||||
entry_point: str,
|
||||
input_data: dict,
|
||||
timeout: float | None = None,
|
||||
session_state: dict | None = None,
|
||||
) -> ExecutionResult | None:
|
||||
"""
|
||||
Trigger execution and wait for completion.
|
||||
|
||||
Args:
|
||||
entry_point: Entry point ID
|
||||
input_data: Input data for the execution
|
||||
timeout: Maximum time to wait (seconds)
|
||||
session_state: Optional session state to resume from (with paused_at, memory)
|
||||
|
||||
Returns:
|
||||
ExecutionResult or None if timeout
|
||||
"""
|
||||
if self._runtime is None or not self._runtime.is_running:
|
||||
raise RuntimeError("Agent runtime not started. Call start() first.")
|
||||
return await self._runtime.trigger_and_wait(
|
||||
entry_point, input_data, timeout, session_state=session_state
|
||||
)
|
||||
|
||||
async def run(
|
||||
self, context: dict, mock_mode=False, session_state=None
|
||||
) -> ExecutionResult:
|
||||
"""
|
||||
Run the agent (convenience method for simple single execution).
|
||||
|
||||
For more control, use start() + trigger_and_wait() + stop().
|
||||
"""
|
||||
await self.start(mock_mode=mock_mode)
|
||||
try:
|
||||
# Determine entry point based on session_state
|
||||
if session_state and "paused_at" in session_state:
|
||||
paused_node = session_state["paused_at"]
|
||||
resume_key = f"{paused_node}_resume"
|
||||
if resume_key in self.entry_points:
|
||||
entry_point = resume_key
|
||||
else:
|
||||
entry_point = "start"
|
||||
else:
|
||||
entry_point = "start"
|
||||
|
||||
result = await self.trigger_and_wait(
|
||||
entry_point, context, session_state=session_state
|
||||
)
|
||||
return result or ExecutionResult(success=False, error="Execution timeout")
|
||||
finally:
|
||||
await self.stop()
|
||||
|
||||
async def get_goal_progress(self) -> dict:
|
||||
"""Get goal progress across all executions."""
|
||||
if self._runtime is None:
|
||||
raise RuntimeError("Agent runtime not started")
|
||||
return await self._runtime.get_goal_progress()
|
||||
|
||||
def get_stats(self) -> dict:
|
||||
"""Get runtime statistics."""
|
||||
if self._runtime is None:
|
||||
return {"running": False}
|
||||
return self._runtime.get_stats()
|
||||
|
||||
def info(self):
|
||||
"""Get agent information."""
|
||||
return {
|
||||
"name": metadata.name,
|
||||
"version": metadata.version,
|
||||
"description": metadata.description,
|
||||
"goal": {
|
||||
"name": self.goal.name,
|
||||
"description": self.goal.description,
|
||||
},
|
||||
"nodes": [n.id for n in self.nodes],
|
||||
"edges": [e.id for e in self.edges],
|
||||
"entry_node": self.entry_node,
|
||||
"entry_points": self.entry_points,
|
||||
"pause_nodes": self.pause_nodes,
|
||||
"terminal_nodes": self.terminal_nodes,
|
||||
"multi_entrypoint": True,
|
||||
}
|
||||
|
||||
def validate(self):
|
||||
"""Validate agent structure."""
|
||||
errors = []
|
||||
warnings = []
|
||||
|
||||
node_ids = {node.id for node in self.nodes}
|
||||
for edge in self.edges:
|
||||
if edge.source not in node_ids:
|
||||
errors.append(f"Edge {edge.id}: source '{edge.source}' not found")
|
||||
if edge.target not in node_ids:
|
||||
errors.append(f"Edge {edge.id}: target '{edge.target}' not found")
|
||||
|
||||
if self.entry_node not in node_ids:
|
||||
errors.append(f"Entry node '{self.entry_node}' not found")
|
||||
|
||||
for terminal in self.terminal_nodes:
|
||||
if terminal not in node_ids:
|
||||
errors.append(f"Terminal node '{terminal}' not found")
|
||||
|
||||
for pause in self.pause_nodes:
|
||||
if pause not in node_ids:
|
||||
errors.append(f"Pause node '{pause}' not found")
|
||||
|
||||
# Validate entry points
|
||||
for ep_id, node_id in self.entry_points.items():
|
||||
if node_id not in node_ids:
|
||||
errors.append(
|
||||
f"Entry point '{ep_id}' references unknown node '{node_id}'"
|
||||
)
|
||||
|
||||
return {
|
||||
"valid": len(errors) == 0,
|
||||
"errors": errors,
|
||||
"warnings": warnings,
|
||||
}
|
||||
|
||||
|
||||
# Create default instance
|
||||
default_agent = OnlineResearchAgent()
|
||||
-396
@@ -1,396 +0,0 @@
|
||||
"""Node definitions for Online Research Agent."""
|
||||
|
||||
from framework.graph import NodeSpec
|
||||
|
||||
# Node 1: Parse Query
|
||||
parse_query_node = NodeSpec(
|
||||
id="parse-query",
|
||||
name="Parse Query",
|
||||
description="Analyze the research topic and generate 3-5 diverse search queries to cover different aspects",
|
||||
node_type="llm_generate",
|
||||
input_keys=["topic"],
|
||||
output_keys=["search_queries", "research_focus", "key_aspects"],
|
||||
output_schema={
|
||||
"research_focus": {
|
||||
"type": "string",
|
||||
"required": True,
|
||||
"description": "Brief statement of what we're researching",
|
||||
},
|
||||
"key_aspects": {
|
||||
"type": "array",
|
||||
"required": True,
|
||||
"description": "List of 3-5 key aspects to investigate",
|
||||
},
|
||||
"search_queries": {
|
||||
"type": "array",
|
||||
"required": True,
|
||||
"description": "List of 3-5 search queries",
|
||||
},
|
||||
},
|
||||
system_prompt="""\
|
||||
You are a research query strategist. Given a research topic, analyze it and generate search queries.
|
||||
|
||||
Your task:
|
||||
1. Understand the core research question
|
||||
2. Identify 3-5 key aspects to investigate
|
||||
3. Generate 3-5 diverse search queries that will find comprehensive information
|
||||
|
||||
CRITICAL: Return ONLY raw JSON. NO markdown, NO code blocks.
|
||||
|
||||
Return this JSON structure:
|
||||
{
|
||||
"research_focus": "Brief statement of what we're researching",
|
||||
"key_aspects": ["aspect1", "aspect2", "aspect3"],
|
||||
"search_queries": [
|
||||
"query 1 - broad overview",
|
||||
"query 2 - specific angle",
|
||||
"query 3 - recent developments",
|
||||
"query 4 - expert opinions",
|
||||
"query 5 - data/statistics"
|
||||
]
|
||||
}
|
||||
""",
|
||||
tools=[],
|
||||
max_retries=3,
|
||||
)
|
||||
|
||||
# Node 2: Search Sources
|
||||
search_sources_node = NodeSpec(
|
||||
id="search-sources",
|
||||
name="Search Sources",
|
||||
description="Execute web searches using the generated queries to find 15+ source URLs",
|
||||
node_type="llm_tool_use",
|
||||
input_keys=["search_queries", "research_focus"],
|
||||
output_keys=["source_urls", "search_results_summary"],
|
||||
output_schema={
|
||||
"source_urls": {
|
||||
"type": "array",
|
||||
"required": True,
|
||||
"description": "List of source URLs found",
|
||||
},
|
||||
"search_results_summary": {
|
||||
"type": "string",
|
||||
"required": True,
|
||||
"description": "Brief summary of what was found",
|
||||
},
|
||||
},
|
||||
system_prompt="""\
|
||||
You are a research assistant executing web searches. Use the web_search tool to find sources.
|
||||
|
||||
Your task:
|
||||
1. Execute each search query using web_search tool
|
||||
2. Collect URLs from search results
|
||||
3. Aim for 15+ diverse sources
|
||||
|
||||
After searching, return JSON with found sources:
|
||||
{
|
||||
"source_urls": ["url1", "url2", ...],
|
||||
"search_results_summary": "Brief summary of what was found"
|
||||
}
|
||||
""",
|
||||
tools=["web_search"],
|
||||
max_retries=3,
|
||||
)
|
||||
|
||||
# Node 3: Fetch Content
|
||||
fetch_content_node = NodeSpec(
|
||||
id="fetch-content",
|
||||
name="Fetch Content",
|
||||
description="Fetch and extract content from the discovered source URLs",
|
||||
node_type="llm_tool_use",
|
||||
input_keys=["source_urls", "research_focus"],
|
||||
output_keys=["fetched_sources", "fetch_errors"],
|
||||
output_schema={
|
||||
"fetched_sources": {
|
||||
"type": "array",
|
||||
"required": True,
|
||||
"description": "List of fetched source objects with url, title, content",
|
||||
},
|
||||
"fetch_errors": {
|
||||
"type": "array",
|
||||
"required": True,
|
||||
"description": "List of URLs that failed to fetch",
|
||||
},
|
||||
},
|
||||
system_prompt="""\
|
||||
You are a content fetcher. Use web_scrape tool to retrieve content from URLs.
|
||||
|
||||
Your task:
|
||||
1. Fetch content from each source URL using web_scrape tool
|
||||
2. Extract the main content relevant to the research focus
|
||||
3. Track any URLs that failed to fetch
|
||||
|
||||
After fetching, return JSON:
|
||||
{
|
||||
"fetched_sources": [
|
||||
{"url": "...", "title": "...", "content": "extracted text..."},
|
||||
...
|
||||
],
|
||||
"fetch_errors": ["url that failed", ...]
|
||||
}
|
||||
""",
|
||||
tools=["web_scrape"],
|
||||
max_retries=3,
|
||||
)
|
||||
|
||||
# Node 4: Evaluate Sources
|
||||
evaluate_sources_node = NodeSpec(
|
||||
id="evaluate-sources",
|
||||
name="Evaluate Sources",
|
||||
description="Score sources for relevance and quality, filter to top 10",
|
||||
node_type="llm_generate",
|
||||
input_keys=["fetched_sources", "research_focus", "key_aspects"],
|
||||
output_keys=["ranked_sources", "source_analysis"],
|
||||
output_schema={
|
||||
"ranked_sources": {
|
||||
"type": "array",
|
||||
"required": True,
|
||||
"description": "List of ranked sources with scores",
|
||||
},
|
||||
"source_analysis": {
|
||||
"type": "string",
|
||||
"required": True,
|
||||
"description": "Overview of source quality and coverage",
|
||||
},
|
||||
},
|
||||
system_prompt="""\
|
||||
You are a source evaluator. Assess each source for quality and relevance.
|
||||
|
||||
Scoring criteria:
|
||||
- Relevance to research focus (1-10)
|
||||
- Source credibility (1-10)
|
||||
- Information depth (1-10)
|
||||
- Recency if relevant (1-10)
|
||||
|
||||
Your task:
|
||||
1. Score each source
|
||||
2. Rank by combined score
|
||||
3. Select top 10 sources
|
||||
4. Note what each source uniquely contributes
|
||||
|
||||
Return JSON:
|
||||
{
|
||||
"ranked_sources": [
|
||||
{"url": "...", "title": "...", "content": "...", "score": 8.5, "unique_value": "..."},
|
||||
...
|
||||
],
|
||||
"source_analysis": "Overview of source quality and coverage"
|
||||
}
|
||||
""",
|
||||
tools=[],
|
||||
max_retries=3,
|
||||
)
|
||||
|
||||
# Node 5: Synthesize Findings
|
||||
synthesize_findings_node = NodeSpec(
|
||||
id="synthesize-findings",
|
||||
name="Synthesize Findings",
|
||||
description="Extract key facts from sources and identify common themes",
|
||||
node_type="llm_generate",
|
||||
input_keys=["ranked_sources", "research_focus", "key_aspects"],
|
||||
output_keys=["key_findings", "themes", "source_citations"],
|
||||
output_schema={
|
||||
"key_findings": {
|
||||
"type": "array",
|
||||
"required": True,
|
||||
"description": "List of key findings with sources and confidence",
|
||||
},
|
||||
"themes": {
|
||||
"type": "array",
|
||||
"required": True,
|
||||
"description": "List of themes with descriptions and supporting sources",
|
||||
},
|
||||
"source_citations": {
|
||||
"type": "object",
|
||||
"required": True,
|
||||
"description": "Map of facts to supporting URLs",
|
||||
},
|
||||
},
|
||||
system_prompt="""\
|
||||
You are a research synthesizer. Analyze multiple sources to extract insights.
|
||||
|
||||
Your task:
|
||||
1. Identify key facts from each source
|
||||
2. Find common themes across sources
|
||||
3. Note contradictions or debates
|
||||
4. Build a citation map (fact -> source URL)
|
||||
|
||||
Return JSON:
|
||||
{
|
||||
"key_findings": [
|
||||
{"finding": "...", "sources": ["url1", "url2"], "confidence": "high/medium/low"},
|
||||
...
|
||||
],
|
||||
"themes": [
|
||||
{"theme": "...", "description": "...", "supporting_sources": ["url1", ...]},
|
||||
...
|
||||
],
|
||||
"source_citations": {
|
||||
"fact or claim": ["supporting url1", "url2"],
|
||||
...
|
||||
}
|
||||
}
|
||||
""",
|
||||
tools=[],
|
||||
max_retries=3,
|
||||
)
|
||||
|
||||
# Node 6: Write Report
|
||||
write_report_node = NodeSpec(
|
||||
id="write-report",
|
||||
name="Write Report",
|
||||
description="Generate a narrative report with proper citations",
|
||||
node_type="llm_generate",
|
||||
input_keys=[
|
||||
"key_findings",
|
||||
"themes",
|
||||
"source_citations",
|
||||
"research_focus",
|
||||
"ranked_sources",
|
||||
],
|
||||
output_keys=["report_content", "references"],
|
||||
output_schema={
|
||||
"report_content": {
|
||||
"type": "string",
|
||||
"required": True,
|
||||
"description": "Full markdown report text with citations",
|
||||
},
|
||||
"references": {
|
||||
"type": "array",
|
||||
"required": True,
|
||||
"description": "List of reference objects with number, url, title",
|
||||
},
|
||||
},
|
||||
system_prompt="""\
|
||||
You are a research report writer. Create a well-structured narrative report.
|
||||
|
||||
Report structure:
|
||||
1. Executive Summary (2-3 paragraphs)
|
||||
2. Introduction (context and scope)
|
||||
3. Key Findings (organized by theme)
|
||||
4. Analysis (synthesis and implications)
|
||||
5. Conclusion
|
||||
6. References (numbered list of all sources)
|
||||
|
||||
Citation format: Use numbered citations like [1], [2] that correspond to the References section.
|
||||
|
||||
IMPORTANT:
|
||||
- Every factual claim MUST have a citation
|
||||
- Write in clear, professional prose
|
||||
- Be objective and balanced
|
||||
- Highlight areas of consensus and debate
|
||||
|
||||
Return JSON:
|
||||
{
|
||||
"report_content": "Full markdown report text with citations...",
|
||||
"references": [
|
||||
{"number": 1, "url": "...", "title": "..."},
|
||||
...
|
||||
]
|
||||
}
|
||||
""",
|
||||
tools=[],
|
||||
max_retries=3,
|
||||
)
|
||||
|
||||
# Node 7: Quality Check
|
||||
quality_check_node = NodeSpec(
|
||||
id="quality-check",
|
||||
name="Quality Check",
|
||||
description="Verify all claims have citations and report is coherent",
|
||||
node_type="llm_generate",
|
||||
input_keys=["report_content", "references", "source_citations"],
|
||||
output_keys=["quality_score", "issues", "final_report"],
|
||||
output_schema={
|
||||
"quality_score": {
|
||||
"type": "number",
|
||||
"required": True,
|
||||
"description": "Quality score 0-1",
|
||||
},
|
||||
"issues": {
|
||||
"type": "array",
|
||||
"required": True,
|
||||
"description": "List of issues found and fixed",
|
||||
},
|
||||
"final_report": {
|
||||
"type": "string",
|
||||
"required": True,
|
||||
"description": "Corrected full report",
|
||||
},
|
||||
},
|
||||
system_prompt="""\
|
||||
You are a quality assurance reviewer. Check the research report for issues.
|
||||
|
||||
Check for:
|
||||
1. Uncited claims (factual statements without [n] citation)
|
||||
2. Broken citations (references to non-existent numbers)
|
||||
3. Coherence (logical flow between sections)
|
||||
4. Completeness (all key aspects covered)
|
||||
5. Accuracy (claims match source content)
|
||||
|
||||
If issues found, fix them in the final report.
|
||||
|
||||
Return JSON:
|
||||
{
|
||||
"quality_score": 0.95,
|
||||
"issues": [
|
||||
{"type": "uncited_claim", "location": "paragraph 3", "fixed": true},
|
||||
...
|
||||
],
|
||||
"final_report": "Corrected full report with all issues fixed..."
|
||||
}
|
||||
""",
|
||||
tools=[],
|
||||
max_retries=3,
|
||||
)
|
||||
|
||||
# Node 8: Save Report
|
||||
save_report_node = NodeSpec(
|
||||
id="save-report",
|
||||
name="Save Report",
|
||||
description="Write the final report to a local markdown file",
|
||||
node_type="llm_tool_use",
|
||||
input_keys=["final_report", "references", "research_focus"],
|
||||
output_keys=["file_path", "save_status"],
|
||||
output_schema={
|
||||
"file_path": {
|
||||
"type": "string",
|
||||
"required": True,
|
||||
"description": "Path where report was saved",
|
||||
},
|
||||
"save_status": {
|
||||
"type": "string",
|
||||
"required": True,
|
||||
"description": "Status of save operation",
|
||||
},
|
||||
},
|
||||
system_prompt="""\
|
||||
You are a file manager. Save the research report to disk.
|
||||
|
||||
Your task:
|
||||
1. Generate a filename from the research focus (slugified, with date)
|
||||
2. Use the write_to_file tool to save the report as markdown
|
||||
3. Save to the ./research_reports/ directory
|
||||
|
||||
Filename format: research_YYYY-MM-DD_topic-slug.md
|
||||
|
||||
Return JSON:
|
||||
{
|
||||
"file_path": "research_reports/research_2026-01-23_topic-name.md",
|
||||
"save_status": "success"
|
||||
}
|
||||
""",
|
||||
tools=["write_to_file"],
|
||||
max_retries=3,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"parse_query_node",
|
||||
"search_sources_node",
|
||||
"fetch_content_node",
|
||||
"evaluate_sources_node",
|
||||
"synthesize_findings_node",
|
||||
"write_report_node",
|
||||
"quality_check_node",
|
||||
"save_report_node",
|
||||
]
|
||||
@@ -1,303 +0,0 @@
|
||||
---
|
||||
name: building-agents-core
|
||||
description: Core concepts for goal-driven agents - architecture, node types, tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals.
|
||||
license: Apache-2.0
|
||||
metadata:
|
||||
author: hive
|
||||
version: "1.0"
|
||||
type: foundational
|
||||
part_of: building-agents
|
||||
---
|
||||
|
||||
# Building Agents - Core Concepts
|
||||
|
||||
Foundational knowledge for building goal-driven agents as Python packages.
|
||||
|
||||
## Architecture: Python Services (Not JSON Configs)
|
||||
|
||||
Agents are built as Python packages:
|
||||
|
||||
```
|
||||
exports/my_agent/
|
||||
├── __init__.py # Package exports
|
||||
├── __main__.py # CLI (run, info, validate, shell)
|
||||
├── agent.py # Graph construction (goal, edges, agent class)
|
||||
├── nodes/__init__.py # Node definitions (NodeSpec)
|
||||
├── config.py # Runtime config
|
||||
└── README.md # Documentation
|
||||
```
|
||||
|
||||
**Key Principle: Agent is visible and editable during build**
|
||||
|
||||
- ✅ Files created immediately as components are approved
|
||||
- ✅ User can watch files grow in their editor
|
||||
- ✅ No session state - just direct file writes
|
||||
- ✅ No "export" step - agent is ready when build completes
|
||||
|
||||
## Core Concepts
|
||||
|
||||
### Goal
|
||||
|
||||
Success criteria and constraints (written to agent.py)
|
||||
|
||||
```python
|
||||
goal = Goal(
|
||||
id="research-goal",
|
||||
name="Technical Research Agent",
|
||||
description="Research technical topics thoroughly",
|
||||
success_criteria=[
|
||||
SuccessCriterion(
|
||||
id="completeness",
|
||||
description="Cover all aspects of topic",
|
||||
metric="coverage_score",
|
||||
target=">=0.9",
|
||||
weight=0.4,
|
||||
),
|
||||
# 3-5 success criteria total
|
||||
],
|
||||
constraints=[
|
||||
Constraint(
|
||||
id="accuracy",
|
||||
description="All information must be verified",
|
||||
constraint_type="hard",
|
||||
category="quality",
|
||||
),
|
||||
# 1-5 constraints total
|
||||
],
|
||||
)
|
||||
```
|
||||
|
||||
### Node
|
||||
|
||||
Unit of work (written to nodes/__init__.py)
|
||||
|
||||
**Node Types:**
|
||||
|
||||
- `llm_generate` - Text generation, parsing
|
||||
- `llm_tool_use` - Actions requiring tools
|
||||
- `router` - Conditional branching
|
||||
- `function` - Deterministic operations
|
||||
|
||||
```python
|
||||
search_node = NodeSpec(
|
||||
id="search-web",
|
||||
name="Search Web",
|
||||
description="Search for information online",
|
||||
node_type="llm_tool_use",
|
||||
input_keys=["query"],
|
||||
output_keys=["search_results"],
|
||||
system_prompt="Search the web for: {query}",
|
||||
tools=["web_search"],
|
||||
max_retries=3,
|
||||
)
|
||||
```
|
||||
|
||||
### Edge
|
||||
|
||||
Connection between nodes (written to agent.py)
|
||||
|
||||
**Edge Conditions:**
|
||||
|
||||
- `on_success` - Proceed if node succeeds
|
||||
- `on_failure` - Handle errors
|
||||
- `always` - Always proceed
|
||||
- `conditional` - Based on expression
|
||||
|
||||
```python
|
||||
EdgeSpec(
|
||||
id="search-to-analyze",
|
||||
source="search-web",
|
||||
target="analyze-results",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
priority=1,
|
||||
)
|
||||
```
|
||||
|
||||
### Pause/Resume
|
||||
|
||||
Multi-turn conversations
|
||||
|
||||
- **Pause nodes** - Stop execution, wait for user input
|
||||
- **Resume entry points** - Continue from pause with user's response
|
||||
|
||||
```python
|
||||
# Example pause/resume configuration
|
||||
pause_nodes = ["request-clarification"]
|
||||
entry_points = {
|
||||
"start": "analyze-request",
|
||||
"request-clarification_resume": "process-clarification"
|
||||
}
|
||||
```
|
||||
|
||||
## Tool Discovery & Validation
|
||||
|
||||
**CRITICAL:** Before adding a node with tools, you MUST verify the tools exist.
|
||||
|
||||
Tools are provided by MCP servers. Never assume a tool exists - always discover dynamically.
|
||||
|
||||
### Step 1: Register MCP Server (if not already done)
|
||||
|
||||
```python
|
||||
mcp__agent-builder__add_mcp_server(
|
||||
name="tools",
|
||||
transport="stdio",
|
||||
command="python",
|
||||
args='["mcp_server.py", "--stdio"]',
|
||||
cwd="../tools"
|
||||
)
|
||||
```
|
||||
|
||||
### Step 2: Discover Available Tools
|
||||
|
||||
```python
|
||||
# List all tools from all registered servers
|
||||
mcp__agent-builder__list_mcp_tools()
|
||||
|
||||
# Or list tools from a specific server
|
||||
mcp__agent-builder__list_mcp_tools(server_name="tools")
|
||||
```
|
||||
|
||||
This returns available tools with their descriptions and parameters:
|
||||
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"tools_by_server": {
|
||||
"tools": [
|
||||
{
|
||||
"name": "web_search",
|
||||
"description": "Search the web...",
|
||||
"parameters": ["query"]
|
||||
},
|
||||
{
|
||||
"name": "web_scrape",
|
||||
"description": "Scrape a URL...",
|
||||
"parameters": ["url"]
|
||||
}
|
||||
]
|
||||
},
|
||||
"total_tools": 14
|
||||
}
|
||||
```
|
||||
|
||||
### Step 3: Validate Before Adding Nodes
|
||||
|
||||
Before writing a node with `tools=[...]`:
|
||||
|
||||
1. Call `list_mcp_tools()` to get available tools
|
||||
2. Check each tool in your node exists in the response
|
||||
3. If a tool doesn't exist:
|
||||
- **DO NOT proceed** with the node
|
||||
- Inform the user: "The tool 'X' is not available. Available tools are: ..."
|
||||
- Ask if they want to use an alternative or proceed without the tool
|
||||
|
||||
### Tool Validation Anti-Patterns
|
||||
|
||||
❌ **Never assume a tool exists** - always call `list_mcp_tools()` first
|
||||
❌ **Never write a node with unverified tools** - validate before writing
|
||||
❌ **Never silently drop tools** - if a tool doesn't exist, inform the user
|
||||
❌ **Never guess tool names** - use exact names from discovery response
|
||||
|
||||
### Example Validation Flow
|
||||
|
||||
```python
|
||||
# 1. User requests: "Add a node that searches the web"
|
||||
# 2. Discover available tools
|
||||
tools_response = mcp__agent-builder__list_mcp_tools()
|
||||
|
||||
# 3. Check if web_search exists
|
||||
available = [t["name"] for tools in tools_response["tools_by_server"].values() for t in tools]
|
||||
if "web_search" not in available:
|
||||
# Inform user and ask how to proceed
|
||||
print("❌ 'web_search' not available. Available tools:", available)
|
||||
else:
|
||||
# Proceed with node creation
|
||||
# ...
|
||||
```
|
||||
|
||||
## Workflow Overview: Incremental File Construction
|
||||
|
||||
```
|
||||
1. CREATE PACKAGE → mkdir + write skeletons
|
||||
2. DEFINE GOAL → Write to agent.py + config.py
|
||||
3. FOR EACH NODE:
|
||||
- Propose design
|
||||
- User approves
|
||||
- Write to nodes/__init__.py IMMEDIATELY ← FILE WRITTEN
|
||||
- (Optional) Validate with test_node ← MCP VALIDATION
|
||||
- User can open file and see it
|
||||
4. CONNECT EDGES → Update agent.py ← FILE WRITTEN
|
||||
- (Optional) Validate with validate_graph ← MCP VALIDATION
|
||||
5. FINALIZE → Write agent class to agent.py ← FILE WRITTEN
|
||||
6. DONE - Agent ready at exports/my_agent/
|
||||
```
|
||||
|
||||
**Files written immediately. MCP tools optional for validation/testing bookkeeping.**
|
||||
|
||||
### The Key Difference
|
||||
|
||||
**OLD (Bad):**
|
||||
|
||||
```
|
||||
MCP add_node → Session State → MCP add_node → Session State → ...
|
||||
↓
|
||||
MCP export_graph
|
||||
↓
|
||||
Files appear
|
||||
```
|
||||
|
||||
**NEW (Good):**
|
||||
|
||||
```
|
||||
Write node to file → (Optional: MCP test_node) → Write node to file → ...
|
||||
↓ ↓
|
||||
File visible File visible
|
||||
immediately immediately
|
||||
```
|
||||
|
||||
**Bottom line:** Use Write/Edit for construction, MCP for validation if needed.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use building-agents-core when:
|
||||
- Starting a new agent project and need to understand fundamentals
|
||||
- Need to understand agent architecture before building
|
||||
- Want to validate tool availability before proceeding
|
||||
- Learning about node types, edges, and graph execution
|
||||
|
||||
**Next Steps:**
|
||||
- Ready to build? → Use `building-agents-construction` skill
|
||||
- Need patterns and examples? → Use `building-agents-patterns` skill
|
||||
|
||||
## MCP Tools for Validation
|
||||
|
||||
After writing files, optionally use MCP tools for validation:
|
||||
|
||||
**test_node** - Validate node configuration with mock inputs
|
||||
```python
|
||||
mcp__agent-builder__test_node(
|
||||
node_id="search-web",
|
||||
test_input='{"query": "test query"}',
|
||||
mock_llm_response='{"results": "mock output"}'
|
||||
)
|
||||
```
|
||||
|
||||
**validate_graph** - Check graph structure
|
||||
```python
|
||||
mcp__agent-builder__validate_graph()
|
||||
# Returns: unreachable nodes, missing connections, etc.
|
||||
```
|
||||
|
||||
**create_session** - Track session state for bookkeeping
|
||||
```python
|
||||
mcp__agent-builder__create_session(session_name="my-build")
|
||||
```
|
||||
|
||||
**Key Point:** Files are written FIRST. MCP tools are for validation only.
|
||||
|
||||
## Related Skills
|
||||
|
||||
- **building-agents-construction** - Step-by-step building process
|
||||
- **building-agents-patterns** - Best practices and examples
|
||||
- **agent-workflow** - Complete workflow orchestrator
|
||||
- **testing-agent** - Test and validate completed agents
|
||||
@@ -1,497 +0,0 @@
|
||||
---
|
||||
name: building-agents-patterns
|
||||
description: Best practices, patterns, and examples for building goal-driven agents. Includes pause/resume architecture, hybrid workflows, anti-patterns, and handoff to testing. Use when optimizing agent design.
|
||||
license: Apache-2.0
|
||||
metadata:
|
||||
author: hive
|
||||
version: "1.0"
|
||||
type: reference
|
||||
part_of: building-agents
|
||||
---
|
||||
|
||||
# Building Agents - Patterns & Best Practices
|
||||
|
||||
Design patterns, examples, and best practices for building robust goal-driven agents.
|
||||
|
||||
**Prerequisites:** Complete agent structure using `building-agents-construction`.
|
||||
|
||||
## Practical Example: Hybrid Workflow
|
||||
|
||||
How to build a node using both direct file writes and optional MCP validation:
|
||||
|
||||
```python
|
||||
# 1. WRITE TO FILE FIRST (Primary - makes it visible)
|
||||
node_code = '''
|
||||
search_node = NodeSpec(
|
||||
id="search-web",
|
||||
node_type="llm_tool_use",
|
||||
input_keys=["query"],
|
||||
output_keys=["search_results"],
|
||||
system_prompt="Search the web for: {query}",
|
||||
tools=["web_search"],
|
||||
)
|
||||
'''
|
||||
|
||||
Edit(
|
||||
file_path="exports/research_agent/nodes/__init__.py",
|
||||
old_string="# Nodes will be added here",
|
||||
new_string=node_code
|
||||
)
|
||||
|
||||
print("✅ Added search_node to nodes/__init__.py")
|
||||
print("📁 Open exports/research_agent/nodes/__init__.py to see it!")
|
||||
|
||||
# 2. OPTIONALLY VALIDATE WITH MCP (Secondary - bookkeeping)
|
||||
validation = mcp__agent-builder__test_node(
|
||||
node_id="search-web",
|
||||
test_input='{"query": "python tutorials"}',
|
||||
mock_llm_response='{"search_results": [...mock results...]}'
|
||||
)
|
||||
|
||||
print(f"✓ Validation: {validation['success']}")
|
||||
```
|
||||
|
||||
**User experience:**
|
||||
|
||||
- Immediately sees node in their editor (from step 1)
|
||||
- Gets validation feedback (from step 2)
|
||||
- Can edit the file directly if needed
|
||||
|
||||
This combines visibility (files) with validation (MCP tools).
|
||||
|
||||
## Pause/Resume Architecture
|
||||
|
||||
For agents needing multi-turn conversations with user interaction:
|
||||
|
||||
### Basic Pause/Resume Flow
|
||||
|
||||
```python
|
||||
# Define pause nodes - execution stops at these nodes
|
||||
pause_nodes = ["request-clarification", "await-approval"]
|
||||
|
||||
# Define entry points - where to resume from each pause
|
||||
entry_points = {
|
||||
"start": "analyze-request", # Initial entry
|
||||
"request-clarification_resume": "process-clarification", # Resume from clarification
|
||||
"await-approval_resume": "execute-action", # Resume from approval
|
||||
}
|
||||
```
|
||||
|
||||
### Example: Multi-Turn Research Agent
|
||||
|
||||
```python
|
||||
# Nodes
|
||||
nodes = [
|
||||
NodeSpec(id="analyze-request", ...),
|
||||
NodeSpec(id="request-clarification", ...), # PAUSE NODE
|
||||
NodeSpec(id="process-clarification", ...),
|
||||
NodeSpec(id="generate-results", ...),
|
||||
NodeSpec(id="await-approval", ...), # PAUSE NODE
|
||||
NodeSpec(id="execute-action", ...),
|
||||
]
|
||||
|
||||
# Edges with resume flows
|
||||
edges = [
|
||||
EdgeSpec(
|
||||
id="analyze-to-clarify",
|
||||
source="analyze-request",
|
||||
target="request-clarification",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="needs_clarification == true",
|
||||
),
|
||||
# When resumed, goes to process-clarification
|
||||
EdgeSpec(
|
||||
id="clarify-to-process",
|
||||
source="request-clarification",
|
||||
target="process-clarification",
|
||||
condition=EdgeCondition.ALWAYS,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="results-to-approval",
|
||||
source="generate-results",
|
||||
target="await-approval",
|
||||
condition=EdgeCondition.ALWAYS,
|
||||
),
|
||||
# When resumed, goes to execute-action
|
||||
EdgeSpec(
|
||||
id="approval-to-execute",
|
||||
source="await-approval",
|
||||
target="execute-action",
|
||||
condition=EdgeCondition.ALWAYS,
|
||||
),
|
||||
]
|
||||
|
||||
# Configuration
|
||||
pause_nodes = ["request-clarification", "await-approval"]
|
||||
entry_points = {
|
||||
"start": "analyze-request",
|
||||
"request-clarification_resume": "process-clarification",
|
||||
"await-approval_resume": "execute-action",
|
||||
}
|
||||
```
|
||||
|
||||
### Running Pause/Resume Agents
|
||||
|
||||
```python
|
||||
# Initial run - will pause at first pause node
|
||||
result1 = await agent.run(
|
||||
context={"query": "research topic"},
|
||||
session_state=None
|
||||
)
|
||||
|
||||
# Check if paused
|
||||
if result1.paused_at:
|
||||
print(f"Paused at: {result1.paused_at}")
|
||||
|
||||
# Resume with user input
|
||||
result2 = await agent.run(
|
||||
context={"user_response": "clarification details"},
|
||||
session_state=result1.session_state # Pass previous state
|
||||
)
|
||||
```
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
### What NOT to Do
|
||||
|
||||
❌ **Don't rely on `export_graph`** - Write files immediately, not at end
|
||||
```python
|
||||
# BAD: Building in session state, exporting at end
|
||||
mcp__agent-builder__add_node(...)
|
||||
mcp__agent-builder__add_node(...)
|
||||
mcp__agent-builder__export_graph() # Files appear only now
|
||||
|
||||
# GOOD: Writing files immediately
|
||||
Write(file_path="...", content=node_code) # File visible now
|
||||
Write(file_path="...", content=node_code) # File visible now
|
||||
```
|
||||
|
||||
❌ **Don't hide code in session** - Write to files as components approved
|
||||
```python
|
||||
# BAD: Accumulating changes invisibly
|
||||
session.add_component(component1)
|
||||
session.add_component(component2)
|
||||
# User can't see anything yet
|
||||
|
||||
# GOOD: Incremental visibility
|
||||
Edit(file_path="...", ...) # User sees change 1
|
||||
Edit(file_path="...", ...) # User sees change 2
|
||||
```
|
||||
|
||||
❌ **Don't wait to write files** - Agent visible from first step
|
||||
```python
|
||||
# BAD: Building everything before writing
|
||||
design_all_nodes()
|
||||
design_all_edges()
|
||||
write_everything_at_once()
|
||||
|
||||
# GOOD: Write as you go
|
||||
write_package_structure() # Visible
|
||||
write_goal() # Visible
|
||||
write_node_1() # Visible
|
||||
write_node_2() # Visible
|
||||
```
|
||||
|
||||
❌ **Don't batch everything** - Write incrementally
|
||||
```python
|
||||
# BAD: Batching all nodes
|
||||
nodes = [design_node_1(), design_node_2(), ...]
|
||||
write_all_nodes(nodes)
|
||||
|
||||
# GOOD: One at a time with user feedback
|
||||
write_node_1() # User approves
|
||||
write_node_2() # User approves
|
||||
write_node_3() # User approves
|
||||
```
|
||||
|
||||
### MCP Tools - Correct Usage
|
||||
|
||||
**MCP tools OK for:**
|
||||
✅ `test_node` - Validate node configuration with mock inputs
|
||||
✅ `validate_graph` - Check graph structure
|
||||
✅ `create_session` - Track session state for bookkeeping
|
||||
✅ Other validation tools
|
||||
|
||||
**Just don't:** Use MCP as the primary construction method or rely on export_graph
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Show Progress After Each Write
|
||||
|
||||
```python
|
||||
# After writing a node
|
||||
print("✅ Added analyze_request_node to nodes/__init__.py")
|
||||
print("📊 Progress: 1/6 nodes added")
|
||||
print("📁 Open exports/my_agent/nodes/__init__.py to see it!")
|
||||
```
|
||||
|
||||
### 2. Let User Open Files During Build
|
||||
|
||||
```python
|
||||
# Encourage file inspection
|
||||
print("✅ Goal written to agent.py")
|
||||
print("")
|
||||
print("💡 Tip: Open exports/my_agent/agent.py in your editor to see the goal!")
|
||||
```
|
||||
|
||||
### 3. Write Incrementally - One Component at a Time
|
||||
|
||||
```python
|
||||
# Good flow
|
||||
write_package_structure()
|
||||
show_user("Package created")
|
||||
|
||||
write_goal()
|
||||
show_user("Goal written")
|
||||
|
||||
for node in nodes:
|
||||
get_approval(node)
|
||||
write_node(node)
|
||||
show_user(f"Node {node.id} written")
|
||||
```
|
||||
|
||||
### 4. Test As You Build
|
||||
|
||||
```python
|
||||
# After adding several nodes
|
||||
print("💡 You can test current state with:")
|
||||
print(" PYTHONPATH=core:exports python -m my_agent validate")
|
||||
print(" PYTHONPATH=core:exports python -m my_agent info")
|
||||
```
|
||||
|
||||
### 5. Keep User Informed
|
||||
|
||||
```python
|
||||
# Clear status updates
|
||||
print("🔨 Creating package structure...")
|
||||
print("✅ Package created: exports/my_agent/")
|
||||
print("")
|
||||
print("📝 Next: Define agent goal")
|
||||
```
|
||||
|
||||
## Continuous Monitoring Agents
|
||||
|
||||
For agents that run continuously without terminal nodes:
|
||||
|
||||
```python
|
||||
# No terminal nodes - loops forever
|
||||
terminal_nodes = []
|
||||
|
||||
# Workflow loops back to start
|
||||
edges = [
|
||||
EdgeSpec(id="monitor-to-check", source="monitor", target="check-condition"),
|
||||
EdgeSpec(id="check-to-wait", source="check-condition", target="wait"),
|
||||
EdgeSpec(id="wait-to-monitor", source="wait", target="monitor"), # Loop
|
||||
]
|
||||
|
||||
# Entry node only
|
||||
entry_node = "monitor"
|
||||
entry_points = {"start": "monitor"}
|
||||
pause_nodes = []
|
||||
```
|
||||
|
||||
**Example: File Monitor**
|
||||
|
||||
```python
|
||||
nodes = [
|
||||
NodeSpec(id="list-files", ...),
|
||||
NodeSpec(id="check-new-files", node_type="router", ...),
|
||||
NodeSpec(id="process-files", ...),
|
||||
NodeSpec(id="wait-interval", node_type="function", ...),
|
||||
]
|
||||
|
||||
edges = [
|
||||
EdgeSpec(id="list-to-check", source="list-files", target="check-new-files"),
|
||||
EdgeSpec(
|
||||
id="check-to-process",
|
||||
source="check-new-files",
|
||||
target="process-files",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="new_files_count > 0",
|
||||
),
|
||||
EdgeSpec(
|
||||
id="check-to-wait",
|
||||
source="check-new-files",
|
||||
target="wait-interval",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="new_files_count == 0",
|
||||
),
|
||||
EdgeSpec(id="process-to-wait", source="process-files", target="wait-interval"),
|
||||
EdgeSpec(id="wait-to-list", source="wait-interval", target="list-files"), # Loop back
|
||||
]
|
||||
|
||||
terminal_nodes = [] # No terminal - runs forever
|
||||
```
|
||||
|
||||
## Complex Routing Patterns
|
||||
|
||||
### Multi-Condition Router
|
||||
|
||||
```python
|
||||
router_node = NodeSpec(
|
||||
id="decision-router",
|
||||
node_type="router",
|
||||
input_keys=["analysis_result"],
|
||||
output_keys=["decision"],
|
||||
system_prompt="""
|
||||
Based on the analysis result, decide the next action:
|
||||
- If confidence > 0.9: route to "execute"
|
||||
- If 0.5 <= confidence <= 0.9: route to "review"
|
||||
- If confidence < 0.5: route to "clarify"
|
||||
|
||||
Return: {"decision": "execute|review|clarify"}
|
||||
""",
|
||||
)
|
||||
|
||||
# Edges for each route
|
||||
edges = [
|
||||
EdgeSpec(
|
||||
id="router-to-execute",
|
||||
source="decision-router",
|
||||
target="execute-action",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="decision == 'execute'",
|
||||
priority=1,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="router-to-review",
|
||||
source="decision-router",
|
||||
target="human-review",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="decision == 'review'",
|
||||
priority=2,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="router-to-clarify",
|
||||
source="decision-router",
|
||||
target="request-clarification",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="decision == 'clarify'",
|
||||
priority=3,
|
||||
),
|
||||
]
|
||||
```
|
||||
|
||||
## Error Handling Patterns
|
||||
|
||||
### Graceful Failure with Fallback
|
||||
|
||||
```python
|
||||
# Primary node with error handling
|
||||
nodes = [
|
||||
NodeSpec(id="api-call", max_retries=3, ...),
|
||||
NodeSpec(id="fallback-cache", ...),
|
||||
NodeSpec(id="report-error", ...),
|
||||
]
|
||||
|
||||
edges = [
|
||||
# Success path
|
||||
EdgeSpec(
|
||||
id="api-success",
|
||||
source="api-call",
|
||||
target="process-results",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
),
|
||||
# Fallback on failure
|
||||
EdgeSpec(
|
||||
id="api-to-fallback",
|
||||
source="api-call",
|
||||
target="fallback-cache",
|
||||
condition=EdgeCondition.ON_FAILURE,
|
||||
priority=1,
|
||||
),
|
||||
# Report if fallback also fails
|
||||
EdgeSpec(
|
||||
id="fallback-to-error",
|
||||
source="fallback-cache",
|
||||
target="report-error",
|
||||
condition=EdgeCondition.ON_FAILURE,
|
||||
priority=1,
|
||||
),
|
||||
]
|
||||
```
|
||||
|
||||
## Performance Optimization
|
||||
|
||||
### Parallel Node Execution
|
||||
|
||||
```python
|
||||
# Use multiple edges from same source for parallel execution
|
||||
edges = [
|
||||
EdgeSpec(
|
||||
id="start-to-search1",
|
||||
source="start",
|
||||
target="search-source-1",
|
||||
condition=EdgeCondition.ALWAYS,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="start-to-search2",
|
||||
source="start",
|
||||
target="search-source-2",
|
||||
condition=EdgeCondition.ALWAYS,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="start-to-search3",
|
||||
source="start",
|
||||
target="search-source-3",
|
||||
condition=EdgeCondition.ALWAYS,
|
||||
),
|
||||
# Converge results
|
||||
EdgeSpec(
|
||||
id="search1-to-merge",
|
||||
source="search-source-1",
|
||||
target="merge-results",
|
||||
),
|
||||
EdgeSpec(
|
||||
id="search2-to-merge",
|
||||
source="search-source-2",
|
||||
target="merge-results",
|
||||
),
|
||||
EdgeSpec(
|
||||
id="search3-to-merge",
|
||||
source="search-source-3",
|
||||
target="merge-results",
|
||||
),
|
||||
]
|
||||
```
|
||||
|
||||
## Handoff to Testing
|
||||
|
||||
When agent is complete, transition to testing phase:
|
||||
|
||||
```python
|
||||
print("""
|
||||
✅ Agent complete: exports/my_agent/
|
||||
|
||||
Next steps:
|
||||
1. Switch to testing-agent skill
|
||||
2. Generate and approve tests
|
||||
3. Run evaluation
|
||||
4. Debug any failures
|
||||
|
||||
Command: "Test the agent at exports/my_agent/"
|
||||
""")
|
||||
```
|
||||
|
||||
### Pre-Testing Checklist
|
||||
|
||||
Before handing off to testing-agent:
|
||||
|
||||
- [ ] Agent structure validates: `python -m agent_name validate`
|
||||
- [ ] All nodes defined in nodes/__init__.py
|
||||
- [ ] All edges connect valid nodes
|
||||
- [ ] Entry node specified
|
||||
- [ ] Agent can be imported: `from exports.agent_name import default_agent`
|
||||
- [ ] README.md with usage instructions
|
||||
- [ ] CLI commands work (info, validate)
|
||||
|
||||
## Related Skills
|
||||
|
||||
- **building-agents-core** - Fundamental concepts
|
||||
- **building-agents-construction** - Step-by-step building
|
||||
- **testing-agent** - Test and validate agents
|
||||
- **agent-workflow** - Complete workflow orchestrator
|
||||
|
||||
---
|
||||
|
||||
**Remember: Agent is actively constructed, visible the whole time. No hidden state. No surprise exports. Just transparent, incremental file building.**
|
||||
@@ -0,0 +1,399 @@
|
||||
---
|
||||
name: hive-concepts
|
||||
description: Core concepts for goal-driven agents - architecture, node types (event_loop, function), tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals.
|
||||
license: Apache-2.0
|
||||
metadata:
|
||||
author: hive
|
||||
version: "2.0"
|
||||
type: foundational
|
||||
part_of: hive
|
||||
---
|
||||
|
||||
# Building Agents - Core Concepts
|
||||
|
||||
Foundational knowledge for building goal-driven agents as Python packages.
|
||||
|
||||
## Architecture: Python Services (Not JSON Configs)
|
||||
|
||||
Agents are built as Python packages:
|
||||
|
||||
```
|
||||
exports/my_agent/
|
||||
├── __init__.py # Package exports
|
||||
├── __main__.py # CLI (run, info, validate, shell)
|
||||
├── agent.py # Graph construction (goal, edges, agent class)
|
||||
├── nodes/__init__.py # Node definitions (NodeSpec)
|
||||
├── config.py # Runtime config
|
||||
└── README.md # Documentation
|
||||
```
|
||||
|
||||
**Key Principle: Agent is visible and editable during build**
|
||||
|
||||
- Files created immediately as components are approved
|
||||
- User can watch files grow in their editor
|
||||
- No session state - just direct file writes
|
||||
- No "export" step - agent is ready when build completes
|
||||
|
||||
## Core Concepts
|
||||
|
||||
### Goal
|
||||
|
||||
Success criteria and constraints (written to agent.py)
|
||||
|
||||
```python
|
||||
goal = Goal(
|
||||
id="research-goal",
|
||||
name="Technical Research Agent",
|
||||
description="Research technical topics thoroughly",
|
||||
success_criteria=[
|
||||
SuccessCriterion(
|
||||
id="completeness",
|
||||
description="Cover all aspects of topic",
|
||||
metric="coverage_score",
|
||||
target=">=0.9",
|
||||
weight=0.4,
|
||||
),
|
||||
# 3-5 success criteria total
|
||||
],
|
||||
constraints=[
|
||||
Constraint(
|
||||
id="accuracy",
|
||||
description="All information must be verified",
|
||||
constraint_type="hard",
|
||||
category="quality",
|
||||
),
|
||||
# 1-5 constraints total
|
||||
],
|
||||
)
|
||||
```
|
||||
|
||||
### Node
|
||||
|
||||
Unit of work (written to nodes/__init__.py)
|
||||
|
||||
**Node Types:**
|
||||
|
||||
- `event_loop` — Multi-turn streaming loop with tool execution and judge-based evaluation. Works with or without tools.
|
||||
- `function` — Deterministic Python operations. No LLM involved.
|
||||
|
||||
```python
|
||||
search_node = NodeSpec(
|
||||
id="search-web",
|
||||
name="Search Web",
|
||||
description="Search for information and extract results",
|
||||
node_type="event_loop",
|
||||
input_keys=["query"],
|
||||
output_keys=["search_results"],
|
||||
system_prompt="Search the web for: {query}. Use the web_search tool to find results, then call set_output to store them.",
|
||||
tools=["web_search"],
|
||||
)
|
||||
```
|
||||
|
||||
**NodeSpec Fields for Event Loop Nodes:**
|
||||
|
||||
| Field | Default | Description |
|
||||
|-------|---------|-------------|
|
||||
| `client_facing` | `False` | If True, streams output to user and blocks for input between turns |
|
||||
| `nullable_output_keys` | `[]` | Output keys that may remain unset (for mutually exclusive outputs) |
|
||||
| `max_node_visits` | `1` | Max times this node executes per run. Set >1 for feedback loop targets |
|
||||
|
||||
### Edge
|
||||
|
||||
Connection between nodes (written to agent.py)
|
||||
|
||||
**Edge Conditions:**
|
||||
|
||||
- `on_success` — Proceed if node succeeds (most common)
|
||||
- `on_failure` — Handle errors
|
||||
- `always` — Always proceed
|
||||
- `conditional` — Based on expression evaluating node output
|
||||
|
||||
**Edge Priority:**
|
||||
|
||||
Priority controls evaluation order when multiple edges leave the same node. Higher priority edges are evaluated first. Use negative priority for feedback edges (edges that loop back to earlier nodes).
|
||||
|
||||
```python
|
||||
# Forward edge (evaluated first)
|
||||
EdgeSpec(
|
||||
id="review-to-campaign",
|
||||
source="review",
|
||||
target="campaign-builder",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="output.get('approved_contacts') is not None",
|
||||
priority=1,
|
||||
)
|
||||
|
||||
# Feedback edge (evaluated after forward edges)
|
||||
EdgeSpec(
|
||||
id="review-feedback",
|
||||
source="review",
|
||||
target="extractor",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="output.get('redo_extraction') is not None",
|
||||
priority=-1,
|
||||
)
|
||||
```
|
||||
|
||||
### Client-Facing Nodes
|
||||
|
||||
For multi-turn conversations with the user, set `client_facing=True` on a node. The node will:
|
||||
- Stream its LLM output directly to the end user
|
||||
- Block for user input between conversational turns
|
||||
- Resume when new input is injected via `inject_event()`
|
||||
|
||||
```python
|
||||
intake_node = NodeSpec(
|
||||
id="intake",
|
||||
name="Intake",
|
||||
description="Gather requirements from the user",
|
||||
node_type="event_loop",
|
||||
client_facing=True,
|
||||
input_keys=[],
|
||||
output_keys=["repo_url", "project_url"],
|
||||
system_prompt="You are the intake agent. Ask the user for the repo URL and project URL.",
|
||||
)
|
||||
```
|
||||
|
||||
> **Legacy Note:** The old `pause_nodes` / `entry_points` pattern still works but `client_facing=True` is preferred for new agents.
|
||||
|
||||
**STEP 1 / STEP 2 Prompt Pattern:** For client-facing nodes, structure the system prompt with two explicit phases:
|
||||
|
||||
```python
|
||||
system_prompt="""\
|
||||
**STEP 1 — Respond to the user (text only, NO tool calls):**
|
||||
[Present information, ask questions, etc.]
|
||||
|
||||
**STEP 2 — After the user responds, call set_output:**
|
||||
[Call set_output with the structured outputs]
|
||||
"""
|
||||
```
|
||||
|
||||
This prevents the LLM from calling `set_output` prematurely before the user has had a chance to respond.
|
||||
|
||||
### Node Design: Fewer, Richer Nodes
|
||||
|
||||
Prefer fewer nodes that do more work over many thin single-purpose nodes:
|
||||
|
||||
- **Bad**: 8 thin nodes (parse query → search → fetch → evaluate → synthesize → write → check → save)
|
||||
- **Good**: 4 rich nodes (intake → research → review → report)
|
||||
|
||||
Why: Each node boundary requires serializing outputs and passing context. Fewer nodes means the LLM retains full context of its work within the node. A research node that searches, fetches, and analyzes keeps all the source material in its conversation history.
|
||||
|
||||
### nullable_output_keys for Cross-Edge Inputs
|
||||
|
||||
When a node receives inputs that only arrive on certain edges (e.g., `feedback` only comes from a review → research feedback loop, not from intake → research), mark those keys as `nullable_output_keys`:
|
||||
|
||||
```python
|
||||
research_node = NodeSpec(
|
||||
id="research",
|
||||
input_keys=["research_brief", "feedback"],
|
||||
nullable_output_keys=["feedback"], # Not present on first visit
|
||||
max_node_visits=3,
|
||||
...
|
||||
)
|
||||
```
|
||||
|
||||
## Event Loop Architecture Concepts
|
||||
|
||||
### How EventLoopNode Works
|
||||
|
||||
An event loop node runs a multi-turn loop:
|
||||
1. LLM receives system prompt + conversation history
|
||||
2. LLM responds (text and/or tool calls)
|
||||
3. Tool calls are executed, results added to conversation
|
||||
4. Judge evaluates: ACCEPT (exit loop), RETRY (loop again), or ESCALATE
|
||||
5. Repeat until judge ACCEPTs or max_iterations reached
|
||||
|
||||
### EventLoopNode Runtime
|
||||
|
||||
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. You do NOT need to manually register them. Both `GraphExecutor` (direct) and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically.
|
||||
|
||||
```python
|
||||
# Direct execution — executor auto-creates EventLoopNodes
|
||||
from framework.graph.executor import GraphExecutor
|
||||
from framework.runtime.core import Runtime
|
||||
|
||||
runtime = Runtime(storage_path)
|
||||
executor = GraphExecutor(
|
||||
runtime=runtime,
|
||||
llm=llm,
|
||||
tools=tools,
|
||||
tool_executor=tool_executor,
|
||||
storage_path=storage_path,
|
||||
)
|
||||
result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
|
||||
|
||||
# TUI execution — AgentRuntime also works
|
||||
from framework.runtime.agent_runtime import create_agent_runtime
|
||||
runtime = create_agent_runtime(
|
||||
graph=graph, goal=goal, storage_path=storage_path,
|
||||
entry_points=[...], llm=llm, tools=tools, tool_executor=tool_executor,
|
||||
)
|
||||
```
|
||||
|
||||
### set_output
|
||||
|
||||
Nodes produce structured outputs by calling `set_output(key, value)` — a synthetic tool injected by the framework. When the LLM calls `set_output`, the value is stored in the output accumulator and made available to downstream nodes via shared memory.
|
||||
|
||||
`set_output` is NOT a real tool — it is excluded from `real_tool_results`. For client-facing nodes, this means a turn where the LLM only calls `set_output` (no other tools) is treated as a conversational boundary and will block for user input.
|
||||
|
||||
### JudgeProtocol
|
||||
|
||||
**The judge is the SOLE mechanism for acceptance decisions.** Do not add ad-hoc framework gating, output rollback, or premature rejection logic. If the LLM calls `set_output` too early, fix it with better prompts or a custom judge — not framework-level guards.
|
||||
|
||||
The judge controls when a node's loop exits:
|
||||
- **Implicit judge** (default, no judge configured): ACCEPTs when the LLM finishes with no tool calls and all required output keys are set
|
||||
- **SchemaJudge**: Validates outputs against a Pydantic model
|
||||
- **Custom judges**: Implement `evaluate(context) -> JudgeVerdict`
|
||||
|
||||
### LoopConfig
|
||||
|
||||
Controls loop behavior:
|
||||
- `max_iterations` (default 50) — prevents infinite loops
|
||||
- `max_tool_calls_per_turn` (default 10) — limits tool calls per LLM response
|
||||
- `tool_call_overflow_margin` (default 0.5) — wiggle room before discarding extra tool calls (50% means hard cutoff at 150% of limit)
|
||||
- `stall_detection_threshold` (default 3) — detects repeated identical responses
|
||||
- `max_history_tokens` (default 32000) — triggers conversation compaction
|
||||
|
||||
### Data Tools (Spillover Management)
|
||||
|
||||
When tool results exceed the context window, the framework automatically saves them to a spillover directory and truncates with a hint. Nodes that produce or consume large data should include the data tools:
|
||||
|
||||
- `save_data(filename, data)` — Write data to a file in the data directory
|
||||
- `load_data(filename, offset=0, limit=50)` — Read data with line-based pagination
|
||||
- `list_data_files()` — List available data files
|
||||
- `serve_file_to_user(filename, label="")` — Get a clickable file:// URI for the user
|
||||
|
||||
Note: `data_dir` is a framework-injected context parameter — the LLM never sees or passes it. `GraphExecutor.execute()` sets it per-execution via `contextvars`, so data tools and spillover always share the same session-scoped directory.
|
||||
|
||||
These are real MCP tools (not synthetic). Add them to nodes that handle large tool results:
|
||||
|
||||
```python
|
||||
research_node = NodeSpec(
|
||||
...
|
||||
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
|
||||
)
|
||||
```
|
||||
|
||||
### Fan-Out / Fan-In
|
||||
|
||||
Multiple ON_SUCCESS edges from the same source create parallel execution. All branches run concurrently via `asyncio.gather()`. Parallel event_loop nodes must have disjoint `output_keys`.
|
||||
|
||||
### max_node_visits
|
||||
|
||||
Controls how many times a node can execute in one graph run. Default is 1. Set higher for nodes that are targets of feedback edges (review-reject loops). Set 0 for unlimited (guarded by max_steps).
|
||||
|
||||
## Tool Discovery & Validation
|
||||
|
||||
**CRITICAL:** Before adding a node with tools, you MUST verify the tools exist.
|
||||
|
||||
Tools are provided by MCP servers. Never assume a tool exists - always discover dynamically.
|
||||
|
||||
### Step 1: Register MCP Server (if not already done)
|
||||
|
||||
```python
|
||||
mcp__agent-builder__add_mcp_server(
|
||||
name="tools",
|
||||
transport="stdio",
|
||||
command="python",
|
||||
args='["mcp_server.py", "--stdio"]',
|
||||
cwd="../tools"
|
||||
)
|
||||
```
|
||||
|
||||
### Step 2: Discover Available Tools
|
||||
|
||||
```python
|
||||
# List all tools from all registered servers
|
||||
mcp__agent-builder__list_mcp_tools()
|
||||
|
||||
# Or list tools from a specific server
|
||||
mcp__agent-builder__list_mcp_tools(server_name="tools")
|
||||
```
|
||||
|
||||
### Step 3: Validate Before Adding Nodes
|
||||
|
||||
Before writing a node with `tools=[...]`:
|
||||
|
||||
1. Call `list_mcp_tools()` to get available tools
|
||||
2. Check each tool in your node exists in the response
|
||||
3. If a tool doesn't exist:
|
||||
- **DO NOT proceed** with the node
|
||||
- Inform the user: "The tool 'X' is not available. Available tools are: ..."
|
||||
- Ask if they want to use an alternative or proceed without the tool
|
||||
|
||||
### Tool Validation Anti-Patterns
|
||||
|
||||
- **Never assume a tool exists** - always call `list_mcp_tools()` first
|
||||
- **Never write a node with unverified tools** - validate before writing
|
||||
- **Never silently drop tools** - if a tool doesn't exist, inform the user
|
||||
- **Never guess tool names** - use exact names from discovery response
|
||||
|
||||
## Workflow Overview: Incremental File Construction
|
||||
|
||||
```
|
||||
1. CREATE PACKAGE → mkdir + write skeletons
|
||||
2. DEFINE GOAL → Write to agent.py + config.py
|
||||
3. FOR EACH NODE:
|
||||
- Propose design (event_loop for LLM work, function for deterministic)
|
||||
- User approves
|
||||
- Write to nodes/__init__.py IMMEDIATELY
|
||||
- (Optional) Validate with test_node
|
||||
4. CONNECT EDGES → Update agent.py
|
||||
- Use priority for feedback edges (negative priority)
|
||||
- (Optional) Validate with validate_graph
|
||||
5. FINALIZE → Write agent class to agent.py
|
||||
6. DONE - Agent ready at exports/my_agent/
|
||||
```
|
||||
|
||||
**Files written immediately. MCP tools optional for validation/testing bookkeeping.**
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use hive-concepts when:
|
||||
- Starting a new agent project and need to understand fundamentals
|
||||
- Need to understand agent architecture before building
|
||||
- Want to validate tool availability before proceeding
|
||||
- Learning about node types, edges, and graph execution
|
||||
|
||||
**Next Steps:**
|
||||
- Ready to build? → Use `hive-create` skill
|
||||
- Need patterns and examples? → Use `hive-patterns` skill
|
||||
|
||||
## MCP Tools for Validation
|
||||
|
||||
After writing files, optionally use MCP tools for validation:
|
||||
|
||||
**test_node** - Validate node configuration with mock inputs
|
||||
```python
|
||||
mcp__agent-builder__test_node(
|
||||
node_id="search-web",
|
||||
test_input='{"query": "test query"}',
|
||||
mock_llm_response='{"results": "mock output"}'
|
||||
)
|
||||
```
|
||||
|
||||
**validate_graph** - Check graph structure
|
||||
```python
|
||||
mcp__agent-builder__validate_graph()
|
||||
# Returns: unreachable nodes, missing connections, event_loop validation, etc.
|
||||
```
|
||||
|
||||
**configure_loop** - Set event loop parameters
|
||||
```python
|
||||
mcp__agent-builder__configure_loop(
|
||||
max_iterations=50,
|
||||
max_tool_calls_per_turn=10,
|
||||
stall_detection_threshold=3,
|
||||
max_history_tokens=32000
|
||||
)
|
||||
```
|
||||
|
||||
**Key Point:** Files are written FIRST. MCP tools are for validation only.
|
||||
|
||||
## Related Skills
|
||||
|
||||
- **hive-create** - Step-by-step building process
|
||||
- **hive-patterns** - Best practices: judges, feedback edges, fan-out, context management
|
||||
- **hive** - Complete workflow orchestrator
|
||||
- **hive-test** - Test and validate completed agents
|
||||
@@ -0,0 +1,540 @@
|
||||
---
|
||||
name: hive-create
|
||||
description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
|
||||
license: Apache-2.0
|
||||
metadata:
|
||||
author: hive
|
||||
version: "2.1"
|
||||
type: procedural
|
||||
part_of: hive
|
||||
requires: hive-concepts
|
||||
---
|
||||
|
||||
# Agent Construction - EXECUTE THESE STEPS
|
||||
|
||||
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.**
|
||||
|
||||
**CRITICAL: DO NOT explore the codebase, read source files, or search for code before starting.** All context you need is in this skill file. When this skill is loaded, IMMEDIATELY begin executing Step 1 — call the MCP tools listed in Step 1 as your FIRST action. Do not explain what you will do, do not investigate the project structure, do not read any files — just execute Step 1 now.
|
||||
|
||||
---
|
||||
|
||||
## STEP 1: Initialize Build Environment
|
||||
|
||||
**EXECUTE THESE TOOL CALLS NOW** (silent setup — no user interaction needed):
|
||||
|
||||
1. Register the hive-tools MCP server:
|
||||
|
||||
```
|
||||
mcp__agent-builder__add_mcp_server(
|
||||
name="hive-tools",
|
||||
transport="stdio",
|
||||
command="uv",
|
||||
args='["run", "python", "mcp_server.py", "--stdio"]',
|
||||
cwd="tools",
|
||||
description="Hive tools MCP server"
|
||||
)
|
||||
```
|
||||
|
||||
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
|
||||
|
||||
```
|
||||
mcp__agent-builder__create_session(name="AGENT_NAME")
|
||||
```
|
||||
|
||||
3. Discover available tools:
|
||||
|
||||
```
|
||||
mcp__agent-builder__list_mcp_tools()
|
||||
```
|
||||
|
||||
4. Create the package directory:
|
||||
|
||||
```bash
|
||||
mkdir -p exports/AGENT_NAME/nodes
|
||||
```
|
||||
|
||||
**Save the tool list for step 3** — you will need it for node design in STEP 3.
|
||||
|
||||
**THEN immediately proceed to STEP 2** (do NOT display setup results to the user — just move on).
|
||||
|
||||
---
|
||||
|
||||
## STEP 2: Define Goal Together with User
|
||||
|
||||
**DO NOT propose a complete goal on your own.** Instead, collaborate with the user to define it.
|
||||
|
||||
**START by asking the user to help shape the goal:**
|
||||
|
||||
> I've set up the build environment and discovered [N] available tools. Let's define the goal for your agent together.
|
||||
>
|
||||
> To get started, can you help me understand:
|
||||
>
|
||||
> 1. **What should this agent accomplish?** (the core purpose)
|
||||
> 2. **How will we know it succeeded?** (what does "done" look like)
|
||||
> 3. **Are there any hard constraints?** (things it must never do, quality bars, etc.)
|
||||
|
||||
**WAIT for the user to respond.** Use their input to draft:
|
||||
|
||||
- Goal ID (kebab-case)
|
||||
- Goal name
|
||||
- Goal description
|
||||
- 3-5 success criteria (each with: id, description, metric, target, weight)
|
||||
- 2-4 constraints (each with: id, description, constraint_type, category)
|
||||
|
||||
**PRESENT the draft goal for approval:**
|
||||
|
||||
> **Proposed Goal: [Name]**
|
||||
>
|
||||
> [Description]
|
||||
>
|
||||
> **Success Criteria:**
|
||||
>
|
||||
> 1. [criterion 1]
|
||||
> 2. [criterion 2]
|
||||
> ...
|
||||
>
|
||||
> **Constraints:**
|
||||
>
|
||||
> 1. [constraint 1]
|
||||
> 2. [constraint 2]
|
||||
> ...
|
||||
|
||||
**THEN call AskUserQuestion:**
|
||||
|
||||
```
|
||||
AskUserQuestion(questions=[{
|
||||
"question": "Do you approve this goal definition?",
|
||||
"header": "Goal",
|
||||
"options": [
|
||||
{"label": "Approve", "description": "Goal looks good, proceed to workflow design"},
|
||||
{"label": "Modify", "description": "I want to change something"}
|
||||
],
|
||||
"multiSelect": false
|
||||
}])
|
||||
```
|
||||
|
||||
**WAIT for user response.**
|
||||
|
||||
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 3
|
||||
- If **Modify**: Ask what they want to change, update the draft, ask again
|
||||
|
||||
---
|
||||
|
||||
## STEP 3: Design Conceptual Nodes
|
||||
|
||||
**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist.
|
||||
|
||||
**DESIGN the workflow** as a series of nodes. For each node, determine:
|
||||
|
||||
- node_id (kebab-case)
|
||||
- name
|
||||
- description
|
||||
- node_type: `"event_loop"` (recommended for all LLM work) or `"function"` (deterministic, no LLM)
|
||||
- input_keys (what data this node receives)
|
||||
- output_keys (what data this node produces)
|
||||
- tools (ONLY tools that exist from Step 1 — empty list if no tools needed)
|
||||
- client_facing: True if this node interacts with the user
|
||||
- nullable_output_keys (for mutually exclusive outputs or feedback-only inputs)
|
||||
- max_node_visits (>1 if this node is a feedback loop target)
|
||||
|
||||
**Prefer fewer, richer nodes** (4 nodes > 8 thin nodes). Each node boundary requires serializing outputs. A research node that searches, fetches, and analyzes keeps all source material in its conversation history.
|
||||
|
||||
**PRESENT the nodes to the user for review:**
|
||||
|
||||
> **Proposed Nodes ([N] total):**
|
||||
>
|
||||
> | # | Node ID | Type | Description | Tools | Client-Facing |
|
||||
> | --- | ---------- | ---------- | ----------------------------- | ---------------------- | :-----------: |
|
||||
> | 1 | `intake` | event_loop | Gather requirements from user | — | Yes |
|
||||
> | 2 | `research` | event_loop | Search and analyze sources | web_search, web_scrape | No |
|
||||
> | 3 | `review` | event_loop | Present findings for approval | — | Yes |
|
||||
> | 4 | `report` | event_loop | Generate final report | save_data | No |
|
||||
>
|
||||
> **Data Flow:**
|
||||
>
|
||||
> - `intake` produces: `research_brief`
|
||||
> - `research` receives: `research_brief` → produces: `findings`, `sources`
|
||||
> - `review` receives: `findings`, `sources` → produces: `approved_findings` or `feedback`
|
||||
> - `report` receives: `approved_findings` → produces: `final_report`
|
||||
|
||||
**THEN call AskUserQuestion:**
|
||||
|
||||
```
|
||||
AskUserQuestion(questions=[{
|
||||
"question": "Do you approve these nodes?",
|
||||
"header": "Nodes",
|
||||
"options": [
|
||||
{"label": "Approve", "description": "Nodes look good, proceed to graph design"},
|
||||
{"label": "Modify", "description": "I want to change the nodes"}
|
||||
],
|
||||
"multiSelect": false
|
||||
}])
|
||||
```
|
||||
|
||||
**WAIT for user response.**
|
||||
|
||||
- If **Approve**: Proceed to STEP 4
|
||||
- If **Modify**: Ask what they want to change, update design, ask again
|
||||
|
||||
---
|
||||
|
||||
## STEP 4: Design Full Graph and Review
|
||||
|
||||
**DETERMINE the edges** connecting the approved nodes. For each edge:
|
||||
|
||||
- edge_id (kebab-case)
|
||||
- source → target
|
||||
- condition: `on_success`, `on_failure`, `always`, or `conditional`
|
||||
- condition_expr (Python expression, only if conditional)
|
||||
- priority (positive = forward, negative = feedback/loop-back)
|
||||
|
||||
**RENDER the complete graph as ASCII art.** Make it large and clear — the user needs to see and understand the full workflow at a glance.
|
||||
|
||||
**IMPORTANT: Make the ASCII art BIG and READABLE.** Use a box-and-arrow style with generous spacing. Do NOT make it tiny or compressed. Example format:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ AGENT: Research Agent │
|
||||
│ │
|
||||
│ Goal: Thoroughly research technical topics and produce verified reports │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
|
||||
┌───────────────────────┐
|
||||
│ INTAKE │
|
||||
│ (client-facing) │
|
||||
│ │
|
||||
│ in: topic │
|
||||
│ out: research_brief │
|
||||
└───────────┬───────────┘
|
||||
│ on_success
|
||||
▼
|
||||
┌───────────────────────┐
|
||||
│ RESEARCH │
|
||||
│ │
|
||||
│ tools: web_search, │
|
||||
│ web_scrape │
|
||||
│ │
|
||||
│ in: research_brief │
|
||||
│ [feedback] │
|
||||
│ out: findings, │
|
||||
│ sources │
|
||||
└───────────┬───────────┘
|
||||
│ on_success
|
||||
▼
|
||||
┌───────────────────────┐
|
||||
│ REVIEW │
|
||||
│ (client-facing) │
|
||||
│ │
|
||||
│ in: findings, │
|
||||
│ sources │
|
||||
│ out: approved_findings│
|
||||
│ OR feedback │
|
||||
└───────┬───────┬───────┘
|
||||
│ │
|
||||
approved │ │ feedback (priority: -1)
|
||||
│ │
|
||||
▼ └──────────────────┐
|
||||
┌───────────────────────┐ │
|
||||
│ REPORT │ │
|
||||
│ │ │
|
||||
│ tools: save_data │ │
|
||||
│ │ │
|
||||
│ in: approved_ │ │
|
||||
│ findings │ │
|
||||
│ out: final_report │ │
|
||||
└───────────────────────┘ │
|
||||
│
|
||||
┌──────────────────────────┘
|
||||
│ loops back to RESEARCH
|
||||
▼ (max_node_visits: 3)
|
||||
|
||||
|
||||
EDGES:
|
||||
──────
|
||||
1. intake → research [on_success, priority: 1]
|
||||
2. research → review [on_success, priority: 1]
|
||||
3. review → report [conditional: approved_findings is not None, priority: 1]
|
||||
4. review → research [conditional: feedback is not None, priority: -1]
|
||||
```
|
||||
|
||||
**PRESENT the graph and edges to the user:**
|
||||
|
||||
> Here is the complete workflow graph:
|
||||
>
|
||||
> [ASCII art above]
|
||||
>
|
||||
> **Edge Summary:**
|
||||
>
|
||||
> | # | Edge | Condition | Priority |
|
||||
> | --- | ----------------- | -------------------------------------------- | -------- |
|
||||
> | 1 | intake → research | on_success | 1 |
|
||||
> | 2 | research → review | on_success | 1 |
|
||||
> | 3 | review → report | conditional: `approved_findings is not None` | 1 |
|
||||
> | 4 | review → research | conditional: `feedback is not None` | -1 |
|
||||
|
||||
**THEN call AskUserQuestion:**
|
||||
|
||||
```
|
||||
AskUserQuestion(questions=[{
|
||||
"question": "Do you approve this workflow graph?",
|
||||
"header": "Graph",
|
||||
"options": [
|
||||
{"label": "Approve", "description": "Graph looks good, proceed to build the agent"},
|
||||
{"label": "Modify", "description": "I want to change the graph"}
|
||||
],
|
||||
"multiSelect": false
|
||||
}])
|
||||
```
|
||||
|
||||
**WAIT for user response.**
|
||||
|
||||
- If **Approve**: Proceed to STEP 5
|
||||
- If **Modify**: Ask what they want to change, update the graph, re-render, ask again
|
||||
|
||||
---
|
||||
|
||||
## STEP 5: Build the Agent
|
||||
|
||||
**NOW — and only now — write the actual code.** The user has approved the goal, nodes, and graph.
|
||||
|
||||
### 5a: Register nodes and edges with MCP
|
||||
|
||||
**FOR EACH approved node**, call:
|
||||
|
||||
```
|
||||
mcp__agent-builder__add_node(
|
||||
node_id="...",
|
||||
name="...",
|
||||
description="...",
|
||||
node_type="event_loop",
|
||||
input_keys='["key1", "key2"]',
|
||||
output_keys='["key1"]',
|
||||
tools='["tool1"]',
|
||||
system_prompt="...",
|
||||
client_facing=True/False,
|
||||
nullable_output_keys='["key"]',
|
||||
max_node_visits=1
|
||||
)
|
||||
```
|
||||
|
||||
**FOR EACH approved edge**, call:
|
||||
|
||||
```
|
||||
mcp__agent-builder__add_edge(
|
||||
edge_id="source-to-target",
|
||||
source="source-node-id",
|
||||
target="target-node-id",
|
||||
condition="on_success",
|
||||
condition_expr="",
|
||||
priority=1
|
||||
)
|
||||
```
|
||||
|
||||
**VALIDATE the graph:**
|
||||
|
||||
```
|
||||
mcp__agent-builder__validate_graph()
|
||||
```
|
||||
|
||||
- If invalid: Fix the issues and re-validate
|
||||
- If valid: Continue to 5b
|
||||
|
||||
### 5b: Write Python package files
|
||||
|
||||
**EXPORT the graph data:**
|
||||
|
||||
```
|
||||
mcp__agent-builder__export_graph()
|
||||
```
|
||||
|
||||
**THEN write the Python package files** using the exported data. Create these files in `exports/AGENT_NAME/`:
|
||||
|
||||
1. `config.py` - Runtime configuration with model settings
|
||||
2. `nodes/__init__.py` - All NodeSpec definitions
|
||||
3. `agent.py` - Goal, edges, graph config, and agent class
|
||||
4. `__init__.py` - Package exports
|
||||
5. `__main__.py` - CLI interface
|
||||
6. `mcp_servers.json` - MCP server configurations
|
||||
7. `README.md` - Usage documentation
|
||||
|
||||
**IMPORTANT entry_points format:**
|
||||
|
||||
- MUST be: `{"start": "first-node-id"}`
|
||||
- NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
|
||||
- NOT: `{"first-node-id"}` (WRONG - this is a set)
|
||||
|
||||
**IMPORTANT mcp_servers.json format:**
|
||||
|
||||
```json
|
||||
{
|
||||
"hive-tools": {
|
||||
"transport": "stdio",
|
||||
"command": "uv",
|
||||
"args": ["run", "python", "mcp_server.py", "--stdio"],
|
||||
"cwd": "../../tools",
|
||||
"description": "Hive tools MCP server"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
- NO `"mcpServers"` wrapper (that's Claude Desktop format, NOT hive format)
|
||||
- `cwd` MUST be `"../../tools"` (relative from `exports/AGENT_NAME/` to `tools/`)
|
||||
- `command` MUST be `"uv"` with `"args": ["run", "python", ...]` (NOT bare `"python"` which fails on Mac)
|
||||
|
||||
**Use the example agent** at `.claude/skills/hive-create/examples/deep_research_agent/` as a template for file structure and patterns. It demonstrates: STEP 1/STEP 2 prompts, client-facing nodes, feedback loops, nullable_output_keys, and data tools.
|
||||
|
||||
**AFTER writing all files, tell the user:**
|
||||
|
||||
> Agent package created: `exports/AGENT_NAME/`
|
||||
>
|
||||
> **Files generated:**
|
||||
>
|
||||
> - `__init__.py` - Package exports
|
||||
> - `agent.py` - Goal, nodes, edges, agent class
|
||||
> - `config.py` - Runtime configuration
|
||||
> - `__main__.py` - CLI interface
|
||||
> - `nodes/__init__.py` - Node definitions
|
||||
> - `mcp_servers.json` - MCP server config
|
||||
> - `README.md` - Usage documentation
|
||||
|
||||
---
|
||||
|
||||
## STEP 6: Verify and Test
|
||||
|
||||
**RUN validation:**
|
||||
|
||||
```bash
|
||||
cd /home/timothy/oss/hive && PYTHONPATH=exports uv run python -m AGENT_NAME validate
|
||||
```
|
||||
|
||||
- If valid: Agent is complete!
|
||||
- If errors: Fix the issues and re-run
|
||||
|
||||
**TELL the user the agent is ready** and display the next steps box:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ ✅ AGENT BUILD COMPLETE │
|
||||
├─────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ NEXT STEPS: │
|
||||
│ │
|
||||
│ 1. SET UP CREDENTIALS (if agent uses tools like web_search, send_email): │
|
||||
│ │
|
||||
│ /hive-credentials --agent AGENT_NAME │
|
||||
│ │
|
||||
│ 2. RUN YOUR AGENT: │
|
||||
│ │
|
||||
│ hive tui │
|
||||
│ │
|
||||
│ Then select your agent from the list and press Enter. │
|
||||
│ │
|
||||
│ 3. DEBUG ANY ISSUES: │
|
||||
│ │
|
||||
│ /hive-debugger │
|
||||
│ │
|
||||
│ The debugger monitors runtime logs, identifies retry loops, │
|
||||
│ tool failures, and missing outputs, and provides fix recommendations. │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## REFERENCE: Node Types
|
||||
|
||||
| Type | tools param | Use when |
|
||||
| ------------ | ----------------------- | --------------------------------------- |
|
||||
| `event_loop` | `'["tool1"]'` or `'[]'` | LLM-powered work with or without tools |
|
||||
| `function` | N/A | Deterministic Python operations, no LLM |
|
||||
|
||||
---
|
||||
|
||||
## REFERENCE: NodeSpec Fields
|
||||
|
||||
| Field | Default | Description |
|
||||
| ---------------------- | ------- | --------------------------------------------------------------------- |
|
||||
| `client_facing` | `False` | Streams output to user, blocks for input between turns |
|
||||
| `nullable_output_keys` | `[]` | Output keys that may remain unset (mutually exclusive outputs) |
|
||||
| `max_node_visits` | `1` | Max executions per run. Set >1 for feedback loop targets. 0=unlimited |
|
||||
|
||||
---
|
||||
|
||||
## REFERENCE: Edge Conditions & Priority
|
||||
|
||||
| Condition | When edge is followed |
|
||||
| ------------- | ------------------------------------- |
|
||||
| `on_success` | Source node completed successfully |
|
||||
| `on_failure` | Source node failed |
|
||||
| `always` | Always, regardless of success/failure |
|
||||
| `conditional` | When condition_expr evaluates to True |
|
||||
|
||||
**Priority:** Positive = forward edge (evaluated first). Negative = feedback edge (loops back to earlier node). Multiple ON_SUCCESS edges from same source = parallel execution (fan-out).
|
||||
|
||||
---
|
||||
|
||||
## REFERENCE: System Prompt Best Practice
|
||||
|
||||
For **internal** event_loop nodes (not client-facing), instruct the LLM to use `set_output`:
|
||||
|
||||
```
|
||||
Use set_output(key, value) to store your results. For example:
|
||||
- set_output("search_results", <your results as a JSON string>)
|
||||
|
||||
Do NOT return raw JSON. Use the set_output tool to produce outputs.
|
||||
```
|
||||
|
||||
For **client-facing** event_loop nodes, use the STEP 1/STEP 2 pattern:
|
||||
|
||||
```
|
||||
**STEP 1 — Respond to the user (text only, NO tool calls):**
|
||||
[Present information, ask questions, etc.]
|
||||
|
||||
**STEP 2 — After the user responds, call set_output:**
|
||||
- set_output("key", "value based on user's response")
|
||||
```
|
||||
|
||||
This prevents the LLM from calling `set_output` before the user has had a chance to respond. The "NO tool calls" instruction in STEP 1 ensures the node blocks for user input before proceeding.
|
||||
|
||||
---
|
||||
|
||||
## EventLoopNode Runtime
|
||||
|
||||
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. Both direct `GraphExecutor` and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically. No manual `node_registry` setup is needed.
|
||||
|
||||
```python
|
||||
# Direct execution
|
||||
from framework.graph.executor import GraphExecutor
|
||||
from framework.runtime.core import Runtime
|
||||
|
||||
storage_path = Path.home() / ".hive" / "agents" / "my_agent"
|
||||
storage_path.mkdir(parents=True, exist_ok=True)
|
||||
runtime = Runtime(storage_path)
|
||||
|
||||
executor = GraphExecutor(
|
||||
runtime=runtime,
|
||||
llm=llm,
|
||||
tools=tools,
|
||||
tool_executor=tool_executor,
|
||||
storage_path=storage_path,
|
||||
)
|
||||
result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
|
||||
```
|
||||
|
||||
**DO NOT pass `runtime=None` to `GraphExecutor`** — it will crash with `'NoneType' object has no attribute 'start_run'`.
|
||||
|
||||
---
|
||||
|
||||
## COMMON MISTAKES TO AVOID
|
||||
|
||||
1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
|
||||
2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
|
||||
3. **Skipping validation** - Always validate nodes and graph before proceeding
|
||||
4. **Not waiting for approval** - Always ask user before major steps
|
||||
5. **Displaying this file** - Execute the steps, don't show documentation
|
||||
6. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
|
||||
7. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
|
||||
8. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
|
||||
9. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
|
||||
10. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
|
||||
11. **Wrong mcp_servers.json format** - Use flat format (no `"mcpServers"` wrapper), `cwd` must be `"../../tools"`, and `command` must be `"uv"` with args `["run", "python", ...]`
|
||||
@@ -0,0 +1,24 @@
|
||||
"""
|
||||
Deep Research Agent - Interactive, rigorous research with TUI conversation.
|
||||
|
||||
Research any topic through multi-source web search, quality evaluation,
|
||||
and synthesis. Features client-facing TUI interaction at key checkpoints
|
||||
for user guidance and iterative deepening.
|
||||
"""
|
||||
|
||||
from .agent import DeepResearchAgent, default_agent, goal, nodes, edges
|
||||
from .config import RuntimeConfig, AgentMetadata, default_config, metadata
|
||||
|
||||
__version__ = "1.0.0"
|
||||
|
||||
__all__ = [
|
||||
"DeepResearchAgent",
|
||||
"default_agent",
|
||||
"goal",
|
||||
"nodes",
|
||||
"edges",
|
||||
"RuntimeConfig",
|
||||
"AgentMetadata",
|
||||
"default_config",
|
||||
"metadata",
|
||||
]
|
||||
+100
-17
@@ -1,5 +1,5 @@
|
||||
"""
|
||||
CLI entry point for Online Research Agent.
|
||||
CLI entry point for Deep Research Agent.
|
||||
|
||||
Uses AgentRuntime for multi-entrypoint support with HITL pause/resume.
|
||||
"""
|
||||
@@ -10,7 +10,7 @@ import logging
|
||||
import sys
|
||||
import click
|
||||
|
||||
from .agent import default_agent, OnlineResearchAgent
|
||||
from .agent import default_agent, DeepResearchAgent
|
||||
|
||||
|
||||
def setup_logging(verbose=False, debug=False):
|
||||
@@ -28,7 +28,7 @@ def setup_logging(verbose=False, debug=False):
|
||||
@click.group()
|
||||
@click.version_option(version="1.0.0")
|
||||
def cli():
|
||||
"""Online Research Agent - Deep-dive research with narrative reports."""
|
||||
"""Deep Research Agent - Interactive, rigorous research with TUI conversation."""
|
||||
pass
|
||||
|
||||
|
||||
@@ -59,6 +59,85 @@ def run(topic, mock, quiet, verbose, debug):
|
||||
sys.exit(0 if result.success else 1)
|
||||
|
||||
|
||||
@cli.command()
|
||||
@click.option("--mock", is_flag=True, help="Run in mock mode")
|
||||
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
|
||||
@click.option("--debug", is_flag=True, help="Show debug logging")
|
||||
def tui(mock, verbose, debug):
|
||||
"""Launch the TUI dashboard for interactive research."""
|
||||
setup_logging(verbose=verbose, debug=debug)
|
||||
|
||||
try:
|
||||
from framework.tui.app import AdenTUI
|
||||
except ImportError:
|
||||
click.echo(
|
||||
"TUI requires the 'textual' package. Install with: pip install textual"
|
||||
)
|
||||
sys.exit(1)
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
from framework.llm import LiteLLMProvider
|
||||
from framework.runner.tool_registry import ToolRegistry
|
||||
from framework.runtime.agent_runtime import create_agent_runtime
|
||||
from framework.runtime.event_bus import EventBus
|
||||
from framework.runtime.execution_stream import EntryPointSpec
|
||||
|
||||
async def run_with_tui():
|
||||
agent = DeepResearchAgent()
|
||||
|
||||
# Build graph and tools
|
||||
agent._event_bus = EventBus()
|
||||
agent._tool_registry = ToolRegistry()
|
||||
|
||||
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
|
||||
storage_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
|
||||
if mcp_config_path.exists():
|
||||
agent._tool_registry.load_mcp_config(mcp_config_path)
|
||||
|
||||
llm = None
|
||||
if not mock:
|
||||
llm = LiteLLMProvider(
|
||||
model=agent.config.model,
|
||||
api_key=agent.config.api_key,
|
||||
api_base=agent.config.api_base,
|
||||
)
|
||||
|
||||
tools = list(agent._tool_registry.get_tools().values())
|
||||
tool_executor = agent._tool_registry.get_executor()
|
||||
graph = agent._build_graph()
|
||||
|
||||
runtime = create_agent_runtime(
|
||||
graph=graph,
|
||||
goal=agent.goal,
|
||||
storage_path=storage_path,
|
||||
entry_points=[
|
||||
EntryPointSpec(
|
||||
id="start",
|
||||
name="Start Research",
|
||||
entry_node="intake",
|
||||
trigger_type="manual",
|
||||
isolation_level="isolated",
|
||||
),
|
||||
],
|
||||
llm=llm,
|
||||
tools=tools,
|
||||
tool_executor=tool_executor,
|
||||
)
|
||||
|
||||
await runtime.start()
|
||||
|
||||
try:
|
||||
app = AdenTUI(runtime)
|
||||
await app.run_async()
|
||||
finally:
|
||||
await runtime.stop()
|
||||
|
||||
asyncio.run(run_with_tui())
|
||||
|
||||
|
||||
@cli.command()
|
||||
@click.option("--json", "output_json", is_flag=True)
|
||||
def info(output_json):
|
||||
@@ -71,6 +150,7 @@ def info(output_json):
|
||||
click.echo(f"Version: {info_data['version']}")
|
||||
click.echo(f"Description: {info_data['description']}")
|
||||
click.echo(f"\nNodes: {', '.join(info_data['nodes'])}")
|
||||
click.echo(f"Client-facing: {', '.join(info_data['client_facing_nodes'])}")
|
||||
click.echo(f"Entry: {info_data['entry_node']}")
|
||||
click.echo(f"Terminal: {', '.join(info_data['terminal_nodes'])}")
|
||||
|
||||
@@ -81,6 +161,9 @@ def validate():
|
||||
validation = default_agent.validate()
|
||||
if validation["valid"]:
|
||||
click.echo("Agent is valid")
|
||||
if validation["warnings"]:
|
||||
for warning in validation["warnings"]:
|
||||
click.echo(f" WARNING: {warning}")
|
||||
else:
|
||||
click.echo("Agent has errors:")
|
||||
for error in validation["errors"]:
|
||||
@@ -91,7 +174,7 @@ def validate():
|
||||
@cli.command()
|
||||
@click.option("--verbose", "-v", is_flag=True)
|
||||
def shell(verbose):
|
||||
"""Interactive research session."""
|
||||
"""Interactive research session (CLI, no TUI)."""
|
||||
asyncio.run(_interactive_shell(verbose))
|
||||
|
||||
|
||||
@@ -99,10 +182,10 @@ async def _interactive_shell(verbose=False):
|
||||
"""Async interactive shell."""
|
||||
setup_logging(verbose=verbose)
|
||||
|
||||
click.echo("=== Online Research Agent ===")
|
||||
click.echo("=== Deep Research Agent ===")
|
||||
click.echo("Enter a topic to research (or 'quit' to exit):\n")
|
||||
|
||||
agent = OnlineResearchAgent()
|
||||
agent = DeepResearchAgent()
|
||||
await agent.start()
|
||||
|
||||
try:
|
||||
@@ -118,7 +201,7 @@ async def _interactive_shell(verbose=False):
|
||||
if not topic.strip():
|
||||
continue
|
||||
|
||||
click.echo("\nResearching... (this may take a few minutes)\n")
|
||||
click.echo("\nResearching...\n")
|
||||
|
||||
result = await agent.trigger_and_wait("start", {"topic": topic})
|
||||
|
||||
@@ -128,16 +211,16 @@ async def _interactive_shell(verbose=False):
|
||||
|
||||
if result.success:
|
||||
output = result.output
|
||||
if "file_path" in output:
|
||||
click.echo(f"\nReport saved to: {output['file_path']}\n")
|
||||
if "final_report" in output:
|
||||
click.echo("\n--- Report Preview ---\n")
|
||||
preview = (
|
||||
output["final_report"][:500] + "..."
|
||||
if len(output.get("final_report", "")) > 500
|
||||
else output.get("final_report", "")
|
||||
)
|
||||
click.echo(preview)
|
||||
if "report_content" in output:
|
||||
click.echo("\n--- Report ---\n")
|
||||
click.echo(output["report_content"])
|
||||
click.echo("\n")
|
||||
if "references" in output:
|
||||
click.echo("--- References ---\n")
|
||||
for ref in output.get("references", []):
|
||||
click.echo(
|
||||
f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}"
|
||||
)
|
||||
click.echo("\n")
|
||||
else:
|
||||
click.echo(f"\nResearch failed: {result.error}\n")
|
||||
@@ -0,0 +1,311 @@
|
||||
"""Agent graph construction for Deep Research Agent."""
|
||||
|
||||
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
|
||||
from framework.graph.edge import GraphSpec
|
||||
from framework.graph.executor import ExecutionResult, GraphExecutor
|
||||
from framework.runtime.event_bus import EventBus
|
||||
from framework.runtime.core import Runtime
|
||||
from framework.llm import LiteLLMProvider
|
||||
from framework.runner.tool_registry import ToolRegistry
|
||||
|
||||
from .config import default_config, metadata
|
||||
from .nodes import (
|
||||
intake_node,
|
||||
research_node,
|
||||
review_node,
|
||||
report_node,
|
||||
)
|
||||
|
||||
# Goal definition
|
||||
goal = Goal(
|
||||
id="rigorous-interactive-research",
|
||||
name="Rigorous Interactive Research",
|
||||
description=(
|
||||
"Research any topic by searching diverse sources, analyzing findings, "
|
||||
"and producing a cited report — with user checkpoints to guide direction."
|
||||
),
|
||||
success_criteria=[
|
||||
SuccessCriterion(
|
||||
id="source-diversity",
|
||||
description="Use multiple diverse, authoritative sources",
|
||||
metric="source_count",
|
||||
target=">=5",
|
||||
weight=0.25,
|
||||
),
|
||||
SuccessCriterion(
|
||||
id="citation-coverage",
|
||||
description="Every factual claim in the report cites its source",
|
||||
metric="citation_coverage",
|
||||
target="100%",
|
||||
weight=0.25,
|
||||
),
|
||||
SuccessCriterion(
|
||||
id="user-satisfaction",
|
||||
description="User reviews findings before report generation",
|
||||
metric="user_approval",
|
||||
target="true",
|
||||
weight=0.25,
|
||||
),
|
||||
SuccessCriterion(
|
||||
id="report-completeness",
|
||||
description="Final report answers the original research questions",
|
||||
metric="question_coverage",
|
||||
target="90%",
|
||||
weight=0.25,
|
||||
),
|
||||
],
|
||||
constraints=[
|
||||
Constraint(
|
||||
id="no-hallucination",
|
||||
description="Only include information found in fetched sources",
|
||||
constraint_type="quality",
|
||||
category="accuracy",
|
||||
),
|
||||
Constraint(
|
||||
id="source-attribution",
|
||||
description="Every claim must cite its source with a numbered reference",
|
||||
constraint_type="quality",
|
||||
category="accuracy",
|
||||
),
|
||||
Constraint(
|
||||
id="user-checkpoint",
|
||||
description="Present findings to the user before writing the final report",
|
||||
constraint_type="functional",
|
||||
category="interaction",
|
||||
),
|
||||
],
|
||||
)
|
||||
|
||||
# Node list
|
||||
nodes = [
|
||||
intake_node,
|
||||
research_node,
|
||||
review_node,
|
||||
report_node,
|
||||
]
|
||||
|
||||
# Edge definitions
|
||||
edges = [
|
||||
# intake -> research
|
||||
EdgeSpec(
|
||||
id="intake-to-research",
|
||||
source="intake",
|
||||
target="research",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
priority=1,
|
||||
),
|
||||
# research -> review
|
||||
EdgeSpec(
|
||||
id="research-to-review",
|
||||
source="research",
|
||||
target="review",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
priority=1,
|
||||
),
|
||||
# review -> research (feedback loop)
|
||||
EdgeSpec(
|
||||
id="review-to-research-feedback",
|
||||
source="review",
|
||||
target="research",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="needs_more_research == True",
|
||||
priority=1,
|
||||
),
|
||||
# review -> report (user satisfied)
|
||||
EdgeSpec(
|
||||
id="review-to-report",
|
||||
source="review",
|
||||
target="report",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="needs_more_research == False",
|
||||
priority=2,
|
||||
),
|
||||
]
|
||||
|
||||
# Graph configuration
|
||||
entry_node = "intake"
|
||||
entry_points = {"start": "intake"}
|
||||
pause_nodes = []
|
||||
terminal_nodes = ["report"]
|
||||
|
||||
|
||||
class DeepResearchAgent:
|
||||
"""
|
||||
Deep Research Agent — 4-node pipeline with user checkpoints.
|
||||
|
||||
Flow: intake -> research -> review -> report
|
||||
^ |
|
||||
+-- feedback loop (if user wants more)
|
||||
"""
|
||||
|
||||
def __init__(self, config=None):
|
||||
self.config = config or default_config
|
||||
self.goal = goal
|
||||
self.nodes = nodes
|
||||
self.edges = edges
|
||||
self.entry_node = entry_node
|
||||
self.entry_points = entry_points
|
||||
self.pause_nodes = pause_nodes
|
||||
self.terminal_nodes = terminal_nodes
|
||||
self._executor: GraphExecutor | None = None
|
||||
self._graph: GraphSpec | None = None
|
||||
self._event_bus: EventBus | None = None
|
||||
self._tool_registry: ToolRegistry | None = None
|
||||
|
||||
def _build_graph(self) -> GraphSpec:
|
||||
"""Build the GraphSpec."""
|
||||
return GraphSpec(
|
||||
id="deep-research-agent-graph",
|
||||
goal_id=self.goal.id,
|
||||
version="1.0.0",
|
||||
entry_node=self.entry_node,
|
||||
entry_points=self.entry_points,
|
||||
terminal_nodes=self.terminal_nodes,
|
||||
pause_nodes=self.pause_nodes,
|
||||
nodes=self.nodes,
|
||||
edges=self.edges,
|
||||
default_model=self.config.model,
|
||||
max_tokens=self.config.max_tokens,
|
||||
loop_config={
|
||||
"max_iterations": 100,
|
||||
"max_tool_calls_per_turn": 20,
|
||||
"max_history_tokens": 32000,
|
||||
},
|
||||
)
|
||||
|
||||
def _setup(self, mock_mode=False) -> GraphExecutor:
|
||||
"""Set up the executor with all components."""
|
||||
from pathlib import Path
|
||||
|
||||
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
|
||||
storage_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
self._event_bus = EventBus()
|
||||
self._tool_registry = ToolRegistry()
|
||||
|
||||
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
|
||||
if mcp_config_path.exists():
|
||||
self._tool_registry.load_mcp_config(mcp_config_path)
|
||||
|
||||
llm = None
|
||||
if not mock_mode:
|
||||
llm = LiteLLMProvider(
|
||||
model=self.config.model,
|
||||
api_key=self.config.api_key,
|
||||
api_base=self.config.api_base,
|
||||
)
|
||||
|
||||
tool_executor = self._tool_registry.get_executor()
|
||||
tools = list(self._tool_registry.get_tools().values())
|
||||
|
||||
self._graph = self._build_graph()
|
||||
runtime = Runtime(storage_path)
|
||||
|
||||
self._executor = GraphExecutor(
|
||||
runtime=runtime,
|
||||
llm=llm,
|
||||
tools=tools,
|
||||
tool_executor=tool_executor,
|
||||
event_bus=self._event_bus,
|
||||
storage_path=storage_path,
|
||||
loop_config=self._graph.loop_config,
|
||||
)
|
||||
|
||||
return self._executor
|
||||
|
||||
async def start(self, mock_mode=False) -> None:
|
||||
"""Set up the agent (initialize executor and tools)."""
|
||||
if self._executor is None:
|
||||
self._setup(mock_mode=mock_mode)
|
||||
|
||||
async def stop(self) -> None:
|
||||
"""Clean up resources."""
|
||||
self._executor = None
|
||||
self._event_bus = None
|
||||
|
||||
async def trigger_and_wait(
|
||||
self,
|
||||
entry_point: str,
|
||||
input_data: dict,
|
||||
timeout: float | None = None,
|
||||
session_state: dict | None = None,
|
||||
) -> ExecutionResult | None:
|
||||
"""Execute the graph and wait for completion."""
|
||||
if self._executor is None:
|
||||
raise RuntimeError("Agent not started. Call start() first.")
|
||||
if self._graph is None:
|
||||
raise RuntimeError("Graph not built. Call start() first.")
|
||||
|
||||
return await self._executor.execute(
|
||||
graph=self._graph,
|
||||
goal=self.goal,
|
||||
input_data=input_data,
|
||||
session_state=session_state,
|
||||
)
|
||||
|
||||
async def run(
|
||||
self, context: dict, mock_mode=False, session_state=None
|
||||
) -> ExecutionResult:
|
||||
"""Run the agent (convenience method for single execution)."""
|
||||
await self.start(mock_mode=mock_mode)
|
||||
try:
|
||||
result = await self.trigger_and_wait(
|
||||
"start", context, session_state=session_state
|
||||
)
|
||||
return result or ExecutionResult(success=False, error="Execution timeout")
|
||||
finally:
|
||||
await self.stop()
|
||||
|
||||
def info(self):
|
||||
"""Get agent information."""
|
||||
return {
|
||||
"name": metadata.name,
|
||||
"version": metadata.version,
|
||||
"description": metadata.description,
|
||||
"goal": {
|
||||
"name": self.goal.name,
|
||||
"description": self.goal.description,
|
||||
},
|
||||
"nodes": [n.id for n in self.nodes],
|
||||
"edges": [e.id for e in self.edges],
|
||||
"entry_node": self.entry_node,
|
||||
"entry_points": self.entry_points,
|
||||
"pause_nodes": self.pause_nodes,
|
||||
"terminal_nodes": self.terminal_nodes,
|
||||
"client_facing_nodes": [n.id for n in self.nodes if n.client_facing],
|
||||
}
|
||||
|
||||
def validate(self):
|
||||
"""Validate agent structure."""
|
||||
errors = []
|
||||
warnings = []
|
||||
|
||||
node_ids = {node.id for node in self.nodes}
|
||||
for edge in self.edges:
|
||||
if edge.source not in node_ids:
|
||||
errors.append(f"Edge {edge.id}: source '{edge.source}' not found")
|
||||
if edge.target not in node_ids:
|
||||
errors.append(f"Edge {edge.id}: target '{edge.target}' not found")
|
||||
|
||||
if self.entry_node not in node_ids:
|
||||
errors.append(f"Entry node '{self.entry_node}' not found")
|
||||
|
||||
for terminal in self.terminal_nodes:
|
||||
if terminal not in node_ids:
|
||||
errors.append(f"Terminal node '{terminal}' not found")
|
||||
|
||||
for ep_id, node_id in self.entry_points.items():
|
||||
if node_id not in node_ids:
|
||||
errors.append(
|
||||
f"Entry point '{ep_id}' references unknown node '{node_id}'"
|
||||
)
|
||||
|
||||
return {
|
||||
"valid": len(errors) == 0,
|
||||
"errors": errors,
|
||||
"warnings": warnings,
|
||||
}
|
||||
|
||||
|
||||
# Create default instance
|
||||
default_agent = DeepResearchAgent()
|
||||
@@ -0,0 +1,46 @@
|
||||
"""Runtime configuration."""
|
||||
|
||||
import json
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
def _load_preferred_model() -> str:
|
||||
"""Load preferred model from ~/.hive/configuration.json."""
|
||||
config_path = Path.home() / ".hive" / "configuration.json"
|
||||
if config_path.exists():
|
||||
try:
|
||||
with open(config_path) as f:
|
||||
config = json.load(f)
|
||||
llm = config.get("llm", {})
|
||||
if llm.get("provider") and llm.get("model"):
|
||||
return f"{llm['provider']}/{llm['model']}"
|
||||
except Exception:
|
||||
pass
|
||||
return "anthropic/claude-sonnet-4-20250514"
|
||||
|
||||
|
||||
@dataclass
|
||||
class RuntimeConfig:
|
||||
model: str = field(default_factory=_load_preferred_model)
|
||||
temperature: float = 0.7
|
||||
max_tokens: int = 40000
|
||||
api_key: str | None = None
|
||||
api_base: str | None = None
|
||||
|
||||
|
||||
default_config = RuntimeConfig()
|
||||
|
||||
|
||||
@dataclass
|
||||
class AgentMetadata:
|
||||
name: str = "Deep Research Agent"
|
||||
version: str = "1.0.0"
|
||||
description: str = (
|
||||
"Interactive research agent that rigorously investigates topics through "
|
||||
"multi-source search, quality evaluation, and synthesis - with TUI conversation "
|
||||
"at key checkpoints for user guidance and feedback."
|
||||
)
|
||||
|
||||
|
||||
metadata = AgentMetadata()
|
||||
+2
-2
@@ -1,8 +1,8 @@
|
||||
{
|
||||
"hive-tools": {
|
||||
"transport": "stdio",
|
||||
"command": "python",
|
||||
"args": ["mcp_server.py", "--stdio"],
|
||||
"command": "uv",
|
||||
"args": ["run", "python", "mcp_server.py", "--stdio"],
|
||||
"cwd": "../../tools",
|
||||
"description": "Hive tools MCP server providing web_search, web_scrape, and write_to_file"
|
||||
}
|
||||
@@ -0,0 +1,162 @@
|
||||
"""Node definitions for Deep Research Agent."""
|
||||
|
||||
from framework.graph import NodeSpec
|
||||
|
||||
# Node 1: Intake (client-facing)
|
||||
# Brief conversation to clarify what the user wants researched.
|
||||
intake_node = NodeSpec(
|
||||
id="intake",
|
||||
name="Research Intake",
|
||||
description="Discuss the research topic with the user, clarify scope, and confirm direction",
|
||||
node_type="event_loop",
|
||||
client_facing=True,
|
||||
input_keys=["topic"],
|
||||
output_keys=["research_brief"],
|
||||
system_prompt="""\
|
||||
You are a research intake specialist. The user wants to research a topic.
|
||||
Have a brief conversation to clarify what they need.
|
||||
|
||||
**STEP 1 — Read and respond (text only, NO tool calls):**
|
||||
1. Read the topic provided
|
||||
2. If it's vague, ask 1-2 clarifying questions (scope, angle, depth)
|
||||
3. If it's already clear, confirm your understanding and ask the user to confirm
|
||||
|
||||
Keep it short. Don't over-ask.
|
||||
|
||||
**STEP 2 — After the user confirms, call set_output:**
|
||||
- set_output("research_brief", "A clear paragraph describing exactly what to research, \
|
||||
what questions to answer, what scope to cover, and how deep to go.")
|
||||
""",
|
||||
tools=[],
|
||||
)
|
||||
|
||||
# Node 2: Research
|
||||
# The workhorse — searches the web, fetches content, analyzes sources.
|
||||
# One node with both tools avoids the context-passing overhead of 5 separate nodes.
|
||||
research_node = NodeSpec(
|
||||
id="research",
|
||||
name="Research",
|
||||
description="Search the web, fetch source content, and compile findings",
|
||||
node_type="event_loop",
|
||||
max_node_visits=3,
|
||||
input_keys=["research_brief", "feedback"],
|
||||
output_keys=["findings", "sources", "gaps"],
|
||||
nullable_output_keys=["feedback"],
|
||||
system_prompt="""\
|
||||
You are a research agent. Given a research brief, find and analyze sources.
|
||||
|
||||
If feedback is provided, this is a follow-up round — focus on the gaps identified.
|
||||
|
||||
Work in phases:
|
||||
1. **Search**: Use web_search with 3-5 diverse queries covering different angles.
|
||||
Prioritize authoritative sources (.edu, .gov, established publications).
|
||||
2. **Fetch**: Use web_scrape on the most promising URLs (aim for 5-8 sources).
|
||||
Skip URLs that fail. Extract the substantive content.
|
||||
3. **Analyze**: Review what you've collected. Identify key findings, themes,
|
||||
and any contradictions between sources.
|
||||
|
||||
Important:
|
||||
- Work in batches of 3-4 tool calls at a time to manage context
|
||||
- After each batch, assess whether you have enough material
|
||||
- Prefer quality over quantity — 5 good sources beat 15 thin ones
|
||||
- Track which URL each finding comes from (you'll need citations later)
|
||||
|
||||
When done, use set_output:
|
||||
- set_output("findings", "Structured summary: key findings with source URLs for each claim. \
|
||||
Include themes, contradictions, and confidence levels.")
|
||||
- set_output("sources", [{"url": "...", "title": "...", "summary": "..."}])
|
||||
- set_output("gaps", "What aspects of the research brief are NOT well-covered yet, if any.")
|
||||
""",
|
||||
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
|
||||
)
|
||||
|
||||
# Node 3: Review (client-facing)
|
||||
# Shows the user what was found and asks whether to dig deeper or proceed.
|
||||
review_node = NodeSpec(
|
||||
id="review",
|
||||
name="Review Findings",
|
||||
description="Present findings to user and decide whether to research more or write the report",
|
||||
node_type="event_loop",
|
||||
client_facing=True,
|
||||
max_node_visits=3,
|
||||
input_keys=["findings", "sources", "gaps", "research_brief"],
|
||||
output_keys=["needs_more_research", "feedback"],
|
||||
system_prompt="""\
|
||||
Present the research findings to the user clearly and concisely.
|
||||
|
||||
**STEP 1 — Present (your first message, text only, NO tool calls):**
|
||||
1. **Summary** (2-3 sentences of what was found)
|
||||
2. **Key Findings** (bulleted, with confidence levels)
|
||||
3. **Sources Used** (count and quality assessment)
|
||||
4. **Gaps** (what's still unclear or under-covered)
|
||||
|
||||
End by asking: Are they satisfied, or do they want deeper research? \
|
||||
Should we proceed to writing the final report?
|
||||
|
||||
**STEP 2 — After the user responds, call set_output:**
|
||||
- set_output("needs_more_research", "true") — if they want more
|
||||
- set_output("needs_more_research", "false") — if they're satisfied
|
||||
- set_output("feedback", "What the user wants explored further, or empty string")
|
||||
""",
|
||||
tools=[],
|
||||
)
|
||||
|
||||
# Node 4: Report (client-facing)
|
||||
# Writes an HTML report, serves the link to the user, and answers follow-ups.
|
||||
report_node = NodeSpec(
|
||||
id="report",
|
||||
name="Write & Deliver Report",
|
||||
description="Write a cited HTML report from the findings and present it to the user",
|
||||
node_type="event_loop",
|
||||
client_facing=True,
|
||||
input_keys=["findings", "sources", "research_brief"],
|
||||
output_keys=["delivery_status"],
|
||||
system_prompt="""\
|
||||
Write a comprehensive research report as an HTML file and present it to the user.
|
||||
|
||||
**STEP 1 — Write the HTML report (tool calls, NO text to user yet):**
|
||||
|
||||
1. Compose a complete, self-contained HTML document with embedded CSS styling.
|
||||
Use a clean, readable design: max-width container, pleasant typography,
|
||||
numbered citation links, a table of contents, and a references section.
|
||||
|
||||
Report structure inside the HTML:
|
||||
- Title & date
|
||||
- Executive Summary (2-3 paragraphs)
|
||||
- Table of Contents
|
||||
- Findings (organized by theme, with [n] citation links)
|
||||
- Analysis (synthesis, implications, areas of debate)
|
||||
- Conclusion (key takeaways, confidence assessment)
|
||||
- References (numbered list with clickable URLs)
|
||||
|
||||
Requirements:
|
||||
- Every factual claim must cite its source with [n] notation
|
||||
- Be objective — present multiple viewpoints where sources disagree
|
||||
- Distinguish well-supported conclusions from speculation
|
||||
- Answer the original research questions from the brief
|
||||
|
||||
2. Save the HTML file:
|
||||
save_data(filename="report.html", data=<your_html>)
|
||||
|
||||
3. Get the clickable link:
|
||||
serve_file_to_user(filename="report.html", label="Research Report")
|
||||
|
||||
**STEP 2 — Present the link to the user (text only, NO tool calls):**
|
||||
|
||||
Tell the user the report is ready and include the file:// URI from
|
||||
serve_file_to_user so they can click it to open. Give a brief summary
|
||||
of what the report covers. Ask if they have questions.
|
||||
|
||||
**STEP 3 — After the user responds:**
|
||||
- Answer follow-up questions from the research material
|
||||
- When the user is satisfied: set_output("delivery_status", "completed")
|
||||
""",
|
||||
tools=["save_data", "serve_file_to_user", "load_data", "list_data_files"],
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"intake_node",
|
||||
"research_node",
|
||||
"review_node",
|
||||
"report_node",
|
||||
]
|
||||
+109
-140
@@ -1,10 +1,10 @@
|
||||
---
|
||||
name: setup-credentials
|
||||
name: hive-credentials
|
||||
description: Set up and install credentials for an agent. Detects missing credentials from agent config, collects them from the user, and stores them securely in the local encrypted store at ~/.hive/credentials.
|
||||
license: Apache-2.0
|
||||
metadata:
|
||||
author: hive
|
||||
version: "2.2"
|
||||
version: "2.3"
|
||||
type: utility
|
||||
---
|
||||
|
||||
@@ -31,95 +31,50 @@ Determine which agent needs credentials. The user will either:
|
||||
|
||||
Locate the agent's directory under `exports/{agent_name}/`.
|
||||
|
||||
### Step 2: Detect Required Credentials (Bash-First)
|
||||
### Step 2: Detect Missing Credentials
|
||||
|
||||
Use bash commands to determine what the agent needs and what's already configured. This avoids Python import issues and works even when `HIVE_CREDENTIAL_KEY` is not set.
|
||||
Use the `check_missing_credentials` MCP tool to detect what the agent needs and what's already configured. This tool loads the agent, inspects its required tools and node types, maps them to credentials via `CREDENTIAL_SPECS`, and checks both the encrypted store and environment variables.
|
||||
|
||||
#### Step 2a: Read Agent Requirements
|
||||
|
||||
Extract `required_tools` and node types from the agent config:
|
||||
|
||||
```bash
|
||||
# Get required tools
|
||||
jq -r '.required_tools[]?' exports/{agent_name}/agent.json 2>/dev/null
|
||||
|
||||
# Get node types from graph nodes
|
||||
jq -r '.graph.nodes[]?.node_type' exports/{agent_name}/agent.json 2>/dev/null | sort -u
|
||||
```
|
||||
check_missing_credentials(agent_path="exports/{agent_name}")
|
||||
```
|
||||
|
||||
Map the extracted tools and node types to credentials by reading the spec files directly:
|
||||
The tool returns a JSON response:
|
||||
|
||||
```bash
|
||||
# Read all credential specs — each file defines tools, node_types, env_var, and credential_id
|
||||
cat tools/src/aden_tools/credentials/llm.py tools/src/aden_tools/credentials/search.py tools/src/aden_tools/credentials/email.py tools/src/aden_tools/credentials/integrations.py
|
||||
```json
|
||||
{
|
||||
"agent": "exports/{agent_name}",
|
||||
"missing": [
|
||||
{
|
||||
"credential_name": "brave_search",
|
||||
"env_var": "BRAVE_SEARCH_API_KEY",
|
||||
"description": "Brave Search API key for web search",
|
||||
"help_url": "https://brave.com/search/api/",
|
||||
"tools": ["web_search"]
|
||||
}
|
||||
],
|
||||
"available": [
|
||||
{
|
||||
"credential_name": "anthropic",
|
||||
"env_var": "ANTHROPIC_API_KEY",
|
||||
"source": "encrypted_store"
|
||||
}
|
||||
],
|
||||
"total_missing": 1,
|
||||
"ready": false
|
||||
}
|
||||
```
|
||||
|
||||
For each `CredentialSpec`, match its `tools` and `node_types` lists against the agent's required tools and node types. Extract the `env_var`, `credential_id`, and `credential_group` for every match. This is the list of needed credentials.
|
||||
|
||||
#### Step 2b: Check Existing Credential Sources
|
||||
|
||||
For each needed credential, check three sources. A credential is "found" if it exists in ANY of them:
|
||||
|
||||
**1. Encrypted store metadata index** (unencrypted JSON — no decryption key needed):
|
||||
|
||||
```bash
|
||||
cat ~/.hive/credentials/metadata/index.json 2>/dev/null | jq -r '.credentials | keys[]'
|
||||
```
|
||||
|
||||
If a credential ID appears in this list, it is stored in the encrypted store.
|
||||
|
||||
**2. Environment variables:**
|
||||
|
||||
```bash
|
||||
# Check each needed env var, e.g.:
|
||||
printenv ANTHROPIC_API_KEY > /dev/null 2>&1 && echo "ANTHROPIC_API_KEY: set" || echo "ANTHROPIC_API_KEY: not set"
|
||||
printenv BRAVE_SEARCH_API_KEY > /dev/null 2>&1 && echo "BRAVE_SEARCH_API_KEY: set" || echo "BRAVE_SEARCH_API_KEY: not set"
|
||||
```
|
||||
|
||||
**3. Project `.env` file:**
|
||||
|
||||
```bash
|
||||
# Check each needed env var, e.g.:
|
||||
grep -q '^ANTHROPIC_API_KEY=' .env 2>/dev/null && echo "ANTHROPIC_API_KEY: in .env" || echo "ANTHROPIC_API_KEY: not in .env"
|
||||
grep -q '^BRAVE_SEARCH_API_KEY=' .env 2>/dev/null && echo "BRAVE_SEARCH_API_KEY: in .env" || echo "BRAVE_SEARCH_API_KEY: not in .env"
|
||||
```
|
||||
|
||||
#### Step 2c: HIVE_CREDENTIAL_KEY Check
|
||||
|
||||
If any credentials were found in the encrypted store metadata index, verify the encryption key is available. The key is typically persisted to shell config by a previous setup-credentials run.
|
||||
|
||||
Check both the current session AND shell config files:
|
||||
|
||||
```bash
|
||||
# Check 1: Current session
|
||||
printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
|
||||
|
||||
# Check 2: Shell config files (where setup-credentials persists it)
|
||||
# Note: check each file individually to avoid non-zero exit when one doesn't exist
|
||||
for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
|
||||
```
|
||||
|
||||
Decision logic:
|
||||
- **In current session** — no action needed, credentials in the store are usable
|
||||
- **In shell config but NOT in current session** — the key is persisted but this shell hasn't sourced it. Run `source ~/.zshrc` (or `~/.bashrc`), then re-check. Credentials in the store are usable after sourcing.
|
||||
- **Not in session AND not in shell config** — the key was never persisted. Warn the user that credentials in the store cannot be decrypted. Help fix the key situation (recover/re-persist), do NOT re-collect credential values that are already stored.
|
||||
|
||||
#### Step 2d: Compute Missing & Group
|
||||
|
||||
Diff the "needed" credentials against the "found" credentials to get the truly missing list.
|
||||
|
||||
Group related credentials by their `credential_group` field from the spec files. Credentials that share the same non-empty `credential_group` value should be presented as a single setup step rather than asking for each one individually.
|
||||
|
||||
**If nothing is missing and there's no HIVE_CREDENTIAL_KEY issue:** Report all credentials as configured and skip Steps 3-5. Example:
|
||||
**If `ready` is true (nothing missing):** Report all credentials as configured and skip Steps 3-5. Example:
|
||||
|
||||
```
|
||||
All required credentials are already configured:
|
||||
✓ anthropic (ANTHROPIC_API_KEY) — found in encrypted store
|
||||
✓ brave_search (BRAVE_SEARCH_API_KEY) — found in environment
|
||||
✓ anthropic (ANTHROPIC_API_KEY)
|
||||
✓ brave_search (BRAVE_SEARCH_API_KEY)
|
||||
Your agent is ready to run!
|
||||
```
|
||||
|
||||
**If credentials are missing:** Continue to Step 3 with only the missing ones.
|
||||
**If credentials are missing:** Continue to Step 3 with the `missing` list.
|
||||
|
||||
### Step 3: Present Auth Options for Each Missing Credential
|
||||
|
||||
@@ -153,7 +108,7 @@ Present the available options using AskUserQuestion:
|
||||
Choose how to configure HUBSPOT_ACCESS_TOKEN:
|
||||
|
||||
1) Aden Platform (OAuth) (Recommended)
|
||||
Secure OAuth2 flow via integration.adenhq.com
|
||||
Secure OAuth2 flow via hive.adenhq.com
|
||||
- Quick setup with automatic token refresh
|
||||
- No need to manage API keys manually
|
||||
|
||||
@@ -170,6 +125,22 @@ Choose how to configure HUBSPOT_ACCESS_TOKEN:
|
||||
|
||||
### Step 4: Execute Auth Flow Based on User Choice
|
||||
|
||||
#### Prerequisite: Ensure HIVE_CREDENTIAL_KEY Is Available
|
||||
|
||||
Before storing any credentials, verify `HIVE_CREDENTIAL_KEY` is set (needed to encrypt/decrypt the local store). Check both the current session and shell config:
|
||||
|
||||
```bash
|
||||
# Check current session
|
||||
printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
|
||||
|
||||
# Check shell config files
|
||||
for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
|
||||
```
|
||||
|
||||
- **In current session** — proceed to store credentials
|
||||
- **In shell config but NOT in current session** — run `source ~/.zshrc` (or `~/.bashrc`) first, then proceed
|
||||
- **Not set anywhere** — `EncryptedFileStorage` will auto-generate one. After storing, tell the user to persist it: `export HIVE_CREDENTIAL_KEY="{generated_key}"` in their shell profile
|
||||
|
||||
#### Option 1: Aden Platform (OAuth)
|
||||
|
||||
This is the recommended flow for supported integrations (HubSpot, etc.).
|
||||
@@ -195,7 +166,7 @@ If not set, guide user to get one from Aden (this is where they do OAuth):
|
||||
from aden_tools.credentials import open_browser, get_aden_setup_url
|
||||
|
||||
# Open browser to Aden - user will sign up and connect integrations there
|
||||
url = get_aden_setup_url() # https://integration.adenhq.com/setup
|
||||
url = get_aden_setup_url() # https://hive.adenhq.com
|
||||
success, msg = open_browser(url)
|
||||
|
||||
print("Please sign in to Aden and connect your integrations (HubSpot, etc.).")
|
||||
@@ -272,7 +243,7 @@ print(f"Synced credentials: {synced}")
|
||||
# If the required credential wasn't synced, the user hasn't authorized it on Aden yet
|
||||
if "hubspot" not in synced:
|
||||
print("HubSpot not found in your Aden account.")
|
||||
print("Please visit https://integration.adenhq.com to connect HubSpot, then try again.")
|
||||
print("Please visit https://hive.adenhq.com to connect HubSpot, then try again.")
|
||||
```
|
||||
|
||||
For more control over the sync process:
|
||||
@@ -442,28 +413,38 @@ config_path.write_text(json.dumps(config, indent=2))
|
||||
|
||||
### Step 6: Verify All Credentials
|
||||
|
||||
Run validation again to confirm everything is set:
|
||||
Use the `verify_credentials` MCP tool to confirm everything is properly configured:
|
||||
|
||||
```python
|
||||
runner = AgentRunner.load("exports/{agent_name}")
|
||||
validation = runner.validate()
|
||||
assert not validation.missing_credentials, "Still missing credentials!"
|
||||
```
|
||||
verify_credentials(agent_path="exports/{agent_name}")
|
||||
```
|
||||
|
||||
Report the result to the user.
|
||||
The tool returns:
|
||||
|
||||
```json
|
||||
{
|
||||
"agent": "exports/{agent_name}",
|
||||
"ready": true,
|
||||
"missing_credentials": [],
|
||||
"warnings": [],
|
||||
"errors": []
|
||||
}
|
||||
```
|
||||
|
||||
If `ready` is true, report success. If `missing_credentials` is non-empty, identify what failed and loop back to Step 3 for the remaining credentials.
|
||||
|
||||
## Health Check Reference
|
||||
|
||||
Health checks validate credentials by making lightweight API calls:
|
||||
|
||||
| Credential | Endpoint | What It Checks |
|
||||
| --------------- | --------------------------------------- | ---------------------------------- |
|
||||
| `anthropic` | `POST /v1/messages` | API key validity |
|
||||
| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity |
|
||||
| `google_search` | `GET /customsearch/v1?q=test&num=1` | API key + CSE ID validity |
|
||||
| `github` | `GET /user` | Token validity, user identity |
|
||||
| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes |
|
||||
| `resend` | `GET /domains` | API key validity |
|
||||
| Credential | Endpoint | What It Checks |
|
||||
| --------------- | --------------------------------------- | --------------------------------- |
|
||||
| `anthropic` | `POST /v1/messages` | API key validity |
|
||||
| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity |
|
||||
| `google_search` | `GET /customsearch/v1?q=test&num=1` | API key + CSE ID validity |
|
||||
| `github` | `GET /user` | Token validity, user identity |
|
||||
| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes |
|
||||
| `resend` | `GET /domains` | API key validity |
|
||||
|
||||
```python
|
||||
from aden_tools.credentials import check_credential_health, HealthCheckResult
|
||||
@@ -560,60 +541,27 @@ token = store.get_key("hubspot", "access_token")
|
||||
## Example Session
|
||||
|
||||
```
|
||||
User: /setup-credentials for my research-agent
|
||||
User: /hive-credentials for my research-agent
|
||||
|
||||
Agent: Let me check what credentials your research-agent needs.
|
||||
|
||||
[Reads agent config]
|
||||
$ jq -r '.required_tools[]?' exports/research-agent/agent.json
|
||||
web_search
|
||||
google_search
|
||||
|
||||
$ jq -r '.graph.nodes[]?.node_type' exports/research-agent/agent.json | sort -u
|
||||
llm_tool_use
|
||||
|
||||
[Maps tools/nodes to credentials using lookup table]
|
||||
Needed: anthropic, brave_search, google_search, google_cse
|
||||
|
||||
[Checks encrypted store metadata index]
|
||||
$ cat ~/.hive/credentials/metadata/index.json | jq -r '.credentials | keys[]'
|
||||
anthropic
|
||||
brave_search
|
||||
|
||||
[Checks environment variables]
|
||||
$ printenv ANTHROPIC_API_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
|
||||
not set
|
||||
$ printenv BRAVE_SEARCH_API_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
|
||||
not set
|
||||
$ printenv GOOGLE_API_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
|
||||
not set
|
||||
$ printenv GOOGLE_CSE_ID > /dev/null 2>&1 && echo "set" || echo "not set"
|
||||
not set
|
||||
|
||||
[Checks HIVE_CREDENTIAL_KEY since credentials found in store]
|
||||
$ printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
|
||||
session: not set
|
||||
$ for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
|
||||
/Users/user/.zshrc
|
||||
|
||||
[Key is in shell config but not current session — sourcing it]
|
||||
$ source ~/.zshrc
|
||||
|
||||
[Computes missing credentials]
|
||||
Found:
|
||||
✓ anthropic (ANTHROPIC_API_KEY) — in encrypted store
|
||||
✓ brave_search (BRAVE_SEARCH_API_KEY) — in encrypted store
|
||||
Missing:
|
||||
✗ google_search (GOOGLE_API_KEY)
|
||||
✗ google_cse (GOOGLE_CSE_ID)
|
||||
[Calls check_missing_credentials(agent_path="exports/research-agent")]
|
||||
→ Returns:
|
||||
available: anthropic (encrypted_store), brave_search (encrypted_store)
|
||||
missing: google_search (GOOGLE_API_KEY), google_cse (GOOGLE_CSE_ID)
|
||||
ready: false
|
||||
|
||||
Agent: 2 of 4 required credentials are already configured. Only Google Custom
|
||||
Search needs setup (2 values as a single group).
|
||||
Search needs setup (2 values).
|
||||
|
||||
--- Setting up Google Custom Search (google_search + google_cse) ---
|
||||
|
||||
This requires two values that work together.
|
||||
|
||||
[Checks HIVE_CREDENTIAL_KEY before storing]
|
||||
$ printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
|
||||
set
|
||||
|
||||
First, the Google API Key:
|
||||
1. Go to https://console.cloud.google.com/apis/credentials
|
||||
2. Create a new project (or select an existing one)
|
||||
@@ -640,10 +588,31 @@ Now, the Custom Search Engine ID:
|
||||
|
||||
✓ Google Custom Search credentials valid
|
||||
|
||||
[Calls verify_credentials(agent_path="exports/research-agent")]
|
||||
→ Returns: ready: true, missing_credentials: []
|
||||
|
||||
All credentials are now configured:
|
||||
✓ anthropic (ANTHROPIC_API_KEY) — already in encrypted store
|
||||
✓ brave_search (BRAVE_SEARCH_API_KEY) — already in encrypted store
|
||||
✓ google_search (GOOGLE_API_KEY) — stored in encrypted store
|
||||
✓ google_cse (GOOGLE_CSE_ID) — stored in encrypted store
|
||||
Your agent is ready to run!
|
||||
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ ✅ CREDENTIALS CONFIGURED │
|
||||
├─────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ NEXT STEPS: │
|
||||
│ │
|
||||
│ 1. RUN YOUR AGENT: │
|
||||
│ │
|
||||
│ PYTHONPATH=core:exports python -m research-agent tui │
|
||||
│ │
|
||||
│ 2. IF YOU ENCOUNTER ISSUES, USE THE DEBUGGER: │
|
||||
│ │
|
||||
│ /hive-debugger │
|
||||
│ │
|
||||
│ The debugger analyzes runtime logs, identifies retry loops, tool │
|
||||
│ failures, stalled execution, and provides actionable fix suggestions. │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
@@ -0,0 +1,848 @@
|
||||
---
|
||||
name: hive-debugger
|
||||
type: utility
|
||||
description: Interactive debugging companion for Hive agents - identifies runtime issues and proposes solutions
|
||||
version: 1.0.0
|
||||
requires:
|
||||
- hive-concepts
|
||||
tags:
|
||||
- debugging
|
||||
- runtime-logs
|
||||
- agent-development
|
||||
---
|
||||
|
||||
# Hive Debugger
|
||||
|
||||
An interactive debugging companion that helps developers identify and fix runtime issues in Hive agents. The debugger analyzes runtime logs at three levels (L1/L2/L3), categorizes issues, and provides actionable fix recommendations.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use `/hive-debugger` when:
|
||||
- Your agent is failing or producing unexpected results
|
||||
- You need to understand why a specific node is retrying repeatedly
|
||||
- Tool calls are failing and you need to identify the root cause
|
||||
- Agent execution is stalled or taking too long
|
||||
- You want to monitor agent behavior in real-time during development
|
||||
|
||||
This skill works alongside agents running in TUI mode and provides supervisor-level insights into execution behavior.
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before using this skill, ensure:
|
||||
1. You have an exported agent in `exports/{agent_name}/`
|
||||
2. The agent has been run at least once (logs exist)
|
||||
3. Runtime logging is enabled (default in Hive framework)
|
||||
4. You have access to the agent's working directory at `~/.hive/agents/{agent_name}/`
|
||||
|
||||
---
|
||||
|
||||
## Workflow
|
||||
|
||||
### Stage 1: Setup & Context Gathering
|
||||
|
||||
**Objective:** Understand the agent being debugged
|
||||
|
||||
**What to do:**
|
||||
|
||||
1. **Ask the developer which agent needs debugging:**
|
||||
- Get agent name (e.g., "twitter_outreach", "deep_research_agent")
|
||||
- Confirm the agent exists in `exports/{agent_name}/`
|
||||
|
||||
2. **Determine agent working directory:**
|
||||
- Calculate: `~/.hive/agents/{agent_name}/`
|
||||
- Verify this directory exists and contains session logs
|
||||
|
||||
3. **Read agent configuration:**
|
||||
- Read file: `exports/{agent_name}/agent.json`
|
||||
- Extract goal information from the JSON:
|
||||
- `goal.id` - The goal identifier
|
||||
- `goal.success_criteria` - What success looks like
|
||||
- `goal.constraints` - Rules the agent must follow
|
||||
- Extract graph information:
|
||||
- List of node IDs from `graph.nodes`
|
||||
- List of edges from `graph.edges`
|
||||
|
||||
4. **Store context for the debugging session:**
|
||||
- agent_name
|
||||
- agent_work_dir (e.g., `/home/user/.hive/twitter_outreach`)
|
||||
- goal_id
|
||||
- success_criteria
|
||||
- constraints
|
||||
- node_ids
|
||||
|
||||
**Example:**
|
||||
```
|
||||
Developer: "My twitter_outreach agent keeps failing"
|
||||
|
||||
You: "I'll help debug the twitter_outreach agent. Let me gather context..."
|
||||
|
||||
[Read exports/twitter_outreach/agent.json]
|
||||
|
||||
Context gathered:
|
||||
- Agent: twitter_outreach
|
||||
- Goal: twitter-outreach-multi-loop
|
||||
- Working Directory: /home/user/.hive/twitter_outreach
|
||||
- Success Criteria: ["Successfully send 5 personalized outreach messages"]
|
||||
- Constraints: ["Must verify handle exists", "Must personalize message"]
|
||||
- Nodes: ["intake-collector", "profile-analyzer", "message-composer", "outreach-sender"]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Stage 2: Mode Selection
|
||||
|
||||
**Objective:** Choose the debugging approach that best fits the situation
|
||||
|
||||
**What to do:**
|
||||
|
||||
Ask the developer which debugging mode they want to use. Use AskUserQuestion with these options:
|
||||
|
||||
1. **Real-time Monitoring Mode**
|
||||
- Description: Monitor active TUI session continuously, poll logs every 5-10 seconds, alert on new issues immediately
|
||||
- Best for: Live debugging sessions where you want to catch issues as they happen
|
||||
- Note: Requires agent to be currently running
|
||||
|
||||
2. **Post-Mortem Analysis Mode**
|
||||
- Description: Analyze completed or failed runs in detail, deep dive into specific session
|
||||
- Best for: Understanding why a past execution failed
|
||||
- Note: Most common mode for debugging
|
||||
|
||||
3. **Historical Trends Mode**
|
||||
- Description: Analyze patterns across multiple runs, identify recurring issues
|
||||
- Best for: Finding systemic problems that happen repeatedly
|
||||
- Note: Useful for agents that have run many times
|
||||
|
||||
**Implementation:**
|
||||
```
|
||||
Use AskUserQuestion to present these options and let the developer choose.
|
||||
Store the selected mode for the session.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Stage 3: Triage (L1 Analysis)
|
||||
|
||||
**Objective:** Identify which sessions need attention
|
||||
|
||||
**What to do:**
|
||||
|
||||
1. **Query high-level run summaries** using the MCP tool:
|
||||
```
|
||||
query_runtime_logs(
|
||||
agent_work_dir="{agent_work_dir}",
|
||||
status="needs_attention",
|
||||
limit=20
|
||||
)
|
||||
```
|
||||
|
||||
2. **Analyze the results:**
|
||||
- Look for runs with `needs_attention: true`
|
||||
- Check `attention_summary.categories` for issue types
|
||||
- Note the `run_id` of problematic sessions
|
||||
- Check `status` field: "degraded", "failure", "in_progress"
|
||||
|
||||
3. **Attention flag triggers to understand:**
|
||||
From runtime_logger.py, runs are flagged when:
|
||||
- retry_count > 3
|
||||
- escalate_count > 2
|
||||
- latency_ms > 60000
|
||||
- tokens_used > 100000
|
||||
- total_steps > 20
|
||||
|
||||
4. **Present findings to developer:**
|
||||
- Summarize how many runs need attention
|
||||
- List the most recent problematic runs
|
||||
- Show attention categories for each
|
||||
- Ask which run they want to investigate (if multiple)
|
||||
|
||||
**Example Output:**
|
||||
```
|
||||
Found 2 runs needing attention:
|
||||
|
||||
1. session_20260206_115718_e22339c5 (30 minutes ago)
|
||||
Status: degraded
|
||||
Categories: missing_outputs, retry_loops
|
||||
|
||||
2. session_20260206_103422_9f8d1b2a (2 hours ago)
|
||||
Status: failure
|
||||
Categories: tool_failures, high_latency
|
||||
|
||||
Which run would you like to investigate?
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Stage 4: Diagnosis (L2 Analysis)
|
||||
|
||||
**Objective:** Identify which nodes failed and what patterns exist
|
||||
|
||||
**What to do:**
|
||||
|
||||
1. **Query per-node details** using the MCP tool:
|
||||
```
|
||||
query_runtime_log_details(
|
||||
agent_work_dir="{agent_work_dir}",
|
||||
run_id="{selected_run_id}",
|
||||
needs_attention_only=True
|
||||
)
|
||||
```
|
||||
|
||||
2. **Categorize issues** using the Issue Taxonomy:
|
||||
|
||||
**10 Issue Categories:**
|
||||
|
||||
| Category | Detection Pattern | Meaning |
|
||||
|----------|------------------|---------|
|
||||
| **Missing Outputs** | `exit_status != "success"`, `attention_reasons` contains "missing_outputs" | Node didn't call set_output with required keys |
|
||||
| **Tool Errors** | `tool_error_count > 0`, `attention_reasons` contains "tool_failures" | Tool calls failed (API errors, timeouts, auth issues) |
|
||||
| **Retry Loops** | `retry_count > 3`, `verdict_counts.RETRY > 5` | Judge repeatedly rejecting outputs |
|
||||
| **Guard Failures** | `guard_reject_count > 0` | Output validation failed (wrong types, missing keys) |
|
||||
| **Stalled Execution** | `total_steps > 20`, `verdict_counts.CONTINUE > 10` | EventLoopNode not making progress |
|
||||
| **High Latency** | `latency_ms > 60000`, `avg_step_latency > 5000` | Slow tool calls or LLM responses |
|
||||
| **Client-Facing Issues** | `client_input_requested` but no `user_input_received` | Premature set_output before user input |
|
||||
| **Edge Routing Errors** | `exit_status == "no_valid_edge"`, `attention_reasons` contains "routing_issue" | No edges match current state |
|
||||
| **Memory/Context Issues** | `tokens_used > 100000`, `context_overflow_count > 0` | Conversation history too long |
|
||||
| **Constraint Violations** | Compare output against goal constraints | Agent violated goal-level rules |
|
||||
|
||||
3. **Analyze each flagged node:**
|
||||
- Node ID and name
|
||||
- Exit status
|
||||
- Retry count
|
||||
- Verdict distribution (ACCEPT/RETRY/ESCALATE/CONTINUE)
|
||||
- Attention reasons
|
||||
- Total steps executed
|
||||
|
||||
4. **Present diagnosis to developer:**
|
||||
- List problematic nodes
|
||||
- Categorize each issue
|
||||
- Highlight the most severe problems
|
||||
- Show evidence (retry counts, error types)
|
||||
|
||||
**Example Output:**
|
||||
```
|
||||
Diagnosis for session_20260206_115718_e22339c5:
|
||||
|
||||
Problem Node: intake-collector
|
||||
├─ Exit Status: escalate
|
||||
├─ Retry Count: 5 (HIGH)
|
||||
├─ Verdict Counts: {RETRY: 5, ESCALATE: 1}
|
||||
├─ Attention Reasons: ["high_retry_count", "missing_outputs"]
|
||||
├─ Total Steps: 8
|
||||
└─ Categories: Missing Outputs + Retry Loops
|
||||
|
||||
Root Issue: The intake-collector node is stuck in a retry loop because it's not setting required outputs.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Stage 5: Root Cause Analysis (L3 Analysis)
|
||||
|
||||
**Objective:** Understand exactly what went wrong by examining detailed logs
|
||||
|
||||
**What to do:**
|
||||
|
||||
1. **Query detailed tool/LLM logs** using the MCP tool:
|
||||
```
|
||||
query_runtime_log_raw(
|
||||
agent_work_dir="{agent_work_dir}",
|
||||
run_id="{run_id}",
|
||||
node_id="{problem_node_id}"
|
||||
)
|
||||
```
|
||||
|
||||
2. **Analyze based on issue category:**
|
||||
|
||||
**For Missing Outputs:**
|
||||
- Check `step.tool_calls` for set_output usage
|
||||
- Look for conditional logic that skipped set_output
|
||||
- Check if LLM is calling other tools instead
|
||||
|
||||
**For Tool Errors:**
|
||||
- Check `step.tool_results` for error messages
|
||||
- Identify error types: rate limits, auth failures, timeouts, network errors
|
||||
- Note which specific tool is failing
|
||||
|
||||
**For Retry Loops:**
|
||||
- Check `step.verdict_feedback` from judge
|
||||
- Look for repeated failure reasons
|
||||
- Identify if it's the same issue every time
|
||||
|
||||
**For Guard Failures:**
|
||||
- Check `step.guard_results` for validation errors
|
||||
- Identify missing keys or type mismatches
|
||||
- Compare actual output to expected schema
|
||||
|
||||
**For Stalled Execution:**
|
||||
- Check `step.llm_response_text` for repetition
|
||||
- Look for LLM stuck in same action loop
|
||||
- Check if tool calls are succeeding but not progressing
|
||||
|
||||
3. **Extract evidence:**
|
||||
- Specific error messages
|
||||
- Tool call arguments and results
|
||||
- LLM response text
|
||||
- Judge feedback
|
||||
- Step-by-step progression
|
||||
|
||||
4. **Formulate root cause explanation:**
|
||||
- Clearly state what is happening
|
||||
- Explain why it's happening
|
||||
- Show evidence from logs
|
||||
|
||||
**Example Output:**
|
||||
```
|
||||
Root Cause Analysis for intake-collector:
|
||||
|
||||
Step-by-step breakdown:
|
||||
|
||||
Step 3:
|
||||
- Tool Call: web_search(query="@RomuloNevesOf")
|
||||
- Result: Found Twitter profile information
|
||||
- Verdict: RETRY
|
||||
- Feedback: "Missing required output 'twitter_handles'. You found the handle but didn't call set_output."
|
||||
|
||||
Step 4:
|
||||
- Tool Call: web_search(query="@RomuloNevesOf twitter")
|
||||
- Result: Found additional Twitter information
|
||||
- Verdict: RETRY
|
||||
- Feedback: "Still missing 'twitter_handles'. Use set_output to save your findings."
|
||||
|
||||
Steps 5-7: Similar pattern continues...
|
||||
|
||||
ROOT CAUSE: The node is successfully finding Twitter handles via web_search, but the LLM is not calling set_output to save the results. It keeps searching for more information instead of completing the task.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Stage 6: Fix Recommendations
|
||||
|
||||
**Objective:** Provide actionable solutions the developer can implement
|
||||
|
||||
**What to do:**
|
||||
|
||||
Based on the issue category identified, provide specific fix recommendations using these templates:
|
||||
|
||||
#### Template 1: Missing Outputs (Client-Facing Nodes)
|
||||
|
||||
```markdown
|
||||
## Issue: Premature set_output in Client-Facing Node
|
||||
|
||||
**Root Cause:** Node called set_output before receiving user input
|
||||
|
||||
**Fix:** Use STEP 1/STEP 2 prompt pattern
|
||||
|
||||
**File to edit:** `exports/{agent_name}/nodes/{node_name}.py`
|
||||
|
||||
**Changes:**
|
||||
1. Update the system_prompt to include explicit step guidance:
|
||||
```python
|
||||
system_prompt = """
|
||||
STEP 1: Analyze the user input and decide what action to take.
|
||||
DO NOT call set_output in this step.
|
||||
|
||||
STEP 2: After receiving feedback or completing analysis,
|
||||
ONLY THEN call set_output with your results.
|
||||
"""
|
||||
```
|
||||
|
||||
2. If some inputs are optional (like feedback on retry edges), add nullable_output_keys:
|
||||
```python
|
||||
nullable_output_keys=["feedback"]
|
||||
```
|
||||
|
||||
**Verification:**
|
||||
- Run the agent with test input
|
||||
- Verify the client-facing node waits for user input before calling set_output
|
||||
```
|
||||
|
||||
#### Template 2: Retry Loops
|
||||
|
||||
```markdown
|
||||
## Issue: Judge Repeatedly Rejecting Outputs
|
||||
|
||||
**Root Cause:** {Insert specific reason from verdict_feedback}
|
||||
|
||||
**Fix Options:**
|
||||
|
||||
**Option A - If outputs are actually correct:** Adjust judge evaluation rules
|
||||
- File: `exports/{agent_name}/agent.json`
|
||||
- Update `evaluation_rules` section to accept the current output format
|
||||
- Example: If judge expects list but gets string, update rule to accept both
|
||||
|
||||
**Option B - If prompt is ambiguous:** Clarify node instructions
|
||||
- File: `exports/{agent_name}/nodes/{node_name}.py`
|
||||
- Make system_prompt more explicit about output format and requirements
|
||||
- Add examples of correct outputs
|
||||
|
||||
**Option C - If tool is unreliable:** Add retry logic with fallback
|
||||
- Consider using alternative tools
|
||||
- Add manual fallback option
|
||||
- Update prompt to handle tool failures gracefully
|
||||
|
||||
**Verification:**
|
||||
- Run the node with test input
|
||||
- Confirm judge accepts output on first try
|
||||
- Check that retry_count stays at 0
|
||||
```
|
||||
|
||||
#### Template 3: Tool Errors
|
||||
|
||||
```markdown
|
||||
## Issue: {tool_name} Failing with {error_type}
|
||||
|
||||
**Root Cause:** {Insert specific error message from logs}
|
||||
|
||||
**Fix Strategy:**
|
||||
|
||||
**If API rate limit:**
|
||||
1. Add exponential backoff in tool retry logic
|
||||
2. Reduce API call frequency
|
||||
3. Consider caching results
|
||||
|
||||
**If auth failure:**
|
||||
1. Check credentials using:
|
||||
```bash
|
||||
/hive-credentials --agent {agent_name}
|
||||
```
|
||||
2. Verify API key environment variables
|
||||
3. Update `mcp_servers.json` if needed
|
||||
|
||||
**If timeout:**
|
||||
1. Increase timeout in `mcp_servers.json`:
|
||||
```json
|
||||
{
|
||||
"timeout_ms": 60000
|
||||
}
|
||||
```
|
||||
2. Consider using faster alternative tools
|
||||
3. Break large requests into smaller chunks
|
||||
|
||||
**Verification:**
|
||||
- Test tool call manually
|
||||
- Confirm successful response
|
||||
- Monitor for recurring errors
|
||||
```
|
||||
|
||||
#### Template 4: Edge Routing Errors
|
||||
|
||||
```markdown
|
||||
## Issue: No Valid Edge from Node {node_id}
|
||||
|
||||
**Root Cause:** No edge condition matched the current state
|
||||
|
||||
**File to edit:** `exports/{agent_name}/agent.json`
|
||||
|
||||
**Analysis:**
|
||||
- Current node output: {show actual output keys}
|
||||
- Existing edge conditions: {list edge conditions}
|
||||
- Why no match: {explain the mismatch}
|
||||
|
||||
**Fix:**
|
||||
Add the missing edge to the graph:
|
||||
```json
|
||||
{
|
||||
"edge_id": "{node_id}_to_{target_node}",
|
||||
"source": "{node_id}",
|
||||
"target": "{target_node}",
|
||||
"condition": "on_success"
|
||||
}
|
||||
```
|
||||
|
||||
**Alternative:** Update existing edge condition to cover this case
|
||||
|
||||
**Verification:**
|
||||
- Run agent with same input
|
||||
- Verify edge is traversed successfully
|
||||
- Check that execution continues to next node
|
||||
```
|
||||
|
||||
#### Template 5: Stalled Execution
|
||||
|
||||
```markdown
|
||||
## Issue: EventLoopNode Not Making Progress
|
||||
|
||||
**Root Cause:** {Insert analysis - e.g., "LLM repeating same failed action"}
|
||||
|
||||
**File to edit:** `exports/{agent_name}/nodes/{node_name}.py`
|
||||
|
||||
**Fix:** Update system_prompt to guide LLM out of loops
|
||||
|
||||
**Add this guidance:**
|
||||
```python
|
||||
system_prompt = """
|
||||
{existing prompt}
|
||||
|
||||
IMPORTANT: If a tool call fails multiple times:
|
||||
1. Try an alternative approach or different tool
|
||||
2. If no alternatives work, call set_output with partial results
|
||||
3. DO NOT retry the same failed action more than 3 times
|
||||
|
||||
Progress is more important than perfection. Move forward even with incomplete data.
|
||||
"""
|
||||
```
|
||||
|
||||
**Additional fix:** Lower max_iterations to prevent infinite loops
|
||||
```python
|
||||
# In node configuration
|
||||
max_node_visits=3 # Prevent getting stuck
|
||||
```
|
||||
|
||||
**Verification:**
|
||||
- Run node with same input that caused stall
|
||||
- Verify it exits after reasonable attempts (< 10 steps)
|
||||
- Confirm it calls set_output eventually
|
||||
```
|
||||
|
||||
**Selecting the right template:**
|
||||
- Match the issue category from Stage 4
|
||||
- Customize with specific details from Stage 5
|
||||
- Include actual error messages and code snippets
|
||||
- Provide file paths and line numbers when possible
|
||||
|
||||
---
|
||||
|
||||
### Stage 7: Verification Support
|
||||
|
||||
**Objective:** Help the developer confirm their fixes work
|
||||
|
||||
**What to do:**
|
||||
|
||||
1. **Suggest appropriate tests based on fix type:**
|
||||
|
||||
**For node-level fixes:**
|
||||
```bash
|
||||
# Use hive-test to run goal-based tests
|
||||
/hive-test --agent {agent_name} --goal {goal_id}
|
||||
|
||||
# Or run specific test scenarios
|
||||
/hive-test --agent {agent_name} --scenario {specific_input}
|
||||
```
|
||||
|
||||
**For quick manual tests:**
|
||||
```bash
|
||||
# Launch the interactive TUI dashboard
|
||||
hive tui
|
||||
```
|
||||
Then use arrow keys to select the agent from the list and press Enter to run it.
|
||||
|
||||
2. **Provide MCP tool queries to validate the fix:**
|
||||
|
||||
**Check if issue is resolved:**
|
||||
```
|
||||
query_runtime_logs(
|
||||
agent_work_dir="~/.hive/agents/{agent_name}",
|
||||
status="needs_attention",
|
||||
limit=5
|
||||
)
|
||||
# Should show 0 results if fully fixed
|
||||
```
|
||||
|
||||
**Verify specific node behavior:**
|
||||
```
|
||||
query_runtime_log_details(
|
||||
agent_work_dir="~/.hive/agents/{agent_name}",
|
||||
run_id="{new_run_id}",
|
||||
node_id="{fixed_node_id}"
|
||||
)
|
||||
# Should show exit_status="success", retry_count=0
|
||||
```
|
||||
|
||||
3. **Monitor for regression:**
|
||||
- Run the agent multiple times
|
||||
- Check for similar issues reappearing
|
||||
- Verify fix works across different inputs
|
||||
|
||||
4. **Provide verification checklist:**
|
||||
```
|
||||
Verification Checklist:
|
||||
□ Applied recommended fix to code
|
||||
□ Ran agent with test input
|
||||
□ Checked runtime logs show no attention flags
|
||||
□ Verified specific node completes successfully
|
||||
□ Tested with multiple inputs
|
||||
□ No regression of original issue
|
||||
□ Agent meets success criteria
|
||||
```
|
||||
|
||||
**Example interaction:**
|
||||
```
|
||||
Developer: "I applied the fix to intake-collector. How do I verify it works?"
|
||||
|
||||
You: "Great! Let's verify the fix with these steps:
|
||||
|
||||
1. Launch the TUI dashboard:
|
||||
hive tui
|
||||
Then select your agent from the list and press Enter to run it.
|
||||
|
||||
2. After it completes, check the logs:
|
||||
[Use query_runtime_logs to check for attention flags]
|
||||
|
||||
3. Verify the specific node:
|
||||
[Use query_runtime_log_details for intake-collector]
|
||||
|
||||
Expected results:
|
||||
- No 'needs_attention' flags
|
||||
- intake-collector shows exit_status='success'
|
||||
- retry_count should be 0
|
||||
|
||||
Let me know when you've run it and I'll help check the logs!"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## MCP Tool Usage Guide
|
||||
|
||||
### Three Levels of Observability
|
||||
|
||||
**L1: query_runtime_logs** - Session-level summaries
|
||||
- **When to use:** Initial triage, identifying problematic runs, monitoring trends
|
||||
- **Returns:** List of runs with status, attention flags, timestamps
|
||||
- **Example:**
|
||||
```
|
||||
query_runtime_logs(
|
||||
agent_work_dir="/home/user/.hive/twitter_outreach",
|
||||
status="needs_attention",
|
||||
limit=20
|
||||
)
|
||||
```
|
||||
|
||||
**L2: query_runtime_log_details** - Node-level details
|
||||
- **When to use:** Diagnosing which nodes failed, understanding retry patterns
|
||||
- **Returns:** Per-node completion details, retry counts, verdicts
|
||||
- **Example:**
|
||||
```
|
||||
query_runtime_log_details(
|
||||
agent_work_dir="/home/user/.hive/twitter_outreach",
|
||||
run_id="session_20260206_115718_e22339c5",
|
||||
needs_attention_only=True
|
||||
)
|
||||
```
|
||||
|
||||
**L3: query_runtime_log_raw** - Step-level details
|
||||
- **When to use:** Root cause analysis, understanding exact failures
|
||||
- **Returns:** Full tool calls, LLM responses, judge feedback
|
||||
- **Example:**
|
||||
```
|
||||
query_runtime_log_raw(
|
||||
agent_work_dir="/home/user/.hive/twitter_outreach",
|
||||
run_id="session_20260206_115718_e22339c5",
|
||||
node_id="intake-collector"
|
||||
)
|
||||
```
|
||||
|
||||
### Query Patterns
|
||||
|
||||
**Pattern 1: Top-Down Investigation** (Most common)
|
||||
```
|
||||
1. L1: Find problematic runs
|
||||
2. L2: Identify failing nodes
|
||||
3. L3: Analyze specific failures
|
||||
```
|
||||
|
||||
**Pattern 2: Node-Specific Debugging**
|
||||
```
|
||||
1. L2: Get details for specific node across all runs
|
||||
2. L3: Deep dive into worst failures
|
||||
```
|
||||
|
||||
**Pattern 3: Real-time Monitoring**
|
||||
```
|
||||
Loop every 10 seconds:
|
||||
1. L1: Check for new needs_attention runs
|
||||
2. If found: Alert and drill into L2
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Complete Example Walkthrough
|
||||
|
||||
### Scenario: twitter_outreach Agent Failing
|
||||
|
||||
**Developer says:** "My twitter_outreach agent keeps failing. It's not finding Twitter handles correctly."
|
||||
|
||||
#### Stage 1: Setup
|
||||
```
|
||||
You: "I'll help debug the twitter_outreach agent. Let me gather context..."
|
||||
|
||||
[Read exports/twitter_outreach/agent.json]
|
||||
|
||||
Context:
|
||||
- Agent: twitter_outreach
|
||||
- Goal: twitter-outreach-multi-loop
|
||||
- Working Dir: ~/.hive/agents/twitter_outreach
|
||||
- Success Criteria: ["Successfully send 5 personalized outreach messages"]
|
||||
- Constraints: ["Must verify handle exists", "Must personalize message"]
|
||||
- Nodes: intake-collector, profile-analyzer, message-composer, outreach-sender
|
||||
```
|
||||
|
||||
#### Stage 2: Mode Selection
|
||||
```
|
||||
You: "Which debugging approach would you prefer?"
|
||||
|
||||
[Present 3 mode options using AskUserQuestion]
|
||||
|
||||
Developer selects: "Post-Mortem Analysis Mode"
|
||||
```
|
||||
|
||||
#### Stage 3: Triage (L1)
|
||||
```
|
||||
You: "Querying runtime logs for problematic runs..."
|
||||
|
||||
[Call query_runtime_logs with status="needs_attention"]
|
||||
|
||||
Results:
|
||||
- Found 1 run needing attention
|
||||
- session_20260206_115718_e22339c5 (30 min ago)
|
||||
- Status: degraded
|
||||
- Categories: missing_outputs, retry_loops
|
||||
```
|
||||
|
||||
#### Stage 4: Diagnosis (L2)
|
||||
```
|
||||
You: "Analyzing node-level details..."
|
||||
|
||||
[Call query_runtime_log_details for the session]
|
||||
|
||||
Results:
|
||||
- Problem Node: intake-collector
|
||||
- Exit Status: escalate
|
||||
- Retry Count: 5
|
||||
- Verdict Counts: {RETRY: 5, ESCALATE: 1}
|
||||
- Attention Reasons: ["high_retry_count", "missing_outputs"]
|
||||
- Total Steps: 8
|
||||
|
||||
Diagnosis: intake-collector has retry loop + missing outputs issue
|
||||
Category: Missing Outputs + Retry Loops
|
||||
```
|
||||
|
||||
#### Stage 5: Root Cause (L3)
|
||||
```
|
||||
You: "Drilling into detailed logs for intake-collector..."
|
||||
|
||||
[Call query_runtime_log_raw for node intake-collector]
|
||||
|
||||
Evidence:
|
||||
Step 3: web_search("@RomuloNevesOf") → Found profile
|
||||
Verdict: RETRY
|
||||
Feedback: "Missing required output 'twitter_handles'. You found the handle but didn't call set_output."
|
||||
|
||||
Step 4: web_search("@RomuloNevesOf twitter") → Found more info
|
||||
Verdict: RETRY
|
||||
Feedback: "Still missing 'twitter_handles'. Use set_output to save your findings."
|
||||
|
||||
Steps 5-7: Same pattern repeats...
|
||||
|
||||
ROOT CAUSE: Node successfully finds handles but LLM doesn't call set_output. It keeps searching instead of completing.
|
||||
```
|
||||
|
||||
#### Stage 6: Fix Recommendation
|
||||
```
|
||||
You: "Here's the fix for this issue:
|
||||
|
||||
## Issue: intake-collector Not Setting Outputs
|
||||
|
||||
**Root Cause:** Node successfully finds Twitter handles but doesn't call set_output
|
||||
|
||||
**Category:** Missing Outputs
|
||||
|
||||
**File to edit:** exports/twitter_outreach/nodes/intake_collector.py
|
||||
|
||||
**Changes:**
|
||||
Update the system_prompt to emphasize the output requirement:
|
||||
|
||||
```python
|
||||
system_prompt = """
|
||||
You are the Intake Collector for the Twitter outreach agent.
|
||||
|
||||
Your task:
|
||||
1. Extract Twitter handle from user input (URL or @mention)
|
||||
2. Verify it's a valid Twitter handle format
|
||||
3. **CRITICAL: Call set_output(twitter_handles=[handle]) immediately after finding valid handle**
|
||||
|
||||
IMPORTANT: You MUST call set_output even if you found the handle on the first try.
|
||||
Do not continue searching once you have a valid handle - set output and finish.
|
||||
|
||||
Output format:
|
||||
{
|
||||
"twitter_handles": ["@handle1", "@handle2", ...]
|
||||
}
|
||||
"""
|
||||
```
|
||||
|
||||
**Why this works:** The explicit instruction to call set_output immediately prevents the LLM from continuing to search after finding the handle.
|
||||
```
|
||||
|
||||
#### Stage 7: Verification
|
||||
```
|
||||
Developer: "I updated the prompt. How do I test it?"
|
||||
|
||||
You: "Let's verify the fix:
|
||||
|
||||
1. Launch the TUI dashboard:
|
||||
hive tui
|
||||
Then select your agent and press Enter to run it.
|
||||
|
||||
2. I'll check the logs after it runs..."
|
||||
|
||||
[Developer runs agent]
|
||||
|
||||
Developer: "It completed! Can you check if it worked?"
|
||||
|
||||
You: "Checking now..."
|
||||
|
||||
[Call query_runtime_log_details for the new run]
|
||||
|
||||
Results:
|
||||
- Node: intake-collector
|
||||
- Exit Status: success ✓
|
||||
- Retry Count: 0 ✓
|
||||
- Total Steps: 2 ✓
|
||||
- No attention flags ✓
|
||||
|
||||
"Perfect! The fix worked. The intake-collector now:
|
||||
- Completes successfully on first try
|
||||
- No retries needed
|
||||
- Calls set_output properly
|
||||
|
||||
Your agent should now work correctly!"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Tips for Effective Debugging
|
||||
|
||||
1. **Always start with L1 logs** - Don't jump straight to detailed logs
|
||||
2. **Focus on attention flags** - They highlight the real issues
|
||||
3. **Compare verdict_feedback across steps** - Patterns reveal root causes
|
||||
4. **Check tool error messages carefully** - They often contain the exact problem
|
||||
5. **Consider the agent's goal** - Fixes should align with success criteria
|
||||
6. **Test fixes immediately** - Quick verification prevents wasted effort
|
||||
7. **Look for patterns across multiple runs** - One-time failures might be transient
|
||||
|
||||
## Common Pitfalls to Avoid
|
||||
|
||||
1. **Don't recommend code you haven't verified exists** - Always read files first
|
||||
2. **Don't assume tool capabilities** - Check MCP server configs
|
||||
3. **Don't ignore edge conditions** - Missing edges cause routing failures
|
||||
4. **Don't overlook judge configuration** - Mismatched expectations cause retry loops
|
||||
5. **Don't forget nullable_output_keys** - Optional inputs need explicit marking
|
||||
|
||||
---
|
||||
|
||||
## Storage Locations Reference
|
||||
|
||||
**New unified storage (default):**
|
||||
- Logs: `~/.hive/agents/{agent_name}/sessions/session_YYYYMMDD_HHMMSS_{uuid}/logs/`
|
||||
- State: `~/.hive/agents/{agent_name}/sessions/{session_id}/state.json`
|
||||
- Conversations: `~/.hive/agents/{agent_name}/sessions/{session_id}/conversations/`
|
||||
|
||||
**Old storage (deprecated, still supported):**
|
||||
- Logs: `~/.hive/agents/{agent_name}/runtime_logs/runs/{run_id}/`
|
||||
|
||||
The MCP tools automatically check both locations.
|
||||
|
||||
---
|
||||
|
||||
**Remember:** Your role is to be a debugging companion and thought partner. Guide the developer through the investigation, explain what you find, and provide actionable fixes. Don't just report errors - help understand and solve them.
|
||||
@@ -0,0 +1,385 @@
|
||||
---
|
||||
name: hive-patterns
|
||||
description: Best practices, patterns, and examples for building goal-driven agents. Includes client-facing interaction, feedback edges, judge patterns, fan-out/fan-in, context management, and anti-patterns.
|
||||
license: Apache-2.0
|
||||
metadata:
|
||||
author: hive
|
||||
version: "2.0"
|
||||
type: reference
|
||||
part_of: hive
|
||||
---
|
||||
|
||||
# Building Agents - Patterns & Best Practices
|
||||
|
||||
Design patterns, examples, and best practices for building robust goal-driven agents.
|
||||
|
||||
**Prerequisites:** Complete agent structure using `hive-create`.
|
||||
|
||||
## Practical Example: Hybrid Workflow
|
||||
|
||||
How to build a node using both direct file writes and optional MCP validation:
|
||||
|
||||
```python
|
||||
# 1. WRITE TO FILE FIRST (Primary - makes it visible)
|
||||
node_code = '''
|
||||
search_node = NodeSpec(
|
||||
id="search-web",
|
||||
node_type="event_loop",
|
||||
input_keys=["query"],
|
||||
output_keys=["search_results"],
|
||||
system_prompt="Search the web for: {query}. Use web_search, then call set_output to store results.",
|
||||
tools=["web_search"],
|
||||
)
|
||||
'''
|
||||
|
||||
Edit(
|
||||
file_path="exports/research_agent/nodes/__init__.py",
|
||||
old_string="# Nodes will be added here",
|
||||
new_string=node_code
|
||||
)
|
||||
|
||||
# 2. OPTIONALLY VALIDATE WITH MCP (Secondary - bookkeeping)
|
||||
validation = mcp__agent-builder__test_node(
|
||||
node_id="search-web",
|
||||
test_input='{"query": "python tutorials"}',
|
||||
mock_llm_response='{"search_results": [...mock results...]}'
|
||||
)
|
||||
```
|
||||
|
||||
**User experience:**
|
||||
|
||||
- Immediately sees node in their editor (from step 1)
|
||||
- Gets validation feedback (from step 2)
|
||||
- Can edit the file directly if needed
|
||||
|
||||
## Multi-Turn Interaction Patterns
|
||||
|
||||
For agents needing multi-turn conversations with users, use `client_facing=True` on event_loop nodes.
|
||||
|
||||
### Client-Facing Nodes
|
||||
|
||||
A client-facing node streams LLM output to the user and blocks for user input between conversational turns. This replaces the old pause/resume pattern.
|
||||
|
||||
```python
|
||||
# Client-facing node with STEP 1/STEP 2 prompt pattern
|
||||
intake_node = NodeSpec(
|
||||
id="intake",
|
||||
name="Intake",
|
||||
description="Gather requirements from the user",
|
||||
node_type="event_loop",
|
||||
client_facing=True,
|
||||
input_keys=["topic"],
|
||||
output_keys=["research_brief"],
|
||||
system_prompt="""\
|
||||
You are an intake specialist.
|
||||
|
||||
**STEP 1 — Read and respond (text only, NO tool calls):**
|
||||
1. Read the topic provided
|
||||
2. If it's vague, ask 1-2 clarifying questions
|
||||
3. If it's clear, confirm your understanding
|
||||
|
||||
**STEP 2 — After the user confirms, call set_output:**
|
||||
- set_output("research_brief", "Clear description of what to research")
|
||||
""",
|
||||
)
|
||||
|
||||
# Internal node runs without user interaction
|
||||
research_node = NodeSpec(
|
||||
id="research",
|
||||
name="Research",
|
||||
description="Search and analyze sources",
|
||||
node_type="event_loop",
|
||||
input_keys=["research_brief"],
|
||||
output_keys=["findings", "sources"],
|
||||
system_prompt="Research the topic using web_search and web_scrape...",
|
||||
tools=["web_search", "web_scrape", "load_data", "save_data"],
|
||||
)
|
||||
```
|
||||
|
||||
**How it works:**
|
||||
|
||||
- Client-facing nodes stream LLM text to the user and block for input after each response
|
||||
- User input is injected via `node.inject_event(text)`
|
||||
- When the LLM calls `set_output` to produce structured outputs, the judge evaluates and ACCEPTs
|
||||
- Internal nodes (non-client-facing) run their entire loop without blocking
|
||||
- `set_output` is a synthetic tool — a turn with only `set_output` calls (no real tools) triggers user input blocking
|
||||
|
||||
**STEP 1/STEP 2 pattern:** Always structure client-facing prompts with explicit phases. STEP 1 is text-only conversation. STEP 2 calls `set_output` after user confirmation. This prevents the LLM from calling `set_output` prematurely before the user responds.
|
||||
|
||||
### When to Use client_facing
|
||||
|
||||
| Scenario | client_facing | Why |
|
||||
| ----------------------------------- | :-----------: | ---------------------- |
|
||||
| Gathering user requirements | Yes | Need user input |
|
||||
| Human review/approval checkpoint | Yes | Need human decision |
|
||||
| Data processing (scanning, scoring) | No | Runs autonomously |
|
||||
| Report generation | No | No user input needed |
|
||||
| Final confirmation before action | Yes | Need explicit approval |
|
||||
|
||||
> **Legacy Note:** The `pause_nodes` / `entry_points` pattern still works for backward compatibility but `client_facing=True` is preferred for new agents.
|
||||
|
||||
## Edge-Based Routing and Feedback Loops
|
||||
|
||||
### Conditional Edge Routing
|
||||
|
||||
Multiple conditional edges from the same source replace the old `router` node type. Each edge checks a condition on the node's output.
|
||||
|
||||
```python
|
||||
# Node with mutually exclusive outputs
|
||||
review_node = NodeSpec(
|
||||
id="review",
|
||||
name="Review",
|
||||
node_type="event_loop",
|
||||
client_facing=True,
|
||||
output_keys=["approved_contacts", "redo_extraction"],
|
||||
nullable_output_keys=["approved_contacts", "redo_extraction"],
|
||||
max_node_visits=3,
|
||||
system_prompt="Present the contact list to the operator. If they approve, call set_output('approved_contacts', ...). If they want changes, call set_output('redo_extraction', 'true').",
|
||||
)
|
||||
|
||||
# Forward edge (positive priority, evaluated first)
|
||||
EdgeSpec(
|
||||
id="review-to-campaign",
|
||||
source="review",
|
||||
target="campaign-builder",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="output.get('approved_contacts') is not None",
|
||||
priority=1,
|
||||
)
|
||||
|
||||
# Feedback edge (negative priority, evaluated after forward edges)
|
||||
EdgeSpec(
|
||||
id="review-feedback",
|
||||
source="review",
|
||||
target="extractor",
|
||||
condition=EdgeCondition.CONDITIONAL,
|
||||
condition_expr="output.get('redo_extraction') is not None",
|
||||
priority=-1,
|
||||
)
|
||||
```
|
||||
|
||||
**Key concepts:**
|
||||
|
||||
- `nullable_output_keys`: Lists output keys that may remain unset. The node sets exactly one of the mutually exclusive keys per execution.
|
||||
- `max_node_visits`: Must be >1 on the feedback target (extractor) so it can re-execute. Default is 1.
|
||||
- `priority`: Positive = forward edge (evaluated first). Negative = feedback edge. The executor tries forward edges first; if none match, falls back to feedback edges.
|
||||
|
||||
### Routing Decision Table
|
||||
|
||||
| Pattern | Old Approach | New Approach |
|
||||
| ---------------------- | ----------------------- | --------------------------------------------- |
|
||||
| Conditional branching | `router` node | Conditional edges with `condition_expr` |
|
||||
| Binary approve/reject | `pause_nodes` + resume | `client_facing=True` + `nullable_output_keys` |
|
||||
| Loop-back on rejection | Manual entry_points | Feedback edge with `priority=-1` |
|
||||
| Multi-way routing | Router with routes dict | Multiple conditional edges with priorities |
|
||||
|
||||
## Judge Patterns
|
||||
|
||||
**Core Principle: The judge is the SOLE mechanism for acceptance decisions.** Never add ad-hoc framework gating to compensate for LLM behavior. If the LLM calls `set_output` prematurely, fix the system prompt or use a custom judge. Anti-patterns to avoid:
|
||||
|
||||
- Output rollback logic
|
||||
- `_user_has_responded` flags
|
||||
- Premature set_output rejection
|
||||
- Interaction protocol injection into system prompts
|
||||
|
||||
Judges control when an event_loop node's loop exits. Choose based on validation needs.
|
||||
|
||||
### Implicit Judge (Default)
|
||||
|
||||
When no judge is configured, the implicit judge ACCEPTs when:
|
||||
|
||||
- The LLM finishes its response with no tool calls
|
||||
- All required output keys have been set via `set_output`
|
||||
|
||||
Best for simple nodes where "all outputs set" is sufficient validation.
|
||||
|
||||
### SchemaJudge
|
||||
|
||||
Validates outputs against a Pydantic model. Use when you need structural validation.
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel
|
||||
|
||||
class ScannerOutput(BaseModel):
|
||||
github_users: list[dict] # Must be a list of user objects
|
||||
|
||||
class SchemaJudge:
|
||||
def __init__(self, output_model: type[BaseModel]):
|
||||
self._model = output_model
|
||||
|
||||
async def evaluate(self, context: dict) -> JudgeVerdict:
|
||||
missing = context.get("missing_keys", [])
|
||||
if missing:
|
||||
return JudgeVerdict(
|
||||
action="RETRY",
|
||||
feedback=f"Missing output keys: {missing}. Use set_output to provide them.",
|
||||
)
|
||||
try:
|
||||
self._model.model_validate(context["output_accumulator"])
|
||||
return JudgeVerdict(action="ACCEPT")
|
||||
except ValidationError as e:
|
||||
return JudgeVerdict(action="RETRY", feedback=str(e))
|
||||
```
|
||||
|
||||
### When to Use Which Judge
|
||||
|
||||
| Judge | Use When | Example |
|
||||
| --------------- | ------------------------------------- | ---------------------- |
|
||||
| Implicit (None) | Output keys are sufficient validation | Simple data extraction |
|
||||
| SchemaJudge | Need structural validation of outputs | API response parsing |
|
||||
| Custom | Domain-specific validation logic | Score must be 0.0-1.0 |
|
||||
|
||||
## Fan-Out / Fan-In (Parallel Execution)
|
||||
|
||||
Multiple ON_SUCCESS edges from the same source trigger parallel execution. All branches run concurrently via `asyncio.gather()`.
|
||||
|
||||
```python
|
||||
# Scanner fans out to Profiler and Scorer in parallel
|
||||
EdgeSpec(id="scanner-to-profiler", source="scanner", target="profiler",
|
||||
condition=EdgeCondition.ON_SUCCESS)
|
||||
EdgeSpec(id="scanner-to-scorer", source="scanner", target="scorer",
|
||||
condition=EdgeCondition.ON_SUCCESS)
|
||||
|
||||
# Both fan in to Extractor
|
||||
EdgeSpec(id="profiler-to-extractor", source="profiler", target="extractor",
|
||||
condition=EdgeCondition.ON_SUCCESS)
|
||||
EdgeSpec(id="scorer-to-extractor", source="scorer", target="extractor",
|
||||
condition=EdgeCondition.ON_SUCCESS)
|
||||
```
|
||||
|
||||
**Requirements:**
|
||||
|
||||
- Parallel event_loop nodes must have **disjoint output_keys** (no key written by both)
|
||||
- Only one parallel branch may contain a `client_facing` node
|
||||
- Fan-in node receives outputs from all completed branches in shared memory
|
||||
|
||||
## Context Management Patterns
|
||||
|
||||
### Tiered Compaction
|
||||
|
||||
EventLoopNode automatically manages context window usage with tiered compaction:
|
||||
|
||||
1. **Pruning** — Old tool results replaced with compact placeholders (zero-cost, no LLM call)
|
||||
2. **Normal compaction** — LLM summarizes older messages
|
||||
3. **Aggressive compaction** — Keeps only recent messages + summary
|
||||
4. **Emergency** — Hard reset with tool history preservation
|
||||
|
||||
### Spillover Pattern
|
||||
|
||||
The framework automatically truncates large tool results and saves full content to a spillover directory. The LLM receives a truncation message with instructions to use `load_data` to read the full result.
|
||||
|
||||
For explicit data management, use the data tools (real MCP tools, not synthetic):
|
||||
|
||||
```python
|
||||
# save_data, load_data, list_data_files, serve_file_to_user are real MCP tools
|
||||
# data_dir is auto-injected by the framework — the LLM never sees it
|
||||
|
||||
# Saving large results
|
||||
save_data(filename="sources.json", data=large_json_string)
|
||||
|
||||
# Reading with pagination (line-based offset/limit)
|
||||
load_data(filename="sources.json", offset=0, limit=50)
|
||||
|
||||
# Listing available files
|
||||
list_data_files()
|
||||
|
||||
# Serving a file to the user as a clickable link
|
||||
serve_file_to_user(filename="report.html", label="Research Report")
|
||||
```
|
||||
|
||||
Add data tools to nodes that handle large tool results:
|
||||
|
||||
```python
|
||||
research_node = NodeSpec(
|
||||
...
|
||||
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
|
||||
)
|
||||
```
|
||||
|
||||
`data_dir` is a framework context parameter — auto-injected at call time. `GraphExecutor.execute()` sets it per-execution via `ToolRegistry.set_execution_context(data_dir=...)` (using `contextvars` for concurrency safety), ensuring it matches the session-scoped spillover directory.
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
### What NOT to Do
|
||||
|
||||
- **Don't rely on `export_graph`** — Write files immediately, not at end
|
||||
- **Don't hide code in session** — Write to files as components are approved
|
||||
- **Don't wait to write files** — Agent visible from first step
|
||||
- **Don't batch everything** — Write incrementally, one component at a time
|
||||
- **Don't create too many thin nodes** — Prefer fewer, richer nodes (see below)
|
||||
- **Don't add framework gating for LLM behavior** — Fix prompts or use judges instead
|
||||
|
||||
### Fewer, Richer Nodes
|
||||
|
||||
A common mistake is splitting work into too many small single-purpose nodes. Each node boundary requires serializing outputs, losing in-context information, and adding edge complexity.
|
||||
|
||||
| Bad (8 thin nodes) | Good (4 rich nodes) |
|
||||
| ------------------- | ----------------------------------- |
|
||||
| parse-query | intake (client-facing) |
|
||||
| search-sources | research (search + fetch + analyze) |
|
||||
| fetch-content | review (client-facing) |
|
||||
| evaluate-sources | report (write + deliver) |
|
||||
| synthesize-findings | |
|
||||
| write-report | |
|
||||
| quality-check | |
|
||||
| save-report | |
|
||||
|
||||
**Why fewer nodes are better:**
|
||||
|
||||
- The LLM retains full context of its work within a single node
|
||||
- A research node that searches, fetches, and analyzes keeps all source material in its conversation history
|
||||
- Fewer edges means simpler graph and fewer failure points
|
||||
- Data tools (`save_data`/`load_data`) handle context window limits within a single node
|
||||
|
||||
### MCP Tools - Correct Usage
|
||||
|
||||
**MCP tools OK for:**
|
||||
|
||||
- `test_node` — Validate node configuration with mock inputs
|
||||
- `validate_graph` — Check graph structure
|
||||
- `configure_loop` — Set event loop parameters
|
||||
- `create_session` — Track session state for bookkeeping
|
||||
|
||||
**Just don't:** Use MCP as the primary construction method or rely on export_graph
|
||||
|
||||
## Error Handling Patterns
|
||||
|
||||
### Graceful Failure with Fallback
|
||||
|
||||
```python
|
||||
edges = [
|
||||
# Success path
|
||||
EdgeSpec(id="api-success", source="api-call", target="process-results",
|
||||
condition=EdgeCondition.ON_SUCCESS),
|
||||
# Fallback on failure
|
||||
EdgeSpec(id="api-to-fallback", source="api-call", target="fallback-cache",
|
||||
condition=EdgeCondition.ON_FAILURE, priority=1),
|
||||
# Report if fallback also fails
|
||||
EdgeSpec(id="fallback-to-error", source="fallback-cache", target="report-error",
|
||||
condition=EdgeCondition.ON_FAILURE, priority=1),
|
||||
]
|
||||
```
|
||||
|
||||
## Handoff to Testing
|
||||
|
||||
When agent is complete, transition to testing phase:
|
||||
|
||||
### Pre-Testing Checklist
|
||||
|
||||
- [ ] Agent structure validates: `uv run python -m agent_name validate`
|
||||
- [ ] All nodes defined in nodes/**init**.py
|
||||
- [ ] All edges connect valid nodes with correct priorities
|
||||
- [ ] Feedback edge targets have `max_node_visits > 1`
|
||||
- [ ] Client-facing nodes have meaningful system prompts
|
||||
- [ ] Agent can be imported: `from exports.agent_name import default_agent`
|
||||
|
||||
## Related Skills
|
||||
|
||||
- **hive-concepts** — Fundamental concepts (node types, edges, event loop architecture)
|
||||
- **hive-create** — Step-by-step building process
|
||||
- **hive-test** — Test and validate agents
|
||||
- **hive** — Complete workflow orchestrator
|
||||
|
||||
---
|
||||
|
||||
**Remember: Agent is actively constructed, visible the whole time. No hidden state. No surprise exports. Just transparent, incremental file building.**
|
||||
@@ -1,11 +1,11 @@
|
||||
---
|
||||
name: testing-agent
|
||||
name: hive-test
|
||||
description: Run goal-based evaluation tests for agents. Use when you need to verify an agent meets its goals, debug failing tests, or iterate on agent improvements based on test results.
|
||||
---
|
||||
|
||||
# Testing Workflow
|
||||
|
||||
This skill provides tools for testing agents built with the building-agents skill.
|
||||
This skill provides tools for testing agents built with the hive-create skill.
|
||||
|
||||
## Workflow Overview
|
||||
|
||||
@@ -61,7 +61,7 @@ mcp__agent-builder__debug_test(
|
||||
|
||||
# Testing Agents with MCP Tools
|
||||
|
||||
Run goal-based evaluation tests for agents built with the building-agents skill.
|
||||
Run goal-based evaluation tests for agents built with the hive-create skill.
|
||||
|
||||
**Key Principle: MCP tools provide guidelines, Claude writes tests directly**
|
||||
- ✅ Get guidelines: `generate_constraint_tests`, `generate_success_tests` → returns templates and guidelines
|
||||
@@ -279,7 +279,7 @@ if missing_creds:
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ GOAL STAGE │
|
||||
│ (building-agents skill) │
|
||||
│ (hive-create skill) │
|
||||
│ │
|
||||
│ 1. User defines goal with success_criteria and constraints │
|
||||
│ 2. Goal written to agent.py immediately │
|
||||
@@ -289,7 +289,7 @@ if missing_creds:
|
||||
↓
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ AGENT STAGE │
|
||||
│ (building-agents skill) │
|
||||
│ (hive-create skill) │
|
||||
│ │
|
||||
│ Build nodes + edges, written immediately to files │
|
||||
│ Constraint tests can run during development: │
|
||||
@@ -608,7 +608,7 @@ Edit(
|
||||
)
|
||||
|
||||
# 4. May need to regenerate agent nodes if goal changed significantly
|
||||
# This requires going back to building-agents skill
|
||||
# This requires going back to hive-create skill
|
||||
```
|
||||
|
||||
#### EDGE_CASE → Add Test and Fix
|
||||
@@ -930,9 +930,10 @@ assert approval == "APPROVED", f"Expected APPROVED, got {approval}"
|
||||
- `steps_executed: int` - Number of nodes executed
|
||||
- `total_tokens: int` - Cumulative token usage
|
||||
- `total_latency_ms: int` - Total execution time
|
||||
- `path: list[str]` - Node IDs traversed
|
||||
- `path: list[str]` - Node IDs traversed (may contain repeated IDs from feedback loops)
|
||||
- `paused_at: str | None` - Node ID if HITL pause occurred
|
||||
- `session_state: dict` - State for resuming
|
||||
- `node_visit_counts: dict[str, int]` - How many times each node executed (useful for feedback loop testing)
|
||||
|
||||
### Happy Path Test
|
||||
```python
|
||||
@@ -975,17 +976,68 @@ async def test_performance_latency(mock_mode):
|
||||
assert duration < 5.0, f"Took {{duration}}s, expected <5s"
|
||||
```
|
||||
|
||||
## Integration with building-agents
|
||||
### Testing Event Loop Nodes
|
||||
|
||||
Event loop nodes run multi-turn loops internally. Tests should verify:
|
||||
|
||||
**Output Keys Test** — All required keys are set via `set_output`:
|
||||
```python
|
||||
@pytest.mark.asyncio
|
||||
async def test_all_output_keys_set(mock_mode):
|
||||
"""Test that event_loop nodes set all required output keys."""
|
||||
result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode)
|
||||
assert result.success, f"Agent failed: {{result.error}}"
|
||||
output = result.output or {{}}
|
||||
for key in ["expected_key_1", "expected_key_2"]:
|
||||
assert key in output, f"Output key '{{key}}' not set by event_loop node"
|
||||
```
|
||||
|
||||
**Feedback Loop Test** — Verify feedback loops terminate:
|
||||
```python
|
||||
@pytest.mark.asyncio
|
||||
async def test_feedback_loop_respects_max_visits(mock_mode):
|
||||
"""Test that feedback loops terminate at max_node_visits."""
|
||||
result = await default_agent.run({{"input": "trigger_rejection"}}, mock_mode=mock_mode)
|
||||
assert result.success or result.error is not None
|
||||
visits = getattr(result, "node_visit_counts", {{}}) or {{}}
|
||||
for node_id, count in visits.items():
|
||||
assert count <= 5, f"Node {{node_id}} visited {{count}} times"
|
||||
```
|
||||
|
||||
**Fan-Out Test** — Verify parallel branches both complete:
|
||||
```python
|
||||
@pytest.mark.asyncio
|
||||
async def test_parallel_branches_complete(mock_mode):
|
||||
"""Test that fan-out branches all complete and produce outputs."""
|
||||
result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode)
|
||||
assert result.success
|
||||
output = result.output or {{}}
|
||||
# Check outputs from both parallel branches
|
||||
assert "branch_a_output" in output, "Branch A output missing"
|
||||
assert "branch_b_output" in output, "Branch B output missing"
|
||||
```
|
||||
|
||||
**Client-Facing Node Test** — In mock mode, client-facing nodes may not block:
|
||||
```python
|
||||
@pytest.mark.asyncio
|
||||
async def test_client_facing_node(mock_mode):
|
||||
"""Test that client-facing nodes produce output."""
|
||||
result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode)
|
||||
# In mock mode, client-facing blocking is typically bypassed
|
||||
assert result.success or result.paused_at is not None
|
||||
```
|
||||
|
||||
## Integration with hive-create
|
||||
|
||||
### Handoff Points
|
||||
|
||||
| Scenario | From | To | Action |
|
||||
|----------|------|-----|--------|
|
||||
| Agent built, ready to test | building-agents | testing-agent | Generate success tests |
|
||||
| LOGIC_ERROR found | testing-agent | building-agents | Update goal, rebuild |
|
||||
| IMPLEMENTATION_ERROR found | testing-agent | Direct fix | Edit agent files, re-run tests |
|
||||
| EDGE_CASE found | testing-agent | testing-agent | Add edge case test |
|
||||
| All tests pass | testing-agent | Done | Agent validated ✅ |
|
||||
| Agent built, ready to test | hive-create | hive-test | Generate success tests |
|
||||
| LOGIC_ERROR found | hive-test | hive-create | Update goal, rebuild |
|
||||
| IMPLEMENTATION_ERROR found | hive-test | Direct fix | Edit agent files, re-run tests |
|
||||
| EDGE_CASE found | hive-test | hive-test | Add edge case test |
|
||||
| All tests pass | hive-test | Done | Agent validated ✅ |
|
||||
|
||||
### Iteration Speed Comparison
|
||||
|
||||
+3
-3
@@ -4,7 +4,7 @@ This example walks through testing a YouTube research agent that finds relevant
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Agent built with building-agents skill at `exports/youtube-research/`
|
||||
- Agent built with hive-create skill at `exports/youtube-research/`
|
||||
- Goal defined with success criteria and constraints
|
||||
|
||||
## Step 1: Load the Goal
|
||||
@@ -283,11 +283,11 @@ result = debug_test(
|
||||
Since this is an **IMPLEMENTATION_ERROR**, we:
|
||||
|
||||
1. **Don't restart** the Goal → Agent → Eval flow
|
||||
2. **Fix the agent** using building-agents skill:
|
||||
2. **Fix the agent** using hive-create skill:
|
||||
- Modify `filter_node` to handle null results
|
||||
3. **Re-run Eval** (tests only)
|
||||
|
||||
### Fix in building-agents:
|
||||
### Fix in hive-create:
|
||||
|
||||
```python
|
||||
# Update the filter_node to handle null
|
||||
@@ -1,32 +1,49 @@
|
||||
---
|
||||
name: agent-workflow
|
||||
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates building-agents-* and testing-agent skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
|
||||
name: hive
|
||||
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates hive-* skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
|
||||
license: Apache-2.0
|
||||
metadata:
|
||||
author: hive
|
||||
version: "2.0"
|
||||
type: workflow-orchestrator
|
||||
orchestrates:
|
||||
- building-agents-core
|
||||
- building-agents-construction
|
||||
- building-agents-patterns
|
||||
- testing-agent
|
||||
- setup-credentials
|
||||
- hive-concepts
|
||||
- hive-create
|
||||
- hive-patterns
|
||||
- hive-test
|
||||
- hive-credentials
|
||||
- hive-debugger
|
||||
---
|
||||
|
||||
# Agent Development Workflow
|
||||
|
||||
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT explore the codebase or read source files. ROUTE to the correct skill IMMEDIATELY.**
|
||||
|
||||
When this skill is loaded, determine what the user needs and invoke the appropriate skill NOW:
|
||||
- **User wants to build an agent** → Invoke `/hive-create` immediately
|
||||
- **User wants to test an agent** → Invoke `/hive-test` immediately
|
||||
- **User wants to learn concepts** → Invoke `/hive-concepts` immediately
|
||||
- **User wants patterns/optimization** → Invoke `/hive-patterns` immediately
|
||||
- **User wants to set up credentials** → Invoke `/hive-credentials` immediately
|
||||
- **User has a failing/broken agent** → Invoke `/hive-debugger` immediately
|
||||
- **Unclear what user needs** → Ask the user (do NOT explore the codebase to figure it out)
|
||||
|
||||
**DO NOT:** Read source files, explore the codebase, search for code, or do any investigation before routing. The sub-skills handle all of that.
|
||||
|
||||
---
|
||||
|
||||
Complete Standard Operating Procedure (SOP) for building production-ready goal-driven agents.
|
||||
|
||||
## Overview
|
||||
|
||||
This workflow orchestrates specialized skills to take you from initial concept to production-ready agent:
|
||||
|
||||
1. **Understand Concepts** → `/building-agents-core` (optional)
|
||||
2. **Build Structure** → `/building-agents-construction`
|
||||
3. **Optimize Design** → `/building-agents-patterns` (optional)
|
||||
4. **Setup Credentials** → `/setup-credentials` (if agent uses tools requiring API keys)
|
||||
5. **Test & Validate** → `/testing-agent`
|
||||
1. **Understand Concepts** → `/hive-concepts` (optional)
|
||||
2. **Build Structure** → `/hive-create`
|
||||
3. **Optimize Design** → `/hive-patterns` (optional)
|
||||
4. **Setup Credentials** → `/hive-credentials` (if agent uses tools requiring API keys)
|
||||
5. **Test & Validate** → `/hive-test`
|
||||
6. **Debug Issues** → `/hive-debugger` (if agent fails at runtime)
|
||||
|
||||
## When to Use This Workflow
|
||||
|
||||
@@ -37,17 +54,19 @@ Use this meta-skill when:
|
||||
- Want consistent, repeatable agent builds
|
||||
|
||||
**Skip this workflow** if:
|
||||
- You only need to test an existing agent → use `/testing-agent` directly
|
||||
- You only need to test an existing agent → use `/hive-test` directly
|
||||
- You know exactly which phase you're in → use specific skill directly
|
||||
|
||||
## Quick Decision Tree
|
||||
|
||||
```
|
||||
"Need to understand agent concepts" → building-agents-core
|
||||
"Build a new agent" → building-agents-construction
|
||||
"Optimize my agent design" → building-agents-patterns
|
||||
"Set up API keys for my agent" → setup-credentials
|
||||
"Test my agent" → testing-agent
|
||||
"Need to understand agent concepts" → hive-concepts
|
||||
"Build a new agent" → hive-create
|
||||
"Optimize my agent design" → hive-patterns
|
||||
"Need client-facing nodes or feedback loops" → hive-patterns
|
||||
"Set up API keys for my agent" → hive-credentials
|
||||
"Test my agent" → hive-test
|
||||
"My agent is failing/stuck/has errors" → hive-debugger
|
||||
"Not sure what I need" → Read phases below, then decide
|
||||
"Agent has structure but needs implementation" → See agent directory STATUS.md
|
||||
```
|
||||
@@ -55,7 +74,7 @@ Use this meta-skill when:
|
||||
## Phase 0: Understand Concepts (Optional)
|
||||
|
||||
**Duration**: 5-10 minutes
|
||||
**Skill**: `/building-agents-core`
|
||||
**Skill**: `/hive-concepts`
|
||||
**Input**: Questions about agent architecture
|
||||
|
||||
### When to Use
|
||||
@@ -63,12 +82,12 @@ Use this meta-skill when:
|
||||
- First time building an agent
|
||||
- Need to understand node types, edges, goals
|
||||
- Want to validate tool availability
|
||||
- Learning about pause/resume architecture
|
||||
- Learning about event loop architecture and client-facing nodes
|
||||
|
||||
### What This Phase Provides
|
||||
|
||||
- Architecture overview (Python packages, not JSON)
|
||||
- Core concepts (Goal, Node, Edge, Pause/Resume)
|
||||
- Core concepts (Goal, Node, Edge, Event Loop, Judges)
|
||||
- Tool discovery and validation procedures
|
||||
- Workflow overview
|
||||
|
||||
@@ -77,7 +96,7 @@ Use this meta-skill when:
|
||||
## Phase 1: Build Agent Structure
|
||||
|
||||
**Duration**: 15-30 minutes
|
||||
**Skill**: `/building-agents-construction`
|
||||
**Skill**: `/hive-create`
|
||||
**Input**: User requirements ("Build an agent that...")
|
||||
|
||||
### What This Phase Does
|
||||
@@ -106,7 +125,7 @@ Creates the complete agent architecture:
|
||||
- ✅ 1-5 constraints defined
|
||||
- ✅ 5-10 nodes specified in nodes/__init__.py
|
||||
- ✅ 8-15 edges connecting workflow
|
||||
- ✅ Validated structure (passes `python -m agent_name validate`)
|
||||
- ✅ Validated structure (passes `uv run python -m agent_name validate`)
|
||||
- ✅ README.md with usage instructions
|
||||
- ✅ CLI commands (info, validate, run, shell)
|
||||
|
||||
@@ -120,7 +139,7 @@ You're ready for Phase 2 when:
|
||||
|
||||
### Common Outputs
|
||||
|
||||
The building-agents-construction skill produces:
|
||||
The hive-create skill produces:
|
||||
```
|
||||
exports/agent_name/
|
||||
├── __init__.py (package exports)
|
||||
@@ -140,7 +159,7 @@ exports/agent_name/
|
||||
→ You may need to add Python functions or MCP tools (not covered by current skills)
|
||||
|
||||
**If want to optimize design:**
|
||||
→ Proceed to Phase 1.5 (building-agents-patterns)
|
||||
→ Proceed to Phase 1.5 (hive-patterns)
|
||||
|
||||
**If ready to test:**
|
||||
→ Proceed to Phase 2
|
||||
@@ -148,31 +167,32 @@ exports/agent_name/
|
||||
## Phase 1.5: Optimize Design (Optional)
|
||||
|
||||
**Duration**: 10-15 minutes
|
||||
**Skill**: `/building-agents-patterns`
|
||||
**Skill**: `/hive-patterns`
|
||||
**Input**: Completed agent structure
|
||||
|
||||
### When to Use
|
||||
|
||||
- Want to add pause/resume functionality
|
||||
- Want to add client-facing blocking or feedback edges
|
||||
- Need judge patterns for output validation
|
||||
- Want fan-out/fan-in (parallel execution)
|
||||
- Need error handling patterns
|
||||
- Want to optimize performance
|
||||
- Need examples of complex routing
|
||||
- Want best practices guidance
|
||||
|
||||
### What This Phase Provides
|
||||
|
||||
- Practical examples and patterns
|
||||
- Pause/resume architecture
|
||||
- Error handling strategies
|
||||
- Client-facing interaction patterns
|
||||
- Feedback edge routing with nullable output keys
|
||||
- Judge patterns (implicit, SchemaJudge)
|
||||
- Fan-out/fan-in parallel execution
|
||||
- Context management and spillover patterns
|
||||
- Anti-patterns to avoid
|
||||
- Performance optimization techniques
|
||||
|
||||
**Skip this phase** if your agent design is straightforward.
|
||||
|
||||
## Phase 2: Test & Validate
|
||||
|
||||
**Duration**: 20-40 minutes
|
||||
**Skill**: `/testing-agent`
|
||||
**Skill**: `/hive-test`
|
||||
**Input**: Working agent from Phase 1
|
||||
|
||||
### What This Phase Does
|
||||
@@ -249,9 +269,9 @@ You're done when:
|
||||
|
||||
```
|
||||
User: "Build an agent that monitors files"
|
||||
→ Use /building-agents-construction
|
||||
→ Use /hive-create
|
||||
→ Agent structure created
|
||||
→ Use /testing-agent
|
||||
→ Use /hive-test
|
||||
→ Tests created and passing
|
||||
→ Done: Production-ready agent
|
||||
```
|
||||
@@ -260,10 +280,10 @@ User: "Build an agent that monitors files"
|
||||
|
||||
```
|
||||
User: "Build an agent (first time)"
|
||||
→ Use /building-agents-core (understand concepts)
|
||||
→ Use /building-agents-construction (build structure)
|
||||
→ Use /building-agents-patterns (optimize design)
|
||||
→ Use /testing-agent (validate)
|
||||
→ Use /hive-concepts (understand concepts)
|
||||
→ Use /hive-create (build structure)
|
||||
→ Use /hive-patterns (optimize design)
|
||||
→ Use /hive-test (validate)
|
||||
→ Done: Production-ready agent
|
||||
```
|
||||
|
||||
@@ -272,7 +292,7 @@ User: "Build an agent (first time)"
|
||||
```
|
||||
User: "Test my agent at exports/my_agent"
|
||||
→ Skip Phase 1
|
||||
→ Use /testing-agent directly
|
||||
→ Use /hive-test directly
|
||||
→ Tests created
|
||||
→ Done: Validated agent
|
||||
```
|
||||
@@ -281,58 +301,71 @@ User: "Test my agent at exports/my_agent"
|
||||
|
||||
```
|
||||
User: "Build an agent"
|
||||
→ Use /building-agents-construction (Phase 1)
|
||||
→ Use /hive-create (Phase 1)
|
||||
→ Implementation needed (see STATUS.md)
|
||||
→ [User implements functions]
|
||||
→ Use /testing-agent (Phase 2)
|
||||
→ Use /hive-test (Phase 2)
|
||||
→ Tests reveal bugs
|
||||
→ [Fix bugs manually]
|
||||
→ Re-run tests
|
||||
→ Done: Working agent
|
||||
```
|
||||
|
||||
### Pattern 4: Complex Agent with Patterns
|
||||
### Pattern 4: Agent with Review Loops and HITL Checkpoints
|
||||
|
||||
```
|
||||
User: "Build an agent with multi-turn conversations"
|
||||
→ Use /building-agents-core (learn pause/resume)
|
||||
→ Use /building-agents-construction (build structure)
|
||||
→ Use /building-agents-patterns (implement pause/resume pattern)
|
||||
→ Use /testing-agent (validate conversation flows)
|
||||
→ Done: Complex conversational agent
|
||||
User: "Build an agent with human review and feedback loops"
|
||||
→ Use /hive-concepts (learn event loop, client-facing nodes)
|
||||
→ Use /hive-create (build structure with feedback edges)
|
||||
→ Use /hive-patterns (implement client-facing + feedback patterns)
|
||||
→ Use /hive-test (validate review flows and edge routing)
|
||||
→ Done: Agent with HITL checkpoints and review loops
|
||||
```
|
||||
|
||||
## Skill Dependencies
|
||||
|
||||
```
|
||||
agent-workflow (meta-skill)
|
||||
hive (meta-skill)
|
||||
│
|
||||
├── building-agents-core (foundational)
|
||||
│ ├── Architecture concepts
|
||||
│ ├── Node/Edge/Goal definitions
|
||||
├── hive-concepts (foundational)
|
||||
│ ├── Architecture concepts (event loop, judges)
|
||||
│ ├── Node types (event_loop, function)
|
||||
│ ├── Edge routing and priority
|
||||
│ ├── Tool discovery procedures
|
||||
│ └── Workflow overview
|
||||
│
|
||||
├── building-agents-construction (procedural)
|
||||
├── hive-create (procedural)
|
||||
│ ├── Creates package structure
|
||||
│ ├── Defines goal
|
||||
│ ├── Adds nodes incrementally
|
||||
│ ├── Connects edges
|
||||
│ ├── Adds nodes (event_loop, function)
|
||||
│ ├── Connects edges with priority routing
|
||||
│ ├── Finalizes agent class
|
||||
│ └── Requires: building-agents-core
|
||||
│ └── Requires: hive-concepts
|
||||
│
|
||||
├── building-agents-patterns (reference)
|
||||
│ ├── Best practices
|
||||
│ ├── Pause/resume patterns
|
||||
│ ├── Error handling
|
||||
│ ├── Anti-patterns
|
||||
│ └── Performance optimization
|
||||
├── hive-patterns (reference)
|
||||
│ ├── Client-facing interaction patterns
|
||||
│ ├── Feedback edges and review loops
|
||||
│ ├── Judge patterns (implicit, SchemaJudge)
|
||||
│ ├── Fan-out/fan-in parallel execution
|
||||
│ └── Context management and anti-patterns
|
||||
│
|
||||
└── testing-agent
|
||||
├── Reads agent goal
|
||||
├── Generates tests
|
||||
├── Runs evaluation
|
||||
└── Reports results
|
||||
├── hive-credentials (utility)
|
||||
│ ├── Detects missing credentials
|
||||
│ ├── Offers auth method choices (Aden OAuth, direct API key)
|
||||
│ ├── Stores securely in ~/.hive/credentials
|
||||
│ └── Validates with health checks
|
||||
│
|
||||
├── hive-test (validation)
|
||||
│ ├── Reads agent goal
|
||||
│ ├── Generates tests
|
||||
│ ├── Runs evaluation
|
||||
│ └── Reports results
|
||||
│
|
||||
└── hive-debugger (troubleshooting)
|
||||
├── Monitors runtime logs (L1/L2/L3)
|
||||
├── Identifies retry loops, tool failures
|
||||
├── Categorizes issues (10 categories)
|
||||
└── Provides fix recommendations
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
@@ -342,13 +375,13 @@ agent-workflow (meta-skill)
|
||||
- Check node IDs match between nodes/__init__.py and agent.py
|
||||
- Verify all edges reference valid node IDs
|
||||
- Ensure entry_node exists in nodes list
|
||||
- Run: `PYTHONPATH=core:exports python -m agent_name validate`
|
||||
- Run: `PYTHONPATH=exports uv run python -m agent_name validate`
|
||||
|
||||
### "Agent has structure but won't run"
|
||||
|
||||
- Check for STATUS.md or IMPLEMENTATION_GUIDE.md in agent directory
|
||||
- Implementation may be needed (Python functions or MCP tools)
|
||||
- This is expected - building-agents-construction creates structure, not implementation
|
||||
- This is expected - hive-create creates structure, not implementation
|
||||
- See implementation guide for completion options
|
||||
|
||||
### "Tests are failing"
|
||||
@@ -356,9 +389,16 @@ agent-workflow (meta-skill)
|
||||
- Review test output for specific failures
|
||||
- Check agent goal and success criteria
|
||||
- Verify constraints are met
|
||||
- Use `/testing-agent` to debug and iterate
|
||||
- Use `/hive-test` to debug and iterate
|
||||
- Fix agent code and re-run tests
|
||||
|
||||
### "Agent is failing at runtime"
|
||||
|
||||
- Use `/hive-debugger` to analyze runtime logs
|
||||
- The debugger identifies retry loops, tool failures, and stalled execution
|
||||
- Get actionable fix recommendations with code changes
|
||||
- Monitor the agent in real-time during TUI sessions
|
||||
|
||||
### "Not sure which phase I'm in"
|
||||
|
||||
Run these checks:
|
||||
@@ -368,7 +408,7 @@ Run these checks:
|
||||
ls exports/my_agent/agent.py
|
||||
|
||||
# Check if it validates
|
||||
PYTHONPATH=core:exports python -m my_agent validate
|
||||
PYTHONPATH=exports uv run python -m my_agent validate
|
||||
|
||||
# Check if tests exist
|
||||
ls exports/my_agent/tests/
|
||||
@@ -417,10 +457,10 @@ You're done with the workflow when:
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- **building-agents-core**: See `.claude/skills/building-agents-core/SKILL.md`
|
||||
- **building-agents-construction**: See `.claude/skills/building-agents-construction/SKILL.md`
|
||||
- **building-agents-patterns**: See `.claude/skills/building-agents-patterns/SKILL.md`
|
||||
- **testing-agent**: See `.claude/skills/testing-agent/SKILL.md`
|
||||
- **hive-concepts**: See `.claude/skills/hive-concepts/SKILL.md`
|
||||
- **hive-create**: See `.claude/skills/hive-create/SKILL.md`
|
||||
- **hive-patterns**: See `.claude/skills/hive-patterns/SKILL.md`
|
||||
- **hive-test**: See `.claude/skills/hive-test/SKILL.md`
|
||||
- **Agent framework docs**: See `core/README.md`
|
||||
- **Example agents**: See `exports/` directory
|
||||
|
||||
@@ -428,36 +468,45 @@ You're done with the workflow when:
|
||||
|
||||
This workflow provides a proven path from concept to production-ready agent:
|
||||
|
||||
1. **Learn** with `/building-agents-core` → Understand fundamentals (optional)
|
||||
2. **Build** with `/building-agents-construction` → Get validated structure
|
||||
3. **Optimize** with `/building-agents-patterns` → Apply best practices (optional)
|
||||
4. **Test** with `/testing-agent` → Get verified functionality
|
||||
1. **Learn** with `/hive-concepts` → Understand fundamentals (optional)
|
||||
2. **Build** with `/hive-create` → Get validated structure
|
||||
3. **Optimize** with `/hive-patterns` → Apply best practices (optional)
|
||||
4. **Configure** with `/hive-credentials` → Set up API keys (if needed)
|
||||
5. **Test** with `/hive-test` → Get verified functionality
|
||||
6. **Debug** with `/hive-debugger` → Fix runtime issues (if needed)
|
||||
|
||||
The workflow is **flexible** - skip phases as needed, iterate freely, and adapt to your specific requirements. The goal is **production-ready agents** built with **consistent, repeatable processes**.
|
||||
|
||||
## Skill Selection Guide
|
||||
|
||||
**Choose building-agents-core when:**
|
||||
**Choose hive-concepts when:**
|
||||
- First time building agents
|
||||
- Need to understand architecture
|
||||
- Need to understand event loop architecture
|
||||
- Validating tool availability
|
||||
- Learning about node types and edges
|
||||
- Learning about node types, edges, and judges
|
||||
|
||||
**Choose building-agents-construction when:**
|
||||
**Choose hive-create when:**
|
||||
- Actually building an agent
|
||||
- Have clear requirements
|
||||
- Ready to write code
|
||||
- Want step-by-step guidance
|
||||
|
||||
**Choose building-agents-patterns when:**
|
||||
**Choose hive-patterns when:**
|
||||
- Agent structure complete
|
||||
- Need advanced patterns
|
||||
- Implementing pause/resume
|
||||
- Optimizing performance
|
||||
- Need client-facing nodes or feedback edges
|
||||
- Implementing review loops or fan-out/fan-in
|
||||
- Want judge patterns or context management
|
||||
- Want best practices
|
||||
|
||||
**Choose testing-agent when:**
|
||||
**Choose hive-test when:**
|
||||
- Agent structure complete
|
||||
- Ready to validate functionality
|
||||
- Need comprehensive test coverage
|
||||
- Debugging agent behavior
|
||||
- Testing feedback loops, output keys, or fan-out
|
||||
|
||||
**Choose hive-debugger when:**
|
||||
- Agent is failing or stuck at runtime
|
||||
- Seeing retry loops or escalations
|
||||
- Tool calls are failing
|
||||
- Need to understand why a node isn't completing
|
||||
- Want real-time monitoring of agent execution
|
||||
+7
-7
@@ -1,6 +1,6 @@
|
||||
# Example: File Monitor Agent
|
||||
|
||||
This example shows the complete agent-workflow in action for building a file monitoring agent.
|
||||
This example shows the complete /hive workflow in action for building a file monitoring agent.
|
||||
|
||||
## Initial Request
|
||||
|
||||
@@ -12,7 +12,7 @@ User: "Build an agent that monitors ~/Downloads and copies new files to ~/Docume
|
||||
|
||||
### Step 1: Create Structure
|
||||
|
||||
Agent invokes `/building-agents` skill and:
|
||||
Agent invokes `/hive-create` skill and:
|
||||
|
||||
1. Creates `exports/file_monitor_agent/` package
|
||||
2. Writes skeleton files (__init__.py, __main__.py, agent.py, etc.)
|
||||
@@ -75,10 +75,10 @@ initialize → list → identify → check
|
||||
### Step 5: Finalize
|
||||
|
||||
```bash
|
||||
$ PYTHONPATH=core:exports python -m file_monitor_agent validate
|
||||
$ PYTHONPATH=exports uv run python -m file_monitor_agent validate
|
||||
✓ Agent is valid
|
||||
|
||||
$ PYTHONPATH=core:exports python -m file_monitor_agent info
|
||||
$ PYTHONPATH=exports uv run python -m file_monitor_agent info
|
||||
Agent: File Monitor & Copy Agent
|
||||
Nodes: 7
|
||||
Edges: 8
|
||||
@@ -107,7 +107,7 @@ exports/file_monitor_agent/
|
||||
|
||||
### Step 1: Analyze Agent
|
||||
|
||||
Agent invokes `/testing-agent` skill and:
|
||||
Agent invokes `/hive-test` skill and:
|
||||
|
||||
1. Reads goal from `exports/file_monitor_agent/agent.py`
|
||||
2. Identifies 4 success criteria to test
|
||||
@@ -131,7 +131,7 @@ Tests approved incrementally by user.
|
||||
### Step 3: Run Tests
|
||||
|
||||
```bash
|
||||
$ PYTHONPATH=core:exports pytest exports/file_monitor_agent/tests/
|
||||
$ PYTHONPATH=exports uv run pytest exports/file_monitor_agent/tests/
|
||||
|
||||
test_constraints.py::test_preserves_originals PASSED
|
||||
test_constraints.py::test_handles_errors PASSED
|
||||
@@ -162,7 +162,7 @@ test_edge_cases.py::test_large_files PASSED
|
||||
./RUN_AGENT.sh
|
||||
|
||||
# Or manually
|
||||
PYTHONPATH=core:exports:tools/src python -m file_monitor_agent run
|
||||
PYTHONPATH=exports uv run python -m file_monitor_agent run
|
||||
```
|
||||
|
||||
**Capabilities:**
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/agent-workflow
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/building-agents-construction
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/building-agents-core
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/building-agents-patterns
|
||||
Symlink
+1
@@ -0,0 +1 @@
|
||||
../../.claude/skills/hive
|
||||
Symlink
+1
@@ -0,0 +1 @@
|
||||
../../.claude/skills/hive-concepts
|
||||
Symlink
+1
@@ -0,0 +1 @@
|
||||
../../.claude/skills/hive-create
|
||||
Symlink
+1
@@ -0,0 +1 @@
|
||||
../../.claude/skills/hive-credentials
|
||||
Symlink
+1
@@ -0,0 +1 @@
|
||||
../../.claude/skills/hive-patterns
|
||||
Symlink
+1
@@ -0,0 +1 @@
|
||||
../../.claude/skills/hive-test
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/testing-agent
|
||||
@@ -55,14 +55,10 @@ jobs:
|
||||
- name: Install uv
|
||||
uses: astral-sh/setup-uv@v4
|
||||
|
||||
- name: Install dependencies
|
||||
- name: Install dependencies and run tests
|
||||
run: |
|
||||
cd core
|
||||
uv sync
|
||||
|
||||
- name: Run tests
|
||||
run: |
|
||||
cd core
|
||||
uv run pytest tests/ -v
|
||||
|
||||
test-tools:
|
||||
@@ -126,7 +122,7 @@ jobs:
|
||||
for agent_dir in "${agent_dirs[@]}"; do
|
||||
if [ -f "$agent_dir/agent.json" ]; then
|
||||
echo "Validating $agent_dir"
|
||||
python -c "import json; json.load(open('$agent_dir/agent.json'))"
|
||||
uv run python -c "import json; json.load(open('$agent_dir/agent.json'))"
|
||||
validated=$((validated + 1))
|
||||
fi
|
||||
done
|
||||
|
||||
+1
-1
@@ -54,7 +54,6 @@ __pycache__/
|
||||
*.egg-info/
|
||||
.eggs/
|
||||
*.egg
|
||||
uv.lock
|
||||
|
||||
# Generated runtime data
|
||||
core/data/
|
||||
@@ -75,3 +74,4 @@ exports/*
|
||||
|
||||
docs/github-issues/*
|
||||
core/tests/*dumps/*
|
||||
screenshots/*
|
||||
@@ -1,6 +1,6 @@
|
||||
repos:
|
||||
- repo: https://github.com/astral-sh/ruff-pre-commit
|
||||
rev: v0.8.6
|
||||
rev: v0.15.0
|
||||
hooks:
|
||||
- id: ruff
|
||||
name: ruff lint (core)
|
||||
|
||||
+14
-7
@@ -4,7 +4,7 @@ Thank you for your interest in contributing to the Aden Agent Framework! This do
|
||||
|
||||
## Code of Conduct
|
||||
|
||||
By participating in this project, you agree to abide by our [Code of Conduct](CODE_OF_CONDUCT.md).
|
||||
By participating in this project, you agree to abide by our [Code of Conduct](docs/CODE_OF_CONDUCT.md).
|
||||
|
||||
## Issue Assignment Policy
|
||||
|
||||
@@ -35,9 +35,16 @@ You may submit PRs without prior assignment for:
|
||||
|
||||
1. Fork the repository
|
||||
2. Clone your fork: `git clone https://github.com/YOUR_USERNAME/hive.git`
|
||||
3. Create a feature branch: `git checkout -b feature/your-feature-name`
|
||||
4. Make your changes
|
||||
5. Run checks and tests:
|
||||
3. Add the upstream repository: `git remote add upstream https://github.com/adenhq/hive.git`
|
||||
4. Sync with upstream to ensure you're starting from the latest code:
|
||||
```bash
|
||||
git fetch upstream
|
||||
git checkout main
|
||||
git merge upstream/main
|
||||
```
|
||||
5. Create a feature branch: `git checkout -b feature/your-feature-name`
|
||||
6. Make your changes
|
||||
7. Run checks and tests:
|
||||
```bash
|
||||
make check # Lint and format checks (ruff check + ruff format --check on core/ and tools/)
|
||||
make test # Core tests (cd core && pytest tests/ -v)
|
||||
@@ -125,7 +132,7 @@ feat(component): add new feature description
|
||||
> **Note:** When testing agents in `exports/`, always set PYTHONPATH:
|
||||
>
|
||||
> ```bash
|
||||
> PYTHONPATH=core:exports python -m agent_name test
|
||||
> PYTHONPATH=exports uv run python -m agent_name test
|
||||
> ```
|
||||
|
||||
```bash
|
||||
@@ -139,7 +146,7 @@ make test
|
||||
cd core && pytest tests/ -v
|
||||
|
||||
# Run tests for a specific agent
|
||||
PYTHONPATH=core:exports python -m agent_name test
|
||||
PYTHONPATH=exports uv run python -m agent_name test
|
||||
```
|
||||
|
||||
> **CI also validates** that all exported agent JSON files (`exports/*/agent.json`) are well-formed JSON. Ensure your agent exports are valid before submitting.
|
||||
@@ -152,4 +159,4 @@ By submitting a Pull Request, you agree that your contributions will be licensed
|
||||
|
||||
Feel free to open an issue for questions or join our [Discord community](https://discord.com/invite/MXE49hrKDk).
|
||||
|
||||
Thank you for contributing!
|
||||
Thank you for contributing!
|
||||
|
||||
@@ -4,9 +4,11 @@ help: ## Show this help
|
||||
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
|
||||
awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-15s\033[0m %s\n", $$1, $$2}'
|
||||
|
||||
lint: ## Run ruff linter (with auto-fix)
|
||||
lint: ## Run ruff linter and formatter (with auto-fix)
|
||||
cd core && ruff check --fix .
|
||||
cd tools && ruff check --fix .
|
||||
cd core && ruff format .
|
||||
cd tools && ruff format .
|
||||
|
||||
format: ## Run ruff formatter
|
||||
cd core && ruff format .
|
||||
@@ -19,8 +21,8 @@ check: ## Run all checks without modifying files (CI-safe)
|
||||
cd tools && ruff format --check .
|
||||
|
||||
test: ## Run all tests
|
||||
cd core && python -m pytest tests/ -v
|
||||
cd core && uv run python -m pytest tests/ -v
|
||||
|
||||
install-hooks: ## Install pre-commit hooks
|
||||
pip install pre-commit
|
||||
uv pip install pre-commit
|
||||
pre-commit install
|
||||
|
||||
@@ -1,51 +0,0 @@
|
||||
## Summary
|
||||
- **Added HubSpot integration** — new HubSpot MCP tool with search, get, create, and update operations for contacts, companies, and deals. Includes OAuth2 provider for HubSpot credentials and credential store adapter for the tools layer.
|
||||
- **Replaced web_scrape tool with Playwright + stealth** — swapped httpx/BeautifulSoup for a headless Chromium browser using `playwright` (async API) and `playwright-stealth`, enabling JS-rendered page scraping and bot detection evasion
|
||||
- **Added empty response retry logic** — LLM provider now detects empty responses (e.g. Gemini returning 200 with no content on rate limit) and retries with exponential backoff, preventing hallucinated output from the cleanup LLM
|
||||
- **Added context-aware input compaction** — LLM nodes now estimate input token count before calling the model and progressively truncate the largest values if they exceed the context window budget
|
||||
- **Increased rate limit retries to 10** with verbose `[retry]` and `[compaction]` logging that includes model name, finish reason, and attempt count
|
||||
- **Updated setup scripts** — `scripts/setup-python.sh` now installs Playwright Chromium browser automatically for web scraping support
|
||||
- **Interactive quickstart onboarding** — `quickstart.sh` rewritten as bee-themed interactive wizard that detects existing API keys (including Claude Code subscription), lets user pick ONE default LLM provider, and saves configuration to `~/.hive/configuration.json`
|
||||
- **Fixed lint errors** across `hubspot_tool.py` (line length) and `agent_builder_server.py` (unused variable)
|
||||
|
||||
## Changed files
|
||||
|
||||
### HubSpot Integration
|
||||
- `tools/src/aden_tools/tools/hubspot_tool/` — New MCP tool: contacts, companies, and deals CRUD
|
||||
- `tools/src/aden_tools/tools/__init__.py` — Registered HubSpot tools
|
||||
- `tools/src/aden_tools/credentials/integrations.py` — HubSpot credential integration
|
||||
- `tools/src/aden_tools/credentials/__init__.py` — Updated credential exports
|
||||
- `core/framework/credentials/oauth2/hubspot_provider.py` — HubSpot OAuth2 provider
|
||||
- `core/framework/credentials/oauth2/__init__.py` — Registered HubSpot OAuth2 provider
|
||||
- `core/framework/runner/runner.py` — Updated runner for credential support
|
||||
|
||||
### Web Scrape Rewrite
|
||||
- `tools/src/aden_tools/tools/web_scrape_tool/web_scrape_tool.py` — Playwright async rewrite
|
||||
- `tools/src/aden_tools/tools/web_scrape_tool/README.md` — Updated docs
|
||||
- `tools/pyproject.toml` — Added `playwright`, `playwright-stealth` deps
|
||||
- `tools/Dockerfile` — Added `playwright install chromium --with-deps`
|
||||
- `scripts/setup-python.sh` — Added Playwright Chromium browser install step
|
||||
|
||||
### LLM Reliability
|
||||
- `core/framework/llm/litellm.py` — Empty response retry + max retries 10 + verbose logging
|
||||
- `core/framework/graph/node.py` — Input compaction via `_compact_inputs()`, `_estimate_tokens()`, `_get_context_limit()`
|
||||
|
||||
### Quickstart & Setup
|
||||
- `quickstart.sh` — Interactive bee-themed onboarding wizard with single provider selection
|
||||
- `~/.hive/configuration.json` — New user config file for default LLM provider/model
|
||||
|
||||
### Fixes
|
||||
- `core/framework/mcp/agent_builder_server.py` — Removed unused variable
|
||||
- `tools/src/aden_tools/tools/hubspot_tool/hubspot_tool.py` — Fixed E501 line length violations
|
||||
|
||||
## Test plan
|
||||
- [ ] Run `make lint` — passes clean
|
||||
- [ ] Run `./quickstart.sh` and verify interactive flow works, config saved to `~/.hive/configuration.json`
|
||||
- [ ] Run `./scripts/setup-python.sh` and verify Playwright Chromium installs
|
||||
- [ ] Run `pytest tests/tools/test_web_scrape_tool.py -v`
|
||||
- [ ] Run agent against a JS-heavy site and verify `web_scrape` returns rendered content
|
||||
- [ ] Set `HUBSPOT_ACCESS_TOKEN` and verify HubSpot tool CRUD operations work
|
||||
- [ ] Trigger rate limit and verify `[retry]` logs appear with correct attempt counts
|
||||
- [ ] Run agent with large inputs and verify `[compaction]` logs show truncation
|
||||
|
||||
🤖 Generated with [Claude Code](https://claude.com/claude-code)
|
||||
@@ -1,5 +1,5 @@
|
||||
<p align="center">
|
||||
<img width="100%" alt="Hive Banner" src="https://storage.googleapis.com/aden-prod-assets/website/aden-title-card.png" />
|
||||
<img width="100%" alt="Hive Banner" src="https://github.com/user-attachments/assets/a027429b-5d3c-4d34-88e4-0feaeaabbab3" />
|
||||
</p>
|
||||
|
||||
<p align="center">
|
||||
@@ -13,16 +13,20 @@
|
||||
<a href="docs/i18n/ko.md">한국어</a>
|
||||
</p>
|
||||
|
||||
[](https://github.com/adenhq/hive/blob/main/LICENSE)
|
||||
[](https://www.ycombinator.com/companies/aden)
|
||||
[](https://discord.com/invite/MXE49hrKDk)
|
||||
[](https://x.com/aden_hq)
|
||||
[](https://www.linkedin.com/company/teamaden/)
|
||||
<p align="center">
|
||||
<a href="https://github.com/adenhq/hive/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache 2.0 License" /></a>
|
||||
<a href="https://www.ycombinator.com/companies/aden"><img src="https://img.shields.io/badge/Y%20Combinator-Aden-orange" alt="Y Combinator" /></a>
|
||||
<a href="https://discord.com/invite/MXE49hrKDk"><img src="https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb" alt="Discord" /></a>
|
||||
<a href="https://x.com/aden_hq"><img src="https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5" alt="Twitter Follow" /></a>
|
||||
<a href="https://www.linkedin.com/company/teamaden/"><img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff" alt="LinkedIn" /></a>
|
||||
<img src="https://img.shields.io/badge/MCP-102_Tools-00ADD8?style=flat-square" alt="MCP" />
|
||||
</p>
|
||||
|
||||
|
||||
<p align="center">
|
||||
<img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
|
||||
<img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
|
||||
<img src="https://img.shields.io/badge/Goal--Driven-Development-purple?style=flat-square" alt="Goal-Driven" />
|
||||
<img src="https://img.shields.io/badge/Headless-Development-purple?style=flat-square" alt="Headless" />
|
||||
<img src="https://img.shields.io/badge/Human--in--the--Loop-orange?style=flat-square" alt="HITL" />
|
||||
<img src="https://img.shields.io/badge/Production--Ready-red?style=flat-square" alt="Production" />
|
||||
</p>
|
||||
@@ -30,15 +34,16 @@
|
||||
<img src="https://img.shields.io/badge/OpenAI-supported-412991?style=flat-square&logo=openai" alt="OpenAI" />
|
||||
<img src="https://img.shields.io/badge/Anthropic-supported-d4a574?style=flat-square" alt="Anthropic" />
|
||||
<img src="https://img.shields.io/badge/Google_Gemini-supported-4285F4?style=flat-square&logo=google" alt="Gemini" />
|
||||
<img src="https://img.shields.io/badge/MCP-19_Tools-00ADD8?style=flat-square" alt="MCP" />
|
||||
</p>
|
||||
|
||||
## Overview
|
||||
|
||||
Build reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
|
||||
Build autonomous, reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
|
||||
|
||||
Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.
|
||||
|
||||
https://github.com/user-attachments/assets/846c0cc7-ffd6-47fa-b4b7-495494857a55
|
||||
|
||||
## Who Is Hive For?
|
||||
|
||||
Hive is designed for developers and teams who want to build **production-grade AI agents** without manually wiring complex workflows.
|
||||
@@ -58,37 +63,22 @@ Hive may not be the best fit if you’re only experimenting with simple agent ch
|
||||
Use Hive when you need:
|
||||
|
||||
- Long-running, autonomous agents
|
||||
- Multi-agent coordination
|
||||
- Strong guardrails, process, and controls
|
||||
- Continuous improvement based on failures
|
||||
- Strong monitoring, safety, and budget controls
|
||||
- Multi-agent coordination
|
||||
- A framework that evolves with your goals
|
||||
|
||||
|
||||
## What is Aden
|
||||
|
||||
<p align="center">
|
||||
<img width="100%" alt="Aden Architecture" src="docs/assets/aden-architecture-diagram.jpg" />
|
||||
</p>
|
||||
|
||||
Aden is a platform for building, deploying, operating, and adapting AI agents:
|
||||
|
||||
- **Build** - A Coding Agent generates specialized Worker Agents (Sales, Marketing, Ops) from natural language goals
|
||||
- **Deploy** - Headless deployment with CI/CD integration and full API lifecycle management
|
||||
- **Operate** - Real-time monitoring, observability, and runtime guardrails keep agents reliable
|
||||
- **Adapt** - Continuous evaluation, supervision, and adaptation ensure agents improve over time
|
||||
- **Infra** - Shared memory, LLM integrations, tools, and skills power every agent
|
||||
|
||||
## Quick Links
|
||||
|
||||
- **[Documentation](https://docs.adenhq.com/)** - Complete guides and API reference
|
||||
- **[Self-Hosting Guide](https://docs.adenhq.com/getting-started/quickstart)** - Deploy Hive on your infrastructure
|
||||
- **[Changelog](https://github.com/adenhq/hive/releases)** - Latest updates and releases
|
||||
<!-- - **[Roadmap](https://adenhq.com/roadmap)** - Upcoming features and plans -->
|
||||
- **[Roadmap](https://adenhq.com/roadmap)** - Upcoming features and plans
|
||||
- **[Report Issues](https://github.com/adenhq/hive/issues)** - Bug reports and feature requests
|
||||
|
||||
## Quick Start
|
||||
|
||||
## Prerequisites
|
||||
### Prerequisites
|
||||
|
||||
- Python 3.11+ for agent development
|
||||
- Claude Code or Cursor for utilizing agent skills
|
||||
@@ -107,45 +97,53 @@ cd hive
|
||||
```
|
||||
|
||||
This sets up:
|
||||
|
||||
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
|
||||
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
|
||||
- All required Python dependencies
|
||||
- **credential store** - Encrypted API key storage (`~/.hive/credentials`)
|
||||
- **LLM provider** - Interactive default model configuration
|
||||
- All required Python dependencies with `uv`
|
||||
|
||||
### Build Your First Agent
|
||||
|
||||
```bash
|
||||
# Build an agent using Claude Code
|
||||
claude> /building-agents-construction
|
||||
claude> /hive
|
||||
|
||||
# Test your agent
|
||||
claude> /testing-agent
|
||||
claude> /hive-debugger
|
||||
|
||||
# Run your agent
|
||||
PYTHONPATH=core:exports python -m your_agent_name run --input '{...}'
|
||||
# (at separate terminal) Launch the interactive dashboard
|
||||
hive tui
|
||||
|
||||
# Or run directly
|
||||
hive run exports/your_agent_name --input '{"key": "value"}'
|
||||
```
|
||||
|
||||
**[📖 Complete Setup Guide](ENVIRONMENT_SETUP.md)** - Detailed instructions for agent development
|
||||
|
||||
### Cursor IDE Support
|
||||
|
||||
Skills are also available in Cursor. To enable:
|
||||
|
||||
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
|
||||
2. Run `MCP: Enable` to enable MCP servers
|
||||
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
|
||||
4. Type `/` in Agent chat and search for skills (e.g., `/building-agents-construction`)
|
||||
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
|
||||
|
||||
## Features
|
||||
|
||||
- **Goal-Driven Development** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
|
||||
- **Adaptiveness** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
|
||||
- **Dynamic Node Connections** - No predefined edges; connection code is generated by any capable LLM based on your goals
|
||||
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
|
||||
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
|
||||
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
|
||||
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
|
||||
- **Human-in-the-Loop** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
|
||||
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
|
||||
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
|
||||
- **Interactive TUI Dashboard** - Terminal-based dashboard with live graph view, event log, and chat interface for agent interaction
|
||||
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
|
||||
- **Production-Ready** - Self-hostable, built for scale and reliability
|
||||
|
||||
## Integration
|
||||
|
||||
<img width="100%" alt="Integration" src="https://github.com/user-attachments/assets/a1573f93-cf02-4bb8-b3d5-b305b05b1e51" />
|
||||
|
||||
Hive is built to be model-agnostic and system-agnostic.
|
||||
|
||||
- **LLM flexibility** - Hive Framework is designed to support various types of LLMs, including hosted and local models through LiteLLM-compatible providers.
|
||||
- **Business system connectivity** - Hive Framework is designed to connect to all kinds of business systems as tools, such as CRM, support, messaging, data, file, and internal APIs via MCP.
|
||||
|
||||
|
||||
## Why Aden
|
||||
|
||||
Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
|
||||
@@ -182,67 +180,60 @@ flowchart LR
|
||||
style V6 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
|
||||
```
|
||||
|
||||
### The Aden Advantage
|
||||
### The Hive Advantage
|
||||
|
||||
| Traditional Frameworks | Aden |
|
||||
| Traditional Frameworks | Hive |
|
||||
| -------------------------- | -------------------------------------- |
|
||||
| Hardcode agent workflows | Describe goals in natural language |
|
||||
| Manual graph definition | Auto-generated agent graphs |
|
||||
| Reactive error handling | Outcome-evaluation and adaptiveness |
|
||||
| Reactive error handling | Outcome-evaluation and adaptiveness |
|
||||
| Static tool configurations | Dynamic SDK-wrapped nodes |
|
||||
| Separate monitoring setup | Built-in real-time observability |
|
||||
| DIY budget management | Integrated cost controls & degradation |
|
||||
|
||||
### How It Works
|
||||
|
||||
1. **Define Your Goal** → Describe what you want to achieve in plain English
|
||||
2. **Coding Agent Generates** → Creates the agent graph, connection code, and test cases
|
||||
3. **Workers Execute** → SDK-wrapped nodes run with full observability and tool access
|
||||
1. **[Define Your Goal](docs/key_concepts/goals_outcome.md)** → Describe what you want to achieve in plain English
|
||||
2. **Coding Agent Generates** → Creates the [agent graph](docs/key_concepts/graph.md), connection code, and test cases
|
||||
3. **[Workers Execute](docs/key_concepts/worker_agent.md)** → SDK-wrapped nodes run with full observability and tool access
|
||||
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
|
||||
5. **Adaptiveness** → On failure, the system evolves the graph and redeploys automatically
|
||||
5. **[Adaptiveness](docs/key_concepts/evolution.md)** → On failure, the system evolves the graph and redeploys automatically
|
||||
|
||||
## Run pre-built Agents (Coming Soon)
|
||||
## Run Agents
|
||||
|
||||
### Run a sample agent
|
||||
Aden Hive provides a list of featured agents that you can use and build on top of.
|
||||
|
||||
### Run an agent shared by others
|
||||
Put the agent in `exports/` and run `PYTHONPATH=core:exports python -m your_agent_name run --input '{...}'`
|
||||
|
||||
|
||||
For building and running goal-driven agents with the framework:
|
||||
The `hive` CLI is the primary interface for running agents.
|
||||
|
||||
```bash
|
||||
# One-time setup
|
||||
./quickstart.sh
|
||||
# Browse and run agents interactively (Recommended)
|
||||
hive tui
|
||||
|
||||
# This sets up:
|
||||
# - framework package (core runtime)
|
||||
# - aden_tools package (MCP tools)
|
||||
# - All Python dependencies
|
||||
# Run a specific agent directly
|
||||
hive run exports/my_agent --input '{"task": "Your input here"}'
|
||||
|
||||
# Build new agents using Claude Code skills
|
||||
claude> /building-agents-construction
|
||||
# Run a specific agent with the TUI dashboard
|
||||
hive run exports/my_agent --tui
|
||||
|
||||
# Test agents
|
||||
claude> /testing-agent
|
||||
|
||||
# Run agents
|
||||
PYTHONPATH=core:exports python -m agent_name run --input '{...}'
|
||||
# Interactive REPL
|
||||
hive shell
|
||||
```
|
||||
|
||||
See [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) for complete setup instructions.
|
||||
The TUI scans both `exports/` and `examples/templates/` for available agents.
|
||||
|
||||
> **Using Python directly (alternative):** You can also run agents with `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
|
||||
|
||||
See [environment-setup.md](docs/environment-setup.md) for complete setup instructions.
|
||||
|
||||
## Documentation
|
||||
|
||||
- **[Developer Guide](DEVELOPER.md)** - Comprehensive guide for developers
|
||||
- **[Developer Guide](docs/developer-guide.md)** - Comprehensive guide for developers
|
||||
- [Getting Started](docs/getting-started.md) - Quick setup instructions
|
||||
- [TUI Guide](docs/tui-selection-guide.md) - Interactive dashboard usage
|
||||
- [Configuration Guide](docs/configuration.md) - All configuration options
|
||||
- [Architecture Overview](docs/architecture/README.md) - System design and structure
|
||||
|
||||
## Roadmap
|
||||
|
||||
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [ROADMAP.md](ROADMAP.md) for details.
|
||||
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [roadmap.md](docs/roadmap.md) for details.
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
@@ -332,11 +323,12 @@ end
|
||||
|
||||
classDef done fill:#9e9e9e,color:#fff,stroke:#757575
|
||||
```
|
||||
|
||||
## Contributing
|
||||
|
||||
We welcome contributions from the community! We’re especially looking for help building tools, integrations, and example agents for the framework ([check #2805](https://github.com/adenhq/hive/issues/2805)). If you’re interested in extending its functionality, this is the perfect place to start. Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
|
||||
|
||||
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
|
||||
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
|
||||
|
||||
1. Find or create an issue and get assigned
|
||||
2. Fork the repository
|
||||
@@ -369,10 +361,6 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS
|
||||
|
||||
## Frequently Asked Questions (FAQ)
|
||||
|
||||
**Q: Does Hive depend on LangChain or other agent frameworks?**
|
||||
|
||||
No. Hive is built from the ground up with no dependencies on LangChain, CrewAI, or other agent frameworks. The framework is designed to be lean and flexible, generating agent graphs dynamically rather than relying on predefined components.
|
||||
|
||||
**Q: What LLM providers does Hive support?**
|
||||
|
||||
Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, and many more. Simply set the appropriate API key environment variable and specify the model name.
|
||||
@@ -383,37 +371,25 @@ Yes! Hive supports local models through LiteLLM. Simply use the model name forma
|
||||
|
||||
**Q: What makes Hive different from other agent frameworks?**
|
||||
|
||||
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys. This self-improving loop is unique to Aden.
|
||||
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, [evolves the agent graph](docs/key_concepts/evolution.md), and redeploys. This self-improving loop is unique to Aden.
|
||||
|
||||
**Q: Is Hive open-source?**
|
||||
|
||||
Yes, Hive is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
|
||||
|
||||
**Q: Does Hive collect data from users?**
|
||||
|
||||
Hive collects telemetry data for monitoring and observability purposes, including token usage, latency metrics, and cost tracking. Content capture (prompts and responses) is configurable and stored with team-scoped data isolation. All data stays within your infrastructure when self-hosted.
|
||||
|
||||
**Q: What deployment options does Hive support?**
|
||||
|
||||
Hive supports self-hosted deployments via Python packages. See the [Environment Setup Guide](ENVIRONMENT_SETUP.md) for installation instructions. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.
|
||||
|
||||
**Q: Can Hive handle complex, production-scale use cases?**
|
||||
|
||||
Yes. Hive is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.
|
||||
|
||||
**Q: Does Hive support human-in-the-loop workflows?**
|
||||
|
||||
Yes, Hive fully supports human-in-the-loop workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
|
||||
|
||||
**Q: What monitoring and debugging tools does Hive provide?**
|
||||
|
||||
Hive includes comprehensive observability features: real-time WebSocket streaming for live agent execution monitoring, TimescaleDB-powered analytics for cost and performance metrics, health check endpoints for Kubernetes integration, and MCP tools for agent execution, including file operations, web search, data processing, and more.
|
||||
Yes, Hive fully supports [human-in-the-loop](docs/key_concepts/graph.md#human-in-the-loop) workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
|
||||
|
||||
**Q: What programming languages does Hive support?**
|
||||
|
||||
The Hive framework is built in Python. A JavaScript/TypeScript SDK is on the roadmap.
|
||||
|
||||
**Q: Can Aden agents interact with external tools and APIs?**
|
||||
**Q: Can Hive agents interact with external tools and APIs?**
|
||||
|
||||
Yes. Aden's SDK-wrapped nodes provide built-in tool access, and the framework supports flexible tool ecosystems. Agents can integrate with external APIs, databases, and services through the node architecture.
|
||||
|
||||
@@ -423,7 +399,7 @@ Hive provides granular budget controls including spending limits, throttles, and
|
||||
|
||||
**Q: Where can I find examples and documentation?**
|
||||
|
||||
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [DEVELOPER.md](DEVELOPER.md) guide.
|
||||
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [developer guide](docs/developer-guide.md).
|
||||
|
||||
**Q: How can I contribute to Aden?**
|
||||
|
||||
@@ -437,10 +413,6 @@ Aden's adaptation loop begins working from the first execution. When an agent fa
|
||||
|
||||
Hive focuses on generating agents that run real business processes, rather than generic agents. This vision emphasizes outcome-driven design, adaptability, and an easy-to-use set of tools and integrations.
|
||||
|
||||
**Q: Does Aden offer enterprise support?**
|
||||
|
||||
For enterprise inquiries, contact the Aden team through [adenhq.com](https://adenhq.com) or join our [Discord community](https://discord.com/invite/MXE49hrKDk) for support and discussions.
|
||||
|
||||
---
|
||||
|
||||
<p align="center">
|
||||
|
||||
+11
-11
@@ -14,7 +14,7 @@ Framework provides a runtime framework that captures **decisions**, not just act
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
pip install -e .
|
||||
uv pip install -e .
|
||||
```
|
||||
|
||||
## MCP Server Setup
|
||||
@@ -45,13 +45,13 @@ If you prefer manual setup:
|
||||
|
||||
```bash
|
||||
# Install framework
|
||||
pip install -e .
|
||||
uv pip install -e .
|
||||
|
||||
# Install MCP dependencies
|
||||
pip install mcp fastmcp
|
||||
uv pip install mcp fastmcp
|
||||
|
||||
# Test the server
|
||||
python -m framework.mcp.agent_builder_server
|
||||
uv run python -m framework.mcp.agent_builder_server
|
||||
```
|
||||
|
||||
### Using with MCP Clients
|
||||
@@ -86,13 +86,13 @@ Run an LLM-powered calculator:
|
||||
|
||||
```bash
|
||||
# Single calculation
|
||||
python -m framework calculate "2 + 3 * 4"
|
||||
uv run python -m framework calculate "2 + 3 * 4"
|
||||
|
||||
# Interactive mode
|
||||
python -m framework interactive
|
||||
uv run python -m framework interactive
|
||||
|
||||
# Analyze runs with Builder
|
||||
python -m framework analyze calculator
|
||||
uv run python -m framework analyze calculator
|
||||
```
|
||||
|
||||
### Using the Runtime
|
||||
@@ -136,16 +136,16 @@ Tests are generated using MCP tools (`generate_constraint_tests`, `generate_succ
|
||||
|
||||
```bash
|
||||
# Run tests against an agent
|
||||
python -m framework test-run <agent_path> --goal <goal_id> --parallel 4
|
||||
uv run python -m framework test-run <agent_path> --goal <goal_id> --parallel 4
|
||||
|
||||
# Debug failed tests
|
||||
python -m framework test-debug <agent_path> <test_name>
|
||||
uv run python -m framework test-debug <agent_path> <test_name>
|
||||
|
||||
# List tests for a goal
|
||||
python -m framework test-list <goal_id>
|
||||
uv run python -m framework test-list <goal_id>
|
||||
```
|
||||
|
||||
For detailed testing workflows, see the [testing-agent skill](../.claude/skills/testing-agent/SKILL.md).
|
||||
For detailed testing workflows, see the [hive-test skill](../.claude/skills/hive-test/SKILL.md).
|
||||
|
||||
### Analyzing Agent Behavior with Builder
|
||||
|
||||
|
||||
@@ -9,7 +9,7 @@ for understanding the core runtime loop:
|
||||
Setup -> Graph definition -> Execution -> Result
|
||||
|
||||
Run with:
|
||||
PYTHONPATH=core python core/examples/manual_agent.py
|
||||
uv run python core/examples/manual_agent.py
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
|
||||
@@ -4,8 +4,8 @@
|
||||
"name": "tools",
|
||||
"description": "Aden tools including web search, file operations, and PDF reading",
|
||||
"transport": "stdio",
|
||||
"command": "python",
|
||||
"args": ["mcp_server.py", "--stdio"],
|
||||
"command": "uv",
|
||||
"args": ["run", "python", "mcp_server.py", "--stdio"],
|
||||
"cwd": "../tools",
|
||||
"env": {
|
||||
"BRAVE_SEARCH_API_KEY": "${BRAVE_SEARCH_API_KEY}"
|
||||
|
||||
@@ -44,6 +44,13 @@ def _configure_paths():
|
||||
if exports_str not in sys.path:
|
||||
sys.path.insert(0, exports_str)
|
||||
|
||||
# Add examples/templates/ to sys.path so template agents are importable
|
||||
templates_dir = project_root / "examples" / "templates"
|
||||
if templates_dir.is_dir():
|
||||
templates_str = str(templates_dir)
|
||||
if templates_str not in sys.path:
|
||||
sys.path.insert(0, templates_str)
|
||||
|
||||
# Ensure core/ is also in sys.path (for non-editable-install scenarios)
|
||||
core_str = str(project_root / "core")
|
||||
if (project_root / "core").is_dir() and core_str not in sys.path:
|
||||
|
||||
@@ -96,7 +96,7 @@ class BaseOAuth2Provider(CredentialProvider):
|
||||
self._client = httpx.Client(timeout=self.config.request_timeout)
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"OAuth2 provider requires 'httpx'. Install with: pip install httpx"
|
||||
"OAuth2 provider requires 'httpx'. Install with: uv pip install httpx"
|
||||
) from e
|
||||
return self._client
|
||||
|
||||
|
||||
@@ -136,7 +136,8 @@ class EncryptedFileStorage(CredentialStorage):
|
||||
from cryptography.fernet import Fernet
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Encrypted storage requires 'cryptography'. Install with: pip install cryptography"
|
||||
"Encrypted storage requires 'cryptography'. "
|
||||
"Install with: uv pip install cryptography"
|
||||
) from e
|
||||
|
||||
self.base_path = Path(base_path or self.DEFAULT_PATH).expanduser()
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
HashiCorp Vault storage adapter.
|
||||
|
||||
Provides integration with HashiCorp Vault for enterprise secret management.
|
||||
Requires the 'hvac' package: pip install hvac
|
||||
Requires the 'hvac' package: uv pip install hvac
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
@@ -66,7 +66,7 @@ class HashiCorpVaultStorage(CredentialStorage):
|
||||
- AWS IAM auth method
|
||||
|
||||
Requirements:
|
||||
pip install hvac
|
||||
uv pip install hvac
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
@@ -97,7 +97,7 @@ class HashiCorpVaultStorage(CredentialStorage):
|
||||
import hvac
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"HashiCorp Vault support requires 'hvac'. Install with: pip install hvac"
|
||||
"HashiCorp Vault support requires 'hvac'. Install with: uv pip install hvac"
|
||||
) from e
|
||||
|
||||
self._url = url
|
||||
|
||||
@@ -156,6 +156,10 @@ class EdgeSpec(BaseModel):
|
||||
memory: dict[str, Any],
|
||||
) -> bool:
|
||||
"""Evaluate a conditional expression."""
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
if not self.condition_expr:
|
||||
return True
|
||||
|
||||
@@ -172,12 +176,24 @@ class EdgeSpec(BaseModel):
|
||||
|
||||
try:
|
||||
# Safe evaluation using AST-based whitelist
|
||||
return bool(safe_eval(self.condition_expr, context))
|
||||
result = bool(safe_eval(self.condition_expr, context))
|
||||
# Log the evaluation for visibility
|
||||
# Extract the variable names used in the expression for debugging
|
||||
expr_vars = {
|
||||
k: repr(context[k])
|
||||
for k in context
|
||||
if k not in ("output", "memory", "result", "true", "false")
|
||||
and k in self.condition_expr
|
||||
}
|
||||
logger.info(
|
||||
" Edge %s: condition '%s' → %s (vars: %s)",
|
||||
self.id,
|
||||
self.condition_expr,
|
||||
result,
|
||||
expr_vars or "none matched",
|
||||
)
|
||||
return result
|
||||
except Exception as e:
|
||||
# Log the error for debugging
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
logger.warning(f" ⚠ Condition evaluation failed: {self.condition_expr}")
|
||||
logger.warning(f" Error: {e}")
|
||||
logger.warning(f" Available context keys: {list(context.keys())}")
|
||||
@@ -419,6 +435,12 @@ class GraphSpec(BaseModel):
|
||||
max_steps: int = Field(default=100, description="Maximum node executions before timeout")
|
||||
max_retries_per_node: int = 3
|
||||
|
||||
# EventLoopNode configuration (from configure_loop)
|
||||
loop_config: dict[str, Any] = Field(
|
||||
default_factory=dict,
|
||||
description="EventLoopNode configuration (max_iterations, max_tool_calls_per_turn, etc.)",
|
||||
)
|
||||
|
||||
# Metadata
|
||||
description: str = ""
|
||||
created_by: str = "" # "human" or "builder_agent"
|
||||
|
||||
@@ -74,6 +74,11 @@ class LoopConfig:
|
||||
max_history_tokens: int = 32_000
|
||||
store_prefix: str = ""
|
||||
|
||||
# Overflow margin for max_tool_calls_per_turn. Tool calls are only
|
||||
# discarded when the count exceeds max_tool_calls_per_turn * (1 + margin).
|
||||
# Default 0.5 means 50% wiggle room (e.g. limit=10 → hard cutoff at 15).
|
||||
tool_call_overflow_margin: float = 0.5
|
||||
|
||||
# --- Tool result context management ---
|
||||
# When a tool result exceeds this character count, it is truncated in the
|
||||
# conversation context. If *spillover_dir* is set the full result is
|
||||
@@ -144,19 +149,19 @@ class EventLoopNode(NodeProtocol):
|
||||
1. Try to restore from durable state (crash recovery)
|
||||
2. If no prior state, init from NodeSpec.system_prompt + input_keys
|
||||
3. Loop: drain injection queue -> stream LLM -> execute tools
|
||||
-> if client_facing + no tools: block for user input (inject_event)
|
||||
-> if not client_facing or tools present: judge evaluates
|
||||
-> if client_facing + ask_user called: block for user input
|
||||
-> judge evaluates (acceptance criteria)
|
||||
(each add_* and set_output writes through to store immediately)
|
||||
4. Publish events to EventBus at each stage
|
||||
5. Write cursor after each iteration
|
||||
6. Terminate when judge returns ACCEPT, shutdown signaled, or max iterations
|
||||
7. Build output dict from OutputAccumulator
|
||||
|
||||
Client-facing blocking: When ``client_facing=True`` and the LLM produces
|
||||
text without tool calls (a natural conversational turn), the node blocks
|
||||
via ``_await_user_input()`` until ``inject_event()`` or ``signal_shutdown()``
|
||||
is called. This separates blocking (node concern) from output evaluation
|
||||
(judge concern).
|
||||
Client-facing blocking: When ``client_facing=True``, a synthetic
|
||||
``ask_user`` tool is injected. The node blocks for user input ONLY
|
||||
when the LLM explicitly calls ``ask_user()``. Text-only turns
|
||||
without ``ask_user`` flow through without blocking, allowing the LLM
|
||||
to stream progress updates and summaries freely.
|
||||
|
||||
Always returns NodeResult with retryable=False semantics. The executor
|
||||
must NOT retry event loop nodes -- retry is handled internally by the
|
||||
@@ -205,15 +210,36 @@ class EventLoopNode(NodeProtocol):
|
||||
stream_id = ctx.node_id
|
||||
node_id = ctx.node_id
|
||||
|
||||
# Verdict counters for runtime logging
|
||||
_accept_count = _retry_count = _escalate_count = _continue_count = 0
|
||||
|
||||
# 1. Guard: LLM required
|
||||
if ctx.llm is None:
|
||||
return NodeResult(success=False, error="LLM provider not available")
|
||||
error_msg = "LLM provider not available"
|
||||
# Log guard failure
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="event_loop",
|
||||
success=False,
|
||||
error=error_msg,
|
||||
exit_status="guard_failure",
|
||||
total_steps=0,
|
||||
tokens_used=0,
|
||||
input_tokens=0,
|
||||
output_tokens=0,
|
||||
latency_ms=0,
|
||||
)
|
||||
return NodeResult(success=False, error=error_msg)
|
||||
|
||||
# 2. Restore or create new conversation + accumulator
|
||||
conversation, accumulator, start_iteration = await self._restore(ctx)
|
||||
if conversation is None:
|
||||
system_prompt = ctx.node_spec.system_prompt or ""
|
||||
|
||||
conversation = NodeConversation(
|
||||
system_prompt=ctx.node_spec.system_prompt or "",
|
||||
system_prompt=system_prompt,
|
||||
max_history_tokens=self._config.max_history_tokens,
|
||||
output_keys=ctx.node_spec.output_keys or None,
|
||||
store=self._conversation_store,
|
||||
@@ -226,11 +252,13 @@ class EventLoopNode(NodeProtocol):
|
||||
if initial_message:
|
||||
await conversation.add_user_message(initial_message)
|
||||
|
||||
# 3. Build tool list: node tools + synthetic set_output tool
|
||||
# 3. Build tool list: node tools + synthetic set_output + ask_user tools
|
||||
tools = list(ctx.available_tools)
|
||||
set_output_tool = self._build_set_output_tool(ctx.node_spec.output_keys)
|
||||
if set_output_tool:
|
||||
tools.append(set_output_tool)
|
||||
if ctx.node_spec.client_facing:
|
||||
tools.append(self._build_ask_user_tool())
|
||||
|
||||
logger.info(
|
||||
"[%s] Tools available (%d): %s | client_facing=%s | judge=%s",
|
||||
@@ -249,9 +277,28 @@ class EventLoopNode(NodeProtocol):
|
||||
|
||||
# 6. Main loop
|
||||
for iteration in range(start_iteration, self._config.max_iterations):
|
||||
# 6a. Check pause
|
||||
iter_start = time.time()
|
||||
|
||||
# 6a. Check pause (no current-iteration data yet — only log_node_complete needed)
|
||||
if await self._check_pause(ctx, conversation, iteration):
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
total_steps=iteration,
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
input_tokens=total_input_tokens,
|
||||
output_tokens=total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
exit_status="paused",
|
||||
accept_count=_accept_count,
|
||||
retry_count=_retry_count,
|
||||
escalate_count=_escalate_count,
|
||||
continue_count=_continue_count,
|
||||
)
|
||||
return NodeResult(
|
||||
success=True,
|
||||
output=accumulator.to_dict(),
|
||||
@@ -276,20 +323,73 @@ class EventLoopNode(NodeProtocol):
|
||||
iteration,
|
||||
len(conversation.messages),
|
||||
)
|
||||
assistant_text, tool_results_list, turn_tokens = await self._run_single_turn(
|
||||
ctx, conversation, tools, iteration, accumulator
|
||||
)
|
||||
logger.info(
|
||||
"[%s] iter=%d: LLM done — text=%d chars, tool_calls=%d, tokens=%s, accumulator=%s",
|
||||
node_id,
|
||||
iteration,
|
||||
len(assistant_text),
|
||||
len(tool_results_list),
|
||||
turn_tokens,
|
||||
{k: ("set" if v is not None else "None") for k, v in accumulator.to_dict().items()},
|
||||
)
|
||||
total_input_tokens += turn_tokens.get("input", 0)
|
||||
total_output_tokens += turn_tokens.get("output", 0)
|
||||
try:
|
||||
(
|
||||
assistant_text,
|
||||
real_tool_results,
|
||||
outputs_set,
|
||||
turn_tokens,
|
||||
logged_tool_calls,
|
||||
user_input_requested,
|
||||
) = await self._run_single_turn(ctx, conversation, tools, iteration, accumulator)
|
||||
logger.info(
|
||||
"[%s] iter=%d: LLM done — text=%d chars, real_tools=%d, "
|
||||
"outputs_set=%s, tokens=%s, accumulator=%s",
|
||||
node_id,
|
||||
iteration,
|
||||
len(assistant_text),
|
||||
len(real_tool_results),
|
||||
outputs_set or "[]",
|
||||
turn_tokens,
|
||||
{
|
||||
k: ("set" if v is not None else "None")
|
||||
for k, v in accumulator.to_dict().items()
|
||||
},
|
||||
)
|
||||
total_input_tokens += turn_tokens.get("input", 0)
|
||||
total_output_tokens += turn_tokens.get("output", 0)
|
||||
except Exception as e:
|
||||
# LLM call crashed - log partial step with error
|
||||
import traceback
|
||||
|
||||
iter_latency_ms = int((time.time() - iter_start) * 1000)
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
error_msg = f"LLM call failed: {e}"
|
||||
stack_trace = traceback.format_exc()
|
||||
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=node_id,
|
||||
node_type="event_loop",
|
||||
step_index=iteration,
|
||||
error=error_msg,
|
||||
stacktrace=stack_trace,
|
||||
is_partial=True,
|
||||
input_tokens=0,
|
||||
output_tokens=0,
|
||||
latency_ms=iter_latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="event_loop",
|
||||
success=False,
|
||||
error=error_msg,
|
||||
stacktrace=stack_trace,
|
||||
total_steps=iteration + 1,
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
input_tokens=total_input_tokens,
|
||||
output_tokens=total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
exit_status="failure",
|
||||
accept_count=_accept_count,
|
||||
retry_count=_retry_count,
|
||||
escalate_count=_escalate_count,
|
||||
continue_count=_continue_count,
|
||||
)
|
||||
|
||||
# Re-raise to maintain existing error handling
|
||||
raise
|
||||
|
||||
# 6e'. Feed actual API token count back for accurate estimation
|
||||
turn_input = turn_tokens.get("input", 0)
|
||||
@@ -300,6 +400,36 @@ class EventLoopNode(NodeProtocol):
|
||||
if conversation.needs_compaction():
|
||||
await self._compact_tiered(ctx, conversation, accumulator)
|
||||
|
||||
# 6e'''. Empty response guard — if the LLM returned nothing
|
||||
# (no text, no real tools, no set_output) and all required
|
||||
# outputs are already set, accept immediately. This prevents
|
||||
# wasted iterations when the LLM has genuinely finished its
|
||||
# work (e.g. after calling set_output in a previous turn).
|
||||
truly_empty = (
|
||||
not assistant_text
|
||||
and not real_tool_results
|
||||
and not outputs_set
|
||||
and not user_input_requested
|
||||
)
|
||||
if truly_empty and accumulator is not None:
|
||||
missing = self._get_missing_output_keys(
|
||||
accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
|
||||
)
|
||||
if not missing:
|
||||
logger.info(
|
||||
"[%s] iter=%d: empty response but all outputs set — accepting",
|
||||
node_id,
|
||||
iteration,
|
||||
)
|
||||
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
return NodeResult(
|
||||
success=True,
|
||||
output=accumulator.to_dict(),
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
# 6f. Stall detection
|
||||
recent_responses.append(assistant_text)
|
||||
if len(recent_responses) > self._config.stall_detection_threshold:
|
||||
@@ -307,6 +437,38 @@ class EventLoopNode(NodeProtocol):
|
||||
if self._is_stalled(recent_responses):
|
||||
await self._publish_stalled(stream_id, node_id)
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
_continue_count += 1
|
||||
if ctx.runtime_logger:
|
||||
iter_latency_ms = int((time.time() - iter_start) * 1000)
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=node_id,
|
||||
node_type="event_loop",
|
||||
step_index=iteration,
|
||||
verdict="CONTINUE",
|
||||
verdict_feedback="Stall detected before judge evaluation",
|
||||
tool_calls=logged_tool_calls,
|
||||
llm_text=assistant_text,
|
||||
input_tokens=turn_tokens.get("input", 0),
|
||||
output_tokens=turn_tokens.get("output", 0),
|
||||
latency_ms=iter_latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="event_loop",
|
||||
success=False,
|
||||
error="Node stalled",
|
||||
total_steps=iteration + 1,
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
input_tokens=total_input_tokens,
|
||||
output_tokens=total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
exit_status="stalled",
|
||||
accept_count=_accept_count,
|
||||
retry_count=_retry_count,
|
||||
escalate_count=_escalate_count,
|
||||
continue_count=_continue_count,
|
||||
)
|
||||
return NodeResult(
|
||||
success=False,
|
||||
error=(
|
||||
@@ -321,21 +483,50 @@ class EventLoopNode(NodeProtocol):
|
||||
# 6g. Write cursor checkpoint
|
||||
await self._write_cursor(ctx, conversation, accumulator, iteration)
|
||||
|
||||
# 6h. Client-facing input wait
|
||||
logger.info(
|
||||
"[%s] iter=%d: 6h check — client_facing=%s, tool_results=%d",
|
||||
node_id,
|
||||
iteration,
|
||||
ctx.node_spec.client_facing,
|
||||
len(tool_results_list),
|
||||
)
|
||||
if ctx.node_spec.client_facing and not tool_results_list:
|
||||
# LLM finished speaking (no tool calls) on a client-facing node.
|
||||
# This is a conversational turn boundary: block for user input
|
||||
# instead of running the judge.
|
||||
# 6h. Client-facing input blocking
|
||||
#
|
||||
# For client_facing nodes, block for user input only when the
|
||||
# LLM explicitly called ask_user(). Text-only turns without
|
||||
# ask_user flow through without blocking, allowing progress
|
||||
# updates and summaries to stream freely.
|
||||
#
|
||||
# After user input, always fall through to judge evaluation
|
||||
# (6i). The judge handles all acceptance decisions.
|
||||
if ctx.node_spec.client_facing and user_input_requested:
|
||||
if self._shutdown:
|
||||
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
_continue_count += 1
|
||||
if ctx.runtime_logger:
|
||||
iter_latency_ms = int((time.time() - iter_start) * 1000)
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=node_id,
|
||||
node_type="event_loop",
|
||||
step_index=iteration,
|
||||
verdict="CONTINUE",
|
||||
verdict_feedback="Shutdown signaled (client-facing)",
|
||||
tool_calls=logged_tool_calls,
|
||||
llm_text=assistant_text,
|
||||
input_tokens=turn_tokens.get("input", 0),
|
||||
output_tokens=turn_tokens.get("output", 0),
|
||||
latency_ms=iter_latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
total_steps=iteration + 1,
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
input_tokens=total_input_tokens,
|
||||
output_tokens=total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
exit_status="success",
|
||||
accept_count=_accept_count,
|
||||
retry_count=_retry_count,
|
||||
escalate_count=_escalate_count,
|
||||
continue_count=_continue_count,
|
||||
)
|
||||
return NodeResult(
|
||||
success=True,
|
||||
output=accumulator.to_dict(),
|
||||
@@ -347,9 +538,39 @@ class EventLoopNode(NodeProtocol):
|
||||
got_input = await self._await_user_input(ctx)
|
||||
logger.info("[%s] iter=%d: unblocked, got_input=%s", node_id, iteration, got_input)
|
||||
if not got_input:
|
||||
# Shutdown signaled during wait
|
||||
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
_continue_count += 1
|
||||
if ctx.runtime_logger:
|
||||
iter_latency_ms = int((time.time() - iter_start) * 1000)
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=node_id,
|
||||
node_type="event_loop",
|
||||
step_index=iteration,
|
||||
verdict="CONTINUE",
|
||||
verdict_feedback="No input received (shutdown during wait)",
|
||||
tool_calls=logged_tool_calls,
|
||||
llm_text=assistant_text,
|
||||
input_tokens=turn_tokens.get("input", 0),
|
||||
output_tokens=turn_tokens.get("output", 0),
|
||||
latency_ms=iter_latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
total_steps=iteration + 1,
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
input_tokens=total_input_tokens,
|
||||
output_tokens=total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
exit_status="success",
|
||||
accept_count=_accept_count,
|
||||
retry_count=_retry_count,
|
||||
escalate_count=_escalate_count,
|
||||
continue_count=_continue_count,
|
||||
)
|
||||
return NodeResult(
|
||||
success=True,
|
||||
output=accumulator.to_dict(),
|
||||
@@ -357,118 +578,217 @@ class EventLoopNode(NodeProtocol):
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
# Clear stall detection — user input resets the conversation
|
||||
recent_responses.clear()
|
||||
|
||||
# For nodes with an explicit judge, fall through to judge
|
||||
# evaluation so the LLM gets structured feedback about missing
|
||||
# outputs (e.g. "Missing output keys: [...]"). Without this,
|
||||
# the LLM may generate text like "Ready to proceed!" without
|
||||
# ever calling set_output, and the judge feedback never reaches it.
|
||||
#
|
||||
# For nodes without a judge (HITL review/approval with all-
|
||||
# nullable keys), keep conversing UNLESS the LLM has already
|
||||
# set an output — in that case fall through to the implicit
|
||||
# judge which will ACCEPT and terminate the node.
|
||||
if self._judge is None:
|
||||
has_outputs = accumulator and any(
|
||||
v is not None for v in accumulator.to_dict().values()
|
||||
)
|
||||
if not has_outputs:
|
||||
logger.info(
|
||||
"[%s] iter=%d: no judge, no outputs, continuing",
|
||||
node_id,
|
||||
iteration,
|
||||
)
|
||||
continue
|
||||
logger.info(
|
||||
"[%s] iter=%d: no judge, outputs set — implicit judge",
|
||||
node_id,
|
||||
iteration,
|
||||
)
|
||||
else:
|
||||
logger.info(
|
||||
"[%s] iter=%d: has judge, falling through to 6i",
|
||||
node_id,
|
||||
iteration,
|
||||
)
|
||||
# Fall through to judge evaluation (6i)
|
||||
|
||||
# 6i. Judge evaluation
|
||||
should_judge = (
|
||||
(iteration + 1) % self._config.judge_every_n_turns == 0
|
||||
or not tool_results_list # no tool calls = natural stop
|
||||
or not real_tool_results # no real tool calls = natural stop
|
||||
)
|
||||
|
||||
logger.info("[%s] iter=%d: 6i should_judge=%s", node_id, iteration, should_judge)
|
||||
if should_judge:
|
||||
verdict = await self._evaluate(
|
||||
ctx,
|
||||
conversation,
|
||||
accumulator,
|
||||
assistant_text,
|
||||
tool_results_list,
|
||||
iteration,
|
||||
)
|
||||
fb_preview = (verdict.feedback or "")[:200]
|
||||
logger.info(
|
||||
"[%s] iter=%d: judge verdict=%s feedback=%r",
|
||||
node_id,
|
||||
iteration,
|
||||
verdict.action,
|
||||
fb_preview,
|
||||
)
|
||||
|
||||
if verdict.action == "ACCEPT":
|
||||
# Check for missing output keys
|
||||
missing = self._get_missing_output_keys(
|
||||
accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
|
||||
if not should_judge:
|
||||
# Gap C: unjudged iteration — log as CONTINUE
|
||||
_continue_count += 1
|
||||
if ctx.runtime_logger:
|
||||
iter_latency_ms = int((time.time() - iter_start) * 1000)
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=node_id,
|
||||
node_type="event_loop",
|
||||
step_index=iteration,
|
||||
verdict="CONTINUE",
|
||||
verdict_feedback="Unjudged (judge_every_n_turns skip)",
|
||||
tool_calls=logged_tool_calls,
|
||||
llm_text=assistant_text,
|
||||
input_tokens=turn_tokens.get("input", 0),
|
||||
output_tokens=turn_tokens.get("output", 0),
|
||||
latency_ms=iter_latency_ms,
|
||||
)
|
||||
if missing and self._judge is not None:
|
||||
hint = (
|
||||
f"Missing required output keys: {missing}. "
|
||||
"Use set_output to provide them."
|
||||
)
|
||||
logger.info(
|
||||
"[%s] iter=%d: ACCEPT but missing keys %s",
|
||||
node_id,
|
||||
iteration,
|
||||
missing,
|
||||
)
|
||||
await conversation.add_user_message(hint)
|
||||
continue
|
||||
continue
|
||||
|
||||
# Write outputs to shared memory
|
||||
for key, value in accumulator.to_dict().items():
|
||||
ctx.memory.write(key, value, validate=False)
|
||||
# Judge evaluation (should_judge is always True here)
|
||||
verdict = await self._evaluate(
|
||||
ctx,
|
||||
conversation,
|
||||
accumulator,
|
||||
assistant_text,
|
||||
real_tool_results,
|
||||
iteration,
|
||||
)
|
||||
fb_preview = (verdict.feedback or "")[:200]
|
||||
logger.info(
|
||||
"[%s] iter=%d: judge verdict=%s feedback=%r",
|
||||
node_id,
|
||||
iteration,
|
||||
verdict.action,
|
||||
fb_preview,
|
||||
)
|
||||
|
||||
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
return NodeResult(
|
||||
if verdict.action == "ACCEPT":
|
||||
# Check for missing output keys
|
||||
missing = self._get_missing_output_keys(
|
||||
accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
|
||||
)
|
||||
if missing and self._judge is not None:
|
||||
hint = (
|
||||
f"Missing required output keys: {missing}. Use set_output to provide them."
|
||||
)
|
||||
logger.info(
|
||||
"[%s] iter=%d: ACCEPT but missing keys %s",
|
||||
node_id,
|
||||
iteration,
|
||||
missing,
|
||||
)
|
||||
await conversation.add_user_message(hint)
|
||||
# Gap D: log ACCEPT-with-missing-keys as RETRY
|
||||
_retry_count += 1
|
||||
if ctx.runtime_logger:
|
||||
iter_latency_ms = int((time.time() - iter_start) * 1000)
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=node_id,
|
||||
node_type="event_loop",
|
||||
step_index=iteration,
|
||||
verdict="RETRY",
|
||||
verdict_feedback=(f"Judge accepted but missing output keys: {missing}"),
|
||||
tool_calls=logged_tool_calls,
|
||||
llm_text=assistant_text,
|
||||
input_tokens=turn_tokens.get("input", 0),
|
||||
output_tokens=turn_tokens.get("output", 0),
|
||||
latency_ms=iter_latency_ms,
|
||||
)
|
||||
continue
|
||||
|
||||
# Exit point 5: Judge ACCEPT — log step + log_node_complete
|
||||
# Write outputs to shared memory
|
||||
for key, value in accumulator.to_dict().items():
|
||||
ctx.memory.write(key, value, validate=False)
|
||||
|
||||
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
_accept_count += 1
|
||||
if ctx.runtime_logger:
|
||||
iter_latency_ms = int((time.time() - iter_start) * 1000)
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=node_id,
|
||||
node_type="event_loop",
|
||||
step_index=iteration,
|
||||
verdict="ACCEPT",
|
||||
verdict_feedback=verdict.feedback,
|
||||
tool_calls=logged_tool_calls,
|
||||
llm_text=assistant_text,
|
||||
input_tokens=turn_tokens.get("input", 0),
|
||||
output_tokens=turn_tokens.get("output", 0),
|
||||
latency_ms=iter_latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
output=accumulator.to_dict(),
|
||||
total_steps=iteration + 1,
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
input_tokens=total_input_tokens,
|
||||
output_tokens=total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
exit_status="success",
|
||||
accept_count=_accept_count,
|
||||
retry_count=_retry_count,
|
||||
escalate_count=_escalate_count,
|
||||
continue_count=_continue_count,
|
||||
)
|
||||
return NodeResult(
|
||||
success=True,
|
||||
output=accumulator.to_dict(),
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
elif verdict.action == "ESCALATE":
|
||||
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
return NodeResult(
|
||||
elif verdict.action == "ESCALATE":
|
||||
# Exit point 6: Judge ESCALATE — log step + log_node_complete
|
||||
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
_escalate_count += 1
|
||||
if ctx.runtime_logger:
|
||||
iter_latency_ms = int((time.time() - iter_start) * 1000)
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=node_id,
|
||||
node_type="event_loop",
|
||||
step_index=iteration,
|
||||
verdict="ESCALATE",
|
||||
verdict_feedback=verdict.feedback,
|
||||
tool_calls=logged_tool_calls,
|
||||
llm_text=assistant_text,
|
||||
input_tokens=turn_tokens.get("input", 0),
|
||||
output_tokens=turn_tokens.get("output", 0),
|
||||
latency_ms=iter_latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="event_loop",
|
||||
success=False,
|
||||
error=f"Judge escalated: {verdict.feedback}",
|
||||
output=accumulator.to_dict(),
|
||||
total_steps=iteration + 1,
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
input_tokens=total_input_tokens,
|
||||
output_tokens=total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
exit_status="escalated",
|
||||
accept_count=_accept_count,
|
||||
retry_count=_retry_count,
|
||||
escalate_count=_escalate_count,
|
||||
continue_count=_continue_count,
|
||||
)
|
||||
return NodeResult(
|
||||
success=False,
|
||||
error=f"Judge escalated: {verdict.feedback}",
|
||||
output=accumulator.to_dict(),
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
elif verdict.action == "RETRY":
|
||||
if verdict.feedback:
|
||||
await conversation.add_user_message(f"[Judge feedback]: {verdict.feedback}")
|
||||
continue
|
||||
elif verdict.action == "RETRY":
|
||||
_retry_count += 1
|
||||
if ctx.runtime_logger:
|
||||
iter_latency_ms = int((time.time() - iter_start) * 1000)
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=node_id,
|
||||
node_type="event_loop",
|
||||
step_index=iteration,
|
||||
verdict="RETRY",
|
||||
verdict_feedback=verdict.feedback,
|
||||
tool_calls=logged_tool_calls,
|
||||
llm_text=assistant_text,
|
||||
input_tokens=turn_tokens.get("input", 0),
|
||||
output_tokens=turn_tokens.get("output", 0),
|
||||
latency_ms=iter_latency_ms,
|
||||
)
|
||||
if verdict.feedback:
|
||||
await conversation.add_user_message(f"[Judge feedback]: {verdict.feedback}")
|
||||
continue
|
||||
|
||||
# 7. Max iterations exhausted
|
||||
await self._publish_loop_completed(stream_id, node_id, self._config.max_iterations)
|
||||
latency_ms = int((time.time() - start_time) * 1000)
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="event_loop",
|
||||
success=False,
|
||||
error=f"Max iterations ({self._config.max_iterations}) reached without acceptance",
|
||||
total_steps=self._config.max_iterations,
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
input_tokens=total_input_tokens,
|
||||
output_tokens=total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
exit_status="failure",
|
||||
accept_count=_accept_count,
|
||||
retry_count=_retry_count,
|
||||
escalate_count=_escalate_count,
|
||||
continue_count=_continue_count,
|
||||
)
|
||||
return NodeResult(
|
||||
success=False,
|
||||
error=(f"Max iterations ({self._config.max_iterations}) reached without acceptance"),
|
||||
@@ -499,8 +819,8 @@ class EventLoopNode(NodeProtocol):
|
||||
async def _await_user_input(self, ctx: NodeContext) -> bool:
|
||||
"""Block until user input arrives or shutdown is signaled.
|
||||
|
||||
Called when a client_facing node produces text without tool calls —
|
||||
a natural conversational turn boundary.
|
||||
Called when a client_facing node explicitly calls ask_user() —
|
||||
an intentional conversational turn boundary.
|
||||
|
||||
Returns True if input arrived, False if shutdown was signaled.
|
||||
"""
|
||||
@@ -526,16 +846,35 @@ class EventLoopNode(NodeProtocol):
|
||||
tools: list[Tool],
|
||||
iteration: int,
|
||||
accumulator: OutputAccumulator,
|
||||
) -> tuple[str, list[dict], dict[str, int]]:
|
||||
) -> tuple[str, list[dict], list[str], dict[str, int], list[dict], bool]:
|
||||
"""Run a single LLM turn with streaming and tool execution.
|
||||
|
||||
Returns (assistant_text, tool_results, token_counts).
|
||||
Returns (assistant_text, real_tool_results, outputs_set, token_counts, logged_tool_calls,
|
||||
user_input_requested).
|
||||
|
||||
``real_tool_results`` contains only results from actual tools (web_search,
|
||||
etc.), NOT from the synthetic ``set_output`` or ``ask_user`` tools.
|
||||
``outputs_set`` lists the output keys written via ``set_output`` during
|
||||
this turn. ``user_input_requested`` is True if the LLM called
|
||||
``ask_user`` during this turn. This separation lets the caller treat
|
||||
synthetic tools as framework concerns rather than tool-execution concerns.
|
||||
|
||||
``logged_tool_calls`` accumulates ALL tool calls across inner iterations
|
||||
(real tools, set_output, and discarded calls) for L3 logging. Unlike
|
||||
``real_tool_results`` which resets each inner iteration, this list grows
|
||||
across the entire turn.
|
||||
"""
|
||||
stream_id = ctx.node_id
|
||||
node_id = ctx.node_id
|
||||
token_counts: dict[str, int] = {"input": 0, "output": 0}
|
||||
tool_call_count = 0
|
||||
final_text = ""
|
||||
# Track output keys set via set_output across all inner iterations
|
||||
outputs_set_this_turn: list[str] = []
|
||||
user_input_requested = False
|
||||
# Accumulate ALL tool calls across inner iterations for L3 logging.
|
||||
# Unlike real_tool_results (reset each inner iteration), this persists.
|
||||
logged_tool_calls: list[dict] = []
|
||||
|
||||
# Inner tool loop: stream may produce tool calls requiring re-invocation
|
||||
while True:
|
||||
@@ -606,15 +945,25 @@ class EventLoopNode(NodeProtocol):
|
||||
|
||||
# If no tool calls, turn is complete
|
||||
if not tool_calls:
|
||||
return final_text, [], token_counts
|
||||
return (
|
||||
final_text,
|
||||
[],
|
||||
outputs_set_this_turn,
|
||||
token_counts,
|
||||
logged_tool_calls,
|
||||
user_input_requested,
|
||||
)
|
||||
|
||||
# Execute tool calls
|
||||
tool_results: list[dict] = []
|
||||
# Execute tool calls — separate real tools from set_output
|
||||
real_tool_results: list[dict] = []
|
||||
limit_hit = False
|
||||
executed_in_batch = 0
|
||||
hard_limit = int(
|
||||
self._config.max_tool_calls_per_turn * (1 + self._config.tool_call_overflow_margin)
|
||||
)
|
||||
for tc in tool_calls:
|
||||
tool_call_count += 1
|
||||
if tool_call_count > self._config.max_tool_calls_per_turn:
|
||||
if tool_call_count > hard_limit:
|
||||
limit_hit = True
|
||||
break
|
||||
executed_in_batch += 1
|
||||
@@ -624,43 +973,73 @@ class EventLoopNode(NodeProtocol):
|
||||
stream_id, node_id, tc.tool_use_id, tc.tool_name, tc.tool_input
|
||||
)
|
||||
|
||||
# Handle set_output synthetic tool
|
||||
logger.info(
|
||||
"[%s] tool_call: %s(%s)",
|
||||
node_id,
|
||||
tc.tool_name,
|
||||
json.dumps(tc.tool_input)[:200],
|
||||
)
|
||||
|
||||
if tc.tool_name == "set_output":
|
||||
# --- Framework-level set_output handling ---
|
||||
result = self._handle_set_output(tc.tool_input, ctx.node_spec.output_keys)
|
||||
result = ToolResult(
|
||||
tool_use_id=tc.tool_use_id,
|
||||
content=result.content,
|
||||
is_error=result.is_error,
|
||||
)
|
||||
# Async write-through for set_output
|
||||
if not result.is_error:
|
||||
await accumulator.set(tc.tool_input["key"], tc.tool_input["value"])
|
||||
value = tc.tool_input["value"]
|
||||
# Parse JSON strings into native types so downstream
|
||||
# consumers get lists/dicts instead of serialised JSON,
|
||||
# and the hallucination validator skips non-string values.
|
||||
if isinstance(value, str):
|
||||
try:
|
||||
parsed = json.loads(value)
|
||||
if isinstance(parsed, (list, dict, bool, int, float)):
|
||||
value = parsed
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
pass
|
||||
await accumulator.set(tc.tool_input["key"], value)
|
||||
outputs_set_this_turn.append(tc.tool_input["key"])
|
||||
logged_tool_calls.append(
|
||||
{
|
||||
"tool_use_id": tc.tool_use_id,
|
||||
"tool_name": "set_output",
|
||||
"tool_input": tc.tool_input,
|
||||
"content": result.content,
|
||||
"is_error": result.is_error,
|
||||
}
|
||||
)
|
||||
elif tc.tool_name == "ask_user":
|
||||
# --- Framework-level ask_user handling ---
|
||||
user_input_requested = True
|
||||
result = ToolResult(
|
||||
tool_use_id=tc.tool_use_id,
|
||||
content="Waiting for user input...",
|
||||
is_error=False,
|
||||
)
|
||||
else:
|
||||
# Execute real tool
|
||||
# --- Real tool execution ---
|
||||
result = await self._execute_tool(tc)
|
||||
# Truncate large results to prevent context blowup
|
||||
result = self._truncate_tool_result(result, tc.tool_name)
|
||||
tool_entry = {
|
||||
"tool_use_id": tc.tool_use_id,
|
||||
"tool_name": tc.tool_name,
|
||||
"tool_input": tc.tool_input,
|
||||
"content": result.content,
|
||||
"is_error": result.is_error,
|
||||
}
|
||||
real_tool_results.append(tool_entry)
|
||||
logged_tool_calls.append(tool_entry)
|
||||
|
||||
# Record tool result in conversation (write-through)
|
||||
# Record tool result in conversation (both real and set_output
|
||||
# go into the conversation for LLM context continuity)
|
||||
await conversation.add_tool_result(
|
||||
tool_use_id=tc.tool_use_id,
|
||||
content=result.content,
|
||||
is_error=result.is_error,
|
||||
)
|
||||
tool_results.append(
|
||||
{
|
||||
"tool_use_id": tc.tool_use_id,
|
||||
"tool_name": tc.tool_name,
|
||||
"content": result.content,
|
||||
"is_error": result.is_error,
|
||||
}
|
||||
)
|
||||
|
||||
# Publish tool call completed
|
||||
await self._publish_tool_completed(
|
||||
@@ -678,17 +1057,16 @@ class EventLoopNode(NodeProtocol):
|
||||
# corresponding tool results, causing the LLM to repeat them
|
||||
# in the next turn (infinite loop).
|
||||
if limit_hit:
|
||||
max_tc = self._config.max_tool_calls_per_turn
|
||||
skipped = tool_calls[executed_in_batch:]
|
||||
logger.warning(
|
||||
"Max tool calls per turn (%d) exceeded — discarding %d remaining call(s): %s",
|
||||
max_tc,
|
||||
"Hard tool call limit (%d) exceeded — discarding %d remaining call(s): %s",
|
||||
hard_limit,
|
||||
len(skipped),
|
||||
", ".join(tc.tool_name for tc in skipped),
|
||||
)
|
||||
discard_msg = (
|
||||
f"Tool call discarded: max tool calls per turn "
|
||||
f"({max_tc}) exceeded. Consolidate your work and "
|
||||
f"Tool call discarded: hard limit of {hard_limit} tool calls "
|
||||
f"per turn exceeded. Consolidate your work and "
|
||||
f"use fewer tool calls."
|
||||
)
|
||||
for tc in skipped:
|
||||
@@ -697,17 +1075,42 @@ class EventLoopNode(NodeProtocol):
|
||||
content=discard_msg,
|
||||
is_error=True,
|
||||
)
|
||||
tool_results.append(
|
||||
{
|
||||
"tool_use_id": tc.tool_use_id,
|
||||
"tool_name": tc.tool_name,
|
||||
"content": discard_msg,
|
||||
"is_error": True,
|
||||
}
|
||||
# Discarded calls go into real_tool_results so the
|
||||
# caller sees they were attempted (for judge context).
|
||||
discard_entry = {
|
||||
"tool_use_id": tc.tool_use_id,
|
||||
"tool_name": tc.tool_name,
|
||||
"tool_input": tc.tool_input,
|
||||
"content": discard_msg,
|
||||
"is_error": True,
|
||||
}
|
||||
real_tool_results.append(discard_entry)
|
||||
logged_tool_calls.append(discard_entry)
|
||||
# Prune old tool results NOW to prevent context bloat on the
|
||||
# next turn. The char-based token estimator underestimates
|
||||
# actual API tokens, so the standard compaction check in the
|
||||
# outer loop may not trigger in time.
|
||||
protect = max(2000, self._config.max_history_tokens // 12)
|
||||
pruned = await conversation.prune_old_tool_results(
|
||||
protect_tokens=protect,
|
||||
min_prune_tokens=max(1000, protect // 3),
|
||||
)
|
||||
if pruned > 0:
|
||||
logger.info(
|
||||
"Post-limit pruning: cleared %d old tool results (budget: %d)",
|
||||
pruned,
|
||||
self._config.max_history_tokens,
|
||||
)
|
||||
# Limit hit — return from this turn so the judge can
|
||||
# evaluate instead of looping back for another stream.
|
||||
return final_text, tool_results, token_counts
|
||||
return (
|
||||
final_text,
|
||||
real_tool_results,
|
||||
outputs_set_this_turn,
|
||||
token_counts,
|
||||
logged_tool_calls,
|
||||
user_input_requested,
|
||||
)
|
||||
|
||||
# --- Mid-turn pruning: prevent context blowup within a single turn ---
|
||||
if conversation.usage_ratio() >= 0.6:
|
||||
@@ -723,12 +1126,51 @@ class EventLoopNode(NodeProtocol):
|
||||
conversation.usage_ratio() * 100,
|
||||
)
|
||||
|
||||
# If ask_user was called, return immediately so the outer loop
|
||||
# can block for user input instead of re-invoking the LLM.
|
||||
if user_input_requested:
|
||||
return (
|
||||
final_text,
|
||||
real_tool_results,
|
||||
outputs_set_this_turn,
|
||||
token_counts,
|
||||
logged_tool_calls,
|
||||
user_input_requested,
|
||||
)
|
||||
|
||||
# Tool calls processed -- loop back to stream with updated conversation
|
||||
|
||||
# -------------------------------------------------------------------
|
||||
# set_output synthetic tool
|
||||
# Synthetic tools: set_output, ask_user
|
||||
# -------------------------------------------------------------------
|
||||
|
||||
def _build_ask_user_tool(self) -> Tool:
|
||||
"""Build the synthetic ask_user tool for explicit user-input requests.
|
||||
|
||||
Client-facing nodes call ask_user() when they need to pause and wait
|
||||
for user input. Text-only turns WITHOUT ask_user flow through without
|
||||
blocking, allowing progress updates and summaries to stream freely.
|
||||
"""
|
||||
return Tool(
|
||||
name="ask_user",
|
||||
description=(
|
||||
"Call this tool when you need to wait for the user's response. "
|
||||
"Use it after greeting the user, asking a question, or requesting "
|
||||
"approval. Do NOT call it when you are just providing a status "
|
||||
"update or summary that doesn't require a response."
|
||||
),
|
||||
parameters={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"question": {
|
||||
"type": "string",
|
||||
"description": "Optional: the question or prompt shown to the user.",
|
||||
},
|
||||
},
|
||||
"required": [],
|
||||
},
|
||||
)
|
||||
|
||||
def _build_set_output_tool(self, output_keys: list[str] | None) -> Tool | None:
|
||||
"""Build the synthetic set_output tool for explicit output declaration."""
|
||||
if not output_keys:
|
||||
@@ -1014,7 +1456,8 @@ class EventLoopNode(NodeProtocol):
|
||||
truncated = (
|
||||
f"[Result from {tool_name}: {len(result.content)} chars — "
|
||||
f"too large for context, saved to '{filename}'. "
|
||||
f"Use load_data('{filename}') to read the full result.]\n\n"
|
||||
f"Use load_data(filename='{filename}') "
|
||||
f"to read the full result.]\n\n"
|
||||
f"Preview:\n{preview}…"
|
||||
)
|
||||
logger.info(
|
||||
@@ -1235,7 +1678,7 @@ class EventLoopNode(NodeProtocol):
|
||||
if self._config.spillover_dir:
|
||||
parts.append(
|
||||
"NOTE: Large tool results were saved to files. "
|
||||
"Use load_data('<filename>') to read them."
|
||||
"Use load_data(filename='<filename>') to read them."
|
||||
)
|
||||
|
||||
# 6. Tool call history (prevent re-calling tools)
|
||||
|
||||
@@ -14,6 +14,7 @@ import logging
|
||||
import warnings
|
||||
from collections.abc import Callable
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
|
||||
@@ -128,6 +129,11 @@ class GraphExecutor:
|
||||
cleansing_config: CleansingConfig | None = None,
|
||||
enable_parallel_execution: bool = True,
|
||||
parallel_config: ParallelExecutionConfig | None = None,
|
||||
event_bus: Any | None = None,
|
||||
stream_id: str = "",
|
||||
runtime_logger: Any = None,
|
||||
storage_path: str | Path | None = None,
|
||||
loop_config: dict[str, Any] | None = None,
|
||||
):
|
||||
"""
|
||||
Initialize the executor.
|
||||
@@ -142,6 +148,11 @@ class GraphExecutor:
|
||||
cleansing_config: Optional output cleansing configuration
|
||||
enable_parallel_execution: Enable parallel fan-out execution (default True)
|
||||
parallel_config: Configuration for parallel execution behavior
|
||||
event_bus: Optional event bus for emitting node lifecycle events
|
||||
stream_id: Stream ID for event correlation
|
||||
runtime_logger: Optional RuntimeLogger for per-graph-run logging
|
||||
storage_path: Optional base path for conversation persistence
|
||||
loop_config: Optional EventLoopNode configuration (max_iterations, etc.)
|
||||
"""
|
||||
self.runtime = runtime
|
||||
self.llm = llm
|
||||
@@ -151,6 +162,11 @@ class GraphExecutor:
|
||||
self.approval_callback = approval_callback
|
||||
self.validator = OutputValidator()
|
||||
self.logger = logging.getLogger(__name__)
|
||||
self._event_bus = event_bus
|
||||
self._stream_id = stream_id
|
||||
self.runtime_logger = runtime_logger
|
||||
self._storage_path = Path(storage_path) if storage_path else None
|
||||
self._loop_config = loop_config or {}
|
||||
|
||||
# Initialize output cleaner
|
||||
self.cleansing_config = cleansing_config or CleansingConfig()
|
||||
@@ -271,10 +287,28 @@ class GraphExecutor:
|
||||
input_data=input_data or {},
|
||||
)
|
||||
|
||||
if self.runtime_logger:
|
||||
# Extract session_id from storage_path if available (for unified sessions)
|
||||
# storage_path format: base_path/sessions/{session_id}/
|
||||
session_id = ""
|
||||
if self._storage_path and self._storage_path.name.startswith("session_"):
|
||||
session_id = self._storage_path.name
|
||||
self.runtime_logger.start_run(goal_id=goal.id, session_id=session_id)
|
||||
|
||||
self.logger.info(f"🚀 Starting execution: {goal.name}")
|
||||
self.logger.info(f" Goal: {goal.description}")
|
||||
self.logger.info(f" Entry node: {graph.entry_node}")
|
||||
|
||||
# Set per-execution data_dir so data tools (save_data, load_data, etc.)
|
||||
# and spillover files share the same session-scoped directory.
|
||||
_ctx_token = None
|
||||
if self._storage_path:
|
||||
from framework.runner.tool_registry import ToolRegistry
|
||||
|
||||
_ctx_token = ToolRegistry.set_execution_context(
|
||||
data_dir=str(self._storage_path / "data"),
|
||||
)
|
||||
|
||||
try:
|
||||
while steps < graph.max_steps:
|
||||
steps += 1
|
||||
@@ -357,13 +391,45 @@ class GraphExecutor:
|
||||
description=f"Validation errors for {current_node_id}: {validation_errors}",
|
||||
)
|
||||
|
||||
# Emit node-started event (skip event_loop nodes — they emit their own)
|
||||
if self._event_bus and node_spec.node_type != "event_loop":
|
||||
await self._event_bus.emit_node_loop_started(
|
||||
stream_id=self._stream_id, node_id=current_node_id
|
||||
)
|
||||
|
||||
# Execute node
|
||||
self.logger.info(" Executing...")
|
||||
result = await node_impl.execute(ctx)
|
||||
|
||||
# Emit node-completed event (skip event_loop nodes)
|
||||
if self._event_bus and node_spec.node_type != "event_loop":
|
||||
await self._event_bus.emit_node_loop_completed(
|
||||
stream_id=self._stream_id, node_id=current_node_id, iterations=1
|
||||
)
|
||||
|
||||
# Ensure runtime logging has an L2 entry for this node
|
||||
if self.runtime_logger:
|
||||
self.runtime_logger.ensure_node_logged(
|
||||
node_id=node_spec.id,
|
||||
node_name=node_spec.name,
|
||||
node_type=node_spec.node_type,
|
||||
success=result.success,
|
||||
error=result.error,
|
||||
tokens_used=result.tokens_used,
|
||||
latency_ms=result.latency_ms,
|
||||
)
|
||||
|
||||
if result.success:
|
||||
# Validate output before accepting it
|
||||
if result.output and node_spec.output_keys:
|
||||
# Validate output before accepting it.
|
||||
# Skip for event_loop nodes — their judge system is
|
||||
# the sole acceptance mechanism (see WP-8). Empty
|
||||
# strings and other flexible outputs are legitimate
|
||||
# for LLM-driven nodes that already passed the judge.
|
||||
if (
|
||||
result.output
|
||||
and node_spec.output_keys
|
||||
and node_spec.node_type != "event_loop"
|
||||
):
|
||||
validation = self.validator.validate_all(
|
||||
output=result.output,
|
||||
expected_keys=node_spec.output_keys,
|
||||
@@ -441,48 +507,74 @@ class GraphExecutor:
|
||||
_is_retry = True
|
||||
continue
|
||||
else:
|
||||
# Max retries exceeded - fail the execution
|
||||
# Max retries exceeded - check for failure handlers
|
||||
self.logger.error(
|
||||
f" ✗ Max retries ({max_retries}) exceeded for node {current_node_id}"
|
||||
)
|
||||
self.runtime.report_problem(
|
||||
severity="critical",
|
||||
description=(
|
||||
f"Node {current_node_id} failed after "
|
||||
f"{max_retries} attempts: {result.error}"
|
||||
),
|
||||
)
|
||||
self.runtime.end_run(
|
||||
success=False,
|
||||
output_data=memory.read_all(),
|
||||
narrative=(
|
||||
f"Failed at {node_spec.name} after "
|
||||
f"{max_retries} retries: {result.error}"
|
||||
),
|
||||
|
||||
# Check if there's an ON_FAILURE edge to follow
|
||||
next_node = self._follow_edges(
|
||||
graph=graph,
|
||||
goal=goal,
|
||||
current_node_id=current_node_id,
|
||||
current_node_spec=node_spec,
|
||||
result=result, # result.success=False triggers ON_FAILURE
|
||||
memory=memory,
|
||||
)
|
||||
|
||||
# Calculate quality metrics
|
||||
total_retries_count = sum(node_retry_counts.values())
|
||||
nodes_failed = list(node_retry_counts.keys())
|
||||
if next_node:
|
||||
# Found a failure handler - route to it
|
||||
self.logger.info(f" → Routing to failure handler: {next_node}")
|
||||
current_node_id = next_node
|
||||
continue # Continue execution with handler
|
||||
else:
|
||||
# No failure handler - terminate execution
|
||||
self.runtime.report_problem(
|
||||
severity="critical",
|
||||
description=(
|
||||
f"Node {current_node_id} failed after "
|
||||
f"{max_retries} attempts: {result.error}"
|
||||
),
|
||||
)
|
||||
self.runtime.end_run(
|
||||
success=False,
|
||||
output_data=memory.read_all(),
|
||||
narrative=(
|
||||
f"Failed at {node_spec.name} after "
|
||||
f"{max_retries} retries: {result.error}"
|
||||
),
|
||||
)
|
||||
|
||||
return ExecutionResult(
|
||||
success=False,
|
||||
error=(
|
||||
f"Node '{node_spec.name}' failed after "
|
||||
f"{max_retries} attempts: {result.error}"
|
||||
),
|
||||
output=memory.read_all(),
|
||||
steps_executed=steps,
|
||||
total_tokens=total_tokens,
|
||||
total_latency_ms=total_latency,
|
||||
path=path,
|
||||
total_retries=total_retries_count,
|
||||
nodes_with_failures=nodes_failed,
|
||||
retry_details=dict(node_retry_counts),
|
||||
had_partial_failures=len(nodes_failed) > 0,
|
||||
execution_quality="failed",
|
||||
node_visit_counts=dict(node_visit_counts),
|
||||
)
|
||||
# Calculate quality metrics
|
||||
total_retries_count = sum(node_retry_counts.values())
|
||||
nodes_failed = list(node_retry_counts.keys())
|
||||
|
||||
if self.runtime_logger:
|
||||
await self.runtime_logger.end_run(
|
||||
status="failure",
|
||||
duration_ms=total_latency,
|
||||
node_path=path,
|
||||
execution_quality="failed",
|
||||
)
|
||||
|
||||
return ExecutionResult(
|
||||
success=False,
|
||||
error=(
|
||||
f"Node '{node_spec.name}' failed after "
|
||||
f"{max_retries} attempts: {result.error}"
|
||||
),
|
||||
output=memory.read_all(),
|
||||
steps_executed=steps,
|
||||
total_tokens=total_tokens,
|
||||
total_latency_ms=total_latency,
|
||||
path=path,
|
||||
total_retries=total_retries_count,
|
||||
nodes_with_failures=nodes_failed,
|
||||
retry_details=dict(node_retry_counts),
|
||||
had_partial_failures=len(nodes_failed) > 0,
|
||||
execution_quality="failed",
|
||||
node_visit_counts=dict(node_visit_counts),
|
||||
)
|
||||
|
||||
# Check if we just executed a pause node - if so, save state and return
|
||||
# This must happen BEFORE determining next node, since pause nodes may have no edges
|
||||
@@ -507,6 +599,14 @@ class GraphExecutor:
|
||||
nodes_failed = [nid for nid, count in node_retry_counts.items() if count > 0]
|
||||
exec_quality = "degraded" if total_retries_count > 0 else "clean"
|
||||
|
||||
if self.runtime_logger:
|
||||
await self.runtime_logger.end_run(
|
||||
status="success",
|
||||
duration_ms=total_latency,
|
||||
node_path=path,
|
||||
execution_quality=exec_quality,
|
||||
)
|
||||
|
||||
return ExecutionResult(
|
||||
success=True,
|
||||
output=saved_memory,
|
||||
@@ -630,6 +730,14 @@ class GraphExecutor:
|
||||
),
|
||||
)
|
||||
|
||||
if self.runtime_logger:
|
||||
await self.runtime_logger.end_run(
|
||||
status="success" if exec_quality != "failed" else "failure",
|
||||
duration_ms=total_latency,
|
||||
node_path=path,
|
||||
execution_quality=exec_quality,
|
||||
)
|
||||
|
||||
return ExecutionResult(
|
||||
success=True,
|
||||
output=output,
|
||||
@@ -646,6 +754,10 @@ class GraphExecutor:
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
import traceback
|
||||
|
||||
stack_trace = traceback.format_exc()
|
||||
|
||||
self.runtime.report_problem(
|
||||
severity="critical",
|
||||
description=str(e),
|
||||
@@ -655,10 +767,29 @@ class GraphExecutor:
|
||||
narrative=f"Failed at step {steps}: {e}",
|
||||
)
|
||||
|
||||
# Log the crashing node to L2 with full stack trace
|
||||
if self.runtime_logger and node_spec is not None:
|
||||
self.runtime_logger.ensure_node_logged(
|
||||
node_id=node_spec.id,
|
||||
node_name=node_spec.name,
|
||||
node_type=node_spec.node_type,
|
||||
success=False,
|
||||
error=str(e),
|
||||
stacktrace=stack_trace,
|
||||
)
|
||||
|
||||
# Calculate quality metrics even for exceptions
|
||||
total_retries_count = sum(node_retry_counts.values())
|
||||
nodes_failed = list(node_retry_counts.keys())
|
||||
|
||||
if self.runtime_logger:
|
||||
await self.runtime_logger.end_run(
|
||||
status="failure",
|
||||
duration_ms=total_latency,
|
||||
node_path=path,
|
||||
execution_quality="failed",
|
||||
)
|
||||
|
||||
return ExecutionResult(
|
||||
success=False,
|
||||
error=str(e),
|
||||
@@ -672,6 +803,12 @@ class GraphExecutor:
|
||||
node_visit_counts=dict(node_visit_counts),
|
||||
)
|
||||
|
||||
finally:
|
||||
if _ctx_token is not None:
|
||||
from framework.runner.tool_registry import ToolRegistry
|
||||
|
||||
ToolRegistry.reset_execution_context(_ctx_token)
|
||||
|
||||
def _build_context(
|
||||
self,
|
||||
node_spec: NodeSpec,
|
||||
@@ -703,6 +840,7 @@ class GraphExecutor:
|
||||
goal_context=goal.to_prompt_context(),
|
||||
goal=goal, # Pass Goal object for LLM-powered routers
|
||||
max_tokens=max_tokens,
|
||||
runtime_logger=self.runtime_logger,
|
||||
)
|
||||
|
||||
# Valid node types - no ambiguous "llm" type allowed
|
||||
@@ -781,11 +919,48 @@ class GraphExecutor:
|
||||
)
|
||||
|
||||
if node_spec.node_type == "event_loop":
|
||||
# Event loop nodes must be pre-registered (like function nodes)
|
||||
raise RuntimeError(
|
||||
f"EventLoopNode '{node_spec.id}' not found in registry. "
|
||||
"Register it with executor.register_node() before execution."
|
||||
# Auto-create EventLoopNode with sensible defaults.
|
||||
# Custom configs can still be pre-registered via node_registry.
|
||||
from framework.graph.event_loop_node import EventLoopNode, LoopConfig
|
||||
|
||||
# Create a FileConversationStore if a storage path is available
|
||||
conv_store = None
|
||||
if self._storage_path:
|
||||
from framework.storage.conversation_store import FileConversationStore
|
||||
|
||||
store_path = self._storage_path / "conversations" / node_spec.id
|
||||
conv_store = FileConversationStore(base_path=store_path)
|
||||
|
||||
# Auto-configure spillover directory for large tool results.
|
||||
# When a tool result exceeds max_tool_result_chars, the full
|
||||
# content is written to spillover_dir and the agent gets a
|
||||
# truncated preview with instructions to use load_data().
|
||||
# Uses storage_path/data which is session-scoped, matching the
|
||||
# data_dir set via execution context for data tools.
|
||||
spillover = None
|
||||
if self._storage_path:
|
||||
spillover = str(self._storage_path / "data")
|
||||
|
||||
lc = self._loop_config
|
||||
default_max_iter = 100 if node_spec.client_facing else 50
|
||||
node = EventLoopNode(
|
||||
event_bus=self._event_bus,
|
||||
judge=None, # implicit judge: accept when output_keys are filled
|
||||
config=LoopConfig(
|
||||
max_iterations=lc.get("max_iterations", default_max_iter),
|
||||
max_tool_calls_per_turn=lc.get("max_tool_calls_per_turn", 10),
|
||||
tool_call_overflow_margin=lc.get("tool_call_overflow_margin", 0.5),
|
||||
stall_detection_threshold=lc.get("stall_detection_threshold", 3),
|
||||
max_history_tokens=lc.get("max_history_tokens", 32000),
|
||||
max_tool_result_chars=lc.get("max_tool_result_chars", 3_000),
|
||||
spillover_dir=spillover,
|
||||
),
|
||||
tool_executor=self.tool_executor,
|
||||
conversation_store=conv_store,
|
||||
)
|
||||
# Cache so inject_event() is reachable for client-facing input
|
||||
self.node_registry[node_spec.id] = node
|
||||
return node
|
||||
|
||||
# Should never reach here due to validation above
|
||||
raise RuntimeError(f"Unhandled node type: {node_spec.node_type}")
|
||||
@@ -814,9 +989,12 @@ class GraphExecutor:
|
||||
source_node_name=current_node_spec.name if current_node_spec else current_node_id,
|
||||
target_node_name=target_node_spec.name if target_node_spec else edge.target,
|
||||
):
|
||||
# Validate and clean output before mapping inputs
|
||||
# Validate and clean output before mapping inputs.
|
||||
# Use full memory state (not just result.output) because
|
||||
# target input_keys may come from earlier nodes in the
|
||||
# graph, not only from the immediate source node.
|
||||
if self.cleansing_config.enabled and target_node_spec:
|
||||
output_to_validate = result.output
|
||||
output_to_validate = memory.read_all()
|
||||
|
||||
validation = self.output_cleaner.validate_output(
|
||||
output=output_to_validate,
|
||||
@@ -1012,10 +1190,13 @@ class GraphExecutor:
|
||||
branch.status = "running"
|
||||
|
||||
try:
|
||||
# Validate and clean output before mapping inputs (same as _follow_edges)
|
||||
# Validate and clean output before mapping inputs (same as _follow_edges).
|
||||
# Use full memory state since target input_keys may come
|
||||
# from earlier nodes, not just the immediate source.
|
||||
if self.cleansing_config.enabled and node_spec:
|
||||
mem_snapshot = memory.read_all()
|
||||
validation = self.output_cleaner.validate_output(
|
||||
output=source_result.output,
|
||||
output=mem_snapshot,
|
||||
source_node_id=source_node_spec.id if source_node_spec else "unknown",
|
||||
target_node_spec=node_spec,
|
||||
)
|
||||
@@ -1026,7 +1207,7 @@ class GraphExecutor:
|
||||
f"{branch.node_id}: {validation.errors}"
|
||||
)
|
||||
cleaned_output = self.output_cleaner.clean_output(
|
||||
output=source_result.output,
|
||||
output=mem_snapshot,
|
||||
source_node_id=source_node_spec.id if source_node_spec else "unknown",
|
||||
target_node_spec=node_spec,
|
||||
validation_errors=validation.errors,
|
||||
@@ -1049,12 +1230,36 @@ class GraphExecutor:
|
||||
ctx = self._build_context(node_spec, memory, goal, mapped, graph.max_tokens)
|
||||
node_impl = self._get_node_implementation(node_spec, graph.cleanup_llm_model)
|
||||
|
||||
# Emit node-started event (skip event_loop nodes)
|
||||
if self._event_bus and node_spec.node_type != "event_loop":
|
||||
await self._event_bus.emit_node_loop_started(
|
||||
stream_id=self._stream_id, node_id=branch.node_id
|
||||
)
|
||||
|
||||
self.logger.info(
|
||||
f" ▶ Branch {node_spec.name}: executing (attempt {attempt + 1})"
|
||||
)
|
||||
result = await node_impl.execute(ctx)
|
||||
last_result = result
|
||||
|
||||
# Ensure L2 entry for this branch node
|
||||
if self.runtime_logger:
|
||||
self.runtime_logger.ensure_node_logged(
|
||||
node_id=node_spec.id,
|
||||
node_name=node_spec.name,
|
||||
node_type=node_spec.node_type,
|
||||
success=result.success,
|
||||
error=result.error,
|
||||
tokens_used=result.tokens_used,
|
||||
latency_ms=result.latency_ms,
|
||||
)
|
||||
|
||||
# Emit node-completed event (skip event_loop nodes)
|
||||
if self._event_bus and node_spec.node_type != "event_loop":
|
||||
await self._event_bus.emit_node_loop_completed(
|
||||
stream_id=self._stream_id, node_id=branch.node_id, iterations=1
|
||||
)
|
||||
|
||||
if result.success:
|
||||
# Write outputs to shared memory using async write
|
||||
for key, value in result.output.items():
|
||||
@@ -1084,9 +1289,24 @@ class GraphExecutor:
|
||||
return branch, last_result
|
||||
|
||||
except Exception as e:
|
||||
import traceback
|
||||
|
||||
stack_trace = traceback.format_exc()
|
||||
branch.status = "failed"
|
||||
branch.error = str(e)
|
||||
self.logger.error(f" ✗ Branch {branch.node_id}: exception - {e}")
|
||||
|
||||
# Log the crashing branch node to L2 with full stack trace
|
||||
if self.runtime_logger and node_spec is not None:
|
||||
self.runtime_logger.ensure_node_logged(
|
||||
node_id=node_spec.id,
|
||||
node_name=node_spec.name,
|
||||
node_type=node_spec.node_type,
|
||||
success=False,
|
||||
error=str(e),
|
||||
stacktrace=stack_trace,
|
||||
)
|
||||
|
||||
return branch, e
|
||||
|
||||
# Execute all branches concurrently
|
||||
|
||||
@@ -477,6 +477,9 @@ class NodeContext:
|
||||
attempt: int = 1
|
||||
max_attempts: int = 3
|
||||
|
||||
# Runtime logging (optional)
|
||||
runtime_logger: Any = None # RuntimeLogger | None — uses Any to avoid import
|
||||
|
||||
|
||||
@dataclass
|
||||
class NodeResult:
|
||||
@@ -854,6 +857,8 @@ Keep the same JSON structure but with shorter content values.
|
||||
)
|
||||
|
||||
start = time.time()
|
||||
_step_index = 0
|
||||
_captured_tool_calls: list[dict] = []
|
||||
|
||||
try:
|
||||
# Build messages
|
||||
@@ -893,6 +898,16 @@ Keep the same JSON structure but with shorter content values.
|
||||
if len(str(result.content)) > 150:
|
||||
result_str += "..."
|
||||
logger.info(f" ✓ Tool result: {result_str}")
|
||||
# Capture for runtime logging
|
||||
_captured_tool_calls.append(
|
||||
{
|
||||
"tool_use_id": tool_use.id,
|
||||
"tool_name": tool_use.name,
|
||||
"tool_input": tool_use.input,
|
||||
"content": result.content,
|
||||
"is_error": result.is_error,
|
||||
}
|
||||
)
|
||||
return result
|
||||
|
||||
response = ctx.llm.complete_with_tools(
|
||||
@@ -1072,6 +1087,29 @@ Keep the same JSON structure but with shorter content values.
|
||||
f"Pydantic validation failed after "
|
||||
f"{max_validation_retries} retries: {err}"
|
||||
)
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=ctx.node_id,
|
||||
node_type=ctx.node_spec.node_type,
|
||||
step_index=_step_index,
|
||||
llm_text=response.content,
|
||||
tool_calls=_captured_tool_calls,
|
||||
input_tokens=total_input_tokens,
|
||||
output_tokens=total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=ctx.node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type=ctx.node_spec.node_type,
|
||||
success=False,
|
||||
error=error_msg,
|
||||
total_steps=_step_index + 1,
|
||||
tokens_used=total_input_tokens + total_output_tokens,
|
||||
input_tokens=total_input_tokens,
|
||||
output_tokens=total_output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
return NodeResult(
|
||||
success=False,
|
||||
error=error_msg,
|
||||
@@ -1161,12 +1199,36 @@ Keep the same JSON structure but with shorter content values.
|
||||
)
|
||||
|
||||
# Return failure instead of writing garbage to all keys
|
||||
_extraction_error = (
|
||||
f"Output extraction failed: {e}. LLM returned non-JSON response. "
|
||||
f"Expected keys: {ctx.node_spec.output_keys}"
|
||||
)
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=ctx.node_id,
|
||||
node_type=ctx.node_spec.node_type,
|
||||
step_index=_step_index,
|
||||
llm_text=response.content,
|
||||
tool_calls=_captured_tool_calls,
|
||||
input_tokens=response.input_tokens,
|
||||
output_tokens=response.output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=ctx.node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type=ctx.node_spec.node_type,
|
||||
success=False,
|
||||
error=_extraction_error,
|
||||
total_steps=_step_index + 1,
|
||||
tokens_used=response.input_tokens + response.output_tokens,
|
||||
input_tokens=response.input_tokens,
|
||||
output_tokens=response.output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
return NodeResult(
|
||||
success=False,
|
||||
error=(
|
||||
f"Output extraction failed: {e}. LLM returned non-JSON response. "
|
||||
f"Expected keys: {ctx.node_spec.output_keys}"
|
||||
),
|
||||
error=_extraction_error,
|
||||
output={},
|
||||
tokens_used=response.input_tokens + response.output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
@@ -1184,6 +1246,29 @@ Keep the same JSON structure but with shorter content values.
|
||||
ctx.memory.write(key, stripped_content, validate=False)
|
||||
output[key] = stripped_content
|
||||
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=ctx.node_id,
|
||||
node_type=ctx.node_spec.node_type,
|
||||
step_index=_step_index,
|
||||
llm_text=response.content,
|
||||
tool_calls=_captured_tool_calls,
|
||||
input_tokens=response.input_tokens,
|
||||
output_tokens=response.output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=ctx.node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type=ctx.node_spec.node_type,
|
||||
success=True,
|
||||
total_steps=_step_index + 1,
|
||||
tokens_used=response.input_tokens + response.output_tokens,
|
||||
input_tokens=response.input_tokens,
|
||||
output_tokens=response.output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
return NodeResult(
|
||||
success=True,
|
||||
output=output,
|
||||
@@ -1199,6 +1284,15 @@ Keep the same JSON structure but with shorter content values.
|
||||
error=str(e),
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=ctx.node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type=ctx.node_spec.node_type,
|
||||
success=False,
|
||||
error=str(e),
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
return NodeResult(success=False, error=str(e), latency_ms=latency_ms)
|
||||
|
||||
def _parse_output(self, content: str, node_spec: NodeSpec) -> dict[str, Any]:
|
||||
@@ -1591,6 +1685,9 @@ class RouterNode(NodeProtocol):
|
||||
|
||||
async def execute(self, ctx: NodeContext) -> NodeResult:
|
||||
"""Execute routing logic."""
|
||||
import time as _time
|
||||
|
||||
start = _time.time()
|
||||
ctx.runtime.set_node(ctx.node_id)
|
||||
|
||||
# Build options from routes
|
||||
@@ -1635,10 +1732,30 @@ class RouterNode(NodeProtocol):
|
||||
summary=f"Routing to {chosen_route[1]}",
|
||||
)
|
||||
|
||||
latency_ms = int((_time.time() - start) * 1000)
|
||||
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=ctx.node_id,
|
||||
node_type="router",
|
||||
step_index=0,
|
||||
llm_text=f"Route: {chosen_route[0]} -> {chosen_route[1]}",
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=ctx.node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="router",
|
||||
success=True,
|
||||
total_steps=1,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
return NodeResult(
|
||||
success=True,
|
||||
next_node=chosen_route[1],
|
||||
route_reason=f"Chose route: {chosen_route[0]}",
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
async def _llm_route(
|
||||
@@ -1800,6 +1917,22 @@ class FunctionNode(NodeProtocol):
|
||||
else:
|
||||
output = {"result": result}
|
||||
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=ctx.node_id,
|
||||
node_type="function",
|
||||
step_index=0,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=ctx.node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="function",
|
||||
success=True,
|
||||
total_steps=1,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
return NodeResult(success=True, output=output, latency_ms=latency_ms)
|
||||
|
||||
except Exception as e:
|
||||
@@ -1810,4 +1943,22 @@ class FunctionNode(NodeProtocol):
|
||||
error=str(e),
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
if ctx.runtime_logger:
|
||||
ctx.runtime_logger.log_step(
|
||||
node_id=ctx.node_id,
|
||||
node_type="function",
|
||||
step_index=0,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
ctx.runtime_logger.log_node_complete(
|
||||
node_id=ctx.node_id,
|
||||
node_name=ctx.node_spec.name,
|
||||
node_type="function",
|
||||
success=False,
|
||||
error=str(e),
|
||||
total_steps=1,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
return NodeResult(success=False, error=str(e), latency_ms=latency_ms)
|
||||
|
||||
@@ -144,8 +144,11 @@ class OutputCleaner:
|
||||
errors = []
|
||||
warnings = []
|
||||
|
||||
# Check 1: Required input keys present
|
||||
# Check 1: Required input keys present (skip nullable keys)
|
||||
nullable = set(getattr(target_node_spec, "nullable_output_keys", None) or [])
|
||||
for key in target_node_spec.input_keys:
|
||||
if key in nullable:
|
||||
continue
|
||||
if key not in output:
|
||||
errors.append(f"Missing required key: '{key}'")
|
||||
continue
|
||||
|
||||
@@ -207,7 +207,7 @@ class OutputValidator:
|
||||
def validate_no_hallucination(
|
||||
self,
|
||||
output: dict[str, Any],
|
||||
max_length: int = 10000,
|
||||
max_length: int = 50000,
|
||||
) -> ValidationResult:
|
||||
"""
|
||||
Check for signs of LLM hallucination in output values.
|
||||
|
||||
@@ -147,7 +147,7 @@ class LiteLLMProvider(LLMProvider):
|
||||
|
||||
if litellm is None:
|
||||
raise ImportError(
|
||||
"LiteLLM is not installed. Please install it with: pip install litellm"
|
||||
"LiteLLM is not installed. Please install it with: uv pip install litellm"
|
||||
)
|
||||
|
||||
def _completion_with_rate_limit_retry(self, **kwargs: Any) -> Any:
|
||||
@@ -572,17 +572,21 @@ class LiteLLMProvider(LLMProvider):
|
||||
# and we skip the retry path — nothing was yielded in vain.)
|
||||
has_content = accumulated_text or tool_calls_acc
|
||||
if not has_content and attempt < RATE_LIMIT_MAX_RETRIES:
|
||||
# If the conversation ends with an assistant message,
|
||||
# an empty stream is expected (nothing new to say).
|
||||
# Don't retry — just flush whatever we have.
|
||||
# If the conversation ends with an assistant or tool
|
||||
# message, an empty stream is expected — the LLM has
|
||||
# nothing new to say. Don't burn retries on this;
|
||||
# let the caller (EventLoopNode) decide what to do.
|
||||
# Typical case: client_facing node where the LLM set
|
||||
# all outputs via set_output tool calls, and the tool
|
||||
# results are the last messages.
|
||||
last_role = next(
|
||||
(m["role"] for m in reversed(full_messages) if m.get("role") != "system"),
|
||||
None,
|
||||
)
|
||||
if last_role == "assistant":
|
||||
if last_role in ("assistant", "tool"):
|
||||
logger.debug(
|
||||
"[stream] Empty response after assistant message — "
|
||||
"expected, not retrying."
|
||||
"[stream] Empty response after %s message — expected, not retrying.",
|
||||
last_role,
|
||||
)
|
||||
for event in tail_events:
|
||||
yield event
|
||||
|
||||
@@ -4,25 +4,41 @@ MCP Server for Agent Building Tools
|
||||
Exposes tools for building goal-driven agents via the Model Context Protocol.
|
||||
|
||||
Usage:
|
||||
python -m framework.mcp.agent_builder_server
|
||||
uv run python -m framework.mcp.agent_builder_server
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Annotated
|
||||
|
||||
from mcp.server import FastMCP
|
||||
# Ensure exports/ is on sys.path so AgentRunner can import agent modules.
|
||||
_framework_dir = Path(__file__).resolve().parent.parent # core/framework/ -> core/
|
||||
_project_root = _framework_dir.parent # core/ -> project root
|
||||
_exports_dir = _project_root / "exports"
|
||||
if _exports_dir.is_dir() and str(_exports_dir) not in sys.path:
|
||||
sys.path.insert(0, str(_exports_dir))
|
||||
del _framework_dir, _project_root, _exports_dir
|
||||
|
||||
from framework.graph import Constraint, EdgeCondition, EdgeSpec, Goal, NodeSpec, SuccessCriterion
|
||||
from framework.graph.plan import Plan
|
||||
from mcp.server import FastMCP # noqa: E402
|
||||
|
||||
from framework.graph import ( # noqa: E402
|
||||
Constraint,
|
||||
EdgeCondition,
|
||||
EdgeSpec,
|
||||
Goal,
|
||||
NodeSpec,
|
||||
SuccessCriterion,
|
||||
)
|
||||
from framework.graph.plan import Plan # noqa: E402
|
||||
|
||||
# Testing framework imports
|
||||
from framework.testing.prompts import (
|
||||
from framework.testing.prompts import ( # noqa: E402
|
||||
PYTEST_TEST_FILE_HEADER,
|
||||
)
|
||||
from framework.utils.io import atomic_write
|
||||
from framework.utils.io import atomic_write # noqa: E402
|
||||
|
||||
# Initialize MCP server
|
||||
mcp = FastMCP("agent-builder")
|
||||
@@ -44,6 +60,7 @@ class BuildSession:
|
||||
self.nodes: list[NodeSpec] = []
|
||||
self.edges: list[EdgeSpec] = []
|
||||
self.mcp_servers: list[dict] = [] # MCP server configurations
|
||||
self.loop_config: dict = {} # LoopConfig parameters for EventLoopNodes
|
||||
self.created_at = datetime.now().isoformat()
|
||||
self.last_modified = datetime.now().isoformat()
|
||||
|
||||
@@ -56,6 +73,7 @@ class BuildSession:
|
||||
"nodes": [n.model_dump() for n in self.nodes],
|
||||
"edges": [e.model_dump() for e in self.edges],
|
||||
"mcp_servers": self.mcp_servers,
|
||||
"loop_config": self.loop_config,
|
||||
"created_at": self.created_at,
|
||||
"last_modified": self.last_modified,
|
||||
}
|
||||
@@ -102,6 +120,9 @@ class BuildSession:
|
||||
# Restore MCP servers
|
||||
session.mcp_servers = data.get("mcp_servers", [])
|
||||
|
||||
# Restore loop config
|
||||
session.loop_config = data.get("loop_config", {})
|
||||
|
||||
return session
|
||||
|
||||
|
||||
@@ -551,14 +572,32 @@ def add_node(
|
||||
node_id: Annotated[str, "Unique identifier for the node"],
|
||||
name: Annotated[str, "Human-readable name"],
|
||||
description: Annotated[str, "What this node does"],
|
||||
node_type: Annotated[str, "Type: llm_generate, llm_tool_use, router, or function"],
|
||||
node_type: Annotated[
|
||||
str,
|
||||
"Type: event_loop (recommended), function, router. "
|
||||
"Deprecated: llm_generate, llm_tool_use (use event_loop instead)",
|
||||
],
|
||||
input_keys: Annotated[str, "JSON array of keys this node reads from shared memory"],
|
||||
output_keys: Annotated[str, "JSON array of keys this node writes to shared memory"],
|
||||
system_prompt: Annotated[str, "Instructions for LLM nodes"] = "",
|
||||
tools: Annotated[str, "JSON array of tool names for llm_tool_use nodes"] = "[]",
|
||||
tools: Annotated[str, "JSON array of tool names for event_loop or llm_tool_use nodes"] = "[]",
|
||||
routes: Annotated[
|
||||
str, "JSON object mapping conditions to target node IDs for router nodes"
|
||||
] = "{}",
|
||||
client_facing: Annotated[
|
||||
bool,
|
||||
"If True, an ask_user() tool is injected so the LLM can explicitly request user input. "
|
||||
"The node blocks ONLY when ask_user() is called — text-only turns stream freely. "
|
||||
"Set True for nodes that interact with users (intake, review, approval). "
|
||||
"Nodes that do autonomous work (research, data processing, API calls) MUST be False.",
|
||||
] = False,
|
||||
nullable_output_keys: Annotated[
|
||||
str, "JSON array of output keys that may remain unset (for mutually exclusive outputs)"
|
||||
] = "[]",
|
||||
max_node_visits: Annotated[
|
||||
int,
|
||||
"Max times this node executes per graph run. Set >1 for feedback loop targets. 0=unlimited",
|
||||
] = 1,
|
||||
) -> str:
|
||||
"""Add a node to the agent graph. Nodes process inputs and produce outputs."""
|
||||
session = get_session()
|
||||
@@ -569,6 +608,7 @@ def add_node(
|
||||
output_keys_list = json.loads(output_keys)
|
||||
tools_list = json.loads(tools)
|
||||
routes_dict = json.loads(routes)
|
||||
nullable_output_keys_list = json.loads(nullable_output_keys)
|
||||
except json.JSONDecodeError as e:
|
||||
return json.dumps(
|
||||
{
|
||||
@@ -597,6 +637,9 @@ def add_node(
|
||||
system_prompt=system_prompt or None,
|
||||
tools=tools_list,
|
||||
routes=routes_dict,
|
||||
client_facing=client_facing,
|
||||
nullable_output_keys=nullable_output_keys_list,
|
||||
max_node_visits=max_node_visits,
|
||||
)
|
||||
|
||||
session.nodes.append(node)
|
||||
@@ -616,6 +659,34 @@ def add_node(
|
||||
if node_type in ("llm_generate", "llm_tool_use") and not system_prompt:
|
||||
warnings.append(f"LLM node '{node_id}' should have a system_prompt")
|
||||
|
||||
# EventLoopNode validation
|
||||
if node_type == "event_loop" and not system_prompt:
|
||||
warnings.append(f"Event loop node '{node_id}' should have a system_prompt")
|
||||
|
||||
# Deprecated type warnings
|
||||
if node_type in ("llm_generate", "llm_tool_use"):
|
||||
warnings.append(
|
||||
f"Node type '{node_type}' is deprecated. Use 'event_loop' instead. "
|
||||
"EventLoopNode supports tool use, streaming, and judge-based evaluation."
|
||||
)
|
||||
|
||||
# Warn about client_facing on nodes with tools (likely autonomous work)
|
||||
if node_type == "event_loop" and client_facing and tools_list:
|
||||
warnings.append(
|
||||
f"Node '{node_id}' is client_facing=True but has tools {tools_list}. "
|
||||
"Nodes with tools typically do autonomous work and should be "
|
||||
"client_facing=False. Only set True if this node needs user approval."
|
||||
)
|
||||
|
||||
# nullable_output_keys must be a subset of output_keys
|
||||
if nullable_output_keys_list:
|
||||
invalid_nullable = [k for k in nullable_output_keys_list if k not in output_keys_list]
|
||||
if invalid_nullable:
|
||||
errors.append(
|
||||
f"nullable_output_keys {invalid_nullable} must be a subset of "
|
||||
f"output_keys {output_keys_list}"
|
||||
)
|
||||
|
||||
_save_session(session) # Auto-save
|
||||
|
||||
return json.dumps(
|
||||
@@ -692,6 +763,7 @@ def add_edge(
|
||||
|
||||
# Validate
|
||||
errors = []
|
||||
warnings = []
|
||||
|
||||
if not any(n.id == source for n in session.nodes):
|
||||
errors.append(f"Source node '{source}' not found")
|
||||
@@ -700,12 +772,24 @@ def add_edge(
|
||||
if edge_condition == EdgeCondition.CONDITIONAL and not condition_expr:
|
||||
errors.append(f"Conditional edge '{edge_id}' needs condition_expr")
|
||||
|
||||
# Feedback edge validation
|
||||
if priority < 0:
|
||||
target_node = next((n for n in session.nodes if n.id == target), None)
|
||||
if target_node and target_node.max_node_visits <= 1:
|
||||
warnings.append(
|
||||
f"Edge '{edge_id}' has negative priority (feedback edge) "
|
||||
f"targeting '{target}', but node '{target}' has "
|
||||
f"max_node_visits={target_node.max_node_visits}. "
|
||||
"Consider increasing max_node_visits on the target node."
|
||||
)
|
||||
|
||||
_save_session(session) # Auto-save
|
||||
|
||||
return json.dumps(
|
||||
{
|
||||
"valid": len(errors) == 0,
|
||||
"errors": errors,
|
||||
"warnings": warnings,
|
||||
"edge": edge.model_dump(),
|
||||
"total_edges": len(session.edges),
|
||||
"approval_required": True,
|
||||
@@ -739,12 +823,23 @@ def update_node(
|
||||
node_id: Annotated[str, "ID of the node to update"],
|
||||
name: Annotated[str, "Updated human-readable name"] = "",
|
||||
description: Annotated[str, "Updated description"] = "",
|
||||
node_type: Annotated[str, "Updated type: llm_generate, llm_tool_use, router, or function"] = "",
|
||||
node_type: Annotated[
|
||||
str,
|
||||
"Updated type: event_loop (recommended), function, router. "
|
||||
"Deprecated: llm_generate, llm_tool_use",
|
||||
] = "",
|
||||
input_keys: Annotated[str, "Updated JSON array of input keys"] = "",
|
||||
output_keys: Annotated[str, "Updated JSON array of output keys"] = "",
|
||||
system_prompt: Annotated[str, "Updated instructions for LLM nodes"] = "",
|
||||
tools: Annotated[str, "Updated JSON array of tool names"] = "",
|
||||
routes: Annotated[str, "Updated JSON object mapping conditions to target node IDs"] = "",
|
||||
client_facing: Annotated[
|
||||
str, "Updated client-facing flag ('true'/'false', empty=no change)"
|
||||
] = "",
|
||||
nullable_output_keys: Annotated[
|
||||
str, "Updated JSON array of nullable output keys (empty=no change)"
|
||||
] = "",
|
||||
max_node_visits: Annotated[int, "Updated max node visits per graph run. 0=no change"] = 0,
|
||||
) -> str:
|
||||
"""Update an existing node in the agent graph. Only provided fields will be updated."""
|
||||
session = get_session()
|
||||
@@ -765,6 +860,9 @@ def update_node(
|
||||
output_keys_list = json.loads(output_keys) if output_keys else None
|
||||
tools_list = json.loads(tools) if tools else None
|
||||
routes_dict = json.loads(routes) if routes else None
|
||||
nullable_output_keys_list = (
|
||||
json.loads(nullable_output_keys) if nullable_output_keys else None
|
||||
)
|
||||
except json.JSONDecodeError as e:
|
||||
return json.dumps(
|
||||
{
|
||||
@@ -797,6 +895,12 @@ def update_node(
|
||||
node.tools = tools_list
|
||||
if routes_dict is not None:
|
||||
node.routes = routes_dict
|
||||
if client_facing:
|
||||
node.client_facing = client_facing.lower() == "true"
|
||||
if nullable_output_keys_list is not None:
|
||||
node.nullable_output_keys = nullable_output_keys_list
|
||||
if max_node_visits > 0:
|
||||
node.max_node_visits = max_node_visits
|
||||
|
||||
# Validate
|
||||
errors = []
|
||||
@@ -809,6 +913,26 @@ def update_node(
|
||||
if node.node_type in ("llm_generate", "llm_tool_use") and not node.system_prompt:
|
||||
warnings.append(f"LLM node '{node_id}' should have a system_prompt")
|
||||
|
||||
# EventLoopNode validation
|
||||
if node.node_type == "event_loop" and not node.system_prompt:
|
||||
warnings.append(f"Event loop node '{node_id}' should have a system_prompt")
|
||||
|
||||
# Deprecated type warnings
|
||||
if node.node_type in ("llm_generate", "llm_tool_use"):
|
||||
warnings.append(
|
||||
f"Node type '{node.node_type}' is deprecated. Use 'event_loop' instead. "
|
||||
"EventLoopNode supports tool use, streaming, and judge-based evaluation."
|
||||
)
|
||||
|
||||
# nullable_output_keys must be a subset of output_keys
|
||||
if node.nullable_output_keys:
|
||||
invalid_nullable = [k for k in node.nullable_output_keys if k not in node.output_keys]
|
||||
if invalid_nullable:
|
||||
errors.append(
|
||||
f"nullable_output_keys {invalid_nullable} must be a subset of "
|
||||
f"output_keys {node.output_keys}"
|
||||
)
|
||||
|
||||
_save_session(session) # Auto-save
|
||||
|
||||
return json.dumps(
|
||||
@@ -1009,17 +1133,30 @@ def validate_graph() -> str:
|
||||
errors.append(f"Unreachable nodes: {unreachable}")
|
||||
|
||||
# === CONTEXT FLOW VALIDATION ===
|
||||
# Build dependency map (node_id -> list of nodes it depends on)
|
||||
# Build dependency maps — separate forward edges from feedback edges.
|
||||
# Feedback edges (priority < 0) create cycles; they must not block the
|
||||
# topological sort. Context they carry arrives on *revisits*, not on
|
||||
# the first execution of a node.
|
||||
feedback_edge_ids = {e.id for e in session.edges if e.priority < 0}
|
||||
forward_dependencies: dict[str, list[str]] = {node.id: [] for node in session.nodes}
|
||||
feedback_sources: dict[str, list[str]] = {node.id: [] for node in session.nodes}
|
||||
# Combined map kept for error-message generation (all deps)
|
||||
dependencies: dict[str, list[str]] = {node.id: [] for node in session.nodes}
|
||||
|
||||
for edge in session.edges:
|
||||
if edge.target in dependencies:
|
||||
dependencies[edge.target].append(edge.source)
|
||||
if edge.target not in forward_dependencies:
|
||||
continue
|
||||
dependencies[edge.target].append(edge.source)
|
||||
if edge.id in feedback_edge_ids:
|
||||
feedback_sources[edge.target].append(edge.source)
|
||||
else:
|
||||
forward_dependencies[edge.target].append(edge.source)
|
||||
|
||||
# Build output map (node_id -> keys it produces)
|
||||
node_outputs: dict[str, set[str]] = {node.id: set(node.output_keys) for node in session.nodes}
|
||||
|
||||
# Compute available context for each node (what keys it can read)
|
||||
# Using topological order
|
||||
# Using topological order on the forward-edge DAG
|
||||
available_context: dict[str, set[str]] = {}
|
||||
computed = set()
|
||||
nodes_by_id = {n.id: n for n in session.nodes}
|
||||
@@ -1029,7 +1166,8 @@ def validate_graph() -> str:
|
||||
# Entry nodes can only read from initial context
|
||||
initial_context_keys: set[str] = set()
|
||||
|
||||
# Compute in topological order
|
||||
# Compute in topological order (forward edges only — feedback edges
|
||||
# don't block, since their context arrives on revisits)
|
||||
remaining = {n.id for n in session.nodes}
|
||||
max_iterations = len(session.nodes) * 2
|
||||
|
||||
@@ -1038,18 +1176,23 @@ def validate_graph() -> str:
|
||||
break
|
||||
|
||||
for node_id in list(remaining):
|
||||
deps = dependencies.get(node_id, [])
|
||||
fwd_deps = forward_dependencies.get(node_id, [])
|
||||
|
||||
# Can compute if all dependencies are computed (or no dependencies)
|
||||
if all(d in computed for d in deps):
|
||||
# Collect outputs from all dependencies
|
||||
# Can compute if all FORWARD dependencies are computed
|
||||
if all(d in computed for d in fwd_deps):
|
||||
# Collect outputs from all forward dependencies
|
||||
available = set(initial_context_keys)
|
||||
for dep_id in deps:
|
||||
# Add outputs from dependency
|
||||
for dep_id in fwd_deps:
|
||||
available.update(node_outputs.get(dep_id, set()))
|
||||
# Also add what was available to the dependency (transitive)
|
||||
available.update(available_context.get(dep_id, set()))
|
||||
|
||||
# Also include context from already-computed feedback
|
||||
# sources (bonus, not blocking)
|
||||
for fb_src in feedback_sources.get(node_id, []):
|
||||
if fb_src in computed:
|
||||
available.update(node_outputs.get(fb_src, set()))
|
||||
available.update(available_context.get(fb_src, set()))
|
||||
|
||||
available_context[node_id] = available
|
||||
computed.add(node_id)
|
||||
remaining.remove(node_id)
|
||||
@@ -1059,15 +1202,37 @@ def validate_graph() -> str:
|
||||
context_errors = []
|
||||
context_warnings = []
|
||||
missing_inputs: dict[str, list[str]] = {}
|
||||
feedback_only_inputs: dict[str, list[str]] = {}
|
||||
|
||||
for node in session.nodes:
|
||||
available = available_context.get(node.id, set())
|
||||
|
||||
for input_key in node.input_keys:
|
||||
if input_key not in available:
|
||||
if node.id not in missing_inputs:
|
||||
missing_inputs[node.id] = []
|
||||
missing_inputs[node.id].append(input_key)
|
||||
# Check if this input is provided by a feedback source
|
||||
fb_provides = set()
|
||||
for fb_src in feedback_sources.get(node.id, []):
|
||||
fb_provides.update(node_outputs.get(fb_src, set()))
|
||||
fb_provides.update(available_context.get(fb_src, set()))
|
||||
|
||||
if input_key in fb_provides:
|
||||
# Input arrives via feedback edge — warn, don't error
|
||||
if node.id not in feedback_only_inputs:
|
||||
feedback_only_inputs[node.id] = []
|
||||
feedback_only_inputs[node.id].append(input_key)
|
||||
else:
|
||||
if node.id not in missing_inputs:
|
||||
missing_inputs[node.id] = []
|
||||
missing_inputs[node.id].append(input_key)
|
||||
|
||||
# Warn about feedback-only inputs (available on revisits, not first run)
|
||||
for node_id, fb_keys in feedback_only_inputs.items():
|
||||
fb_srcs = feedback_sources.get(node_id, [])
|
||||
context_warnings.append(
|
||||
f"Node '{node_id}' input(s) {fb_keys} are only provided via "
|
||||
f"feedback edge(s) from {fb_srcs}. These will be available on "
|
||||
f"revisits but not on the first execution."
|
||||
)
|
||||
|
||||
# Generate helpful error messages
|
||||
for node_id, missing in missing_inputs.items():
|
||||
@@ -1147,6 +1312,98 @@ def validate_graph() -> str:
|
||||
errors.extend(context_errors)
|
||||
warnings.extend(context_warnings)
|
||||
|
||||
# === EventLoopNode-specific validation ===
|
||||
from collections import defaultdict
|
||||
|
||||
# Detect fan-out: multiple ON_SUCCESS edges from same source
|
||||
outgoing_success: dict[str, list[str]] = defaultdict(list)
|
||||
for edge in session.edges:
|
||||
cond = edge.condition.value if hasattr(edge.condition, "value") else edge.condition
|
||||
if cond == "on_success":
|
||||
outgoing_success[edge.source].append(edge.target)
|
||||
|
||||
for source_id, targets in outgoing_success.items():
|
||||
if len(targets) > 1:
|
||||
# Client-facing fan-out: cannot target multiple client_facing nodes
|
||||
cf_targets = [
|
||||
t for t in targets if any(n.id == t and n.client_facing for n in session.nodes)
|
||||
]
|
||||
if len(cf_targets) > 1:
|
||||
errors.append(
|
||||
f"Fan-out from '{source_id}' targets multiple client_facing "
|
||||
f"nodes: {cf_targets}. Only one branch may be client-facing."
|
||||
)
|
||||
|
||||
# Output key overlap on parallel event_loop nodes
|
||||
el_targets = [
|
||||
t
|
||||
for t in targets
|
||||
if any(n.id == t and n.node_type == "event_loop" for n in session.nodes)
|
||||
]
|
||||
if len(el_targets) > 1:
|
||||
seen_keys: dict[str, str] = {}
|
||||
for nid in el_targets:
|
||||
node_obj = next((n for n in session.nodes if n.id == nid), None)
|
||||
if node_obj:
|
||||
for key in node_obj.output_keys:
|
||||
if key in seen_keys:
|
||||
errors.append(
|
||||
f"Fan-out from '{source_id}': event_loop "
|
||||
f"nodes '{seen_keys[key]}' and '{nid}' both "
|
||||
f"write to output_key '{key}'. Parallel "
|
||||
"nodes must have disjoint output_keys."
|
||||
)
|
||||
else:
|
||||
seen_keys[key] = nid
|
||||
|
||||
# Feedback loop validation: targets should allow re-visits
|
||||
for edge in session.edges:
|
||||
if edge.priority < 0:
|
||||
target_node = next((n for n in session.nodes if n.id == edge.target), None)
|
||||
if target_node and target_node.max_node_visits <= 1:
|
||||
warnings.append(
|
||||
f"Feedback edge '{edge.id}' targets '{edge.target}' "
|
||||
f"which has max_node_visits={target_node.max_node_visits}. "
|
||||
"Consider setting max_node_visits > 1."
|
||||
)
|
||||
|
||||
# nullable_output_keys must be subset of output_keys
|
||||
for node in session.nodes:
|
||||
if node.nullable_output_keys:
|
||||
invalid = [k for k in node.nullable_output_keys if k not in node.output_keys]
|
||||
if invalid:
|
||||
errors.append(
|
||||
f"Node '{node.id}': nullable_output_keys {invalid} "
|
||||
f"must be a subset of output_keys {node.output_keys}"
|
||||
)
|
||||
|
||||
# Deprecated node type warnings
|
||||
deprecated_nodes = [
|
||||
{"node_id": n.id, "type": n.node_type, "replacement": "event_loop"}
|
||||
for n in session.nodes
|
||||
if n.node_type in ("llm_generate", "llm_tool_use")
|
||||
]
|
||||
for dn in deprecated_nodes:
|
||||
warnings.append(
|
||||
f"Node '{dn['node_id']}' uses deprecated type '{dn['type']}'. Use 'event_loop' instead."
|
||||
)
|
||||
|
||||
# Warn if all event_loop nodes are client_facing (common misconfiguration)
|
||||
el_nodes = [n for n in session.nodes if n.node_type == "event_loop"]
|
||||
cf_el_nodes = [n for n in el_nodes if n.client_facing]
|
||||
if len(el_nodes) > 1 and len(cf_el_nodes) == len(el_nodes):
|
||||
warnings.append(
|
||||
f"ALL {len(el_nodes)} event_loop nodes are client_facing=True. "
|
||||
"This injects ask_user() on every node. Only nodes that need user "
|
||||
"interaction (intake, review, approval) should be client_facing. Set "
|
||||
"client_facing=False on autonomous processing nodes."
|
||||
)
|
||||
|
||||
# Collect summary info
|
||||
event_loop_nodes = [n.id for n in session.nodes if n.node_type == "event_loop"]
|
||||
client_facing_nodes = [n.id for n in session.nodes if n.client_facing]
|
||||
feedback_edges = [e.id for e in session.edges if e.priority < 0]
|
||||
|
||||
return json.dumps(
|
||||
{
|
||||
"valid": len(errors) == 0,
|
||||
@@ -1163,6 +1420,10 @@ def validate_graph() -> str:
|
||||
"context_flow": {node_id: list(keys) for node_id, keys in available_context.items()}
|
||||
if available_context
|
||||
else None,
|
||||
"event_loop_nodes": event_loop_nodes,
|
||||
"client_facing_nodes": client_facing_nodes,
|
||||
"feedback_edges": feedback_edges,
|
||||
"deprecated_node_types": deprecated_nodes,
|
||||
}
|
||||
)
|
||||
|
||||
@@ -1213,6 +1474,12 @@ def _generate_readme(session: BuildSession, export_data: dict, all_tools: set) -
|
||||
if node.routes:
|
||||
routes_str = ", ".join([f"{k}→{v}" for k, v in node.routes.items()])
|
||||
node_info.append(f" - Routes: {routes_str}")
|
||||
if node.client_facing:
|
||||
node_info.append(" - Client-facing: Yes (blocks for user input)")
|
||||
if node.nullable_output_keys:
|
||||
node_info.append(f" - Nullable outputs: `{', '.join(node.nullable_output_keys)}`")
|
||||
if node.max_node_visits > 1:
|
||||
node_info.append(f" - Max visits: {node.max_node_visits}")
|
||||
nodes_section.append("\n".join(node_info))
|
||||
|
||||
# Build success criteria section
|
||||
@@ -1266,7 +1533,12 @@ def _generate_readme(session: BuildSession, export_data: dict, all_tools: set) -
|
||||
|
||||
for edge in edges:
|
||||
cond = edge.condition.value if hasattr(edge.condition, "value") else edge.condition
|
||||
readme += f"- `{edge.source}` → `{edge.target}` (condition: {cond})\n"
|
||||
priority_note = f", priority={edge.priority}" if edge.priority != 0 else ""
|
||||
feedback_note = " **[FEEDBACK]**" if edge.priority < 0 else ""
|
||||
readme += (
|
||||
f"- `{edge.source}` → `{edge.target}` "
|
||||
f"(condition: {cond}{priority_note}){feedback_note}\n"
|
||||
)
|
||||
|
||||
readme += f"""
|
||||
|
||||
@@ -1481,6 +1753,10 @@ def export_graph() -> str:
|
||||
"created_at": datetime.now().isoformat(),
|
||||
}
|
||||
|
||||
# Include loop config if configured
|
||||
if session.loop_config:
|
||||
graph_spec["loop_config"] = session.loop_config
|
||||
|
||||
# Collect all tools referenced by nodes
|
||||
all_tools = set()
|
||||
for node in session.nodes:
|
||||
@@ -1596,6 +1872,58 @@ def get_session_status() -> str:
|
||||
"nodes": [n.id for n in session.nodes],
|
||||
"edges": [(e.source, e.target) for e in session.edges],
|
||||
"mcp_servers": [s["name"] for s in session.mcp_servers],
|
||||
"event_loop_nodes": [n.id for n in session.nodes if n.node_type == "event_loop"],
|
||||
"client_facing_nodes": [n.id for n in session.nodes if n.client_facing],
|
||||
"deprecated_nodes": [
|
||||
n.id for n in session.nodes if n.node_type in ("llm_generate", "llm_tool_use")
|
||||
],
|
||||
"feedback_edges": [e.id for e in session.edges if e.priority < 0],
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def configure_loop(
|
||||
max_iterations: Annotated[int, "Maximum loop iterations per node execution (default 50)"] = 50,
|
||||
max_tool_calls_per_turn: Annotated[int, "Maximum tool calls per LLM turn (default 10)"] = 10,
|
||||
stall_detection_threshold: Annotated[
|
||||
int, "Consecutive identical responses before stall detection triggers (default 3)"
|
||||
] = 3,
|
||||
max_history_tokens: Annotated[
|
||||
int, "Maximum conversation history tokens before compaction (default 32000)"
|
||||
] = 32000,
|
||||
tool_call_overflow_margin: Annotated[
|
||||
float,
|
||||
"Overflow margin for max_tool_calls_per_turn. "
|
||||
"Tool calls are only discarded when count exceeds "
|
||||
"max_tool_calls_per_turn * (1 + margin). Default 0.5 (50% wiggle room)",
|
||||
] = 0.5,
|
||||
) -> str:
|
||||
"""Configure event loop parameters for EventLoopNode execution.
|
||||
|
||||
These settings control how EventLoopNodes behave at runtime:
|
||||
- max_iterations: prevents infinite loops
|
||||
- max_tool_calls_per_turn: limits tool calls per LLM response
|
||||
- tool_call_overflow_margin: wiggle room before tool calls are discarded (default 50%)
|
||||
- stall_detection_threshold: detects when LLM repeats itself
|
||||
- max_history_tokens: triggers conversation compaction
|
||||
"""
|
||||
session = get_session()
|
||||
|
||||
session.loop_config = {
|
||||
"max_iterations": max_iterations,
|
||||
"max_tool_calls_per_turn": max_tool_calls_per_turn,
|
||||
"tool_call_overflow_margin": tool_call_overflow_margin,
|
||||
"stall_detection_threshold": stall_detection_threshold,
|
||||
"max_history_tokens": max_history_tokens,
|
||||
}
|
||||
|
||||
_save_session(session)
|
||||
|
||||
return json.dumps(
|
||||
{
|
||||
"success": True,
|
||||
"loop_config": session.loop_config,
|
||||
}
|
||||
)
|
||||
|
||||
@@ -1891,10 +2219,41 @@ def test_node(
|
||||
result["routing_options"] = node_spec.routes
|
||||
result["simulation"] = "Router would evaluate routes based on input and select target node"
|
||||
|
||||
elif node_spec.node_type in ("llm_generate", "llm_tool_use"):
|
||||
# Show what prompt would be sent
|
||||
elif node_spec.node_type == "event_loop":
|
||||
# EventLoopNode simulation
|
||||
result["system_prompt"] = node_spec.system_prompt
|
||||
result["available_tools"] = node_spec.tools
|
||||
result["client_facing"] = node_spec.client_facing
|
||||
result["nullable_output_keys"] = node_spec.nullable_output_keys
|
||||
result["max_node_visits"] = node_spec.max_node_visits
|
||||
|
||||
if mock_llm_response:
|
||||
result["mock_response"] = mock_llm_response
|
||||
result["simulation"] = (
|
||||
"EventLoopNode would run a multi-turn streaming loop. "
|
||||
"Each iteration: LLM call -> tool execution -> judge evaluation. "
|
||||
"Loop continues until judge ACCEPTs or max_iterations reached."
|
||||
)
|
||||
else:
|
||||
cf_note = (
|
||||
"Node is client-facing: has ask_user() tool, blocks when LLM calls it. "
|
||||
if node_spec.client_facing
|
||||
else ""
|
||||
)
|
||||
result["simulation"] = (
|
||||
"EventLoopNode would stream LLM responses, execute tool calls, "
|
||||
"and use judge evaluation to decide when to stop. "
|
||||
+ cf_note
|
||||
+ f"Max visits per graph run: {node_spec.max_node_visits}."
|
||||
)
|
||||
|
||||
elif node_spec.node_type in ("llm_generate", "llm_tool_use"):
|
||||
# Legacy LLM node types
|
||||
result["system_prompt"] = node_spec.system_prompt
|
||||
result["available_tools"] = node_spec.tools
|
||||
result["deprecation_warning"] = (
|
||||
f"Node type '{node_spec.node_type}' is deprecated. Use 'event_loop' instead."
|
||||
)
|
||||
|
||||
if mock_llm_response:
|
||||
result["mock_response"] = mock_llm_response
|
||||
@@ -1909,6 +2268,7 @@ def test_node(
|
||||
result["expected_memory_state"] = {
|
||||
"inputs_available": {k: input_data.get(k, "<not provided>") for k in node_spec.input_keys},
|
||||
"outputs_to_write": node_spec.output_keys,
|
||||
"nullable_outputs": node_spec.nullable_output_keys or [],
|
||||
}
|
||||
|
||||
return json.dumps(
|
||||
@@ -1997,13 +2357,19 @@ def test_graph(
|
||||
"writes": current_node.output_keys,
|
||||
}
|
||||
|
||||
if current_node.node_type in ("llm_generate", "llm_tool_use"):
|
||||
if current_node.node_type in ("llm_generate", "llm_tool_use", "event_loop"):
|
||||
step_info["prompt_preview"] = (
|
||||
current_node.system_prompt[:200] + "..."
|
||||
if current_node.system_prompt and len(current_node.system_prompt) > 200
|
||||
else current_node.system_prompt
|
||||
)
|
||||
step_info["tools_available"] = current_node.tools
|
||||
if current_node.node_type == "event_loop":
|
||||
step_info["event_loop_config"] = {
|
||||
"client_facing": current_node.client_facing,
|
||||
"max_node_visits": current_node.max_node_visits,
|
||||
"nullable_output_keys": current_node.nullable_output_keys,
|
||||
}
|
||||
|
||||
execution_trace.append(step_info)
|
||||
|
||||
@@ -2012,16 +2378,32 @@ def test_graph(
|
||||
step_info["is_terminal"] = True
|
||||
break
|
||||
|
||||
# Find next node via edges
|
||||
# Find next node via edges (sorted by priority, highest first)
|
||||
outgoing = sorted(
|
||||
[e for e in session.edges if e.source == current_node_id],
|
||||
key=lambda e: -e.priority,
|
||||
)
|
||||
next_node = None
|
||||
for edge in session.edges:
|
||||
if edge.source == current_node_id:
|
||||
# In dry run, assume success path
|
||||
if edge.condition.value in ("always", "on_success"):
|
||||
next_node = edge.target
|
||||
step_info["next_node"] = next_node
|
||||
step_info["edge_condition"] = edge.condition.value
|
||||
break
|
||||
for edge in outgoing:
|
||||
# In dry run, follow success/always edges (highest priority first)
|
||||
if edge.condition.value in ("always", "on_success"):
|
||||
next_node = edge.target
|
||||
step_info["next_node"] = next_node
|
||||
step_info["edge_condition"] = edge.condition.value
|
||||
step_info["edge_priority"] = edge.priority
|
||||
break
|
||||
|
||||
# Note any feedback edges from this node
|
||||
feedback = [e for e in outgoing if e.priority < 0]
|
||||
if feedback:
|
||||
step_info["feedback_edges"] = [
|
||||
{
|
||||
"target": e.target,
|
||||
"condition_expr": e.condition_expr,
|
||||
"priority": e.priority,
|
||||
}
|
||||
for e in feedback
|
||||
]
|
||||
|
||||
if next_node is None:
|
||||
step_info["note"] = "No outgoing edge found (end of path)"
|
||||
|
||||
+418
-57
@@ -33,11 +33,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
|
||||
type=str,
|
||||
help="Input context from JSON file",
|
||||
)
|
||||
run_parser.add_argument(
|
||||
"--mock",
|
||||
action="store_true",
|
||||
help="Run in mock mode (no real LLM calls)",
|
||||
)
|
||||
run_parser.add_argument(
|
||||
"--output",
|
||||
"-o",
|
||||
@@ -56,6 +51,18 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
|
||||
action="store_true",
|
||||
help="Show detailed execution logs (steps, LLM calls, etc.)",
|
||||
)
|
||||
run_parser.add_argument(
|
||||
"--tui",
|
||||
action="store_true",
|
||||
help="Launch interactive terminal dashboard",
|
||||
)
|
||||
run_parser.add_argument(
|
||||
"--model",
|
||||
"-m",
|
||||
type=str,
|
||||
default=None,
|
||||
help="LLM model to use (any LiteLLM-compatible name)",
|
||||
)
|
||||
run_parser.set_defaults(func=cmd_run)
|
||||
|
||||
# info command
|
||||
@@ -174,6 +181,21 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
|
||||
)
|
||||
shell_parser.set_defaults(func=cmd_shell)
|
||||
|
||||
# tui command (interactive agent dashboard)
|
||||
tui_parser = subparsers.add_parser(
|
||||
"tui",
|
||||
help="Launch interactive TUI dashboard",
|
||||
description="Browse available agents and launch the terminal dashboard.",
|
||||
)
|
||||
tui_parser.add_argument(
|
||||
"--model",
|
||||
"-m",
|
||||
type=str,
|
||||
default=None,
|
||||
help="LLM model to use (any LiteLLM-compatible name)",
|
||||
)
|
||||
tui_parser.set_defaults(func=cmd_tui)
|
||||
|
||||
|
||||
def cmd_run(args: argparse.Namespace) -> int:
|
||||
"""Run an exported agent."""
|
||||
@@ -205,38 +227,81 @@ def cmd_run(args: argparse.Namespace) -> int:
|
||||
print(f"Error reading input file: {e}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
# Load and run agent
|
||||
try:
|
||||
runner = AgentRunner.load(
|
||||
args.agent_path,
|
||||
mock_mode=args.mock,
|
||||
model=getattr(args, "model", "claude-haiku-4-5-20251001"),
|
||||
)
|
||||
except FileNotFoundError as e:
|
||||
print(f"Error: {e}", file=sys.stderr)
|
||||
return 1
|
||||
# Run the agent (with TUI or standard)
|
||||
if getattr(args, "tui", False):
|
||||
from framework.tui.app import AdenTUI
|
||||
|
||||
# Auto-inject user_id if the agent expects it but it's not provided
|
||||
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
|
||||
if "user_id" in entry_input_keys and context.get("user_id") is None:
|
||||
import os
|
||||
async def run_with_tui():
|
||||
try:
|
||||
# Load runner inside the async loop to ensure strict loop affinity
|
||||
# (only one load — avoids spawning duplicate MCP subprocesses)
|
||||
try:
|
||||
runner = AgentRunner.load(
|
||||
args.agent_path,
|
||||
model=args.model,
|
||||
enable_tui=True,
|
||||
)
|
||||
except Exception as e:
|
||||
print(f"Error loading agent: {e}")
|
||||
return
|
||||
|
||||
context["user_id"] = os.environ.get("USER", "default_user")
|
||||
# Force setup inside the loop
|
||||
if runner._agent_runtime is None:
|
||||
runner._setup()
|
||||
|
||||
if not args.quiet:
|
||||
info = runner.info()
|
||||
print(f"Agent: {info.name}")
|
||||
print(f"Goal: {info.goal_name}")
|
||||
print(f"Steps: {info.node_count}")
|
||||
print(f"Input: {json.dumps(context)}")
|
||||
print()
|
||||
print("=" * 60)
|
||||
print("Executing agent...")
|
||||
print("=" * 60)
|
||||
print()
|
||||
# Start runtime before TUI so it's ready for user input
|
||||
if runner._agent_runtime and not runner._agent_runtime.is_running:
|
||||
await runner._agent_runtime.start()
|
||||
|
||||
# Run the agent
|
||||
result = asyncio.run(runner.run(context))
|
||||
app = AdenTUI(runner._agent_runtime)
|
||||
|
||||
# TUI handles execution via ChatRepl — user submits input,
|
||||
# ChatRepl calls runtime.trigger_and_wait(). No auto-launch.
|
||||
await app.run_async()
|
||||
except Exception as e:
|
||||
import traceback
|
||||
|
||||
traceback.print_exc()
|
||||
print(f"TUI error: {e}")
|
||||
|
||||
await runner.cleanup_async()
|
||||
return None
|
||||
|
||||
asyncio.run(run_with_tui())
|
||||
print("TUI session ended.")
|
||||
return 0
|
||||
else:
|
||||
# Standard execution — load runner here (not shared with TUI path)
|
||||
try:
|
||||
runner = AgentRunner.load(
|
||||
args.agent_path,
|
||||
model=args.model,
|
||||
enable_tui=False,
|
||||
)
|
||||
except FileNotFoundError as e:
|
||||
print(f"Error: {e}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
# Auto-inject user_id if the agent expects it but it's not provided
|
||||
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
|
||||
if "user_id" in entry_input_keys and context.get("user_id") is None:
|
||||
import os
|
||||
|
||||
context["user_id"] = os.environ.get("USER", "default_user")
|
||||
|
||||
if not args.quiet:
|
||||
info = runner.info()
|
||||
print(f"Agent: {info.name}")
|
||||
print(f"Goal: {info.goal_name}")
|
||||
print(f"Steps: {info.node_count}")
|
||||
print(f"Input: {json.dumps(context)}")
|
||||
print()
|
||||
print("=" * 60)
|
||||
print("Executing agent...")
|
||||
print("=" * 60)
|
||||
print()
|
||||
|
||||
result = asyncio.run(runner.run(context))
|
||||
|
||||
# Format output
|
||||
output = {
|
||||
@@ -928,8 +993,215 @@ def cmd_shell(args: argparse.Namespace) -> int:
|
||||
return 0
|
||||
|
||||
|
||||
def cmd_tui(args: argparse.Namespace) -> int:
|
||||
"""Browse agents and launch the interactive TUI dashboard."""
|
||||
import logging
|
||||
|
||||
from framework.runner import AgentRunner
|
||||
from framework.tui.app import AdenTUI
|
||||
|
||||
logging.basicConfig(level=logging.WARNING, format="%(message)s")
|
||||
|
||||
exports_dir = Path("exports")
|
||||
examples_dir = Path("examples/templates")
|
||||
|
||||
has_exports = _has_agents(exports_dir)
|
||||
has_examples = _has_agents(examples_dir)
|
||||
|
||||
if not has_exports and not has_examples:
|
||||
print("No agents found in exports/ or examples/templates/", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
# Determine which directory to browse
|
||||
if has_exports and has_examples:
|
||||
print("\nAgent sources:\n")
|
||||
print(" 1. Your Agents (exports/)")
|
||||
print(" 2. Sample Agents (examples/templates/)")
|
||||
print()
|
||||
try:
|
||||
choice = input("Select source (number): ").strip()
|
||||
if choice == "1":
|
||||
agents_dir = exports_dir
|
||||
elif choice == "2":
|
||||
agents_dir = examples_dir
|
||||
else:
|
||||
print("Invalid selection")
|
||||
return 1
|
||||
except (EOFError, KeyboardInterrupt):
|
||||
print()
|
||||
return 1
|
||||
elif has_exports:
|
||||
agents_dir = exports_dir
|
||||
else:
|
||||
agents_dir = examples_dir
|
||||
|
||||
# Let user pick an agent
|
||||
agent_path = _select_agent(agents_dir)
|
||||
if not agent_path:
|
||||
return 1
|
||||
|
||||
# Launch TUI (same pattern as cmd_run --tui)
|
||||
async def run_with_tui():
|
||||
try:
|
||||
runner = AgentRunner.load(
|
||||
agent_path,
|
||||
model=args.model,
|
||||
enable_tui=True,
|
||||
)
|
||||
except Exception as e:
|
||||
print(f"Error loading agent: {e}")
|
||||
return
|
||||
|
||||
if runner._agent_runtime is None:
|
||||
runner._setup()
|
||||
|
||||
if runner._agent_runtime and not runner._agent_runtime.is_running:
|
||||
await runner._agent_runtime.start()
|
||||
|
||||
app = AdenTUI(runner._agent_runtime)
|
||||
try:
|
||||
await app.run_async()
|
||||
except Exception as e:
|
||||
import traceback
|
||||
|
||||
traceback.print_exc()
|
||||
print(f"TUI error: {e}")
|
||||
|
||||
await runner.cleanup_async()
|
||||
|
||||
asyncio.run(run_with_tui())
|
||||
print("TUI session ended.")
|
||||
return 0
|
||||
|
||||
|
||||
def _extract_python_agent_metadata(agent_path: Path) -> tuple[str, str]:
|
||||
"""Extract name and description from a Python-based agent's config.py.
|
||||
|
||||
Uses AST parsing to safely extract values without executing code.
|
||||
Returns (name, description) tuple, with fallbacks if parsing fails.
|
||||
"""
|
||||
import ast
|
||||
|
||||
config_path = agent_path / "config.py"
|
||||
fallback_name = agent_path.name.replace("_", " ").title()
|
||||
fallback_desc = "(Python-based agent)"
|
||||
|
||||
if not config_path.exists():
|
||||
return fallback_name, fallback_desc
|
||||
|
||||
try:
|
||||
with open(config_path) as f:
|
||||
tree = ast.parse(f.read())
|
||||
|
||||
# Find AgentMetadata class definition
|
||||
for node in ast.walk(tree):
|
||||
if isinstance(node, ast.ClassDef) and node.name == "AgentMetadata":
|
||||
name = fallback_name
|
||||
desc = fallback_desc
|
||||
|
||||
# Extract default values from class body
|
||||
for item in node.body:
|
||||
if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name):
|
||||
field_name = item.target.id
|
||||
if item.value:
|
||||
# Handle simple string constants
|
||||
if isinstance(item.value, ast.Constant):
|
||||
if field_name == "name":
|
||||
name = item.value.value
|
||||
elif field_name == "description":
|
||||
desc = item.value.value
|
||||
# Handle parenthesized multi-line strings (concatenated)
|
||||
elif isinstance(item.value, ast.JoinedStr):
|
||||
# f-strings - skip, use fallback
|
||||
pass
|
||||
elif isinstance(item.value, ast.BinOp):
|
||||
# String concatenation with + - try to evaluate
|
||||
try:
|
||||
result = _eval_string_binop(item.value)
|
||||
if result and field_name == "name":
|
||||
name = result
|
||||
elif result and field_name == "description":
|
||||
desc = result
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return name, desc
|
||||
|
||||
return fallback_name, fallback_desc
|
||||
except Exception:
|
||||
return fallback_name, fallback_desc
|
||||
|
||||
|
||||
def _eval_string_binop(node) -> str | None:
|
||||
"""Recursively evaluate a BinOp of string constants."""
|
||||
import ast
|
||||
|
||||
if isinstance(node, ast.Constant) and isinstance(node.value, str):
|
||||
return node.value
|
||||
elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
|
||||
left = _eval_string_binop(node.left)
|
||||
right = _eval_string_binop(node.right)
|
||||
if left is not None and right is not None:
|
||||
return left + right
|
||||
return None
|
||||
|
||||
|
||||
def _is_valid_agent_dir(path: Path) -> bool:
|
||||
"""Check if a directory contains a valid agent (agent.json or agent.py)."""
|
||||
if not path.is_dir():
|
||||
return False
|
||||
return (path / "agent.json").exists() or (path / "agent.py").exists()
|
||||
|
||||
|
||||
def _has_agents(directory: Path) -> bool:
|
||||
"""Check if a directory contains any valid agents (folders with agent.json or agent.py)."""
|
||||
if not directory.exists():
|
||||
return False
|
||||
return any(_is_valid_agent_dir(p) for p in directory.iterdir())
|
||||
|
||||
|
||||
def _getch() -> str:
|
||||
"""Read a single character from stdin without waiting for Enter."""
|
||||
try:
|
||||
if sys.platform == "win32":
|
||||
import msvcrt
|
||||
|
||||
ch = msvcrt.getch()
|
||||
return ch.decode("utf-8", errors="ignore")
|
||||
else:
|
||||
import termios
|
||||
import tty
|
||||
|
||||
fd = sys.stdin.fileno()
|
||||
old_settings = termios.tcgetattr(fd)
|
||||
try:
|
||||
tty.setraw(fd)
|
||||
ch = sys.stdin.read(1)
|
||||
finally:
|
||||
termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
|
||||
return ch
|
||||
except Exception:
|
||||
return ""
|
||||
|
||||
|
||||
def _read_key() -> str:
|
||||
"""Read a key, handling arrow key escape sequences."""
|
||||
ch = _getch()
|
||||
if ch == "\x1b": # Escape sequence start
|
||||
ch2 = _getch()
|
||||
if ch2 == "[":
|
||||
ch3 = _getch()
|
||||
if ch3 == "C": # Right arrow
|
||||
return "RIGHT"
|
||||
elif ch3 == "D": # Left arrow
|
||||
return "LEFT"
|
||||
return ch
|
||||
|
||||
|
||||
def _select_agent(agents_dir: Path) -> str | None:
|
||||
"""Let user select an agent from available agents."""
|
||||
"""Let user select an agent from available agents with pagination."""
|
||||
AGENTS_PER_PAGE = 10
|
||||
|
||||
if not agents_dir.exists():
|
||||
print(f"Directory not found: {agents_dir}", file=sys.stderr)
|
||||
# fixes issue #696, creates an exports folder if it does not exist
|
||||
@@ -939,37 +1211,126 @@ def _select_agent(agents_dir: Path) -> str | None:
|
||||
|
||||
agents = []
|
||||
for path in agents_dir.iterdir():
|
||||
if path.is_dir() and (path / "agent.json").exists():
|
||||
if _is_valid_agent_dir(path):
|
||||
agents.append(path)
|
||||
|
||||
if not agents:
|
||||
print(f"No agents found in {agents_dir}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
print(f"\nAvailable agents in {agents_dir}:\n")
|
||||
for i, agent_path in enumerate(agents, 1):
|
||||
# Pagination setup
|
||||
page = 0
|
||||
total_pages = (len(agents) + AGENTS_PER_PAGE - 1) // AGENTS_PER_PAGE
|
||||
|
||||
while True:
|
||||
start_idx = page * AGENTS_PER_PAGE
|
||||
end_idx = min(start_idx + AGENTS_PER_PAGE, len(agents))
|
||||
page_agents = agents[start_idx:end_idx]
|
||||
|
||||
# Show page header with indicator
|
||||
if total_pages > 1:
|
||||
print(f"\nAvailable agents in {agents_dir} (Page {page + 1}/{total_pages}):\n")
|
||||
else:
|
||||
print(f"\nAvailable agents in {agents_dir}:\n")
|
||||
|
||||
# Display agents for current page (with global numbering)
|
||||
for i, agent_path in enumerate(page_agents, start_idx + 1):
|
||||
try:
|
||||
agent_json = agent_path / "agent.json"
|
||||
if agent_json.exists():
|
||||
with open(agent_json) as f:
|
||||
data = json.load(f)
|
||||
agent_meta = data.get("agent", {})
|
||||
name = agent_meta.get("name", agent_path.name)
|
||||
desc = agent_meta.get("description", "")
|
||||
else:
|
||||
# Python-based agent - extract from config.py
|
||||
name, desc = _extract_python_agent_metadata(agent_path)
|
||||
desc = desc[:50] + "..." if len(desc) > 50 else desc
|
||||
print(f" {i}. {name}")
|
||||
print(f" {desc}")
|
||||
except Exception as e:
|
||||
print(f" {i}. {agent_path.name} (error: {e})")
|
||||
|
||||
# Build navigation options
|
||||
nav_options = []
|
||||
if total_pages > 1:
|
||||
nav_options.append("←/→ or p/n=navigate")
|
||||
nav_options.append("q=quit")
|
||||
|
||||
print()
|
||||
if total_pages > 1:
|
||||
print(f" [{', '.join(nav_options)}]")
|
||||
print()
|
||||
|
||||
# Show prompt
|
||||
print("Select agent (number), use arrows to navigate, or q to quit: ", end="", flush=True)
|
||||
|
||||
try:
|
||||
from framework.runner import AgentRunner
|
||||
key = _read_key()
|
||||
|
||||
runner = AgentRunner.load(agent_path)
|
||||
info = runner.info()
|
||||
desc = info.description[:50] + "..." if len(info.description) > 50 else info.description
|
||||
print(f" {i}. {info.name}")
|
||||
print(f" {desc}")
|
||||
runner.cleanup()
|
||||
except Exception as e:
|
||||
print(f" {i}. {agent_path.name} (error: {e})")
|
||||
if key == "RIGHT" and page < total_pages - 1:
|
||||
page += 1
|
||||
print() # Newline before redrawing
|
||||
elif key == "LEFT" and page > 0:
|
||||
page -= 1
|
||||
print()
|
||||
elif key == "q":
|
||||
print()
|
||||
return None
|
||||
elif key in ("n", ">") and page < total_pages - 1:
|
||||
page += 1
|
||||
print()
|
||||
elif key in ("p", "<") and page > 0:
|
||||
page -= 1
|
||||
print()
|
||||
elif key.isdigit():
|
||||
# Build number with support for backspace
|
||||
buffer = key
|
||||
print(key, end="", flush=True)
|
||||
|
||||
print()
|
||||
try:
|
||||
choice = input("Select agent (number): ").strip()
|
||||
idx = int(choice) - 1
|
||||
if 0 <= idx < len(agents):
|
||||
return str(agents[idx])
|
||||
print("Invalid selection")
|
||||
return None
|
||||
except (ValueError, EOFError, KeyboardInterrupt):
|
||||
return None
|
||||
while True:
|
||||
ch = _getch()
|
||||
if ch in ("\r", "\n"):
|
||||
# Enter pressed - submit
|
||||
print()
|
||||
break
|
||||
elif ch in ("\x7f", "\x08"):
|
||||
# Backspace (DEL or BS)
|
||||
if buffer:
|
||||
buffer = buffer[:-1]
|
||||
# Erase character: move back, print space, move back
|
||||
print("\b \b", end="", flush=True)
|
||||
elif ch.isdigit():
|
||||
buffer += ch
|
||||
print(ch, end="", flush=True)
|
||||
elif ch == "\x1b":
|
||||
# Escape - cancel input
|
||||
print()
|
||||
buffer = ""
|
||||
break
|
||||
elif ch == "\x03":
|
||||
# Ctrl+C
|
||||
print()
|
||||
return None
|
||||
# Ignore other characters
|
||||
|
||||
if buffer:
|
||||
try:
|
||||
idx = int(buffer) - 1
|
||||
if 0 <= idx < len(agents):
|
||||
return str(agents[idx])
|
||||
print("Invalid selection")
|
||||
except ValueError:
|
||||
print("Invalid input")
|
||||
elif key == "\r" or key == "\n":
|
||||
print() # Just pressed enter, redraw
|
||||
else:
|
||||
print()
|
||||
print("Invalid input")
|
||||
except (EOFError, KeyboardInterrupt):
|
||||
print()
|
||||
return None
|
||||
|
||||
|
||||
def _interactive_multi(agents_dir: Path) -> int:
|
||||
@@ -985,7 +1346,7 @@ def _interactive_multi(agents_dir: Path) -> int:
|
||||
|
||||
# Register all agents
|
||||
for path in agents_dir.iterdir():
|
||||
if path.is_dir() and (path / "agent.json").exists():
|
||||
if _is_valid_agent_dir(path):
|
||||
try:
|
||||
orchestrator.register(path.name, path)
|
||||
agent_count += 1
|
||||
|
||||
@@ -362,6 +362,15 @@ class MCPClient:
|
||||
# Call tool using persistent session
|
||||
result = await self._session.call_tool(tool_name, arguments=arguments)
|
||||
|
||||
# Check for server-side errors (validation failures, tool exceptions, etc.)
|
||||
if getattr(result, "isError", False):
|
||||
error_text = ""
|
||||
if result.content:
|
||||
content_item = result.content[0]
|
||||
if hasattr(content_item, "text"):
|
||||
error_text = content_item.text
|
||||
raise RuntimeError(f"MCP tool '{tool_name}' failed: {error_text}")
|
||||
|
||||
# Extract content
|
||||
if result.content:
|
||||
# MCP returns content as a list of content items
|
||||
|
||||
+225
-23
@@ -19,6 +19,8 @@ from framework.runner.tool_registry import ToolRegistry
|
||||
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
|
||||
from framework.runtime.core import Runtime
|
||||
from framework.runtime.execution_stream import EntryPointSpec
|
||||
from framework.runtime.runtime_log_store import RuntimeLogStore
|
||||
from framework.runtime.runtime_logger import RuntimeLogger
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from framework.runner.protocol import AgentMessage, CapabilityResponse
|
||||
@@ -28,6 +30,33 @@ logger = logging.getLogger(__name__)
|
||||
|
||||
# Configuration paths
|
||||
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
|
||||
|
||||
|
||||
def _ensure_credential_key_env() -> None:
|
||||
"""Load HIVE_CREDENTIAL_KEY from shell config if not already in environment.
|
||||
|
||||
The setup-credentials skill writes the encryption key to ~/.zshrc or ~/.bashrc.
|
||||
If the user hasn't sourced their config in the current shell, this reads it
|
||||
directly so the runner (and any MCP subprocesses it spawns) can unlock the
|
||||
encrypted credential store.
|
||||
|
||||
Only HIVE_CREDENTIAL_KEY is loaded this way — all other secrets (API keys, etc.)
|
||||
come from the credential store itself.
|
||||
"""
|
||||
if os.environ.get("HIVE_CREDENTIAL_KEY"):
|
||||
return
|
||||
|
||||
try:
|
||||
from aden_tools.credentials.shell_config import check_env_var_in_shell_config
|
||||
|
||||
found, value = check_env_var_in_shell_config("HIVE_CREDENTIAL_KEY")
|
||||
if found and value:
|
||||
os.environ["HIVE_CREDENTIAL_KEY"] = value
|
||||
logger.debug("Loaded HIVE_CREDENTIAL_KEY from shell config")
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
|
||||
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
|
||||
|
||||
|
||||
@@ -236,6 +265,15 @@ class AgentRunner:
|
||||
result = await runner.run({"lead_id": "123"})
|
||||
"""
|
||||
|
||||
@staticmethod
|
||||
def _resolve_default_model() -> str:
|
||||
"""Resolve the default model from ~/.hive/configuration.json."""
|
||||
config = get_hive_config()
|
||||
llm = config.get("llm", {})
|
||||
if llm.get("provider") and llm.get("model"):
|
||||
return f"{llm['provider']}/{llm['model']}"
|
||||
return "anthropic/claude-sonnet-4-20250514"
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
agent_path: Path,
|
||||
@@ -243,7 +281,8 @@ class AgentRunner:
|
||||
goal: Goal,
|
||||
mock_mode: bool = False,
|
||||
storage_path: Path | None = None,
|
||||
model: str = "cerebras/zai-glm-4.7",
|
||||
model: str | None = None,
|
||||
enable_tui: bool = False,
|
||||
):
|
||||
"""
|
||||
Initialize the runner (use AgentRunner.load() instead).
|
||||
@@ -254,27 +293,32 @@ class AgentRunner:
|
||||
goal: Loaded Goal object
|
||||
mock_mode: If True, use mock LLM responses
|
||||
storage_path: Path for runtime storage (defaults to temp)
|
||||
model: Model to use - any LiteLLM-compatible model name
|
||||
(e.g., "claude-sonnet-4-20250514", "gpt-4o-mini", "gemini/gemini-pro")
|
||||
model: Model to use (reads from agent config or ~/.hive/configuration.json if None)
|
||||
enable_tui: If True, forces use of AgentRuntime with EventBus
|
||||
"""
|
||||
self.agent_path = agent_path
|
||||
self.graph = graph
|
||||
self.goal = goal
|
||||
self.mock_mode = mock_mode
|
||||
self.model = model
|
||||
self.model = model or self._resolve_default_model()
|
||||
self.enable_tui = enable_tui
|
||||
|
||||
# Set up storage
|
||||
if storage_path:
|
||||
self._storage_path = storage_path
|
||||
self._temp_dir = None
|
||||
else:
|
||||
# Use persistent storage in ~/.hive by default
|
||||
# Use persistent storage in ~/.hive/agents/{agent_name}/ per RUNTIME_LOGGING.md spec
|
||||
home = Path.home()
|
||||
default_storage = home / ".hive" / "storage" / agent_path.name
|
||||
default_storage = home / ".hive" / "agents" / agent_path.name
|
||||
default_storage.mkdir(parents=True, exist_ok=True)
|
||||
self._storage_path = default_storage
|
||||
self._temp_dir = None
|
||||
|
||||
# Load HIVE_CREDENTIAL_KEY from shell config if not in env.
|
||||
# Must happen before MCP subprocesses are spawned so they inherit it.
|
||||
_ensure_credential_key_env()
|
||||
|
||||
# Initialize components
|
||||
self._tool_registry = ToolRegistry()
|
||||
self._runtime: Runtime | None = None
|
||||
@@ -296,32 +340,121 @@ class AgentRunner:
|
||||
if mcp_config_path.exists():
|
||||
self._load_mcp_servers_from_config(mcp_config_path)
|
||||
|
||||
@staticmethod
|
||||
def _import_agent_module(agent_path: Path):
|
||||
"""Import an agent package from its directory path.
|
||||
|
||||
Tries package import first (works when exports/ is on sys.path,
|
||||
which cli.py:_configure_paths() ensures). Falls back to direct
|
||||
file import of agent.py via importlib.util.
|
||||
"""
|
||||
import importlib
|
||||
|
||||
package_name = agent_path.name
|
||||
|
||||
# Try importing as a package (works when exports/ is on sys.path)
|
||||
try:
|
||||
return importlib.import_module(package_name)
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
# Fallback: import agent.py directly via file path
|
||||
import importlib.util
|
||||
|
||||
agent_py = agent_path / "agent.py"
|
||||
if not agent_py.exists():
|
||||
raise FileNotFoundError(
|
||||
f"No importable agent found at {agent_path}. "
|
||||
f"Expected a Python package with agent.py."
|
||||
)
|
||||
spec = importlib.util.spec_from_file_location(
|
||||
f"{package_name}.agent",
|
||||
agent_py,
|
||||
submodule_search_locations=[str(agent_path)],
|
||||
)
|
||||
module = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(module)
|
||||
return module
|
||||
|
||||
@classmethod
|
||||
def load(
|
||||
cls,
|
||||
agent_path: str | Path,
|
||||
mock_mode: bool = False,
|
||||
storage_path: Path | None = None,
|
||||
model: str = "cerebras/zai-glm-4.7",
|
||||
model: str | None = None,
|
||||
enable_tui: bool = False,
|
||||
) -> "AgentRunner":
|
||||
"""
|
||||
Load an agent from an export folder.
|
||||
|
||||
Imports the agent's Python package and reads module-level variables
|
||||
(goal, nodes, edges, etc.) to build a GraphSpec. Falls back to
|
||||
agent.json if no Python module is found.
|
||||
|
||||
Args:
|
||||
agent_path: Path to agent folder (containing agent.json)
|
||||
agent_path: Path to agent folder
|
||||
mock_mode: If True, use mock LLM responses
|
||||
storage_path: Path for runtime storage (defaults to temp)
|
||||
model: LLM model to use (any LiteLLM-compatible model name)
|
||||
storage_path: Path for runtime storage (defaults to ~/.hive/agents/{name})
|
||||
model: LLM model to use (reads from agent's default_config if None)
|
||||
enable_tui: If True, forces use of AgentRuntime with EventBus
|
||||
|
||||
Returns:
|
||||
AgentRunner instance ready to run
|
||||
"""
|
||||
agent_path = Path(agent_path)
|
||||
|
||||
# Load agent.json
|
||||
# Try loading from Python module first (code-based agents)
|
||||
agent_py = agent_path / "agent.py"
|
||||
if agent_py.exists():
|
||||
agent_module = cls._import_agent_module(agent_path)
|
||||
|
||||
goal = getattr(agent_module, "goal", None)
|
||||
nodes = getattr(agent_module, "nodes", None)
|
||||
edges = getattr(agent_module, "edges", None)
|
||||
|
||||
if goal is None or nodes is None or edges is None:
|
||||
raise ValueError(
|
||||
f"Agent at {agent_path} must define 'goal', 'nodes', and 'edges' "
|
||||
f"in agent.py (or __init__.py)"
|
||||
)
|
||||
|
||||
# Read model and max_tokens from agent's config if not explicitly provided
|
||||
agent_config = getattr(agent_module, "default_config", None)
|
||||
if model is None:
|
||||
if agent_config and hasattr(agent_config, "model"):
|
||||
model = agent_config.model
|
||||
|
||||
max_tokens = getattr(agent_config, "max_tokens", 1024) if agent_config else 1024
|
||||
|
||||
# Build GraphSpec from module-level variables
|
||||
graph = GraphSpec(
|
||||
id=f"{agent_path.name}-graph",
|
||||
goal_id=goal.id,
|
||||
version="1.0.0",
|
||||
entry_node=getattr(agent_module, "entry_node", nodes[0].id),
|
||||
entry_points=getattr(agent_module, "entry_points", {}),
|
||||
terminal_nodes=getattr(agent_module, "terminal_nodes", []),
|
||||
pause_nodes=getattr(agent_module, "pause_nodes", []),
|
||||
nodes=nodes,
|
||||
edges=edges,
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
|
||||
return cls(
|
||||
agent_path=agent_path,
|
||||
graph=graph,
|
||||
goal=goal,
|
||||
mock_mode=mock_mode,
|
||||
storage_path=storage_path,
|
||||
model=model,
|
||||
enable_tui=enable_tui,
|
||||
)
|
||||
|
||||
# Fallback: load from agent.json (legacy JSON-based agents)
|
||||
agent_json_path = agent_path / "agent.json"
|
||||
if not agent_json_path.exists():
|
||||
raise FileNotFoundError(f"agent.json not found in {agent_path}")
|
||||
raise FileNotFoundError(f"No agent.py or agent.json found in {agent_path}")
|
||||
|
||||
with open(agent_json_path) as f:
|
||||
graph, goal = load_agent_export(f.read())
|
||||
@@ -333,6 +466,7 @@ class AgentRunner:
|
||||
mock_mode=mock_mode,
|
||||
storage_path=storage_path,
|
||||
model=model,
|
||||
enable_tui=enable_tui,
|
||||
)
|
||||
|
||||
def register_tool(
|
||||
@@ -471,16 +605,25 @@ class AgentRunner:
|
||||
api_key_env = self._get_api_key_env_var(self.model)
|
||||
if api_key_env and os.environ.get(api_key_env):
|
||||
self._llm = LiteLLMProvider(model=self.model)
|
||||
elif api_key_env:
|
||||
print(f"Warning: {api_key_env} not set. LLM calls will fail.")
|
||||
print(f"Set it with: export {api_key_env}=your-api-key")
|
||||
else:
|
||||
# Fall back to credential store
|
||||
api_key = self._get_api_key_from_credential_store()
|
||||
if api_key:
|
||||
self._llm = LiteLLMProvider(model=self.model, api_key=api_key)
|
||||
# Set env var so downstream code (e.g. cleanup LLM in
|
||||
# node._extract_json) can also find it
|
||||
if api_key_env:
|
||||
os.environ[api_key_env] = api_key
|
||||
elif api_key_env:
|
||||
print(f"Warning: {api_key_env} not set. LLM calls will fail.")
|
||||
print(f"Set it with: export {api_key_env}=your-api-key")
|
||||
|
||||
# Get tools for executor/runtime
|
||||
tools = list(self._tool_registry.get_tools().values())
|
||||
tool_executor = self._tool_registry.get_executor()
|
||||
|
||||
if self._uses_async_entry_points:
|
||||
# Multi-entry-point mode: use AgentRuntime
|
||||
if self._uses_async_entry_points or self.enable_tui:
|
||||
# Multi-entry-point mode or TUI mode: use AgentRuntime
|
||||
self._setup_agent_runtime(tools, tool_executor)
|
||||
else:
|
||||
# Single-entry-point mode: use legacy GraphExecutor
|
||||
@@ -518,11 +661,42 @@ class AgentRunner:
|
||||
# Default: assume OpenAI-compatible
|
||||
return "OPENAI_API_KEY"
|
||||
|
||||
def _get_api_key_from_credential_store(self) -> str | None:
|
||||
"""Get the LLM API key from the encrypted credential store.
|
||||
|
||||
Maps model name to credential store ID (e.g. "anthropic/..." -> "anthropic")
|
||||
and retrieves the key via CredentialStore.get().
|
||||
"""
|
||||
if not os.environ.get("HIVE_CREDENTIAL_KEY"):
|
||||
return None
|
||||
|
||||
# Map model prefix to credential store ID
|
||||
model_lower = self.model.lower()
|
||||
cred_id = None
|
||||
if model_lower.startswith("anthropic/") or model_lower.startswith("claude"):
|
||||
cred_id = "anthropic"
|
||||
# Add more mappings as providers are added to LLM_CREDENTIALS
|
||||
|
||||
if cred_id is None:
|
||||
return None
|
||||
|
||||
try:
|
||||
from framework.credentials import CredentialStore
|
||||
|
||||
store = CredentialStore.with_encrypted_storage()
|
||||
return store.get(cred_id)
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
def _setup_legacy_executor(self, tools: list, tool_executor: Callable | None) -> None:
|
||||
"""Set up legacy single-entry-point execution using GraphExecutor."""
|
||||
# Create runtime
|
||||
self._runtime = Runtime(storage_path=self._storage_path)
|
||||
|
||||
# Create runtime logger
|
||||
log_store = RuntimeLogStore(base_path=self._storage_path / "runtime_logs")
|
||||
runtime_logger = RuntimeLogger(store=log_store, agent_id=self.graph.id)
|
||||
|
||||
# Create executor
|
||||
self._executor = GraphExecutor(
|
||||
runtime=self._runtime,
|
||||
@@ -530,6 +704,8 @@ class AgentRunner:
|
||||
tools=tools,
|
||||
tool_executor=tool_executor,
|
||||
approval_callback=self._approval_callback,
|
||||
runtime_logger=runtime_logger,
|
||||
loop_config=self.graph.loop_config,
|
||||
)
|
||||
|
||||
def _setup_agent_runtime(self, tools: list, tool_executor: Callable | None) -> None:
|
||||
@@ -549,7 +725,22 @@ class AgentRunner:
|
||||
)
|
||||
entry_points.append(ep)
|
||||
|
||||
# If TUI enabled but no entry points (single-entry agent), create default
|
||||
if not entry_points and self.enable_tui and self.graph.entry_node:
|
||||
logger.info("Creating default entry point for TUI")
|
||||
entry_points.append(
|
||||
EntryPointSpec(
|
||||
id="default",
|
||||
name="Default",
|
||||
entry_node=self.graph.entry_node,
|
||||
trigger_type="manual",
|
||||
isolation_level="shared",
|
||||
)
|
||||
)
|
||||
|
||||
# Create AgentRuntime with all entry points
|
||||
log_store = RuntimeLogStore(base_path=self._storage_path / "runtime_logs")
|
||||
|
||||
self._agent_runtime = create_agent_runtime(
|
||||
graph=self.graph,
|
||||
goal=self.goal,
|
||||
@@ -558,6 +749,7 @@ class AgentRunner:
|
||||
llm=self._llm,
|
||||
tools=tools,
|
||||
tool_executor=tool_executor,
|
||||
runtime_log_store=log_store,
|
||||
)
|
||||
|
||||
async def run(
|
||||
@@ -599,7 +791,7 @@ class AgentRunner:
|
||||
error=error_msg,
|
||||
)
|
||||
|
||||
if self._uses_async_entry_points:
|
||||
if self._uses_async_entry_points or self.enable_tui:
|
||||
# Multi-entry-point mode: use AgentRuntime
|
||||
return await self._run_with_agent_runtime(
|
||||
input_data=input_data or {},
|
||||
@@ -891,15 +1083,25 @@ class AgentRunner:
|
||||
EnvVarStorage,
|
||||
)
|
||||
|
||||
# Build env mapping for fallback
|
||||
# Build env mapping for credential lookup
|
||||
env_mapping = {
|
||||
(spec.credential_id or name): spec.env_var
|
||||
for name, spec in CREDENTIAL_SPECS.items()
|
||||
}
|
||||
storage = CompositeStorage(
|
||||
primary=EncryptedFileStorage(),
|
||||
fallbacks=[EnvVarStorage(env_mapping=env_mapping)],
|
||||
)
|
||||
|
||||
# Only use EncryptedFileStorage if the encryption key is configured;
|
||||
# otherwise just check env vars (avoids generating a throwaway key)
|
||||
storages: list = [EnvVarStorage(env_mapping=env_mapping)]
|
||||
if os.environ.get("HIVE_CREDENTIAL_KEY"):
|
||||
storages.insert(0, EncryptedFileStorage())
|
||||
|
||||
if len(storages) == 1:
|
||||
storage = storages[0]
|
||||
else:
|
||||
storage = CompositeStorage(
|
||||
primary=storages[0],
|
||||
fallbacks=storages[1:],
|
||||
)
|
||||
store = CredentialStore(storage=storage)
|
||||
|
||||
# Build reverse mappings
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
"""Tool discovery and registration for agent runner."""
|
||||
|
||||
import contextvars
|
||||
import importlib.util
|
||||
import inspect
|
||||
import json
|
||||
@@ -13,6 +14,13 @@ from framework.llm.provider import Tool, ToolResult, ToolUse
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Per-execution context overrides. Each asyncio task (and thus each
|
||||
# concurrent graph execution) gets its own copy, so there are no races
|
||||
# when multiple ExecutionStreams run in parallel.
|
||||
_execution_context: contextvars.ContextVar[dict[str, Any] | None] = contextvars.ContextVar(
|
||||
"_execution_context", default=None
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class RegisteredTool:
|
||||
@@ -33,6 +41,11 @@ class ToolRegistry:
|
||||
4. Manually registered tools
|
||||
"""
|
||||
|
||||
# Framework-internal context keys injected into tool calls.
|
||||
# Stripped from LLM-facing schemas (the LLM doesn't know these values)
|
||||
# and auto-injected at call time for tools that accept them.
|
||||
CONTEXT_PARAMS = frozenset({"workspace_id", "agent_id", "session_id", "data_dir"})
|
||||
|
||||
def __init__(self):
|
||||
self._tools: dict[str, RegisteredTool] = {}
|
||||
self._mcp_clients: list[Any] = [] # List of MCPClient instances
|
||||
@@ -257,6 +270,24 @@ class ToolRegistry:
|
||||
"""
|
||||
self._session_context.update(context)
|
||||
|
||||
@staticmethod
|
||||
def set_execution_context(**context) -> contextvars.Token:
|
||||
"""Set per-execution context overrides (concurrency-safe via contextvars).
|
||||
|
||||
Values set here take precedence over session context. Each asyncio
|
||||
task gets its own copy, so concurrent executions don't interfere.
|
||||
|
||||
Returns a token that must be passed to :meth:`reset_execution_context`
|
||||
to restore the previous state.
|
||||
"""
|
||||
current = _execution_context.get() or {}
|
||||
return _execution_context.set({**current, **context})
|
||||
|
||||
@staticmethod
|
||||
def reset_execution_context(token: contextvars.Token) -> None:
|
||||
"""Restore execution context to its previous state."""
|
||||
_execution_context.reset(token)
|
||||
|
||||
def load_mcp_config(self, config_path: Path) -> None:
|
||||
"""
|
||||
Load and register MCP servers from a config file.
|
||||
@@ -275,7 +306,16 @@ class ToolRegistry:
|
||||
return
|
||||
|
||||
base_dir = config_path.parent
|
||||
for server_config in config.get("servers", []):
|
||||
|
||||
# Support both formats:
|
||||
# {"servers": [{"name": "x", ...}]} (list format)
|
||||
# {"server-name": {"transport": ...}, ...} (dict format)
|
||||
server_list = config.get("servers", [])
|
||||
if not server_list and "servers" not in config:
|
||||
# Treat top-level keys as server names
|
||||
server_list = [{"name": name, **cfg} for name, cfg in config.items()]
|
||||
|
||||
for server_config in server_list:
|
||||
cwd = server_config.get("cwd")
|
||||
if cwd and not Path(cwd).is_absolute():
|
||||
server_config["cwd"] = str((base_dir / cwd).resolve())
|
||||
@@ -333,7 +373,7 @@ class ToolRegistry:
|
||||
# Register each tool
|
||||
count = 0
|
||||
for mcp_tool in client.list_tools():
|
||||
# Convert MCP tool to framework Tool
|
||||
# Convert MCP tool to framework Tool (strips context params from LLM schema)
|
||||
tool = self._convert_mcp_tool_to_framework_tool(mcp_tool)
|
||||
|
||||
# Create executor that calls the MCP server
|
||||
@@ -345,11 +385,15 @@ class ToolRegistry:
|
||||
):
|
||||
def executor(inputs: dict) -> Any:
|
||||
try:
|
||||
# Only inject session context params the tool accepts
|
||||
# Build base context: session < execution (execution wins)
|
||||
base_context = dict(registry_ref._session_context)
|
||||
exec_ctx = _execution_context.get()
|
||||
if exec_ctx:
|
||||
base_context.update(exec_ctx)
|
||||
|
||||
# Only inject context params the tool accepts
|
||||
filtered_context = {
|
||||
k: v
|
||||
for k, v in registry_ref._session_context.items()
|
||||
if k in tool_params
|
||||
k: v for k, v in base_context.items() if k in tool_params
|
||||
}
|
||||
merged_inputs = {**filtered_context, **inputs}
|
||||
result = client_ref.call_tool(tool_name, merged_inputs)
|
||||
@@ -395,6 +439,11 @@ class ToolRegistry:
|
||||
properties = input_schema.get("properties", {})
|
||||
required = input_schema.get("required", [])
|
||||
|
||||
# Strip framework-internal context params from LLM-facing schema.
|
||||
# The LLM can't know these values; they're auto-injected at call time.
|
||||
properties = {k: v for k, v in properties.items() if k not in self.CONTEXT_PARAMS}
|
||||
required = [r for r in required if r not in self.CONTEXT_PARAMS]
|
||||
|
||||
# Convert to framework Tool format
|
||||
tool = Tool(
|
||||
name=mcp_tool.name,
|
||||
|
||||
@@ -0,0 +1,688 @@
|
||||
# Runtime Logging System
|
||||
|
||||
## Overview
|
||||
|
||||
The Hive framework uses a **three-level observability system** for tracking agent execution at different granularities:
|
||||
|
||||
- **L1 (Summary)**: High-level run outcomes - success/failure, execution quality, attention flags
|
||||
- **L2 (Details)**: Per-node completion details - retries, verdicts, latency, attention reasons
|
||||
- **L3 (Tool Logs)**: Step-by-step execution - tool calls, LLM responses, judge feedback
|
||||
|
||||
This layered approach enables efficient debugging: start with L1 to identify problematic runs, drill into L2 to find failing nodes, and analyze L3 for root cause details.
|
||||
|
||||
---
|
||||
|
||||
## Storage Architecture
|
||||
|
||||
### Current Structure (Unified Sessions)
|
||||
|
||||
**Default since 2026-02-06**
|
||||
|
||||
```
|
||||
~/.hive/agents/{agent_name}/
|
||||
└── sessions/
|
||||
└── session_YYYYMMDD_HHMMSS_{uuid}/
|
||||
├── state.json # Session state and metadata
|
||||
├── logs/ # Runtime logs (L1/L2/L3)
|
||||
│ ├── summary.json # L1: Run outcome
|
||||
│ ├── details.jsonl # L2: Per-node results
|
||||
│ └── tool_logs.jsonl # L3: Step-by-step execution
|
||||
├── conversations/ # Per-node EventLoop state
|
||||
└── data/ # Spillover artifacts
|
||||
```
|
||||
|
||||
**Key characteristics:**
|
||||
- All session data colocated in one directory
|
||||
- Consistent ID format: `session_YYYYMMDD_HHMMSS_{short_uuid}`
|
||||
- Logs written incrementally (JSONL for L2/L3)
|
||||
- Single source of truth: `state.json`
|
||||
|
||||
### Legacy Structure (Deprecated)
|
||||
|
||||
**Read-only for backward compatibility**
|
||||
|
||||
```
|
||||
~/.hive/agents/{agent_name}/
|
||||
├── runtime_logs/
|
||||
│ └── runs/
|
||||
│ └── {run_id}/
|
||||
│ ├── summary.json # L1
|
||||
│ ├── details.jsonl # L2
|
||||
│ └── tool_logs.jsonl # L3
|
||||
├── sessions/
|
||||
│ └── exec_{stream_id}_{uuid}/
|
||||
│ ├── conversations/
|
||||
│ └── data/
|
||||
├── runs/ # Deprecated
|
||||
│ └── run_start_*.json
|
||||
└── summaries/ # Deprecated
|
||||
└── run_start_*.json
|
||||
```
|
||||
|
||||
**Migration status:**
|
||||
- ✅ New sessions write to unified structure only
|
||||
- ✅ Old sessions remain readable
|
||||
- ❌ No new writes to `runs/`, `summaries/`, `runtime_logs/runs/`
|
||||
- ⚠️ Deprecation warnings emitted when reading old locations
|
||||
|
||||
---
|
||||
|
||||
## Components
|
||||
|
||||
### RuntimeLogger
|
||||
|
||||
**Location:** `core/framework/runtime/runtime_logger.py`
|
||||
|
||||
**Responsibilities:**
|
||||
- Receives execution events from GraphExecutor
|
||||
- Tracks per-node execution details
|
||||
- Aggregates attention flags
|
||||
- Coordinates with RuntimeLogStore
|
||||
|
||||
**Key methods:**
|
||||
```python
|
||||
def start_run(goal_id: str, session_id: str = "") -> str:
|
||||
"""Initialize a new run. Uses session_id as run_id if provided."""
|
||||
|
||||
def log_step(node_id: str, step_index: int, tool_calls: list, ...):
|
||||
"""Record one LLM step (L3). Appends to tool_logs.jsonl immediately."""
|
||||
|
||||
def log_node_complete(node_id: str, exit_status: str, ...):
|
||||
"""Record node completion (L2). Appends to details.jsonl immediately."""
|
||||
|
||||
async def end_run(status: str):
|
||||
"""Finalize run, aggregate L2→L1, write summary.json."""
|
||||
```
|
||||
|
||||
**Attention flag triggers:**
|
||||
```python
|
||||
# From runtime_logger.py:190-203
|
||||
needs_attention = any([
|
||||
retry_count > 3,
|
||||
escalate_count > 2,
|
||||
latency_ms > 60000,
|
||||
tokens_used > 100000,
|
||||
total_steps > 20,
|
||||
])
|
||||
```
|
||||
|
||||
### RuntimeLogStore
|
||||
|
||||
**Location:** `core/framework/runtime/runtime_log_store.py`
|
||||
|
||||
**Responsibilities:**
|
||||
- Manages log file I/O
|
||||
- Handles both old and new storage paths
|
||||
- Provides incremental append for L2/L3 (crash-safe)
|
||||
- Atomic writes for L1
|
||||
|
||||
**Storage path resolution:**
|
||||
```python
|
||||
def _get_run_dir(run_id: str) -> Path:
|
||||
"""Determine log directory based on run_id format.
|
||||
|
||||
- session_* → {storage_root}/sessions/{run_id}/logs/
|
||||
- Other → {base_path}/runtime_logs/runs/{run_id}/ (deprecated)
|
||||
"""
|
||||
```
|
||||
|
||||
**Key methods:**
|
||||
```python
|
||||
def ensure_run_dir(run_id: str):
|
||||
"""Create log directory immediately at start_run()."""
|
||||
|
||||
def append_step(run_id: str, step: NodeStepLog):
|
||||
"""Append L3 entry to tool_logs.jsonl. Thread-safe sync write."""
|
||||
|
||||
def append_node_detail(run_id: str, detail: NodeDetail):
|
||||
"""Append L2 entry to details.jsonl. Thread-safe sync write."""
|
||||
|
||||
async def save_summary(run_id: str, summary: RunSummaryLog):
|
||||
"""Write L1 summary.json atomically at end_run()."""
|
||||
```
|
||||
|
||||
**File format:**
|
||||
- **L1 (summary.json)**: Standard JSON, written once at end
|
||||
- **L2 (details.jsonl)**: JSONL (one object per line), appended per node
|
||||
- **L3 (tool_logs.jsonl)**: JSONL (one object per line), appended per step
|
||||
|
||||
### Runtime Log Schemas
|
||||
|
||||
**Location:** `core/framework/runtime/runtime_log_schemas.py`
|
||||
|
||||
**L1: RunSummaryLog**
|
||||
```python
|
||||
@dataclass
|
||||
class RunSummaryLog:
|
||||
run_id: str
|
||||
goal_id: str
|
||||
status: str # "success", "failure", "degraded", "in_progress"
|
||||
started_at: str # ISO 8601
|
||||
ended_at: str | None
|
||||
needs_attention: bool
|
||||
attention_summary: AttentionSummary
|
||||
total_nodes_executed: int
|
||||
nodes_with_failures: list[str]
|
||||
execution_quality: str # "clean", "degraded", "failed"
|
||||
total_latency_ms: int
|
||||
# ... additional metrics
|
||||
```
|
||||
|
||||
**L2: NodeDetail**
|
||||
```python
|
||||
@dataclass
|
||||
class NodeDetail:
|
||||
node_id: str
|
||||
exit_status: str # "success", "escalate", "no_valid_edge"
|
||||
retry_count: int
|
||||
verdict_counts: dict[str, int] # {ACCEPT: 1, RETRY: 3, ...}
|
||||
total_steps: int
|
||||
latency_ms: int
|
||||
needs_attention: bool
|
||||
attention_reasons: list[str]
|
||||
# ... tool error tracking, token counts
|
||||
```
|
||||
|
||||
**L3: NodeStepLog**
|
||||
```python
|
||||
@dataclass
|
||||
class NodeStepLog:
|
||||
node_id: str
|
||||
step_index: int
|
||||
tool_calls: list[dict]
|
||||
tool_results: list[dict]
|
||||
verdict: str # "ACCEPT", "RETRY", "ESCALATE", "CONTINUE"
|
||||
verdict_feedback: str
|
||||
llm_response_text: str
|
||||
tokens_used: int
|
||||
latency_ms: int
|
||||
# ... detailed execution state
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Querying Logs (MCP Tools)
|
||||
|
||||
### Tools Location
|
||||
|
||||
**MCP Server:** `tools/src/aden_tools/tools/runtime_logs_tool/runtime_logs_tool.py`
|
||||
|
||||
Three MCP tools provide access to the logging system:
|
||||
|
||||
### L1: query_runtime_logs
|
||||
|
||||
**Purpose:** Find problematic runs
|
||||
|
||||
```python
|
||||
query_runtime_logs(
|
||||
agent_work_dir: str, # e.g., "~/.hive/agents/twitter_outreach"
|
||||
status: str = "", # "needs_attention", "success", "failure", "degraded"
|
||||
limit: int = 20
|
||||
) -> dict # {"runs": [...], "total": int}
|
||||
```
|
||||
|
||||
**Returns:**
|
||||
```json
|
||||
{
|
||||
"runs": [
|
||||
{
|
||||
"run_id": "session_20260206_115718_e22339c5",
|
||||
"status": "degraded",
|
||||
"needs_attention": true,
|
||||
"attention_summary": {
|
||||
"total_attention_flags": 3,
|
||||
"categories": ["missing_outputs", "retry_loops"]
|
||||
},
|
||||
"started_at": "2026-02-06T11:57:18Z"
|
||||
}
|
||||
],
|
||||
"total": 1
|
||||
}
|
||||
```
|
||||
|
||||
**Common queries:**
|
||||
```python
|
||||
# Find all problematic runs
|
||||
query_runtime_logs(agent_work_dir, status="needs_attention")
|
||||
|
||||
# Get recent runs regardless of status
|
||||
query_runtime_logs(agent_work_dir, limit=10)
|
||||
|
||||
# Check for failures
|
||||
query_runtime_logs(agent_work_dir, status="failure")
|
||||
```
|
||||
|
||||
### L2: query_runtime_log_details
|
||||
|
||||
**Purpose:** Identify which nodes failed
|
||||
|
||||
```python
|
||||
query_runtime_log_details(
|
||||
agent_work_dir: str,
|
||||
run_id: str, # From L1 query
|
||||
needs_attention_only: bool = False,
|
||||
node_id: str = "" # Filter to specific node
|
||||
) -> dict # {"run_id": str, "nodes": [...]}
|
||||
```
|
||||
|
||||
**Returns:**
|
||||
```json
|
||||
{
|
||||
"run_id": "session_20260206_115718_e22339c5",
|
||||
"nodes": [
|
||||
{
|
||||
"node_id": "intake-collector",
|
||||
"exit_status": "escalate",
|
||||
"retry_count": 5,
|
||||
"verdict_counts": {"RETRY": 5, "ESCALATE": 1},
|
||||
"attention_reasons": ["high_retry_count", "missing_outputs"],
|
||||
"total_steps": 8,
|
||||
"latency_ms": 12500,
|
||||
"needs_attention": true
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Common queries:**
|
||||
```python
|
||||
# Get all problematic nodes
|
||||
query_runtime_log_details(agent_work_dir, run_id, needs_attention_only=True)
|
||||
|
||||
# Analyze specific node across run
|
||||
query_runtime_log_details(agent_work_dir, run_id, node_id="intake-collector")
|
||||
|
||||
# Full node breakdown
|
||||
query_runtime_log_details(agent_work_dir, run_id)
|
||||
```
|
||||
|
||||
### L3: query_runtime_log_raw
|
||||
|
||||
**Purpose:** Root cause analysis
|
||||
|
||||
```python
|
||||
query_runtime_log_raw(
|
||||
agent_work_dir: str,
|
||||
run_id: str,
|
||||
step_index: int = -1, # Specific step or -1 for all
|
||||
node_id: str = "" # Filter to specific node
|
||||
) -> dict # {"run_id": str, "steps": [...]}
|
||||
```
|
||||
|
||||
**Returns:**
|
||||
```json
|
||||
{
|
||||
"run_id": "session_20260206_115718_e22339c5",
|
||||
"steps": [
|
||||
{
|
||||
"node_id": "intake-collector",
|
||||
"step_index": 3,
|
||||
"tool_calls": [
|
||||
{
|
||||
"tool": "web_search",
|
||||
"args": {"query": "@RomuloNevesOf"}
|
||||
}
|
||||
],
|
||||
"tool_results": [
|
||||
{
|
||||
"status": "success",
|
||||
"data": "..."
|
||||
}
|
||||
],
|
||||
"verdict": "RETRY",
|
||||
"verdict_feedback": "Missing required output 'twitter_handles'. You found the handle but didn't call set_output.",
|
||||
"llm_response_text": "I found the Twitter profile...",
|
||||
"tokens_used": 1234,
|
||||
"latency_ms": 2500
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Common queries:**
|
||||
```python
|
||||
# All steps for a problematic node
|
||||
query_runtime_log_raw(agent_work_dir, run_id, node_id="intake-collector")
|
||||
|
||||
# Specific step analysis
|
||||
query_runtime_log_raw(agent_work_dir, run_id, step_index=5)
|
||||
|
||||
# Full execution trace
|
||||
query_runtime_log_raw(agent_work_dir, run_id)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Usage Patterns
|
||||
|
||||
### Pattern 1: Top-Down Investigation
|
||||
|
||||
**Use case:** Debug a failing agent
|
||||
|
||||
```python
|
||||
# 1. Find problematic runs (L1)
|
||||
result = query_runtime_logs(
|
||||
agent_work_dir="~/.hive/agents/twitter_outreach",
|
||||
status="needs_attention"
|
||||
)
|
||||
run_id = result["runs"][0]["run_id"]
|
||||
|
||||
# 2. Identify failing nodes (L2)
|
||||
details = query_runtime_log_details(
|
||||
agent_work_dir="~/.hive/agents/twitter_outreach",
|
||||
run_id=run_id,
|
||||
needs_attention_only=True
|
||||
)
|
||||
problem_node = details["nodes"][0]["node_id"]
|
||||
|
||||
# 3. Analyze root cause (L3)
|
||||
raw = query_runtime_log_raw(
|
||||
agent_work_dir="~/.hive/agents/twitter_outreach",
|
||||
run_id=run_id,
|
||||
node_id=problem_node
|
||||
)
|
||||
# Examine verdict_feedback, tool_results, etc.
|
||||
```
|
||||
|
||||
### Pattern 2: Node-Specific Debugging
|
||||
|
||||
**Use case:** Investigate why a specific node keeps failing
|
||||
|
||||
```python
|
||||
# Get recent runs
|
||||
runs = query_runtime_logs("~/.hive/agents/my_agent", limit=10)
|
||||
|
||||
# For each run, check specific node
|
||||
for run in runs["runs"]:
|
||||
node_details = query_runtime_log_details(
|
||||
"~/.hive/agents/my_agent",
|
||||
run["run_id"],
|
||||
node_id="problematic-node"
|
||||
)
|
||||
# Analyze retry patterns, error types
|
||||
```
|
||||
|
||||
### Pattern 3: Real-Time Monitoring
|
||||
|
||||
**Use case:** Watch for issues during development
|
||||
|
||||
```python
|
||||
import time
|
||||
|
||||
while True:
|
||||
result = query_runtime_logs(
|
||||
agent_work_dir="~/.hive/agents/my_agent",
|
||||
status="needs_attention",
|
||||
limit=1
|
||||
)
|
||||
|
||||
if result["total"] > 0:
|
||||
new_issue = result["runs"][0]
|
||||
print(f"⚠️ New issue detected: {new_issue['run_id']}")
|
||||
# Alert or drill into L2/L3
|
||||
|
||||
time.sleep(10) # Poll every 10 seconds
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### GraphExecutor → RuntimeLogger
|
||||
|
||||
**Location:** `core/framework/graph/executor.py`
|
||||
|
||||
```python
|
||||
# Executor creates logger and passes session_id
|
||||
logger = RuntimeLogger(store, agent_id)
|
||||
run_id = logger.start_run(goal_id, session_id=execution_id)
|
||||
|
||||
# During execution
|
||||
logger.log_step(node_id, step_index, tool_calls, ...)
|
||||
logger.log_node_complete(node_id, exit_status, ...)
|
||||
|
||||
# At completion
|
||||
await logger.end_run(status="success")
|
||||
```
|
||||
|
||||
### EventLoopNode → RuntimeLogger
|
||||
|
||||
**Location:** `core/framework/graph/event_loop_node.py`
|
||||
|
||||
```python
|
||||
# EventLoopNode logs each step
|
||||
self._logger.log_step(
|
||||
node_id=self.id,
|
||||
step_index=step_count,
|
||||
tool_calls=current_tool_calls,
|
||||
tool_results=current_tool_results,
|
||||
verdict=verdict,
|
||||
verdict_feedback=feedback,
|
||||
...
|
||||
)
|
||||
```
|
||||
|
||||
### AgentRuntime → RuntimeLogger
|
||||
|
||||
**Location:** `core/framework/runtime/agent_runtime.py`
|
||||
|
||||
```python
|
||||
# Runtime initializes logger with storage path
|
||||
log_store = RuntimeLogStore(base_path / "runtime_logs")
|
||||
logger = RuntimeLogger(log_store, agent_id)
|
||||
|
||||
# Passes session_id from ExecutionStream
|
||||
logger.start_run(goal_id, session_id=execution_id)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## File Format Details
|
||||
|
||||
### L1: summary.json
|
||||
|
||||
**Written:** Once at end_run()
|
||||
**Format:** Standard JSON
|
||||
|
||||
```json
|
||||
{
|
||||
"run_id": "session_20260206_115718_e22339c5",
|
||||
"goal_id": "twitter-outreach-multi-loop",
|
||||
"status": "degraded",
|
||||
"started_at": "2026-02-06T11:57:18.593081",
|
||||
"ended_at": "2026-02-06T11:58:45.123456",
|
||||
"needs_attention": true,
|
||||
"attention_summary": {
|
||||
"total_attention_flags": 3,
|
||||
"categories": ["missing_outputs", "retry_loops"],
|
||||
"nodes_with_attention": ["intake-collector"]
|
||||
},
|
||||
"total_nodes_executed": 4,
|
||||
"nodes_with_failures": ["intake-collector"],
|
||||
"execution_quality": "degraded",
|
||||
"total_latency_ms": 86530,
|
||||
"total_retries": 5
|
||||
}
|
||||
```
|
||||
|
||||
### L2: details.jsonl
|
||||
|
||||
**Written:** Incrementally (append per node completion)
|
||||
**Format:** JSONL (one JSON object per line)
|
||||
|
||||
```jsonl
|
||||
{"node_id":"intake-collector","exit_status":"escalate","retry_count":5,"verdict_counts":{"RETRY":5,"ESCALATE":1},"total_steps":8,"latency_ms":12500,"needs_attention":true,"attention_reasons":["high_retry_count","missing_outputs"],"tool_error_count":0,"tokens_used":9876}
|
||||
{"node_id":"profile-analyzer","exit_status":"success","retry_count":0,"verdict_counts":{"ACCEPT":1},"total_steps":2,"latency_ms":5432,"needs_attention":false,"attention_reasons":[],"tool_error_count":0,"tokens_used":3456}
|
||||
```
|
||||
|
||||
### L3: tool_logs.jsonl
|
||||
|
||||
**Written:** Incrementally (append per step)
|
||||
**Format:** JSONL (one JSON object per line)
|
||||
|
||||
```jsonl
|
||||
{"node_id":"intake-collector","step_index":3,"tool_calls":[{"tool":"web_search","args":{"query":"@RomuloNevesOf"}}],"tool_results":[{"status":"success","data":"..."}],"verdict":"RETRY","verdict_feedback":"Missing required output 'twitter_handles'. You found the handle but didn't call set_output.","llm_response_text":"I found the profile...","tokens_used":1234,"latency_ms":2500}
|
||||
{"node_id":"intake-collector","step_index":4,"tool_calls":[{"tool":"web_search","args":{"query":"@RomuloNevesOf twitter"}}],"tool_results":[{"status":"success","data":"..."}],"verdict":"RETRY","verdict_feedback":"Still missing 'twitter_handles'.","llm_response_text":"Found more info...","tokens_used":1456,"latency_ms":2300}
|
||||
```
|
||||
|
||||
**Why JSONL?**
|
||||
- Incremental append during execution (crash-safe)
|
||||
- No need to parse entire file to add one line
|
||||
- Data persisted immediately, not buffered
|
||||
- Easy to stream/process line-by-line
|
||||
|
||||
---
|
||||
|
||||
## Attention Flags System
|
||||
|
||||
### Automatic Detection
|
||||
|
||||
The runtime logger automatically flags issues based on execution metrics:
|
||||
|
||||
| Trigger | Threshold | Attention Reason | Category |
|
||||
|---------|-----------|------------------|----------|
|
||||
| High retries | `retry_count > 3` | `high_retry_count` | Retry Loops |
|
||||
| Escalations | `escalate_count > 2` | `escalation_pattern` | Guard Failures |
|
||||
| High latency | `latency_ms > 60000` | `high_latency` | High Latency |
|
||||
| Token usage | `tokens_used > 100000` | `high_token_usage` | Memory/Context |
|
||||
| Stalled steps | `total_steps > 20` | `excessive_steps` | Stalled Execution |
|
||||
| Tool errors | `tool_error_count > 0` | `tool_failures` | Tool Errors |
|
||||
| Missing outputs | `exit_status != "success"` | `missing_outputs` | Missing Outputs |
|
||||
|
||||
### Attention Categories
|
||||
|
||||
Used by `/hive-debugger` skill for issue categorization:
|
||||
|
||||
1. **Missing Outputs**: Node didn't set required output keys
|
||||
2. **Tool Errors**: Tool calls failed (API errors, timeouts)
|
||||
3. **Retry Loops**: Judge repeatedly rejecting outputs
|
||||
4. **Guard Failures**: Output validation failed
|
||||
5. **Stalled Execution**: EventLoopNode not making progress
|
||||
6. **High Latency**: Slow tool calls or LLM responses
|
||||
7. **Client-Facing Issues**: Premature set_output before user input
|
||||
8. **Edge Routing Errors**: No edges match current state
|
||||
9. **Memory/Context Issues**: Conversation history too long
|
||||
10. **Constraint Violations**: Agent violated goal-level rules
|
||||
|
||||
---
|
||||
|
||||
## Migration Guide
|
||||
|
||||
### Reading Old Logs
|
||||
|
||||
The system automatically handles both old and new formats:
|
||||
|
||||
```python
|
||||
# MCP tools check both locations automatically
|
||||
result = query_runtime_logs("~/.hive/agents/old_agent")
|
||||
# Returns logs from both:
|
||||
# - ~/.hive/agents/old_agent/runtime_logs/runs/*/
|
||||
# - ~/.hive/agents/old_agent/sessions/session_*/logs/
|
||||
```
|
||||
|
||||
### Deprecation Warnings
|
||||
|
||||
When reading from old locations, deprecation warnings are emitted:
|
||||
|
||||
```
|
||||
DeprecationWarning: Reading logs from deprecated location for run_id=20260101T120000_abc12345.
|
||||
New sessions use unified storage at sessions/session_*/logs/
|
||||
```
|
||||
|
||||
### Migration Script (Optional)
|
||||
|
||||
For migrating existing old logs to new format, see:
|
||||
- `EXECUTION_STORAGE_REDESIGN.md` - Migration strategy
|
||||
- Future: `scripts/migrate_to_unified_sessions.py`
|
||||
|
||||
---
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Write Performance
|
||||
|
||||
- **L3 append**: ~1-2ms per step (sync I/O, thread-safe)
|
||||
- **L2 append**: ~1-2ms per node (sync I/O, thread-safe)
|
||||
- **L1 write**: ~5-10ms at end_run (atomic, async)
|
||||
|
||||
**Overhead:** < 5% of total execution time for typical agents
|
||||
|
||||
### Read Performance
|
||||
|
||||
- **L1 summary**: ~1-5ms (single JSON file)
|
||||
- **L2 details**: ~10-50ms (JSONL, depends on node count)
|
||||
- **L3 raw logs**: ~50-500ms (JSONL, depends on step count)
|
||||
|
||||
**Optimization:** Use filters (node_id, step_index) to reduce data read
|
||||
|
||||
### Storage Size
|
||||
|
||||
Typical session with 5 nodes, 20 steps:
|
||||
|
||||
- **L1 (summary.json)**: ~2-5 KB
|
||||
- **L2 (details.jsonl)**: ~5-10 KB (1-2 KB per node)
|
||||
- **L3 (tool_logs.jsonl)**: ~50-200 KB (2-10 KB per step)
|
||||
|
||||
**Total per session:** ~60-215 KB
|
||||
|
||||
**Compression:** Consider archiving old sessions after 90 days
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Issue: Logs not appearing
|
||||
|
||||
**Symptom:** MCP tools return empty results
|
||||
|
||||
**Check:**
|
||||
1. Verify storage path exists: `~/.hive/agents/{agent_name}/`
|
||||
2. Check session directories: `ls ~/.hive/agents/{agent_name}/sessions/`
|
||||
3. Verify logs directory exists: `ls ~/.hive/agents/{agent_name}/sessions/session_*/logs/`
|
||||
4. Check file permissions
|
||||
|
||||
### Issue: Corrupt JSONL files
|
||||
|
||||
**Symptom:** Partial data or JSON decode errors
|
||||
|
||||
**Cause:** Process crash during write (rare, but possible)
|
||||
|
||||
**Recovery:**
|
||||
```python
|
||||
# MCP tools skip corrupt lines automatically
|
||||
query_runtime_log_details(agent_work_dir, run_id)
|
||||
# Logs warning but continues with valid lines
|
||||
```
|
||||
|
||||
### Issue: High disk usage
|
||||
|
||||
**Symptom:** Storage growing too large
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Archive old sessions
|
||||
cd ~/.hive/agents/{agent_name}/sessions/
|
||||
find . -name "session_2025*" -type d -exec tar -czf archive.tar.gz {} +
|
||||
rm -rf session_2025*
|
||||
|
||||
# Or set up automatic cleanup (future feature)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
**Implementation:**
|
||||
- `core/framework/runtime/runtime_logger.py` - Logger implementation
|
||||
- `core/framework/runtime/runtime_log_store.py` - Storage layer
|
||||
- `core/framework/runtime/runtime_log_schemas.py` - Data schemas
|
||||
- `tools/src/aden_tools/tools/runtime_logs_tool/runtime_logs_tool.py` - MCP query tools
|
||||
|
||||
**Documentation:**
|
||||
- `EXECUTION_STORAGE_REDESIGN.md` - Unified session storage design
|
||||
- `/.claude/skills/hive-debugger/SKILL.md` - Interactive debugging skill
|
||||
|
||||
**Related:**
|
||||
- `core/framework/schemas/session_state.py` - Session state schema
|
||||
- `core/framework/storage/session_store.py` - Session state storage
|
||||
- `core/framework/graph/executor.py` - GraphExecutor integration
|
||||
@@ -18,6 +18,7 @@ from framework.runtime.execution_stream import EntryPointSpec, ExecutionStream
|
||||
from framework.runtime.outcome_aggregator import OutcomeAggregator
|
||||
from framework.runtime.shared_state import SharedStateManager
|
||||
from framework.storage.concurrent import ConcurrentStorage
|
||||
from framework.storage.session_store import SessionStore
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from framework.graph.edge import GraphSpec
|
||||
@@ -100,6 +101,7 @@ class AgentRuntime:
|
||||
tools: list["Tool"] | None = None,
|
||||
tool_executor: Callable | None = None,
|
||||
config: AgentRuntimeConfig | None = None,
|
||||
runtime_log_store: Any = None,
|
||||
):
|
||||
"""
|
||||
Initialize agent runtime.
|
||||
@@ -112,18 +114,24 @@ class AgentRuntime:
|
||||
tools: Available tools
|
||||
tool_executor: Function to execute tools
|
||||
config: Optional runtime configuration
|
||||
runtime_log_store: Optional RuntimeLogStore for per-execution logging
|
||||
"""
|
||||
self.graph = graph
|
||||
self.goal = goal
|
||||
self._config = config or AgentRuntimeConfig()
|
||||
self._runtime_log_store = runtime_log_store
|
||||
|
||||
# Initialize storage
|
||||
storage_path_obj = Path(storage_path) if isinstance(storage_path, str) else storage_path
|
||||
self._storage = ConcurrentStorage(
|
||||
base_path=storage_path,
|
||||
base_path=storage_path_obj,
|
||||
cache_ttl=self._config.cache_ttl,
|
||||
batch_interval=self._config.batch_interval,
|
||||
)
|
||||
|
||||
# Initialize SessionStore for unified sessions (always enabled)
|
||||
self._session_store = SessionStore(storage_path_obj)
|
||||
|
||||
# Initialize shared components
|
||||
self._state_manager = SharedStateManager()
|
||||
self._event_bus = EventBus(max_history=self._config.max_history)
|
||||
@@ -212,6 +220,8 @@ class AgentRuntime:
|
||||
tool_executor=self._tool_executor,
|
||||
result_retention_max=self._config.execution_result_max,
|
||||
result_retention_ttl_seconds=self._config.execution_result_ttl_seconds,
|
||||
runtime_log_store=self._runtime_log_store,
|
||||
session_store=self._session_store,
|
||||
)
|
||||
await stream.start()
|
||||
self._streams[ep_id] = stream
|
||||
@@ -296,6 +306,25 @@ class AgentRuntime:
|
||||
raise ValueError(f"Entry point '{entry_point_id}' not found")
|
||||
return await stream.wait_for_completion(exec_id, timeout)
|
||||
|
||||
async def inject_input(self, node_id: str, content: str) -> bool:
|
||||
"""Inject user input into a running client-facing node.
|
||||
|
||||
Routes input to the EventLoopNode identified by ``node_id``
|
||||
across all active streams. Used by the TUI ChatRepl to deliver
|
||||
user responses during client-facing node execution.
|
||||
|
||||
Args:
|
||||
node_id: The node currently waiting for input
|
||||
content: The user's input text
|
||||
|
||||
Returns:
|
||||
True if input was delivered, False if no matching node found
|
||||
"""
|
||||
for stream in self._streams.values():
|
||||
if await stream.inject_input(node_id, content):
|
||||
return True
|
||||
return False
|
||||
|
||||
async def get_goal_progress(self) -> dict[str, Any]:
|
||||
"""
|
||||
Evaluate goal progress across all streams.
|
||||
@@ -429,11 +458,14 @@ def create_agent_runtime(
|
||||
tools: list["Tool"] | None = None,
|
||||
tool_executor: Callable | None = None,
|
||||
config: AgentRuntimeConfig | None = None,
|
||||
runtime_log_store: Any = None,
|
||||
enable_logging: bool = True,
|
||||
) -> AgentRuntime:
|
||||
"""
|
||||
Create and configure an AgentRuntime with entry points.
|
||||
|
||||
Convenience factory that creates runtime and registers entry points.
|
||||
Runtime logging is enabled by default for observability.
|
||||
|
||||
Args:
|
||||
graph: Graph specification
|
||||
@@ -444,10 +476,21 @@ def create_agent_runtime(
|
||||
tools: Available tools
|
||||
tool_executor: Tool executor function
|
||||
config: Runtime configuration
|
||||
runtime_log_store: Optional RuntimeLogStore for per-execution logging.
|
||||
If None and enable_logging=True, creates one automatically.
|
||||
enable_logging: Whether to enable runtime logging (default: True).
|
||||
Set to False to disable logging entirely.
|
||||
|
||||
Returns:
|
||||
Configured AgentRuntime (not yet started)
|
||||
"""
|
||||
# Auto-create runtime log store if logging is enabled and not provided
|
||||
if enable_logging and runtime_log_store is None:
|
||||
from framework.runtime.runtime_log_store import RuntimeLogStore
|
||||
|
||||
storage_path_obj = Path(storage_path) if isinstance(storage_path, str) else storage_path
|
||||
runtime_log_store = RuntimeLogStore(storage_path_obj / "runtime_logs")
|
||||
|
||||
runtime = AgentRuntime(
|
||||
graph=graph,
|
||||
goal=goal,
|
||||
@@ -456,6 +499,7 @@ def create_agent_runtime(
|
||||
tools=tools,
|
||||
tool_executor=tool_executor,
|
||||
config=config,
|
||||
runtime_log_store=runtime_log_store,
|
||||
)
|
||||
|
||||
for spec in entry_points:
|
||||
|
||||
@@ -28,6 +28,7 @@ if TYPE_CHECKING:
|
||||
from framework.runtime.event_bus import EventBus
|
||||
from framework.runtime.outcome_aggregator import OutcomeAggregator
|
||||
from framework.storage.concurrent import ConcurrentStorage
|
||||
from framework.storage.session_store import SessionStore
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -112,6 +113,8 @@ class ExecutionStream:
|
||||
tool_executor: Callable | None = None,
|
||||
result_retention_max: int | None = 1000,
|
||||
result_retention_ttl_seconds: float | None = None,
|
||||
runtime_log_store: Any = None,
|
||||
session_store: "SessionStore | None" = None,
|
||||
):
|
||||
"""
|
||||
Initialize execution stream.
|
||||
@@ -128,6 +131,8 @@ class ExecutionStream:
|
||||
llm: LLM provider for nodes
|
||||
tools: Available tools
|
||||
tool_executor: Function to execute tools
|
||||
runtime_log_store: Optional RuntimeLogStore for per-execution logging
|
||||
session_store: Optional SessionStore for unified session storage
|
||||
"""
|
||||
self.stream_id = stream_id
|
||||
self.entry_spec = entry_spec
|
||||
@@ -142,6 +147,8 @@ class ExecutionStream:
|
||||
self._tool_executor = tool_executor
|
||||
self._result_retention_max = result_retention_max
|
||||
self._result_retention_ttl_seconds = result_retention_ttl_seconds
|
||||
self._runtime_log_store = runtime_log_store
|
||||
self._session_store = session_store
|
||||
|
||||
# Create stream-scoped runtime
|
||||
self._runtime = StreamRuntime(
|
||||
@@ -153,6 +160,7 @@ class ExecutionStream:
|
||||
# Execution tracking
|
||||
self._active_executions: dict[str, ExecutionContext] = {}
|
||||
self._execution_tasks: dict[str, asyncio.Task] = {}
|
||||
self._active_executors: dict[str, GraphExecutor] = {}
|
||||
self._execution_results: OrderedDict[str, ExecutionResult] = OrderedDict()
|
||||
self._execution_result_times: dict[str, float] = {}
|
||||
self._completion_events: dict[str, asyncio.Event] = {}
|
||||
@@ -220,6 +228,13 @@ class ExecutionStream:
|
||||
await task
|
||||
except asyncio.CancelledError:
|
||||
pass
|
||||
except RuntimeError as e:
|
||||
# Task may be attached to a different event loop (e.g., when TUI
|
||||
# uses a separate loop). Log and continue cleanup.
|
||||
if "attached to a different loop" in str(e):
|
||||
logger.warning(f"Task cleanup skipped (different event loop): {e}")
|
||||
else:
|
||||
raise
|
||||
|
||||
self._execution_tasks.clear()
|
||||
self._active_executions.clear()
|
||||
@@ -237,6 +252,21 @@ class ExecutionStream:
|
||||
)
|
||||
)
|
||||
|
||||
async def inject_input(self, node_id: str, content: str) -> bool:
|
||||
"""Inject user input into a running client-facing EventLoopNode.
|
||||
|
||||
Searches active executors for a node matching ``node_id`` and calls
|
||||
its ``inject_event()`` method to unblock ``_await_user_input()``.
|
||||
|
||||
Returns True if input was delivered, False otherwise.
|
||||
"""
|
||||
for executor in self._active_executors.values():
|
||||
node = executor.node_registry.get(node_id)
|
||||
if node is not None and hasattr(node, "inject_event"):
|
||||
await node.inject_event(content)
|
||||
return True
|
||||
return False
|
||||
|
||||
async def execute(
|
||||
self,
|
||||
input_data: dict[str, Any],
|
||||
@@ -259,8 +289,21 @@ class ExecutionStream:
|
||||
if not self._running:
|
||||
raise RuntimeError(f"ExecutionStream '{self.stream_id}' is not running")
|
||||
|
||||
# Generate execution ID
|
||||
execution_id = f"exec_{self.stream_id}_{uuid.uuid4().hex[:8]}"
|
||||
# Generate execution ID using unified session format
|
||||
if self._session_store:
|
||||
execution_id = self._session_store.generate_session_id()
|
||||
else:
|
||||
# Fallback to old format if SessionStore not available (shouldn't happen)
|
||||
import warnings
|
||||
|
||||
warnings.warn(
|
||||
"SessionStore not available, using deprecated exec_* ID format. "
|
||||
"Please ensure AgentRuntime is properly initialized.",
|
||||
DeprecationWarning,
|
||||
stacklevel=2,
|
||||
)
|
||||
execution_id = f"exec_{self.stream_id}_{uuid.uuid4().hex[:8]}"
|
||||
|
||||
if correlation_id is None:
|
||||
correlation_id = execution_id
|
||||
|
||||
@@ -314,13 +357,38 @@ class ExecutionStream:
|
||||
# Create runtime adapter for this execution
|
||||
runtime_adapter = StreamRuntimeAdapter(self._runtime, execution_id)
|
||||
|
||||
# Create executor for this execution
|
||||
# Create per-execution runtime logger
|
||||
runtime_logger = None
|
||||
if self._runtime_log_store:
|
||||
from framework.runtime.runtime_logger import RuntimeLogger
|
||||
|
||||
runtime_logger = RuntimeLogger(
|
||||
store=self._runtime_log_store, agent_id=self.graph.id
|
||||
)
|
||||
|
||||
# Create executor for this execution.
|
||||
# Each execution gets its own storage under sessions/{exec_id}/
|
||||
# so conversations, spillover, and data files are all scoped
|
||||
# to this execution. The executor sets data_dir via execution
|
||||
# context (contextvars) so data tools and spillover share the
|
||||
# same session-scoped directory.
|
||||
exec_storage = self._storage.base_path / "sessions" / execution_id
|
||||
executor = GraphExecutor(
|
||||
runtime=runtime_adapter,
|
||||
llm=self._llm,
|
||||
tools=self._tools,
|
||||
tool_executor=self._tool_executor,
|
||||
event_bus=self._event_bus,
|
||||
stream_id=self.stream_id,
|
||||
storage_path=exec_storage,
|
||||
runtime_logger=runtime_logger,
|
||||
loop_config=self.graph.loop_config,
|
||||
)
|
||||
# Track executor so inject_input() can reach EventLoopNode instances
|
||||
self._active_executors[execution_id] = executor
|
||||
|
||||
# Write initial session state
|
||||
await self._write_session_state(execution_id, ctx)
|
||||
|
||||
# Create modified graph with entry point
|
||||
# We need to override the entry_node to use our entry point
|
||||
@@ -334,6 +402,9 @@ class ExecutionStream:
|
||||
session_state=ctx.session_state,
|
||||
)
|
||||
|
||||
# Clean up executor reference
|
||||
self._active_executors.pop(execution_id, None)
|
||||
|
||||
# Store result with retention
|
||||
self._record_execution_result(execution_id, result)
|
||||
|
||||
@@ -343,6 +414,9 @@ class ExecutionStream:
|
||||
if result.paused_at:
|
||||
ctx.status = "paused"
|
||||
|
||||
# Write final session state
|
||||
await self._write_session_state(execution_id, ctx, result=result)
|
||||
|
||||
# Emit completion/failure event
|
||||
if self._event_bus:
|
||||
if result.success:
|
||||
@@ -379,6 +453,9 @@ class ExecutionStream:
|
||||
),
|
||||
)
|
||||
|
||||
# Write error session state
|
||||
await self._write_session_state(execution_id, ctx, error=str(e))
|
||||
|
||||
# Emit failure event
|
||||
if self._event_bus:
|
||||
await self._event_bus.emit_execution_failed(
|
||||
@@ -402,6 +479,88 @@ class ExecutionStream:
|
||||
self._completion_events.pop(execution_id, None)
|
||||
self._execution_tasks.pop(execution_id, None)
|
||||
|
||||
async def _write_session_state(
|
||||
self,
|
||||
execution_id: str,
|
||||
ctx: ExecutionContext,
|
||||
result: ExecutionResult | None = None,
|
||||
error: str | None = None,
|
||||
) -> None:
|
||||
"""
|
||||
Write state.json for a session.
|
||||
|
||||
Args:
|
||||
execution_id: Session/execution ID
|
||||
ctx: Execution context
|
||||
result: Optional execution result (if completed)
|
||||
error: Optional error message (if failed)
|
||||
"""
|
||||
# Only write if session_store is available
|
||||
if not self._session_store:
|
||||
return
|
||||
|
||||
from framework.schemas.session_state import SessionState, SessionStatus
|
||||
|
||||
try:
|
||||
# Determine status
|
||||
if result:
|
||||
if result.paused_at:
|
||||
status = SessionStatus.PAUSED
|
||||
elif result.success:
|
||||
status = SessionStatus.COMPLETED
|
||||
else:
|
||||
status = SessionStatus.FAILED
|
||||
elif error:
|
||||
status = SessionStatus.FAILED
|
||||
else:
|
||||
status = SessionStatus.ACTIVE
|
||||
|
||||
# Create SessionState
|
||||
if result:
|
||||
# Create from execution result
|
||||
state = SessionState.from_execution_result(
|
||||
session_id=execution_id,
|
||||
goal_id=self.goal.id,
|
||||
result=result,
|
||||
stream_id=self.stream_id,
|
||||
correlation_id=ctx.correlation_id,
|
||||
started_at=ctx.started_at.isoformat(),
|
||||
input_data=ctx.input_data,
|
||||
agent_id=self.graph.id,
|
||||
entry_point=self.entry_spec.id,
|
||||
)
|
||||
else:
|
||||
# Create initial state
|
||||
from framework.schemas.session_state import SessionTimestamps
|
||||
|
||||
now = datetime.now().isoformat()
|
||||
state = SessionState(
|
||||
session_id=execution_id,
|
||||
stream_id=self.stream_id,
|
||||
correlation_id=ctx.correlation_id,
|
||||
goal_id=self.goal.id,
|
||||
agent_id=self.graph.id,
|
||||
entry_point=self.entry_spec.id,
|
||||
status=status,
|
||||
timestamps=SessionTimestamps(
|
||||
started_at=ctx.started_at.isoformat(),
|
||||
updated_at=now,
|
||||
),
|
||||
input_data=ctx.input_data,
|
||||
)
|
||||
|
||||
# Handle error case
|
||||
if error:
|
||||
state.result.error = error
|
||||
|
||||
# Write state.json
|
||||
await self._session_store.write_state(execution_id, state)
|
||||
logger.debug(f"Wrote state.json for session {execution_id} (status={status})")
|
||||
|
||||
except Exception as e:
|
||||
# Log but don't fail the execution
|
||||
logger.error(f"Failed to write state.json for {execution_id}: {e}")
|
||||
|
||||
def _create_modified_graph(self) -> "GraphSpec":
|
||||
"""Create a graph with the entry point overridden."""
|
||||
# Use the existing graph but override entry_node
|
||||
|
||||
@@ -0,0 +1,122 @@
|
||||
"""Pydantic models for the three-level runtime logging system.
|
||||
|
||||
Level 1 - SUMMARY: Per graph run pass/fail, token counts, timing
|
||||
Level 2 - DETAILS: Per node completion results and attention flags
|
||||
Level 3 - TOOL LOGS: Per step within any node (tool calls, LLM text, tokens)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Level 3: Tool logs (most granular) — per step within any node
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class ToolCallLog(BaseModel):
|
||||
"""A single tool call within a step."""
|
||||
|
||||
tool_use_id: str
|
||||
tool_name: str
|
||||
tool_input: dict[str, Any] = Field(default_factory=dict)
|
||||
result: str = ""
|
||||
is_error: bool = False
|
||||
|
||||
|
||||
class NodeStepLog(BaseModel):
|
||||
"""Full tool and LLM details for one step within a node.
|
||||
|
||||
For EventLoopNode, each iteration is a step. For single-step nodes
|
||||
(LLMNode, FunctionNode, RouterNode), step_index is 0.
|
||||
"""
|
||||
|
||||
node_id: str
|
||||
node_type: str = "" # "event_loop"|"llm_tool_use"|"llm_generate"|"function"|"router"
|
||||
step_index: int = 0 # iteration number for event_loop, 0 for single-step nodes
|
||||
llm_text: str = ""
|
||||
tool_calls: list[ToolCallLog] = Field(default_factory=list)
|
||||
input_tokens: int = 0
|
||||
output_tokens: int = 0
|
||||
latency_ms: int = 0
|
||||
# EventLoopNode only:
|
||||
verdict: str = "" # "ACCEPT"|"RETRY"|"ESCALATE"|"CONTINUE"
|
||||
verdict_feedback: str = ""
|
||||
# Error tracking:
|
||||
error: str = "" # Error message if step failed
|
||||
stacktrace: str = "" # Full stack trace if exception occurred
|
||||
is_partial: bool = False # True if step didn't complete normally
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Level 2: Per-node completion details
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class NodeDetail(BaseModel):
|
||||
"""Per-node completion result and attention flags."""
|
||||
|
||||
node_id: str
|
||||
node_name: str = ""
|
||||
node_type: str = ""
|
||||
success: bool = True
|
||||
error: str | None = None
|
||||
stacktrace: str = "" # Full stack trace if exception occurred
|
||||
total_steps: int = 0
|
||||
tokens_used: int = 0 # combined input+output from NodeResult
|
||||
input_tokens: int = 0
|
||||
output_tokens: int = 0
|
||||
latency_ms: int = 0
|
||||
attempt: int = 1 # retry attempt number
|
||||
# EventLoopNode-specific:
|
||||
exit_status: str = "" # "success"|"failure"|"stalled"|"escalated"|"paused"|"guard_failure"
|
||||
accept_count: int = 0
|
||||
retry_count: int = 0
|
||||
escalate_count: int = 0
|
||||
continue_count: int = 0
|
||||
needs_attention: bool = False
|
||||
attention_reasons: list[str] = Field(default_factory=list)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Level 1: Run summary — one per full graph execution
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class RunSummaryLog(BaseModel):
|
||||
"""Run-level summary for a full graph execution."""
|
||||
|
||||
run_id: str
|
||||
agent_id: str = ""
|
||||
goal_id: str = ""
|
||||
status: str = "" # "success"|"failure"|"degraded"
|
||||
total_nodes_executed: int = 0
|
||||
node_path: list[str] = Field(default_factory=list)
|
||||
total_input_tokens: int = 0
|
||||
total_output_tokens: int = 0
|
||||
needs_attention: bool = False
|
||||
attention_reasons: list[str] = Field(default_factory=list)
|
||||
started_at: str = "" # ISO timestamp
|
||||
duration_ms: int = 0
|
||||
execution_quality: str = "" # "clean"|"degraded"|"failed"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Container models for file serialization
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class RunDetailsLog(BaseModel):
|
||||
"""Level 2 container: all node details for a run."""
|
||||
|
||||
run_id: str
|
||||
nodes: list[NodeDetail] = Field(default_factory=list)
|
||||
|
||||
|
||||
class RunToolLogs(BaseModel):
|
||||
"""Level 3 container: all step logs for a run."""
|
||||
|
||||
run_id: str
|
||||
steps: list[NodeStepLog] = Field(default_factory=list)
|
||||
@@ -0,0 +1,306 @@
|
||||
"""File-based storage for runtime logs.
|
||||
|
||||
Each run gets its own directory under ``runs/``. No shared mutable index —
|
||||
``list_runs()`` scans the directory and loads summary.json from each run.
|
||||
This eliminates concurrency issues when parallel EventLoopNodes write
|
||||
simultaneously.
|
||||
|
||||
L2 (details) and L3 (tool logs) use JSONL (one JSON object per line) for
|
||||
incremental append-on-write. This provides crash resilience — data is on
|
||||
disk as soon as it's logged, not only at end_run(). L1 (summary) is still
|
||||
written once at end as a regular JSON file since it aggregates L2.
|
||||
|
||||
Storage layout (current)::
|
||||
|
||||
{base_path}/
|
||||
sessions/
|
||||
{session_id}/
|
||||
logs/
|
||||
summary.json # Level 1 — written once at end
|
||||
details.jsonl # Level 2 — appended per node completion
|
||||
tool_logs.jsonl # Level 3 — appended per step
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
from datetime import UTC, datetime
|
||||
from pathlib import Path
|
||||
|
||||
from framework.runtime.runtime_log_schemas import (
|
||||
NodeDetail,
|
||||
NodeStepLog,
|
||||
RunDetailsLog,
|
||||
RunSummaryLog,
|
||||
RunToolLogs,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class RuntimeLogStore:
|
||||
"""Persists runtime logs at three levels. Thread-safe via per-run directories."""
|
||||
|
||||
def __init__(self, base_path: Path) -> None:
|
||||
self._base_path = base_path
|
||||
# Note: _runs_dir is determined per-run_id by _get_run_dir()
|
||||
|
||||
def _get_run_dir(self, run_id: str) -> Path:
|
||||
"""Determine run directory path based on run_id format.
|
||||
|
||||
- New format (session_*): {storage_root}/sessions/{run_id}/logs/
|
||||
- Old format (anything else): {base_path}/runs/{run_id}/ (deprecated)
|
||||
|
||||
When base_path ends with 'runtime_logs', we use the parent directory
|
||||
to avoid nesting under runtime_logs/.
|
||||
|
||||
This allows backward compatibility for reading old logs.
|
||||
"""
|
||||
if run_id.startswith("session_"):
|
||||
# New: sessions/{session_id}/logs/
|
||||
# If base_path ends with runtime_logs, use parent (storage root)
|
||||
is_runtime_logs = self._base_path.name == "runtime_logs"
|
||||
root = self._base_path.parent if is_runtime_logs else self._base_path
|
||||
return root / "sessions" / run_id / "logs"
|
||||
else:
|
||||
# Old: runs/{run_id}/ (deprecated, backward compatibility only)
|
||||
import warnings
|
||||
|
||||
warnings.warn(
|
||||
f"Reading logs from deprecated location for run_id={run_id}. "
|
||||
"New sessions use unified storage at sessions/session_*/logs/",
|
||||
DeprecationWarning,
|
||||
stacklevel=3,
|
||||
)
|
||||
return self._base_path / "runs" / run_id
|
||||
|
||||
# -------------------------------------------------------------------
|
||||
# Incremental write (sync — called from locked sections)
|
||||
# -------------------------------------------------------------------
|
||||
|
||||
def ensure_run_dir(self, run_id: str) -> None:
|
||||
"""Create the run directory immediately. Called by start_run()."""
|
||||
run_dir = self._get_run_dir(run_id)
|
||||
run_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
def append_step(self, run_id: str, step: NodeStepLog) -> None:
|
||||
"""Append one JSONL line to tool_logs.jsonl. Sync."""
|
||||
path = self._get_run_dir(run_id) / "tool_logs.jsonl"
|
||||
line = json.dumps(step.model_dump(), ensure_ascii=False) + "\n"
|
||||
with open(path, "a", encoding="utf-8") as f:
|
||||
f.write(line)
|
||||
|
||||
def append_node_detail(self, run_id: str, detail: NodeDetail) -> None:
|
||||
"""Append one JSONL line to details.jsonl. Sync."""
|
||||
path = self._get_run_dir(run_id) / "details.jsonl"
|
||||
line = json.dumps(detail.model_dump(), ensure_ascii=False) + "\n"
|
||||
with open(path, "a", encoding="utf-8") as f:
|
||||
f.write(line)
|
||||
|
||||
def read_node_details_sync(self, run_id: str) -> list[NodeDetail]:
|
||||
"""Read details.jsonl back into a list of NodeDetail. Sync.
|
||||
|
||||
Used by end_run() to aggregate L2 into L1. Skips corrupt lines.
|
||||
"""
|
||||
path = self._get_run_dir(run_id) / "details.jsonl"
|
||||
return _read_jsonl_as_models(path, NodeDetail)
|
||||
|
||||
# -------------------------------------------------------------------
|
||||
# Summary write (async — called from end_run)
|
||||
# -------------------------------------------------------------------
|
||||
|
||||
async def save_summary(self, run_id: str, summary: RunSummaryLog) -> None:
|
||||
"""Write summary.json atomically. Called once at end_run()."""
|
||||
run_dir = self._get_run_dir(run_id)
|
||||
await asyncio.to_thread(run_dir.mkdir, parents=True, exist_ok=True)
|
||||
await self._write_json(run_dir / "summary.json", summary.model_dump())
|
||||
|
||||
# -------------------------------------------------------------------
|
||||
# Read
|
||||
# -------------------------------------------------------------------
|
||||
|
||||
async def load_summary(self, run_id: str) -> RunSummaryLog | None:
|
||||
"""Load Level 1 summary for a specific run."""
|
||||
data = await self._read_json(self._get_run_dir(run_id) / "summary.json")
|
||||
return RunSummaryLog(**data) if data is not None else None
|
||||
|
||||
async def load_details(self, run_id: str) -> RunDetailsLog | None:
|
||||
"""Load Level 2 details from details.jsonl for a specific run."""
|
||||
path = self._get_run_dir(run_id) / "details.jsonl"
|
||||
|
||||
def _read() -> RunDetailsLog | None:
|
||||
if not path.exists():
|
||||
return None
|
||||
nodes = _read_jsonl_as_models(path, NodeDetail)
|
||||
return RunDetailsLog(run_id=run_id, nodes=nodes)
|
||||
|
||||
return await asyncio.to_thread(_read)
|
||||
|
||||
async def load_tool_logs(self, run_id: str) -> RunToolLogs | None:
|
||||
"""Load Level 3 tool logs from tool_logs.jsonl for a specific run."""
|
||||
path = self._get_run_dir(run_id) / "tool_logs.jsonl"
|
||||
|
||||
def _read() -> RunToolLogs | None:
|
||||
if not path.exists():
|
||||
return None
|
||||
steps = _read_jsonl_as_models(path, NodeStepLog)
|
||||
return RunToolLogs(run_id=run_id, steps=steps)
|
||||
|
||||
return await asyncio.to_thread(_read)
|
||||
|
||||
async def list_runs(
|
||||
self,
|
||||
status: str = "",
|
||||
needs_attention: bool | None = None,
|
||||
limit: int = 20,
|
||||
) -> list[RunSummaryLog]:
|
||||
"""Scan both old and new directory structures, load summaries, filter, and sort.
|
||||
|
||||
Scans:
|
||||
- Old: base_path/runs/{run_id}/
|
||||
- New: base_path/sessions/{session_id}/logs/
|
||||
|
||||
Directories without summary.json are treated as in-progress runs and
|
||||
get a synthetic summary with status="in_progress".
|
||||
"""
|
||||
entries = await asyncio.to_thread(self._scan_run_dirs)
|
||||
summaries: list[RunSummaryLog] = []
|
||||
|
||||
for run_id in entries:
|
||||
summary = await self.load_summary(run_id)
|
||||
if summary is None:
|
||||
# In-progress run: no summary.json yet. Synthesize one.
|
||||
run_dir = self._get_run_dir(run_id)
|
||||
if not run_dir.is_dir():
|
||||
continue
|
||||
summary = RunSummaryLog(
|
||||
run_id=run_id,
|
||||
status="in_progress",
|
||||
started_at=_infer_started_at(run_id),
|
||||
)
|
||||
if status and status != "needs_attention" and summary.status != status:
|
||||
continue
|
||||
if status == "needs_attention" and not summary.needs_attention:
|
||||
continue
|
||||
if needs_attention is not None and summary.needs_attention != needs_attention:
|
||||
continue
|
||||
summaries.append(summary)
|
||||
|
||||
# Sort by started_at descending (most recent first)
|
||||
summaries.sort(key=lambda s: s.started_at, reverse=True)
|
||||
return summaries[:limit]
|
||||
|
||||
# -------------------------------------------------------------------
|
||||
# Internal helpers
|
||||
# -------------------------------------------------------------------
|
||||
|
||||
def _scan_run_dirs(self) -> list[str]:
|
||||
"""Return list of run_id directory names from both old and new locations.
|
||||
|
||||
Scans:
|
||||
- New: base_path/sessions/{session_id}/logs/ (preferred)
|
||||
- Old: base_path/runs/{run_id}/ (deprecated, backward compatibility)
|
||||
|
||||
Returns run_ids/session_ids. Includes all directories, not just those
|
||||
with summary.json, so in-progress runs are visible.
|
||||
"""
|
||||
run_ids = []
|
||||
|
||||
# Scan new location: base_path/sessions/{session_id}/logs/
|
||||
# Determine the correct base path for sessions
|
||||
is_runtime_logs = self._base_path.name == "runtime_logs"
|
||||
root = self._base_path.parent if is_runtime_logs else self._base_path
|
||||
sessions_dir = root / "sessions"
|
||||
|
||||
if sessions_dir.exists():
|
||||
for session_dir in sessions_dir.iterdir():
|
||||
if session_dir.is_dir() and session_dir.name.startswith("session_"):
|
||||
logs_dir = session_dir / "logs"
|
||||
if logs_dir.exists() and logs_dir.is_dir():
|
||||
run_ids.append(session_dir.name)
|
||||
|
||||
# Scan old location: base_path/runs/ (deprecated)
|
||||
old_runs_dir = self._base_path / "runs"
|
||||
if old_runs_dir.exists():
|
||||
old_ids = [d.name for d in old_runs_dir.iterdir() if d.is_dir()]
|
||||
if old_ids:
|
||||
import warnings
|
||||
|
||||
warnings.warn(
|
||||
f"Found {len(old_ids)} runs in deprecated location. "
|
||||
"Consider migrating to unified session storage.",
|
||||
DeprecationWarning,
|
||||
stacklevel=3,
|
||||
)
|
||||
run_ids.extend(old_ids)
|
||||
|
||||
return run_ids
|
||||
|
||||
@staticmethod
|
||||
async def _write_json(path: Path, data: dict) -> None:
|
||||
"""Write JSON atomically: write to .tmp then rename."""
|
||||
tmp = path.with_suffix(".tmp")
|
||||
content = json.dumps(data, indent=2, ensure_ascii=False)
|
||||
|
||||
def _write() -> None:
|
||||
tmp.write_text(content, encoding="utf-8")
|
||||
tmp.rename(path)
|
||||
|
||||
await asyncio.to_thread(_write)
|
||||
|
||||
@staticmethod
|
||||
async def _read_json(path: Path) -> dict | None:
|
||||
"""Read and parse a JSON file. Returns None if missing or corrupt."""
|
||||
|
||||
def _read() -> dict | None:
|
||||
if not path.exists():
|
||||
return None
|
||||
try:
|
||||
return json.loads(path.read_text(encoding="utf-8"))
|
||||
except (json.JSONDecodeError, OSError) as e:
|
||||
logger.warning("Failed to read %s: %s", path, e)
|
||||
return None
|
||||
|
||||
return await asyncio.to_thread(_read)
|
||||
|
||||
|
||||
# -------------------------------------------------------------------
|
||||
# Module-level helpers
|
||||
# -------------------------------------------------------------------
|
||||
|
||||
|
||||
def _read_jsonl_as_models(path: Path, model_cls: type) -> list:
|
||||
"""Parse a JSONL file into a list of Pydantic model instances.
|
||||
|
||||
Skips blank lines and corrupt JSON lines (partial writes from crashes).
|
||||
"""
|
||||
results = []
|
||||
if not path.exists():
|
||||
return results
|
||||
try:
|
||||
with open(path, encoding="utf-8") as f:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
try:
|
||||
data = json.loads(line)
|
||||
results.append(model_cls(**data))
|
||||
except (json.JSONDecodeError, Exception) as e:
|
||||
logger.warning("Skipping corrupt JSONL line in %s: %s", path, e)
|
||||
continue
|
||||
except OSError as e:
|
||||
logger.warning("Failed to read %s: %s", path, e)
|
||||
return results
|
||||
|
||||
|
||||
def _infer_started_at(run_id: str) -> str:
|
||||
"""Best-effort ISO timestamp from a run_id like '20250101T120000_abc12345'."""
|
||||
try:
|
||||
ts_part = run_id.split("_")[0] # '20250101T120000'
|
||||
dt = datetime.strptime(ts_part, "%Y%m%dT%H%M%S").replace(tzinfo=UTC)
|
||||
return dt.isoformat()
|
||||
except (ValueError, IndexError):
|
||||
return ""
|
||||
@@ -0,0 +1,304 @@
|
||||
"""RuntimeLogger: captures runtime data during graph execution.
|
||||
|
||||
Injected into GraphExecutor as an optional parameter. Each log_step() and
|
||||
log_node_complete() call writes immediately to disk (JSONL append). Only
|
||||
the L1 summary is written at end_run() since it aggregates L2 data.
|
||||
|
||||
This provides crash resilience — L2 and L3 data survives process death
|
||||
without needing end_run() to complete.
|
||||
|
||||
Usage::
|
||||
|
||||
store = RuntimeLogStore(Path(work_dir) / "runtime_logs")
|
||||
runtime_logger = RuntimeLogger(store=store, agent_id="my-agent")
|
||||
executor = GraphExecutor(..., runtime_logger=runtime_logger)
|
||||
# After execution, logger has persisted all data to store
|
||||
|
||||
Safety: ``end_run()`` catches all exceptions internally and logs them via
|
||||
the Python logger. Logging failure must never kill a successful run.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import threading
|
||||
import uuid
|
||||
from datetime import UTC, datetime
|
||||
from typing import Any
|
||||
|
||||
from framework.runtime.runtime_log_schemas import (
|
||||
NodeDetail,
|
||||
NodeStepLog,
|
||||
RunSummaryLog,
|
||||
ToolCallLog,
|
||||
)
|
||||
from framework.runtime.runtime_log_store import RuntimeLogStore
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class RuntimeLogger:
|
||||
"""Captures runtime data during graph execution.
|
||||
|
||||
Thread-safe: uses a lock around file appends for parallel node safety.
|
||||
"""
|
||||
|
||||
def __init__(self, store: RuntimeLogStore, agent_id: str = "") -> None:
|
||||
self._store = store
|
||||
self._agent_id = agent_id
|
||||
self._run_id = ""
|
||||
self._goal_id = ""
|
||||
self._started_at = ""
|
||||
self._logged_node_ids: set[str] = set()
|
||||
self._lock = threading.Lock()
|
||||
|
||||
def start_run(self, goal_id: str = "", session_id: str = "") -> str:
|
||||
"""Start a new run. Called by GraphExecutor at graph start. Returns run_id.
|
||||
|
||||
Args:
|
||||
goal_id: Goal ID for this run
|
||||
session_id: Optional session ID. If provided, uses it as run_id (for unified sessions).
|
||||
Otherwise generates a new run_id in old format.
|
||||
|
||||
Returns:
|
||||
The run_id (same as session_id if provided)
|
||||
"""
|
||||
if session_id:
|
||||
# Use provided session_id as run_id (unified sessions)
|
||||
self._run_id = session_id
|
||||
else:
|
||||
# Generate run_id in old format (backward compatibility)
|
||||
ts = datetime.now(UTC).strftime("%Y%m%dT%H%M%S")
|
||||
short_uuid = uuid.uuid4().hex[:8]
|
||||
self._run_id = f"{ts}_{short_uuid}"
|
||||
|
||||
self._goal_id = goal_id
|
||||
self._started_at = datetime.now(UTC).isoformat()
|
||||
self._logged_node_ids = set()
|
||||
self._store.ensure_run_dir(self._run_id)
|
||||
return self._run_id
|
||||
|
||||
def log_step(
|
||||
self,
|
||||
node_id: str,
|
||||
node_type: str,
|
||||
step_index: int,
|
||||
llm_text: str = "",
|
||||
tool_calls: list[dict[str, Any]] | None = None,
|
||||
input_tokens: int = 0,
|
||||
output_tokens: int = 0,
|
||||
latency_ms: int = 0,
|
||||
verdict: str = "",
|
||||
verdict_feedback: str = "",
|
||||
error: str = "",
|
||||
stacktrace: str = "",
|
||||
is_partial: bool = False,
|
||||
) -> None:
|
||||
"""Record data for one step within a node.
|
||||
|
||||
Called by any node during execution. Synchronous, appends to JSONL file.
|
||||
|
||||
Args:
|
||||
error: Error message if step failed
|
||||
stacktrace: Full stack trace if exception occurred
|
||||
is_partial: True if step didn't complete normally (e.g., LLM call crashed)
|
||||
"""
|
||||
if tool_calls is None:
|
||||
tool_calls = []
|
||||
|
||||
call_logs = []
|
||||
for tc in tool_calls:
|
||||
call_logs.append(
|
||||
ToolCallLog(
|
||||
tool_use_id=tc.get("tool_use_id", ""),
|
||||
tool_name=tc.get("tool_name", ""),
|
||||
tool_input=tc.get("tool_input", {}),
|
||||
result=tc.get("content", ""),
|
||||
is_error=tc.get("is_error", False),
|
||||
)
|
||||
)
|
||||
|
||||
step_log = NodeStepLog(
|
||||
node_id=node_id,
|
||||
node_type=node_type,
|
||||
step_index=step_index,
|
||||
llm_text=llm_text,
|
||||
tool_calls=call_logs,
|
||||
input_tokens=input_tokens,
|
||||
output_tokens=output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
verdict=verdict,
|
||||
verdict_feedback=verdict_feedback,
|
||||
error=error,
|
||||
stacktrace=stacktrace,
|
||||
is_partial=is_partial,
|
||||
)
|
||||
|
||||
with self._lock:
|
||||
self._store.append_step(self._run_id, step_log)
|
||||
|
||||
def log_node_complete(
|
||||
self,
|
||||
node_id: str,
|
||||
node_name: str,
|
||||
node_type: str,
|
||||
success: bool,
|
||||
error: str | None = None,
|
||||
stacktrace: str = "",
|
||||
total_steps: int = 0,
|
||||
tokens_used: int = 0,
|
||||
input_tokens: int = 0,
|
||||
output_tokens: int = 0,
|
||||
latency_ms: int = 0,
|
||||
attempt: int = 1,
|
||||
# EventLoopNode-specific kwargs:
|
||||
exit_status: str = "",
|
||||
accept_count: int = 0,
|
||||
retry_count: int = 0,
|
||||
escalate_count: int = 0,
|
||||
continue_count: int = 0,
|
||||
) -> None:
|
||||
"""Record completion of a node.
|
||||
|
||||
Called after each node completes. EventLoopNode calls this with
|
||||
verdict counts and exit_status. Other nodes: executor calls this
|
||||
from NodeResult data.
|
||||
"""
|
||||
needs_attention = not success
|
||||
attention_reasons: list[str] = []
|
||||
if not success and error:
|
||||
attention_reasons.append(f"Node {node_id} failed: {error}")
|
||||
|
||||
# Enhanced attention flags
|
||||
if retry_count > 3:
|
||||
needs_attention = True
|
||||
attention_reasons.append(f"Excessive retries: {retry_count}")
|
||||
|
||||
if escalate_count > 2:
|
||||
needs_attention = True
|
||||
attention_reasons.append(f"Excessive escalations: {escalate_count}")
|
||||
|
||||
if latency_ms > 60000: # > 1 minute
|
||||
needs_attention = True
|
||||
attention_reasons.append(f"High latency: {latency_ms}ms")
|
||||
|
||||
if tokens_used > 100000: # High token usage
|
||||
needs_attention = True
|
||||
attention_reasons.append(f"High token usage: {tokens_used}")
|
||||
|
||||
if total_steps > 20: # Many iterations
|
||||
needs_attention = True
|
||||
attention_reasons.append(f"Many iterations: {total_steps}")
|
||||
|
||||
detail = NodeDetail(
|
||||
node_id=node_id,
|
||||
node_name=node_name,
|
||||
node_type=node_type,
|
||||
success=success,
|
||||
error=error,
|
||||
stacktrace=stacktrace,
|
||||
total_steps=total_steps,
|
||||
tokens_used=tokens_used,
|
||||
input_tokens=input_tokens,
|
||||
output_tokens=output_tokens,
|
||||
latency_ms=latency_ms,
|
||||
attempt=attempt,
|
||||
exit_status=exit_status,
|
||||
accept_count=accept_count,
|
||||
retry_count=retry_count,
|
||||
escalate_count=escalate_count,
|
||||
continue_count=continue_count,
|
||||
needs_attention=needs_attention,
|
||||
attention_reasons=attention_reasons,
|
||||
)
|
||||
|
||||
with self._lock:
|
||||
self._store.append_node_detail(self._run_id, detail)
|
||||
self._logged_node_ids.add(node_id)
|
||||
|
||||
def ensure_node_logged(
|
||||
self,
|
||||
node_id: str,
|
||||
node_name: str,
|
||||
node_type: str,
|
||||
success: bool,
|
||||
error: str | None = None,
|
||||
stacktrace: str = "",
|
||||
tokens_used: int = 0,
|
||||
latency_ms: int = 0,
|
||||
) -> None:
|
||||
"""Fallback: ensure a node has an L2 entry.
|
||||
|
||||
Called by executor after each node returns. If node_id already
|
||||
appears in _logged_node_ids (because the node called log_node_complete
|
||||
itself), this is a no-op. Otherwise appends a basic NodeDetail.
|
||||
"""
|
||||
with self._lock:
|
||||
if node_id in self._logged_node_ids:
|
||||
return # Already logged by the node itself
|
||||
|
||||
# Not yet logged — create a basic entry
|
||||
self.log_node_complete(
|
||||
node_id=node_id,
|
||||
node_name=node_name,
|
||||
node_type=node_type,
|
||||
success=success,
|
||||
error=error,
|
||||
stacktrace=stacktrace,
|
||||
tokens_used=tokens_used,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
async def end_run(
|
||||
self,
|
||||
status: str,
|
||||
duration_ms: int,
|
||||
node_path: list[str] | None = None,
|
||||
execution_quality: str = "",
|
||||
) -> None:
|
||||
"""Read L2 from disk, aggregate into L1, write summary.json.
|
||||
|
||||
Called by GraphExecutor when graph finishes. Async, writes 1 file.
|
||||
Catches all exceptions internally -- logging failure must not
|
||||
propagate to the caller.
|
||||
"""
|
||||
try:
|
||||
# Read L2 back from disk to aggregate into L1
|
||||
node_details = self._store.read_node_details_sync(self._run_id)
|
||||
|
||||
total_input = sum(nd.input_tokens for nd in node_details)
|
||||
total_output = sum(nd.output_tokens for nd in node_details)
|
||||
|
||||
needs_attention = any(nd.needs_attention for nd in node_details)
|
||||
attention_reasons: list[str] = []
|
||||
for nd in node_details:
|
||||
attention_reasons.extend(nd.attention_reasons)
|
||||
|
||||
summary = RunSummaryLog(
|
||||
run_id=self._run_id,
|
||||
agent_id=self._agent_id,
|
||||
goal_id=self._goal_id,
|
||||
status=status,
|
||||
total_nodes_executed=len(node_details),
|
||||
node_path=node_path or [],
|
||||
total_input_tokens=total_input,
|
||||
total_output_tokens=total_output,
|
||||
needs_attention=needs_attention,
|
||||
attention_reasons=attention_reasons,
|
||||
started_at=self._started_at,
|
||||
duration_ms=duration_ms,
|
||||
execution_quality=execution_quality,
|
||||
)
|
||||
|
||||
await self._store.save_summary(self._run_id, summary)
|
||||
logger.info(
|
||||
"Runtime logs saved: run_id=%s status=%s nodes=%d",
|
||||
self._run_id,
|
||||
status,
|
||||
len(node_details),
|
||||
)
|
||||
except Exception:
|
||||
logger.exception(
|
||||
"Failed to save runtime logs for run_id=%s (non-fatal)",
|
||||
self._run_id,
|
||||
)
|
||||
@@ -0,0 +1,274 @@
|
||||
"""
|
||||
Session State Schema - Unified state for session execution.
|
||||
|
||||
This schema consolidates data from Run, ExecutionResult, and runtime logs
|
||||
into a single source of truth for session status and resumability.
|
||||
"""
|
||||
|
||||
from datetime import datetime
|
||||
from enum import StrEnum
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from pydantic import BaseModel, Field, computed_field
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from framework.graph.executor import ExecutionResult
|
||||
from framework.schemas.run import Run
|
||||
|
||||
|
||||
class SessionStatus(StrEnum):
|
||||
"""Status of a session execution."""
|
||||
|
||||
ACTIVE = "active" # Currently executing
|
||||
PAUSED = "paused" # Waiting for resume (client input, pause node)
|
||||
COMPLETED = "completed" # Finished successfully
|
||||
FAILED = "failed" # Finished with error
|
||||
CANCELLED = "cancelled" # User/system cancelled
|
||||
|
||||
|
||||
class SessionTimestamps(BaseModel):
|
||||
"""Timestamps tracking session lifecycle."""
|
||||
|
||||
started_at: str # ISO 8601 format
|
||||
updated_at: str # ISO 8601 format (updated on every state write)
|
||||
completed_at: str | None = None
|
||||
paused_at_time: str | None = None # When it was paused
|
||||
|
||||
model_config = {"extra": "allow"}
|
||||
|
||||
|
||||
class SessionProgress(BaseModel):
|
||||
"""Execution progress tracking."""
|
||||
|
||||
current_node: str | None = None
|
||||
paused_at: str | None = None # Node ID where paused
|
||||
resume_from: str | None = None # Entry point or node ID to resume from
|
||||
steps_executed: int = 0
|
||||
total_tokens: int = 0
|
||||
total_latency_ms: int = 0
|
||||
path: list[str] = Field(default_factory=list) # Node IDs traversed
|
||||
|
||||
# Quality metrics (from ExecutionResult)
|
||||
total_retries: int = 0
|
||||
nodes_with_failures: list[str] = Field(default_factory=list)
|
||||
retry_details: dict[str, int] = Field(default_factory=dict)
|
||||
had_partial_failures: bool = False
|
||||
execution_quality: str = "clean" # "clean", "degraded", or "failed"
|
||||
node_visit_counts: dict[str, int] = Field(default_factory=dict)
|
||||
|
||||
model_config = {"extra": "allow"}
|
||||
|
||||
|
||||
class SessionResult(BaseModel):
|
||||
"""Final result of session execution."""
|
||||
|
||||
success: bool | None = None # None if still running
|
||||
error: str | None = None
|
||||
output: dict[str, Any] = Field(default_factory=dict)
|
||||
|
||||
model_config = {"extra": "allow"}
|
||||
|
||||
|
||||
class SessionMetrics(BaseModel):
|
||||
"""Execution metrics (from Run.metrics)."""
|
||||
|
||||
decision_count: int = 0
|
||||
problem_count: int = 0
|
||||
total_input_tokens: int = 0
|
||||
total_output_tokens: int = 0
|
||||
nodes_executed: list[str] = Field(default_factory=list)
|
||||
edges_traversed: list[str] = Field(default_factory=list)
|
||||
|
||||
model_config = {"extra": "allow"}
|
||||
|
||||
|
||||
class SessionState(BaseModel):
|
||||
"""
|
||||
Complete state for a session execution.
|
||||
|
||||
This is the single source of truth for session status and resumability.
|
||||
Consolidates data from ExecutionResult, ExecutionContext, Run, and runtime logs.
|
||||
|
||||
Version History:
|
||||
- v1.0: Initial schema (2026-02-06)
|
||||
"""
|
||||
|
||||
# Schema version for forward/backward compatibility
|
||||
schema_version: str = "1.0"
|
||||
|
||||
# Identity
|
||||
session_id: str # Format: session_YYYYMMDD_HHMMSS_{uuid_8char}
|
||||
stream_id: str = "" # Which ExecutionStream created this
|
||||
correlation_id: str = "" # For correlating related executions
|
||||
|
||||
# Status
|
||||
status: SessionStatus = SessionStatus.ACTIVE
|
||||
|
||||
# Goal/Agent context
|
||||
goal_id: str
|
||||
agent_id: str = ""
|
||||
entry_point: str = "start"
|
||||
|
||||
# Timestamps
|
||||
timestamps: SessionTimestamps
|
||||
|
||||
# Progress
|
||||
progress: SessionProgress = Field(default_factory=SessionProgress)
|
||||
|
||||
# Result
|
||||
result: SessionResult = Field(default_factory=SessionResult)
|
||||
|
||||
# Memory (for resumability)
|
||||
memory: dict[str, Any] = Field(default_factory=dict)
|
||||
|
||||
# Metrics
|
||||
metrics: SessionMetrics = Field(default_factory=SessionMetrics)
|
||||
|
||||
# Problems (from Run.problems)
|
||||
problems: list[dict[str, Any]] = Field(default_factory=list)
|
||||
|
||||
# Decisions (from Run.decisions - can be large, so store references)
|
||||
decisions: list[dict[str, Any]] = Field(default_factory=list)
|
||||
|
||||
# Input data (for debugging/replay)
|
||||
input_data: dict[str, Any] = Field(default_factory=dict)
|
||||
|
||||
# Isolation level (from ExecutionContext)
|
||||
isolation_level: str = "shared"
|
||||
|
||||
model_config = {"extra": "allow"}
|
||||
|
||||
@computed_field
|
||||
@property
|
||||
def duration_ms(self) -> int:
|
||||
"""Duration of the session in milliseconds."""
|
||||
if not self.timestamps.completed_at:
|
||||
return 0
|
||||
started = datetime.fromisoformat(self.timestamps.started_at)
|
||||
completed = datetime.fromisoformat(self.timestamps.completed_at)
|
||||
return int((completed - started).total_seconds() * 1000)
|
||||
|
||||
@computed_field
|
||||
@property
|
||||
def is_resumable(self) -> bool:
|
||||
"""Can this session be resumed?"""
|
||||
return self.status == SessionStatus.PAUSED and self.progress.resume_from is not None
|
||||
|
||||
@classmethod
|
||||
def from_execution_result(
|
||||
cls,
|
||||
session_id: str,
|
||||
goal_id: str,
|
||||
result: "ExecutionResult",
|
||||
stream_id: str = "",
|
||||
correlation_id: str = "",
|
||||
started_at: str = "",
|
||||
input_data: dict[str, Any] | None = None,
|
||||
agent_id: str = "",
|
||||
entry_point: str = "start",
|
||||
) -> "SessionState":
|
||||
"""Create SessionState from ExecutionResult."""
|
||||
|
||||
now = datetime.now().isoformat()
|
||||
|
||||
# Determine status based on execution result
|
||||
if result.paused_at:
|
||||
status = SessionStatus.PAUSED
|
||||
elif result.success:
|
||||
status = SessionStatus.COMPLETED
|
||||
else:
|
||||
status = SessionStatus.FAILED
|
||||
|
||||
return cls(
|
||||
session_id=session_id,
|
||||
stream_id=stream_id,
|
||||
correlation_id=correlation_id,
|
||||
goal_id=goal_id,
|
||||
agent_id=agent_id,
|
||||
entry_point=entry_point,
|
||||
status=status,
|
||||
timestamps=SessionTimestamps(
|
||||
started_at=started_at or now,
|
||||
updated_at=now,
|
||||
completed_at=now if not result.paused_at else None,
|
||||
paused_at_time=now if result.paused_at else None,
|
||||
),
|
||||
progress=SessionProgress(
|
||||
current_node=result.paused_at or (result.path[-1] if result.path else None),
|
||||
paused_at=result.paused_at,
|
||||
resume_from=result.session_state.get("resume_from")
|
||||
if result.session_state
|
||||
else None,
|
||||
steps_executed=result.steps_executed,
|
||||
total_tokens=result.total_tokens,
|
||||
total_latency_ms=result.total_latency_ms,
|
||||
path=result.path,
|
||||
total_retries=result.total_retries,
|
||||
nodes_with_failures=result.nodes_with_failures,
|
||||
retry_details=result.retry_details,
|
||||
had_partial_failures=result.had_partial_failures,
|
||||
execution_quality=result.execution_quality,
|
||||
node_visit_counts=result.node_visit_counts,
|
||||
),
|
||||
result=SessionResult(
|
||||
success=result.success,
|
||||
error=result.error,
|
||||
output=result.output,
|
||||
),
|
||||
memory=result.session_state.get("memory", {}) if result.session_state else {},
|
||||
input_data=input_data or {},
|
||||
)
|
||||
|
||||
@classmethod
|
||||
def from_legacy_run(cls, run: "Run", session_id: str, stream_id: str = "") -> "SessionState":
|
||||
"""Create SessionState from legacy Run object."""
|
||||
from framework.schemas.run import RunStatus
|
||||
|
||||
now = datetime.now().isoformat()
|
||||
|
||||
# Map RunStatus to SessionStatus
|
||||
status_mapping = {
|
||||
RunStatus.RUNNING: SessionStatus.ACTIVE,
|
||||
RunStatus.COMPLETED: SessionStatus.COMPLETED,
|
||||
RunStatus.FAILED: SessionStatus.FAILED,
|
||||
RunStatus.CANCELLED: SessionStatus.CANCELLED,
|
||||
RunStatus.STUCK: SessionStatus.FAILED,
|
||||
}
|
||||
status = status_mapping.get(run.status, SessionStatus.FAILED)
|
||||
|
||||
return cls(
|
||||
schema_version="1.0",
|
||||
session_id=session_id,
|
||||
stream_id=stream_id,
|
||||
goal_id=run.goal_id,
|
||||
status=status,
|
||||
timestamps=SessionTimestamps(
|
||||
started_at=run.started_at.isoformat(),
|
||||
updated_at=now,
|
||||
completed_at=run.completed_at.isoformat() if run.completed_at else None,
|
||||
),
|
||||
result=SessionResult(
|
||||
success=run.status == RunStatus.COMPLETED,
|
||||
output=run.output_data,
|
||||
),
|
||||
metrics=SessionMetrics(
|
||||
decision_count=run.metrics.total_decisions,
|
||||
problem_count=len(run.problems),
|
||||
total_input_tokens=run.metrics.total_tokens, # Approximate
|
||||
total_output_tokens=0, # Not tracked in old format
|
||||
nodes_executed=run.metrics.nodes_executed,
|
||||
edges_traversed=run.metrics.edges_traversed,
|
||||
),
|
||||
decisions=[d.model_dump() for d in run.decisions],
|
||||
problems=[p.model_dump() for p in run.problems],
|
||||
input_data=run.input_data,
|
||||
)
|
||||
|
||||
def to_session_state_dict(self) -> dict[str, Any]:
|
||||
"""Convert to session_state format for GraphExecutor.execute()."""
|
||||
return {
|
||||
"paused_at": self.progress.paused_at,
|
||||
"resume_from": self.progress.resume_from,
|
||||
"memory": self.memory,
|
||||
"next_node": None,
|
||||
}
|
||||
@@ -1,7 +1,10 @@
|
||||
"""
|
||||
File-based storage backend for runtime data.
|
||||
|
||||
Stores runs as JSON files with indexes for efficient querying.
|
||||
DEPRECATED: This storage backend is deprecated for new sessions.
|
||||
New sessions use unified storage at sessions/{session_id}/state.json.
|
||||
This module is kept for backward compatibility with old run data only.
|
||||
|
||||
Uses Pydantic's built-in serialization.
|
||||
"""
|
||||
|
||||
@@ -14,21 +17,24 @@ from framework.utils.io import atomic_write
|
||||
|
||||
class FileStorage:
|
||||
"""
|
||||
Simple file-based storage for runs.
|
||||
DEPRECATED: File-based storage for old runs only.
|
||||
|
||||
Directory structure:
|
||||
New sessions use unified storage at sessions/{session_id}/state.json.
|
||||
This class is kept for backward compatibility with old run data.
|
||||
|
||||
Old directory structure (deprecated):
|
||||
{base_path}/
|
||||
runs/
|
||||
{run_id}.json # Full run data
|
||||
indexes/
|
||||
runs/ # DEPRECATED - no longer written
|
||||
{run_id}.json
|
||||
summaries/ # DEPRECATED - no longer written
|
||||
{run_id}.json
|
||||
indexes/ # DEPRECATED - no longer written or read
|
||||
by_goal/
|
||||
{goal_id}.json # List of run IDs for this goal
|
||||
{goal_id}.json
|
||||
by_status/
|
||||
{status}.json # List of run IDs with this status
|
||||
{status}.json
|
||||
by_node/
|
||||
{node_id}.json # List of run IDs that used this node
|
||||
summaries/
|
||||
{run_id}.json # Run summary (for quick loading)
|
||||
{node_id}.json
|
||||
"""
|
||||
|
||||
def __init__(self, base_path: str | Path):
|
||||
@@ -36,16 +42,14 @@ class FileStorage:
|
||||
self._ensure_dirs()
|
||||
|
||||
def _ensure_dirs(self) -> None:
|
||||
"""Create directory structure if it doesn't exist."""
|
||||
dirs = [
|
||||
self.base_path / "runs",
|
||||
self.base_path / "indexes" / "by_goal",
|
||||
self.base_path / "indexes" / "by_status",
|
||||
self.base_path / "indexes" / "by_node",
|
||||
self.base_path / "summaries",
|
||||
]
|
||||
for d in dirs:
|
||||
d.mkdir(parents=True, exist_ok=True)
|
||||
"""Create directory structure if it doesn't exist.
|
||||
|
||||
DEPRECATED: All directories (runs/, summaries/, indexes/) are deprecated.
|
||||
New sessions use unified storage at sessions/{session_id}/state.json.
|
||||
This method is now a no-op. Tests should not rely on this.
|
||||
"""
|
||||
# No-op: do not create deprecated directories
|
||||
pass
|
||||
|
||||
def _validate_key(self, key: str) -> None:
|
||||
"""
|
||||
@@ -84,23 +88,22 @@ class FileStorage:
|
||||
# === RUN OPERATIONS ===
|
||||
|
||||
def save_run(self, run: Run) -> None:
|
||||
"""Save a run to storage."""
|
||||
# Save full run using Pydantic's model_dump_json
|
||||
run_path = self.base_path / "runs" / f"{run.id}.json"
|
||||
with atomic_write(run_path) as f:
|
||||
f.write(run.model_dump_json(indent=2))
|
||||
"""Save a run to storage.
|
||||
|
||||
# Save summary
|
||||
summary = RunSummary.from_run(run)
|
||||
summary_path = self.base_path / "summaries" / f"{run.id}.json"
|
||||
with atomic_write(summary_path) as f:
|
||||
f.write(summary.model_dump_json(indent=2))
|
||||
DEPRECATED: This method is now a no-op.
|
||||
New sessions use unified storage at sessions/{session_id}/state.json.
|
||||
Tests should not rely on FileStorage - use unified session storage instead.
|
||||
"""
|
||||
import warnings
|
||||
|
||||
# Update indexes
|
||||
self._add_to_index("by_goal", run.goal_id, run.id)
|
||||
self._add_to_index("by_status", run.status.value, run.id)
|
||||
for node_id in run.metrics.nodes_executed:
|
||||
self._add_to_index("by_node", node_id, run.id)
|
||||
warnings.warn(
|
||||
"FileStorage.save_run() is deprecated. "
|
||||
"New sessions use unified storage at sessions/{session_id}/state.json. "
|
||||
"This write has been skipped.",
|
||||
DeprecationWarning,
|
||||
stacklevel=2,
|
||||
)
|
||||
# No-op: do not write to deprecated locations
|
||||
|
||||
def load_run(self, run_id: str) -> Run | None:
|
||||
"""Load a run from storage."""
|
||||
@@ -148,17 +151,53 @@ class FileStorage:
|
||||
# === QUERY OPERATIONS ===
|
||||
|
||||
def get_runs_by_goal(self, goal_id: str) -> list[str]:
|
||||
"""Get all run IDs for a goal."""
|
||||
"""Get all run IDs for a goal.
|
||||
|
||||
DEPRECATED: Indexes are deprecated. For new sessions, scan sessions/*/state.json instead.
|
||||
This method only returns old run IDs from deprecated indexes.
|
||||
"""
|
||||
import warnings
|
||||
|
||||
warnings.warn(
|
||||
"FileStorage.get_runs_by_goal() is deprecated. "
|
||||
"For new sessions, scan sessions/*/state.json instead.",
|
||||
DeprecationWarning,
|
||||
stacklevel=2,
|
||||
)
|
||||
return self._get_index("by_goal", goal_id)
|
||||
|
||||
def get_runs_by_status(self, status: str | RunStatus) -> list[str]:
|
||||
"""Get all run IDs with a status."""
|
||||
"""Get all run IDs with a status.
|
||||
|
||||
DEPRECATED: Indexes are deprecated. For new sessions, scan sessions/*/state.json instead.
|
||||
This method only returns old run IDs from deprecated indexes.
|
||||
"""
|
||||
import warnings
|
||||
|
||||
warnings.warn(
|
||||
"FileStorage.get_runs_by_status() is deprecated. "
|
||||
"For new sessions, scan sessions/*/state.json instead.",
|
||||
DeprecationWarning,
|
||||
stacklevel=2,
|
||||
)
|
||||
if isinstance(status, RunStatus):
|
||||
status = status.value
|
||||
return self._get_index("by_status", status)
|
||||
|
||||
def get_runs_by_node(self, node_id: str) -> list[str]:
|
||||
"""Get all run IDs that executed a node."""
|
||||
"""Get all run IDs that executed a node.
|
||||
|
||||
DEPRECATED: Indexes are deprecated. For new sessions, scan sessions/*/state.json instead.
|
||||
This method only returns old run IDs from deprecated indexes.
|
||||
"""
|
||||
import warnings
|
||||
|
||||
warnings.warn(
|
||||
"FileStorage.get_runs_by_node() is deprecated. "
|
||||
"For new sessions, scan sessions/*/state.json instead.",
|
||||
DeprecationWarning,
|
||||
stacklevel=2,
|
||||
)
|
||||
return self._get_index("by_node", node_id)
|
||||
|
||||
def list_all_runs(self) -> list[str]:
|
||||
@@ -167,8 +206,22 @@ class FileStorage:
|
||||
return [f.stem for f in runs_dir.glob("*.json")]
|
||||
|
||||
def list_all_goals(self) -> list[str]:
|
||||
"""List all goal IDs that have runs."""
|
||||
"""List all goal IDs that have runs.
|
||||
|
||||
DEPRECATED: Indexes are deprecated. For new sessions, scan sessions/*/state.json instead.
|
||||
This method only returns goals from old run IDs in deprecated indexes.
|
||||
"""
|
||||
import warnings
|
||||
|
||||
warnings.warn(
|
||||
"FileStorage.list_all_goals() is deprecated. "
|
||||
"For new sessions, scan sessions/*/state.json instead.",
|
||||
DeprecationWarning,
|
||||
stacklevel=2,
|
||||
)
|
||||
goals_dir = self.base_path / "indexes" / "by_goal"
|
||||
if not goals_dir.exists():
|
||||
return []
|
||||
return [f.stem for f in goals_dir.glob("*.json")]
|
||||
|
||||
# === INDEX OPERATIONS ===
|
||||
|
||||
@@ -0,0 +1,213 @@
|
||||
"""
|
||||
Session Store - Unified session storage with state.json.
|
||||
|
||||
Handles reading and writing session state to the new unified structure:
|
||||
sessions/session_YYYYMMDD_HHMMSS_{uuid}/state.json
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import uuid
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
from framework.schemas.session_state import SessionState
|
||||
from framework.utils.io import atomic_write
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class SessionStore:
|
||||
"""
|
||||
Unified session storage with state.json.
|
||||
|
||||
Manages sessions in the new structure:
|
||||
{base_path}/sessions/session_YYYYMMDD_HHMMSS_{uuid}/
|
||||
├── state.json # Single source of truth
|
||||
├── conversations/ # Per-node EventLoop state
|
||||
├── artifacts/ # Spillover data
|
||||
└── logs/ # L1/L2/L3 observability
|
||||
├── summary.json
|
||||
├── details.jsonl
|
||||
└── tool_logs.jsonl
|
||||
"""
|
||||
|
||||
def __init__(self, base_path: Path):
|
||||
"""
|
||||
Initialize session store.
|
||||
|
||||
Args:
|
||||
base_path: Base path for storage (e.g., ~/.hive/agents/twitter_outreach)
|
||||
"""
|
||||
self.base_path = Path(base_path)
|
||||
self.sessions_dir = self.base_path / "sessions"
|
||||
|
||||
def generate_session_id(self) -> str:
|
||||
"""
|
||||
Generate session ID in format: session_YYYYMMDD_HHMMSS_{uuid}.
|
||||
|
||||
Returns:
|
||||
Session ID string (e.g., "session_20260206_143022_abc12345")
|
||||
"""
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
short_uuid = uuid.uuid4().hex[:8]
|
||||
return f"session_{timestamp}_{short_uuid}"
|
||||
|
||||
def get_session_path(self, session_id: str) -> Path:
|
||||
"""
|
||||
Get path to session directory.
|
||||
|
||||
Args:
|
||||
session_id: Session ID
|
||||
|
||||
Returns:
|
||||
Path to session directory
|
||||
"""
|
||||
return self.sessions_dir / session_id
|
||||
|
||||
def get_state_path(self, session_id: str) -> Path:
|
||||
"""
|
||||
Get path to state.json file.
|
||||
|
||||
Args:
|
||||
session_id: Session ID
|
||||
|
||||
Returns:
|
||||
Path to state.json
|
||||
"""
|
||||
return self.get_session_path(session_id) / "state.json"
|
||||
|
||||
async def write_state(self, session_id: str, state: SessionState) -> None:
|
||||
"""
|
||||
Atomically write state.json for a session.
|
||||
|
||||
Uses temp file + rename for crash safety.
|
||||
|
||||
Args:
|
||||
session_id: Session ID
|
||||
state: SessionState to write
|
||||
"""
|
||||
|
||||
def _write():
|
||||
state_path = self.get_state_path(session_id)
|
||||
state_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
with atomic_write(state_path) as f:
|
||||
f.write(state.model_dump_json(indent=2))
|
||||
|
||||
await asyncio.to_thread(_write)
|
||||
logger.debug(f"Wrote state.json for session {session_id}")
|
||||
|
||||
async def read_state(self, session_id: str) -> SessionState | None:
|
||||
"""
|
||||
Read state.json for a session.
|
||||
|
||||
Args:
|
||||
session_id: Session ID
|
||||
|
||||
Returns:
|
||||
SessionState or None if not found
|
||||
"""
|
||||
|
||||
def _read():
|
||||
state_path = self.get_state_path(session_id)
|
||||
if not state_path.exists():
|
||||
return None
|
||||
|
||||
return SessionState.model_validate_json(state_path.read_text())
|
||||
|
||||
return await asyncio.to_thread(_read)
|
||||
|
||||
async def list_sessions(
|
||||
self,
|
||||
status: str | None = None,
|
||||
goal_id: str | None = None,
|
||||
limit: int = 100,
|
||||
) -> list[SessionState]:
|
||||
"""
|
||||
List sessions, optionally filtered by status or goal.
|
||||
|
||||
Args:
|
||||
status: Optional status filter (e.g., "paused", "completed")
|
||||
goal_id: Optional goal ID filter
|
||||
limit: Maximum number of sessions to return
|
||||
|
||||
Returns:
|
||||
List of SessionState objects
|
||||
"""
|
||||
|
||||
def _scan():
|
||||
sessions = []
|
||||
|
||||
if not self.sessions_dir.exists():
|
||||
return sessions
|
||||
|
||||
for session_dir in self.sessions_dir.iterdir():
|
||||
if not session_dir.is_dir():
|
||||
continue
|
||||
|
||||
state_path = session_dir / "state.json"
|
||||
if not state_path.exists():
|
||||
continue
|
||||
|
||||
try:
|
||||
state = SessionState.model_validate_json(state_path.read_text())
|
||||
|
||||
# Apply filters
|
||||
if status and state.status != status:
|
||||
continue
|
||||
|
||||
if goal_id and state.goal_id != goal_id:
|
||||
continue
|
||||
|
||||
sessions.append(state)
|
||||
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to load {state_path}: {e}")
|
||||
continue
|
||||
|
||||
# Sort by updated_at descending (most recent first)
|
||||
sessions.sort(key=lambda s: s.timestamps.updated_at, reverse=True)
|
||||
return sessions[:limit]
|
||||
|
||||
return await asyncio.to_thread(_scan)
|
||||
|
||||
async def delete_session(self, session_id: str) -> bool:
|
||||
"""
|
||||
Delete a session and all its data.
|
||||
|
||||
Args:
|
||||
session_id: Session ID to delete
|
||||
|
||||
Returns:
|
||||
True if deleted, False if not found
|
||||
"""
|
||||
|
||||
def _delete():
|
||||
import shutil
|
||||
|
||||
session_path = self.get_session_path(session_id)
|
||||
if not session_path.exists():
|
||||
return False
|
||||
|
||||
shutil.rmtree(session_path)
|
||||
logger.info(f"Deleted session {session_id}")
|
||||
return True
|
||||
|
||||
return await asyncio.to_thread(_delete)
|
||||
|
||||
async def session_exists(self, session_id: str) -> bool:
|
||||
"""
|
||||
Check if a session exists.
|
||||
|
||||
Args:
|
||||
session_id: Session ID
|
||||
|
||||
Returns:
|
||||
True if session exists
|
||||
"""
|
||||
|
||||
def _check():
|
||||
return self.get_state_path(session_id).exists()
|
||||
|
||||
return await asyncio.to_thread(_check)
|
||||
@@ -0,0 +1,179 @@
|
||||
"""
|
||||
State Writer - Dual-write adapter for migration period.
|
||||
|
||||
Writes execution state to both old (Run/RunSummary) and new (state.json) formats
|
||||
to maintain backward compatibility during the transition period.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
from datetime import datetime
|
||||
|
||||
from framework.schemas.run import Problem, Run, RunMetrics, RunStatus
|
||||
from framework.schemas.session_state import SessionState, SessionStatus
|
||||
from framework.storage.concurrent import ConcurrentStorage
|
||||
from framework.storage.session_store import SessionStore
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class StateWriter:
|
||||
"""
|
||||
Writes execution state to both old and new formats during migration.
|
||||
|
||||
During the dual-write phase:
|
||||
- New format (state.json) is written when USE_UNIFIED_SESSIONS=true
|
||||
- Old format (Run/RunSummary) is always written for backward compatibility
|
||||
"""
|
||||
|
||||
def __init__(self, old_storage: ConcurrentStorage, session_store: SessionStore):
|
||||
"""
|
||||
Initialize state writer.
|
||||
|
||||
Args:
|
||||
old_storage: ConcurrentStorage for old format (runs/, summaries/)
|
||||
session_store: SessionStore for new format (sessions/*/state.json)
|
||||
"""
|
||||
self.old = old_storage
|
||||
self.new = session_store
|
||||
self.dual_write_enabled = os.getenv("USE_UNIFIED_SESSIONS", "false").lower() == "true"
|
||||
|
||||
async def write_execution_state(
|
||||
self,
|
||||
session_id: str,
|
||||
state: SessionState,
|
||||
) -> None:
|
||||
"""
|
||||
Write execution state to both old and new formats.
|
||||
|
||||
Args:
|
||||
session_id: Session ID
|
||||
state: SessionState to write
|
||||
"""
|
||||
# Write to new format if enabled
|
||||
if self.dual_write_enabled:
|
||||
try:
|
||||
await self.new.write_state(session_id, state)
|
||||
logger.debug(f"Wrote state.json for session {session_id}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to write state.json for {session_id}: {e}")
|
||||
# Don't fail - old format is still written
|
||||
|
||||
# Always write to old format for backward compatibility
|
||||
try:
|
||||
run = self._convert_to_run(state)
|
||||
await self.old.save_run(run)
|
||||
logger.debug(f"Wrote Run object for session {session_id}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to write Run object for {session_id}: {e}")
|
||||
# This is more critical - reraise if old format fails
|
||||
raise
|
||||
|
||||
def _convert_to_run(self, state: SessionState) -> Run:
|
||||
"""
|
||||
Convert SessionState to legacy Run object.
|
||||
|
||||
Args:
|
||||
state: SessionState to convert
|
||||
|
||||
Returns:
|
||||
Run object
|
||||
"""
|
||||
# Map SessionStatus to RunStatus
|
||||
status_mapping = {
|
||||
SessionStatus.ACTIVE: RunStatus.RUNNING,
|
||||
SessionStatus.PAUSED: RunStatus.RUNNING, # Paused is still "running" in old format
|
||||
SessionStatus.COMPLETED: RunStatus.COMPLETED,
|
||||
SessionStatus.FAILED: RunStatus.FAILED,
|
||||
SessionStatus.CANCELLED: RunStatus.CANCELLED,
|
||||
}
|
||||
run_status = status_mapping.get(state.status, RunStatus.FAILED)
|
||||
|
||||
# Convert timestamps
|
||||
started_at = datetime.fromisoformat(state.timestamps.started_at)
|
||||
completed_at = (
|
||||
datetime.fromisoformat(state.timestamps.completed_at)
|
||||
if state.timestamps.completed_at
|
||||
else None
|
||||
)
|
||||
|
||||
# Build RunMetrics
|
||||
metrics = RunMetrics(
|
||||
total_decisions=state.metrics.decision_count,
|
||||
successful_decisions=state.metrics.decision_count
|
||||
- len(state.progress.nodes_with_failures), # Approximate
|
||||
failed_decisions=len(state.progress.nodes_with_failures),
|
||||
total_tokens=state.metrics.total_input_tokens + state.metrics.total_output_tokens,
|
||||
total_latency_ms=state.progress.total_latency_ms,
|
||||
nodes_executed=state.metrics.nodes_executed,
|
||||
edges_traversed=state.metrics.edges_traversed,
|
||||
)
|
||||
|
||||
# Convert problems (SessionState stores as dicts, Run expects Problem objects)
|
||||
problems = []
|
||||
for p_dict in state.problems:
|
||||
# Handle both old Problem objects and new dict format
|
||||
if isinstance(p_dict, dict):
|
||||
problems.append(Problem(**p_dict))
|
||||
else:
|
||||
problems.append(p_dict)
|
||||
|
||||
# Convert decisions (SessionState stores as dicts, Run expects Decision objects)
|
||||
from framework.schemas.decision import Decision
|
||||
|
||||
decisions = []
|
||||
for d_dict in state.decisions:
|
||||
# Handle both old Decision objects and new dict format
|
||||
if isinstance(d_dict, dict):
|
||||
try:
|
||||
decisions.append(Decision(**d_dict))
|
||||
except Exception:
|
||||
# Skip invalid decisions
|
||||
continue
|
||||
else:
|
||||
decisions.append(d_dict)
|
||||
|
||||
# Create Run object
|
||||
run = Run(
|
||||
id=state.session_id, # Use session_id as run_id
|
||||
goal_id=state.goal_id,
|
||||
started_at=started_at,
|
||||
status=run_status,
|
||||
completed_at=completed_at,
|
||||
decisions=decisions,
|
||||
problems=problems,
|
||||
metrics=metrics,
|
||||
goal_description="", # Not stored in SessionState
|
||||
input_data=state.input_data,
|
||||
output_data=state.result.output,
|
||||
)
|
||||
|
||||
return run
|
||||
|
||||
async def read_state(
|
||||
self,
|
||||
session_id: str,
|
||||
prefer_new: bool = True,
|
||||
) -> SessionState | None:
|
||||
"""
|
||||
Read execution state from either format.
|
||||
|
||||
Args:
|
||||
session_id: Session ID
|
||||
prefer_new: If True, try new format first (default)
|
||||
|
||||
Returns:
|
||||
SessionState or None if not found
|
||||
"""
|
||||
if prefer_new:
|
||||
# Try new format first
|
||||
state = await self.new.read_state(session_id)
|
||||
if state:
|
||||
return state
|
||||
|
||||
# Fall back to old format
|
||||
run = await self.old.load_run(session_id)
|
||||
if run:
|
||||
return SessionState.from_legacy_run(run, session_id)
|
||||
|
||||
return None
|
||||
@@ -26,9 +26,9 @@ Testing tools are integrated into the main agent_builder_server.py:
|
||||
## CLI Commands
|
||||
|
||||
```bash
|
||||
python -m framework test-run <agent_path> --goal <goal_id>
|
||||
python -m framework test-debug <goal_id> <test_id>
|
||||
python -m framework test-list <agent_path> --goal <goal_id>
|
||||
uv run python -m framework test-run <agent_path> --goal <goal_id>
|
||||
uv run python -m framework test-debug <goal_id> <test_id>
|
||||
uv run python -m framework test-list <agent_path> --goal <goal_id>
|
||||
```
|
||||
"""
|
||||
|
||||
|
||||
@@ -0,0 +1,543 @@
|
||||
import logging
|
||||
import platform
|
||||
import subprocess
|
||||
import time
|
||||
|
||||
from textual.app import App, ComposeResult
|
||||
from textual.binding import Binding
|
||||
from textual.containers import Container, Horizontal, Vertical
|
||||
from textual.widgets import Footer, Label
|
||||
|
||||
from framework.runtime.agent_runtime import AgentRuntime
|
||||
from framework.runtime.event_bus import AgentEvent, EventType
|
||||
from framework.tui.widgets.chat_repl import ChatRepl
|
||||
from framework.tui.widgets.graph_view import GraphOverview
|
||||
from framework.tui.widgets.log_pane import LogPane
|
||||
from framework.tui.widgets.selectable_rich_log import SelectableRichLog
|
||||
|
||||
|
||||
class StatusBar(Container):
|
||||
"""Live status bar showing agent execution state."""
|
||||
|
||||
DEFAULT_CSS = """
|
||||
StatusBar {
|
||||
dock: top;
|
||||
height: 1;
|
||||
background: $panel;
|
||||
color: $text;
|
||||
padding: 0 1;
|
||||
}
|
||||
StatusBar > Label {
|
||||
width: 100%;
|
||||
}
|
||||
"""
|
||||
|
||||
def __init__(self, graph_id: str = ""):
|
||||
super().__init__()
|
||||
self._graph_id = graph_id
|
||||
self._state = "idle"
|
||||
self._active_node: str | None = None
|
||||
self._node_detail: str = ""
|
||||
self._start_time: float | None = None
|
||||
self._final_elapsed: float | None = None
|
||||
|
||||
def compose(self) -> ComposeResult:
|
||||
yield Label(id="status-content")
|
||||
|
||||
def on_mount(self) -> None:
|
||||
self._refresh()
|
||||
self.set_interval(1.0, self._refresh)
|
||||
|
||||
def _format_elapsed(self, seconds: float) -> str:
|
||||
total = int(seconds)
|
||||
hours, remainder = divmod(total, 3600)
|
||||
mins, secs = divmod(remainder, 60)
|
||||
if hours:
|
||||
return f"{hours}:{mins:02d}:{secs:02d}"
|
||||
return f"{mins}:{secs:02d}"
|
||||
|
||||
def _refresh(self) -> None:
|
||||
parts: list[str] = []
|
||||
|
||||
if self._graph_id:
|
||||
parts.append(f"[bold]{self._graph_id}[/bold]")
|
||||
|
||||
if self._state == "idle":
|
||||
parts.append("[dim]○ idle[/dim]")
|
||||
elif self._state == "running":
|
||||
parts.append("[bold green]● running[/bold green]")
|
||||
elif self._state == "completed":
|
||||
parts.append("[green]✓ done[/green]")
|
||||
elif self._state == "failed":
|
||||
parts.append("[bold red]✗ failed[/bold red]")
|
||||
|
||||
if self._active_node:
|
||||
node_str = f"[cyan]{self._active_node}[/cyan]"
|
||||
if self._node_detail:
|
||||
node_str += f" [dim]({self._node_detail})[/dim]"
|
||||
parts.append(node_str)
|
||||
|
||||
if self._state == "running" and self._start_time:
|
||||
parts.append(f"[dim]{self._format_elapsed(time.time() - self._start_time)}[/dim]")
|
||||
elif self._final_elapsed is not None:
|
||||
parts.append(f"[dim]{self._format_elapsed(self._final_elapsed)}[/dim]")
|
||||
|
||||
try:
|
||||
label = self.query_one("#status-content", Label)
|
||||
label.update(" │ ".join(parts))
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
def set_graph_id(self, graph_id: str) -> None:
|
||||
self._graph_id = graph_id
|
||||
self._refresh()
|
||||
|
||||
def set_running(self, entry_node: str = "") -> None:
|
||||
self._state = "running"
|
||||
self._active_node = entry_node or None
|
||||
self._node_detail = ""
|
||||
self._start_time = time.time()
|
||||
self._final_elapsed = None
|
||||
self._refresh()
|
||||
|
||||
def set_completed(self) -> None:
|
||||
self._state = "completed"
|
||||
if self._start_time:
|
||||
self._final_elapsed = time.time() - self._start_time
|
||||
self._active_node = None
|
||||
self._node_detail = ""
|
||||
self._start_time = None
|
||||
self._refresh()
|
||||
|
||||
def set_failed(self, error: str = "") -> None:
|
||||
self._state = "failed"
|
||||
if self._start_time:
|
||||
self._final_elapsed = time.time() - self._start_time
|
||||
self._node_detail = error[:40] if error else ""
|
||||
self._start_time = None
|
||||
self._refresh()
|
||||
|
||||
def set_active_node(self, node_id: str, detail: str = "") -> None:
|
||||
self._active_node = node_id
|
||||
self._node_detail = detail
|
||||
self._refresh()
|
||||
|
||||
def set_node_detail(self, detail: str) -> None:
|
||||
self._node_detail = detail
|
||||
self._refresh()
|
||||
|
||||
|
||||
class AdenTUI(App):
|
||||
TITLE = "Aden TUI Dashboard"
|
||||
COMMAND_PALETTE_BINDING = "ctrl+o"
|
||||
CSS = """
|
||||
Screen {
|
||||
layout: vertical;
|
||||
background: $surface;
|
||||
}
|
||||
|
||||
#left-pane {
|
||||
width: 60%;
|
||||
height: 100%;
|
||||
layout: vertical;
|
||||
background: $surface;
|
||||
}
|
||||
|
||||
GraphOverview {
|
||||
height: 40%;
|
||||
background: $panel;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
LogPane {
|
||||
height: 60%;
|
||||
background: $surface;
|
||||
padding: 0;
|
||||
margin-bottom: 1;
|
||||
}
|
||||
|
||||
ChatRepl {
|
||||
width: 40%;
|
||||
height: 100%;
|
||||
background: $panel;
|
||||
border-left: tall $primary;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
#chat-history {
|
||||
height: 1fr;
|
||||
width: 100%;
|
||||
background: $surface;
|
||||
border: none;
|
||||
scrollbar-background: $panel;
|
||||
scrollbar-color: $primary;
|
||||
}
|
||||
|
||||
RichLog {
|
||||
background: $surface;
|
||||
border: none;
|
||||
scrollbar-background: $panel;
|
||||
scrollbar-color: $primary;
|
||||
}
|
||||
|
||||
Input {
|
||||
background: $surface;
|
||||
border: tall $primary;
|
||||
margin-top: 1;
|
||||
}
|
||||
|
||||
Input:focus {
|
||||
border: tall $accent;
|
||||
}
|
||||
|
||||
StatusBar {
|
||||
background: $panel;
|
||||
color: $text;
|
||||
height: 1;
|
||||
padding: 0 1;
|
||||
}
|
||||
|
||||
Footer {
|
||||
background: $panel;
|
||||
color: $text-muted;
|
||||
}
|
||||
"""
|
||||
|
||||
BINDINGS = [
|
||||
Binding("q", "quit", "Quit"),
|
||||
Binding("ctrl+c", "ctrl_c", "Interrupt", show=False, priority=True),
|
||||
Binding("super+c", "ctrl_c", "Copy", show=False, priority=True),
|
||||
Binding("ctrl+s", "screenshot", "Screenshot (SVG)", show=True, priority=True),
|
||||
Binding("tab", "focus_next", "Next Panel", show=True),
|
||||
Binding("shift+tab", "focus_previous", "Previous Panel", show=False),
|
||||
]
|
||||
|
||||
def __init__(self, runtime: AgentRuntime):
|
||||
super().__init__()
|
||||
|
||||
self.runtime = runtime
|
||||
self.log_pane = LogPane()
|
||||
self.graph_view = GraphOverview(runtime)
|
||||
self.chat_repl = ChatRepl(runtime)
|
||||
self.status_bar = StatusBar(graph_id=runtime.graph.id)
|
||||
self.is_ready = False
|
||||
|
||||
def open_url(self, url: str, *, new_tab: bool = True) -> None:
|
||||
"""Override to use native `open` for file:// URLs on macOS."""
|
||||
if url.startswith("file://") and platform.system() == "Darwin":
|
||||
path = url.removeprefix("file://")
|
||||
subprocess.Popen(["open", path])
|
||||
else:
|
||||
super().open_url(url, new_tab=new_tab)
|
||||
|
||||
def action_ctrl_c(self) -> None:
|
||||
# Check if any SelectableRichLog has an active selection to copy
|
||||
for widget in self.query(SelectableRichLog):
|
||||
if widget.selection is not None:
|
||||
text = widget.copy_selection()
|
||||
if text:
|
||||
widget.clear_selection()
|
||||
self.notify("Copied to clipboard", severity="information", timeout=2)
|
||||
return
|
||||
|
||||
self.notify("Press [b]q[/b] to quit", severity="warning", timeout=3)
|
||||
|
||||
def compose(self) -> ComposeResult:
|
||||
yield self.status_bar
|
||||
|
||||
yield Horizontal(
|
||||
Vertical(
|
||||
self.log_pane,
|
||||
self.graph_view,
|
||||
id="left-pane",
|
||||
),
|
||||
self.chat_repl,
|
||||
)
|
||||
|
||||
yield Footer()
|
||||
|
||||
async def on_mount(self) -> None:
|
||||
"""Called when app starts."""
|
||||
self.title = "Aden TUI Dashboard"
|
||||
|
||||
# Add logging setup
|
||||
self._setup_logging_queue()
|
||||
|
||||
# Set ready immediately so _poll_logs can process messages
|
||||
self.is_ready = True
|
||||
|
||||
# Add event subscription with delay to ensure TUI is fully initialized
|
||||
self.call_later(self._init_runtime_connection)
|
||||
|
||||
# Delay initial log messages until layout is fully rendered
|
||||
def write_initial_logs():
|
||||
logging.info("TUI Dashboard initialized successfully")
|
||||
logging.info("Waiting for agent execution to start...")
|
||||
|
||||
# Wait for layout to be fully rendered before writing logs
|
||||
self.set_timer(0.2, write_initial_logs)
|
||||
|
||||
def _setup_logging_queue(self) -> None:
|
||||
"""Setup a thread-safe queue for logs."""
|
||||
try:
|
||||
import queue
|
||||
from logging.handlers import QueueHandler
|
||||
|
||||
self.log_queue = queue.Queue()
|
||||
self.queue_handler = QueueHandler(self.log_queue)
|
||||
self.queue_handler.setLevel(logging.INFO)
|
||||
|
||||
# Get root logger
|
||||
root_logger = logging.getLogger()
|
||||
|
||||
# Remove ALL existing handlers to prevent stdout output
|
||||
# This is critical - StreamHandlers cause text to appear in header
|
||||
for handler in root_logger.handlers[:]:
|
||||
root_logger.removeHandler(handler)
|
||||
|
||||
# Add ONLY our queue handler
|
||||
root_logger.addHandler(self.queue_handler)
|
||||
root_logger.setLevel(logging.INFO)
|
||||
|
||||
# Suppress LiteLLM logging completely
|
||||
litellm_logger = logging.getLogger("LiteLLM")
|
||||
litellm_logger.setLevel(logging.CRITICAL) # Only show critical errors
|
||||
litellm_logger.propagate = False # Don't propagate to root logger
|
||||
|
||||
# Start polling
|
||||
self.set_interval(0.1, self._poll_logs)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
def _poll_logs(self) -> None:
|
||||
"""Poll the log queue and update UI."""
|
||||
if not self.is_ready:
|
||||
return
|
||||
|
||||
try:
|
||||
while not self.log_queue.empty():
|
||||
record = self.log_queue.get_nowait()
|
||||
# Filter out framework/library logs
|
||||
if record.name.startswith(("textual", "LiteLLM", "litellm")):
|
||||
continue
|
||||
|
||||
self.log_pane.write_python_log(record)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
_EVENT_TYPES = [
|
||||
EventType.LLM_TEXT_DELTA,
|
||||
EventType.CLIENT_OUTPUT_DELTA,
|
||||
EventType.TOOL_CALL_STARTED,
|
||||
EventType.TOOL_CALL_COMPLETED,
|
||||
EventType.EXECUTION_STARTED,
|
||||
EventType.EXECUTION_COMPLETED,
|
||||
EventType.EXECUTION_FAILED,
|
||||
EventType.NODE_LOOP_STARTED,
|
||||
EventType.NODE_LOOP_ITERATION,
|
||||
EventType.NODE_LOOP_COMPLETED,
|
||||
EventType.CLIENT_INPUT_REQUESTED,
|
||||
EventType.NODE_STALLED,
|
||||
EventType.GOAL_PROGRESS,
|
||||
EventType.GOAL_ACHIEVED,
|
||||
EventType.CONSTRAINT_VIOLATION,
|
||||
EventType.STATE_CHANGED,
|
||||
EventType.NODE_INPUT_BLOCKED,
|
||||
]
|
||||
|
||||
_LOG_PANE_EVENTS = frozenset(_EVENT_TYPES) - {
|
||||
EventType.LLM_TEXT_DELTA,
|
||||
EventType.CLIENT_OUTPUT_DELTA,
|
||||
}
|
||||
|
||||
async def _init_runtime_connection(self) -> None:
|
||||
"""Subscribe to runtime events with an async handler."""
|
||||
try:
|
||||
self._subscription_id = self.runtime.subscribe_to_events(
|
||||
event_types=self._EVENT_TYPES,
|
||||
handler=self._handle_event,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
async def _handle_event(self, event: AgentEvent) -> None:
|
||||
"""Called from the agent thread — bridge to Textual's main thread."""
|
||||
try:
|
||||
self.call_from_thread(self._route_event, event)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
def _route_event(self, event: AgentEvent) -> None:
|
||||
"""Route incoming events to widgets. Runs on Textual's main thread."""
|
||||
if not self.is_ready:
|
||||
return
|
||||
|
||||
try:
|
||||
et = event.type
|
||||
|
||||
# --- Chat REPL events ---
|
||||
if et in (EventType.LLM_TEXT_DELTA, EventType.CLIENT_OUTPUT_DELTA):
|
||||
self.chat_repl.handle_text_delta(
|
||||
event.data.get("content", ""),
|
||||
event.data.get("snapshot", ""),
|
||||
)
|
||||
elif et == EventType.TOOL_CALL_STARTED:
|
||||
self.chat_repl.handle_tool_started(
|
||||
event.data.get("tool_name", "unknown"),
|
||||
event.data.get("tool_input", {}),
|
||||
)
|
||||
elif et == EventType.TOOL_CALL_COMPLETED:
|
||||
self.chat_repl.handle_tool_completed(
|
||||
event.data.get("tool_name", "unknown"),
|
||||
event.data.get("result", ""),
|
||||
event.data.get("is_error", False),
|
||||
)
|
||||
elif et == EventType.EXECUTION_COMPLETED:
|
||||
self.chat_repl.handle_execution_completed(event.data.get("output", {}))
|
||||
elif et == EventType.EXECUTION_FAILED:
|
||||
self.chat_repl.handle_execution_failed(event.data.get("error", "Unknown error"))
|
||||
elif et == EventType.CLIENT_INPUT_REQUESTED:
|
||||
self.chat_repl.handle_input_requested(
|
||||
event.node_id or event.data.get("node_id", ""),
|
||||
)
|
||||
|
||||
# --- Graph view events ---
|
||||
if et in (
|
||||
EventType.EXECUTION_STARTED,
|
||||
EventType.EXECUTION_COMPLETED,
|
||||
EventType.EXECUTION_FAILED,
|
||||
):
|
||||
self.graph_view.update_execution(event)
|
||||
|
||||
if et == EventType.NODE_LOOP_STARTED:
|
||||
self.graph_view.handle_node_loop_started(event.node_id or "")
|
||||
elif et == EventType.NODE_LOOP_ITERATION:
|
||||
self.graph_view.handle_node_loop_iteration(
|
||||
event.node_id or "",
|
||||
event.data.get("iteration", 0),
|
||||
)
|
||||
elif et == EventType.NODE_LOOP_COMPLETED:
|
||||
self.graph_view.handle_node_loop_completed(event.node_id or "")
|
||||
elif et == EventType.NODE_STALLED:
|
||||
self.graph_view.handle_stalled(
|
||||
event.node_id or "",
|
||||
event.data.get("reason", ""),
|
||||
)
|
||||
|
||||
if et == EventType.TOOL_CALL_STARTED:
|
||||
self.graph_view.handle_tool_call(
|
||||
event.node_id or "",
|
||||
event.data.get("tool_name", "unknown"),
|
||||
started=True,
|
||||
)
|
||||
elif et == EventType.TOOL_CALL_COMPLETED:
|
||||
self.graph_view.handle_tool_call(
|
||||
event.node_id or "",
|
||||
event.data.get("tool_name", "unknown"),
|
||||
started=False,
|
||||
)
|
||||
|
||||
# --- Status bar events ---
|
||||
if et == EventType.EXECUTION_STARTED:
|
||||
entry_node = event.data.get("entry_node") or (
|
||||
self.runtime.graph.entry_node if self.runtime else ""
|
||||
)
|
||||
self.status_bar.set_running(entry_node)
|
||||
elif et == EventType.EXECUTION_COMPLETED:
|
||||
self.status_bar.set_completed()
|
||||
elif et == EventType.EXECUTION_FAILED:
|
||||
self.status_bar.set_failed(event.data.get("error", ""))
|
||||
elif et == EventType.NODE_LOOP_STARTED:
|
||||
self.status_bar.set_active_node(event.node_id or "", "thinking...")
|
||||
elif et == EventType.NODE_LOOP_ITERATION:
|
||||
self.status_bar.set_node_detail(f"step {event.data.get('iteration', '?')}")
|
||||
elif et == EventType.TOOL_CALL_STARTED:
|
||||
self.status_bar.set_node_detail(f"{event.data.get('tool_name', '')}...")
|
||||
elif et == EventType.TOOL_CALL_COMPLETED:
|
||||
self.status_bar.set_node_detail("thinking...")
|
||||
elif et == EventType.NODE_STALLED:
|
||||
self.status_bar.set_node_detail(f"stalled: {event.data.get('reason', '')}")
|
||||
|
||||
# --- Log pane events ---
|
||||
if et in self._LOG_PANE_EVENTS:
|
||||
self.log_pane.write_event(event)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
def save_screenshot(self, filename: str | None = None) -> str:
|
||||
"""Save a screenshot of the current screen as SVG (viewable in browsers).
|
||||
|
||||
Args:
|
||||
filename: Optional filename for the screenshot. If None, generates timestamp-based name.
|
||||
|
||||
Returns:
|
||||
Path to the saved SVG file.
|
||||
"""
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
# Create screenshots directory
|
||||
screenshots_dir = Path("screenshots")
|
||||
screenshots_dir.mkdir(exist_ok=True)
|
||||
|
||||
# Generate filename if not provided
|
||||
if filename is None:
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"tui_screenshot_{timestamp}.svg"
|
||||
|
||||
# Ensure .svg extension
|
||||
if not filename.endswith(".svg"):
|
||||
filename += ".svg"
|
||||
|
||||
# Full path
|
||||
filepath = screenshots_dir / filename
|
||||
|
||||
# Temporarily hide borders for cleaner screenshot
|
||||
chat_widget = self.query_one(ChatRepl)
|
||||
original_chat_border = chat_widget.styles.border_left
|
||||
chat_widget.styles.border_left = ("none", "transparent")
|
||||
|
||||
# Hide all Input widget borders
|
||||
input_widgets = self.query("Input")
|
||||
original_input_borders = []
|
||||
for input_widget in input_widgets:
|
||||
original_input_borders.append(input_widget.styles.border)
|
||||
input_widget.styles.border = ("none", "transparent")
|
||||
|
||||
try:
|
||||
# Get SVG data from Textual and save it
|
||||
svg_data = self.export_screenshot()
|
||||
filepath.write_text(svg_data, encoding="utf-8")
|
||||
finally:
|
||||
# Restore the original borders
|
||||
chat_widget.styles.border_left = original_chat_border
|
||||
for i, input_widget in enumerate(input_widgets):
|
||||
input_widget.styles.border = original_input_borders[i]
|
||||
|
||||
return str(filepath)
|
||||
|
||||
def action_screenshot(self) -> None:
|
||||
"""Take a screenshot (bound to Ctrl+S)."""
|
||||
try:
|
||||
filepath = self.save_screenshot()
|
||||
self.notify(
|
||||
f"Screenshot saved: {filepath} (SVG - open in browser)",
|
||||
severity="information",
|
||||
timeout=5,
|
||||
)
|
||||
except Exception as e:
|
||||
self.notify(f"Screenshot failed: {e}", severity="error", timeout=5)
|
||||
|
||||
async def on_unmount(self) -> None:
|
||||
"""Cleanup on app shutdown."""
|
||||
self.is_ready = False
|
||||
try:
|
||||
if hasattr(self, "_subscription_id"):
|
||||
self.runtime.unsubscribe_from_events(self._subscription_id)
|
||||
except Exception:
|
||||
pass
|
||||
try:
|
||||
if hasattr(self, "queue_handler"):
|
||||
logging.getLogger().removeHandler(self.queue_handler)
|
||||
except Exception:
|
||||
pass
|
||||
@@ -0,0 +1,325 @@
|
||||
"""
|
||||
Chat / REPL Widget - Uses RichLog for append-only, selection-safe display.
|
||||
|
||||
Streaming display approach:
|
||||
- The processing-indicator Label is used as a live status bar during streaming
|
||||
(Label.update() replaces text in-place, unlike RichLog which is append-only).
|
||||
- On EXECUTION_COMPLETED, the final output is written to RichLog as permanent history.
|
||||
- Tool events are written directly to RichLog as discrete status lines.
|
||||
|
||||
Client-facing input:
|
||||
- When a client_facing=True EventLoopNode emits CLIENT_INPUT_REQUESTED, the
|
||||
ChatRepl transitions to "waiting for input" state: input is re-enabled and
|
||||
subsequent submissions are routed to runtime.inject_input() instead of
|
||||
starting a new execution.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import re
|
||||
import threading
|
||||
from typing import Any
|
||||
|
||||
from textual.app import ComposeResult
|
||||
from textual.containers import Vertical
|
||||
from textual.widgets import Input, Label
|
||||
|
||||
from framework.runtime.agent_runtime import AgentRuntime
|
||||
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
|
||||
|
||||
|
||||
class ChatRepl(Vertical):
|
||||
"""Widget for interactive chat/REPL."""
|
||||
|
||||
DEFAULT_CSS = """
|
||||
ChatRepl {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
layout: vertical;
|
||||
}
|
||||
|
||||
ChatRepl > RichLog {
|
||||
width: 100%;
|
||||
height: 1fr;
|
||||
background: $surface;
|
||||
border: none;
|
||||
scrollbar-background: $panel;
|
||||
scrollbar-color: $primary;
|
||||
}
|
||||
|
||||
ChatRepl > #processing-indicator {
|
||||
width: 100%;
|
||||
height: 1;
|
||||
background: $primary 20%;
|
||||
color: $text;
|
||||
text-style: bold;
|
||||
display: none;
|
||||
}
|
||||
|
||||
ChatRepl > Input {
|
||||
width: 100%;
|
||||
height: auto;
|
||||
dock: bottom;
|
||||
background: $surface;
|
||||
border: tall $primary;
|
||||
margin-top: 1;
|
||||
}
|
||||
|
||||
ChatRepl > Input:focus {
|
||||
border: tall $accent;
|
||||
}
|
||||
"""
|
||||
|
||||
def __init__(self, runtime: AgentRuntime):
|
||||
super().__init__()
|
||||
self.runtime = runtime
|
||||
self._current_exec_id: str | None = None
|
||||
self._streaming_snapshot: str = ""
|
||||
self._waiting_for_input: bool = False
|
||||
self._input_node_id: str | None = None
|
||||
|
||||
# Dedicated event loop for agent execution.
|
||||
# Keeps blocking runtime code (LLM calls, MCP tools) off
|
||||
# the Textual event loop so the UI stays responsive.
|
||||
self._agent_loop = asyncio.new_event_loop()
|
||||
self._agent_thread = threading.Thread(
|
||||
target=self._agent_loop.run_forever,
|
||||
daemon=True,
|
||||
name="agent-execution",
|
||||
)
|
||||
self._agent_thread.start()
|
||||
|
||||
def compose(self) -> ComposeResult:
|
||||
yield RichLog(
|
||||
id="chat-history",
|
||||
highlight=True,
|
||||
markup=True,
|
||||
auto_scroll=False,
|
||||
wrap=True,
|
||||
min_width=0,
|
||||
)
|
||||
yield Label("Agent is processing...", id="processing-indicator")
|
||||
yield Input(placeholder="Enter input for agent...", id="chat-input")
|
||||
|
||||
# Regex for file:// URIs that are NOT already inside Rich [link=...] markup
|
||||
_FILE_URI_RE = re.compile(r"(?<!\[link=)(file://[^\s)\]>*]+)")
|
||||
|
||||
def _linkify(self, text: str) -> str:
|
||||
"""Convert bare file:// URIs to clickable Rich [link=...] markup with short display text."""
|
||||
|
||||
def _shorten(match: re.Match) -> str:
|
||||
uri = match.group(1)
|
||||
filename = uri.rsplit("/", 1)[-1] if "/" in uri else uri
|
||||
return f"[link={uri}]{filename}[/link]"
|
||||
|
||||
return self._FILE_URI_RE.sub(_shorten, text)
|
||||
|
||||
def _write_history(self, content: str) -> None:
|
||||
"""Write to chat history, only auto-scrolling if user is at the bottom."""
|
||||
history = self.query_one("#chat-history", RichLog)
|
||||
was_at_bottom = history.is_vertical_scroll_end
|
||||
history.write(self._linkify(content))
|
||||
if was_at_bottom:
|
||||
history.scroll_end(animate=False)
|
||||
|
||||
def on_mount(self) -> None:
|
||||
"""Add welcome message when widget mounts."""
|
||||
history = self.query_one("#chat-history", RichLog)
|
||||
history.write("[bold cyan]Chat REPL Ready[/bold cyan] — Type your input below\n")
|
||||
|
||||
async def on_input_submitted(self, message: Input.Submitted) -> None:
|
||||
"""Handle input submission — either start new execution or inject input."""
|
||||
user_input = message.value.strip()
|
||||
if not user_input:
|
||||
return
|
||||
|
||||
# Client-facing input: route to the waiting node
|
||||
if self._waiting_for_input and self._input_node_id:
|
||||
self._write_history(f"[bold green]You:[/bold green] {user_input}")
|
||||
message.input.value = ""
|
||||
|
||||
# Disable input while agent processes the response
|
||||
chat_input = self.query_one("#chat-input", Input)
|
||||
chat_input.disabled = True
|
||||
chat_input.placeholder = "Enter input for agent..."
|
||||
self._waiting_for_input = False
|
||||
|
||||
indicator = self.query_one("#processing-indicator", Label)
|
||||
indicator.update("Thinking...")
|
||||
|
||||
node_id = self._input_node_id
|
||||
self._input_node_id = None
|
||||
|
||||
try:
|
||||
future = asyncio.run_coroutine_threadsafe(
|
||||
self.runtime.inject_input(node_id, user_input),
|
||||
self._agent_loop,
|
||||
)
|
||||
await asyncio.wrap_future(future)
|
||||
except Exception as e:
|
||||
self._write_history(f"[bold red]Error delivering input:[/bold red] {e}")
|
||||
return
|
||||
|
||||
# Double-submit guard: reject input while an execution is in-flight
|
||||
if self._current_exec_id is not None:
|
||||
self._write_history("[dim]Agent is still running — please wait.[/dim]")
|
||||
return
|
||||
|
||||
indicator = self.query_one("#processing-indicator", Label)
|
||||
|
||||
# Append user message and clear input
|
||||
self._write_history(f"[bold green]You:[/bold green] {user_input}")
|
||||
message.input.value = ""
|
||||
|
||||
try:
|
||||
# Get entry point
|
||||
entry_points = self.runtime.get_entry_points()
|
||||
if not entry_points:
|
||||
self._write_history("[bold red]Error:[/bold red] No entry points")
|
||||
return
|
||||
|
||||
# Determine the input key from the entry node
|
||||
entry_point = entry_points[0]
|
||||
entry_node = self.runtime.graph.get_node(entry_point.entry_node)
|
||||
|
||||
if entry_node and entry_node.input_keys:
|
||||
input_key = entry_node.input_keys[0]
|
||||
else:
|
||||
input_key = "input"
|
||||
|
||||
# Reset streaming state
|
||||
self._streaming_snapshot = ""
|
||||
|
||||
# Show processing indicator
|
||||
indicator.update("Thinking...")
|
||||
indicator.display = True
|
||||
|
||||
# Disable input while the agent is working
|
||||
chat_input = self.query_one("#chat-input", Input)
|
||||
chat_input.disabled = True
|
||||
|
||||
# Submit execution to the dedicated agent loop so blocking
|
||||
# runtime code (LLM, MCP tools) never touches Textual's loop.
|
||||
# trigger() returns immediately with an exec_id; the heavy
|
||||
# execution task runs entirely on the agent thread.
|
||||
future = asyncio.run_coroutine_threadsafe(
|
||||
self.runtime.trigger(
|
||||
entry_point_id=entry_point.id,
|
||||
input_data={input_key: user_input},
|
||||
),
|
||||
self._agent_loop,
|
||||
)
|
||||
# wrap_future lets us await without blocking Textual's loop
|
||||
self._current_exec_id = await asyncio.wrap_future(future)
|
||||
|
||||
except Exception as e:
|
||||
indicator.display = False
|
||||
self._current_exec_id = None
|
||||
# Re-enable input on error
|
||||
chat_input = self.query_one("#chat-input", Input)
|
||||
chat_input.disabled = False
|
||||
self._write_history(f"[bold red]Error:[/bold red] {e}")
|
||||
|
||||
# -- Event handlers called by app.py _handle_event --
|
||||
|
||||
def handle_text_delta(self, content: str, snapshot: str) -> None:
|
||||
"""Handle a streaming text token from the LLM."""
|
||||
self._streaming_snapshot = snapshot
|
||||
|
||||
# Show a truncated live preview in the indicator label
|
||||
indicator = self.query_one("#processing-indicator", Label)
|
||||
preview = snapshot[-80:] if len(snapshot) > 80 else snapshot
|
||||
# Replace newlines for single-line display
|
||||
preview = preview.replace("\n", " ")
|
||||
indicator.update(
|
||||
f"Thinking: ...{preview}" if len(snapshot) > 80 else f"Thinking: {preview}"
|
||||
)
|
||||
|
||||
def handle_tool_started(self, tool_name: str, tool_input: dict[str, Any]) -> None:
|
||||
"""Handle a tool call starting."""
|
||||
# Update indicator to show tool activity
|
||||
indicator = self.query_one("#processing-indicator", Label)
|
||||
indicator.update(f"Using tool: {tool_name}...")
|
||||
|
||||
# Write a discrete status line to history
|
||||
self._write_history(f"[dim]Tool: {tool_name}[/dim]")
|
||||
|
||||
def handle_tool_completed(self, tool_name: str, result: str, is_error: bool) -> None:
|
||||
"""Handle a tool call completing."""
|
||||
result_str = str(result)
|
||||
preview = result_str[:200] + "..." if len(result_str) > 200 else result_str
|
||||
preview = preview.replace("\n", " ")
|
||||
|
||||
if is_error:
|
||||
self._write_history(f"[dim red]Tool {tool_name} error: {preview}[/dim red]")
|
||||
else:
|
||||
self._write_history(f"[dim]Tool {tool_name} result: {preview}[/dim]")
|
||||
|
||||
# Restore thinking indicator
|
||||
indicator = self.query_one("#processing-indicator", Label)
|
||||
indicator.update("Thinking...")
|
||||
|
||||
def handle_execution_completed(self, output: dict[str, Any]) -> None:
|
||||
"""Handle execution finishing successfully."""
|
||||
indicator = self.query_one("#processing-indicator", Label)
|
||||
indicator.display = False
|
||||
|
||||
# Write the final streaming snapshot to permanent history (if any)
|
||||
if self._streaming_snapshot:
|
||||
self._write_history(f"[bold blue]Agent:[/bold blue] {self._streaming_snapshot}")
|
||||
else:
|
||||
output_str = str(output.get("output_string", output))
|
||||
self._write_history(f"[bold blue]Agent:[/bold blue] {output_str}")
|
||||
self._write_history("") # separator
|
||||
|
||||
self._current_exec_id = None
|
||||
self._streaming_snapshot = ""
|
||||
self._waiting_for_input = False
|
||||
self._input_node_id = None
|
||||
|
||||
# Re-enable input
|
||||
chat_input = self.query_one("#chat-input", Input)
|
||||
chat_input.disabled = False
|
||||
chat_input.placeholder = "Enter input for agent..."
|
||||
chat_input.focus()
|
||||
|
||||
def handle_execution_failed(self, error: str) -> None:
|
||||
"""Handle execution failing."""
|
||||
indicator = self.query_one("#processing-indicator", Label)
|
||||
indicator.display = False
|
||||
|
||||
self._write_history(f"[bold red]Error:[/bold red] {error}")
|
||||
self._write_history("") # separator
|
||||
|
||||
self._current_exec_id = None
|
||||
self._streaming_snapshot = ""
|
||||
self._waiting_for_input = False
|
||||
self._input_node_id = None
|
||||
|
||||
# Re-enable input
|
||||
chat_input = self.query_one("#chat-input", Input)
|
||||
chat_input.disabled = False
|
||||
chat_input.placeholder = "Enter input for agent..."
|
||||
chat_input.focus()
|
||||
|
||||
def handle_input_requested(self, node_id: str) -> None:
|
||||
"""Handle a client-facing node requesting user input.
|
||||
|
||||
Transitions to 'waiting for input' state: flushes the current
|
||||
streaming snapshot to history, re-enables the input widget,
|
||||
and sets a flag so the next submission routes to inject_input().
|
||||
"""
|
||||
# Flush accumulated streaming text as agent output
|
||||
if self._streaming_snapshot:
|
||||
self._write_history(f"[bold blue]Agent:[/bold blue] {self._streaming_snapshot}")
|
||||
self._streaming_snapshot = ""
|
||||
|
||||
self._waiting_for_input = True
|
||||
self._input_node_id = node_id or None
|
||||
|
||||
indicator = self.query_one("#processing-indicator", Label)
|
||||
indicator.update("Waiting for your input...")
|
||||
|
||||
chat_input = self.query_one("#chat-input", Input)
|
||||
chat_input.disabled = False
|
||||
chat_input.placeholder = "Type your response..."
|
||||
chat_input.focus()
|
||||
@@ -0,0 +1,194 @@
|
||||
"""
|
||||
Graph/Tree Overview Widget - Displays real agent graph structure.
|
||||
"""
|
||||
|
||||
from textual.app import ComposeResult
|
||||
from textual.containers import Vertical
|
||||
|
||||
from framework.runtime.agent_runtime import AgentRuntime
|
||||
from framework.runtime.event_bus import EventType
|
||||
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
|
||||
|
||||
|
||||
class GraphOverview(Vertical):
|
||||
"""Widget to display Agent execution graph/tree with real data."""
|
||||
|
||||
DEFAULT_CSS = """
|
||||
GraphOverview {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
background: $panel;
|
||||
}
|
||||
|
||||
GraphOverview > RichLog {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
background: $panel;
|
||||
border: none;
|
||||
scrollbar-background: $surface;
|
||||
scrollbar-color: $primary;
|
||||
}
|
||||
"""
|
||||
|
||||
def __init__(self, runtime: AgentRuntime):
|
||||
super().__init__()
|
||||
self.runtime = runtime
|
||||
self.active_node: str | None = None
|
||||
self.execution_path: list[str] = []
|
||||
# Per-node status strings shown next to the node in the graph display.
|
||||
# e.g. {"planner": "thinking...", "searcher": "web_search..."}
|
||||
self._node_status: dict[str, str] = {}
|
||||
|
||||
def compose(self) -> ComposeResult:
|
||||
# Use RichLog for formatted output
|
||||
yield RichLog(id="graph-display", highlight=True, markup=True)
|
||||
|
||||
def on_mount(self) -> None:
|
||||
"""Display initial graph structure."""
|
||||
self._display_graph()
|
||||
|
||||
def _topo_order(self) -> list[str]:
|
||||
"""BFS from entry_node following edges."""
|
||||
graph = self.runtime.graph
|
||||
visited: list[str] = []
|
||||
seen: set[str] = set()
|
||||
queue = [graph.entry_node]
|
||||
while queue:
|
||||
nid = queue.pop(0)
|
||||
if nid in seen:
|
||||
continue
|
||||
seen.add(nid)
|
||||
visited.append(nid)
|
||||
for edge in graph.get_outgoing_edges(nid):
|
||||
if edge.target not in seen:
|
||||
queue.append(edge.target)
|
||||
# Append orphan nodes not reachable from entry
|
||||
for node in graph.nodes:
|
||||
if node.id not in seen:
|
||||
visited.append(node.id)
|
||||
return visited
|
||||
|
||||
def _render_node_line(self, node_id: str) -> str:
|
||||
"""Render a single node with status symbol and optional status text."""
|
||||
graph = self.runtime.graph
|
||||
is_terminal = node_id in (graph.terminal_nodes or [])
|
||||
is_active = node_id == self.active_node
|
||||
is_done = node_id in self.execution_path and not is_active
|
||||
status = self._node_status.get(node_id, "")
|
||||
|
||||
if is_active:
|
||||
sym = "[bold green]●[/bold green]"
|
||||
elif is_done:
|
||||
sym = "[dim]✓[/dim]"
|
||||
elif is_terminal:
|
||||
sym = "[yellow]■[/yellow]"
|
||||
else:
|
||||
sym = "○"
|
||||
|
||||
if is_active:
|
||||
name = f"[bold green]{node_id}[/bold green]"
|
||||
elif is_done:
|
||||
name = f"[dim]{node_id}[/dim]"
|
||||
else:
|
||||
name = node_id
|
||||
|
||||
suffix = f" [italic]{status}[/italic]" if status else ""
|
||||
return f" {sym} {name}{suffix}"
|
||||
|
||||
def _render_edges(self, node_id: str) -> list[str]:
|
||||
"""Render edge connectors from this node to its targets."""
|
||||
edges = self.runtime.graph.get_outgoing_edges(node_id)
|
||||
if not edges:
|
||||
return []
|
||||
if len(edges) == 1:
|
||||
return [" │", " ▼"]
|
||||
# Fan-out: show branches
|
||||
lines: list[str] = []
|
||||
for i, edge in enumerate(edges):
|
||||
connector = "└" if i == len(edges) - 1 else "├"
|
||||
cond = ""
|
||||
if edge.condition.value not in ("always", "on_success"):
|
||||
cond = f" [dim]({edge.condition.value})[/dim]"
|
||||
lines.append(f" {connector}──▶ {edge.target}{cond}")
|
||||
return lines
|
||||
|
||||
def _display_graph(self) -> None:
|
||||
"""Display the graph as an ASCII DAG with edge connectors."""
|
||||
display = self.query_one("#graph-display", RichLog)
|
||||
display.clear()
|
||||
|
||||
graph = self.runtime.graph
|
||||
display.write(f"[bold cyan]Agent Graph:[/bold cyan] {graph.id}\n")
|
||||
|
||||
# Render each node in topological order with edges
|
||||
ordered = self._topo_order()
|
||||
for node_id in ordered:
|
||||
display.write(self._render_node_line(node_id))
|
||||
for edge_line in self._render_edges(node_id):
|
||||
display.write(edge_line)
|
||||
|
||||
# Execution path footer
|
||||
if self.execution_path:
|
||||
display.write("")
|
||||
display.write(f"[dim]Path:[/dim] {' → '.join(self.execution_path[-5:])}")
|
||||
|
||||
def update_active_node(self, node_id: str) -> None:
|
||||
"""Update the currently active node."""
|
||||
self.active_node = node_id
|
||||
if node_id not in self.execution_path:
|
||||
self.execution_path.append(node_id)
|
||||
self._display_graph()
|
||||
|
||||
def update_execution(self, event) -> None:
|
||||
"""Update the displayed node status based on execution lifecycle events."""
|
||||
if event.type == EventType.EXECUTION_STARTED:
|
||||
self._node_status.clear()
|
||||
self.execution_path.clear()
|
||||
entry_node = event.data.get("entry_node") or (
|
||||
self.runtime.graph.entry_node if self.runtime else None
|
||||
)
|
||||
if entry_node:
|
||||
self.update_active_node(entry_node)
|
||||
|
||||
elif event.type == EventType.EXECUTION_COMPLETED:
|
||||
self.active_node = None
|
||||
self._node_status.clear()
|
||||
self._display_graph()
|
||||
|
||||
elif event.type == EventType.EXECUTION_FAILED:
|
||||
error = event.data.get("error", "Unknown error")
|
||||
if self.active_node:
|
||||
self._node_status[self.active_node] = f"[red]FAILED: {error}[/red]"
|
||||
self.active_node = None
|
||||
self._display_graph()
|
||||
|
||||
# -- Event handlers called by app.py _handle_event --
|
||||
|
||||
def handle_node_loop_started(self, node_id: str) -> None:
|
||||
"""A node's event loop has started."""
|
||||
self._node_status[node_id] = "thinking..."
|
||||
self.update_active_node(node_id)
|
||||
|
||||
def handle_node_loop_iteration(self, node_id: str, iteration: int) -> None:
|
||||
"""A node advanced to a new loop iteration."""
|
||||
self._node_status[node_id] = f"step {iteration}"
|
||||
self._display_graph()
|
||||
|
||||
def handle_node_loop_completed(self, node_id: str) -> None:
|
||||
"""A node's event loop completed."""
|
||||
self._node_status.pop(node_id, None)
|
||||
self._display_graph()
|
||||
|
||||
def handle_tool_call(self, node_id: str, tool_name: str, *, started: bool) -> None:
|
||||
"""Show tool activity next to the active node."""
|
||||
if started:
|
||||
self._node_status[node_id] = f"{tool_name}..."
|
||||
else:
|
||||
# Restore to generic thinking status after tool completes
|
||||
self._node_status[node_id] = "thinking..."
|
||||
self._display_graph()
|
||||
|
||||
def handle_stalled(self, node_id: str, reason: str) -> None:
|
||||
"""Highlight a stalled node."""
|
||||
self._node_status[node_id] = f"[red]stalled: {reason}[/red]"
|
||||
self._display_graph()
|
||||
@@ -0,0 +1,147 @@
|
||||
"""
|
||||
Log Pane Widget - Uses RichLog for reliable rendering.
|
||||
"""
|
||||
|
||||
import logging
|
||||
from datetime import datetime
|
||||
|
||||
from textual.app import ComposeResult
|
||||
from textual.containers import Container
|
||||
|
||||
from framework.runtime.event_bus import AgentEvent, EventType
|
||||
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
|
||||
|
||||
|
||||
class LogPane(Container):
|
||||
"""Widget to display logs with reliable rendering."""
|
||||
|
||||
_EVENT_FORMAT: dict[EventType, tuple[str, str]] = {
|
||||
EventType.EXECUTION_STARTED: (">>", "bold cyan"),
|
||||
EventType.EXECUTION_COMPLETED: ("<<", "bold green"),
|
||||
EventType.EXECUTION_FAILED: ("!!", "bold red"),
|
||||
EventType.TOOL_CALL_STARTED: ("->", "yellow"),
|
||||
EventType.TOOL_CALL_COMPLETED: ("<-", "green"),
|
||||
EventType.NODE_LOOP_STARTED: ("@@", "cyan"),
|
||||
EventType.NODE_LOOP_ITERATION: ("..", "dim"),
|
||||
EventType.NODE_LOOP_COMPLETED: ("@@", "dim"),
|
||||
EventType.NODE_STALLED: ("!!", "bold yellow"),
|
||||
EventType.NODE_INPUT_BLOCKED: ("!!", "yellow"),
|
||||
EventType.GOAL_PROGRESS: ("%%", "blue"),
|
||||
EventType.GOAL_ACHIEVED: ("**", "bold green"),
|
||||
EventType.CONSTRAINT_VIOLATION: ("!!", "bold red"),
|
||||
EventType.STATE_CHANGED: ("~~", "dim"),
|
||||
EventType.CLIENT_INPUT_REQUESTED: ("??", "magenta"),
|
||||
}
|
||||
|
||||
_LOG_LEVEL_COLORS = {
|
||||
logging.DEBUG: "dim",
|
||||
logging.INFO: "",
|
||||
logging.WARNING: "yellow",
|
||||
logging.ERROR: "red",
|
||||
logging.CRITICAL: "bold red",
|
||||
}
|
||||
|
||||
DEFAULT_CSS = """
|
||||
LogPane {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
}
|
||||
|
||||
LogPane > RichLog {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
background: $surface;
|
||||
border: none;
|
||||
scrollbar-background: $panel;
|
||||
scrollbar-color: $primary;
|
||||
}
|
||||
"""
|
||||
|
||||
def compose(self) -> ComposeResult:
|
||||
# RichLog is designed for log display and doesn't have TextArea's rendering issues
|
||||
yield RichLog(id="main-log", highlight=True, markup=True, auto_scroll=False)
|
||||
|
||||
def write_event(self, event: AgentEvent) -> None:
|
||||
"""Format an AgentEvent with timestamp + symbol and write to the log."""
|
||||
ts = event.timestamp.strftime("%H:%M:%S")
|
||||
symbol, color = self._EVENT_FORMAT.get(event.type, ("--", "dim"))
|
||||
text = self._extract_event_text(event)
|
||||
self.write_log(f"[dim]{ts}[/dim] [{color}]{symbol} {text}[/{color}]")
|
||||
|
||||
def _extract_event_text(self, event: AgentEvent) -> str:
|
||||
"""Extract human-readable text from an event's data dict."""
|
||||
et = event.type
|
||||
data = event.data
|
||||
|
||||
if et == EventType.EXECUTION_STARTED:
|
||||
return "Execution started"
|
||||
elif et == EventType.EXECUTION_COMPLETED:
|
||||
return "Execution completed"
|
||||
elif et == EventType.EXECUTION_FAILED:
|
||||
return f"Execution FAILED: {data.get('error', 'unknown')}"
|
||||
elif et == EventType.TOOL_CALL_STARTED:
|
||||
return f"Tool call: {data.get('tool_name', 'unknown')}"
|
||||
elif et == EventType.TOOL_CALL_COMPLETED:
|
||||
name = data.get("tool_name", "unknown")
|
||||
if data.get("is_error"):
|
||||
preview = str(data.get("result", ""))[:80]
|
||||
return f"Tool error: {name} - {preview}"
|
||||
return f"Tool done: {name}"
|
||||
elif et == EventType.NODE_LOOP_STARTED:
|
||||
return f"Node started: {event.node_id or 'unknown'}"
|
||||
elif et == EventType.NODE_LOOP_ITERATION:
|
||||
return f"{event.node_id or 'unknown'} iteration {data.get('iteration', '?')}"
|
||||
elif et == EventType.NODE_LOOP_COMPLETED:
|
||||
return f"Node done: {event.node_id or 'unknown'}"
|
||||
elif et == EventType.NODE_STALLED:
|
||||
reason = data.get("reason", "")
|
||||
node = event.node_id or "unknown"
|
||||
return f"Node stalled: {node} - {reason}" if reason else f"Node stalled: {node}"
|
||||
elif et == EventType.NODE_INPUT_BLOCKED:
|
||||
return f"Node input blocked: {event.node_id or 'unknown'}"
|
||||
elif et == EventType.GOAL_PROGRESS:
|
||||
return f"Goal progress: {data.get('progress', '?')}"
|
||||
elif et == EventType.GOAL_ACHIEVED:
|
||||
return "Goal achieved"
|
||||
elif et == EventType.CONSTRAINT_VIOLATION:
|
||||
return f"Constraint violated: {data.get('description', 'unknown')}"
|
||||
elif et == EventType.STATE_CHANGED:
|
||||
return f"State changed: {data.get('key', 'unknown')}"
|
||||
elif et == EventType.CLIENT_INPUT_REQUESTED:
|
||||
return "Waiting for user input"
|
||||
else:
|
||||
return f"{et.value}: {data}"
|
||||
|
||||
def write_python_log(self, record: logging.LogRecord) -> None:
|
||||
"""Format a Python log record with timestamp and severity color."""
|
||||
ts = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
|
||||
color = self._LOG_LEVEL_COLORS.get(record.levelno, "")
|
||||
msg = record.getMessage()
|
||||
if color:
|
||||
self.write_log(f"[dim]{ts}[/dim] [{color}]{record.levelname}[/{color}] {msg}")
|
||||
else:
|
||||
self.write_log(f"[dim]{ts}[/dim] {record.levelname} {msg}")
|
||||
|
||||
def write_log(self, message: str) -> None:
|
||||
"""Write a log message to the log pane."""
|
||||
try:
|
||||
# Check if widget is mounted
|
||||
if not self.is_mounted:
|
||||
return
|
||||
|
||||
log = self.query_one("#main-log", RichLog)
|
||||
|
||||
# Check if log is mounted
|
||||
if not log.is_mounted:
|
||||
return
|
||||
|
||||
# Only auto-scroll if user is already at the bottom
|
||||
was_at_bottom = log.is_vertical_scroll_end
|
||||
|
||||
log.write(message)
|
||||
|
||||
if was_at_bottom:
|
||||
log.scroll_end(animate=False)
|
||||
|
||||
except Exception:
|
||||
pass
|
||||
@@ -0,0 +1,206 @@
|
||||
"""
|
||||
SelectableRichLog - RichLog with mouse-driven text selection and clipboard copy.
|
||||
|
||||
Drop-in replacement for RichLog. Click-and-drag to select text, which is
|
||||
visually highlighted. Press Ctrl+C to copy selection to clipboard (handled
|
||||
by app.py). Press Escape or single-click to clear selection.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import subprocess
|
||||
import sys
|
||||
|
||||
from rich.segment import Segment as RichSegment
|
||||
from rich.style import Style
|
||||
from textual.geometry import Offset
|
||||
from textual.selection import Selection
|
||||
from textual.strip import Strip
|
||||
from textual.widgets import RichLog
|
||||
|
||||
# Highlight style for selected text
|
||||
_HIGHLIGHT_STYLE = Style(bgcolor="blue", color="white")
|
||||
|
||||
|
||||
class SelectableRichLog(RichLog):
|
||||
"""RichLog with mouse-driven text selection."""
|
||||
|
||||
DEFAULT_CSS = """
|
||||
SelectableRichLog {
|
||||
pointer: text;
|
||||
}
|
||||
"""
|
||||
|
||||
def __init__(self, **kwargs) -> None:
|
||||
super().__init__(**kwargs)
|
||||
self._sel_anchor: Offset | None = None
|
||||
self._sel_end: Offset | None = None
|
||||
self._selecting: bool = False
|
||||
|
||||
# -- Internal helpers --
|
||||
|
||||
def _apply_highlight(self, strip: Strip) -> Strip:
|
||||
"""Apply highlight with correct precedence (highlight wins over base style)."""
|
||||
segments = []
|
||||
for text, style, control in strip._segments:
|
||||
if control:
|
||||
segments.append(RichSegment(text, style, control))
|
||||
else:
|
||||
new_style = (style + _HIGHLIGHT_STYLE) if style else _HIGHLIGHT_STYLE
|
||||
segments.append(RichSegment(text, new_style, control))
|
||||
return Strip(segments, strip.cell_length)
|
||||
|
||||
# -- Selection helpers --
|
||||
|
||||
@property
|
||||
def selection(self) -> Selection | None:
|
||||
"""Build a Selection from current anchor/end, or None if no selection."""
|
||||
if self._sel_anchor is None or self._sel_end is None:
|
||||
return None
|
||||
if self._sel_anchor == self._sel_end:
|
||||
return None
|
||||
return Selection.from_offsets(self._sel_anchor, self._sel_end)
|
||||
|
||||
def _mouse_to_content(self, event_x: int, event_y: int) -> Offset:
|
||||
"""Convert viewport mouse coords to content (line, col) coords."""
|
||||
scroll_x, scroll_y = self.scroll_offset
|
||||
return Offset(scroll_x + event_x, scroll_y + event_y)
|
||||
|
||||
def clear_selection(self) -> None:
|
||||
"""Clear any active selection."""
|
||||
had_selection = self._sel_anchor is not None
|
||||
self._sel_anchor = None
|
||||
self._sel_end = None
|
||||
self._selecting = False
|
||||
if had_selection:
|
||||
self.refresh()
|
||||
|
||||
# -- Mouse handlers (left button only) --
|
||||
|
||||
def on_mouse_down(self, event) -> None:
|
||||
"""Start selection on left mouse button."""
|
||||
if event.button != 1:
|
||||
return
|
||||
self._sel_anchor = self._mouse_to_content(event.x, event.y)
|
||||
self._sel_end = self._sel_anchor
|
||||
self._selecting = True
|
||||
self.capture_mouse()
|
||||
self.refresh()
|
||||
|
||||
def on_mouse_move(self, event) -> None:
|
||||
"""Extend selection while dragging."""
|
||||
if not self._selecting:
|
||||
return
|
||||
self._sel_end = self._mouse_to_content(event.x, event.y)
|
||||
self.refresh()
|
||||
|
||||
def on_mouse_up(self, event) -> None:
|
||||
"""End selection on mouse release."""
|
||||
if not self._selecting:
|
||||
return
|
||||
self._selecting = False
|
||||
self.release_mouse()
|
||||
|
||||
# Single-click (no drag) clears selection
|
||||
if self._sel_anchor == self._sel_end:
|
||||
self.clear_selection()
|
||||
|
||||
# -- Keyboard handlers --
|
||||
|
||||
def on_key(self, event) -> None:
|
||||
"""Clear selection on Escape."""
|
||||
if event.key == "escape":
|
||||
self.clear_selection()
|
||||
|
||||
# -- Rendering with highlight --
|
||||
|
||||
def render_line(self, y: int) -> Strip:
|
||||
"""Override to apply selection highlight on top of the base strip."""
|
||||
strip = super().render_line(y)
|
||||
|
||||
sel = self.selection
|
||||
if sel is None:
|
||||
return strip
|
||||
|
||||
# Determine which content line this viewport row corresponds to
|
||||
_, scroll_y = self.scroll_offset
|
||||
content_y = scroll_y + y
|
||||
|
||||
span = sel.get_span(content_y)
|
||||
if span is None:
|
||||
return strip
|
||||
|
||||
start_x, end_x = span
|
||||
cell_len = strip.cell_length
|
||||
if cell_len == 0:
|
||||
return strip
|
||||
|
||||
scroll_x, _ = self.scroll_offset
|
||||
|
||||
# -1 means "to end of content line" — use viewport end
|
||||
if end_x == -1:
|
||||
end_x = cell_len
|
||||
else:
|
||||
# Convert content-space x to viewport-space x
|
||||
end_x = end_x - scroll_x
|
||||
|
||||
# Convert content-space x to viewport-space x
|
||||
start_x = start_x - scroll_x
|
||||
|
||||
# Clamp to viewport strip bounds
|
||||
start_x = max(0, start_x)
|
||||
end_x = min(end_x, cell_len)
|
||||
|
||||
if start_x >= end_x:
|
||||
return strip
|
||||
|
||||
# Divide strip into [before, selected, after] and highlight the middle
|
||||
parts = strip.divide([start_x, end_x])
|
||||
if len(parts) < 2:
|
||||
return strip
|
||||
|
||||
highlighted_parts: list[Strip] = []
|
||||
for i, part in enumerate(parts):
|
||||
if i == 1:
|
||||
highlighted_parts.append(self._apply_highlight(part))
|
||||
else:
|
||||
highlighted_parts.append(part)
|
||||
|
||||
return Strip.join(highlighted_parts)
|
||||
|
||||
# -- Text extraction & clipboard --
|
||||
|
||||
def get_selected_text(self) -> str | None:
|
||||
"""Extract the plain text of the current selection, or None."""
|
||||
sel = self.selection
|
||||
if sel is None:
|
||||
return None
|
||||
|
||||
# Build full text from all lines
|
||||
all_text = "\n".join(strip.text for strip in self.lines)
|
||||
extracted = sel.extract(all_text)
|
||||
return extracted if extracted else None
|
||||
|
||||
def copy_selection(self) -> str | None:
|
||||
"""Copy selected text to system clipboard. Returns text or None."""
|
||||
text = self.get_selected_text()
|
||||
if not text:
|
||||
return None
|
||||
_copy_to_clipboard(text)
|
||||
return text
|
||||
|
||||
|
||||
def _copy_to_clipboard(text: str) -> None:
|
||||
"""Copy text to system clipboard using platform-native tools."""
|
||||
try:
|
||||
if sys.platform == "darwin":
|
||||
subprocess.run(["pbcopy"], input=text.encode(), check=True, timeout=5)
|
||||
elif sys.platform.startswith("linux"):
|
||||
subprocess.run(
|
||||
["xclip", "-selection", "clipboard"],
|
||||
input=text.encode(),
|
||||
check=True,
|
||||
timeout=5,
|
||||
)
|
||||
except (subprocess.SubprocessError, FileNotFoundError):
|
||||
pass
|
||||
+3
-1
@@ -11,13 +11,15 @@ dependencies = [
|
||||
"litellm>=1.81.0",
|
||||
"mcp>=1.0.0",
|
||||
"fastmcp>=2.0.0",
|
||||
"textual>=1.0.0",
|
||||
"pytest>=8.0",
|
||||
"pytest-asyncio>=0.23",
|
||||
"pytest-xdist>=3.0",
|
||||
"tools",
|
||||
]
|
||||
|
||||
# [project.optional-dependencies]
|
||||
[project.optional-dependencies]
|
||||
tui = ["textual>=0.75.0"]
|
||||
|
||||
[project.scripts]
|
||||
hive = "framework.cli:main"
|
||||
|
||||
+1
-1
@@ -143,7 +143,7 @@ def main():
|
||||
logger.info("The MCP server is now ready to use!")
|
||||
logger.info("")
|
||||
logger.info(f"{Colors.BLUE}To start the MCP server manually:{Colors.NC}")
|
||||
logger.info(" python -m framework.mcp.agent_builder_server")
|
||||
logger.info(" uv run python -m framework.mcp.agent_builder_server")
|
||||
logger.info("")
|
||||
logger.info(f"{Colors.BLUE}MCP Configuration location:{Colors.NC}")
|
||||
logger.info(f" {mcp_config_path}")
|
||||
|
||||
+4
-4
@@ -19,7 +19,7 @@ SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
|
||||
cd "$SCRIPT_DIR"
|
||||
|
||||
echo -e "${YELLOW}Step 1: Installing framework package...${NC}"
|
||||
pip install -e . || {
|
||||
uv pip install -e . || {
|
||||
echo -e "${RED}Failed to install framework package${NC}"
|
||||
exit 1
|
||||
}
|
||||
@@ -27,7 +27,7 @@ echo -e "${GREEN}✓ Framework package installed${NC}"
|
||||
echo ""
|
||||
|
||||
echo -e "${YELLOW}Step 2: Installing MCP dependencies...${NC}"
|
||||
pip install mcp fastmcp || {
|
||||
uv pip install mcp fastmcp || {
|
||||
echo -e "${RED}Failed to install MCP dependencies${NC}"
|
||||
exit 1
|
||||
}
|
||||
@@ -59,7 +59,7 @@ fi
|
||||
echo ""
|
||||
|
||||
echo -e "${YELLOW}Step 4: Testing MCP server...${NC}"
|
||||
python -c "from framework.mcp import agent_builder_server; print('✓ MCP server module loads successfully')" || {
|
||||
uv run python -c "from framework.mcp import agent_builder_server; print('✓ MCP server module loads successfully')" || {
|
||||
echo -e "${RED}Failed to import MCP server module${NC}"
|
||||
exit 1
|
||||
}
|
||||
@@ -71,7 +71,7 @@ echo ""
|
||||
echo "The MCP server is now ready to use!"
|
||||
echo ""
|
||||
echo "To start the MCP server manually:"
|
||||
echo " python -m framework.mcp.agent_builder_server"
|
||||
echo " uv run python -m framework.mcp.agent_builder_server"
|
||||
echo ""
|
||||
echo "MCP Configuration location:"
|
||||
echo " $SCRIPT_DIR/.mcp.json"
|
||||
|
||||
@@ -1,10 +1,20 @@
|
||||
"""Tests for the BuilderQuery interface - how Builder analyzes agent runs."""
|
||||
"""Tests for the BuilderQuery interface - how Builder analyzes agent runs.
|
||||
|
||||
DEPRECATED: These tests rely on the deprecated FileStorage backend.
|
||||
BuilderQuery and Runtime both use FileStorage which is deprecated.
|
||||
New code should use unified session storage instead.
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from framework import BuilderQuery, Runtime
|
||||
from framework.schemas.run import RunStatus
|
||||
|
||||
# Mark all tests in this module as skipped - they rely on deprecated FileStorage
|
||||
pytestmark = pytest.mark.skip(reason="Tests rely on deprecated FileStorage backend")
|
||||
|
||||
|
||||
def create_successful_run(runtime: Runtime, goal_id: str = "test_goal") -> str:
|
||||
"""Helper to create a successful run with decisions."""
|
||||
|
||||
@@ -26,6 +26,11 @@ def create_test_run(
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.skip(
|
||||
reason="FileStorage.save_run() is deprecated and now a no-op. "
|
||||
"ConcurrentStorage wraps FileStorage, so these tests no longer work. "
|
||||
"New sessions use unified storage at sessions/{session_id}/state.json"
|
||||
)
|
||||
@pytest.mark.asyncio
|
||||
async def test_cache_invalidation_on_save(tmp_path: Path):
|
||||
"""Test that summary cache is invalidated when a run is saved.
|
||||
@@ -62,6 +67,11 @@ async def test_cache_invalidation_on_save(tmp_path: Path):
|
||||
await storage.stop()
|
||||
|
||||
|
||||
@pytest.mark.skip(
|
||||
reason="FileStorage.save_run() is deprecated and now a no-op. "
|
||||
"ConcurrentStorage wraps FileStorage, so these tests no longer work. "
|
||||
"New sessions use unified storage at sessions/{session_id}/state.json"
|
||||
)
|
||||
@pytest.mark.asyncio
|
||||
async def test_batched_write_cache_consistency(tmp_path: Path):
|
||||
"""Test that cache is only updated after successful batched write.
|
||||
@@ -104,6 +114,11 @@ async def test_batched_write_cache_consistency(tmp_path: Path):
|
||||
await storage.stop()
|
||||
|
||||
|
||||
@pytest.mark.skip(
|
||||
reason="FileStorage.save_run() is deprecated and now a no-op. "
|
||||
"ConcurrentStorage wraps FileStorage, so these tests no longer work. "
|
||||
"New sessions use unified storage at sessions/{session_id}/state.json"
|
||||
)
|
||||
@pytest.mark.asyncio
|
||||
async def test_immediate_write_updates_cache(tmp_path: Path):
|
||||
"""Test that immediate writes still update cache correctly."""
|
||||
@@ -129,6 +144,11 @@ async def test_immediate_write_updates_cache(tmp_path: Path):
|
||||
await storage.stop()
|
||||
|
||||
|
||||
@pytest.mark.skip(
|
||||
reason="FileStorage.save_run() is deprecated and now a no-op. "
|
||||
"ConcurrentStorage wraps FileStorage, so these tests no longer work. "
|
||||
"New sessions use unified storage at sessions/{session_id}/state.json"
|
||||
)
|
||||
@pytest.mark.asyncio
|
||||
async def test_summary_cache_invalidated_on_multiple_saves(tmp_path: Path):
|
||||
"""Test that summary cache is invalidated on each save, not just the first."""
|
||||
|
||||
@@ -8,7 +8,6 @@ Set HIVE_TEST_LLM_MODEL=<model> to override the real model.
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
from collections.abc import AsyncIterator, Callable
|
||||
from dataclasses import dataclass
|
||||
@@ -508,7 +507,7 @@ async def test_event_loop_set_output():
|
||||
|
||||
assert result.success
|
||||
if USE_MOCK_LLM:
|
||||
assert result.output == {"lead_score": "87", "company": "TechCorp"}
|
||||
assert result.output == {"lead_score": 87, "company": "TechCorp"}
|
||||
else:
|
||||
assert "lead_score" in result.output
|
||||
assert "company" in result.output
|
||||
@@ -549,7 +548,7 @@ async def test_event_loop_missing_output_keys_retried():
|
||||
assert "score" in result.output
|
||||
assert "reason" in result.output
|
||||
if USE_MOCK_LLM:
|
||||
assert result.output["score"] == "87"
|
||||
assert result.output["score"] == 87
|
||||
assert result.output["reason"] == "good fit"
|
||||
|
||||
|
||||
@@ -920,7 +919,7 @@ async def test_context_handoff_between_nodes(runtime):
|
||||
assert "lead_score" in result.output
|
||||
assert "strategy" in result.output
|
||||
if USE_MOCK_LLM:
|
||||
assert result.output["lead_score"] == "92"
|
||||
assert result.output["lead_score"] == 92
|
||||
assert result.output["strategy"] == "premium"
|
||||
|
||||
|
||||
@@ -952,14 +951,9 @@ async def test_client_facing_node_streams_output():
|
||||
config=LoopConfig(max_iterations=5),
|
||||
)
|
||||
|
||||
# client_facing + text-only blocks for user input; use shutdown to unblock
|
||||
async def auto_shutdown():
|
||||
await asyncio.sleep(0.05)
|
||||
node.signal_shutdown()
|
||||
|
||||
task = asyncio.create_task(auto_shutdown())
|
||||
# Text-only on client_facing no longer blocks (no ask_user called),
|
||||
# so the node completes without needing a shutdown workaround.
|
||||
result = await node.execute(ctx)
|
||||
await task
|
||||
|
||||
assert result.success
|
||||
|
||||
|
||||
@@ -316,7 +316,7 @@ class TestSetOutput:
|
||||
result = await node.execute(ctx)
|
||||
|
||||
assert result.success is True
|
||||
assert result.output["result"] == "42"
|
||||
assert result.output["result"] == 42
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_set_output_rejects_invalid_key(self, runtime, node_spec, memory):
|
||||
@@ -447,14 +447,9 @@ class TestEventBusLifecycle:
|
||||
ctx = build_ctx(runtime, spec, memory, llm)
|
||||
node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=5))
|
||||
|
||||
# client_facing + text-only blocks for user input; use shutdown to unblock
|
||||
async def auto_shutdown():
|
||||
await asyncio.sleep(0.05)
|
||||
node.signal_shutdown()
|
||||
|
||||
task = asyncio.create_task(auto_shutdown())
|
||||
# Text-only on client_facing no longer blocks (no ask_user), so
|
||||
# the node completes without needing shutdown.
|
||||
await node.execute(ctx)
|
||||
await task
|
||||
|
||||
assert EventType.CLIENT_OUTPUT_DELTA in received_types
|
||||
assert EventType.LLM_TEXT_DELTA not in received_types
|
||||
@@ -480,11 +475,38 @@ class TestClientFacingBlocking:
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_client_facing_blocks_on_text(self, runtime, memory, client_spec):
|
||||
"""client_facing + text-only response blocks until inject_event."""
|
||||
async def test_text_only_no_blocking(self, runtime, memory, client_spec):
|
||||
"""client_facing + text-only (no ask_user) should NOT block."""
|
||||
llm = MockStreamingLLM(
|
||||
scenarios=[
|
||||
text_scenario("Hello!"),
|
||||
text_scenario("Hello! Here is your status update."),
|
||||
]
|
||||
)
|
||||
bus = EventBus()
|
||||
node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=5))
|
||||
ctx = build_ctx(runtime, client_spec, memory, llm)
|
||||
|
||||
# Should complete without blocking — no ask_user called, no output_keys required
|
||||
result = await node.execute(ctx)
|
||||
|
||||
assert result.success is True
|
||||
assert llm._call_index >= 1
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_ask_user_triggers_blocking(self, runtime, memory, client_spec):
|
||||
"""client_facing + ask_user() blocks until inject_event."""
|
||||
# Give the node an output key so the judge doesn't auto-accept
|
||||
# after the user responds — it needs set_output first.
|
||||
client_spec.output_keys = ["answer"]
|
||||
llm = MockStreamingLLM(
|
||||
scenarios=[
|
||||
# Turn 1: LLM greets user and calls ask_user
|
||||
tool_call_scenario(
|
||||
"ask_user", {"question": "What do you need?"}, tool_use_id="ask_1"
|
||||
),
|
||||
# Turn 2: after user responds, LLM processes and sets output
|
||||
tool_call_scenario("set_output", {"key": "answer", "value": "help provided"}),
|
||||
# Turn 3: text finish (implicit judge accepts — output key set)
|
||||
text_scenario("Got your message."),
|
||||
]
|
||||
)
|
||||
@@ -495,20 +517,19 @@ class TestClientFacingBlocking:
|
||||
async def user_responds():
|
||||
await asyncio.sleep(0.05)
|
||||
await node.inject_event("I need help")
|
||||
await asyncio.sleep(0.05)
|
||||
node.signal_shutdown()
|
||||
|
||||
user_task = asyncio.create_task(user_responds())
|
||||
result = await node.execute(ctx)
|
||||
await user_task
|
||||
|
||||
assert result.success is True
|
||||
# LLM should have been called at least twice (first response + after inject)
|
||||
# LLM called at least twice: once for ask_user turn, once after user responded
|
||||
assert llm._call_index >= 2
|
||||
assert result.output["answer"] == "help provided"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_client_facing_does_not_block_on_tools(self, runtime, memory):
|
||||
"""client_facing + tool calls should NOT block — judge evaluates normally."""
|
||||
"""client_facing + tool calls (no ask_user) should NOT block."""
|
||||
spec = NodeSpec(
|
||||
id="chat",
|
||||
name="Chat",
|
||||
@@ -517,10 +538,9 @@ class TestClientFacingBlocking:
|
||||
output_keys=["result"],
|
||||
client_facing=True,
|
||||
)
|
||||
# Scenario 1: LLM calls set_output (tool call present → no blocking, judge RETRYs)
|
||||
# Scenario 2: LLM produces text (implicit judge sees output key set → ACCEPT)
|
||||
# But scenario 2 is text-only on client_facing → would block.
|
||||
# So we need shutdown to handle that case.
|
||||
# Scenario 1: LLM calls set_output
|
||||
# Scenario 2: LLM produces text — implicit judge ACCEPTs (output key set)
|
||||
# No ask_user called, so no blocking occurs.
|
||||
llm = MockStreamingLLM(
|
||||
scenarios=[
|
||||
tool_call_scenario("set_output", {"key": "result", "value": "done"}),
|
||||
@@ -530,18 +550,8 @@ class TestClientFacingBlocking:
|
||||
node = EventLoopNode(config=LoopConfig(max_iterations=5))
|
||||
ctx = build_ctx(runtime, spec, memory, llm)
|
||||
|
||||
# After set_output, implicit judge RETRYs (tool calls present).
|
||||
# Next turn: text-only on client_facing → blocks.
|
||||
# But implicit judge should ACCEPT first (output key is set, no tools).
|
||||
# Actually, client_facing check happens BEFORE judge, so it blocks.
|
||||
# Use shutdown as safety net.
|
||||
async def auto_shutdown():
|
||||
await asyncio.sleep(0.1)
|
||||
node.signal_shutdown()
|
||||
|
||||
task = asyncio.create_task(auto_shutdown())
|
||||
# Should complete without blocking — no ask_user called
|
||||
result = await node.execute(ctx)
|
||||
await task
|
||||
|
||||
assert result.success is True
|
||||
assert result.output["result"] == "done"
|
||||
@@ -567,7 +577,11 @@ class TestClientFacingBlocking:
|
||||
@pytest.mark.asyncio
|
||||
async def test_signal_shutdown_unblocks(self, runtime, memory, client_spec):
|
||||
"""signal_shutdown should unblock a waiting client_facing node."""
|
||||
llm = MockStreamingLLM(scenarios=[text_scenario("Waiting...")])
|
||||
llm = MockStreamingLLM(
|
||||
scenarios=[
|
||||
tool_call_scenario("ask_user", {"question": "Waiting..."}, tool_use_id="ask_1"),
|
||||
]
|
||||
)
|
||||
bus = EventBus()
|
||||
node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=10))
|
||||
ctx = build_ctx(runtime, client_spec, memory, llm)
|
||||
@@ -584,8 +598,12 @@ class TestClientFacingBlocking:
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_client_input_requested_event_published(self, runtime, memory, client_spec):
|
||||
"""CLIENT_INPUT_REQUESTED should be published when blocking."""
|
||||
llm = MockStreamingLLM(scenarios=[text_scenario("Hello!")])
|
||||
"""CLIENT_INPUT_REQUESTED should be published when ask_user blocks."""
|
||||
llm = MockStreamingLLM(
|
||||
scenarios=[
|
||||
tool_call_scenario("ask_user", {"question": "Hello!"}, tool_use_id="ask_1"),
|
||||
]
|
||||
)
|
||||
bus = EventBus()
|
||||
received = []
|
||||
|
||||
@@ -611,6 +629,77 @@ class TestClientFacingBlocking:
|
||||
assert len(received) >= 1
|
||||
assert received[0].type == EventType.CLIENT_INPUT_REQUESTED
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_ask_user_with_real_tools(self, runtime, memory):
|
||||
"""ask_user alongside real tool calls still triggers blocking."""
|
||||
spec = NodeSpec(
|
||||
id="chat",
|
||||
name="Chat",
|
||||
description="chat node",
|
||||
node_type="event_loop",
|
||||
output_keys=[],
|
||||
client_facing=True,
|
||||
)
|
||||
# LLM calls a real tool AND ask_user in the same turn
|
||||
llm = MockStreamingLLM(
|
||||
scenarios=[
|
||||
[
|
||||
ToolCallEvent(
|
||||
tool_use_id="tool_1", tool_name="search", tool_input={"q": "test"}
|
||||
),
|
||||
ToolCallEvent(tool_use_id="ask_1", tool_name="ask_user", tool_input={}),
|
||||
FinishEvent(
|
||||
stop_reason="tool_calls", input_tokens=10, output_tokens=5, model="mock"
|
||||
),
|
||||
],
|
||||
text_scenario("Done"),
|
||||
]
|
||||
)
|
||||
|
||||
def my_executor(tool_use: ToolUse) -> ToolResult:
|
||||
return ToolResult(tool_use_id=tool_use.id, content="result", is_error=False)
|
||||
|
||||
node = EventLoopNode(
|
||||
tool_executor=my_executor,
|
||||
config=LoopConfig(max_iterations=5),
|
||||
)
|
||||
ctx = build_ctx(
|
||||
runtime, spec, memory, llm, tools=[Tool(name="search", description="", parameters={})]
|
||||
)
|
||||
|
||||
async def unblock():
|
||||
await asyncio.sleep(0.05)
|
||||
await node.inject_event("user input")
|
||||
|
||||
task = asyncio.create_task(unblock())
|
||||
result = await node.execute(ctx)
|
||||
await task
|
||||
|
||||
assert result.success is True
|
||||
assert llm._call_index >= 2
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_ask_user_not_available_non_client_facing(self, runtime, memory):
|
||||
"""ask_user tool should NOT be injected for non-client-facing nodes."""
|
||||
spec = NodeSpec(
|
||||
id="internal",
|
||||
name="Internal",
|
||||
description="internal node",
|
||||
node_type="event_loop",
|
||||
output_keys=[],
|
||||
)
|
||||
llm = MockStreamingLLM(scenarios=[text_scenario("thinking...")])
|
||||
node = EventLoopNode(config=LoopConfig(max_iterations=2))
|
||||
ctx = build_ctx(runtime, spec, memory, llm)
|
||||
|
||||
await node.execute(ctx)
|
||||
|
||||
# Verify ask_user was NOT in the tools passed to the LLM
|
||||
assert llm._call_index >= 1
|
||||
for call in llm.stream_calls:
|
||||
tool_names = [t.name for t in (call["tools"] or [])]
|
||||
assert "ask_user" not in tool_names
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Tool execution
|
||||
|
||||
@@ -104,8 +104,10 @@ def test_event_loop_node_spec_accepted():
|
||||
# --- _get_node_implementation() tests ---
|
||||
|
||||
|
||||
def test_unregistered_event_loop_raises(runtime):
|
||||
"""An event_loop node not in the registry should raise RuntimeError."""
|
||||
def test_unregistered_event_loop_auto_creates(runtime):
|
||||
"""An event_loop node not in the registry should be auto-created."""
|
||||
from framework.graph.event_loop_node import EventLoopNode
|
||||
|
||||
spec = NodeSpec(
|
||||
id="el1",
|
||||
name="Event Loop",
|
||||
@@ -114,8 +116,10 @@ def test_unregistered_event_loop_raises(runtime):
|
||||
)
|
||||
executor = GraphExecutor(runtime=runtime)
|
||||
|
||||
with pytest.raises(RuntimeError, match="not found in registry"):
|
||||
executor._get_node_implementation(spec)
|
||||
result = executor._get_node_implementation(spec)
|
||||
assert isinstance(result, EventLoopNode)
|
||||
# Auto-created node should be cached in registry
|
||||
assert "el1" in executor.node_registry
|
||||
|
||||
|
||||
def test_registered_event_loop_returns_impl(runtime):
|
||||
|
||||
@@ -5,7 +5,7 @@ Focused on minimal success and failure scenarios.
|
||||
|
||||
import pytest
|
||||
|
||||
from framework.graph.edge import GraphSpec
|
||||
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
|
||||
from framework.graph.executor import GraphExecutor
|
||||
from framework.graph.goal import Goal
|
||||
from framework.graph.node import NodeResult, NodeSpec
|
||||
@@ -130,3 +130,169 @@ async def test_executor_single_node_failure():
|
||||
assert result.success is False
|
||||
assert result.error is not None
|
||||
assert result.path == ["n1"]
|
||||
|
||||
|
||||
# ---- Fake event bus that records calls ----
|
||||
class FakeEventBus:
|
||||
def __init__(self):
|
||||
self.events = []
|
||||
|
||||
async def emit_node_loop_started(self, **kwargs):
|
||||
self.events.append(("started", kwargs))
|
||||
|
||||
async def emit_node_loop_completed(self, **kwargs):
|
||||
self.events.append(("completed", kwargs))
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_executor_emits_node_events():
|
||||
"""Executor should emit NODE_LOOP_STARTED/COMPLETED for each non-event_loop node."""
|
||||
runtime = DummyRuntime()
|
||||
event_bus = FakeEventBus()
|
||||
|
||||
graph = GraphSpec(
|
||||
id="graph-ev",
|
||||
goal_id="g-ev",
|
||||
nodes=[
|
||||
NodeSpec(
|
||||
id="n1",
|
||||
name="first",
|
||||
description="first node",
|
||||
node_type="llm_generate",
|
||||
input_keys=[],
|
||||
output_keys=["result"],
|
||||
max_retries=0,
|
||||
),
|
||||
NodeSpec(
|
||||
id="n2",
|
||||
name="second",
|
||||
description="second node",
|
||||
node_type="llm_generate",
|
||||
input_keys=["result"],
|
||||
output_keys=["result"],
|
||||
max_retries=0,
|
||||
),
|
||||
],
|
||||
edges=[
|
||||
EdgeSpec(
|
||||
id="e1",
|
||||
source="n1",
|
||||
target="n2",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
),
|
||||
],
|
||||
entry_node="n1",
|
||||
terminal_nodes=["n2"],
|
||||
)
|
||||
|
||||
executor = GraphExecutor(
|
||||
runtime=runtime,
|
||||
node_registry={
|
||||
"n1": SuccessNode(),
|
||||
"n2": SuccessNode(),
|
||||
},
|
||||
event_bus=event_bus,
|
||||
stream_id="test-stream",
|
||||
)
|
||||
|
||||
goal = Goal(id="g-ev", name="event-test", description="test events")
|
||||
result = await executor.execute(graph=graph, goal=goal)
|
||||
|
||||
assert result.success is True
|
||||
assert result.path == ["n1", "n2"]
|
||||
|
||||
# Should have 4 events: started/completed for n1, then started/completed for n2
|
||||
assert len(event_bus.events) == 4
|
||||
assert event_bus.events[0] == ("started", {"stream_id": "test-stream", "node_id": "n1"})
|
||||
assert event_bus.events[1] == (
|
||||
"completed",
|
||||
{"stream_id": "test-stream", "node_id": "n1", "iterations": 1},
|
||||
)
|
||||
assert event_bus.events[2] == ("started", {"stream_id": "test-stream", "node_id": "n2"})
|
||||
assert event_bus.events[3] == (
|
||||
"completed",
|
||||
{"stream_id": "test-stream", "node_id": "n2", "iterations": 1},
|
||||
)
|
||||
|
||||
|
||||
# ---- Fake event_loop node (registered, so executor won't emit for it) ----
|
||||
class FakeEventLoopNode:
|
||||
def validate_input(self, ctx):
|
||||
return []
|
||||
|
||||
async def execute(self, ctx):
|
||||
return NodeResult(success=True, output={"result": "loop-done"}, tokens_used=1, latency_ms=1)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_executor_skips_events_for_event_loop_nodes():
|
||||
"""Executor should NOT emit events for event_loop nodes (they emit their own)."""
|
||||
runtime = DummyRuntime()
|
||||
event_bus = FakeEventBus()
|
||||
|
||||
graph = GraphSpec(
|
||||
id="graph-el",
|
||||
goal_id="g-el",
|
||||
nodes=[
|
||||
NodeSpec(
|
||||
id="el1",
|
||||
name="event-loop-node",
|
||||
description="event loop node",
|
||||
node_type="event_loop",
|
||||
input_keys=[],
|
||||
output_keys=["result"],
|
||||
max_retries=0,
|
||||
),
|
||||
],
|
||||
edges=[],
|
||||
entry_node="el1",
|
||||
)
|
||||
|
||||
executor = GraphExecutor(
|
||||
runtime=runtime,
|
||||
node_registry={"el1": FakeEventLoopNode()},
|
||||
event_bus=event_bus,
|
||||
stream_id="test-stream",
|
||||
)
|
||||
|
||||
goal = Goal(id="g-el", name="el-test", description="test event_loop guard")
|
||||
result = await executor.execute(graph=graph, goal=goal)
|
||||
|
||||
assert result.success is True
|
||||
# No events should have been emitted — event_loop nodes are skipped
|
||||
assert len(event_bus.events) == 0
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_executor_no_events_without_event_bus():
|
||||
"""Executor should work fine without an event bus (backward compat)."""
|
||||
runtime = DummyRuntime()
|
||||
|
||||
graph = GraphSpec(
|
||||
id="graph-nobus",
|
||||
goal_id="g-nobus",
|
||||
nodes=[
|
||||
NodeSpec(
|
||||
id="n1",
|
||||
name="node1",
|
||||
description="test node",
|
||||
node_type="llm_generate",
|
||||
input_keys=[],
|
||||
output_keys=["result"],
|
||||
max_retries=0,
|
||||
)
|
||||
],
|
||||
edges=[],
|
||||
entry_node="n1",
|
||||
)
|
||||
|
||||
# No event_bus passed — should not crash
|
||||
executor = GraphExecutor(
|
||||
runtime=runtime,
|
||||
node_registry={"n1": SuccessNode()},
|
||||
)
|
||||
|
||||
goal = Goal(id="g-nobus", name="nobus-test", description="no event bus")
|
||||
result = await executor.execute(graph=graph, goal=goal)
|
||||
|
||||
assert result.success is True
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
|
||||
Run with:
|
||||
cd core
|
||||
pip install litellm pytest
|
||||
uv pip install litellm pytest
|
||||
pytest tests/test_litellm_provider.py -v
|
||||
|
||||
For live tests (requires API keys):
|
||||
|
||||
@@ -4,7 +4,7 @@ Calls live LLM APIs and dumps stream events to JSON files for review.
|
||||
Results are saved to core/tests/stream_event_dumps/{provider}_{model}_{scenario}.json
|
||||
|
||||
Run with:
|
||||
cd core && python -m pytest tests/test_litellm_streaming.py -v -s -k "RealAPI"
|
||||
cd core && uv run python -m pytest tests/test_litellm_streaming.py -v -s -k "RealAPI"
|
||||
|
||||
Requires API keys set in environment:
|
||||
ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY (or via credential store)
|
||||
|
||||
@@ -0,0 +1,360 @@
|
||||
"""
|
||||
Test that ON_FAILURE edges are followed when a node fails after max retries.
|
||||
|
||||
Verifies the fix for Issue #3449 where the executor would immediately terminate
|
||||
when max retries were exceeded, without checking for ON_FAILURE edges that could
|
||||
route to error handler nodes.
|
||||
"""
|
||||
|
||||
from unittest.mock import AsyncMock, MagicMock
|
||||
|
||||
import pytest
|
||||
|
||||
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
|
||||
from framework.graph.executor import GraphExecutor
|
||||
from framework.graph.goal import Goal
|
||||
from framework.graph.node import NodeContext, NodeProtocol, NodeResult, NodeSpec
|
||||
from framework.runtime.core import Runtime
|
||||
|
||||
|
||||
class AlwaysFailsNode(NodeProtocol):
|
||||
"""A node that always fails."""
|
||||
|
||||
def __init__(self):
|
||||
self.attempt_count = 0
|
||||
|
||||
async def execute(self, ctx: NodeContext) -> NodeResult:
|
||||
self.attempt_count += 1
|
||||
return NodeResult(success=False, error=f"Permanent error (attempt {self.attempt_count})")
|
||||
|
||||
|
||||
class FailureHandlerNode(NodeProtocol):
|
||||
"""A node that handles failures from upstream nodes."""
|
||||
|
||||
def __init__(self):
|
||||
self.executed = False
|
||||
self.execute_count = 0
|
||||
|
||||
async def execute(self, ctx: NodeContext) -> NodeResult:
|
||||
self.executed = True
|
||||
self.execute_count += 1
|
||||
return NodeResult(
|
||||
success=True,
|
||||
output={"handled": True, "recovery": "graceful"},
|
||||
)
|
||||
|
||||
|
||||
class SuccessNode(NodeProtocol):
|
||||
"""A node that always succeeds with configurable output."""
|
||||
|
||||
def __init__(self, output: dict | None = None):
|
||||
self.execute_count = 0
|
||||
self._output = output or {"result": "ok"}
|
||||
|
||||
async def execute(self, ctx: NodeContext) -> NodeResult:
|
||||
self.execute_count += 1
|
||||
return NodeResult(success=True, output=self._output)
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def fast_sleep(monkeypatch):
|
||||
"""Mock asyncio.sleep to avoid real delays from exponential backoff."""
|
||||
monkeypatch.setattr("asyncio.sleep", AsyncMock())
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def runtime():
|
||||
"""Create a mock Runtime for testing."""
|
||||
runtime = MagicMock(spec=Runtime)
|
||||
runtime.start_run = MagicMock(return_value="test_run_id")
|
||||
runtime.decide = MagicMock(return_value="test_decision_id")
|
||||
runtime.record_outcome = MagicMock()
|
||||
runtime.end_run = MagicMock()
|
||||
runtime.report_problem = MagicMock()
|
||||
runtime.set_node = MagicMock()
|
||||
return runtime
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def goal():
|
||||
return Goal(
|
||||
id="test_goal",
|
||||
name="Test Goal",
|
||||
description="Test ON_FAILURE edge routing",
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_on_failure_edge_followed_after_max_retries(runtime, goal):
|
||||
"""
|
||||
When a node fails after exhausting max retries, ON_FAILURE edges should
|
||||
be followed to route execution to a failure handler node.
|
||||
"""
|
||||
nodes = [
|
||||
NodeSpec(
|
||||
id="failing",
|
||||
name="Failing Node",
|
||||
description="Always fails",
|
||||
node_type="function",
|
||||
output_keys=[],
|
||||
max_retries=1,
|
||||
),
|
||||
NodeSpec(
|
||||
id="handler",
|
||||
name="Failure Handler",
|
||||
description="Handles failures",
|
||||
node_type="function",
|
||||
output_keys=["handled", "recovery"],
|
||||
),
|
||||
]
|
||||
|
||||
edges = [
|
||||
EdgeSpec(
|
||||
id="fail_to_handler",
|
||||
source="failing",
|
||||
target="handler",
|
||||
condition=EdgeCondition.ON_FAILURE,
|
||||
),
|
||||
]
|
||||
|
||||
graph = GraphSpec(
|
||||
id="test_graph",
|
||||
goal_id="test_goal",
|
||||
name="Test Graph",
|
||||
entry_node="failing",
|
||||
nodes=nodes,
|
||||
edges=edges,
|
||||
terminal_nodes=["handler"],
|
||||
)
|
||||
|
||||
executor = GraphExecutor(runtime=runtime)
|
||||
failing_node = AlwaysFailsNode()
|
||||
handler_node = FailureHandlerNode()
|
||||
executor.register_node("failing", failing_node)
|
||||
executor.register_node("handler", handler_node)
|
||||
|
||||
result = await executor.execute(graph, goal, {})
|
||||
|
||||
# The handler should have executed
|
||||
assert handler_node.executed, "Failure handler was not executed"
|
||||
assert handler_node.execute_count == 1
|
||||
|
||||
# Overall execution should succeed (handler recovered)
|
||||
assert result.success
|
||||
# Handler node should appear in the execution path
|
||||
assert "handler" in result.path
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_on_failure_edge_still_terminates(runtime, goal):
|
||||
"""
|
||||
When a node fails after max retries and there is no ON_FAILURE edge,
|
||||
the executor should terminate with a failure result (original behavior).
|
||||
"""
|
||||
nodes = [
|
||||
NodeSpec(
|
||||
id="failing",
|
||||
name="Failing Node",
|
||||
description="Always fails",
|
||||
node_type="function",
|
||||
output_keys=[],
|
||||
max_retries=1,
|
||||
),
|
||||
]
|
||||
|
||||
graph = GraphSpec(
|
||||
id="test_graph",
|
||||
goal_id="test_goal",
|
||||
name="Test Graph",
|
||||
entry_node="failing",
|
||||
nodes=[nodes[0]],
|
||||
edges=[],
|
||||
terminal_nodes=["failing"],
|
||||
)
|
||||
|
||||
executor = GraphExecutor(runtime=runtime)
|
||||
failing_node = AlwaysFailsNode()
|
||||
executor.register_node("failing", failing_node)
|
||||
|
||||
result = await executor.execute(graph, goal, {})
|
||||
|
||||
assert not result.success
|
||||
assert "failed after 1 attempts" in result.error
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_on_failure_edge_not_followed_on_success(runtime, goal):
|
||||
"""
|
||||
ON_FAILURE edges should NOT be followed when a node succeeds.
|
||||
Only ON_SUCCESS edges should fire.
|
||||
"""
|
||||
nodes = [
|
||||
NodeSpec(
|
||||
id="working",
|
||||
name="Working Node",
|
||||
description="Always succeeds",
|
||||
node_type="function",
|
||||
output_keys=["result"],
|
||||
),
|
||||
NodeSpec(
|
||||
id="handler",
|
||||
name="Failure Handler",
|
||||
description="Should not be reached",
|
||||
node_type="function",
|
||||
output_keys=["handled"],
|
||||
),
|
||||
NodeSpec(
|
||||
id="next",
|
||||
name="Next Node",
|
||||
description="Normal successor",
|
||||
node_type="function",
|
||||
output_keys=["done"],
|
||||
),
|
||||
]
|
||||
|
||||
edges = [
|
||||
EdgeSpec(
|
||||
id="on_fail",
|
||||
source="working",
|
||||
target="handler",
|
||||
condition=EdgeCondition.ON_FAILURE,
|
||||
),
|
||||
EdgeSpec(
|
||||
id="on_success",
|
||||
source="working",
|
||||
target="next",
|
||||
condition=EdgeCondition.ON_SUCCESS,
|
||||
),
|
||||
]
|
||||
|
||||
graph = GraphSpec(
|
||||
id="test_graph",
|
||||
goal_id="test_goal",
|
||||
name="Test Graph",
|
||||
entry_node="working",
|
||||
nodes=nodes,
|
||||
edges=edges,
|
||||
terminal_nodes=["handler", "next"],
|
||||
)
|
||||
|
||||
executor = GraphExecutor(runtime=runtime)
|
||||
executor.register_node("working", SuccessNode(output={"result": "ok"}))
|
||||
handler_node = FailureHandlerNode()
|
||||
executor.register_node("handler", handler_node)
|
||||
executor.register_node("next", SuccessNode(output={"done": True}))
|
||||
|
||||
result = await executor.execute(graph, goal, {})
|
||||
|
||||
assert result.success
|
||||
assert not handler_node.executed, "Failure handler should not run on success"
|
||||
assert "next" in result.path, "Should follow ON_SUCCESS edge to 'next'"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_on_failure_edge_with_zero_retries(runtime, goal):
|
||||
"""
|
||||
ON_FAILURE edges should work even when max_retries=0 (no retries allowed).
|
||||
The node fails once and immediately routes to the failure handler.
|
||||
"""
|
||||
nodes = [
|
||||
NodeSpec(
|
||||
id="fragile",
|
||||
name="Fragile Node",
|
||||
description="Fails with no retries",
|
||||
node_type="function",
|
||||
output_keys=[],
|
||||
max_retries=0,
|
||||
),
|
||||
NodeSpec(
|
||||
id="handler",
|
||||
name="Failure Handler",
|
||||
description="Handles failures",
|
||||
node_type="function",
|
||||
output_keys=["handled", "recovery"],
|
||||
),
|
||||
]
|
||||
|
||||
edges = [
|
||||
EdgeSpec(
|
||||
id="fail_to_handler",
|
||||
source="fragile",
|
||||
target="handler",
|
||||
condition=EdgeCondition.ON_FAILURE,
|
||||
),
|
||||
]
|
||||
|
||||
graph = GraphSpec(
|
||||
id="test_graph",
|
||||
goal_id="test_goal",
|
||||
name="Test Graph",
|
||||
entry_node="fragile",
|
||||
nodes=nodes,
|
||||
edges=edges,
|
||||
terminal_nodes=["handler"],
|
||||
)
|
||||
|
||||
executor = GraphExecutor(runtime=runtime)
|
||||
failing_node = AlwaysFailsNode()
|
||||
handler_node = FailureHandlerNode()
|
||||
executor.register_node("fragile", failing_node)
|
||||
executor.register_node("handler", handler_node)
|
||||
|
||||
result = await executor.execute(graph, goal, {})
|
||||
|
||||
# Should route to handler after single failure (no retries)
|
||||
assert failing_node.attempt_count == 1
|
||||
assert handler_node.executed
|
||||
assert result.success
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_on_failure_handler_appears_in_path(runtime, goal):
|
||||
"""
|
||||
The failure handler node should appear in the execution path.
|
||||
"""
|
||||
nodes = [
|
||||
NodeSpec(
|
||||
id="failing",
|
||||
name="Failing Node",
|
||||
description="Always fails",
|
||||
node_type="function",
|
||||
output_keys=[],
|
||||
max_retries=1,
|
||||
),
|
||||
NodeSpec(
|
||||
id="handler",
|
||||
name="Failure Handler",
|
||||
description="Handles failures",
|
||||
node_type="function",
|
||||
output_keys=["handled", "recovery"],
|
||||
),
|
||||
]
|
||||
|
||||
edges = [
|
||||
EdgeSpec(
|
||||
id="fail_to_handler",
|
||||
source="failing",
|
||||
target="handler",
|
||||
condition=EdgeCondition.ON_FAILURE,
|
||||
),
|
||||
]
|
||||
|
||||
graph = GraphSpec(
|
||||
id="test_graph",
|
||||
goal_id="test_goal",
|
||||
name="Test Graph",
|
||||
entry_node="failing",
|
||||
nodes=nodes,
|
||||
edges=edges,
|
||||
terminal_nodes=["handler"],
|
||||
)
|
||||
|
||||
executor = GraphExecutor(runtime=runtime)
|
||||
executor.register_node("failing", AlwaysFailsNode())
|
||||
executor.register_node("handler", FailureHandlerNode())
|
||||
|
||||
result = await executor.execute(graph, goal, {})
|
||||
|
||||
assert "failing" in result.path
|
||||
assert "handler" in result.path
|
||||
assert result.node_visit_counts.get("handler") == 1
|
||||
@@ -37,6 +37,10 @@ class TestRuntimeBasics:
|
||||
runtime.end_run(success=True)
|
||||
assert runtime.current_run is None
|
||||
|
||||
@pytest.mark.skip(
|
||||
reason="FileStorage.save_run() is deprecated and now a no-op. "
|
||||
"New sessions use unified storage at sessions/{session_id}/state.json"
|
||||
)
|
||||
def test_run_saved_on_end(self, tmp_path: Path):
|
||||
"""Run is saved to storage when ended."""
|
||||
runtime = Runtime(tmp_path)
|
||||
@@ -341,6 +345,10 @@ class TestConvenienceMethods:
|
||||
class TestNarrativeGeneration:
|
||||
"""Test automatic narrative generation."""
|
||||
|
||||
@pytest.mark.skip(
|
||||
reason="FileStorage.save_run() and get_runs_by_goal() are deprecated. "
|
||||
"New sessions use unified storage at sessions/{session_id}/state.json"
|
||||
)
|
||||
def test_default_narrative_success(self, tmp_path: Path):
|
||||
"""Test default narrative for successful run."""
|
||||
runtime = Runtime(tmp_path)
|
||||
@@ -360,6 +368,10 @@ class TestNarrativeGeneration:
|
||||
run = runtime.storage.load_run(runtime.storage.get_runs_by_goal("test_goal")[0])
|
||||
assert "completed successfully" in run.narrative
|
||||
|
||||
@pytest.mark.skip(
|
||||
reason="FileStorage.save_run() and get_runs_by_goal() are deprecated. "
|
||||
"New sessions use unified storage at sessions/{session_id}/state.json"
|
||||
)
|
||||
def test_default_narrative_failure(self, tmp_path: Path):
|
||||
"""Test default narrative for failed run."""
|
||||
runtime = Runtime(tmp_path)
|
||||
|
||||
@@ -0,0 +1,942 @@
|
||||
"""Tests for RuntimeLogger and RuntimeLogStore.
|
||||
|
||||
Tests incremental JSONL writes (L2/L3), crash resilience, and L1
|
||||
summary aggregation at end_run().
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from framework.runtime.runtime_log_schemas import (
|
||||
NodeDetail,
|
||||
NodeStepLog,
|
||||
RunSummaryLog,
|
||||
ToolCallLog,
|
||||
)
|
||||
from framework.runtime.runtime_log_store import RuntimeLogStore
|
||||
from framework.runtime.runtime_logger import RuntimeLogger
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# RuntimeLogStore tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestRuntimeLogStore:
|
||||
@pytest.mark.asyncio
|
||||
async def test_ensure_run_dir_creates_directory(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
store.ensure_run_dir("test_run_1")
|
||||
assert (tmp_path / "logs" / "runs" / "test_run_1").is_dir()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_append_and_load_details(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
store.ensure_run_dir("test_run_2")
|
||||
|
||||
detail1 = NodeDetail(
|
||||
node_id="node-1",
|
||||
node_name="Search Node",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
total_steps=2,
|
||||
exit_status="success",
|
||||
accept_count=1,
|
||||
retry_count=1,
|
||||
)
|
||||
detail2 = NodeDetail(
|
||||
node_id="node-2",
|
||||
node_name="Process Node",
|
||||
node_type="function",
|
||||
success=True,
|
||||
total_steps=1,
|
||||
)
|
||||
|
||||
store.append_node_detail("test_run_2", detail1)
|
||||
store.append_node_detail("test_run_2", detail2)
|
||||
|
||||
loaded = await store.load_details("test_run_2")
|
||||
assert loaded is not None
|
||||
assert len(loaded.nodes) == 2
|
||||
assert loaded.nodes[0].node_id == "node-1"
|
||||
assert loaded.nodes[0].exit_status == "success"
|
||||
assert loaded.nodes[1].node_type == "function"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_append_and_load_tool_logs(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
store.ensure_run_dir("test_run_3")
|
||||
|
||||
step = NodeStepLog(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=0,
|
||||
llm_text="I will search for the data.",
|
||||
tool_calls=[
|
||||
ToolCallLog(
|
||||
tool_use_id="tc_1",
|
||||
tool_name="web_search",
|
||||
tool_input={"query": "test"},
|
||||
result="Found 3 results",
|
||||
is_error=False,
|
||||
)
|
||||
],
|
||||
input_tokens=100,
|
||||
output_tokens=50,
|
||||
latency_ms=1200,
|
||||
verdict="CONTINUE",
|
||||
)
|
||||
|
||||
store.append_step("test_run_3", step)
|
||||
|
||||
loaded = await store.load_tool_logs("test_run_3")
|
||||
assert loaded is not None
|
||||
assert len(loaded.steps) == 1
|
||||
assert loaded.steps[0].tool_calls[0].tool_name == "web_search"
|
||||
assert loaded.steps[0].input_tokens == 100
|
||||
assert loaded.steps[0].node_id == "node-1"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_save_and_load_summary(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
summary = RunSummaryLog(
|
||||
run_id="test_run_1",
|
||||
agent_id="agent-a",
|
||||
goal_id="goal-1",
|
||||
status="success",
|
||||
total_nodes_executed=3,
|
||||
node_path=["node-1", "node-2", "node-3"],
|
||||
started_at="2025-01-01T00:00:00",
|
||||
duration_ms=5000,
|
||||
execution_quality="clean",
|
||||
)
|
||||
|
||||
await store.save_summary("test_run_1", summary)
|
||||
|
||||
loaded = await store.load_summary("test_run_1")
|
||||
assert loaded is not None
|
||||
assert loaded.run_id == "test_run_1"
|
||||
assert loaded.status == "success"
|
||||
assert loaded.total_nodes_executed == 3
|
||||
assert loaded.goal_id == "goal-1"
|
||||
assert loaded.execution_quality == "clean"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_load_missing_run_returns_none(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
assert await store.load_summary("nonexistent") is None
|
||||
assert await store.load_details("nonexistent") is None
|
||||
assert await store.load_tool_logs("nonexistent") is None
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_list_runs_empty(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
runs = await store.list_runs()
|
||||
assert runs == []
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_list_runs_with_filter(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
|
||||
# Save a success run
|
||||
store.ensure_run_dir("run_ok")
|
||||
await store.save_summary(
|
||||
"run_ok",
|
||||
RunSummaryLog(
|
||||
run_id="run_ok",
|
||||
status="success",
|
||||
started_at="2025-01-01T00:00:01",
|
||||
),
|
||||
)
|
||||
# Save a failure run
|
||||
store.ensure_run_dir("run_fail")
|
||||
await store.save_summary(
|
||||
"run_fail",
|
||||
RunSummaryLog(
|
||||
run_id="run_fail",
|
||||
status="failure",
|
||||
needs_attention=True,
|
||||
started_at="2025-01-01T00:00:02",
|
||||
),
|
||||
)
|
||||
|
||||
# All runs
|
||||
all_runs = await store.list_runs()
|
||||
assert len(all_runs) == 2
|
||||
|
||||
# Filter by status
|
||||
success_runs = await store.list_runs(status="success")
|
||||
assert len(success_runs) == 1
|
||||
assert success_runs[0].run_id == "run_ok"
|
||||
|
||||
# Filter by needs_attention
|
||||
attention_runs = await store.list_runs(status="needs_attention")
|
||||
assert len(attention_runs) == 1
|
||||
assert attention_runs[0].run_id == "run_fail"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_list_runs_sorted_by_timestamp_desc(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
|
||||
for i in range(5):
|
||||
run_id = f"run_{i}"
|
||||
store.ensure_run_dir(run_id)
|
||||
await store.save_summary(
|
||||
run_id,
|
||||
RunSummaryLog(
|
||||
run_id=run_id,
|
||||
status="success",
|
||||
started_at=f"2025-01-01T00:00:{i:02d}",
|
||||
),
|
||||
)
|
||||
|
||||
runs = await store.list_runs()
|
||||
# Most recent first
|
||||
assert runs[0].run_id == "run_4"
|
||||
assert runs[-1].run_id == "run_0"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_list_runs_limit(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
|
||||
for i in range(10):
|
||||
run_id = f"run_{i}"
|
||||
store.ensure_run_dir(run_id)
|
||||
await store.save_summary(
|
||||
run_id,
|
||||
RunSummaryLog(
|
||||
run_id=run_id,
|
||||
status="success",
|
||||
started_at=f"2025-01-01T00:00:{i:02d}",
|
||||
),
|
||||
)
|
||||
|
||||
runs = await store.list_runs(limit=3)
|
||||
assert len(runs) == 3
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_list_runs_includes_in_progress(self, tmp_path: Path):
|
||||
"""Directories without summary.json appear as in_progress."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
|
||||
# Completed run with summary
|
||||
store.ensure_run_dir("run_done")
|
||||
await store.save_summary(
|
||||
"run_done",
|
||||
RunSummaryLog(
|
||||
run_id="run_done",
|
||||
status="success",
|
||||
started_at="2025-01-01T00:00:01",
|
||||
),
|
||||
)
|
||||
|
||||
# In-progress run: directory exists but no summary.json
|
||||
store.ensure_run_dir("run_active")
|
||||
|
||||
all_runs = await store.list_runs()
|
||||
assert len(all_runs) == 2
|
||||
run_ids = {r.run_id for r in all_runs}
|
||||
assert "run_done" in run_ids
|
||||
assert "run_active" in run_ids
|
||||
|
||||
active = next(r for r in all_runs if r.run_id == "run_active")
|
||||
assert active.status == "in_progress"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_read_node_details_sync(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
store.ensure_run_dir("test_run")
|
||||
|
||||
store.append_node_detail(
|
||||
"test_run",
|
||||
NodeDetail(
|
||||
node_id="n1", node_name="A", success=True, input_tokens=100, output_tokens=50
|
||||
),
|
||||
)
|
||||
store.append_node_detail(
|
||||
"test_run",
|
||||
NodeDetail(node_id="n2", node_name="B", success=False, error="oops"),
|
||||
)
|
||||
|
||||
details = store.read_node_details_sync("test_run")
|
||||
assert len(details) == 2
|
||||
assert details[0].node_id == "n1"
|
||||
assert details[1].error == "oops"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_corrupt_jsonl_line_skipped(self, tmp_path: Path):
|
||||
"""A corrupt JSONL line should be skipped without breaking reads."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
store.ensure_run_dir("test_run")
|
||||
|
||||
# Write a valid line, a corrupt line, then another valid line
|
||||
jsonl_path = tmp_path / "logs" / "runs" / "test_run" / "details.jsonl"
|
||||
valid1 = json.dumps(NodeDetail(node_id="n1", node_name="A", success=True).model_dump())
|
||||
valid2 = json.dumps(NodeDetail(node_id="n2", node_name="B", success=True).model_dump())
|
||||
jsonl_path.write_text(f"{valid1}\n{{corrupt line\n{valid2}\n")
|
||||
|
||||
details = store.read_node_details_sync("test_run")
|
||||
assert len(details) == 2
|
||||
assert details[0].node_id == "n1"
|
||||
assert details[1].node_id == "n2"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# RuntimeLogger tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestRuntimeLogger:
|
||||
@pytest.mark.asyncio
|
||||
async def test_start_run_returns_run_id(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rl = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rl.start_run("goal-1")
|
||||
assert run_id
|
||||
assert len(run_id) > 10 # timestamp + uuid
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_start_run_creates_directory(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rl = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rl.start_run("goal-1")
|
||||
assert (tmp_path / "logs" / "runs" / run_id).is_dir()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_log_step_writes_to_disk_immediately(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rl = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rl.start_run("goal-1")
|
||||
|
||||
rl.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=0,
|
||||
llm_text="Searching.",
|
||||
input_tokens=100,
|
||||
output_tokens=50,
|
||||
)
|
||||
|
||||
# Verify the file exists and has one line
|
||||
jsonl_path = tmp_path / "logs" / "runs" / run_id / "tool_logs.jsonl"
|
||||
assert jsonl_path.exists()
|
||||
lines = [line for line in jsonl_path.read_text().strip().split("\n") if line]
|
||||
assert len(lines) == 1
|
||||
|
||||
data = json.loads(lines[0])
|
||||
assert data["node_id"] == "node-1"
|
||||
assert data["input_tokens"] == 100
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_log_node_complete_writes_to_disk_immediately(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rl = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rl.start_run("goal-1")
|
||||
|
||||
rl.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Search",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
exit_status="success",
|
||||
)
|
||||
|
||||
jsonl_path = tmp_path / "logs" / "runs" / run_id / "details.jsonl"
|
||||
assert jsonl_path.exists()
|
||||
lines = [line for line in jsonl_path.read_text().strip().split("\n") if line]
|
||||
assert len(lines) == 1
|
||||
|
||||
data = json.loads(lines[0])
|
||||
assert data["node_id"] == "node-1"
|
||||
assert data["exit_status"] == "success"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_full_lifecycle(self, tmp_path: Path):
|
||||
"""Test start_run -> log_step (x3) -> log_node_complete -> end_run."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Step 0: RETRY (event_loop iteration)
|
||||
rt_logger.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=0,
|
||||
verdict="RETRY",
|
||||
verdict_feedback="Missing output keys: ['result']",
|
||||
tool_calls=[
|
||||
{
|
||||
"tool_use_id": "tc_1",
|
||||
"tool_name": "web_search",
|
||||
"tool_input": {"query": "test"},
|
||||
"content": "Found data",
|
||||
"is_error": False,
|
||||
}
|
||||
],
|
||||
llm_text="Let me search for that.",
|
||||
input_tokens=100,
|
||||
output_tokens=50,
|
||||
latency_ms=1000,
|
||||
)
|
||||
|
||||
# Step 1: CONTINUE (unjudged)
|
||||
rt_logger.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=1,
|
||||
verdict="CONTINUE",
|
||||
verdict_feedback="Unjudged",
|
||||
tool_calls=[],
|
||||
llm_text="Processing...",
|
||||
input_tokens=80,
|
||||
output_tokens=30,
|
||||
latency_ms=500,
|
||||
)
|
||||
|
||||
# Step 2: ACCEPT
|
||||
rt_logger.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=2,
|
||||
verdict="ACCEPT",
|
||||
verdict_feedback="All outputs set",
|
||||
tool_calls=[],
|
||||
llm_text="Here is your result.",
|
||||
input_tokens=90,
|
||||
output_tokens=40,
|
||||
latency_ms=800,
|
||||
)
|
||||
|
||||
# Log node completion
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Search Node",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
total_steps=3,
|
||||
tokens_used=390,
|
||||
input_tokens=270,
|
||||
output_tokens=120,
|
||||
latency_ms=2300,
|
||||
exit_status="success",
|
||||
accept_count=1,
|
||||
retry_count=1,
|
||||
continue_count=1,
|
||||
)
|
||||
|
||||
await rt_logger.end_run(
|
||||
status="success",
|
||||
duration_ms=2300,
|
||||
node_path=["node-1"],
|
||||
execution_quality="clean",
|
||||
)
|
||||
|
||||
# Verify Level 1: Summary
|
||||
summary = await store.load_summary(run_id)
|
||||
assert summary is not None
|
||||
assert summary.status == "success"
|
||||
assert summary.total_nodes_executed == 1
|
||||
assert summary.total_input_tokens == 270
|
||||
assert summary.total_output_tokens == 120
|
||||
assert summary.needs_attention is False
|
||||
assert summary.duration_ms == 2300
|
||||
assert summary.execution_quality == "clean"
|
||||
assert summary.node_path == ["node-1"]
|
||||
|
||||
# Verify Level 2: Details
|
||||
details = await store.load_details(run_id)
|
||||
assert details is not None
|
||||
assert len(details.nodes) == 1
|
||||
assert details.nodes[0].node_id == "node-1"
|
||||
assert details.nodes[0].exit_status == "success"
|
||||
assert details.nodes[0].accept_count == 1
|
||||
assert details.nodes[0].retry_count == 1
|
||||
|
||||
# Verify Level 3: Tool logs
|
||||
tool_logs = await store.load_tool_logs(run_id)
|
||||
assert tool_logs is not None
|
||||
assert len(tool_logs.steps) == 3
|
||||
assert tool_logs.steps[0].tool_calls[0].tool_name == "web_search"
|
||||
assert tool_logs.steps[0].input_tokens == 100
|
||||
assert tool_logs.steps[0].verdict == "RETRY"
|
||||
assert tool_logs.steps[2].verdict == "ACCEPT"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_multi_node_lifecycle(self, tmp_path: Path):
|
||||
"""Test logging across multiple nodes in a graph run."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Node 1: event_loop
|
||||
rt_logger.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=0,
|
||||
verdict="ACCEPT",
|
||||
llm_text="Done.",
|
||||
input_tokens=100,
|
||||
output_tokens=50,
|
||||
)
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Search",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
total_steps=1,
|
||||
tokens_used=150,
|
||||
input_tokens=100,
|
||||
output_tokens=50,
|
||||
exit_status="success",
|
||||
accept_count=1,
|
||||
)
|
||||
|
||||
# Node 2: function
|
||||
rt_logger.log_step(
|
||||
node_id="node-2",
|
||||
node_type="function",
|
||||
step_index=0,
|
||||
latency_ms=50,
|
||||
)
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-2",
|
||||
node_name="Process",
|
||||
node_type="function",
|
||||
success=True,
|
||||
total_steps=1,
|
||||
latency_ms=50,
|
||||
)
|
||||
|
||||
await rt_logger.end_run(
|
||||
status="success",
|
||||
duration_ms=1000,
|
||||
node_path=["node-1", "node-2"],
|
||||
execution_quality="clean",
|
||||
)
|
||||
|
||||
summary = await store.load_summary(run_id)
|
||||
assert summary.total_nodes_executed == 2
|
||||
assert summary.node_path == ["node-1", "node-2"]
|
||||
assert summary.total_input_tokens == 100
|
||||
assert summary.total_output_tokens == 50
|
||||
|
||||
details = await store.load_details(run_id)
|
||||
assert len(details.nodes) == 2
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_failed_node_needs_attention(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
rt_logger.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=0,
|
||||
verdict="ESCALATE",
|
||||
verdict_feedback="Cannot proceed, need human input",
|
||||
tool_calls=[],
|
||||
llm_text="I'm stuck.",
|
||||
input_tokens=50,
|
||||
output_tokens=20,
|
||||
latency_ms=300,
|
||||
)
|
||||
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Search",
|
||||
node_type="event_loop",
|
||||
success=False,
|
||||
error="Judge escalated: Cannot proceed",
|
||||
total_steps=1,
|
||||
tokens_used=70,
|
||||
latency_ms=300,
|
||||
exit_status="escalated",
|
||||
escalate_count=1,
|
||||
)
|
||||
|
||||
await rt_logger.end_run(
|
||||
status="failure",
|
||||
duration_ms=300,
|
||||
node_path=["node-1"],
|
||||
execution_quality="failed",
|
||||
)
|
||||
|
||||
summary = await store.load_summary(run_id)
|
||||
assert summary is not None
|
||||
assert summary.needs_attention is True
|
||||
assert any(
|
||||
"failed" in r.lower() or "escalat" in r.lower() for r in summary.attention_reasons
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_ensure_node_logged_no_op_if_already_logged(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Node logs itself
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Search",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
exit_status="success",
|
||||
)
|
||||
|
||||
# Executor calls ensure_node_logged — should be no-op
|
||||
rt_logger.ensure_node_logged(
|
||||
node_id="node-1",
|
||||
node_name="Search",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
)
|
||||
|
||||
# Only one entry on disk
|
||||
details = store.read_node_details_sync(run_id)
|
||||
assert len(details) == 1
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_ensure_node_logged_creates_entry_if_missing(self, tmp_path: Path):
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Node didn't log itself — executor calls ensure
|
||||
rt_logger.ensure_node_logged(
|
||||
node_id="node-1",
|
||||
node_name="Search",
|
||||
node_type="event_loop",
|
||||
success=False,
|
||||
error="Crashed",
|
||||
)
|
||||
|
||||
details = store.read_node_details_sync(run_id)
|
||||
assert len(details) == 1
|
||||
assert details[0].error == "Crashed"
|
||||
assert details[0].needs_attention is True
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_large_data_preserved(self, tmp_path: Path):
|
||||
"""Large tool input/result/llm_text values should be stored in full."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
long_value = "x" * 2000
|
||||
rt_logger.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=0,
|
||||
verdict="ACCEPT",
|
||||
tool_calls=[
|
||||
{
|
||||
"tool_use_id": "tc_1",
|
||||
"tool_name": "write_file",
|
||||
"tool_input": {"content": long_value},
|
||||
"content": "y" * 5000,
|
||||
"is_error": False,
|
||||
}
|
||||
],
|
||||
llm_text="z" * 5000,
|
||||
input_tokens=100,
|
||||
output_tokens=50,
|
||||
latency_ms=500,
|
||||
)
|
||||
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Writer",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
total_steps=1,
|
||||
exit_status="success",
|
||||
)
|
||||
|
||||
await rt_logger.end_run(
|
||||
status="success",
|
||||
duration_ms=500,
|
||||
node_path=["node-1"],
|
||||
)
|
||||
|
||||
tool_logs = await store.load_tool_logs(run_id)
|
||||
assert tool_logs is not None
|
||||
tc = tool_logs.steps[0].tool_calls[0]
|
||||
# Full values preserved
|
||||
assert len(tc.tool_input["content"]) == 2000
|
||||
assert len(tc.result) == 5000
|
||||
assert len(tool_logs.steps[0].llm_text) == 5000
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_end_run_does_not_propagate_exceptions(self, tmp_path: Path):
|
||||
"""end_run must catch all exceptions and never propagate."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
rt_logger.start_run("goal-1")
|
||||
|
||||
# Make the store path unwritable to force an error
|
||||
import os
|
||||
|
||||
bad_path = tmp_path / "logs" / "runs"
|
||||
bad_path.mkdir(parents=True, exist_ok=True)
|
||||
# Create a file where directory should be
|
||||
run_dir = bad_path / rt_logger._run_id
|
||||
run_dir.mkdir(parents=True, exist_ok=True)
|
||||
blocker = run_dir / "summary.json"
|
||||
blocker.write_text("not json")
|
||||
os.chmod(str(run_dir), 0o444)
|
||||
|
||||
try:
|
||||
# This should NOT raise, even though writing will fail
|
||||
await rt_logger.end_run("success", duration_ms=100)
|
||||
finally:
|
||||
# Restore permissions for cleanup
|
||||
os.chmod(str(run_dir), 0o755)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_crash_resilience_l2_l3_survive(self, tmp_path: Path):
|
||||
"""L2 and L3 data survives even if end_run() is never called (crash)."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Log some steps and a node
|
||||
rt_logger.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=0,
|
||||
llm_text="Working...",
|
||||
input_tokens=100,
|
||||
output_tokens=50,
|
||||
)
|
||||
rt_logger.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=1,
|
||||
llm_text="Still working...",
|
||||
input_tokens=80,
|
||||
output_tokens=30,
|
||||
)
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Search",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
total_steps=2,
|
||||
input_tokens=180,
|
||||
output_tokens=80,
|
||||
)
|
||||
|
||||
# Simulate crash: do NOT call end_run()
|
||||
|
||||
# Verify L2 and L3 are recoverable from disk
|
||||
details = await store.load_details(run_id)
|
||||
assert details is not None
|
||||
assert len(details.nodes) == 1
|
||||
assert details.nodes[0].node_id == "node-1"
|
||||
|
||||
tool_logs = await store.load_tool_logs(run_id)
|
||||
assert tool_logs is not None
|
||||
assert len(tool_logs.steps) == 2
|
||||
|
||||
# But no L1 summary exists
|
||||
summary = await store.load_summary(run_id)
|
||||
assert summary is None
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_in_progress_run_visible_in_list(self, tmp_path: Path):
|
||||
"""An in-progress run (no summary.json) appears in list_runs."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Log a step but don't end
|
||||
rt_logger.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=0,
|
||||
llm_text="Working...",
|
||||
)
|
||||
|
||||
runs = await store.list_runs()
|
||||
assert len(runs) == 1
|
||||
assert runs[0].run_id == run_id
|
||||
assert runs[0].status == "in_progress"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_log_step_with_error_and_stacktrace(self, tmp_path: Path):
|
||||
"""Test logging partial steps with errors and stack traces."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Log a partial step with error
|
||||
rt_logger.log_step(
|
||||
node_id="node-1",
|
||||
node_type="event_loop",
|
||||
step_index=0,
|
||||
error="LLM call failed: Connection timeout",
|
||||
stacktrace=(
|
||||
"Traceback (most recent call last):\n"
|
||||
" File test.py line 10\n"
|
||||
" raise TimeoutError()"
|
||||
),
|
||||
is_partial=True,
|
||||
)
|
||||
|
||||
# Verify the step was logged
|
||||
loaded = await store.load_tool_logs(run_id)
|
||||
assert loaded is not None
|
||||
assert len(loaded.steps) == 1
|
||||
step = loaded.steps[0]
|
||||
assert step.error == "LLM call failed: Connection timeout"
|
||||
assert "TimeoutError" in step.stacktrace
|
||||
assert step.is_partial is True
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_log_node_complete_with_stacktrace(self, tmp_path: Path):
|
||||
"""Test logging node completion with stack traces."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Log node failure with stacktrace
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Test Node",
|
||||
node_type="event_loop",
|
||||
success=False,
|
||||
error="Node crashed",
|
||||
stacktrace=(
|
||||
"Traceback (most recent call last):\n"
|
||||
" File node.py line 42\n"
|
||||
" raise RuntimeError('crash')"
|
||||
),
|
||||
)
|
||||
|
||||
# Verify the detail was logged with stacktrace
|
||||
loaded = await store.load_details(run_id)
|
||||
assert loaded is not None
|
||||
assert len(loaded.nodes) == 1
|
||||
node = loaded.nodes[0]
|
||||
assert node.error == "Node crashed"
|
||||
assert "RuntimeError" in node.stacktrace
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_attention_flags_excessive_retries(self, tmp_path: Path):
|
||||
"""Test that excessive retries trigger attention flags."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Log node with excessive retries
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Retry Node",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
retry_count=5, # > 3 threshold
|
||||
)
|
||||
|
||||
# Verify attention flag is set
|
||||
loaded = await store.load_details(run_id)
|
||||
assert loaded is not None
|
||||
node = loaded.nodes[0]
|
||||
assert node.needs_attention is True
|
||||
assert any("Excessive retries" in reason for reason in node.attention_reasons)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_attention_flags_high_latency(self, tmp_path: Path):
|
||||
"""Test that high latency triggers attention flags."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Log node with high latency
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Slow Node",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
latency_ms=65000, # > 60000 threshold
|
||||
)
|
||||
|
||||
# Verify attention flag is set
|
||||
loaded = await store.load_details(run_id)
|
||||
assert loaded is not None
|
||||
node = loaded.nodes[0]
|
||||
assert node.needs_attention is True
|
||||
assert any("High latency" in reason for reason in node.attention_reasons)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_attention_flags_high_token_usage(self, tmp_path: Path):
|
||||
"""Test that high token usage triggers attention flags."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Log node with high token usage
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Token Heavy Node",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
tokens_used=150000, # > 100000 threshold
|
||||
)
|
||||
|
||||
# Verify attention flag is set
|
||||
loaded = await store.load_details(run_id)
|
||||
assert loaded is not None
|
||||
node = loaded.nodes[0]
|
||||
assert node.needs_attention is True
|
||||
assert any("High token usage" in reason for reason in node.attention_reasons)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_attention_flags_many_iterations(self, tmp_path: Path):
|
||||
"""Test that many iterations trigger attention flags."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Log node with many iterations
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Iterative Node",
|
||||
node_type="event_loop",
|
||||
success=True,
|
||||
total_steps=25, # > 20 threshold
|
||||
)
|
||||
|
||||
# Verify attention flag is set
|
||||
loaded = await store.load_details(run_id)
|
||||
assert loaded is not None
|
||||
node = loaded.nodes[0]
|
||||
assert node.needs_attention is True
|
||||
assert any("Many iterations" in reason for reason in node.attention_reasons)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_guard_failure_exit_status(self, tmp_path: Path):
|
||||
"""Test that guard failures use the correct exit status."""
|
||||
store = RuntimeLogStore(tmp_path / "logs")
|
||||
rt_logger = RuntimeLogger(store=store, agent_id="test-agent")
|
||||
run_id = rt_logger.start_run("goal-1")
|
||||
|
||||
# Log a guard failure
|
||||
rt_logger.log_node_complete(
|
||||
node_id="node-1",
|
||||
node_name="Guard Node",
|
||||
node_type="event_loop",
|
||||
success=False,
|
||||
error="LLM provider not available",
|
||||
exit_status="guard_failure",
|
||||
)
|
||||
|
||||
# Verify exit status
|
||||
loaded = await store.load_details(run_id)
|
||||
assert loaded is not None
|
||||
node = loaded.nodes[0]
|
||||
assert node.exit_status == "guard_failure"
|
||||
assert node.success is False
|
||||
@@ -1,4 +1,9 @@
|
||||
"""Tests for the storage module - FileStorage and ConcurrentStorage backends."""
|
||||
"""Tests for the storage module - FileStorage and ConcurrentStorage backends.
|
||||
|
||||
DEPRECATED: FileStorage and ConcurrentStorage are deprecated.
|
||||
New sessions use unified storage at sessions/{session_id}/state.json.
|
||||
These tests are kept for backward compatibility verification only.
|
||||
"""
|
||||
|
||||
import json
|
||||
import time
|
||||
@@ -38,6 +43,7 @@ def create_test_run(
|
||||
# === FILESTORAGE TESTS ===
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="FileStorage is deprecated - use unified session storage")
|
||||
class TestFileStorageBasics:
|
||||
"""Test basic FileStorage operations."""
|
||||
|
||||
@@ -57,6 +63,7 @@ class TestFileStorageBasics:
|
||||
assert storage.base_path == tmp_path
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="FileStorage is deprecated - use unified session storage")
|
||||
class TestFileStorageRunOperations:
|
||||
"""Test FileStorage run CRUD operations."""
|
||||
|
||||
@@ -155,6 +162,7 @@ class TestFileStorageRunOperations:
|
||||
assert result is False
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="FileStorage is deprecated - use unified session storage")
|
||||
class TestFileStorageIndexing:
|
||||
"""Test FileStorage index operations."""
|
||||
|
||||
@@ -259,6 +267,7 @@ class TestFileStorageIndexing:
|
||||
assert storage.get_runs_by_node("nonexistent") == []
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="FileStorage is deprecated - use unified session storage")
|
||||
class TestFileStorageListOperations:
|
||||
"""Test FileStorage list operations."""
|
||||
|
||||
@@ -323,6 +332,7 @@ class TestCacheEntry:
|
||||
# === CONCURRENTSTORAGE TESTS ===
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
|
||||
class TestConcurrentStorageBasics:
|
||||
"""Test basic ConcurrentStorage operations."""
|
||||
|
||||
@@ -367,6 +377,7 @@ class TestConcurrentStorageBasics:
|
||||
assert storage._running is False
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
|
||||
class TestConcurrentStorageRunOperations:
|
||||
"""Test ConcurrentStorage run operations."""
|
||||
|
||||
@@ -471,6 +482,7 @@ class TestConcurrentStorageRunOperations:
|
||||
await storage.stop()
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
|
||||
class TestConcurrentStorageQueryOperations:
|
||||
"""Test ConcurrentStorage query operations."""
|
||||
|
||||
@@ -526,6 +538,7 @@ class TestConcurrentStorageQueryOperations:
|
||||
await storage.stop()
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
|
||||
class TestConcurrentStorageCacheManagement:
|
||||
"""Test ConcurrentStorage cache management."""
|
||||
|
||||
@@ -565,6 +578,7 @@ class TestConcurrentStorageCacheManagement:
|
||||
assert stats["valid_entries"] == 1
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
|
||||
class TestConcurrentStorageSyncAPI:
|
||||
"""Test ConcurrentStorage synchronous API for backward compatibility."""
|
||||
|
||||
@@ -598,6 +612,7 @@ class TestConcurrentStorageSyncAPI:
|
||||
assert loaded is None
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
|
||||
class TestConcurrentStorageStats:
|
||||
"""Test ConcurrentStorage statistics."""
|
||||
|
||||
|
||||
Generated
+2990
File diff suppressed because it is too large
Load Diff
+4
-4
@@ -76,7 +76,7 @@ def main():
|
||||
success(f"installed at {framework_path}")
|
||||
except subprocess.CalledProcessError:
|
||||
error("framework package not found")
|
||||
logger.info(f" Run: pip install -e {script_dir}")
|
||||
logger.info(f" Run: uv pip install -e {script_dir}")
|
||||
all_checks_passed = False
|
||||
|
||||
# Check 2: MCP dependencies
|
||||
@@ -90,7 +90,7 @@ def main():
|
||||
|
||||
if missing_deps:
|
||||
error(f"missing: {', '.join(missing_deps)}")
|
||||
logger.info(f" Run: pip install {' '.join(missing_deps)}")
|
||||
logger.info(f" Run: uv pip install {' '.join(missing_deps)}")
|
||||
all_checks_passed = False
|
||||
else:
|
||||
success("all installed")
|
||||
@@ -194,7 +194,7 @@ def main():
|
||||
logger.info("Your MCP server is ready to use.")
|
||||
logger.info("")
|
||||
logger.info(f"{Colors.BLUE}To start the server:{Colors.NC}")
|
||||
logger.info(" python -m framework.mcp.agent_builder_server")
|
||||
logger.info(" uv run python -m framework.mcp.agent_builder_server")
|
||||
logger.info("")
|
||||
logger.info(f"{Colors.BLUE}To use with Claude Desktop:{Colors.NC}")
|
||||
logger.info(" Add the configuration from .mcp.json to your")
|
||||
@@ -203,7 +203,7 @@ def main():
|
||||
logger.info(f"{Colors.RED}✗ Some checks failed{Colors.NC}")
|
||||
logger.info("")
|
||||
logger.info("To fix issues, run:")
|
||||
logger.info(f" python {script_dir / 'setup_mcp.py'}")
|
||||
logger.info(f" uv run python {script_dir / 'setup_mcp.py'}")
|
||||
logger.info("")
|
||||
|
||||
|
||||
|
||||
@@ -13,26 +13,26 @@ The Aden server handles OAuth2 authorization code flows (user login, consent, to
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Local Agent Environment │
|
||||
│ │
|
||||
│ Local Agent Environment │
|
||||
│ │
|
||||
│ ┌──────────────────────────────────────────────────────────┐ │
|
||||
│ │ CredentialStore │ │
|
||||
│ │ CredentialStore │ │
|
||||
│ │ ┌────────────────────┐ ┌────────────────────────────┐ │ │
|
||||
│ │ │EncryptedFileStorage│ │ AdenSyncProvider │ │ │
|
||||
│ │ │ (local cache) │ │ - Fetches from Aden │ │ │
|
||||
│ │ │ ~/.hive/creds │ │ - Delegates refresh │ │ │
|
||||
│ │ │ ~/.hive/credentials│ │ - Delegates refresh │ │ │
|
||||
│ │ └────────────────────┘ │ - Reports usage │ │ │
|
||||
│ │ └─────────────┬──────────────┘ │ │
|
||||
│ └────────────────────────────────────────┼─────────────────┘ │
|
||||
│ │ │
|
||||
└───────────────────────────────────────────┼──────────────────────┘
|
||||
│ │ │
|
||||
└───────────────────────────────────────────┼─────────────────────┘
|
||||
│ HTTPS
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Aden Server │
|
||||
│ │
|
||||
│ Aden Server │
|
||||
│ │
|
||||
│ ┌──────────────────────────────────────────────────────────┐ │
|
||||
│ │ Integration Management │ │
|
||||
│ │ Integration Management │ │
|
||||
│ │ - HubSpot, GitHub, Slack, etc. │ │
|
||||
│ │ - Handles OAuth2 auth code flow │ │
|
||||
│ │ - Stores refresh tokens securely │ │
|
||||
|
||||
+10
-10
@@ -96,14 +96,14 @@ MCP (Model Context Protocol) servers are configured in `.mcp.json` at the projec
|
||||
{
|
||||
"mcpServers": {
|
||||
"agent-builder": {
|
||||
"command": "core/.venv/bin/python",
|
||||
"args": ["-m", "framework.mcp.agent_builder_server"],
|
||||
"cwd": "."
|
||||
"command": "uv",
|
||||
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
|
||||
"cwd": "core"
|
||||
},
|
||||
"tools": {
|
||||
"command": "tools/.venv/bin/python",
|
||||
"args": ["-m", "aden_tools.mcp_server", "--stdio"],
|
||||
"cwd": "."
|
||||
"command": "uv",
|
||||
"args": ["run", "mcp_server.py", "--stdio"],
|
||||
"cwd": "tools"
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -152,7 +152,7 @@ Add to `.vscode/settings.json`:
|
||||
|
||||
1. **Never commit API keys** - Use environment variables or `.env` files
|
||||
2. **`.env` is git-ignored** - Copy `.env.example` to `.env` at the project root and fill in your values
|
||||
3. **Mock mode for testing** - Set `MOCK_MODE=1` to avoid LLM calls during development
|
||||
3. **Use real provider keys in non-production environments** - validate configuration with low-risk inputs before production rollout
|
||||
4. **Credential isolation** - Each tool validates its own credentials at runtime
|
||||
|
||||
## Troubleshooting
|
||||
@@ -162,7 +162,7 @@ Add to `.vscode/settings.json`:
|
||||
Install the core package:
|
||||
|
||||
```bash
|
||||
cd core && pip install -e .
|
||||
cd core && uv pip install -e .
|
||||
```
|
||||
|
||||
### API key not found
|
||||
@@ -184,7 +184,7 @@ $env:ANTHROPIC_API_KEY = "sk-ant-..."
|
||||
Run from the project root with PYTHONPATH:
|
||||
|
||||
```bash
|
||||
PYTHONPATH=core:exports python -m my_agent validate
|
||||
PYTHONPATH=exports uv run python -m my_agent validate
|
||||
```
|
||||
|
||||
See [Environment Setup](../ENVIRONMENT_SETUP.md) for detailed installation instructions.
|
||||
See [Environment Setup](./environment-setup.md) for detailed installation instructions.
|
||||
|
||||
@@ -8,7 +8,7 @@ Hive uses [Ruff](https://docs.astral.sh/ruff/) for all Python linting and format
|
||||
|
||||
```bash
|
||||
# 1. Install dev dependencies
|
||||
cd core && pip install -e ".[dev]"
|
||||
cd core && uv pip install -e ".[dev]"
|
||||
|
||||
# 2. Install pre-commit hooks (runs ruff automatically before each commit)
|
||||
make install-hooks
|
||||
@@ -142,7 +142,7 @@ The single source of truth for lint rules is the `[tool.ruff]` section in each p
|
||||
|
||||
## FAQ
|
||||
|
||||
**Q: Do I need to install anything beyond `pip install -e ".[dev]"`?**
|
||||
**Q: Do I need to install anything beyond `uv pip install -e ".[dev]"`?**
|
||||
Only if you want pre-commit hooks: `make install-hooks`. Everything else (VS Code settings, editorconfig) works automatically.
|
||||
|
||||
**Q: Can I use a different formatter (black, autopep8)?**
|
||||
|
||||
@@ -1202,7 +1202,7 @@ class HashiCorpVaultStorage(CredentialStorage):
|
||||
"""
|
||||
HashiCorp Vault storage adapter.
|
||||
|
||||
Requires: pip install hvac
|
||||
Requires: uv pip install hvac
|
||||
|
||||
Features:
|
||||
- KV v2 secrets engine support
|
||||
@@ -1243,7 +1243,7 @@ class HashiCorpVaultStorage(CredentialStorage):
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"HashiCorp Vault support requires 'hvac'. "
|
||||
"Install with: pip install hvac"
|
||||
"Install with: uv pip install hvac"
|
||||
)
|
||||
|
||||
import os
|
||||
|
||||
@@ -20,12 +20,12 @@ This guide covers everything you need to know to develop with the Aden Agent Fra
|
||||
|
||||
Aden Agent Framework is a Python-based system for building goal-driven, self-improving AI agents.
|
||||
|
||||
| Package | Directory | Description | Tech Stack |
|
||||
| ------------- | ---------- | --------------------------------------- | ------------ |
|
||||
| **framework** | `/core` | Core runtime, graph executor, protocols | Python 3.11+ |
|
||||
| **tools** | `/tools` | MCP tools for agent capabilities | Python 3.11+ |
|
||||
| Package | Directory | Description | Tech Stack |
|
||||
| ------------- | ---------- | ----------------------------------------- | ------------ |
|
||||
| **framework** | `/core` | Core runtime, graph executor, protocols | Python 3.11+ |
|
||||
| **tools** | `/tools` | MCP tools for agent capabilities | Python 3.11+ |
|
||||
| **exports** | `/exports` | Agent packages (user-created, gitignored) | Python 3.11+ |
|
||||
| **skills** | `.claude` | Claude Code skills for building/testing | Markdown |
|
||||
| **skills** | `.claude` | Claude Code skills for building/testing | Markdown |
|
||||
|
||||
### Key Principles
|
||||
|
||||
@@ -101,22 +101,31 @@ Get API keys:
|
||||
|
||||
This installs agent-related Claude Code skills:
|
||||
|
||||
- `/building-agents-core` - Fundamental agent concepts
|
||||
- `/building-agents-construction` - Step-by-step agent building
|
||||
- `/building-agents-patterns` - Best practices and design patterns
|
||||
- `/testing-agent` - Test and validate agents
|
||||
- `/agent-workflow` - End-to-end guided workflow
|
||||
- `/hive` - Complete workflow for building agents
|
||||
- `/hive-create` - Step-by-step agent building
|
||||
- `/hive-concepts` - Fundamental agent concepts
|
||||
- `/hive-patterns` - Best practices and design patterns
|
||||
- `/hive-test` - Test and validate agents
|
||||
|
||||
### Cursor IDE Support
|
||||
|
||||
Skills are also available in Cursor. To enable:
|
||||
|
||||
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
|
||||
2. Run `MCP: Enable` to enable MCP servers
|
||||
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
|
||||
4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`)
|
||||
|
||||
### Verify Setup
|
||||
|
||||
```bash
|
||||
# Verify package imports
|
||||
python -c "import framework; print('✓ framework OK')"
|
||||
python -c "import aden_tools; print('✓ aden_tools OK')"
|
||||
python -c "import litellm; print('✓ litellm OK')"
|
||||
uv run python -c "import framework; print('✓ framework OK')"
|
||||
uv run python -c "import aden_tools; print('✓ aden_tools OK')"
|
||||
uv run python -c "import litellm; print('✓ litellm OK')"
|
||||
|
||||
# Run an agent (after building one via /building-agents-construction)
|
||||
PYTHONPATH=core:exports python -m your_agent_name validate
|
||||
# Run an agent (after building one via /hive-create)
|
||||
PYTHONPATH=exports uv run python -m your_agent_name validate
|
||||
```
|
||||
|
||||
---
|
||||
@@ -140,21 +149,11 @@ hive/ # Repository root
|
||||
│
|
||||
├── .claude/ # Claude Code Skills
|
||||
│ └── skills/ # Skills for building
|
||||
│ ├── building-agents-core/
|
||||
| | ├── SKILL.md # Main skill definition
|
||||
│ | └── examples
|
||||
│ ├── building-agents-patterns/
|
||||
| | ├── SKILL.md
|
||||
│ | └── examples
|
||||
│ ├── building-agents-construction/
|
||||
| | ├── SKILL.md
|
||||
│ | └── examples
|
||||
│ ├── testing-agent/ # Skills for testing agents
|
||||
│ │ ├── SKILL.md
|
||||
│ | └── examples
|
||||
│ └── agent-workflow/ # Complete workflow
|
||||
| ├── SKILL.md
|
||||
│ └── examples
|
||||
│ ├── hive/ # Complete workflow
|
||||
│ ├── hive-create/ # Step-by-step build guide
|
||||
│ ├── hive-concepts/ # Fundamental concepts
|
||||
│ ├── hive-patterns/ # Best practices
|
||||
│ └── hive-test/ # Test and validate agents
|
||||
│
|
||||
├── core/ # CORE FRAMEWORK PACKAGE
|
||||
│ ├── framework/ # Main package code
|
||||
@@ -168,6 +167,7 @@ hive/ # Repository root
|
||||
│ │ ├── schemas/ # Data schemas
|
||||
│ │ ├── storage/ # File-based persistence
|
||||
│ │ ├── testing/ # Testing utilities
|
||||
│ │ ├── tui/ # Terminal UI dashboard
|
||||
│ │ └── __init__.py
|
||||
│ ├── pyproject.toml # Package metadata and dependencies
|
||||
│ ├── README.md # Framework documentation
|
||||
@@ -188,7 +188,10 @@ hive/ # Repository root
|
||||
│ └── README.md # Tools documentation
|
||||
│
|
||||
├── exports/ # AGENT PACKAGES (user-created, gitignored)
|
||||
│ └── your_agent_name/ # Created via /building-agents-construction
|
||||
│ └── your_agent_name/ # Created via /hive-create
|
||||
│
|
||||
├── examples/ # Example agents
|
||||
│ └── templates/ # Pre-built template agents
|
||||
│
|
||||
├── docs/ # Documentation
|
||||
│ ├── getting-started.md # Quick start guide
|
||||
@@ -198,19 +201,15 @@ hive/ # Repository root
|
||||
│ ├── quizzes/ # Developer quizzes
|
||||
│ └── i18n/ # Translations
|
||||
│
|
||||
├── scripts/ # Build & utility scripts
|
||||
│ ├── setup-python.sh # Python environment setup
|
||||
│ └── setup.sh # Legacy setup script
|
||||
├── scripts/ # Utility scripts
|
||||
│ └── auto-close-duplicates.ts # GitHub duplicate issue closer
|
||||
│
|
||||
├── quickstart.sh # Interactive setup wizard
|
||||
├── ENVIRONMENT_SETUP.md # Complete Python setup guide
|
||||
├── README.md # Project overview
|
||||
├── DEVELOPER.md # This file
|
||||
├── CONTRIBUTING.md # Contribution guidelines
|
||||
├── CHANGELOG.md # Version history
|
||||
├── ROADMAP.md # Product roadmap
|
||||
├── LICENSE # Apache 2.0 License
|
||||
├── CODE_OF_CONDUCT.md # Community guidelines
|
||||
├── docs/CODE_OF_CONDUCT.md # Community guidelines
|
||||
└── SECURITY.md # Security policy
|
||||
```
|
||||
|
||||
@@ -227,10 +226,10 @@ The fastest way to build agents is using the Claude Code skills:
|
||||
./quickstart.sh
|
||||
|
||||
# Build a new agent
|
||||
claude> /building-agents-construction
|
||||
claude> /hive
|
||||
|
||||
# Test the agent
|
||||
claude> /testing-agent
|
||||
claude> /hive-test
|
||||
```
|
||||
|
||||
### Agent Development Workflow
|
||||
@@ -238,7 +237,7 @@ claude> /testing-agent
|
||||
1. **Define Your Goal**
|
||||
|
||||
```
|
||||
claude> /building-agents-construction
|
||||
claude> /hive
|
||||
Enter goal: "Build an agent that processes customer support tickets"
|
||||
```
|
||||
|
||||
@@ -256,12 +255,12 @@ claude> /testing-agent
|
||||
4. **Validate the Agent**
|
||||
|
||||
```bash
|
||||
PYTHONPATH=core:exports python -m your_agent_name validate
|
||||
PYTHONPATH=exports uv run python -m your_agent_name validate
|
||||
```
|
||||
|
||||
5. **Test the Agent**
|
||||
```
|
||||
claude> /testing-agent
|
||||
claude> /hive-test
|
||||
```
|
||||
|
||||
### Manual Agent Development
|
||||
@@ -301,22 +300,19 @@ If you prefer to build agents manually:
|
||||
### Running Agents
|
||||
|
||||
```bash
|
||||
# Validate agent structure
|
||||
PYTHONPATH=core:exports python -m agent_name validate
|
||||
# Browse and run agents interactively (Recommended)
|
||||
hive tui
|
||||
|
||||
# Show agent information
|
||||
PYTHONPATH=core:exports python -m agent_name info
|
||||
# Run a specific agent
|
||||
hive run exports/my_agent --input '{"ticket_content": "My login is broken", "customer_id": "CUST-123"}'
|
||||
|
||||
# Run agent with input
|
||||
PYTHONPATH=core:exports python -m agent_name run --input '{
|
||||
"ticket_content": "My login is broken",
|
||||
"customer_id": "CUST-123"
|
||||
}'
|
||||
# Run with TUI dashboard
|
||||
hive run exports/my_agent --tui
|
||||
|
||||
# Run in mock mode (no LLM calls)
|
||||
PYTHONPATH=core:exports python -m agent_name run --mock --input '{...}'
|
||||
```
|
||||
|
||||
> **Using Python directly:** `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
|
||||
|
||||
---
|
||||
|
||||
## Testing Agents
|
||||
@@ -325,7 +321,7 @@ PYTHONPATH=core:exports python -m agent_name run --mock --input '{...}'
|
||||
|
||||
```bash
|
||||
# Run tests for an agent
|
||||
claude> /testing-agent
|
||||
claude> /hive-test
|
||||
```
|
||||
|
||||
This generates and runs:
|
||||
@@ -338,17 +334,17 @@ This generates and runs:
|
||||
|
||||
```bash
|
||||
# Run all tests for an agent
|
||||
PYTHONPATH=core:exports python -m agent_name test
|
||||
PYTHONPATH=exports uv run python -m agent_name test
|
||||
|
||||
# Run specific test type
|
||||
PYTHONPATH=core:exports python -m agent_name test --type constraint
|
||||
PYTHONPATH=core:exports python -m agent_name test --type success
|
||||
PYTHONPATH=exports uv run python -m agent_name test --type constraint
|
||||
PYTHONPATH=exports uv run python -m agent_name test --type success
|
||||
|
||||
# Run with parallel execution
|
||||
PYTHONPATH=core:exports python -m agent_name test --parallel 4
|
||||
PYTHONPATH=exports uv run python -m agent_name test --parallel 4
|
||||
|
||||
# Fail fast (stop on first failure)
|
||||
PYTHONPATH=core:exports python -m agent_name test --fail-fast
|
||||
PYTHONPATH=exports uv run python -m agent_name test --fail-fast
|
||||
```
|
||||
|
||||
### Writing Custom Tests
|
||||
@@ -543,7 +539,7 @@ uv add <package>
|
||||
|
||||
```bash
|
||||
# Option 1: Use Claude Code skill (recommended)
|
||||
claude> /building-agents-construction
|
||||
claude> /hive
|
||||
|
||||
# Option 2: Create manually
|
||||
# Note: exports/ is initially empty (gitignored). Create your agent directory:
|
||||
@@ -629,16 +625,10 @@ echo 'ANTHROPIC_API_KEY=your-key-here' >> .env
|
||||
|
||||
### Debugging Agent Execution
|
||||
|
||||
```python
|
||||
# Add debug logging to your agent
|
||||
import logging
|
||||
logging.basicConfig(level=logging.DEBUG)
|
||||
|
||||
```bash
|
||||
# Run with verbose output
|
||||
PYTHONPATH=core:exports python -m agent_name run --input '{...}' --verbose
|
||||
hive run exports/my_agent --verbose --input '{"task": "..."}'
|
||||
|
||||
# Use mock mode to test without LLM calls
|
||||
PYTHONPATH=core:exports python -m agent_name run --mock --input '{...}'
|
||||
```
|
||||
|
||||
---
|
||||
@@ -658,8 +648,6 @@ kill -9 <PID>
|
||||
# Or change ports in config.yaml and regenerate
|
||||
```
|
||||
|
||||
|
||||
|
||||
### Environment Variables Not Loading
|
||||
|
||||
```bash
|
||||
@@ -673,8 +661,6 @@ echo $ANTHROPIC_API_KEY
|
||||
# Then add your API keys
|
||||
```
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
## Getting Help
|
||||
@@ -9,8 +9,8 @@ Complete setup guide for building and running goal-driven agents with the Aden A
|
||||
./quickstart.sh
|
||||
```
|
||||
|
||||
> **Note for Windows Users:**
|
||||
> Running the setup script on native Windows shells (PowerShell / Git Bash) may sometimes fail due to Python App Execution Aliases.
|
||||
> **Note for Windows Users:**
|
||||
> Running the setup script on native Windows shells (PowerShell / Git Bash) may sometimes fail due to Python App Execution Aliases.
|
||||
> It is **strongly recommended to use WSL (Windows Subsystem for Linux)** for a smoother setup experience.
|
||||
|
||||
This will:
|
||||
@@ -18,62 +18,45 @@ This will:
|
||||
- Check Python version (requires 3.11+)
|
||||
- Install the core framework package (`framework`)
|
||||
- Install the tools package (`aden_tools`)
|
||||
- Initialize encrypted credential store (`~/.hive/credentials`)
|
||||
- Configure default LLM provider
|
||||
- Fix package compatibility issues (openai + litellm)
|
||||
- Verify all installations
|
||||
|
||||
## Quick Setup (Windows – PowerShell)
|
||||
## Windows Setup
|
||||
|
||||
Windows users can use the native PowerShell setup script.
|
||||
Windows users should use **WSL (Windows Subsystem for Linux)** to set up and run agents.
|
||||
|
||||
Before running the script, allow script execution for the current session:
|
||||
|
||||
```powershell
|
||||
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
|
||||
```
|
||||
|
||||
Run setup from the project root:
|
||||
|
||||
```powershell
|
||||
./scripts/setup-python.ps1
|
||||
```
|
||||
|
||||
This will:
|
||||
|
||||
- Check Python version (requires 3.11+)
|
||||
- Create a local `.venv` virtual environment
|
||||
- Install the core framework package (`framework`)
|
||||
- Install the tools package (`aden_tools`)
|
||||
- Fix package compatibility issues (openai + litellm)
|
||||
- Verify all installations
|
||||
|
||||
After setup, activate the virtual environment:
|
||||
|
||||
```powershell
|
||||
.\.venv\Scripts\Activate.ps1
|
||||
```
|
||||
|
||||
Set `PYTHONPATH` (required in every new PowerShell session):
|
||||
|
||||
```powershell
|
||||
$env:PYTHONPATH="core;exports"
|
||||
```
|
||||
1. [Install WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install) if you haven't already:
|
||||
```powershell
|
||||
wsl --install
|
||||
```
|
||||
2. Open your WSL terminal, clone the repo, and run the quickstart script:
|
||||
```bash
|
||||
./quickstart.sh
|
||||
```
|
||||
|
||||
## Alpine Linux Setup
|
||||
|
||||
If you are using Alpine Linux (e.g., inside a Docker container), you must install system dependencies and use a virtual environment before running the setup script:
|
||||
|
||||
1. Install System Dependencies:
|
||||
|
||||
```bash
|
||||
apk update
|
||||
apk add bash git python3 py3-pip nodejs npm curl build-base python3-dev linux-headers libffi-dev
|
||||
```
|
||||
|
||||
2. Set up Virtual Environment (Required for Python 3.12+):
|
||||
|
||||
```
|
||||
python3 -m venv venv
|
||||
source venv/bin/activate
|
||||
pip install --upgrade pip setuptools wheel
|
||||
uv venv
|
||||
source .venv/bin/activate
|
||||
# uv handles pip/setuptools/wheel automatically
|
||||
```
|
||||
|
||||
3. Run the Quickstart Script:
|
||||
|
||||
```
|
||||
./quickstart.sh
|
||||
```
|
||||
@@ -86,32 +69,32 @@ If you prefer to set up manually or the script fails:
|
||||
|
||||
```bash
|
||||
cd core
|
||||
pip install -e .
|
||||
uv pip install -e .
|
||||
```
|
||||
|
||||
### 2. Install Tools Package
|
||||
|
||||
```bash
|
||||
cd tools
|
||||
pip install -e .
|
||||
uv pip install -e .
|
||||
```
|
||||
|
||||
### 3. Upgrade OpenAI Package
|
||||
|
||||
```bash
|
||||
# litellm requires openai >= 1.0.0
|
||||
pip install --upgrade "openai>=1.0.0"
|
||||
uv pip install --upgrade "openai>=1.0.0"
|
||||
```
|
||||
|
||||
### 4. Verify Installation
|
||||
|
||||
```bash
|
||||
python -c "import framework; print('✓ framework OK')"
|
||||
python -c "import aden_tools; print('✓ aden_tools OK')"
|
||||
python -c "import litellm; print('✓ litellm OK')"
|
||||
uv run python -c "import framework; print('✓ framework OK')"
|
||||
uv run python -c "import aden_tools; print('✓ aden_tools OK')"
|
||||
uv run python -c "import litellm; print('✓ litellm OK')"
|
||||
```
|
||||
|
||||
> **Windows Tip:**
|
||||
> **Windows Tip:**
|
||||
> On Windows, if the verification commands fail, ensure you are running them in **WSL** or after **disabling Python App Execution Aliases** in Windows Settings → Apps → App Execution Aliases.
|
||||
|
||||
## Requirements
|
||||
@@ -129,27 +112,42 @@ python -c "import litellm; print('✓ litellm OK')"
|
||||
- Internet connection (for LLM API calls)
|
||||
- For Windows users: WSL 2 is recommended for full compatibility.
|
||||
|
||||
### API Keys (Optional)
|
||||
### API Keys
|
||||
|
||||
For running agents with real LLMs:
|
||||
|
||||
```bash
|
||||
export ANTHROPIC_API_KEY="your-key-here"
|
||||
```
|
||||
|
||||
Windows (PowerShell):
|
||||
|
||||
```powershell
|
||||
$env:ANTHROPIC_API_KEY="your-key-here"
|
||||
```
|
||||
We recommend using quickstart.sh for LLM API credential setup and /hive-credentials for the tools credentials
|
||||
|
||||
## Running Agents
|
||||
|
||||
All agent commands must be run from the project root with `PYTHONPATH` set:
|
||||
The `hive` CLI is the primary interface for running agents:
|
||||
|
||||
```bash
|
||||
# Browse and run agents interactively (Recommended)
|
||||
hive tui
|
||||
|
||||
# Run a specific agent
|
||||
hive run exports/my_agent --input '{"task": "Your input here"}'
|
||||
|
||||
# Run with TUI dashboard
|
||||
hive run exports/my_agent --tui
|
||||
```
|
||||
|
||||
### CLI Command Reference
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `hive tui` | Browse agents and launch TUI dashboard |
|
||||
| `hive run <path>` | Execute an agent (`--tui`, `--model`, `--mock`, `--quiet`, `--verbose`) |
|
||||
| `hive shell [path]` | Interactive REPL (`--multi`, `--no-approve`) |
|
||||
| `hive info <path>` | Show agent details |
|
||||
| `hive validate <path>` | Validate agent structure |
|
||||
| `hive list [dir]` | List available agents |
|
||||
| `hive dispatch [dir]` | Multi-agent orchestration |
|
||||
|
||||
### Using Python directly (alternative)
|
||||
|
||||
```bash
|
||||
# From /hive/ directory
|
||||
PYTHONPATH=core:exports python -m agent_name COMMAND
|
||||
PYTHONPATH=exports uv run python -m agent_name COMMAND
|
||||
```
|
||||
|
||||
Windows (PowerShell):
|
||||
@@ -159,24 +157,6 @@ $env:PYTHONPATH="core;exports"
|
||||
python -m agent_name COMMAND
|
||||
```
|
||||
|
||||
### Example: Support Ticket Agent
|
||||
|
||||
```bash
|
||||
# Validate agent structure
|
||||
PYTHONPATH=core:exports python -m your_agent_name validate
|
||||
|
||||
# Show agent information
|
||||
PYTHONPATH=core:exports python -m your_agent_name info
|
||||
|
||||
# Run agent with input
|
||||
PYTHONPATH=core:exports python -m your_agent_name run --input '{
|
||||
"task": "Your input here"
|
||||
}'
|
||||
|
||||
# Run in mock mode (no LLM calls)
|
||||
PYTHONPATH=core:exports python -m your_agent_name run --mock --input '{...}'
|
||||
```
|
||||
|
||||
## Building New Agents and Run Flow
|
||||
|
||||
Build and run an agent using Claude Code CLI with the agent building skills:
|
||||
@@ -189,16 +169,25 @@ Build and run an agent using Claude Code CLI with the agent building skills:
|
||||
|
||||
This verifies agent-related Claude Code skills are available:
|
||||
|
||||
- `/building-agents-construction` - Step-by-step build guide
|
||||
- `/building-agents-core` - Fundamental concepts
|
||||
- `/building-agents-patterns` - Best practices
|
||||
- `/testing-agent` - Test and validate agents
|
||||
- `/agent-workflow` - Complete workflow
|
||||
- `/hive` - Complete workflow for building agents
|
||||
- `/hive-create` - Step-by-step build guide
|
||||
- `/hive-concepts` - Fundamental concepts
|
||||
- `/hive-patterns` - Best practices
|
||||
- `/hive-test` - Test and validate agents
|
||||
|
||||
### Cursor IDE Support
|
||||
|
||||
Skills are also available in Cursor. To enable:
|
||||
|
||||
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
|
||||
2. Run `MCP: Enable` to enable MCP servers
|
||||
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
|
||||
4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`)
|
||||
|
||||
### 2. Build an Agent
|
||||
|
||||
```
|
||||
claude> /building-agents-construction
|
||||
claude> /hive
|
||||
```
|
||||
|
||||
Follow the prompts to:
|
||||
@@ -213,7 +202,7 @@ This step creates the initial agent structure required for further development.
|
||||
### 3. Define Agent Logic
|
||||
|
||||
```
|
||||
claude> /building-agents-core
|
||||
claude> /hive-concepts
|
||||
```
|
||||
|
||||
Follow the prompts to:
|
||||
@@ -228,7 +217,7 @@ This step establishes the core concepts and rules needed before building an agen
|
||||
### 4. Apply Agent Patterns
|
||||
|
||||
```
|
||||
claude> /building-agents-patterns
|
||||
claude> /hive-patterns
|
||||
```
|
||||
|
||||
Follow the prompts to:
|
||||
@@ -243,8 +232,9 @@ This step helps optimize agent design before final testing.
|
||||
### 5. Test Your Agent
|
||||
|
||||
```
|
||||
claude> /testing-agent
|
||||
claude> /hive-test
|
||||
```
|
||||
|
||||
Follow the prompts to:
|
||||
|
||||
1. Generate test guidelines for constraints and success criteria
|
||||
@@ -254,21 +244,6 @@ Follow the prompts to:
|
||||
|
||||
This step verifies that the agent meets its goals before production use.
|
||||
|
||||
### 6. Agent Development Workflow (End-to-End)
|
||||
|
||||
```
|
||||
claude> /agent-workflow
|
||||
```
|
||||
|
||||
Follow the guided flow to:
|
||||
|
||||
1. Understand core agent concepts (optional)
|
||||
2. Build the agent structure step by step
|
||||
3. Apply best-practice design patterns (optional)
|
||||
4. Test and validate the agent against its goals
|
||||
|
||||
This workflow orchestrates all agent-building skills to take you from idea → production-ready agent.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "externally-managed-environment" error (PEP 668)
|
||||
@@ -279,7 +254,7 @@ This workflow orchestrates all agent-building skills to take you from idea → p
|
||||
|
||||
```bash
|
||||
# Create virtual environment
|
||||
python3 -m venv .venv
|
||||
uv venv
|
||||
|
||||
# Activate it
|
||||
source .venv/bin/activate # macOS/Linux
|
||||
@@ -293,7 +268,7 @@ Always activate the venv before running agents:
|
||||
|
||||
```bash
|
||||
source .venv/bin/activate
|
||||
PYTHONPATH=core:exports python -m your_agent_name demo
|
||||
PYTHONPATH=exports uv run python -m your_agent_name demo
|
||||
```
|
||||
|
||||
### PowerShell: “running scripts is disabled on this system”
|
||||
@@ -309,7 +284,7 @@ Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
|
||||
**Solution:** Install the core package:
|
||||
|
||||
```bash
|
||||
cd core && pip install -e .
|
||||
cd core && uv pip install -e .
|
||||
```
|
||||
|
||||
### "ModuleNotFoundError: No module named 'aden_tools'"
|
||||
@@ -317,7 +292,7 @@ cd core && pip install -e .
|
||||
**Solution:** Install the tools package:
|
||||
|
||||
```bash
|
||||
cd tools && pip install -e .
|
||||
cd tools && uv pip install -e .
|
||||
```
|
||||
|
||||
Or run the setup script:
|
||||
@@ -326,12 +301,6 @@ Or run the setup script:
|
||||
./quickstart.sh
|
||||
```
|
||||
|
||||
Windows:
|
||||
|
||||
```powershell
|
||||
./scripts/setup-python.ps1
|
||||
```
|
||||
|
||||
### "ModuleNotFoundError: No module named 'openai.\_models'"
|
||||
|
||||
**Cause:** Outdated `openai` package (0.27.x) incompatible with `litellm`
|
||||
@@ -339,7 +308,7 @@ Windows:
|
||||
**Solution:** Upgrade openai:
|
||||
|
||||
```bash
|
||||
pip install --upgrade "openai>=1.0.0"
|
||||
uv pip install --upgrade "openai>=1.0.0"
|
||||
```
|
||||
|
||||
### "No module named 'your_agent_name'"
|
||||
@@ -351,7 +320,7 @@ pip install --upgrade "openai>=1.0.0"
|
||||
Linux/macOS:
|
||||
|
||||
```bash
|
||||
PYTHONPATH=core:exports python -m your_agent_name validate
|
||||
PYTHONPATH=exports uv run python -m your_agent_name validate
|
||||
```
|
||||
|
||||
Windows:
|
||||
@@ -369,18 +338,12 @@ python -m support_ticket_agent validate
|
||||
|
||||
```bash
|
||||
# Remove broken installations
|
||||
pip uninstall -y framework tools
|
||||
uv pip uninstall framework tools
|
||||
|
||||
# Reinstall correctly
|
||||
./quickstart.sh
|
||||
```
|
||||
|
||||
Windows:
|
||||
|
||||
```powershell
|
||||
./scripts/setup-python.ps1
|
||||
```
|
||||
|
||||
## Package Structure
|
||||
|
||||
The Hive framework consists of three Python packages:
|
||||
@@ -398,13 +361,18 @@ hive/
|
||||
│ ├── .venv/ # Created by quickstart.sh
|
||||
│ └── pyproject.toml
|
||||
│
|
||||
└── exports/ # Agent packages (user-created, gitignored)
|
||||
└── your_agent_name/ # Created via /building-agents-construction
|
||||
├── exports/ # Agent packages (user-created, gitignored)
|
||||
│ └── your_agent_name/ # Created via /hive-create
|
||||
│
|
||||
└── examples/
|
||||
└── templates/ # Pre-built template agents
|
||||
```
|
||||
|
||||
## Separate Virtual Environments
|
||||
|
||||
The project uses **separate virtual environments** for `core` and `tools` packages to:
|
||||
Hive primarily uses **uv** to create and manage separate virtual environments for `core` and `tools`.
|
||||
|
||||
The project uses separate virtual environments to:
|
||||
|
||||
- Isolate dependencies and avoid conflicts
|
||||
- Allow independent development and testing of each package
|
||||
@@ -412,11 +380,18 @@ The project uses **separate virtual environments** for `core` and `tools` packag
|
||||
|
||||
### How It Works
|
||||
|
||||
When you run `./quickstart.sh` or `uv sync` in each directory:
|
||||
When you run `./quickstart.sh`, `uv` sets up:
|
||||
|
||||
1. **core/.venv/** - Contains the `framework` package and its dependencies (anthropic, litellm, mcp, etc.)
|
||||
2. **tools/.venv/** - Contains the `aden_tools` package and its dependencies (beautifulsoup4, pandas, etc.)
|
||||
|
||||
If you need to refresh environments manually, use `uv`:
|
||||
|
||||
```bash
|
||||
cd core && uv sync
|
||||
cd ../tools && uv sync
|
||||
```
|
||||
|
||||
### Cross-Package Imports
|
||||
|
||||
The `core` and `tools` packages are **intentionally independent**:
|
||||
@@ -425,42 +400,38 @@ The `core` and `tools` packages are **intentionally independent**:
|
||||
- **Communication via MCP**: Tools are exposed to agents through MCP servers, not direct Python imports
|
||||
- **Runtime integration**: The agent runner loads tools via the MCP protocol at runtime
|
||||
|
||||
If you need to use both packages in a single script (e.g., for testing), you have two options:
|
||||
If you need to use both packages in a single script (e.g., for testing), prefer `uv run` with `PYTHONPATH`:
|
||||
|
||||
```bash
|
||||
# Option 1: Install both in a shared environment
|
||||
python -m venv .venv
|
||||
source .venv/bin/activate
|
||||
pip install -e core/ -e tools/
|
||||
|
||||
# Option 2: Use PYTHONPATH (for quick testing)
|
||||
PYTHONPATH=core:tools/src python your_script.py
|
||||
PYTHONPATH=tools/src uv run python your_script.py
|
||||
```
|
||||
|
||||
### MCP Server Configuration
|
||||
|
||||
The `.mcp.json` at project root configures MCP servers to use their respective virtual environments:
|
||||
The `.mcp.json` at project root configures MCP servers to run through `uv run` in each package directory:
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"agent-builder": {
|
||||
"command": "core/.venv/bin/python",
|
||||
"args": ["-m", "framework.mcp.agent_builder_server"]
|
||||
"command": "uv",
|
||||
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
|
||||
"cwd": "core"
|
||||
},
|
||||
"tools": {
|
||||
"command": "tools/.venv/bin/python",
|
||||
"args": ["-m", "aden_tools.mcp_server", "--stdio"]
|
||||
"command": "uv",
|
||||
"args": ["run", "mcp_server.py", "--stdio"],
|
||||
"cwd": "tools"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This ensures each MCP server runs with its correct dependencies.
|
||||
This ensures each MCP server runs with the correct project environment managed by `uv`.
|
||||
|
||||
### Why PYTHONPATH is Required
|
||||
|
||||
The packages are installed in **editable mode** (`pip install -e`), which means:
|
||||
The packages are installed in **editable mode** (`uv pip install -e`), which means:
|
||||
|
||||
- `framework` and `aden_tools` are globally importable (no PYTHONPATH needed)
|
||||
- `exports` is NOT installed as a package (PYTHONPATH required)
|
||||
@@ -479,35 +450,33 @@ This design allows agents in `exports/` to be:
|
||||
./quickstart.sh
|
||||
```
|
||||
|
||||
Windows:
|
||||
|
||||
```powershell
|
||||
./scripts/setup-python.ps1
|
||||
```
|
||||
|
||||
### 2. Build Agent (Claude Code)
|
||||
|
||||
```
|
||||
claude> /building-agents-construction
|
||||
claude> /hive
|
||||
Enter goal: "Build an agent that processes customer support tickets"
|
||||
```
|
||||
|
||||
### 3. Validate Agent
|
||||
|
||||
```bash
|
||||
PYTHONPATH=core:exports python -m your_agent_name validate
|
||||
PYTHONPATH=exports uv run python -m your_agent_name validate
|
||||
```
|
||||
|
||||
### 4. Test Agent
|
||||
|
||||
```
|
||||
claude> /testing-agent
|
||||
claude> /hive-test
|
||||
```
|
||||
|
||||
### 5. Run Agent
|
||||
|
||||
```bash
|
||||
PYTHONPATH=core:exports python -m your_agent_name run --input '{...}'
|
||||
# Interactive dashboard
|
||||
hive tui
|
||||
|
||||
# Or run directly
|
||||
hive run exports/your_agent_name --input '{"task": "..."}'
|
||||
```
|
||||
|
||||
## IDE Setup
|
||||
@@ -555,11 +524,11 @@ export AGENT_STORAGE_PATH="/custom/storage"
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- **Framework Documentation:** [core/README.md](core/README.md)
|
||||
- **Tools Documentation:** [tools/README.md](tools/README.md)
|
||||
- **Example Agents:** [exports/](exports/)
|
||||
- **Agent Building Guide:** [.claude/skills/building-agents-construction/SKILL.md](.claude/skills/building-agents-construction/SKILL.md)
|
||||
- **Testing Guide:** [.claude/skills/testing-agent/SKILL.md](.claude/skills/testing-agent/SKILL.md)
|
||||
- **Framework Documentation:** [core/README.md](../core/README.md)
|
||||
- **Tools Documentation:** [tools/README.md](../tools/README.md)
|
||||
- **Example Agents:** [exports/](../exports/)
|
||||
- **Agent Building Guide:** [.claude/skills/hive-create/SKILL.md](../.claude/skills/hive-create/SKILL.md)
|
||||
- **Testing Guide:** [.claude/skills/hive-test/SKILL.md](../.claude/skills/hive-test/SKILL.md)
|
||||
|
||||
## Contributing
|
||||
|
||||
@@ -568,7 +537,7 @@ When contributing agent packages:
|
||||
1. Place agents in `exports/agent_name/`
|
||||
2. Follow the standard agent structure (see existing agents)
|
||||
3. Include README.md with usage instructions
|
||||
4. Add tests if using `/testing-agent`
|
||||
4. Add tests if using `/hive-test`
|
||||
5. Document required environment variables
|
||||
|
||||
## Support
|
||||
+37
-36
@@ -22,7 +22,7 @@ cd hive
|
||||
./quickstart.sh
|
||||
|
||||
# 3. Verify installation (optional, quickstart.sh already verifies)
|
||||
python -c "import framework; import aden_tools; print('✓ Setup complete')"
|
||||
uv run python -c "import framework; import aden_tools; print('✓ Setup complete')"
|
||||
```
|
||||
|
||||
## Building Your First Agent
|
||||
@@ -33,10 +33,11 @@ python -c "import framework; import aden_tools; print('✓ Setup complete')"
|
||||
# Setup already done via quickstart.sh above
|
||||
|
||||
# Start Claude Code and build an agent
|
||||
claude> /building-agents-construction
|
||||
claude> /hive
|
||||
```
|
||||
|
||||
Follow the interactive prompts to:
|
||||
|
||||
1. Define your agent's goal
|
||||
2. Design the workflow (nodes and edges)
|
||||
3. Generate the agent package
|
||||
@@ -52,10 +53,10 @@ mkdir -p exports/my_agent
|
||||
|
||||
# Create your agent structure
|
||||
cd exports/my_agent
|
||||
# Create agent.json, tools.py, README.md (see DEVELOPER.md for structure)
|
||||
# Create agent.json, tools.py, README.md (see developer-guide.md for structure)
|
||||
|
||||
# Validate the agent
|
||||
PYTHONPATH=core:exports python -m my_agent validate
|
||||
PYTHONPATH=exports uv run python -m my_agent validate
|
||||
```
|
||||
|
||||
### Option 3: Manual Code-First (Minimal Example)
|
||||
@@ -67,7 +68,7 @@ If you prefer to start with code rather than CLI wizards, check out the manual a
|
||||
cat core/examples/manual_agent.py
|
||||
|
||||
# Run it (no API keys required)
|
||||
PYTHONPATH=core python core/examples/manual_agent.py
|
||||
uv run python core/examples/manual_agent.py
|
||||
```
|
||||
|
||||
This demonstrates the core runtime loop using pure Python functions, skipping the complexity of LLM setup and file-based configuration.
|
||||
@@ -87,7 +88,8 @@ hive/
|
||||
│ │ ├── runtime/ # Runtime environment
|
||||
│ │ ├── schemas/ # Data schemas
|
||||
│ │ ├── storage/ # File-based persistence
|
||||
│ │ └── testing/ # Testing utilities
|
||||
│ │ ├── testing/ # Testing utilities
|
||||
│ │ └── tui/ # Terminal UI dashboard
|
||||
│ └── pyproject.toml # Package metadata
|
||||
│
|
||||
├── tools/ # MCP Tools Package
|
||||
@@ -99,15 +101,18 @@ hive/
|
||||
│ └── mcp_server.py # HTTP MCP server
|
||||
│
|
||||
├── exports/ # Agent Packages (user-generated, not in repo)
|
||||
│ └── your_agent/ # Your agents created via /building-agents
|
||||
│ └── your_agent/ # Your agents created via /hive
|
||||
│
|
||||
├── examples/
|
||||
│ └── templates/ # Pre-built template agents
|
||||
│
|
||||
├── .claude/ # Claude Code Skills
|
||||
│ └── skills/
|
||||
│ ├── agent-workflow/
|
||||
│ ├── building-agents-construction/
|
||||
│ ├── building-agents-core/
|
||||
│ ├── building-agents-patterns/
|
||||
│ └── testing-agent/
|
||||
│ ├── hive/
|
||||
│ ├── hive-create/
|
||||
│ ├── hive-concepts/
|
||||
│ ├── hive-patterns/
|
||||
│ └── hive-test/
|
||||
│
|
||||
└── docs/ # Documentation
|
||||
```
|
||||
@@ -115,19 +120,15 @@ hive/
|
||||
## Running an Agent
|
||||
|
||||
```bash
|
||||
# Validate agent structure
|
||||
PYTHONPATH=core:exports python -m my_agent validate
|
||||
# Browse and run agents interactively (Recommended)
|
||||
hive tui
|
||||
|
||||
# Show agent information
|
||||
PYTHONPATH=core:exports python -m my_agent info
|
||||
# Run a specific agent
|
||||
hive run exports/my_agent --input '{"task": "Your input here"}'
|
||||
|
||||
# Run agent with input
|
||||
PYTHONPATH=core:exports python -m my_agent run --input '{
|
||||
"task": "Your input here"
|
||||
}'
|
||||
# Run with TUI dashboard
|
||||
hive run exports/my_agent --tui
|
||||
|
||||
# Run in mock mode (no LLM calls)
|
||||
PYTHONPATH=core:exports python -m my_agent run --mock --input '{...}'
|
||||
```
|
||||
|
||||
## API Keys Setup
|
||||
@@ -142,6 +143,7 @@ export BRAVE_SEARCH_API_KEY="your-key-here" # Optional, for web search
|
||||
```
|
||||
|
||||
Get your API keys:
|
||||
|
||||
- **Anthropic**: [console.anthropic.com](https://console.anthropic.com/)
|
||||
- **OpenAI**: [platform.openai.com](https://platform.openai.com/)
|
||||
- **Brave Search**: [brave.com/search/api](https://brave.com/search/api/)
|
||||
@@ -150,23 +152,24 @@ Get your API keys:
|
||||
|
||||
```bash
|
||||
# Using Claude Code
|
||||
claude> /testing-agent
|
||||
claude> /hive-test
|
||||
|
||||
# Or manually
|
||||
PYTHONPATH=core:exports python -m my_agent test
|
||||
PYTHONPATH=exports uv run python -m my_agent test
|
||||
|
||||
# Run with specific test type
|
||||
PYTHONPATH=core:exports python -m my_agent test --type constraint
|
||||
PYTHONPATH=core:exports python -m my_agent test --type success
|
||||
PYTHONPATH=exports uv run python -m my_agent test --type constraint
|
||||
PYTHONPATH=exports uv run python -m my_agent test --type success
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Detailed Setup**: See [ENVIRONMENT_SETUP.md](../ENVIRONMENT_SETUP.md)
|
||||
2. **Developer Guide**: See [DEVELOPER.md](../DEVELOPER.md)
|
||||
3. **Build Agents**: Use `/building-agents` skill in Claude Code
|
||||
4. **Custom Tools**: Learn to integrate MCP servers
|
||||
5. **Join Community**: [Discord](https://discord.com/invite/MXE49hrKDk)
|
||||
1. **TUI Dashboard**: Run `hive tui` to explore agents interactively
|
||||
2. **Detailed Setup**: See [environment-setup.md](./environment-setup.md)
|
||||
3. **Developer Guide**: See [developer-guide.md](./developer-guide.md)
|
||||
4. **Build Agents**: Use `/hive` skill in Claude Code
|
||||
5. **Custom Tools**: Learn to integrate MCP servers
|
||||
6. **Join Community**: [Discord](https://discord.com/invite/MXE49hrKDk)
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
@@ -175,7 +178,7 @@ PYTHONPATH=core:exports python -m my_agent test --type success
|
||||
```bash
|
||||
# Reinstall framework package
|
||||
cd core
|
||||
pip install -e .
|
||||
uv pip install -e .
|
||||
```
|
||||
|
||||
### ModuleNotFoundError: No module named 'aden_tools'
|
||||
@@ -183,7 +186,7 @@ pip install -e .
|
||||
```bash
|
||||
# Reinstall tools package
|
||||
cd tools
|
||||
pip install -e .
|
||||
uv pip install -e .
|
||||
```
|
||||
|
||||
### LLM API Errors
|
||||
@@ -192,8 +195,6 @@ pip install -e .
|
||||
# Verify API key is set
|
||||
echo $ANTHROPIC_API_KEY
|
||||
|
||||
# Run in mock mode to test without API
|
||||
PYTHONPATH=core:exports python -m my_agent run --mock --input '{...}'
|
||||
```
|
||||
|
||||
### Package Installation Issues
|
||||
@@ -209,4 +210,4 @@ pip uninstall -y framework tools
|
||||
- **Documentation**: Check the `/docs` folder
|
||||
- **Issues**: [github.com/adenhq/hive/issues](https://github.com/adenhq/hive/issues)
|
||||
- **Discord**: [discord.com/invite/MXE49hrKDk](https://discord.com/invite/MXE49hrKDk)
|
||||
- **Build Agents**: Use `/building-agents` skill to create agents
|
||||
- **Build Agents**: Use `/hive` skill to create agents
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user