Merge branch 'main' into feature/automated-testing-skill

This commit is contained in:
Timothy
2026-02-09 20:16:11 -08:00
35 changed files with 2255 additions and 322 deletions
+484 -44
@@ -1,10 +1,10 @@
--- ---
name: hive-create name: hive-create
description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent. description: Step-by-step guide for building goal-driven agents. Qualifies use cases first (the good, bad, and ugly), then creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
license: Apache-2.0 license: Apache-2.0
metadata: metadata:
author: hive author: hive
version: "2.1" version: "2.2"
type: procedural type: procedural
part_of: hive part_of: hive
requires: hive-concepts requires: hive-concepts
@@ -14,15 +14,53 @@ metadata:
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.** **THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.**
**CRITICAL: DO NOT explore the codebase, read source files, or search for code before starting.** All context you need is in this skill file. When this skill is loaded, IMMEDIATELY begin executing Step 1 (call the MCP tools listed in Step 1) as your FIRST action. Do not explain what you will do, do not investigate the project structure, do not read any files — just execute Step 1 now. **CRITICAL: DO NOT explore the codebase, read source files, or search for code before starting.** All context you need is in this skill file. When this skill is loaded, IMMEDIATELY begin executing Step 0 (determine the build path) as your FIRST action. Do not explain what you will do, do not investigate the project structure, do not read any files — just execute Step 0 now.
--- ---
## STEP 1: Initialize Build Environment ## STEP 0: Choose Build Path
**If the user has already indicated whether they want to build from scratch or from a template, skip this question and proceed to the appropriate step.**
Otherwise, ask:
```
AskUserQuestion(questions=[{
"question": "How would you like to build your agent?",
"header": "Build Path",
"options": [
{"label": "From scratch", "description": "Design goal, nodes, and graph collaboratively from nothing"},
{"label": "From a template", "description": "Start from a working sample agent and customize it"}
],
"multiSelect": false
}])
```
- If **From scratch**: Proceed to STEP 1A
- If **From a template**: Proceed to STEP 1B
---
## STEP 1A: Initialize Build Environment (From Scratch)
**EXECUTE THESE TOOL CALLS NOW** (silent setup — no user interaction needed): **EXECUTE THESE TOOL CALLS NOW** (silent setup — no user interaction needed):
1. Register the hive-tools MCP server: 1. Check for existing sessions:
```
mcp__agent-builder__list_sessions()
```
- If a session with this agent name already exists, load it with `mcp__agent-builder__load_session_by_id(session_id="...")` and skip to step 3.
- If no matching session exists, proceed to step 2.
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
```
mcp__agent-builder__create_session(name="AGENT_NAME")
```
3. Register the hive-tools MCP server:
``` ```
mcp__agent-builder__add_mcp_server( mcp__agent-builder__add_mcp_server(
@@ -35,45 +73,368 @@ mcp__agent-builder__add_mcp_server(
) )
``` ```
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case): 4. Discover available tools:
```
mcp__agent-builder__create_session(name="AGENT_NAME")
```
3. Discover available tools:
``` ```
mcp__agent-builder__list_mcp_tools() mcp__agent-builder__list_mcp_tools()
``` ```
4. Create the package directory: 5. Create the package directory:
```bash ```bash
mkdir -p exports/AGENT_NAME/nodes mkdir -p exports/AGENT_NAME/nodes
``` ```
**Save the tool list for step 3** — you will need it for node design in STEP 3. **Save the tool list for STEP 4** — you will need it for node design.
**THEN immediately proceed to STEP 2** (do NOT display setup results to the user — just move on). **THEN immediately proceed to STEP 2** (do NOT display setup results to the user — just move on).
--- ---
## STEP 1B: Initialize Build Environment (From Template)
**EXECUTE THESE STEPS NOW:**
### 1B.1: Discover available templates
List the template directories and read each template's `agent.json` to get its name and description:
```bash
ls examples/templates/
```
For each directory found, read `examples/templates/TEMPLATE_DIR/agent.json` with the Read tool and extract:
- `agent.name` — the template's display name
- `agent.description` — what the template does
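For orientation, the fields read here sit at the top level of each template's `agent.json`. A minimal sketch of the shape (the `agent` block is what this step reads; the `goal` and `graph` keys are consumed later by `import_from_export`, and the example values are illustrative, not a real template):
```json
{
  "agent": {
    "name": "Deep Research",
    "description": "Searches the web, synthesizes findings, and produces a cited report"
  },
  "goal": { "id": "...", "name": "...", "description": "..." },
  "graph": { "nodes": [], "edges": [] }
}
```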
### 1B.2: Present templates to user
Show the user a table of available templates:
> **Available Templates:**
>
> | # | Template | Description |
> |---|----------|-------------|
> | 1 | [name from agent.json] | [description from agent.json] |
> | 2 | ... | ... |
Then ask the user to pick a template and provide a name for their new agent:
```
AskUserQuestion(questions=[{
"question": "Which template would you like to start from?",
"header": "Template",
"options": [
{"label": "[template 1 name]", "description": "[template 1 description]"},
{"label": "[template 2 name]", "description": "[template 2 description]"},
...
],
"multiSelect": false
}, {
"question": "What should the new agent be named? (snake_case)",
"header": "Agent Name",
"options": [
{"label": "Use template name", "description": "Keep the original template name as-is"},
{"label": "Custom name", "description": "I'll provide a new snake_case name"}
],
"multiSelect": false
}])
```
### 1B.3: Copy template to exports
```bash
cp -r examples/templates/TEMPLATE_DIR exports/NEW_AGENT_NAME
```
### 1B.4: Create session and register MCP (same logic as STEP 1A)
First, check for existing sessions:
```
mcp__agent-builder__list_sessions()
```
- If a session with this agent name already exists, load it with `mcp__agent-builder__load_session_by_id(session_id="...")` and skip to `list_mcp_tools`.
- If no matching session exists, create one:
```
mcp__agent-builder__create_session(name="NEW_AGENT_NAME")
```
Then register MCP and discover tools:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="uv",
args='["run", "python", "mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
```
```
mcp__agent-builder__list_mcp_tools()
```
### 1B.5: Load template into builder session
Import the entire agent definition in one call:
```
mcp__agent-builder__import_from_export(agent_json_path="exports/NEW_AGENT_NAME/agent.json")
```
This reads the agent.json and populates the builder session with the goal, all nodes, and all edges.
**THEN immediately proceed to STEP 2.**
---
## STEP 2: Define Goal Together with User ## STEP 2: Define Goal Together with User
**A responsible engineer doesn't jump into building. First, understand the problem and be transparent about what the framework can and cannot do.**
**If starting from a template**, the goal is already loaded in the builder session. Present the existing goal to the user using the format below and ask for approval. Skip the collaborative drafting questions — go straight to presenting and asking "Do you approve this goal, or would you like to modify it?"
**If the user has NOT already described what they want to build**, start by asking what kind of agent they have in mind:
```
AskUserQuestion(questions=[{
"question": "What kind of agent do you want to build? Select an option below, or choose 'Other' to describe your own.",
"header": "Agent type",
"options": [
{"label": "Data collection", "description": "Gathers information from the web, analyzes it, and produces a report or sends outreach (e.g. market research, news digest, email campaigns, competitive analysis)"},
{"label": "Workflow automation", "description": "Automates a multi-step business process end-to-end (e.g. lead qualification, content publishing pipeline, data entry)"},
{"label": "Personal assistant", "description": "Handles recurring tasks or monitors for events and acts on them (e.g. daily briefings, meeting prep, file organization)"}
],
"multiSelect": false
}])
```
Use the user's selection (or their custom description if they chose "Other") as context when shaping the goal below. If the user already described what they want before this step, skip the question and proceed directly.
**DO NOT propose a complete goal on your own.** Instead, collaborate with the user to define it. **DO NOT propose a complete goal on your own.** Instead, collaborate with the user to define it.
**START by asking the user to help shape the goal:** ### 2a: Fast Discovery (3-8 Turns)
> I've set up the build environment and discovered [N] available tools. Let's define the goal for your agent together. **The core principle**: Discovery should feel like progress, not paperwork. The stakeholder should walk away feeling like you understood them faster than anyone else would have.
>
> To get started, can you help me understand:
>
> 1. **What should this agent accomplish?** (the core purpose)
> 2. **How will we know it succeeded?** (what does "done" look like)
> 3. **Are there any hard constraints?** (things it must never do, quality bars, etc.)
**WAIT for the user to respond.** Use their input to draft: **Communication style**: Be concise. Say less. Mean more. Impatient stakeholders don't want a wall of text — they want to know you get it. Every sentence you say should either move the conversation forward or prove you understood something. If it does neither, cut it.
**Rules for Asking Questions: Respect Their Time.** Every question must earn its place by:
1. **Preventing a costly wrong turn** — you're about to build the wrong thing
2. **Unlocking a shortcut** — their answer lets you simplify the design
3. **Surfacing a dealbreaker** — there's a constraint that changes everything
4. **Providing options** - Offer options with your questions when possible, but always allow the user to type something beyond them.
If a question doesn't do one of these, don't ask it. Make an assumption, state it, and move on.
---
#### 2a.1: Let Them Talk, But Listen Like an Architect
When the stakeholder describes what they want, don't just hear the words — listen for the architecture underneath. While they talk, mentally construct:
- **The actors**: Who are the people/systems involved?
- **The trigger**: What kicks off the workflow?
- **The core loop**: What's the main thing that happens repeatedly?
- **The output**: What's the valuable thing produced at the end?
- **The pain**: What about today's situation is broken, slow, or missing?
You are extracting a **domain model** from natural language in real time. Most stakeholders won't give you this structure explicitly — they'll give you a story. Your job is to hear the structure inside the story.
| They say... | You're hearing... |
|-------------|-------------------|
| Nouns they repeat | Your entities |
| Verbs they emphasize | Your core operations |
| Frustrations they mention | Your design constraints |
| Workarounds they describe | What the system must replace |
| People they name | Your user types |
---
#### 2a.2: Use Domain Knowledge to Fill In the Blanks
You have broad knowledge of how systems work. Use it aggressively.
If they say "I need a research agent," you already know it probably involves: search, summarization, source tracking, and iteration. Don't ask about each — use them as your starting mental model and let their specifics override your defaults.
If they say "I need to monitor files and alert me," you know this probably involves: watch patterns, triggers, notifications, and state tracking.
**The key move**: Take your general knowledge of the domain and merge it with the specifics they've given you. The result is a draft understanding that's 60-80% right before you've asked a single question. Your questions close the remaining 20-40%.
---
#### 2a.3: Play Back a Proposed Model (Not a List of Questions)
After listening, present a **concrete picture** of what you think they need. Make it specific enough that they can spot what's wrong.
**Pattern: "Here's what I heard — tell me where I'm off"**
> "OK here's how I'm picturing this: [User type] needs to [core action]. Right now they're [current painful workflow]. What you want is [proposed solution that replaces the pain].
>
> The way I'd structure this: [key entities] connected by [key relationships], with the main flow being [trigger → steps → outcome].
>
> For the MVP, I'd focus on [the one thing that delivers the most value] and hold off on [things that can wait].
>
> Before I start — [1-2 specific questions you genuinely can't infer]."
Why this works:
- **Proves you were listening** — they don't feel like they have to repeat themselves
- **Shows competence** — you're already thinking in systems
- **Fast to correct** — "no, it's more like X" takes 10 seconds vs. answering 15 questions
- **Creates momentum** — heading toward building, not more talking
---
#### 2a.4: Ask Only What You Cannot Infer
Your questions should be **narrow, specific, and consequential**. Never ask what you could answer yourself.
**Good questions** (high-stakes, can't infer):
- "Who's the primary user — you or your end customers?"
- "Is this replacing a spreadsheet, or is there literally nothing today?"
- "Does this need to integrate with anything, or standalone?"
- "Is there existing data to migrate, or starting fresh?"
**Bad questions** (low-stakes, inferable):
- "What should happen if there's an error?" *(handle gracefully, obviously)*
- "Should it have search?" *(if there's a list, yes)*
- "How should we handle permissions?" *(follow standard patterns)*
- "What tools should I use?" *(your call, not theirs)*
---
#### Conversation Flow (3-5 Turns)
| Turn | Who | What |
|------|-----|------|
| 1 | User | Describes what they need |
| 2 | Agent | Plays back understanding as a proposed model. Asks 1-2 critical questions max. |
| 3 | User | Corrects, confirms, or adds detail |
| 4 | Agent | Adjusts model, confirms MVP scope, states assumptions, declares starting point |
| *(5)* | | *(Only if Turn 3 revealed something that fundamentally changes the approach)* |
**AFTER the conversation, IMMEDIATELY proceed to 2b. DO NOT skip to building.**
---
#### Anti-Patterns
| Don't | Do Instead |
|-------|------------|
| Open with a list of questions | Open with what you understood from their request |
| "What are your requirements?" | "Here's what I think you need — am I right?" |
| Ask about every edge case | Handle with smart defaults, flag in summary |
| 10+ turn discovery conversation | 3-8 turns. Start building, iterate with real software. |
| Being lazy and not understanding what the user wants to achieve | Understand the "what" and the "why" |
| Ask for permission to start | State your plan and start |
| Wait for certainty | Start at 80% confidence, iterate the rest |
| Ask what tech/tools to use | That's your job. Decide, disclose, move on. |
---
### 2b: Capability Assessment
**After the user responds, analyze the fit.** Present this assessment honestly:
> **Framework Fit Assessment**
>
> Based on what you've described, here's my honest assessment of how well this framework fits your use case:
>
> **What Works Well (The Good):**
> - [List 2-4 things the framework handles well for this use case]
> - Examples: multi-turn conversations, human-in-the-loop review, tool orchestration, structured outputs
>
> **Limitations to Be Aware Of (The Bad):**
> - [List 2-3 limitations that apply but are workable]
> - Examples: LLM latency means not suitable for sub-second responses, context window limits for very large documents, cost per run for heavy tool usage
>
> **Potential Deal-Breakers (The Ugly):**
> - [List any significant challenges or missing capabilities — be honest]
> - Examples: no tool available for X, would require custom MCP server, framework not designed for Y
**Be specific.** Reference the actual tools discovered in Step 1. If the user needs `send_email` but it's not available, say so. If they need real-time streaming from a database, explain that's not how the framework works.
### 2c: Gap Analysis
**Identify specific gaps** between what the user wants and what you can deliver:
| Requirement | Framework Support | Gap/Workaround |
|-------------|-------------------|----------------|
| [User need] | [✅ Supported / ⚠️ Partial / ❌ Not supported] | [How to handle or why it's a problem] |
**Examples of gaps to identify:**
- Missing tools (user needs X, but only Y and Z are available)
- Scope issues (user wants to process 10,000 items, but LLM rate limits apply)
- Interaction mismatches (user wants CLI-only, but agent is designed for TUI)
- Data flow issues (user needs to persist state across runs, but sessions are isolated)
- Latency requirements (user needs instant responses, but LLM calls take seconds)
### 2d: Recommendation
**Give a clear recommendation:**
> **My Recommendation:**
>
> [One of these three:]
>
> **✅ PROCEED** — This is a good fit. The framework handles your core needs well. [List any minor caveats.]
>
> **⚠️ PROCEED WITH SCOPE ADJUSTMENT** — This can work, but we should adjust: [specific changes]. Without these adjustments, you'll hit [specific problems].
>
> **🛑 RECONSIDER** — This framework may not be the right tool for this job because [specific reasons]. Consider instead: [alternatives — simpler script, different framework, custom solution].
### 2e: Get Explicit Acknowledgment
**CALL AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Based on this assessment, how would you like to proceed?",
"header": "Proceed",
"options": [
{"label": "Proceed as described", "description": "I understand the limitations, let's build it"},
{"label": "Adjust scope", "description": "Let's modify the requirements to fit better"},
{"label": "More questions", "description": "I have questions about the assessment"},
{"label": "Reconsider", "description": "Maybe this isn't the right approach"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Proceed**: Move to STEP 3
- If **Adjust scope**: Discuss what to change, update your notes, re-assess if needed
- If **More questions**: Answer them honestly, then ask again
- If **Reconsider**: Discuss alternatives. If they decide to proceed anyway, that's their informed choice
---
## STEP 3: Define Goal Together with User
**Now that the use case is qualified, collaborate on the goal definition.**
**START by synthesizing what you learned:**
> Based on our discussion, here's my understanding of the goal:
>
> **Core purpose:** [what you understood from 2a]
> **Success looks like:** [what you inferred]
> **Key constraints:** [what you inferred]
>
> Let me refine this with you:
>
> 1. **What should this agent accomplish?** (confirm or correct my understanding)
> 2. **How will we know it succeeded?** (what specific outcomes matter)
> 3. **Are there any hard constraints?** (things it must never do, quality bars)
**WAIT for the user to respond.** Use their input (and the agent type they selected) to draft:
- Goal ID (kebab-case) - Goal ID (kebab-case)
- Goal name - Goal name
@@ -115,12 +476,14 @@ AskUserQuestion(questions=[{
**WAIT for user response.** **WAIT for user response.**
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 3 - If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 4
- If **Modify**: Ask what they want to change, update the draft, ask again - If **Modify**: Ask what they want to change, update the draft, ask again
--- ---
## STEP 3: Design Conceptual Nodes ## STEP 4: Design Conceptual Nodes
**If starting from a template**, the nodes are already loaded in the builder session. Present the existing nodes using the table format below and ask for approval. Skip the design phase.
**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist. **BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist.
@@ -173,12 +536,14 @@ AskUserQuestion(questions=[{
**WAIT for user response.** **WAIT for user response.**
- If **Approve**: Proceed to STEP 4 - If **Approve**: Proceed to STEP 5
- If **Modify**: Ask what they want to change, update design, ask again - If **Modify**: Ask what they want to change, update design, ask again
--- ---
## STEP 4: Design Full Graph and Review ## STEP 5: Design Full Graph and Review
**If starting from a template**, the edges are already loaded in the builder session. Render the existing graph as ASCII art and present it to the user for approval. Skip the edge design phase.
**DETERMINE the edges** connecting the approved nodes. For each edge: **DETERMINE the edges** connecting the approved nodes. For each edge:
@@ -288,16 +653,37 @@ AskUserQuestion(questions=[{
**WAIT for user response.** **WAIT for user response.**
- If **Approve**: Proceed to STEP 5 - If **Approve**: Proceed to STEP 6
- If **Modify**: Ask what they want to change, update the graph, re-render, ask again - If **Modify**: Ask what they want to change, update the graph, re-render, ask again
--- ---
## STEP 5: Build the Agent ## STEP 6: Build the Agent
**NOW — and only now — write the actual code.** The user has approved the goal, nodes, and graph. **NOW — and only now — write the actual code.** The user has approved the goal, nodes, and graph.
### 5a: Register nodes and edges with MCP ### 6a: Register nodes and edges with MCP
**If starting from a template**, the copied files will be overwritten with the approved design. You MUST replace every occurrence of the old template name with the new agent name. Here is the complete checklist — miss NONE of these:
| File | What to rename |
|------|---------------|
| `config.py` | `AgentMetadata.name` — the display name shown in TUI agent selection |
| `config.py` | `AgentMetadata.description` — agent description |
| `agent.py` | Module docstring (line 1) |
| `agent.py` | `class OldNameAgent:` → `class NewNameAgent:` |
| `agent.py` | `GraphSpec(id="old-name-graph")` → `GraphSpec(id="new-name-graph")` — shown in TUI status bar |
| `agent.py` | Storage path: `Path.home() / ".hive" / "agents" / "old_name"` → `"new_name"` |
| `__main__.py` | Module docstring (line 1) |
| `__main__.py` | `from .agent import ... OldNameAgent` → `NewNameAgent` |
| `__main__.py` | CLI help string in `def cli()` docstring |
| `__main__.py` | All `OldNameAgent()` instantiations |
| `__main__.py` | Storage path (duplicated from agent.py) |
| `__main__.py` | Shell banner string (e.g. `"=== Old Name Agent ==="`) |
| `__init__.py` | Package docstring |
| `__init__.py` | `from .agent import OldNameAgent` import |
| `__init__.py` | `__all__` list entry |
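After completing the checklist above, a quick self-check helps confirm nothing was missed (a sketch; `old_name`/`OldName` stand in for the template's real identifiers and `NEW_AGENT_NAME` for your agent's directory):
```python
from pathlib import Path

# Scan the exported package for leftover references to the template's old name.
package = Path("exports/NEW_AGENT_NAME")
leftovers = [
    (str(path), line_no)
    for path in package.rglob("*.py")
    for line_no, line in enumerate(path.read_text().splitlines(), start=1)
    if "old_name" in line or "OldName" in line
]
print(leftovers or "No leftover template names found")
```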
**If starting from a template and no modifications were made in Steps 2-5**, the nodes and edges are already registered. Skip to validation (`mcp__agent-builder__validate_graph()`). If modifications were made, re-register the changed nodes/edges (the MCP tools handle duplicates by overwriting).
**FOR EACH approved node**, call: **FOR EACH approved node**, call:
@@ -337,9 +723,9 @@ mcp__agent-builder__validate_graph()
``` ```
- If invalid: Fix the issues and re-validate - If invalid: Fix the issues and re-validate
- If valid: Continue to 5b - If valid: Continue to 6b
### 5b: Write Python package files ### 6b: Write Python package files
**EXPORT the graph data:** **EXPORT the graph data:**
@@ -399,7 +785,7 @@ mcp__agent-builder__export_graph()
--- ---
## STEP 6: Verify and Test ## STEP 7: Verify and Test
**RUN validation:** **RUN validation:**
@@ -525,16 +911,70 @@ result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
--- ---
## REFERENCE: Framework Capabilities for Qualification
Use this reference during STEP 2 to give accurate, honest assessments.
### What the Framework Does Well (The Good)
| Capability | Description |
|------------|-------------|
| Multi-turn conversations | Client-facing nodes stream to users and block for input |
| Human-in-the-loop review | Approval checkpoints with feedback loops back to earlier nodes |
| Tool orchestration | LLM can call multiple tools, framework handles execution |
| Structured outputs | `set_output` produces validated, typed outputs |
| Parallel execution | Fan-out/fan-in for concurrent node execution |
| Context management | Automatic compaction and spillover for large data |
| Error recovery | Retry logic, judges, and feedback edges for self-correction |
| Session persistence | State saved to disk, resumable sessions |
### Framework Limitations (The Bad)
| Limitation | Impact | Workaround |
|------------|--------|------------|
| LLM latency | 2-10+ seconds per turn | Not suitable for real-time/low-latency needs |
| Context window limits | ~128K tokens max | Use data tools for spillover, design for chunking |
| Cost per run | LLM API calls cost money | Budget planning, caching where possible |
| Rate limits | API throttling on heavy usage | Backoff, queue management |
| Node boundaries lose context | Outputs must be serialized | Prefer fewer, richer nodes |
| Single-threaded within node | One LLM call at a time per node | Use fan-out for parallelism |
### Not Designed For (The Ugly)
| Use Case | Why It's Problematic | Alternative |
|----------|---------------------|-------------|
| Long-running daemons | Framework is request-response, not persistent | External scheduler + agent |
| Sub-second responses | LLM latency is inherent | Traditional code, no LLM |
| Processing millions of items | Context windows and rate limits | Batch processing + sampling |
| Real-time streaming data | No built-in pub/sub or streaming input | Custom MCP server + agent |
| Guaranteed determinism | LLM outputs vary | Function nodes for deterministic parts |
| Offline/air-gapped | Requires LLM API access | Local models (not currently supported) |
| Multi-user concurrency | Single-user session model | Separate agent instances per user |
### Tool Availability Reality Check
**Before promising any capability, check `list_mcp_tools()`.** Common gaps:
- **Email**: May not have `send_email` — check before promising email automation
- **Calendar**: May not have calendar APIs — check before promising scheduling
- **Database**: May not have SQL tools — check before promising data queries
- **File system**: Has data tools but not arbitrary filesystem access
- **External APIs**: Depends entirely on what MCP servers are registered
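For example, if the user's workflow depends on sending email, the gap check is mechanical. A sketch (the exact shape of the `list_mcp_tools()` result is an assumption here; adapt it to whatever the tool actually returns):
```python
import json

# Assume tools_json holds the raw result of mcp__agent-builder__list_mcp_tools()
tools_json = '[{"name": "web_search"}, {"name": "read_data"}]'
tool_names = {tool["name"] for tool in json.loads(tools_json)}

required = {"send_email"}
missing = required - tool_names
if missing:
    print(f"Gap: no tool available for {sorted(missing)}; flag it in the 2c table")
```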
---
## COMMON MISTAKES TO AVOID ## COMMON MISTAKES TO AVOID
1. **Skipping use case qualification** - A responsible engineer qualifies the use case BEFORE building. Be transparent about what works, what doesn't, and what's problematic
2. **Hiding limitations** - Don't oversell the framework. If a tool doesn't exist or a capability is missing, say so upfront
3. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
4. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
5. **Skipping validation** - Always validate nodes and graph before proceeding
6. **Not waiting for approval** - Always ask user before major steps
7. **Displaying this file** - Execute the steps, don't show documentation
8. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
9. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
10. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
11. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
12. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
13. **Wrong mcp_servers.json format** - Use flat format (no `"mcpServers"` wrapper), `cwd` must be `"../../tools"`, and `command` must be `"uv"` with args `["run", "python", ...]`
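For reference, a flat-format `mcp_servers.json` entry matching mistake 13 might look like the sketch below (the `hive-tools` key mirrors the server registered earlier; any fields beyond `command`, `args`, and `cwd` are assumptions):
```json
{
  "hive-tools": {
    "command": "uv",
    "args": ["run", "python", "mcp_server.py", "--stdio"],
    "cwd": "../../tools"
  }
}
```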
@@ -1,33 +1,8 @@
"""Runtime configuration.""" """Runtime configuration."""
import json from dataclasses import dataclass
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
from framework.config import RuntimeConfig
default_config = RuntimeConfig() default_config = RuntimeConfig()
+27 -15
@@ -19,14 +19,18 @@ metadata:
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT explore the codebase or read source files. ROUTE to the correct skill IMMEDIATELY.** **THIS IS AN EXECUTABLE WORKFLOW. DO NOT explore the codebase or read source files. ROUTE to the correct skill IMMEDIATELY.**
When this skill is loaded, determine what the user needs and invoke the appropriate skill NOW: When this skill is loaded, **ALWAYS use the AskUserQuestion tool** to present options:
- **User wants to build an agent** → Invoke `/hive-create` immediately
- **User wants to test an agent** → Invoke `/hive-test` immediately ```
- **User wants to learn concepts** → Invoke `/hive-concepts` immediately Use AskUserQuestion with these options:
- **User wants patterns/optimization** → Invoke `/hive-patterns` immediately - "Build a new agent" → Then invoke /hive-create
- **User wants to set up credentials** → Invoke `/hive-credentials` immediately - "Test an existing agent" → Then invoke /hive-test
- **User has a failing/broken agent** → Invoke `/hive-debugger` immediately - "Learn agent concepts" → Then invoke /hive-concepts
- **Unclear what user needs** → Ask the user (do NOT explore the codebase to figure it out) - "Optimize agent design" → Then invoke /hive-patterns
- "Set up credentials" → Then invoke /hive-credentials
- "Debug a failing agent" → Then invoke /hive-debugger
- "Other" (please describe what you want to achieve)
```
**DO NOT:** Read source files, explore the codebase, search for code, or do any investigation before routing. The sub-skills handle all of that. **DO NOT:** Read source files, explore the codebase, search for code, or do any investigation before routing. The sub-skills handle all of that.
@@ -73,7 +77,6 @@ Use this meta-skill when:
## Phase 0: Understand Concepts (Optional) ## Phase 0: Understand Concepts (Optional)
**Duration**: 5-10 minutes
**Skill**: `/hive-concepts` **Skill**: `/hive-concepts`
**Input**: Questions about agent architecture **Input**: Questions about agent architecture
@@ -95,9 +98,8 @@ Use this meta-skill when:
## Phase 1: Build Agent Structure ## Phase 1: Build Agent Structure
**Duration**: 15-30 minutes
**Skill**: `/hive-create` **Skill**: `/hive-create`
**Input**: User requirements ("Build an agent that...") **Input**: User requirements ("Build an agent that...") or a template to start from
### What This Phase Does ### What This Phase Does
@@ -166,7 +168,6 @@ exports/agent_name/
## Phase 1.5: Optimize Design (Optional) ## Phase 1.5: Optimize Design (Optional)
**Duration**: 10-15 minutes
**Skill**: `/hive-patterns` **Skill**: `/hive-patterns`
**Input**: Completed agent structure **Input**: Completed agent structure
@@ -191,14 +192,11 @@ exports/agent_name/
## Phase 2: Test & Validate ## Phase 2: Test & Validate
**Duration**: 20-40 minutes
**Skill**: `/hive-test` **Skill**: `/hive-test`
**Input**: Working agent from Phase 1 **Input**: Working agent from Phase 1
### What This Phase Does ### What This Phase Does
### What This Phase Does
Guides the creation and execution of a comprehensive test suite: Guides the creation and execution of a comprehensive test suite:
- Constraint tests - Constraint tests
- Success criteria tests - Success criteria tests
@@ -289,6 +287,19 @@ User: "Build an agent (first time)"
→ Done: Production-ready agent → Done: Production-ready agent
``` ```
### Pattern 1c: Build from Template
```
User: "Build an agent based on the deep research template"
→ Use /hive-create
→ Select "From a template" path
→ Pick template, name new agent
→ Review/modify goal, nodes, graph
→ Agent exported with customizations
→ Use /hive-test
→ Done: Customized agent
```
### Pattern 2: Test Existing Agent ### Pattern 2: Test Existing Agent
``` ```
@@ -492,6 +503,7 @@ The workflow is **flexible** - skip phases as needed, iterate freely, and adapt
- Have clear requirements - Have clear requirements
- Ready to write code - Ready to write code
- Want step-by-step guidance - Want step-by-step guidance
- Want to start from an existing template and customize it
**Choose hive-patterns when:** **Choose hive-patterns when:**
- Agent structure complete - Agent structure complete
-65
@@ -1,65 +0,0 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
### Changed
### Fixed
### Security
## [0.4.2] - 2026-02-08
### Added
- Resumable sessions: agents now automatically save state and can resume after interruptions
- `/resume` command in TUI to resume latest paused/failed session
- `/resume <session_id>` command to resume specific sessions
- `/sessions` command to list all sessions for current agent
- `--resume-session` CLI flag for automatic session resumption on startup
- `--checkpoint <checkpoint_id>` CLI flag for checkpoint-based recovery
- Ctrl+Z now immediately pauses execution with full state capture
- `/pause` command for immediate pause during execution
- Session state persistence: memory, execution path, node positions, visit counts
- Unified session storage at `~/.hive/agents/{agent_name}/sessions/`
- Automatic memory restoration on resume with full conversation history
### Changed
- TUI quit now pauses execution and saves state instead of cancelling
- Pause operations now use immediate task cancellation instead of waiting for node boundaries
- Session cleanup timeout increased from 0.5s to 5s to ensure proper state saving
- Session status now tracked as: active, paused, completed, failed, cancelled
### Deprecated
- Pause nodes (use client-facing EventLoopNodes instead)
- `request_pause()` method (replaced with immediate task cancellation)
### Removed
- N/A
### Fixed
- Memory persistence: ExecutionResult.session_state["memory"] now populated at all exit points
- Resume now starts at correct paused_at node instead of intake node
- Visit count double-counting on resume (paused node count now properly adjusted)
- Session selection now picks most recent session instead of oldest
- Quit state save failures due to insufficient timeout
- Ctrl+Z pause implementation (was only showing notification without pausing)
- Empty memory on resume by ensuring session_state["memory"] is properly populated
### Security
- N/A
## [0.1.0] - 2025-01-13
### Added
- Initial release
[Unreleased]: https://github.com/adenhq/hive/compare/v0.4.2...HEAD
[0.4.2]: https://github.com/adenhq/hive/compare/v0.4.0...v0.4.2
[0.1.0]: https://github.com/adenhq/hive/releases/tag/v0.1.0
+1
@@ -1,4 +1,5 @@
exports/ exports/
docs/ docs/
.agent-builder-sessions/
.pytest_cache/ .pytest_cache/
**/__pycache__/ **/__pycache__/
+64
@@ -0,0 +1,64 @@
"""Shared Hive configuration utilities.
Centralises reading of ~/.hive/configuration.json so that the runner
and every agent template share one implementation instead of copy-pasting
helper functions.
"""
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from framework.graph.edge import DEFAULT_MAX_TOKENS
# ---------------------------------------------------------------------------
# Low-level config file access
# ---------------------------------------------------------------------------
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
def get_hive_config() -> dict[str, Any]:
"""Load hive configuration from ~/.hive/configuration.json."""
if not HIVE_CONFIG_FILE.exists():
return {}
try:
with open(HIVE_CONFIG_FILE) as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return {}
# ---------------------------------------------------------------------------
# Derived helpers
# ---------------------------------------------------------------------------
def get_preferred_model() -> str:
"""Return the user's preferred LLM model string (e.g. 'anthropic/claude-sonnet-4-20250514')."""
llm = get_hive_config().get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
return "anthropic/claude-sonnet-4-20250514"
def get_max_tokens() -> int:
"""Return the configured max_tokens, falling back to DEFAULT_MAX_TOKENS."""
return get_hive_config().get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
# ---------------------------------------------------------------------------
# RuntimeConfig shared across agent templates
# ---------------------------------------------------------------------------
@dataclass
class RuntimeConfig:
"""Agent runtime configuration loaded from ~/.hive/configuration.json."""
model: str = field(default_factory=get_preferred_model)
temperature: float = 0.7
max_tokens: int = field(default_factory=get_max_tokens)
api_key: str | None = None
api_base: str | None = None
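A brief usage sketch for this shared module (an illustration, not part of the file; it assumes the `framework` package is importable and that `~/.hive/configuration.json` may be absent):
```python
from framework.config import RuntimeConfig, get_preferred_model

# Fields fall back to the defaults above when no configuration file exists.
config = RuntimeConfig()
print(config.model)       # e.g. "anthropic/claude-sonnet-4-20250514"
print(config.max_tokens)  # DEFAULT_MAX_TOKENS unless overridden in the config file

# The helpers can also be called directly:
print(get_preferred_model())
```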
+2 -1
@@ -9,7 +9,7 @@ from framework.graph.client_io import (
from framework.graph.code_sandbox import CodeSandbox, safe_eval, safe_exec from framework.graph.code_sandbox import CodeSandbox, safe_eval, safe_exec
from framework.graph.context_handoff import ContextHandoff, HandoffContext from framework.graph.context_handoff import ContextHandoff, HandoffContext
from framework.graph.conversation import ConversationStore, Message, NodeConversation from framework.graph.conversation import ConversationStore, Message, NodeConversation
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec from framework.graph.edge import DEFAULT_MAX_TOKENS, EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.event_loop_node import ( from framework.graph.event_loop_node import (
EventLoopNode, EventLoopNode,
JudgeProtocol, JudgeProtocol,
@@ -58,6 +58,7 @@ __all__ = [
"EdgeSpec", "EdgeSpec",
"EdgeCondition", "EdgeCondition",
"GraphSpec", "GraphSpec",
"DEFAULT_MAX_TOKENS",
# Executor (fixed graph) # Executor (fixed graph)
"GraphExecutor", "GraphExecutor",
# Plan (flexible execution) # Plan (flexible execution)
+14 -2
@@ -24,10 +24,12 @@ given the current goal, context, and execution state.
from enum import StrEnum from enum import StrEnum
from typing import Any from typing import Any
from pydantic import BaseModel, Field from pydantic import BaseModel, Field, model_validator
from framework.graph.safe_eval import safe_eval from framework.graph.safe_eval import safe_eval
DEFAULT_MAX_TOKENS = 8192
class EdgeCondition(StrEnum): class EdgeCondition(StrEnum):
"""When an edge should be traversed.""" """When an edge should be traversed."""
@@ -424,7 +426,7 @@ class GraphSpec(BaseModel):
# Default LLM settings # Default LLM settings
default_model: str = "claude-haiku-4-5-20251001" default_model: str = "claude-haiku-4-5-20251001"
max_tokens: int = 1024 max_tokens: int = Field(default=None) # resolved by _resolve_max_tokens validator
# Cleanup LLM for JSON extraction fallback (fast/cheap model preferred) # Cleanup LLM for JSON extraction fallback (fast/cheap model preferred)
# If not set, uses CEREBRAS_API_KEY -> cerebras/llama-3.3-70b or # If not set, uses CEREBRAS_API_KEY -> cerebras/llama-3.3-70b or
@@ -447,6 +449,16 @@ class GraphSpec(BaseModel):
model_config = {"extra": "allow"} model_config = {"extra": "allow"}
@model_validator(mode="before")
@classmethod
def _resolve_max_tokens(cls, values: Any) -> Any:
"""Resolve max_tokens from the global config store when not explicitly set."""
if isinstance(values, dict) and values.get("max_tokens") is None:
from framework.config import get_max_tokens
values["max_tokens"] = get_max_tokens()
return values
def get_node(self, node_id: str) -> Any | None: def get_node(self, node_id: str) -> Any | None:
"""Get a node by ID.""" """Get a node by ID."""
for node in self.nodes: for node in self.nodes:
+4 -1
@@ -33,6 +33,7 @@ from framework.graph.node import (
from framework.graph.output_cleaner import CleansingConfig, OutputCleaner from framework.graph.output_cleaner import CleansingConfig, OutputCleaner
from framework.graph.validator import OutputValidator from framework.graph.validator import OutputValidator
from framework.llm.provider import LLMProvider, Tool from framework.llm.provider import LLMProvider, Tool
from framework.observability import set_trace_context
from framework.runtime.core import Runtime from framework.runtime.core import Runtime
from framework.schemas.checkpoint import Checkpoint from framework.schemas.checkpoint import Checkpoint
from framework.storage.checkpoint_store import CheckpointStore from framework.storage.checkpoint_store import CheckpointStore
@@ -228,6 +229,9 @@ class GraphExecutor:
Returns: Returns:
ExecutionResult with output and metrics ExecutionResult with output and metrics
""" """
# Add agent_id to trace context for correlation
set_trace_context(agent_id=graph.id)
# Validate graph # Validate graph
errors = graph.validate() errors = graph.validate()
if errors: if errors:
@@ -404,7 +408,6 @@ class GraphExecutor:
if self.runtime_logger: if self.runtime_logger:
# Extract session_id from storage_path if available (for unified sessions) # Extract session_id from storage_path if available (for unified sessions)
# storage_path format: base_path/sessions/{session_id}/
session_id = "" session_id = ""
if self._storage_path and self._storage_path.name.startswith("session_"): if self._storage_path and self._storage_path.name.startswith("session_"):
session_id = self._storage_path.name session_id = self._storage_path.name
@@ -25,6 +25,7 @@ if _exports_dir.is_dir() and str(_exports_dir) not in sys.path:
del _exports_dir del _exports_dir
from mcp.server import FastMCP # noqa: E402 from mcp.server import FastMCP # noqa: E402
from pydantic import ValidationError # noqa: E402
from framework.graph import ( # noqa: E402 from framework.graph import ( # noqa: E402
Constraint, Constraint,
@@ -1867,6 +1868,85 @@ def export_graph() -> str:
) )
@mcp.tool()
def import_from_export(
agent_json_path: Annotated[str, "Path to the agent.json file to import"],
) -> str:
"""
Import an agent definition from an exported agent.json file into the current build session.
Reads the agent.json, parses goal/nodes/edges, and populates the current session.
This is the reverse of export_graph().
Args:
agent_json_path: Path to the agent.json file to import
Returns:
JSON summary of what was imported (goal name, node count, edge count)
"""
session = get_session()
path = Path(agent_json_path)
if not path.exists():
return json.dumps({"success": False, "error": f"File not found: {agent_json_path}"})
try:
data = json.loads(path.read_text())
except json.JSONDecodeError as e:
return json.dumps({"success": False, "error": f"Invalid JSON: {e}"})
try:
# Parse goal (same pattern as BuildSession.from_dict lines 88-99)
goal_data = data.get("goal")
if goal_data:
session.goal = Goal(
id=goal_data["id"],
name=goal_data["name"],
description=goal_data["description"],
success_criteria=[
SuccessCriterion(**sc) for sc in goal_data.get("success_criteria", [])
],
constraints=[Constraint(**c) for c in goal_data.get("constraints", [])],
)
# Parse nodes (same pattern as BuildSession.from_dict line 102)
graph_data = data.get("graph", {})
nodes_data = graph_data.get("nodes", [])
session.nodes = [NodeSpec(**n) for n in nodes_data]
# Parse edges (same pattern as BuildSession.from_dict lines 105-118)
edges_data = graph_data.get("edges", [])
session.edges = []
for e in edges_data:
condition_str = e.get("condition")
if isinstance(condition_str, str):
condition_map = {
"always": EdgeCondition.ALWAYS,
"on_success": EdgeCondition.ON_SUCCESS,
"on_failure": EdgeCondition.ON_FAILURE,
"conditional": EdgeCondition.CONDITIONAL,
"llm_decide": EdgeCondition.LLM_DECIDE,
}
e["condition"] = condition_map.get(condition_str, EdgeCondition.ON_SUCCESS)
session.edges.append(EdgeSpec(**e))
except (KeyError, TypeError, ValueError, ValidationError) as e:
return json.dumps({"success": False, "error": f"Malformed agent.json: {e}"})
# Persist updated session
_save_session(session)
return json.dumps(
{
"success": True,
"goal": session.goal.name if session.goal else None,
"nodes_count": len(session.nodes),
"edges_count": len(session.edges),
"node_ids": [n.id for n in session.nodes],
"edge_ids": [e.id for e in session.edges],
}
)
@mcp.tool() @mcp.tool()
def get_session_status() -> str: def get_session_status() -> str:
"""Get the current status of the build session.""" """Get the current status of the build session."""
+236
@@ -0,0 +1,236 @@
# Observability - Structured Logging
## Configuration via Environment Variables
Control logging format using environment variables:
```bash
# JSON logging (production) - Machine-parseable, one line per log
export LOG_FORMAT=json
python -m my_agent run
# Human-readable (development) - Color-coded, easy to read
# Default if LOG_FORMAT is not set
python -m my_agent run
```
**Alternative:** Set `ENV=production` to automatically use JSON format:
```bash
export ENV=production
python -m my_agent run
```
---
## Overview
The Hive framework provides automatic structured logging with trace context propagation. Logs include correlation IDs (`trace_id`, `execution_id`) that automatically follow your agent execution flow.
**Features:**
- **Zero developer friction**: Standard `logger.info()` calls automatically get trace context
- **ContextVar-based propagation**: Thread-safe and async-safe for concurrent executions
- **Dual output modes**: JSON for production, human-readable for development
- **Automatic correlation**: `trace_id` and `execution_id` propagate through all logs
## Quick Start
Logging is automatically configured when you use `AgentRunner`. No setup required:
```python
from framework.runner import AgentRunner
runner = AgentRunner(graph=my_graph, goal=my_goal)
result = await runner.run({"input": "data"})
# Logs automatically include trace_id, execution_id, agent_id, etc.
```
## Programmatic Configuration
Configure logging explicitly in your code:
```python
from framework.observability import configure_logging
# Human-readable (development)
configure_logging(level="DEBUG", format="human")
# JSON (production)
configure_logging(level="INFO", format="json")
# Auto-detect from environment
configure_logging(level="INFO", format="auto")
```
### Configuration Options
- **level**: `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"`
- **format**:
- `"json"` - Machine-parseable JSON (one line per log entry)
- `"human"` - Human-readable with colors
- `"auto"` - Detects from `LOG_FORMAT` env var or `ENV=production`
## Log Format Examples
### JSON Format (Machine-parseable)
```json
{"timestamp": "2026-01-28T15:01:02.671126+00:00", "level": "info", "logger": "framework.runtime", "message": "Starting agent execution", "trace_id": "54e80d7b5bd6409dbc3217e5cd16a4fd", "execution_id": "b4c348ec54e80d7b5bd6409dbc3217e50", "agent_id": "sales-agent", "goal_id": "qualify-leads"}
```
**Features:**
- `trace_id` and `execution_id` are 32 hex chars (W3C/OTel-aligned, no prefixes)
- Compact single-line format (easy to stream/parse)
- All trace context fields included automatically
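Because each entry is a single self-contained JSON object, downstream filtering is straightforward. A minimal sketch using the field names from the example above:
```python
import json

raw = (
    '{"timestamp": "2026-01-28T15:01:02.671126+00:00", "level": "info", '
    '"message": "Starting agent execution", '
    '"trace_id": "54e80d7b5bd6409dbc3217e5cd16a4fd", "agent_id": "sales-agent"}'
)

entry = json.loads(raw)
if entry["level"] == "info" and entry["agent_id"] == "sales-agent":
    print(entry["trace_id"], entry["message"])
```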
### Human-Readable Format (Development)
```
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Starting agent execution
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Processing input data [node_id:input-processor]
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] LLM call completed [latency_ms:1250] [tokens_used:450]
```
**Features:**
- Color-coded log levels
- Shortened IDs for readability (first 8 chars)
- Context prefix shows trace correlation
## Trace Context Fields
When the framework sets trace context, these fields are included in all logs. IDs are 32 hex (W3C/OTel-aligned, no prefixes).
- **trace_id**: Trace identifier
- **execution_id**: Run/session correlation
- **agent_id**: Agent/graph identifier
- **goal_id**: Goal being pursued
- **node_id**: Current node (when set)
## Custom Log Fields
Add custom fields using the `extra` parameter:
```python
import logging
logger = logging.getLogger("my_module")
# Add custom fields
logger.info("LLM call completed", extra={
"latency_ms": 1250,
"tokens_used": 450,
"model": "claude-3-5-sonnet-20241022",
"node_id": "web-search"
})
```
These fields appear in both JSON and human-readable formats.
## Usage in Your Code
### Standard Logging (Recommended)
Just use Python's standard logging - context is automatic:
```python
import logging
logger = logging.getLogger(__name__)
def my_function():
# This log automatically includes trace_id, execution_id, etc.
logger.info("Processing data")
try:
result = do_work()
logger.info("Work completed", extra={"result_count": len(result)})
except Exception as e:
logger.error("Work failed", exc_info=True)
```
### Framework-Managed Context
The framework automatically sets trace context at key points:
- **Runtime.start_run()**: Sets `trace_id`, `execution_id`, `goal_id`
- **GraphExecutor.execute()**: Adds `agent_id`
- **Node execution**: Adds `node_id`
Propagation is automatic via ContextVar.
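As a minimal illustration of that propagation (a sketch using only the public helpers shown in this document; in real agents the framework sets these values for you):
```python
import asyncio
import logging

from framework.observability import configure_logging, set_trace_context

configure_logging(level="INFO", format="human")
logger = logging.getLogger("propagation_demo")


async def inner_step() -> None:
    # No IDs are passed explicitly; the context set in run() is visible here via ContextVar.
    logger.info("inner step")


async def run() -> None:
    set_trace_context(
        trace_id="54e80d7b5bd6409dbc3217e5cd16a4fd",
        execution_id="b4c348ec54e80d7b5bd6409dbc3217e5",
        agent_id="demo-agent",
    )
    logger.info("outer step")
    await inner_step()


asyncio.run(run())
```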
## Advanced Usage
### Manual Context Management
If you need to set trace context manually (rare):
```python
from framework.observability import set_trace_context, get_trace_context
# Set context (32-hex, no prefixes)
set_trace_context(
trace_id="54e80d7b5bd6409dbc3217e5cd16a4fd",
execution_id="b4c348ec54e80d7b5bd6409dbc3217e50",
agent_id="my-agent"
)
# Get current context
context = get_trace_context()
print(context["execution_id"])
# Clear context (usually not needed)
from framework.observability import clear_trace_context
clear_trace_context()
```
### Testing
For tests, you may want to configure logging explicitly:
```python
import pytest
from framework.observability import configure_logging
@pytest.fixture(autouse=True)
def setup_logging():
configure_logging(level="DEBUG", format="human")
```
## Best Practices
1. **Production**: Use JSON format (`LOG_FORMAT=json` or `ENV=production`)
2. **Development**: Use human-readable format (default)
3. **Don't manually set context**: Let the framework manage it
4. **Use standard logging**: No special APIs needed - just `logger.info()`
5. **Add custom fields**: Use `extra` dict for additional metadata
## Troubleshooting
### Logs missing trace context
Ensure `configure_logging()` has been called (usually automatic via `AgentRunner._setup()`).
### JSON logs not appearing
Check environment variables:
```bash
echo $LOG_FORMAT
echo $ENV
```
Or explicitly set:
```python
configure_logging(format="json")
```
### Context not propagating
ContextVar automatically propagates through async calls. If context seems lost, check:
- Are you in the same async execution context?
- Has `set_trace_context()` been called for this execution?
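A quick check at the point where logs look bare (a sketch using the helpers documented above):
```python
from framework.observability import get_trace_context

context = get_trace_context()
if not context or "trace_id" not in context:
    # Nothing was set in this execution context; the framework normally does this
    # in Runtime.start_run() / GraphExecutor.execute().
    print("No trace context set here")
else:
    print("Active context:", context)
```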
## See Also
- [Logging Implementation](../observability/logging.py) - Source code
- [AgentRunner](../runner/runner.py) - Where logging is configured
- [Runtime Core](../runtime/core.py) - Where trace context is set
+23
@@ -0,0 +1,23 @@
"""
Observability module for automatic trace correlation and structured logging.
This module provides zero-friction observability:
- Automatic trace context propagation via ContextVar
- Structured JSON logging for production
- Human-readable logging for development
- No manual ID passing required
"""
from framework.observability.logging import (
clear_trace_context,
configure_logging,
get_trace_context,
set_trace_context,
)
__all__ = [
"configure_logging",
"get_trace_context",
"set_trace_context",
"clear_trace_context",
]
+302
@@ -0,0 +1,302 @@
"""
Structured logging with automatic trace context propagation.
Key Features:
- Zero developer friction: Standard logger.info() calls get automatic context
- ContextVar-based propagation: Thread-safe and async-safe
- Dual output modes: JSON for production, human-readable for development
- Correlation IDs: trace_id follows entire request flow automatically
Architecture:
Runtime.start_run() Generates trace_id, sets context once
(automatic propagation via ContextVar)
GraphExecutor.execute() Adds agent_id to context
(automatic propagation)
Node.execute() Adds node_id to context
(automatic propagation)
User code logger.info("message") Gets ALL context automatically!
"""
import json
import logging
import os
import re
from contextvars import ContextVar
from datetime import UTC, datetime
from typing import Any
# Context variable for trace propagation
# ContextVar is thread-safe and async-safe - perfect for concurrent agent execution
trace_context: ContextVar[dict[str, Any] | None] = ContextVar("trace_context", default=None)
# ANSI escape code pattern (matches \033[...m or \x1b[...m)
ANSI_ESCAPE_PATTERN = re.compile(r"\x1b\[[0-9;]*m|\033\[[0-9;]*m")
def strip_ansi_codes(text: str) -> str:
"""Remove ANSI escape codes from text for clean JSON logging."""
return ANSI_ESCAPE_PATTERN.sub("", text)
class StructuredFormatter(logging.Formatter):
"""
JSON formatter for structured logging.
Produces machine-parseable log entries with:
- Standard fields (timestamp, level, logger, message)
- Trace context (trace_id, execution_id, agent_id, etc.) - AUTOMATIC
- Custom fields from extra dict
"""
def format(self, record: logging.LogRecord) -> str:
"""Format log record as JSON."""
# Get trace context for correlation - AUTOMATIC!
context = trace_context.get() or {}
# Strip ANSI codes from message for clean JSON output
message = strip_ansi_codes(record.getMessage())
# Build base log entry
log_entry = {
"timestamp": datetime.now(UTC).isoformat(),
"level": record.levelname.lower(),
"logger": record.name,
"message": message,
}
# Add trace context (trace_id, execution_id, agent_id, etc.) - AUTOMATIC!
log_entry.update(context)
# Add custom fields from extra (optional)
event = getattr(record, "event", None)
if event is not None:
if isinstance(event, str):
log_entry["event"] = strip_ansi_codes(str(event))
else:
log_entry["event"] = event
latency_ms = getattr(record, "latency_ms", None)
if latency_ms is not None:
log_entry["latency_ms"] = latency_ms
tokens_used = getattr(record, "tokens_used", None)
if tokens_used is not None:
log_entry["tokens_used"] = tokens_used
node_id = getattr(record, "node_id", None)
if node_id is not None:
log_entry["node_id"] = node_id
model = getattr(record, "model", None)
if model is not None:
log_entry["model"] = model
# Add exception info if present (strip ANSI codes from exception text too)
if record.exc_info:
exception_text = self.formatException(record.exc_info)
log_entry["exception"] = strip_ansi_codes(exception_text)
return json.dumps(log_entry)
class HumanReadableFormatter(logging.Formatter):
"""
Human-readable formatter for development.
Provides colorized logs with trace context for local debugging.
Includes trace_id prefix for correlation - AUTOMATIC!
"""
COLORS = {
"DEBUG": "\033[36m", # Cyan
"INFO": "\033[32m", # Green
"WARNING": "\033[33m", # Yellow
"ERROR": "\033[31m", # Red
"CRITICAL": "\033[35m", # Magenta
}
RESET = "\033[0m"
def format(self, record: logging.LogRecord) -> str:
"""Format log record as human-readable string."""
# Get trace context - AUTOMATIC!
context = trace_context.get() or {}
trace_id = context.get("trace_id", "")
execution_id = context.get("execution_id", "")
agent_id = context.get("agent_id", "")
# Build context prefix
prefix_parts = []
if trace_id:
prefix_parts.append(f"trace:{trace_id[:8]}")
if execution_id:
prefix_parts.append(f"exec:{execution_id[-8:]}")
if agent_id:
prefix_parts.append(f"agent:{agent_id}")
context_prefix = f"[{' | '.join(prefix_parts)}] " if prefix_parts else ""
# Get color
color = self.COLORS.get(record.levelname, "")
reset = self.RESET
# Format log level (8 chars wide for alignment)
level = f"{record.levelname:<8}"
# Add event if present
event = ""
record_event = getattr(record, "event", None)
if record_event is not None:
event = f" [{record_event}]"
# Format message: [LEVEL] [trace context] message
return f"{color}[{level}]{reset} {context_prefix}{record.getMessage()}{event}"
def configure_logging(
level: str = "INFO",
format: str = "auto", # "json", "human", or "auto"
) -> None:
"""
Configure structured logging for the application.
This should be called ONCE at application startup, typically in:
- AgentRunner._setup()
- Main entry point
- Test fixtures
Args:
level: Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
format: Output format:
- "json": Machine-parseable JSON (for production)
- "human": Human-readable with colors (for development)
- "auto": JSON if LOG_FORMAT=json or ENV=production, else human
Examples:
# Development mode (human-readable)
configure_logging(level="DEBUG", format="human")
# Production mode (JSON)
configure_logging(level="INFO", format="json")
# Auto-detect from environment
configure_logging(level="INFO", format="auto")
"""
# Auto-detect format
if format == "auto":
# Use JSON if LOG_FORMAT=json or ENV=production
log_format_env = os.getenv("LOG_FORMAT", "").lower()
env = os.getenv("ENV", "development").lower()
if log_format_env == "json" or env == "production":
format = "json"
else:
format = "human"
# Select formatter
if format == "json":
formatter = StructuredFormatter()
# Disable colors in third-party libraries when using JSON format
_disable_third_party_colors()
else:
formatter = HumanReadableFormatter()
# Configure handler
handler = logging.StreamHandler()
handler.setFormatter(formatter)
# Configure root logger
root_logger = logging.getLogger()
root_logger.handlers.clear()
root_logger.addHandler(handler)
root_logger.setLevel(level.upper())
# When in JSON mode, configure known third-party loggers to use JSON formatter
# This ensures libraries like LiteLLM, httpcore also output clean JSON
if format == "json":
third_party_loggers = [
"LiteLLM",
"httpcore",
"httpx",
"openai",
]
for logger_name in third_party_loggers:
logger = logging.getLogger(logger_name)
# Clear existing handlers so records propagate to root and use our formatter there
logger.handlers.clear()
logger.propagate = True # Still propagate to root for consistency
def _disable_third_party_colors() -> None:
"""Disable color output in third-party libraries for clean JSON logging."""
# Set NO_COLOR environment variable (common convention for disabling colors)
os.environ["NO_COLOR"] = "1"
os.environ["FORCE_COLOR"] = "0"
# Disable LiteLLM debug/verbose output colors if available
try:
import litellm
# LiteLLM respects NO_COLOR, but we can also suppress debug info
if hasattr(litellm, "suppress_debug_info"):
litellm.suppress_debug_info = True # type: ignore[attr-defined]
except (ImportError, AttributeError):
pass
def set_trace_context(**kwargs: Any) -> None:
"""
Set trace context for current execution.
Context is stored in a ContextVar and AUTOMATICALLY propagates
through async calls within the same execution context.
This is called by the framework at key points:
- Runtime.start_run(): Sets trace_id, execution_id, goal_id
- GraphExecutor.execute(): Adds agent_id
- Node execution: Adds node_id
Developers/agents NEVER call this directly - it's framework-managed.
Args:
**kwargs: Context fields (trace_id, execution_id, agent_id, etc.)
Example (framework code):
# In Runtime.start_run()
trace_id = uuid.uuid4().hex # 32 hex, W3C Trace Context compliant
execution_id = uuid.uuid4().hex # 32 hex, OTel-aligned for correlation
set_trace_context(
trace_id=trace_id,
execution_id=execution_id,
goal_id=goal_id
)
# All subsequent logs in this execution get these fields automatically!
"""
current = trace_context.get() or {}
trace_context.set({**current, **kwargs})
def get_trace_context() -> dict:
"""
Get current trace context.
Returns:
Dict with trace_id, execution_id, agent_id, etc.
Empty dict if no context set.
"""
context = trace_context.get() or {}
return context.copy()
def clear_trace_context() -> None:
"""
Clear trace context.
Useful for:
- Cleanup between test runs
- Starting a completely new execution context
- Manual context management (rare)
Note: Framework typically doesn't need to call this - ContextVar
is execution-scoped and cleans itself up automatically.
"""
trace_context.set(None)
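A minimal end-to-end sketch of how agent code would exercise this module (the IDs are placeholder values; normally `Runtime.start_run()` sets the context):

```python
import logging

from framework.observability import clear_trace_context, configure_logging, set_trace_context

# One-time setup at application startup ("human" = colorized console output)
configure_logging(level="DEBUG", format="human")
logger = logging.getLogger(__name__)

# The framework sets this in Runtime.start_run(); done by hand here only for illustration
set_trace_context(trace_id="a" * 32, execution_id="b" * 32, goal_id="demo-goal")

# Trace context is attached automatically; no IDs are passed by hand
logger.info("starting research step", extra={"event": "node_start"})

clear_trace_context()
```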
+19 -21
View File
@@ -8,8 +8,15 @@ from dataclasses import dataclass, field
from pathlib import Path from pathlib import Path
from typing import TYPE_CHECKING, Any from typing import TYPE_CHECKING, Any
from framework.config import get_hive_config, get_preferred_model
from framework.graph import Goal from framework.graph import Goal
from framework.graph.edge import AsyncEntryPointSpec, EdgeCondition, EdgeSpec, GraphSpec from framework.graph.edge import (
DEFAULT_MAX_TOKENS,
AsyncEntryPointSpec,
EdgeCondition,
EdgeSpec,
GraphSpec,
)
from framework.graph.executor import ExecutionResult from framework.graph.executor import ExecutionResult
from framework.graph.node import NodeSpec from framework.graph.node import NodeSpec
from framework.llm.provider import LLMProvider, Tool from framework.llm.provider import LLMProvider, Tool
@@ -24,9 +31,6 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# Configuration paths
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
def _ensure_credential_key_env() -> None: def _ensure_credential_key_env() -> None:
"""Load HIVE_CREDENTIAL_KEY from shell config if not already in environment. """Load HIVE_CREDENTIAL_KEY from shell config if not already in environment.
@@ -56,17 +60,6 @@ def _ensure_credential_key_env() -> None:
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json" CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
def get_hive_config() -> dict[str, Any]:
"""Load hive configuration from ~/.hive/configuration.json."""
if not HIVE_CONFIG_FILE.exists():
return {}
try:
with open(HIVE_CONFIG_FILE) as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return {}
def get_claude_code_token() -> str | None: def get_claude_code_token() -> str | None:
""" """
Get the OAuth token from Claude Code subscription. Get the OAuth token from Claude Code subscription.
@@ -264,11 +257,7 @@ class AgentRunner:
@staticmethod @staticmethod
def _resolve_default_model() -> str: def _resolve_default_model() -> str:
"""Resolve the default model from ~/.hive/configuration.json.""" """Resolve the default model from ~/.hive/configuration.json."""
config = get_hive_config() return get_preferred_model()
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
return "anthropic/claude-sonnet-4-20250514"
def __init__( def __init__(
self, self,
@@ -414,7 +403,11 @@ class AgentRunner:
if agent_config and hasattr(agent_config, "model"): if agent_config and hasattr(agent_config, "model"):
model = agent_config.model model = agent_config.model
max_tokens = getattr(agent_config, "max_tokens", 1024) if agent_config else 1024 if agent_config and hasattr(agent_config, "max_tokens"):
max_tokens = agent_config.max_tokens
else:
hive_config = get_hive_config()
max_tokens = hive_config.get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
# Build GraphSpec from module-level variables # Build GraphSpec from module-level variables
graph = GraphSpec( graph = GraphSpec(
@@ -546,6 +539,11 @@ class AgentRunner:
def _setup(self) -> None: def _setup(self) -> None:
"""Set up runtime, LLM, and executor.""" """Set up runtime, LLM, and executor."""
# Configure structured logging (auto-detects JSON vs human-readable)
from framework.observability import configure_logging
configure_logging(level="INFO", format="auto")
# Set up session context for tools (workspace_id, agent_id, session_id) # Set up session context for tools (workspace_id, agent_id, session_id)
workspace_id = "default" # Could be derived from storage path workspace_id = "default" # Could be derived from storage path
agent_id = self.graph.id or "unknown" agent_id = self.graph.id or "unknown"
+12 -2
View File
@@ -197,8 +197,17 @@ class NodeStepLog:
tokens_used: int
latency_ms: int
# ... detailed execution state
# Trace context (OTel-aligned; empty if observability context not set):
trace_id: str # From set_trace_context (OTel trace)
span_id: str # 16 hex chars per step (OTel span)
parent_span_id: str # Optional; for nested span hierarchy
execution_id: str # Session/run correlation id
```
L3 entries include `trace_id`, `span_id`, and `execution_id` for correlation and **OpenTelemetry (OTel) compatibility**. When the framework sets trace context (e.g. via `Runtime.start_run()` or `StreamRuntime.start_run()`), these fields are populated automatically so L3 data can be exported to OTel backends without schema changes.
**L2: NodeDetail** also includes `trace_id` and `span_id`; **L1: RunSummaryLog** includes `trace_id` and `execution_id` for the same correlation.
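Because all three levels share the same `trace_id`, correlating them can be as simple as filtering the L3 JSONL by that field (a sketch; the log path and filename below are illustrative, not a fixed API):

```python
import json
from pathlib import Path

def steps_for_trace(l3_file: Path, trace_id: str) -> list[dict]:
    """Collect every L3 step entry that belongs to one trace."""
    entries = []
    for line in l3_file.read_text().splitlines():
        if line.strip():
            entry = json.loads(line)
            if entry.get("trace_id") == trace_id:
                entries.append(entry)
    return entries

# e.g. steps_for_trace(Path("sessions/session_abc/logs/tool_logs.jsonl"),
#                      "54e80d7b5bd6409dbc3217e5cd16a4fd")
```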
---

## Querying Logs (MCP Tools)
@@ -520,9 +529,10 @@ logger.start_run(goal_id, session_id=execution_id)
**Written:** Incrementally (append per step)
**Format:** JSONL (one JSON object per line)
Each line includes **trace context** when the framework has set it (via the observability module): `trace_id`, `span_id`, `parent_span_id` (optional), and `execution_id`. These align with OpenTelemetry/W3C TraceContext so L3 data can be exported to OTel backends without schema changes.
```jsonl
{"node_id":"intake-collector","step_index":3,"trace_id":"54e80d7b5bd6409dbc3217e5cd16a4fd","span_id":"a1b2c3d4e5f67890","execution_id":"b4c348ec54e80d7b5bd6409dbc3217e5","tool_calls":[...],"verdict":"RETRY",...}
```

**Why JSONL?**
+9
View File
@@ -13,6 +13,7 @@ from datetime import datetime
from pathlib import Path from pathlib import Path
from typing import Any from typing import Any
from framework.observability import set_trace_context
from framework.schemas.decision import Decision, DecisionType, Option, Outcome from framework.schemas.decision import Decision, DecisionType, Option, Outcome
from framework.schemas.run import Run, RunStatus from framework.schemas.run import Run, RunStatus
from framework.storage.backend import FileStorage from framework.storage.backend import FileStorage
@@ -79,6 +80,14 @@ class Runtime:
The run ID The run ID
""" """
run_id = f"run_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}" run_id = f"run_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
trace_id = uuid.uuid4().hex
execution_id = uuid.uuid4().hex # 32 hex, OTel/W3C-aligned for logs
set_trace_context(
trace_id=trace_id,
execution_id=execution_id,
goal_id=goal_id,
)
self._current_run = Run( self._current_run = Run(
id=run_id, id=run_id,
@@ -361,6 +361,13 @@ class ExecutionStream:
# Create runtime adapter for this execution # Create runtime adapter for this execution
runtime_adapter = StreamRuntimeAdapter(self._runtime, execution_id) runtime_adapter = StreamRuntimeAdapter(self._runtime, execution_id)
# Start run to set trace context (CRITICAL for observability)
runtime_adapter.start_run(
goal_id=self.goal.id,
goal_description=self.goal.description,
input_data=ctx.input_data,
)
# Create per-execution runtime logger # Create per-execution runtime logger
runtime_logger = None runtime_logger = None
if self._runtime_log_store: if self._runtime_log_store:
@@ -413,6 +420,13 @@ class ExecutionStream:
# Store result with retention # Store result with retention
self._record_execution_result(execution_id, result) self._record_execution_result(execution_id, result)
# End run to complete trace (for observability)
runtime_adapter.end_run(
success=result.success,
narrative=f"Execution {'succeeded' if result.success else 'failed'}",
output_data=result.output,
)
# Update context # Update context
ctx.completed_at = datetime.now() ctx.completed_at = datetime.now()
ctx.status = "completed" if result.success else "failed" ctx.status = "completed" if result.success else "failed"
@@ -495,6 +509,16 @@ class ExecutionStream:
# Write error session state # Write error session state
await self._write_session_state(execution_id, ctx, error=str(e)) await self._write_session_state(execution_id, ctx, error=str(e))
# End run with failure (for observability)
try:
runtime_adapter.end_run(
success=False,
narrative=f"Execution failed: {str(e)}",
output_data={},
)
except Exception:
pass # Don't let end_run errors mask the original error
# Emit failure event # Emit failure event
if self._event_bus: if self._event_bus:
await self._event_bus.emit_execution_failed( await self._event_bus.emit_execution_failed(
+22 -2
View File
@@ -31,6 +31,9 @@ class NodeStepLog(BaseModel):
For EventLoopNode, each iteration is a step. For single-step nodes For EventLoopNode, each iteration is a step. For single-step nodes
(LLMNode, FunctionNode, RouterNode), step_index is 0. (LLMNode, FunctionNode, RouterNode), step_index is 0.
OTel-aligned fields (trace_id, span_id, execution_id) enable correlation
and future OpenTelemetry export without schema changes.
""" """
node_id: str node_id: str
@@ -48,6 +51,11 @@ class NodeStepLog(BaseModel):
error: str = "" # Error message if step failed error: str = "" # Error message if step failed
stacktrace: str = "" # Full stack trace if exception occurred stacktrace: str = "" # Full stack trace if exception occurred
is_partial: bool = False # True if step didn't complete normally is_partial: bool = False # True if step didn't complete normally
# OTel / trace context (from observability; empty if not set):
trace_id: str = "" # OTel trace id (e.g. from set_trace_context)
span_id: str = "" # OTel span id (16 hex chars per step)
parent_span_id: str = "" # Optional; for nested span hierarchy
execution_id: str = "" # Session/run correlation id
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -56,7 +64,10 @@ class NodeStepLog(BaseModel):
class NodeDetail(BaseModel): class NodeDetail(BaseModel):
"""Per-node completion result and attention flags.""" """Per-node completion result and attention flags.
OTel-aligned fields (trace_id, span_id) tie L2 to the same trace as L3.
"""
node_id: str node_id: str
node_name: str = "" node_name: str = ""
@@ -78,6 +89,9 @@ class NodeDetail(BaseModel):
continue_count: int = 0 continue_count: int = 0
needs_attention: bool = False needs_attention: bool = False
attention_reasons: list[str] = Field(default_factory=list) attention_reasons: list[str] = Field(default_factory=list)
# OTel / trace context (from observability; empty if not set):
trace_id: str = ""
span_id: str = "" # Optional node-level span for hierarchy
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -86,7 +100,10 @@ class NodeDetail(BaseModel):
class RunSummaryLog(BaseModel): class RunSummaryLog(BaseModel):
"""Run-level summary for a full graph execution.""" """Run-level summary for a full graph execution.
OTel-aligned fields (trace_id, execution_id) tie L1 to the same trace as L2/L3.
"""
run_id: str run_id: str
agent_id: str = "" agent_id: str = ""
@@ -101,6 +118,9 @@ class RunSummaryLog(BaseModel):
started_at: str = "" # ISO timestamp started_at: str = "" # ISO timestamp
duration_ms: int = 0 duration_ms: int = 0
execution_quality: str = "" # "clean"|"degraded"|"failed" execution_quality: str = "" # "clean"|"degraded"|"failed"
# OTel / trace context (from observability; empty if not set):
trace_id: str = ""
execution_id: str = ""
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
+8 -17
View File
@@ -52,29 +52,20 @@ class RuntimeLogStore:
- New format (session_*): {storage_root}/sessions/{run_id}/logs/ - New format (session_*): {storage_root}/sessions/{run_id}/logs/
- Old format (anything else): {base_path}/runs/{run_id}/ (deprecated) - Old format (anything else): {base_path}/runs/{run_id}/ (deprecated)
When base_path ends with 'runtime_logs', we use the parent directory
to avoid nesting under runtime_logs/.
This allows backward compatibility for reading old logs.
""" """
if run_id.startswith("session_"): if run_id.startswith("session_"):
# New: sessions/{session_id}/logs/
# If base_path ends with runtime_logs, use parent (storage root)
is_runtime_logs = self._base_path.name == "runtime_logs" is_runtime_logs = self._base_path.name == "runtime_logs"
root = self._base_path.parent if is_runtime_logs else self._base_path root = self._base_path.parent if is_runtime_logs else self._base_path
return root / "sessions" / run_id / "logs" return root / "sessions" / run_id / "logs"
else: import warnings
# Old: runs/{run_id}/ (deprecated, backward compatibility only)
import warnings
warnings.warn( warnings.warn(
f"Reading logs from deprecated location for run_id={run_id}. " f"Reading logs from deprecated location for run_id={run_id}. "
"New sessions use unified storage at sessions/session_*/logs/", "New sessions use unified storage at sessions/session_*/logs/",
DeprecationWarning, DeprecationWarning,
stacklevel=3, stacklevel=3,
) )
return self._base_path / "runs" / run_id return self._base_path / "runs" / run_id
# ------------------------------------------------------------------- # -------------------------------------------------------------------
# Incremental write (sync — called from locked sections) # Incremental write (sync — called from locked sections)
+24 -2
View File
@@ -26,6 +26,7 @@ import uuid
from datetime import UTC, datetime from datetime import UTC, datetime
from typing import Any from typing import Any
from framework.observability import get_trace_context
from framework.runtime.runtime_log_schemas import ( from framework.runtime.runtime_log_schemas import (
NodeDetail, NodeDetail,
NodeStepLog, NodeStepLog,
@@ -64,10 +65,8 @@ class RuntimeLogger:
The run_id (same as session_id if provided) The run_id (same as session_id if provided)
""" """
if session_id: if session_id:
# Use provided session_id as run_id (unified sessions)
self._run_id = session_id self._run_id = session_id
else: else:
# Generate run_id in old format (backward compatibility)
ts = datetime.now(UTC).strftime("%Y%m%dT%H%M%S") ts = datetime.now(UTC).strftime("%Y%m%dT%H%M%S")
short_uuid = uuid.uuid4().hex[:8] short_uuid = uuid.uuid4().hex[:8]
self._run_id = f"{ts}_{short_uuid}" self._run_id = f"{ts}_{short_uuid}"
@@ -118,6 +117,12 @@ class RuntimeLogger:
) )
) )
# OTel / trace context: from observability ContextVar (empty if not set)
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
execution_id = ctx.get("execution_id", "")
span_id = uuid.uuid4().hex[:16] # OTel 16-hex span_id per step
step_log = NodeStepLog( step_log = NodeStepLog(
node_id=node_id, node_id=node_id,
node_type=node_type, node_type=node_type,
@@ -132,6 +137,9 @@ class RuntimeLogger:
error=error, error=error,
stacktrace=stacktrace, stacktrace=stacktrace,
is_partial=is_partial, is_partial=is_partial,
trace_id=trace_id,
span_id=span_id,
execution_id=execution_id,
) )
with self._lock: with self._lock:
@@ -190,6 +198,11 @@ class RuntimeLogger:
needs_attention = True needs_attention = True
attention_reasons.append(f"Many iterations: {total_steps}") attention_reasons.append(f"Many iterations: {total_steps}")
# OTel / trace context for L2 correlation
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
span_id = uuid.uuid4().hex[:16] # Optional node-level span
detail = NodeDetail( detail = NodeDetail(
node_id=node_id, node_id=node_id,
node_name=node_name, node_name=node_name,
@@ -210,6 +223,8 @@ class RuntimeLogger:
continue_count=continue_count, continue_count=continue_count,
needs_attention=needs_attention, needs_attention=needs_attention,
attention_reasons=attention_reasons, attention_reasons=attention_reasons,
trace_id=trace_id,
span_id=span_id,
) )
with self._lock: with self._lock:
@@ -274,6 +289,11 @@ class RuntimeLogger:
for nd in node_details: for nd in node_details:
attention_reasons.extend(nd.attention_reasons) attention_reasons.extend(nd.attention_reasons)
# OTel / trace context for L1 correlation
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
execution_id = ctx.get("execution_id", "")
summary = RunSummaryLog( summary = RunSummaryLog(
run_id=self._run_id, run_id=self._run_id,
agent_id=self._agent_id, agent_id=self._agent_id,
@@ -288,6 +308,8 @@ class RuntimeLogger:
started_at=self._started_at, started_at=self._started_at,
duration_ms=duration_ms, duration_ms=duration_ms,
execution_quality=execution_quality, execution_quality=execution_quality,
trace_id=trace_id,
execution_id=execution_id,
) )
await self._store.save_summary(self._run_id, summary) await self._store.save_summary(self._run_id, summary)
+11
View File
@@ -12,6 +12,7 @@ import uuid
from datetime import datetime from datetime import datetime
from typing import TYPE_CHECKING, Any from typing import TYPE_CHECKING, Any
from framework.observability import set_trace_context
from framework.schemas.decision import Decision, DecisionType, Option, Outcome from framework.schemas.decision import Decision, DecisionType, Option, Outcome
from framework.schemas.run import Run, RunStatus from framework.schemas.run import Run, RunStatus
from framework.storage.concurrent import ConcurrentStorage from framework.storage.concurrent import ConcurrentStorage
@@ -119,6 +120,16 @@ class StreamRuntime:
""" """
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
run_id = f"run_{self.stream_id}_{timestamp}_{uuid.uuid4().hex[:8]}" run_id = f"run_{self.stream_id}_{timestamp}_{uuid.uuid4().hex[:8]}"
trace_id = uuid.uuid4().hex
otel_execution_id = uuid.uuid4().hex # 32 hex, OTel/W3C-aligned for logs
set_trace_context(
trace_id=trace_id,
execution_id=otel_execution_id,
run_id=run_id,
goal_id=goal_id,
stream_id=self.stream_id,
)
run = Run( run = Run(
id=run_id, id=run_id,
+109
View File
@@ -11,6 +11,7 @@ from pathlib import Path
import pytest import pytest
from framework.observability import clear_trace_context, set_trace_context
from framework.runtime.runtime_log_schemas import ( from framework.runtime.runtime_log_schemas import (
NodeDetail, NodeDetail,
NodeStepLog, NodeStepLog,
@@ -464,6 +465,114 @@ class TestRuntimeLogger:
assert tool_logs.steps[0].verdict == "RETRY" assert tool_logs.steps[0].verdict == "RETRY"
assert tool_logs.steps[2].verdict == "ACCEPT" assert tool_logs.steps[2].verdict == "ACCEPT"
@pytest.mark.asyncio
async def test_trace_context_populated_in_l1_l2_l3(self, tmp_path: Path):
"""With trace context set, L3/L2/L1 entries include trace_id, span_id, execution_id."""
set_trace_context(
trace_id="a1b2c3d4e5f6789012345678abcdef01",
execution_id="b2c3d4e5f6789012345678abcdef0123",
)
try:
store = RuntimeLogStore(tmp_path / "logs")
rl = RuntimeLogger(store=store, agent_id="test-agent")
run_id = rl.start_run("goal-1")
rl.log_step(
node_id="node-1",
node_type="event_loop",
step_index=0,
llm_text="Step.",
input_tokens=10,
output_tokens=5,
)
rl.log_node_complete(
node_id="node-1",
node_name="Search",
node_type="event_loop",
success=True,
exit_status="success",
)
await rl.end_run(
status="success",
duration_ms=100,
node_path=["node-1"],
execution_quality="clean",
)
# L3: tool_logs
tool_logs = await store.load_tool_logs(run_id)
assert tool_logs is not None
assert len(tool_logs.steps) == 1
step = tool_logs.steps[0]
assert step.trace_id == "a1b2c3d4e5f6789012345678abcdef01"
assert step.execution_id == "b2c3d4e5f6789012345678abcdef0123"
assert len(step.span_id) == 16
assert all(c in "0123456789abcdef" for c in step.span_id)
# L2: details
details = await store.load_details(run_id)
assert details is not None
assert len(details.nodes) == 1
nd = details.nodes[0]
assert nd.trace_id == "a1b2c3d4e5f6789012345678abcdef01"
assert len(nd.span_id) == 16
# L1: summary
summary = await store.load_summary(run_id)
assert summary is not None
assert summary.trace_id == "a1b2c3d4e5f6789012345678abcdef01"
assert summary.execution_id == "b2c3d4e5f6789012345678abcdef0123"
finally:
clear_trace_context()
@pytest.mark.asyncio
async def test_trace_context_empty_when_not_set(self, tmp_path: Path):
"""Without trace context, L3/L2/L1 trace_id and execution_id are empty."""
clear_trace_context()
store = RuntimeLogStore(tmp_path / "logs")
rl = RuntimeLogger(store=store, agent_id="test-agent")
run_id = rl.start_run("goal-1")
rl.log_step(
node_id="node-1",
node_type="event_loop",
step_index=0,
llm_text="Step.",
input_tokens=10,
output_tokens=5,
)
rl.log_node_complete(
node_id="node-1",
node_name="Search",
node_type="event_loop",
success=True,
exit_status="success",
)
await rl.end_run(
status="success",
duration_ms=100,
node_path=["node-1"],
execution_quality="clean",
)
# L3: trace_id and execution_id from context should be empty
tool_logs = await store.load_tool_logs(run_id)
assert tool_logs is not None
assert len(tool_logs.steps) == 1
assert tool_logs.steps[0].trace_id == ""
assert tool_logs.steps[0].execution_id == ""
# L2
details = await store.load_details(run_id)
assert details is not None
assert details.nodes[0].trace_id == ""
# L1
summary = await store.load_summary(run_id)
assert summary is not None
assert summary.trace_id == ""
assert summary.execution_id == ""
@pytest.mark.asyncio @pytest.mark.asyncio
async def test_multi_node_lifecycle(self, tmp_path: Path): async def test_multi_node_lifecycle(self, tmp_path: Path):
"""Test logging across multiple nodes in a graph run.""" """Test logging across multiple nodes in a graph run."""
+27 -6
View File
@@ -5,12 +5,31 @@ Aden Hive is a Python-based agent framework. Configuration is handled through en
## Configuration Overview

```
~/.hive/configuration.json (global defaults: provider, model, max_tokens)
Environment variables (API keys, runtime flags)
Agent config.py (per-agent settings: model, tools, storage)
pyproject.toml (package metadata and dependencies)
.mcp.json (MCP server connections)
```
## Global Configuration (~/.hive/configuration.json)
The `quickstart.sh` script creates this file during setup. It stores the default LLM provider, model, and max_tokens used by all agents unless overridden in an agent's own `config.py`.
```json
{
"llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 8192,
"api_key_env_var": "ANTHROPIC_API_KEY"
},
"created_at": "2026-01-15T12:00:00+00:00"
}
```
The default `max_tokens` value (8192) is defined as `DEFAULT_MAX_TOKENS` in `framework.graph.edge` and re-exported from `framework.graph`. Each agent's `RuntimeConfig` reads from this file at startup. To change defaults, either re-run `quickstart.sh` or edit the file directly.
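As a sketch, the helpers exposed by `framework.config` (as used by `AgentRunner` elsewhere in this change) resolve those defaults roughly like this:

```python
from framework.config import RuntimeConfig, get_hive_config, get_preferred_model
from framework.graph import DEFAULT_MAX_TOKENS  # 8192

model = get_preferred_model()    # e.g. "anthropic/claude-sonnet-4-5-20250929"
hive_config = get_hive_config()  # parsed ~/.hive/configuration.json, or {} if missing
max_tokens = hive_config.get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)

config = RuntimeConfig()         # agents that don't override these fields inherit the defaults
```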
## Environment Variables

### LLM Providers (at least one required for real execution)
@@ -61,14 +80,16 @@ Each agent package in `exports/` contains its own `config.py`:
```python
# exports/my_agent/config.py
CONFIG = {
    "model": "anthropic/claude-sonnet-4-5-20250929",  # Default LLM model
    "max_tokens": 8192,  # default: DEFAULT_MAX_TOKENS from framework.graph
    "temperature": 0.7,
    "tools": ["web_search", "pdf_read"],  # MCP tools to enable
    "storage_path": "/tmp/my_agent",  # Runtime data location
}
```
If `model` or `max_tokens` are omitted, the agent loads defaults from `~/.hive/configuration.json`.
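For example, a minimal `config.py` that relies entirely on the global defaults only needs the agent-specific fields (sketch):

```python
# exports/my_agent/config.py: model and max_tokens intentionally omitted,
# so they resolve from ~/.hive/configuration.json
CONFIG = {
    "tools": ["web_search", "pdf_read"],
    "storage_path": "/tmp/my_agent",
}
```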
### Agent Graph Specification

Agent behavior is defined in `agent.json` (or constructed in `agent.py`):
+1
View File
@@ -163,6 +163,7 @@ hive/ # Repository root
│ │ ├── llm/ # LLM provider integrations (Anthropic, OpenAI, etc.)
│ │ ├── mcp/ # MCP server integration
│ │ ├── runner/ # AgentRunner - loads and runs agents
│ │ ├── observability/ # Structured logging - human-readable and machine-parseable tracing
│ │ ├── runtime/ # Runtime environment
│ │ ├── schemas/ # Data schemas
│ │ ├── storage/ # File-based persistence
+12 -3
View File
@@ -11,6 +11,7 @@ template_name/
├── __init__.py # Package exports
├── __main__.py # CLI entry point
├── agent.py # Goal, edges, graph spec, agent class
├── agent.json # Agent definition (used by build-from-template)
├── config.py # Runtime configuration
├── nodes/
│ └── __init__.py # Node definitions (NodeSpec instances)
@@ -19,20 +20,28 @@ template_name/
## How to use a template
### Option 1: Build from template (recommended)
Use the `/hive-create` skill and select "From a template" to interactively pick a template, customize the goal/nodes/graph, and export a new agent.
### Option 2: Manual copy
```bash
# 1. Copy to your exports directory
cp -r examples/templates/deep_research_agent exports/my_research_agent

# 2. Update the module references in __main__.py and __init__.py
# 3. Customize goal, nodes, edges, and prompts

# 4. Run it
uv run python -m exports.my_research_agent --input '{"topic": "..."}'
```
## Available templates
| Template | Description |
|----------|-------------|
| [deep_research_agent](deep_research_agent/) | Interactive research agent that searches diverse sources, evaluates findings with user checkpoints, and produces a cited HTML report |
| [tech_news_reporter](tech_news_reporter/) | Researches the latest technology and AI news from the web and produces a well-organized report |
| [twitter_outreach](twitter_outreach/) | Researches a Twitter/X profile, crafts a personalized outreach email, and sends it after user approval |
@@ -207,17 +207,8 @@ async def _interactive_shell(verbose=False):
if result.success: if result.success:
output = result.output output = result.output
if "report_content" in output: status = output.get("delivery_status", "unknown")
click.echo("\n--- Report ---\n") click.echo(f"\nResearch complete (status: {status})\n")
click.echo(output["report_content"])
click.echo("\n")
if "references" in output:
click.echo("--- References ---\n")
for ref in output.get("references", []):
click.echo(
f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}"
)
click.echo("\n")
else: else:
click.echo(f"\nResearch failed: {result.error}\n") click.echo(f"\nResearch failed: {result.error}\n")
@@ -0,0 +1,276 @@
{
"agent": {
"id": "deep_research_agent",
"name": "Deep Research Agent",
"version": "1.0.0",
"description": "Interactive research agent that rigorously investigates topics through multi-source search, quality evaluation, and synthesis - with TUI conversation at key checkpoints for user guidance and feedback."
},
"graph": {
"id": "deep-research-agent-graph",
"goal_id": "rigorous-interactive-research",
"version": "1.0.0",
"entry_node": "intake",
"entry_points": {
"start": "intake"
},
"pause_nodes": [],
"terminal_nodes": [
"report"
],
"nodes": [
{
"id": "intake",
"name": "Research Intake",
"description": "Discuss the research topic with the user, clarify scope, and confirm direction",
"node_type": "event_loop",
"input_keys": [
"topic"
],
"output_keys": [
"research_brief"
],
"nullable_output_keys": [],
"input_schema": {},
"output_schema": {},
"system_prompt": "You are a research intake specialist. The user wants to research a topic.\nHave a brief conversation to clarify what they need.\n\n**STEP 1 \u2014 Read and respond (text only, NO tool calls):**\n1. Read the topic provided\n2. If it's vague, ask 1-2 clarifying questions (scope, angle, depth)\n3. If it's already clear, confirm your understanding and ask the user to confirm\n\nKeep it short. Don't over-ask.\n\nAfter your message, call ask_user() to wait for the user's response.\n\n**STEP 2 \u2014 After the user confirms, call set_output:**\n- set_output(\"research_brief\", \"A clear paragraph describing exactly what to research, what questions to answer, what scope to cover, and how deep to go.\")",
"tools": [],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 1,
"output_model": null,
"max_validation_retries": 2,
"client_facing": true
},
{
"id": "research",
"name": "Research",
"description": "Search the web, fetch source content, and compile findings",
"node_type": "event_loop",
"input_keys": [
"research_brief",
"feedback"
],
"output_keys": [
"findings",
"sources",
"gaps"
],
"nullable_output_keys": [
"feedback"
],
"input_schema": {},
"output_schema": {},
"system_prompt": "You are a research agent. Given a research brief, find and analyze sources.\n\nIf feedback is provided, this is a follow-up round \u2014 focus on the gaps identified.\n\nWork in phases:\n1. **Search**: Use web_search with 3-5 diverse queries covering different angles.\n Prioritize authoritative sources (.edu, .gov, established publications).\n2. **Fetch**: Use web_scrape on the most promising URLs (aim for 5-8 sources).\n Skip URLs that fail. Extract the substantive content.\n3. **Analyze**: Review what you've collected. Identify key findings, themes,\n and any contradictions between sources.\n\nImportant:\n- Work in batches of 3-4 tool calls at a time to manage context\n- After each batch, assess whether you have enough material\n- Prefer quality over quantity \u2014 5 good sources beat 15 thin ones\n- Track which URL each finding comes from (you'll need citations later)\n\nWhen done, use set_output:\n- set_output(\"findings\", \"Structured summary: key findings with source URLs for each claim. Include themes, contradictions, and confidence levels.\")\n- set_output(\"sources\", [{\"url\": \"...\", \"title\": \"...\", \"summary\": \"...\"}])\n- set_output(\"gaps\", \"What aspects of the research brief are NOT well-covered yet, if any.\")",
"tools": [
"web_search",
"web_scrape",
"load_data",
"save_data",
"list_data_files"
],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 3,
"output_model": null,
"max_validation_retries": 2,
"client_facing": false
},
{
"id": "review",
"name": "Review Findings",
"description": "Present findings to user and decide whether to research more or write the report",
"node_type": "event_loop",
"input_keys": [
"findings",
"sources",
"gaps",
"research_brief"
],
"output_keys": [
"needs_more_research",
"feedback"
],
"nullable_output_keys": [],
"input_schema": {},
"output_schema": {},
"system_prompt": "Present the research findings to the user clearly and concisely.\n\n**STEP 1 \u2014 Present (your first message, text only, NO tool calls):**\n1. **Summary** (2-3 sentences of what was found)\n2. **Key Findings** (bulleted, with confidence levels)\n3. **Sources Used** (count and quality assessment)\n4. **Gaps** (what's still unclear or under-covered)\n\nEnd by asking: Are they satisfied, or do they want deeper research? Should we proceed to writing the final report?\n\nAfter your presentation, call ask_user() to wait for the user's response.\n\n**STEP 2 \u2014 After the user responds, call set_output:**\n- set_output(\"needs_more_research\", \"true\") \u2014 if they want more\n- set_output(\"needs_more_research\", \"false\") \u2014 if they're satisfied\n- set_output(\"feedback\", \"What the user wants explored further, or empty string\")",
"tools": [],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 3,
"output_model": null,
"max_validation_retries": 2,
"client_facing": true
},
{
"id": "report",
"name": "Write & Deliver Report",
"description": "Write a cited HTML report from the findings and present it to the user",
"node_type": "event_loop",
"input_keys": [
"findings",
"sources",
"research_brief"
],
"output_keys": [
"delivery_status"
],
"nullable_output_keys": [],
"input_schema": {},
"output_schema": {},
"system_prompt": "Write a comprehensive research report as an HTML file and present it to the user.\n\n**STEP 1 \u2014 Write the HTML report (tool calls, NO text to user yet):**\n\n1. Compose a complete, self-contained HTML document with embedded CSS styling.\n Use a clean, readable design: max-width container, pleasant typography,\n numbered citation links, a table of contents, and a references section.\n\n Report structure inside the HTML:\n - Title & date\n - Executive Summary (2-3 paragraphs)\n - Table of Contents\n - Findings (organized by theme, with [n] citation links)\n - Analysis (synthesis, implications, areas of debate)\n - Conclusion (key takeaways, confidence assessment)\n - References (numbered list with clickable URLs)\n\n Requirements:\n - Every factual claim must cite its source with [n] notation\n - Be objective \u2014 present multiple viewpoints where sources disagree\n - Distinguish well-supported conclusions from speculation\n - Answer the original research questions from the brief\n\n2. Save the HTML file:\n save_data(filename=\"report.html\", data=<your_html>)\n\n3. Get the clickable link:\n serve_file_to_user(filename=\"report.html\", label=\"Research Report\")\n\n**STEP 2 \u2014 Present the link to the user (text only, NO tool calls):**\n\nTell the user the report is ready and include the file:// URI from\nserve_file_to_user so they can click it to open. Give a brief summary\nof what the report covers. Ask if they have questions.\n\nAfter presenting the link, call ask_user() to wait for the user's response.\n\n**STEP 3 \u2014 After the user responds:**\n- Answer follow-up questions from the research material\n- Call ask_user() again if they might have more questions\n- When the user is satisfied: set_output(\"delivery_status\", \"completed\")",
"tools": [
"save_data",
"serve_file_to_user",
"load_data",
"list_data_files"
],
"model": null,
"function": null,
"routes": {},
"max_retries": 3,
"retry_on": [],
"max_node_visits": 1,
"output_model": null,
"max_validation_retries": 2,
"client_facing": true
}
],
"edges": [
{
"id": "intake-to-research",
"source": "intake",
"target": "research",
"condition": "on_success",
"condition_expr": null,
"priority": 1,
"input_mapping": {}
},
{
"id": "research-to-review",
"source": "research",
"target": "review",
"condition": "on_success",
"condition_expr": null,
"priority": 1,
"input_mapping": {}
},
{
"id": "review-to-research-feedback",
"source": "review",
"target": "research",
"condition": "conditional",
"condition_expr": "str(needs_more_research).lower() == 'true'",
"priority": 2,
"input_mapping": {}
},
{
"id": "review-to-report",
"source": "review",
"target": "report",
"condition": "conditional",
"condition_expr": "str(needs_more_research).lower() != 'true'",
"priority": 1,
"input_mapping": {}
}
],
"max_steps": 100,
"max_retries_per_node": 3,
"description": "Interactive research agent that rigorously investigates topics through multi-source search, quality evaluation, and synthesis - with TUI conversation at key checkpoints for user guidance and feedback.",
"created_at": "2026-02-06T00:00:00.000000"
},
"goal": {
"id": "rigorous-interactive-research",
"name": "Rigorous Interactive Research",
"description": "Research any topic by searching diverse sources, analyzing findings, and producing a cited report \u2014 with user checkpoints to guide direction.",
"status": "draft",
"success_criteria": [
{
"id": "source-diversity",
"description": "Use multiple diverse, authoritative sources",
"metric": "source_count",
"target": ">=5",
"weight": 0.25,
"met": false
},
{
"id": "citation-coverage",
"description": "Every factual claim in the report cites its source",
"metric": "citation_coverage",
"target": "100%",
"weight": 0.25,
"met": false
},
{
"id": "user-satisfaction",
"description": "User reviews findings before report generation",
"metric": "user_approval",
"target": "true",
"weight": 0.25,
"met": false
},
{
"id": "report-completeness",
"description": "Final report answers the original research questions",
"metric": "question_coverage",
"target": "90%",
"weight": 0.25,
"met": false
}
],
"constraints": [
{
"id": "no-hallucination",
"description": "Only include information found in fetched sources",
"constraint_type": "quality",
"category": "accuracy",
"check": ""
},
{
"id": "source-attribution",
"description": "Every claim must cite its source with a numbered reference",
"constraint_type": "quality",
"category": "accuracy",
"check": ""
},
{
"id": "user-checkpoint",
"description": "Present findings to the user before writing the final report",
"constraint_type": "functional",
"category": "interaction",
"check": ""
}
],
"context": {},
"required_capabilities": [],
"input_schema": {},
"output_schema": {},
"version": "1.0.0",
"parent_version": null,
"evolution_reason": null,
"created_at": "2026-02-06 00:00:00.000000",
"updated_at": "2026-02-06 00:00:00.000000"
},
"required_tools": [
"list_data_files",
"load_data",
"save_data",
"serve_file_to_user",
"web_scrape",
"web_search"
],
"metadata": {
"created_at": "2026-02-06T00:00:00.000000",
"node_count": 4,
"edge_count": 4
}
}
@@ -102,23 +102,23 @@ edges = [
condition=EdgeCondition.ON_SUCCESS, condition=EdgeCondition.ON_SUCCESS,
priority=1, priority=1,
), ),
# review -> research (feedback loop) # review -> research (feedback loop, checked first)
EdgeSpec( EdgeSpec(
id="review-to-research-feedback", id="review-to-research-feedback",
source="review", source="review",
target="research", target="research",
condition=EdgeCondition.CONDITIONAL, condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == True", condition_expr="str(needs_more_research).lower() == 'true'",
priority=1, priority=2,
), ),
# review -> report (user satisfied) # review -> report (complementary condition — proceed to report when no more research needed)
EdgeSpec( EdgeSpec(
id="review-to-report", id="review-to-report",
source="review", source="review",
target="report", target="report",
condition=EdgeCondition.CONDITIONAL, condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == False", condition_expr="str(needs_more_research).lower() != 'true'",
priority=2, priority=1,
), ),
] ]
@@ -1,33 +1,8 @@
"""Runtime configuration.""" """Runtime configuration."""
import json from dataclasses import dataclass
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
from framework.config import RuntimeConfig
default_config = RuntimeConfig() default_config = RuntimeConfig()
@@ -23,6 +23,8 @@ Have a brief conversation to clarify what they need.
Keep it short. Don't over-ask. Keep it short. Don't over-ask.
After your message, call ask_user() to wait for the user's response.
**STEP 2 After the user confirms, call set_output:** **STEP 2 After the user confirms, call set_output:**
- set_output("research_brief", "A clear paragraph describing exactly what to research, \ - set_output("research_brief", "A clear paragraph describing exactly what to research, \
what questions to answer, what scope to cover, and how deep to go.") what questions to answer, what scope to cover, and how deep to go.")
@@ -93,6 +95,8 @@ Present the research findings to the user clearly and concisely.
End by asking: Are they satisfied, or do they want deeper research? \ End by asking: Are they satisfied, or do they want deeper research? \
Should we proceed to writing the final report? Should we proceed to writing the final report?
After your presentation, call ask_user() to wait for the user's response.
**STEP 2 After the user responds, call set_output:** **STEP 2 After the user responds, call set_output:**
- set_output("needs_more_research", "true") if they want more - set_output("needs_more_research", "true") if they want more
- set_output("needs_more_research", "false") if they're satisfied - set_output("needs_more_research", "false") if they're satisfied
@@ -147,8 +151,11 @@ Tell the user the report is ready and include the file:// URI from
serve_file_to_user so they can click it to open. Give a brief summary serve_file_to_user so they can click it to open. Give a brief summary
of what the report covers. Ask if they have questions. of what the report covers. Ask if they have questions.
After presenting the link, call ask_user() to wait for the user's response.
**STEP 3 After the user responds:** **STEP 3 After the user responds:**
- Answer follow-up questions from the research material - Answer follow-up questions from the research material
- Call ask_user() again if they might have more questions
- When the user is satisfied: set_output("delivery_status", "completed") - When the user is satisfied: set_output("delivery_status", "completed")
""", """,
tools=["save_data", "serve_file_to_user", "load_data", "list_data_files"], tools=["save_data", "serve_file_to_user", "load_data", "list_data_files"],
@@ -1,33 +1,8 @@
"""Runtime configuration.""" """Runtime configuration."""
import json from dataclasses import dataclass
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
from framework.config import RuntimeConfig
default_config = RuntimeConfig() default_config = RuntimeConfig()
+2 -27
View File
@@ -1,33 +1,8 @@
"""Runtime configuration.""" """Runtime configuration."""
import json from dataclasses import dataclass
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
from framework.config import RuntimeConfig
default_config = RuntimeConfig() default_config = RuntimeConfig()
+241 -10
View File
@@ -313,6 +313,65 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then
["deepseek"]="deepseek-chat" ["deepseek"]="deepseek-chat"
) )
# Model choices per provider: composite-key associative arrays
# Keys: "provider:index" -> value
declare -A MODEL_CHOICES_ID=(
["anthropic:0"]="claude-opus-4-6"
["anthropic:1"]="claude-sonnet-4-5-20250929"
["anthropic:2"]="claude-sonnet-4-20250514"
["anthropic:3"]="claude-haiku-4-5-20251001"
["openai:0"]="gpt-5.2"
["openai:1"]="gpt-5-mini"
["openai:2"]="gpt-5-nano"
["gemini:0"]="gemini-3-flash-preview"
["gemini:1"]="gemini-3-pro-preview"
["groq:0"]="moonshotai/kimi-k2-instruct-0905"
["groq:1"]="openai/gpt-oss-120b"
["cerebras:0"]="zai-glm-4.7"
["cerebras:1"]="qwen3-235b-a22b-instruct-2507"
)
declare -A MODEL_CHOICES_LABEL=(
["anthropic:0"]="Opus 4.6 - Most capable (recommended)"
["anthropic:1"]="Sonnet 4.5 - Best balance"
["anthropic:2"]="Sonnet 4 - Fast + capable"
["anthropic:3"]="Haiku 4.5 - Fast + cheap"
["openai:0"]="GPT-5.2 - Most capable (recommended)"
["openai:1"]="GPT-5 Mini - Fast + cheap"
["openai:2"]="GPT-5 Nano - Fastest"
["gemini:0"]="Gemini 3 Flash - Fast (recommended)"
["gemini:1"]="Gemini 3 Pro - Best quality"
["groq:0"]="Kimi K2 - Best quality (recommended)"
["groq:1"]="GPT-OSS 120B - Fast reasoning"
["cerebras:0"]="ZAI-GLM 4.7 - Best quality (recommended)"
["cerebras:1"]="Qwen3 235B - Frontier reasoning"
)
# NOTE: 8192 should match DEFAULT_MAX_TOKENS in core/framework/graph/edge.py
declare -A MODEL_CHOICES_MAXTOKENS=(
["anthropic:0"]=8192
["anthropic:1"]=8192
["anthropic:2"]=8192
["anthropic:3"]=8192
["openai:0"]=16384
["openai:1"]=16384
["openai:2"]=16384
["gemini:0"]=8192
["gemini:1"]=8192
["groq:0"]=8192
["groq:1"]=8192
["cerebras:0"]=8192
["cerebras:1"]=8192
)
declare -A MODEL_CHOICES_COUNT=(
["anthropic"]=4
["openai"]=3
["gemini"]=2
["groq"]=2
["cerebras"]=2
)
# Helper functions for Bash 4+ # Helper functions for Bash 4+
get_provider_name() { get_provider_name() {
echo "${PROVIDER_NAMES[$1]}" echo "${PROVIDER_NAMES[$1]}"
@@ -325,6 +384,22 @@ if [ "$USE_ASSOC_ARRAYS" = true ]; then
get_default_model() { get_default_model() {
echo "${DEFAULT_MODELS[$1]}" echo "${DEFAULT_MODELS[$1]}"
} }
get_model_choice_count() {
echo "${MODEL_CHOICES_COUNT[$1]:-0}"
}
get_model_choice_id() {
echo "${MODEL_CHOICES_ID[$1:$2]}"
}
get_model_choice_label() {
echo "${MODEL_CHOICES_LABEL[$1:$2]}"
}
get_model_choice_maxtokens() {
echo "${MODEL_CHOICES_MAXTOKENS[$1:$2]}"
}
else else
# Bash 3.2 - use parallel indexed arrays # Bash 3.2 - use parallel indexed arrays
PROVIDER_ENV_VARS=(ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY GOOGLE_API_KEY GROQ_API_KEY CEREBRAS_API_KEY MISTRAL_API_KEY TOGETHER_API_KEY DEEPSEEK_API_KEY) PROVIDER_ENV_VARS=(ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY GOOGLE_API_KEY GROQ_API_KEY CEREBRAS_API_KEY MISTRAL_API_KEY TOGETHER_API_KEY DEEPSEEK_API_KEY)
@@ -333,7 +408,7 @@ else
# Default models by provider id (parallel arrays) # Default models by provider id (parallel arrays)
MODEL_PROVIDER_IDS=(anthropic openai gemini groq cerebras mistral together_ai deepseek) MODEL_PROVIDER_IDS=(anthropic openai gemini groq cerebras mistral together_ai deepseek)
MODEL_DEFAULTS=("claude-sonnet-4-5-20250929" "gpt-4o" "gemini-3.0-flash-preview" "moonshotai/kimi-k2-instruct-0905" "zai-glm-4.7" "mistral-large-latest" "meta-llama/Llama-3.3-70B-Instruct-Turbo" "deepseek-chat") MODEL_DEFAULTS=("claude-opus-4-6" "gpt-5.2" "gemini-3-flash-preview" "moonshotai/kimi-k2-instruct-0905" "zai-glm-4.7" "mistral-large-latest" "meta-llama/Llama-3.3-70B-Instruct-Turbo" "deepseek-chat")
# Helper: get provider display name for an env var # Helper: get provider display name for an env var
get_provider_name() { get_provider_name() {
@@ -373,6 +448,82 @@ else
i=$((i + 1)) i=$((i + 1))
done done
} }
# Model choices per provider - flat parallel arrays with provider offsets
# Provider order: anthropic(4), openai(3), gemini(2), groq(2), cerebras(2)
MC_PROVIDERS=(anthropic anthropic anthropic anthropic openai openai openai gemini gemini groq groq cerebras cerebras)
MC_IDS=("claude-opus-4-6" "claude-sonnet-4-5-20250929" "claude-sonnet-4-20250514" "claude-haiku-4-5-20251001" "gpt-5.2" "gpt-5-mini" "gpt-5-nano" "gemini-3-flash-preview" "gemini-3-pro-preview" "moonshotai/kimi-k2-instruct-0905" "openai/gpt-oss-120b" "zai-glm-4.7" "qwen3-235b-a22b-instruct-2507")
MC_LABELS=("Opus 4.6 - Most capable (recommended)" "Sonnet 4.5 - Best balance" "Sonnet 4 - Fast + capable" "Haiku 4.5 - Fast + cheap" "GPT-5.2 - Most capable (recommended)" "GPT-5 Mini - Fast + cheap" "GPT-5 Nano - Fastest" "Gemini 3 Flash - Fast (recommended)" "Gemini 3 Pro - Best quality" "Kimi K2 - Best quality (recommended)" "GPT-OSS 120B - Fast reasoning" "ZAI-GLM 4.7 - Best quality (recommended)" "Qwen3 235B - Frontier reasoning")
# NOTE: 8192 should match DEFAULT_MAX_TOKENS in core/framework/graph/edge.py
MC_MAXTOKENS=(8192 8192 8192 8192 16384 16384 16384 8192 8192 8192 8192 8192 8192)
# Helper: get number of model choices for a provider
get_model_choice_count() {
local provider_id="$1"
local count=0
local i=0
while [ $i -lt ${#MC_PROVIDERS[@]} ]; do
if [ "${MC_PROVIDERS[$i]}" = "$provider_id" ]; then
count=$((count + 1))
fi
i=$((i + 1))
done
echo "$count"
}
# Helper: get model choice id by provider and index (0-based within provider)
get_model_choice_id() {
local provider_id="$1"
local idx="$2"
local count=0
local i=0
while [ $i -lt ${#MC_PROVIDERS[@]} ]; do
if [ "${MC_PROVIDERS[$i]}" = "$provider_id" ]; then
if [ $count -eq "$idx" ]; then
echo "${MC_IDS[$i]}"
return
fi
count=$((count + 1))
fi
i=$((i + 1))
done
}
# Helper: get model choice label by provider and index
get_model_choice_label() {
local provider_id="$1"
local idx="$2"
local count=0
local i=0
while [ $i -lt ${#MC_PROVIDERS[@]} ]; do
if [ "${MC_PROVIDERS[$i]}" = "$provider_id" ]; then
if [ $count -eq "$idx" ]; then
echo "${MC_LABELS[$i]}"
return
fi
count=$((count + 1))
fi
i=$((i + 1))
done
}
# Helper: get model choice max_tokens by provider and index
get_model_choice_maxtokens() {
local provider_id="$1"
local idx="$2"
local count=0
local i=0
while [ $i -lt ${#MC_PROVIDERS[@]} ]; do
if [ "${MC_PROVIDERS[$i]}" = "$provider_id" ]; then
if [ $count -eq "$idx" ]; then
echo "${MC_MAXTOKENS[$i]}"
return
fi
count=$((count + 1))
fi
i=$((i + 1))
done
}
fi fi
# Configuration directory # Configuration directory
@@ -411,12 +562,74 @@ detect_shell_rc() {
SHELL_RC_FILE=$(detect_shell_rc) SHELL_RC_FILE=$(detect_shell_rc)
SHELL_NAME=$(basename "$SHELL") SHELL_NAME=$(basename "$SHELL")
# Prompt the user to choose a model for their selected provider.
# Sets SELECTED_MODEL and SELECTED_MAX_TOKENS.
prompt_model_selection() {
local provider_id="$1"
local count
count="$(get_model_choice_count "$provider_id")"
if [ "$count" -eq 0 ]; then
# No curated choices for this provider (e.g. Mistral, DeepSeek)
SELECTED_MODEL="$(get_default_model "$provider_id")"
SELECTED_MAX_TOKENS=8192
return
fi
if [ "$count" -eq 1 ]; then
# Only one choice — auto-select
SELECTED_MODEL="$(get_model_choice_id "$provider_id" 0)"
SELECTED_MAX_TOKENS="$(get_model_choice_maxtokens "$provider_id" 0)"
return
fi
# Multiple choices — show menu
echo ""
echo -e "${BOLD}Select a model:${NC}"
echo ""
local i=0
while [ $i -lt "$count" ]; do
local label
label="$(get_model_choice_label "$provider_id" "$i")"
local mid
mid="$(get_model_choice_id "$provider_id" "$i")"
local num=$((i + 1))
echo -e " ${CYAN}$num)${NC} $label ${DIM}($mid)${NC}"
i=$((i + 1))
done
echo ""
local choice
while true; do
read -r -p "Enter choice [1]: " choice
choice="${choice:-1}"
if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -ge 1 ] && [ "$choice" -le "$count" ]; then
local idx=$((choice - 1))
SELECTED_MODEL="$(get_model_choice_id "$provider_id" "$idx")"
SELECTED_MAX_TOKENS="$(get_model_choice_maxtokens "$provider_id" "$idx")"
echo ""
echo -e "${GREEN}${NC} Model: ${DIM}$SELECTED_MODEL${NC}"
return
fi
echo -e "${RED}Invalid choice. Please enter 1-$count${NC}"
done
}
# Function to save configuration
save_configuration() {
local provider_id="$1"
local env_var="$2"
-local model
-model="$(get_default_model "$provider_id")"
+local model="$3"
+local max_tokens="$4"
# Fallbacks if not provided
if [ -z "$model" ]; then
model="$(get_default_model "$provider_id")"
fi
if [ -z "$max_tokens" ]; then
max_tokens=8192
fi
mkdir -p "$HIVE_CONFIG_DIR"
@@ -426,6 +639,7 @@ config = {
'llm': {
'provider': '$provider_id',
'model': '$model',
'max_tokens': $max_tokens,
'api_key_env_var': '$env_var'
},
'created_at': '$(date -u +"%Y-%m-%dT%H:%M:%S+00:00")'
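With `max_tokens` now persisted, a consumer of the saved configuration can read it alongside the provider and model. A minimal reader sketch, assuming the dict above is serialized as JSON to ~/.hive/configuration.json (the path echoed later in the script); the actual consuming API is not part of this diff:

```python
import json
from pathlib import Path


def load_llm_config(path: Path = Path.home() / ".hive" / "configuration.json") -> dict:
    """Illustrative reader: provider/model/max_tokens with the installer's fallbacks."""
    llm = json.loads(path.read_text()).get("llm", {})
    return {
        "provider": llm.get("provider"),
        "model": llm.get("model"),
        "max_tokens": llm.get("max_tokens", 8192),  # same fallback as save_configuration
        "api_key_env_var": llm.get("api_key_env_var"),
    }
```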
@@ -449,6 +663,8 @@ FOUND_PROVIDERS=() # Display names for UI
FOUND_ENV_VARS=() # Corresponding env var names
SELECTED_PROVIDER_ID="" # Will hold the chosen provider ID
SELECTED_ENV_VAR="" # Will hold the chosen env var
SELECTED_MODEL="" # Will hold the chosen model ID
SELECTED_MAX_TOKENS=8192 # Will hold the chosen max_tokens
if [ "$USE_ASSOC_ARRAYS" = true ]; then if [ "$USE_ASSOC_ARRAYS" = true ]; then
# Bash 4+ - iterate over associative array keys # Bash 4+ - iterate over associative array keys
@@ -486,6 +702,8 @@ if [ ${#FOUND_PROVIDERS[@]} -gt 0 ]; then
echo "" echo ""
echo -e "${GREEN}${NC} Using ${FOUND_PROVIDERS[0]}" echo -e "${GREEN}${NC} Using ${FOUND_PROVIDERS[0]}"
prompt_model_selection "$SELECTED_PROVIDER_ID"
fi
else
# Multiple providers found, let user pick one
@@ -498,28 +716,34 @@ if [ ${#FOUND_PROVIDERS[@]} -gt 0 ]; then
echo -e " ${CYAN}$i)${NC} $provider"
i=$((i + 1))
done
echo -e " ${CYAN}$i)${NC} Other"
max_choice=$i
echo "" echo ""
while true; do while true; do
read -r -p "Enter choice (1-${#FOUND_PROVIDERS[@]}): " choice read -r -p "Enter choice (1-$max_choice): " choice
if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -ge 1 ] && [ "$choice" -le "${#FOUND_PROVIDERS[@]}" ]; then if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -ge 1 ] && [ "$choice" -le "$max_choice" ]; then
if [ "$choice" -eq "$max_choice" ]; then
# Fall through to the manual provider selection below
break
fi
idx=$((choice - 1))
SELECTED_ENV_VAR="${FOUND_ENV_VARS[$idx]}"
SELECTED_PROVIDER_ID="$(get_provider_id "$SELECTED_ENV_VAR")"
echo ""
echo -e "${GREEN}${NC} Selected: ${FOUND_PROVIDERS[$idx]}"
prompt_model_selection "$SELECTED_PROVIDER_ID"
break
fi
echo -e "${RED}Invalid choice. Please enter 1-${#FOUND_PROVIDERS[@]}${NC}" echo -e "${RED}Invalid choice. Please enter 1-$max_choice${NC}"
done
fi
fi
if [ -z "$SELECTED_PROVIDER_ID" ]; then
echo "No API keys found. Let's configure one."
echo "" echo ""
prompt_choice "Select your LLM provider:" \ prompt_choice "Select your LLM provider:" \
"Anthropic (Claude) - Recommended" \ "Anthropic (Claude) - Recommended" \
"OpenAI (GPT)" \ "OpenAI (GPT)" \
@@ -595,11 +819,16 @@ if [ -z "$SELECTED_PROVIDER_ID" ]; then
fi
fi
# Prompt for model if not already selected (manual provider path)
if [ -n "$SELECTED_PROVIDER_ID" ] && [ -z "$SELECTED_MODEL" ]; then
prompt_model_selection "$SELECTED_PROVIDER_ID"
fi
# Save configuration if a provider was selected
if [ -n "$SELECTED_PROVIDER_ID" ]; then
echo ""
echo -n " Saving configuration... "
-save_configuration "$SELECTED_PROVIDER_ID" "$SELECTED_ENV_VAR" > /dev/null
+save_configuration "$SELECTED_PROVIDER_ID" "$SELECTED_ENV_VAR" "$SELECTED_MODEL" "$SELECTED_MAX_TOKENS" > /dev/null
echo -e "${GREEN}${NC}"
echo -e " ${DIM}~/.hive/configuration.json${NC}"
fi
@@ -791,7 +1020,9 @@ echo ""
# Show configured provider
if [ -n "$SELECTED_PROVIDER_ID" ]; then
-SELECTED_MODEL="$(get_default_model "$SELECTED_PROVIDER_ID")"
+if [ -z "$SELECTED_MODEL" ]; then
SELECTED_MODEL="$(get_default_model "$SELECTED_PROVIDER_ID")"
fi
echo -e "${BOLD}Default LLM:${NC}" echo -e "${BOLD}Default LLM:${NC}"
echo -e " ${CYAN}$SELECTED_PROVIDER_ID${NC}${DIM}$SELECTED_MODEL${NC}" echo -e " ${CYAN}$SELECTED_PROVIDER_ID${NC}${DIM}$SELECTED_MODEL${NC}"
echo "" echo ""
@@ -353,7 +353,18 @@ class CredentialStoreAdapter:
cls,
specs: dict[str, CredentialSpec] | None = None,
) -> CredentialStoreAdapter:
-"""Create adapter with encrypted storage primary and env var fallback."""
+"""Create adapter with encrypted storage primary and env var fallback.
When ADEN_API_KEY is set, builds the store with AdenSyncProvider and
AdenCachedStorage so that OAuth credentials (Google, HubSpot, Slack)
auto-refresh via the Aden server. Non-Aden credentials (brave_search,
anthropic, resend) still resolve from environment variables.
When ADEN_API_KEY is not set, behaves identically to before.
"""
import logging
import os
from framework.credentials import CredentialStore
from framework.credentials.storage import (
CompositeStorage,
@@ -361,6 +372,8 @@
EnvVarStorage,
)
log = logging.getLogger(__name__)
if specs is None:
from . import CREDENTIAL_SPECS
@@ -368,17 +381,69 @@
env_mapping = {name: spec.env_var for name, spec in specs.items()}
# --- Aden sync branch ---
# Note: we don't use CredentialStore.with_aden_sync() here because it
# only wraps EncryptedFileStorage. We need CompositeStorage (encrypted
# + env var fallback) so non-Aden credentials like brave_search still
# resolve from environment variables.
aden_api_key = os.environ.get("ADEN_API_KEY")
if aden_api_key:
try:
from framework.credentials.aden import (
AdenCachedStorage,
AdenClientConfig,
AdenCredentialClient,
AdenSyncProvider,
)
# Local storage: encrypted primary + env var fallback
encrypted = EncryptedFileStorage()
env = EnvVarStorage(env_mapping)
local_composite = CompositeStorage(primary=encrypted, fallbacks=[env])
# Aden components
client = AdenCredentialClient(
AdenClientConfig(
base_url=os.environ.get("ADEN_API_URL", "https://api.adenhq.com"),
)
)
provider = AdenSyncProvider(client=client)
# AdenCachedStorage wraps composite, giving Aden priority
cached_storage = AdenCachedStorage(
local_storage=local_composite,
aden_provider=provider,
cache_ttl_seconds=300,
)
store = CredentialStore(
storage=cached_storage,
providers=[provider],
auto_refresh=True,
)
# Initial sync: populate local cache from Aden
try:
synced = provider.sync_all(store)
log.info("Aden credential sync complete: %d credentials synced", synced)
except Exception as e:
log.warning("Aden initial sync failed (will retry on access): %s", e)
return cls(store=store, specs=specs)
except Exception as e:
log.warning(
"Aden credential sync unavailable, falling back to default storage: %s", e
)
# --- Default branch (no ADEN_API_KEY or Aden setup failed) ---
try:
encrypted = EncryptedFileStorage()
env = EnvVarStorage(env_mapping)
composite = CompositeStorage(primary=encrypted, fallbacks=[env])
store = CredentialStore(storage=composite)
except Exception as e:
-import logging
-logging.getLogger(__name__).warning(
-"Encrypted credential storage unavailable, falling back to env vars: %s", e
-)
+log.warning("Encrypted credential storage unavailable, falling back to env vars: %s", e)
store = CredentialStore.with_env_storage(env_mapping)
return cls(store=store, specs=specs)
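Callers do not change; which branch `default()` takes depends only on the environment. A rough usage sketch of both modes, using the same credential names the tests below rely on (the key values here are placeholders):

```python
import os
from aden_tools.credentials import CredentialStoreAdapter

# Default mode: encrypted storage primary, env var fallback.
os.environ.pop("ADEN_API_KEY", None)
os.environ["BRAVE_SEARCH_API_KEY"] = "local-key"
adapter = CredentialStoreAdapter.default()
assert adapter.get("brave_search") == "local-key"

# Aden mode: OAuth credentials sync/refresh via the Aden server; non-Aden
# credentials still resolve from env vars. Requires a reachable Aden API.
os.environ["ADEN_API_KEY"] = "real-aden-key"
adapter = CredentialStoreAdapter.default()
assert adapter.store.get_provider("aden_sync") is not None
```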
+129
View File
@@ -1,5 +1,7 @@
"""Tests for CredentialStoreAdapter.""" """Tests for CredentialStoreAdapter."""
from unittest.mock import MagicMock, patch
import pytest
from aden_tools.credentials import (
@@ -484,3 +486,130 @@ class TestSpecCompleteness:
assert spec.credential_group == "", (
f"Credential '{name}' has unexpected credential_group='{spec.credential_group}'"
)
class TestCredentialStoreAdapterAdenSync:
"""Tests for Aden sync branch in CredentialStoreAdapter.default()."""
def _patch_encrypted_storage(self, tmp_path):
"""Patch EncryptedFileStorage to use a temp directory."""
from framework.credentials.storage import EncryptedFileStorage
original_init = EncryptedFileStorage.__init__
def patched_init(self_inner, base_path=None, **kwargs):
original_init(self_inner, base_path=str(tmp_path / "creds"), **kwargs)
return patch.object(EncryptedFileStorage, "__init__", patched_init)
def test_default_with_aden_key_creates_aden_store(self, monkeypatch, tmp_path):
"""When ADEN_API_KEY is set, default() wires up AdenSyncProvider."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
mock_client = MagicMock()
mock_client.list_integrations.return_value = []
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
# Verify AdenSyncProvider is registered
provider = adapter.store.get_provider("aden_sync")
assert provider is not None
def test_default_without_aden_key_uses_env_fallback(self, monkeypatch, tmp_path):
"""When ADEN_API_KEY is not set, default() uses env-only storage."""
monkeypatch.delenv("ADEN_API_KEY", raising=False)
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "test-brave-key")
with self._patch_encrypted_storage(tmp_path):
adapter = CredentialStoreAdapter.default()
# No Aden provider should be registered
assert adapter.store.get_provider("aden_sync") is None
# Env vars still work
assert adapter.get("brave_search") == "test-brave-key"
def test_default_aden_non_aden_cred_falls_through_to_env(self, monkeypatch, tmp_path):
"""Non-Aden credentials (e.g. brave_search) resolve from env vars even with Aden."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-from-env")
mock_client = MagicMock()
mock_client.list_integrations.return_value = []
# Aden returns None for brave_search (404 → None)
mock_client.get_credential.return_value = None
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
assert adapter.get("brave_search") == "brave-from-env"
def test_default_aden_sync_failure_falls_back_gracefully(self, monkeypatch, tmp_path):
"""If Aden initial sync fails, adapter is still created and env vars work."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("ADEN_API_URL", "https://test.adenhq.com")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-fallback")
mock_client = MagicMock()
mock_client.list_integrations.side_effect = Exception("Connection refused")
mock_client.get_credential.return_value = None
with (
self._patch_encrypted_storage(tmp_path),
patch(
"framework.credentials.aden.AdenCredentialClient",
return_value=mock_client,
),
patch(
"framework.credentials.aden.AdenClientConfig",
),
):
adapter = CredentialStoreAdapter.default()
# Adapter was created despite sync failure
assert adapter is not None
assert adapter.get("brave_search") == "brave-fallback"
def test_default_aden_import_error_falls_back(self, monkeypatch, tmp_path):
"""If Aden imports fail (e.g. missing httpx), fall back to default storage."""
monkeypatch.setenv("ADEN_API_KEY", "test-aden-key")
monkeypatch.setenv("BRAVE_SEARCH_API_KEY", "brave-fallback")
import builtins
real_import = builtins.__import__
def mock_import(name, *args, **kwargs):
if name == "framework.credentials.aden":
raise ImportError(f"No module named '{name}'")
return real_import(name, *args, **kwargs)
with (
self._patch_encrypted_storage(tmp_path),
patch.object(builtins, "__import__", side_effect=mock_import),
):
adapter = CredentialStoreAdapter.default()
# Fell back to default — env vars still work, no Aden provider
assert adapter.store.get_provider("aden_sync") is None
assert adapter.get("brave_search") == "brave-fallback"