Compare commits

...

146 Commits

Author SHA1 Message Date
Richard Tang b42a3293f1 docs: change docs tune 2026-02-10 19:37:14 -08:00
Richard Tang ba02e53bdd docs: update the use cases 2026-02-10 18:15:40 -08:00
Richard Tang 40d32f2e01 docs: deployment strategies 2026-02-10 18:02:08 -08:00
Richard Tang 7779bc5336 docs: use cases for first success 2026-02-10 17:42:05 -08:00
Richard Tang a2d21ec7bc docs: update the developer profiles 2026-02-10 13:10:59 -08:00
Richard Tang 06ccc853ee docs: explain developer success as our principle 2026-02-10 13:05:54 -08:00
Richard Tang 4847332161 add placeholders 2026-02-10 13:01:22 -08:00
Richard Tang 8c1ee54725 docs: publish our developer success roadmap 2026-02-10 12:56:54 -08:00
Timothy @aden a12163d63f Merge pull request #4304 from adenhq/fix/init-config
model selection + max_tokens in quickstart
2026-02-09 20:11:55 -08:00
RichardTang-Aden 0cd6f21980 Merge pull request #4270 from TimothyZhang7/feature/hard-goal-negotiation
Feature/hard goal negotiation
2026-02-09 20:04:20 -08:00
Richard Tang a88fc1d75c fix: remove the unnecessary summary before checking capabilities and gaps 2026-02-09 19:59:49 -08:00
Richard Tang e9bde26611 fix: fixed minor issues introduced by the merge 2026-02-09 19:45:55 -08:00
Richard Tang c02f40622c Merge remote-tracking branch 'upstream/main' into feature/hard-goal-negotiation 2026-02-09 19:42:55 -08:00
Timothy @aden 3328a388b3 Merge pull request #3877 from adenhq/fix/oauth-refresh
(micro-fix): update oauth to refresh token
2026-02-09 19:30:49 -08:00
Richard Tang 8f632eb005 feat: add communication style guideline 2026-02-09 19:28:48 -08:00
Richard Tang c8ee961436 fix: update the step label to avoid confusion 2026-02-09 19:04:05 -08:00
Richard Tang bc9f6b0af8 feat: update goal negotiation for a more conversational negotiation 2026-02-09 18:52:07 -08:00
bryan 7d48f17867 model selection + max_tokens in quickstart 2026-02-09 18:07:57 -08:00
RichardTang-Aden 736ae65a1d Merge pull request #4262 from adenhq/feat/build-from-sample
Build from Sample Agent
2026-02-09 16:05:42 -08:00
Bryan @ Aden 76c9f7c9a9 Merge pull request #1834 from fermano/feat/observability-trace-context
feat(observability): structured logging for trace context
2026-02-09 15:25:51 -08:00
Fernando Mano 32ad225d7f feat(observability): Adding OTel-compliant logging to L3 tool logs as introduced by #3715. -- remove redundant text from readme.md 2026-02-09 19:56:17 -03:00
bryan 7ae6f67470 updates to skills, renaming, suggested agents, remove changelog 2026-02-09 13:49:36 -08:00
Timothy @aden 594bceb8f5 Merge branch 'adenhq:main' into feature/hard-goal-negotiation 2026-02-09 12:28:19 -08:00
bryan 9dc0f48ec9 implemented building from sample agent template and updated deep research agent 2026-02-09 12:13:41 -08:00
Fernando Mano ce5a2d4a81 feat(observability): Adding OTel-compliant logging to L3 tool logs as introduced by #3715. -- remove line that would cause third-party loggers to log twice 2026-02-09 09:36:25 -03:00
Fernando Mano 7f489cee46 Merge branch 'main' into feat/observability-trace-context 2026-02-09 09:25:51 -03:00
Anjali Yadav 3c2d669a2f fix(credentials): correctly resolve integration_id in AdenCredentialResponse.from_dict (#3965)
* fix(credentials): respect integration_id in AdenCredentialResponse.from_dict

* style: fix forward reference annotation for Ruff
2026-02-09 17:52:55 +08:00
Timothy @aden ec36e96499 Merge pull request #4146 from TimothyZhang7/main
docs(release): release v0.4.2 - resumable sessions
2026-02-08 20:49:59 -08:00
Timothy 9ecd4980e4 chore: release v0.4.2 - resumable sessions
- Add comprehensive resumable session functionality
- Immediate pause with Ctrl+Z and /pause command
- Auto-save state on quit
- Session management with /resume and /sessions commands
- Full memory and conversation history restoration
- See CHANGELOG.md for complete list of changes
2026-02-08 20:44:36 -08:00
Timothy @aden 64446ff9b6 Merge pull request #4141 from TimothyZhang7/feature/resumable-sessions
Feature/resumable sessions

Release candidate for v0.4.2
2026-02-08 20:40:33 -08:00
Timothy e3d2262292 fix: quit timeout, and tui interactions 2026-02-08 20:30:30 -08:00
Timothy 891cfa387a Merge branch 'main' into feature/resumable-sessions 2026-02-08 19:46:30 -08:00
Timothy f0243fddf2 feat: session resumable states and checkpoint system 2026-02-08 19:42:02 -08:00
Bryan @ Aden 85ff8e364b Merge pull request #3828 from Sandeepa-git/docs/fix-contributing-typo
docs(contributing): fix formatting typo in issue link
2026-02-08 19:07:48 -08:00
Bryan @ Aden 75f1afe8e3 Merge pull request #3857 from Manudeserti/docs/add-deep-research-readme
docs: add missing README for Deep Research Agent
2026-02-08 19:07:40 -08:00
Bryan @ Aden 7b660311e5 Merge pull request #4025 from hamzanajam7/docs/fix-getting-started-project-structure
docs(getting-started): fix project structure tree for tools and mcp_server location
2026-02-08 18:44:24 -08:00
Bryan @ Aden 98a493296d Merge pull request #4026 from hamzanajam7/docs/add-contributing-link-readme
docs(readme): add Contributing link to Quick Links section
2026-02-08 18:43:23 -08:00
RichardTang-Aden bc2a42aed2 Merge pull request #3901 from Templar121/docs/clarify-hive-test-generation
docs: clarify test generation responsibility in hive skill
2026-02-08 14:22:31 -08:00
Gaurav kapur 8b501d9091 fix: write node outputs to memory before edge evaluation (#3599) (#3694)
* fix: write node outputs to memory before edge evaluation (#3599)

* test: add regression tests for conditional edge direct key access
2026-02-08 23:23:37 +08:00
Fernando Mano 0304b392b2 feat(observability): Adding OTel-compliant logging to L3 tool logs as introduced by #3715. 2026-02-07 19:52:03 -03:00
hamzanajam7 ae9b4e82fe docs(readme): add Contributing link to Quick Links section
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 14:52:50 -05:00
hamzanajam7 4bac5e4c46 docs(getting-started): fix project structure tree for tools and mcp_server location
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 14:49:04 -05:00
Fernando Mano c4d3400ec4 Merge main into feat/observability-trace-context; resolve execution_stream conflicts 2026-02-07 16:49:04 -03:00
Amit Kumar 6d0a3b952a feat(tools): add Apollo.io contact and company data enrichment integration (#3167)
Add Apollo.io MCP tool integration for B2B contact and company data
enrichment. Implements 4 MCP tools:
- apollo_enrich_person: Enrich contact by email, LinkedIn URL, or name+domain
- apollo_enrich_company: Enrich company by domain
- apollo_search_people: Search contacts with filters (titles, seniorities, etc.)
- apollo_search_companies: Search companies with filters (industries, size, etc.)

Features:
- Authentication via X-Api-Key header (APOLLO_API_KEY env var)
- Credential spec in dedicated apollo.py (follows repo pattern)
- Comprehensive error handling (401, 403, 404, 422, 429)
- Full test coverage (36 tests)

Closes #3061
2026-02-07 21:57:13 +08:00
Subhayan Mukherjee 873fcd5822 docs: clarify test generation responsibility in hive skill 2026-02-07 11:39:52 +05:30
RichardTang-Aden 2a98d3a489 Merge pull request #3890 from RichardTang-Aden/update-readme-gifs
docs(readme): quick fix for the doc links
2026-02-06 20:34:34 -08:00
Richard Tang b681ba03b1 chore: quick fix for the doc links 2026-02-06 20:32:20 -08:00
RichardTang-Aden fe775a36c0 Merge pull request #3887 from RichardTang-Aden/update-readme-gifs
feat: add video in the README
2026-02-06 20:21:48 -08:00
Timothy @aden 2df9adcb43 Merge pull request #3886 from TimothyZhang7/fix/quickstart-secret-key
fix(micro-fix): quickstart secret key setup
2026-02-06 20:21:06 -08:00
Richard Tang c756cbf6d5 feat: add video in the README 2026-02-06 20:20:53 -08:00
Timothy d0ac67c9d3 fix: quickstart secret key setup 2026-02-06 20:18:12 -08:00
Timothy 47cd55052f feat: hive-create needs to do some hard negotiation 2026-02-06 19:56:05 -08:00
bryan fb203b5bdf update oauth to refresh token 2026-02-06 19:43:30 -08:00
RichardTang-Aden 6ee47e243d Merge pull request #3876 from RichardTang-Aden/update-readme
Docs Update readme
2026-02-06 19:39:16 -08:00
Richard Tang c1844b7a9d docs: improve readme 2026-02-06 19:30:16 -08:00
Richard Tang 99a29e79e5 fix: fix the documentation python run to uv run 2026-02-06 19:22:16 -08:00
Richard Tang 589a66ef26 docs: remove unused docs 2026-02-06 19:19:49 -08:00
RichardTang-Aden 3f960763cb Merge pull request #3875 from RichardTang-Aden/update-readme
Update readme images
2026-02-06 19:08:46 -08:00
Richard Tang 15f8f3783c chore: update images 2026-02-06 19:07:47 -08:00
Richard Tang a2b045c7e3 chore: remove unnecessary links 2026-02-06 18:18:50 -08:00
Richard Tang 055cef2fdc feat: improve quickstart.sh messages 2026-02-06 18:15:13 -08:00
Timothy @aden 6c6c69cbc3 Merge pull request #3872 from TimothyZhang7/refactor/consolidate-multi-level-log-for-tui
docs(path): Align Agent Storage Path to .hive/agents/{agent_name}/
2026-02-06 17:40:39 -08:00
Timothy 6fe0062e6e refactor(path): consolidate tui runner log path 2026-02-06 17:33:32 -08:00
Richard Tang 26b8b2f448 chore: move unused docs 2026-02-06 17:11:13 -08:00
Timothy @aden 7e40d6950a Merge pull request #3871 from TimothyZhang7/main
fix(micro-fix): uv paths in templates
2026-02-06 17:07:19 -08:00
Timothy 590bfa92cb chore: fix mcp server default config 2026-02-06 17:04:03 -08:00
Timothy f0e89a1720 fix: mcp server config with uv 2026-02-06 17:01:42 -08:00
Timothy @aden 575563b1e8 Merge pull request #3870 from adenhq/feat/multi-level-logging
fix: hardening hive cli setup
2026-02-06 16:37:37 -08:00
Timothy 82ea0e47ce fix: hardening hive cli setup 2026-02-06 16:31:31 -08:00
RichardTang-Aden 2f57ca10f7 Merge pull request #3862 from adenhq/feat/hive-tui
(micro-fix): documentation update
2026-02-06 16:19:46 -08:00
RichardTang-Aden 75c2d541c4 Merge branch 'main' into feat/hive-tui 2026-02-06 16:19:30 -08:00
Richard Tang b666f8b50b docs: minor doc update 2026-02-06 16:16:56 -08:00
RichardTang-Aden 09f9322676 Merge pull request #3863 from RichardTang-Aden/fix-remove-old-mock-mode
Fix remove old mock mode
2026-02-06 16:02:01 -08:00
Richard Tang f9a864ef93 fix: remove mock mode in the template 2026-02-06 15:59:48 -08:00
Richard Tang 27f28afe9c fix: remove --mock in the codebase + documentation 2026-02-06 15:59:22 -08:00
Timothy @aden 8f85722fef Merge pull request #3715 from adenhq/feat/multi-level-logging
Feat/multi level logging
2026-02-06 15:59:16 -08:00
bryan 5588445a01 documentation update 2026-02-06 15:59:01 -08:00
Timothy 40529b5722 fix: debugger to instruct on hive tui 2026-02-06 15:56:13 -08:00
Timothy @aden cee632f50c Merge pull request #3855 from adenhq/feat/hive-tui
update tui to support menu, highlight/copy, update quickstart
2026-02-06 15:24:10 -08:00
bryan 3453e3aa05 Merge branch 'feat/hive-tui' into feat/multi-level-logging 2026-02-06 15:21:52 -08:00
Timothy 8de637c421 fix: deprecated tests 2026-02-06 14:00:31 -08:00
Timothy 6c75de862c fix: skip outdated tests 2026-02-06 13:46:12 -08:00
Timothy 2971134882 docs: runtime logging structure 2026-02-06 13:26:53 -08:00
Timothy 6e79860b43 feat: hive debugger skill 2026-02-06 13:22:25 -08:00
Manudeserti 3f6bdda2a0 docs: add missing README for deep_research_agent 2026-02-06 18:11:00 -03:00
bryan 74d0287ec5 update tui to support menu, highlight/copy, update quickstart to include hive tui 2026-02-06 13:10:04 -08:00
RichardTang-Aden 51e81d80fc Merge pull request #3853 from adenhq/docs-key-concepts
Docs key concepts
2026-02-06 12:45:16 -08:00
Richard Tang cd014e41e4 docs: update links in the README.md 2026-02-06 12:44:34 -08:00
Richard Tang 830f11c47d docs: add key concept section 2026-02-06 12:41:22 -08:00
Timothy a73239dd98 feat: runtime log tools 2026-02-06 12:37:18 -08:00
Timothy d68783a612 refactor: unify storage layer for agent runtime 2026-02-06 12:20:46 -08:00
Timothy a28ea40a7d fix: execution log details in error trace 2026-02-06 11:03:19 -08:00
Sandeepa f2492bd4d4 docs(contributing): fix formatting typo in issue link 2026-02-07 00:22:48 +05:30
Timothy @aden b22be7a6cb Merge pull request #3818 from TimothyZhang7/main
(micro-fix)(skills): cursor skill symlinks to claude skill
2026-02-06 09:32:23 -08:00
bryan 5b00445c05 Merge branch 'main' into feat/multi-level-logging 2026-02-05 19:09:18 -08:00
Timothy @aden 5179677e8f Merge pull request #3744 from adenhq/chore/update-hive-credential
(micro-fix): update hive-credentials
2026-02-05 18:55:19 -08:00
bryan 2c25b2eae7 Merge branch 'main' into chore/update-hive-credential 2026-02-05 18:45:11 -08:00
RichardTang-Aden f6705fe2d3 Merge pull request #3746 from RichardTang-Aden/integration-ci
(micro-fix)(chore): fix format
2026-02-05 18:36:32 -08:00
Richard Tang c2771fed20 chore: fix format 2026-02-05 18:30:50 -08:00
RichardTang-Aden fc781eccd9 Merge pull request #3745 from RichardTang-Aden/integration-ci
(micro-fix)(chore): fix lint
2026-02-05 18:15:38 -08:00
bryan d5a25ae081 update hive-credentials 2026-02-05 18:13:25 -08:00
Richard Tang 23b6fb6391 chore: fix lint 2026-02-05 18:12:47 -08:00
Timothy 433967f0cf fix: cursor skill symlinks to claude skill 2026-02-05 18:11:24 -08:00
RichardTang-Aden 2a876c2a10 Merge pull request #3743 from RichardTang-Aden/integration-ci
feat(ci): add integration credential specs and CI validation
2026-02-05 18:06:22 -08:00
Richard Tang ff0adeaba7 docs: update outdated skill references 2026-02-05 18:00:06 -08:00
Richard Tang 846edbf256 docs: update documentation structure 2026-02-05 18:00:04 -08:00
Richard Tang c68dd48f6d feat: add slack credential spec and contribution doc 2026-02-05 17:39:44 -08:00
bryan 8b828dd139 Merge branch 'main' into feat/multi-level-logging 2026-02-05 17:19:17 -08:00
Richard Tang 50c0a5da9e feat: integration credentials implementation check 2026-02-05 17:06:34 -08:00
Timothy @aden 2f0e5c42f1 Merge pull request #3724 from TimothyZhang7/main
docs(hive): hive commands rebrand
2026-02-05 15:06:25 -08:00
Timothy @aden 903288468a Merge pull request #3725 from adenhq/chore/gmail-to-google
(micro-fix): changing gmail to google
2026-02-05 14:54:18 -08:00
bryan 9e3bba6f59 updated tests 2026-02-05 14:52:19 -08:00
bryan bc16f0752f changing gmail to google 2026-02-05 14:46:38 -08:00
Timothy 86badd70fa docs(hive): hive commands rebrand 2026-02-05 14:35:50 -08:00
Timothy @aden ce5379516c Merge pull request #3722 from TimothyZhang7/main
docs(templates): put example templates in there
2026-02-05 14:31:50 -08:00
Timothy a50078bbf2 chore: moves the templates 2026-02-05 14:25:49 -08:00
Timothy 2cef168442 fix: aden hive url 2026-02-05 14:08:18 -08:00
Timothy @aden 0a1a9e3545 Merge pull request #3720 from TimothyZhang7/feature/example-agent-registry
docs(skills): Rename skills to hive-* namespace and improve create workflow
2026-02-05 13:59:45 -08:00
Timothy 3c8682d80c fix: mention of skill in readme 2026-02-05 13:59:02 -08:00
Timothy ecc5a1608f fix: make sure of the skill ordering 2026-02-05 13:54:20 -08:00
RichardTang-Aden bc81b55600 Merge pull request #3713 from adenhq/update/gmail-send-tool
(micro-fix): created gmail send tool
2026-02-05 13:15:08 -08:00
Timothy 28b628c1b4 fix: update skill names and examples 2026-02-05 13:13:19 -08:00
Timothy 148264ac73 fix: skill problems 2026-02-05 13:11:18 -08:00
bryan 4046e4e379 created gmail send tool 2026-02-05 13:10:47 -08:00
Timothy 28298d9af2 fix: streamline the executor configuration and data tool usage 2026-02-05 12:50:00 -08:00
Fernando Mano 9d156325e0 Merge branch 'main' into feat/observability-trace-context 2026-02-05 17:06:07 -03:00
bryan 221712128d bug fix for crashing agent 2026-02-05 11:59:57 -08:00
bryan e9fc36f2d3 Merge branch 'main' into feat/multi-level-logging 2026-02-05 09:10:56 -08:00
bryan 305b880b1d including missing tool log inputs 2026-02-05 09:08:42 -08:00
Anshumaan Saraf 34782a6b85 docs(CONTRIBUTING): add upstream sync steps (#3477)
Fixes #2692

Added steps to configure the upstream remote and sync the main branch
before creating a feature branch. This helps contributors avoid starting
from stale code and reduces merge conflicts.
2026-02-05 16:28:07 +08:00
Patrick d25d94e71b docs(aden-credential-sync): typo (#3601) 2026-02-05 16:11:13 +08:00
Timothy @aden 51f1b449cd Merge pull request #3584 from TimothyZhang7/main
fix: gap between lint and format
2026-02-04 21:05:22 -08:00
Timothy 804e47dde4 fix: gap between lint and format 2026-02-04 21:02:50 -08:00
Timothy @aden 582c810d15 Merge pull request #3583 from TimothyZhang7/main
fix: test case
2026-02-04 20:59:58 -08:00
Timothy cede629718 fix: test case 2026-02-04 20:53:53 -08:00
bryan 7519c73f2a Merge branch 'main' into feat/multi-level-logging 2026-02-04 19:34:01 -08:00
bryan bf402aaa18 initial multi-level logging 2026-02-04 17:26:58 -08:00
Fernando Mano 4310852ee6 chore: Merge branch 'main' into feat/observability-trace-context 2026-01-30 15:09:54 -03:00
Fernando Mano 853f1e9873 chore: Merge remote-tracking branch 'refs/remotes/origin/feat/observability-trace-context' into feat/observability-trace-context 2026-01-28 16:52:38 -03:00
Fernando Mano ae5fe84fb2 feat(observability): Structured logging with automatic trace context propagation -- fix ruff formatting errors 2026-01-28 15:04:06 -03:00
Fernando Mano 92b538d5ae Merge branch 'adenhq:main' into feat/observability-trace-context 2026-01-28 14:52:37 -03:00
Fernando Mano 5351703949 feat(observability): Structured logging with automatic trace context propagation -- fix lint error 2026-01-28 14:52:02 -03:00
Fernando Mano 7ba8169444 feat(observability): Structured logging with automatic trace context propagation -- remove colored logs for some cases when in prod mode 2026-01-28 12:46:54 -03:00
Fernando Mano d090c954ae feat(observability): Structured logging with automatic trace context propagation -- adjust all logs to print full uuids when in prod mode and include documentation 2026-01-28 12:31:11 -03:00
Fernando Mano 9bee1666f1 chore: Merge branch 'main' into feat/observability-trace-context 2026-01-28 11:35:13 -03:00
Fernando Mano fb94637339 feat(observability): Structured logging with automatic trace context propagation 2026-01-28 11:27:24 -03:00
186 changed files with 20521 additions and 2591 deletions
@@ -1,415 +0,0 @@
---
name: building-agents-construction
description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
license: Apache-2.0
metadata:
  author: hive
  version: "2.0"
  type: procedural
  part_of: building-agents
requires: building-agents-core
---
# Agent Construction - EXECUTE THESE STEPS
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.**
When this skill is loaded, IMMEDIATELY begin executing Step 1. Do not explain what you will do - just do it.
---
## STEP 1: Initialize Build Environment
**EXECUTE THESE TOOL CALLS NOW:**
1. Register the hive-tools MCP server:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="python",
args='["mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
```
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
```
mcp__agent-builder__create_session(name="AGENT_NAME")
```
3. Discover available tools:
```
mcp__agent-builder__list_mcp_tools()
```
4. Create the package directory:
```
mkdir -p exports/AGENT_NAME/nodes
```
**AFTER completing these calls**, tell the user:
> ✅ Build environment initialized
>
> - Session created
> - Available tools: [list the tools from step 3]
>
> Proceeding to define the agent goal...
**THEN immediately proceed to STEP 2.**
---
## STEP 2: Define and Approve Goal
**PROPOSE a goal to the user.** Based on what they asked for, propose:
- Goal ID (kebab-case)
- Goal name
- Goal description
- 3-5 success criteria (each with: id, description, metric, target, weight)
- 2-4 constraints (each with: id, description, constraint_type, category)
**FORMAT your proposal as a clear summary, then ask for approval:**
> **Proposed Goal: [Name]**
>
> [Description]
>
> **Success Criteria:**
>
> 1. [criterion 1]
> 2. [criterion 2]
> ...
>
> **Constraints:**
>
> 1. [constraint 1]
> 2. [constraint 2]
> ...
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve this goal definition?",
"header": "Goal",
"options": [
{"label": "Approve", "description": "Goal looks good, proceed"},
{"label": "Modify", "description": "I want to change something"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 3
- If **Modify**: Ask what they want to change, update proposal, ask again
---
## STEP 3: Design Node Workflow
**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist.
**DESIGN the workflow** as a series of nodes. For each node, determine:
- node_id (kebab-case)
- name
- description
- node_type: `"event_loop"` (recommended for all LLM work) or `"function"` (deterministic, no LLM)
- input_keys (what data this node receives)
- output_keys (what data this node produces)
- tools (ONLY tools that exist - empty list if no tools needed)
- system_prompt (should mention `set_output` for producing structured outputs)
- client_facing: True if this node interacts with the user
- nullable_output_keys (for mutually exclusive outputs)
- max_node_visits (>1 if this node is a feedback loop target)
**PRESENT the workflow to the user:**
> **Proposed Workflow: [N] nodes**
>
> 1. **[node-id]** - [description]
>
> - Type: event_loop [client-facing] / function
> - Input: [keys]
> - Output: [keys]
> - Tools: [tools or "none"]
>
> 2. **[node-id]** - [description]
> ...
>
> **Flow:** node1 → node2 → node3 → ...
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve this workflow design?",
"header": "Workflow",
"options": [
{"label": "Approve", "description": "Workflow looks good, proceed to build nodes"},
{"label": "Modify", "description": "I want to change the workflow"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Proceed to STEP 4
- If **Modify**: Ask what they want to change, update design, ask again
---
## STEP 4: Build Nodes One by One
**FOR EACH node in the approved workflow:**
1. **Call** `mcp__agent-builder__add_node(...)` with the node details
- input_keys and output_keys must be JSON strings: `'["key1", "key2"]'`
- tools must be a JSON string: `'["tool1"]'` or `'[]'`
2. **Call** `mcp__agent-builder__test_node(...)` to validate:
```
mcp__agent-builder__test_node(
node_id="the-node-id",
test_input='{"key": "test value"}',
mock_llm_response='{"output_key": "test output"}'
)
```
3. **Check result:**
- If valid: Tell user "✅ Node [id] validated" and continue to next node
- If invalid: Show errors, fix the node, re-validate
4. **Show progress** after each node:
```
mcp__agent-builder__get_session_status()
```
> ✅ Node [X] of [Y] complete: [node-id]
**AFTER all nodes are added and validated**, proceed to STEP 5.
---
## STEP 5: Connect Edges
**DETERMINE the edges** based on the workflow flow. For each connection:
- edge_id (kebab-case)
- source (node that outputs)
- target (node that receives)
- condition: `"on_success"`, `"always"`, `"on_failure"`, or `"conditional"`
- condition_expr (Python expression using `output.get(...)`, only if conditional)
- priority (positive = forward edge evaluated first, negative = feedback edge)
**FOR EACH edge, call:**
```
mcp__agent-builder__add_edge(
edge_id="source-to-target",
source="source-node-id",
target="target-node-id",
condition="on_success",
condition_expr="",
priority=1
)
```
**AFTER all edges are added, validate the graph:**
```
mcp__agent-builder__validate_graph()
```
- If valid: Tell user "✅ Graph structure validated" and proceed to STEP 6
- If invalid: Show errors, fix edges, re-validate
---
## STEP 6: Generate Agent Package
**EXPORT the graph data:**
```
mcp__agent-builder__export_graph()
```
This returns JSON with all the goal, nodes, edges, and MCP server configurations.
**THEN write the Python package files** using the exported data. Create these files in `exports/AGENT_NAME/`:
1. `config.py` - Runtime configuration with model settings
2. `nodes/__init__.py` - All NodeSpec definitions
3. `agent.py` - Goal, edges, graph config, and agent class
4. `__init__.py` - Package exports
5. `__main__.py` - CLI interface
6. `mcp_servers.json` - MCP server configurations
7. `README.md` - Usage documentation
**IMPORTANT entry_points format:**
- MUST be: `{"start": "first-node-id"}`
- NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
- NOT: `{"first-node-id"}` (WRONG - this is a set)
**Use the example agent** at `.claude/skills/building-agents-construction/examples/deep_research_agent/` as a template for file structure and patterns. It demonstrates: STEP 1/STEP 2 prompts, client-facing nodes, feedback loops, nullable_output_keys, and data tools.
**AFTER writing all files, tell the user:**
> ✅ Agent package created: `exports/AGENT_NAME/`
>
> **Files generated:**
>
> - `__init__.py` - Package exports
> - `agent.py` - Goal, nodes, edges, agent class
> - `config.py` - Runtime configuration
> - `__main__.py` - CLI interface
> - `nodes/__init__.py` - Node definitions
> - `mcp_servers.json` - MCP server config
> - `README.md` - Usage documentation
>
> **Test your agent:**
>
> ```bash
> cd /home/timothy/oss/hive
> PYTHONPATH=exports uv run python -m AGENT_NAME validate
> PYTHONPATH=exports uv run python -m AGENT_NAME info
> ```
---
## STEP 7: Verify and Test
**RUN validation:**
```bash
cd /home/timothy/oss/hive && PYTHONPATH=exports uv run python -m AGENT_NAME validate
```
- If valid: Agent is complete!
- If errors: Fix the issues and re-run
**SHOW final session summary:**
```
mcp__agent-builder__get_session_status()
```
**TELL the user the agent is ready** and suggest next steps:
- Run with mock mode to test without API calls
- Use `/testing-agent` skill for comprehensive testing
- Use `/setup-credentials` if the agent needs API keys
---
## REFERENCE: Node Types
| Type | tools param | Use when |
|------|-------------|----------|
| `event_loop` | `'["tool1"]'` or `'[]'` | LLM-powered work with or without tools |
| `function` | N/A | Deterministic Python operations, no LLM |
---
## REFERENCE: NodeSpec New Fields
| Field | Default | Description |
|-------|---------|-------------|
| `client_facing` | `False` | Streams output to user, blocks for input between turns |
| `nullable_output_keys` | `[]` | Output keys that may remain unset (mutually exclusive outputs) |
| `max_node_visits` | `1` | Max executions per run. Set >1 for feedback loop targets. 0=unlimited |
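As an illustration, a client-facing review node with mutually exclusive outputs might set these fields as below. This is a hedged sketch (the node details are invented for the example); the call shape matches `add_node` as used in the construction steps:

```
mcp__agent-builder__add_node(
    node_id="review",
    name="Review",
    description="Present findings for approval",
    node_type="event_loop",
    input_keys='["findings"]',
    output_keys='["approved_findings", "feedback"]',
    tools='[]',
    system_prompt="...",
    client_facing=True,
    nullable_output_keys='["approved_findings", "feedback"]',
    max_node_visits=1
)
```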
---
## REFERENCE: Edge Conditions & Priority
| Condition | When edge is followed |
|-----------|--------------------------------------|
| `on_success` | Source node completed successfully |
| `on_failure` | Source node failed |
| `always` | Always, regardless of success/failure |
| `conditional` | When condition_expr evaluates to True |
**Priority:** Positive = forward edge (evaluated first). Negative = feedback edge (loops back to earlier node). Multiple ON_SUCCESS edges from same source = parallel execution (fan-out).
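For example, a feedback edge that re-runs an earlier node when the user asks for changes could look like this (IDs and the expression are illustrative; the call shape is the `add_edge` from STEP 5):

```
mcp__agent-builder__add_edge(
    edge_id="review-to-research",
    source="review",
    target="research",
    condition="conditional",
    condition_expr="output.get('feedback') is not None",
    priority=-1
)
```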
---
## REFERENCE: System Prompt Best Practice
For **internal** event_loop nodes (not client-facing), instruct the LLM to use `set_output`:
```
Use set_output(key, value) to store your results. For example:
- set_output("search_results", <your results as a JSON string>)
Do NOT return raw JSON. Use the set_output tool to produce outputs.
```
For **client-facing** event_loop nodes, use the STEP 1/STEP 2 pattern:
```
**STEP 1 — Respond to the user (text only, NO tool calls):**
[Present information, ask questions, etc.]
**STEP 2 — After the user responds, call set_output:**
- set_output("key", "value based on user's response")
```
This prevents the LLM from calling `set_output` before the user has had a chance to respond. The "NO tool calls" instruction in STEP 1 ensures the node blocks for user input before proceeding.
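A minimal sketch of a client-facing prompt following this pattern (the wording is illustrative):
```
**STEP 1 — Respond to the user (text only, NO tool calls):**
Present the findings and ask whether they approve or want changes.
**STEP 2 — After the user responds, call set_output:**
- If approved: set_output("approved_findings", <the findings>)
- If changes requested: set_output("feedback", <their requested changes>)
```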
---
## EventLoopNode Runtime
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. Both direct `GraphExecutor` and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically. No manual `node_registry` setup is needed.
```python
# Direct execution
from pathlib import Path

from framework.graph.executor import GraphExecutor
from framework.runtime.core import Runtime

storage_path = Path.home() / ".hive" / "my_agent"
storage_path.mkdir(parents=True, exist_ok=True)
runtime = Runtime(storage_path)

executor = GraphExecutor(
    runtime=runtime,
    llm=llm,
    tools=tools,
    tool_executor=tool_executor,
    storage_path=storage_path,
)
result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
```
**DO NOT pass `runtime=None` to `GraphExecutor`** — it will crash with `'NoneType' object has no attribute 'start_run'`.
---
## COMMON MISTAKES TO AVOID
1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
3. **Skipping validation** - Always validate nodes and graph before proceeding
4. **Not waiting for approval** - Always ask user before major steps
5. **Displaying this file** - Execute the steps, don't show documentation
6. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
7. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
8. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
9. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
@@ -1,46 +0,0 @@
"""Runtime configuration."""

import json
from dataclasses import dataclass, field
from pathlib import Path


def _load_preferred_model() -> str:
    """Load preferred model from ~/.hive/configuration.json."""
    config_path = Path.home() / ".hive" / "configuration.json"
    if config_path.exists():
        try:
            with open(config_path) as f:
                config = json.load(f)
            llm = config.get("llm", {})
            if llm.get("provider") and llm.get("model"):
                return f"{llm['provider']}/{llm['model']}"
        except Exception:
            pass
    return "anthropic/claude-sonnet-4-20250514"


@dataclass
class RuntimeConfig:
    model: str = field(default_factory=_load_preferred_model)
    temperature: float = 0.7
    max_tokens: int = 8192
    api_key: str | None = None
    api_base: str | None = None


default_config = RuntimeConfig()


@dataclass
class AgentMetadata:
    name: str = "Deep Research Agent"
    version: str = "1.0.0"
    description: str = (
        "Interactive research agent that rigorously investigates topics through "
        "multi-source search, quality evaluation, and synthesis - with TUI conversation "
        "at key checkpoints for user guidance and feedback."
    )


metadata = AgentMetadata()
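For reference, `_load_preferred_model` above reads an `llm` object with `provider` and `model` keys from `~/.hive/configuration.json`. A file shaped like this (values illustrative) resolves to the model string `anthropic/claude-sonnet-4-20250514`:

```
{"llm": {"provider": "anthropic", "model": "claude-sonnet-4-20250514"}}
```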
@@ -1,12 +1,12 @@
---
-name: building-agents-core
+name: hive-concepts
description: Core concepts for goal-driven agents - architecture, node types (event_loop, function), tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals.
license: Apache-2.0
metadata:
  author: hive
  version: "2.0"
  type: foundational
-  part_of: building-agents
+  part_of: hive
---
# Building Agents - Core Concepts
@@ -251,6 +251,7 @@ The judge controls when a node's loop exits:
Controls loop behavior:
- `max_iterations` (default 50) — prevents infinite loops
- `max_tool_calls_per_turn` (default 10) — limits tool calls per LLM response
+- `tool_call_overflow_margin` (default 0.5) — wiggle room before discarding extra tool calls (a margin of 0.5 means a hard cutoff at 150% of the limit)
- `stall_detection_threshold` (default 3) — detects repeated identical responses
- `max_history_tokens` (default 32000) — triggers conversation compaction
@@ -258,9 +259,12 @@ Controls loop behavior:
When tool results exceed the context window, the framework automatically saves them to a spillover directory and truncates with a hint. Nodes that produce or consume large data should include the data tools:
-- `save_data(filename, data, data_dir)` — Write data to a file in the data directory
-- `load_data(filename, data_dir, offset=0, limit=50)` — Read data with line-based pagination
-- `list_data_files(data_dir)` — List available data files
+- `save_data(filename, data)` — Write data to a file in the data directory
+- `load_data(filename, offset=0, limit=50)` — Read data with line-based pagination
+- `list_data_files()` — List available data files
+- `serve_file_to_user(filename, label="")` — Get a clickable file:// URI for the user
+Note: `data_dir` is a framework-injected context parameter — the LLM never sees or passes it. `GraphExecutor.execute()` sets it per-execution via `contextvars`, so data tools and spillover always share the same session-scoped directory.
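As a rough illustration of the contextvars pattern that note describes (all names here are hypothetical, not the framework's actual internals):

```python
import contextvars
from pathlib import Path

# Hypothetical sketch: the executor sets the directory once per execution;
# tools read it from the context variable instead of an LLM-visible argument.
_data_dir_var: contextvars.ContextVar[Path] = contextvars.ContextVar("data_dir")

def save_data(filename: str, data: str) -> str:
    data_dir = _data_dir_var.get()  # injected by the executor, never passed by the LLM
    data_dir.mkdir(parents=True, exist_ok=True)
    (data_dir / filename).write_text(data)
    return f"saved {filename}"

# Executor side, once per execution:
_data_dir_var.set(Path.home() / ".hive" / "sessions" / "run-123" / "data")
```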
These are real MCP tools (not synthetic). Add them to nodes that handle large tool results:
@@ -346,15 +350,15 @@ Before writing a node with `tools=[...]`:
## When to Use This Skill
-Use building-agents-core when:
+Use hive-concepts when:
- Starting a new agent project and need to understand fundamentals
- Need to understand agent architecture before building
- Want to validate tool availability before proceeding
- Learning about node types, edges, and graph execution
**Next Steps:**
-- Ready to build? → Use `building-agents-construction` skill
-- Need patterns and examples? → Use `building-agents-patterns` skill
+- Ready to build? → Use `hive-create` skill
+- Need patterns and examples? → Use `hive-patterns` skill
## MCP Tools for Validation
@@ -389,7 +393,7 @@ mcp__agent-builder__configure_loop(
## Related Skills
-- **building-agents-construction** - Step-by-step building process
-- **building-agents-patterns** - Best practices: judges, feedback edges, fan-out, context management
-- **agent-workflow** - Complete workflow orchestrator
-- **testing-agent** - Test and validate completed agents
+- **hive-create** - Step-by-step building process
+- **hive-patterns** - Best practices: judges, feedback edges, fan-out, context management
+- **hive** - Complete workflow orchestrator
+- **hive-test** - Test and validate completed agents
@@ -0,0 +1,980 @@
---
name: hive-create
description: Step-by-step guide for building goal-driven agents. Qualifies use cases first (the good, bad, and ugly), then creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
license: Apache-2.0
metadata:
  author: hive
  version: "2.2"
  type: procedural
  part_of: hive
requires: hive-concepts
---
# Agent Construction - EXECUTE THESE STEPS
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.**
**CRITICAL: DO NOT explore the codebase, read source files, or search for code before starting.** All context you need is in this skill file. When this skill is loaded, IMMEDIATELY begin executing Step 0 — determine the build path as your FIRST action. Do not explain what you will do, do not investigate the project structure, do not read any files — just execute Step 0 now.
---
## STEP 0: Choose Build Path
**If the user has already indicated whether they want to build from scratch or from a template, skip this question and proceed to the appropriate step.**
Otherwise, ask:
```
AskUserQuestion(questions=[{
"question": "How would you like to build your agent?",
"header": "Build Path",
"options": [
{"label": "From scratch", "description": "Design goal, nodes, and graph collaboratively from nothing"},
{"label": "From a template", "description": "Start from a working sample agent and customize it"}
],
"multiSelect": false
}])
```
- If **From scratch**: Proceed to STEP 1A
- If **From a template**: Proceed to STEP 1B
---
## STEP 1A: Initialize Build Environment (From Scratch)
**EXECUTE THESE TOOL CALLS NOW** (silent setup — no user interaction needed):
1. Check for existing sessions:
```
mcp__agent-builder__list_sessions()
```
- If a session with this agent name already exists, load it with `mcp__agent-builder__load_session_by_id(session_id="...")` and skip to step 3.
- If no matching session exists, proceed to step 2.
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
```
mcp__agent-builder__create_session(name="AGENT_NAME")
```
3. Register the hive-tools MCP server:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="uv",
args='["run", "python", "mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
```
4. Discover available tools:
```
mcp__agent-builder__list_mcp_tools()
```
5. Create the package directory:
```bash
mkdir -p exports/AGENT_NAME/nodes
```
**Save the tool list for STEP 4** — you will need it for node design.
**THEN immediately proceed to STEP 2** (do NOT display setup results to the user — just move on).
---
## STEP 1B: Initialize Build Environment (From Template)
**EXECUTE THESE STEPS NOW:**
### 1B.1: Discover available templates
List the template directories and read each template's `agent.json` to get its name and description:
```bash
ls examples/templates/
```
For each directory found, read `examples/templates/TEMPLATE_DIR/agent.json` with the Read tool and extract:
- `agent.name` — the template's display name
- `agent.description` — what the template does
### 1B.2: Present templates to user
Show the user a table of available templates:
> **Available Templates:**
>
> | # | Template | Description |
> |---|----------|-------------|
> | 1 | [name from agent.json] | [description from agent.json] |
> | 2 | ... | ... |
Then ask the user to pick a template and provide a name for their new agent:
```
AskUserQuestion(questions=[{
"question": "Which template would you like to start from?",
"header": "Template",
"options": [
{"label": "[template 1 name]", "description": "[template 1 description]"},
{"label": "[template 2 name]", "description": "[template 2 description]"},
...
],
"multiSelect": false
}, {
"question": "What should the new agent be named? (snake_case)",
"header": "Agent Name",
"options": [
{"label": "Use template name", "description": "Keep the original template name as-is"},
{"label": "Custom name", "description": "I'll provide a new snake_case name"}
],
"multiSelect": false
}])
```
### 1B.3: Copy template to exports
```bash
cp -r examples/templates/TEMPLATE_DIR exports/NEW_AGENT_NAME
```
### 1B.4: Create session and register MCP (same logic as STEP 1A)
First, check for existing sessions:
```
mcp__agent-builder__list_sessions()
```
- If a session with this agent name already exists, load it with `mcp__agent-builder__load_session_by_id(session_id="...")` and skip to `list_mcp_tools`.
- If no matching session exists, create one:
```
mcp__agent-builder__create_session(name="NEW_AGENT_NAME")
```
Then register MCP and discover tools:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="uv",
args='["run", "python", "mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
```
```
mcp__agent-builder__list_mcp_tools()
```
### 1B.5: Load template into builder session
Import the entire agent definition in one call:
```
mcp__agent-builder__import_from_export(agent_json_path="exports/NEW_AGENT_NAME/agent.json")
```
This reads the agent.json and populates the builder session with the goal, all nodes, and all edges.
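The file's rough shape mirrors what `export_graph` produces; an illustrative outline (not a schema) might be:

```
{
  "agent": {"name": "...", "description": "..."},
  "goal": {...},
  "nodes": [...],
  "edges": [...],
  "mcp_servers": [...]
}
```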
**THEN immediately proceed to STEP 2.**
---
## STEP 2: Define Goal Together with User
**A responsible engineer doesn't jump into building. First, understand the problem and be transparent about what the framework can and cannot do.**
**If starting from a template**, the goal is already loaded in the builder session. Present the existing goal to the user using the format below and ask for approval. Skip the collaborative drafting questions — go straight to presenting and asking "Do you approve this goal, or would you like to modify it?"
**If the user has NOT already described what they want to build**, start by asking what kind of agent they have in mind:
```
AskUserQuestion(questions=[{
"question": "What kind of agent do you want to build? Select an option below, or choose 'Other' to describe your own.",
"header": "Agent type",
"options": [
{"label": "Data collection", "description": "Gathers information from the web, analyzes it, and produces a report or sends outreach (e.g. market research, news digest, email campaigns, competitive analysis)"},
{"label": "Workflow automation", "description": "Automates a multi-step business process end-to-end (e.g. lead qualification, content publishing pipeline, data entry)"},
{"label": "Personal assistant", "description": "Handles recurring tasks or monitors for events and acts on them (e.g. daily briefings, meeting prep, file organization)"}
],
"multiSelect": false
}])
```
Use the user's selection (or their custom description if they chose "Other") as context when shaping the goal below. If the user already described what they want before this step, skip the question and proceed directly.
**DO NOT propose a complete goal on your own.** Instead, collaborate with the user to define it.
### 2a: Fast Discovery (3-8 Turns)
**The core principle**: Discovery should feel like progress, not paperwork. The stakeholder should walk away feeling like you understood them faster than anyone else would have.
**Communication style**: Be concise. Say less. Mean more. Impatient stakeholders don't want a wall of text — they want to know you get it. Every sentence you say should either move the conversation forward or prove you understood something. If it does neither, cut it.
**Ask Question Rules: Respect Their Time.** Every question must earn its place by:
1. **Preventing a costly wrong turn** — you're about to build the wrong thing
2. **Unlocking a shortcut** — their answer lets you simplify the design
3. **Surfacing a dealbreaker** — there's a constraint that changes everything
4. **Provide options** - Offer options with your questions when possible, but always allow the user to type something beyond the options.
If a question doesn't do one of these, don't ask it. Make an assumption, state it, and move on.
---
#### 2a.1: Let Them Talk, But Listen Like an Architect
When the stakeholder describes what they want, don't just hear the words — listen for the architecture underneath. While they talk, mentally construct:
- **The actors**: Who are the people/systems involved?
- **The trigger**: What kicks off the workflow?
- **The core loop**: What's the main thing that happens repeatedly?
- **The output**: What's the valuable thing produced at the end?
- **The pain**: What about today's situation is broken, slow, or missing?
You are extracting a **domain model** from natural language in real time. Most stakeholders won't give you this structure explicitly — they'll give you a story. Your job is to hear the structure inside the story.
| They say... | You're hearing... |
|-------------|-------------------|
| Nouns they repeat | Your entities |
| Verbs they emphasize | Your core operations |
| Frustrations they mention | Your design constraints |
| Workarounds they describe | What the system must replace |
| People they name | Your user types |
---
#### 2a.2: Use Domain Knowledge to Fill In the Blanks
You have broad knowledge of how systems work. Use it aggressively.
If they say "I need a research agent," you already know it probably involves: search, summarization, source tracking, and iteration. Don't ask about each — use them as your starting mental model and let their specifics override your defaults.
If they say "I need to monitor files and alert me," you know this probably involves: watch patterns, triggers, notifications, and state tracking.
**The key move**: Take your general knowledge of the domain and merge it with the specifics they've given you. The result is a draft understanding that's 60-80% right before you've asked a single question. Your questions close the remaining 20-40%.
---
#### 2a.3: Play Back a Proposed Model (Not a List of Questions)
After listening, present a **concrete picture** of what you think they need. Make it specific enough that they can spot what's wrong.
**Pattern: "Here's what I heard — tell me where I'm off"**
> "OK here's how I'm picturing this: [User type] needs to [core action]. Right now they're [current painful workflow]. What you want is [proposed solution that replaces the pain].
>
> The way I'd structure this: [key entities] connected by [key relationships], with the main flow being [trigger → steps → outcome].
>
> For the MVP, I'd focus on [the one thing that delivers the most value] and hold off on [things that can wait].
>
> Before I start — [1-2 specific questions you genuinely can't infer]."
Why this works:
- **Proves you were listening** — they don't feel like they have to repeat themselves
- **Shows competence** — you're already thinking in systems
- **Fast to correct** — "no, it's more like X" takes 10 seconds vs. answering 15 questions
- **Creates momentum** — heading toward building, not more talking
---
#### 2a.4: Ask Only What You Cannot Infer
Your questions should be **narrow, specific, and consequential**. Never ask what you could answer yourself.
**Good questions** (high-stakes, can't infer):
- "Who's the primary user — you or your end customers?"
- "Is this replacing a spreadsheet, or is there literally nothing today?"
- "Does this need to integrate with anything, or standalone?"
- "Is there existing data to migrate, or starting fresh?"
**Bad questions** (low-stakes, inferable):
- "What should happen if there's an error?" *(handle gracefully, obviously)*
- "Should it have search?" *(if there's a list, yes)*
- "How should we handle permissions?" *(follow standard patterns)*
- "What tools should I use?" *(your call, not theirs)*
---
#### Conversation Flow (3-5 Turns)
| Turn | Who | What |
|------|-----|------|
| 1 | User | Describes what they need |
| 2 | Agent | Plays back understanding as a proposed model. Asks 1-2 critical questions max. |
| 3 | User | Corrects, confirms, or adds detail |
| 4 | Agent | Adjusts model, confirms MVP scope, states assumptions, declares starting point |
| *(5)* | | *(Only if Turn 3 revealed something that fundamentally changes the approach)* |
**AFTER the conversation, IMMEDIATELY proceed to 2b. DO NOT skip to building.**
---
#### Anti-Patterns
| Don't | Do Instead |
|-------|------------|
| Open with a list of questions | Open with what you understood from their request |
| "What are your requirements?" | "Here's what I think you need — am I right?" |
| Ask about every edge case | Handle with smart defaults, flag in summary |
| 10+ turn discovery conversation | 3-8 turns. Start building, iterate with real software. |
| Being lazy and not understanding what the user wants to achieve | Understand the "what" and the "why" |
| Ask for permission to start | State your plan and start |
| Wait for certainty | Start at 80% confidence, iterate the rest |
| Ask what tech/tools to use | That's your job. Decide, disclose, move on. |
---
### 2b: Capability Assessment
**After the user responds, analyze the fit.** Present this assessment honestly:
> **Framework Fit Assessment**
>
> Based on what you've described, here's my honest assessment of how well this framework fits your use case:
>
> **What Works Well (The Good):**
> - [List 2-4 things the framework handles well for this use case]
> - Examples: multi-turn conversations, human-in-the-loop review, tool orchestration, structured outputs
>
> **Limitations to Be Aware Of (The Bad):**
> - [List 2-3 limitations that apply but are workable]
> - Examples: LLM latency means not suitable for sub-second responses, context window limits for very large documents, cost per run for heavy tool usage
>
> **Potential Deal-Breakers (The Ugly):**
> - [List any significant challenges or missing capabilities — be honest]
> - Examples: no tool available for X, would require custom MCP server, framework not designed for Y
**Be specific.** Reference the actual tools discovered in Step 1. If the user needs `send_email` but it's not available, say so. If they need real-time streaming from a database, explain that's not how the framework works.
### 2c: Gap Analysis
**Identify specific gaps** between what the user wants and what you can deliver:
| Requirement | Framework Support | Gap/Workaround |
|-------------|-------------------|----------------|
| [User need] | [✅ Supported / ⚠️ Partial / ❌ Not supported] | [How to handle or why it's a problem] |
**Examples of gaps to identify:**
- Missing tools (user needs X, but only Y and Z are available)
- Scope issues (user wants to process 10,000 items, but LLM rate limits apply)
- Interaction mismatches (user wants CLI-only, but agent is designed for TUI)
- Data flow issues (user needs to persist state across runs, but sessions are isolated)
- Latency requirements (user needs instant responses, but LLM calls take seconds)
### 2d: Recommendation
**Give a clear recommendation:**
> **My Recommendation:**
>
> [One of these three:]
>
> **✅ PROCEED** — This is a good fit. The framework handles your core needs well. [List any minor caveats.]
>
> **⚠️ PROCEED WITH SCOPE ADJUSTMENT** — This can work, but we should adjust: [specific changes]. Without these adjustments, you'll hit [specific problems].
>
> **🛑 RECONSIDER** — This framework may not be the right tool for this job because [specific reasons]. Consider instead: [alternatives — simpler script, different framework, custom solution].
### 2e: Get Explicit Acknowledgment
**CALL AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Based on this assessment, how would you like to proceed?",
"header": "Proceed",
"options": [
{"label": "Proceed as described", "description": "I understand the limitations, let's build it"},
{"label": "Adjust scope", "description": "Let's modify the requirements to fit better"},
{"label": "More questions", "description": "I have questions about the assessment"},
{"label": "Reconsider", "description": "Maybe this isn't the right approach"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Proceed**: Move to STEP 3
- If **Adjust scope**: Discuss what to change, update your notes, re-assess if needed
- If **More questions**: Answer them honestly, then ask again
- If **Reconsider**: Discuss alternatives. If they decide to proceed anyway, that's their informed choice
---
## STEP 3: Define Goal Together with User
**Now that the use case is qualified, collaborate on the goal definition.**
**START by synthesizing what you learned:**
> Based on our discussion, here's my understanding of the goal:
>
> **Core purpose:** [what you understood from 2a]
> **Success looks like:** [what you inferred]
> **Key constraints:** [what you inferred]
>
> Let me refine this with you:
>
> 1. **What should this agent accomplish?** (confirm or correct my understanding)
> 2. **How will we know it succeeded?** (what specific outcomes matter)
> 3. **Are there any hard constraints?** (things it must never do, quality bars)
**WAIT for the user to respond.** Use their input (and the agent type they selected) to draft:
- Goal ID (kebab-case)
- Goal name
- Goal description
- 3-5 success criteria (each with: id, description, metric, target, weight)
- 2-4 constraints (each with: id, description, constraint_type, category)
**PRESENT the draft goal for approval:**
> **Proposed Goal: [Name]**
>
> [Description]
>
> **Success Criteria:**
>
> 1. [criterion 1]
> 2. [criterion 2]
> ...
>
> **Constraints:**
>
> 1. [constraint 1]
> 2. [constraint 2]
> ...
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve this goal definition?",
"header": "Goal",
"options": [
{"label": "Approve", "description": "Goal looks good, proceed to workflow design"},
{"label": "Modify", "description": "I want to change something"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 4
- If **Modify**: Ask what they want to change, update the draft, ask again
---
## STEP 4: Design Conceptual Nodes
**If starting from a template**, the nodes are already loaded in the builder session. Present the existing nodes using the table format below and ask for approval. Skip the design phase.
**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist.
**DESIGN the workflow** as a series of nodes. For each node, determine:
- node_id (kebab-case)
- name
- description
- node_type: `"event_loop"` (recommended for all LLM work) or `"function"` (deterministic, no LLM)
- input_keys (what data this node receives)
- output_keys (what data this node produces)
- tools (ONLY tools that exist from Step 1 — empty list if no tools needed)
- client_facing: True if this node interacts with the user
- nullable_output_keys (for mutually exclusive outputs or feedback-only inputs)
- max_node_visits (>1 if this node is a feedback loop target)
**Prefer fewer, richer nodes** (4 nodes > 8 thin nodes). Each node boundary requires serializing outputs. A research node that searches, fetches, and analyzes keeps all source material in its conversation history.
**PRESENT the nodes to the user for review:**
> **Proposed Nodes ([N] total):**
>
> | # | Node ID | Type | Description | Tools | Client-Facing |
> | --- | ---------- | ---------- | ----------------------------- | ---------------------- | :-----------: |
> | 1 | `intake` | event_loop | Gather requirements from user | — | Yes |
> | 2 | `research` | event_loop | Search and analyze sources | web_search, web_scrape | No |
> | 3 | `review` | event_loop | Present findings for approval | — | Yes |
> | 4 | `report` | event_loop | Generate final report | save_data | No |
>
> **Data Flow:**
>
> - `intake` produces: `research_brief`
> - `research` receives: `research_brief` → produces: `findings`, `sources`
> - `review` receives: `findings`, `sources` → produces: `approved_findings` or `feedback`
> - `report` receives: `approved_findings` → produces: `final_report`
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve these nodes?",
"header": "Nodes",
"options": [
{"label": "Approve", "description": "Nodes look good, proceed to graph design"},
{"label": "Modify", "description": "I want to change the nodes"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Proceed to STEP 5
- If **Modify**: Ask what they want to change, update design, ask again
---
## STEP 5: Design Full Graph and Review
**If starting from a template**, the edges are already loaded in the builder session. Render the existing graph as ASCII art and present it to the user for approval. Skip the edge design phase.
**DETERMINE the edges** connecting the approved nodes. For each edge:
- edge_id (kebab-case)
- source → target
- condition: `on_success`, `on_failure`, `always`, or `conditional`
- condition_expr (Python expression, only if conditional)
- priority (positive = forward, negative = feedback/loop-back)
**RENDER the complete graph as ASCII art.** Make it large and clear — the user needs to see and understand the full workflow at a glance.
**IMPORTANT: Make the ASCII art BIG and READABLE.** Use a box-and-arrow style with generous spacing. Do NOT make it tiny or compressed. Example format:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ AGENT: Research Agent                                                       │
│                                                                             │
│ Goal: Thoroughly research technical topics and produce verified reports     │
└─────────────────────────────────────────────────────────────────────────────┘

               ┌───────────────────────┐
               │        INTAKE         │
               │    (client-facing)    │
               │                       │
               │ in:  topic            │
               │ out: research_brief   │
               └───────────┬───────────┘
                           │ on_success
               ┌───────────────────────┐
               │       RESEARCH        │
               │                       │
               │ tools: web_search,    │
               │        web_scrape     │
               │                       │
               │ in:  research_brief   │
               │      [feedback]       │
               │ out: findings,        │
               │      sources          │
               └───────────┬───────────┘
                           │ on_success
               ┌───────────────────────┐
               │        REVIEW         │
               │    (client-facing)    │
               │                       │
               │ in:  findings,        │
               │      sources          │
               │ out: approved_findings│
               │      OR feedback      │
               └───────┬───────┬───────┘
                       │       │
              approved │       │ feedback (priority: -1)
                       │       │
                       ▼       └──────────────────┐
               ┌───────────────────────┐          │
               │        REPORT         │          │
               │                       │          │
               │ tools: save_data      │          │
               │                       │          │
               │ in:  approved_        │          │
               │      findings         │          │
               │ out: final_report     │          │
               └───────────────────────┘          │
                       ┌──────────────────────────┘
                       │  loops back to RESEARCH
                       ▼  (max_node_visits: 3)

EDGES:
──────
1. intake   → research  [on_success, priority: 1]
2. research → review    [on_success, priority: 1]
3. review   → report    [conditional: approved_findings is not None, priority: 1]
4. review   → research  [conditional: feedback is not None, priority: -1]
```
**PRESENT the graph and edges to the user:**
> Here is the complete workflow graph:
>
> [ASCII art above]
>
> **Edge Summary:**
>
> | # | Edge | Condition | Priority |
> | --- | ----------------- | -------------------------------------------- | -------- |
> | 1 | intake → research | on_success | 1 |
> | 2 | research → review | on_success | 1 |
> | 3 | review → report | conditional: `approved_findings is not None` | 1 |
> | 4 | review → research | conditional: `feedback is not None` | -1 |
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve this workflow graph?",
"header": "Graph",
"options": [
{"label": "Approve", "description": "Graph looks good, proceed to build the agent"},
{"label": "Modify", "description": "I want to change the graph"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Proceed to STEP 6
- If **Modify**: Ask what they want to change, update the graph, re-render, ask again
---
## STEP 6: Build the Agent
**NOW — and only now — write the actual code.** The user has approved the goal, nodes, and graph.
### 6a: Register nodes and edges with MCP
**If starting from a template**, the copied files will be overwritten with the approved design. You MUST replace every occurrence of the old template name with the new agent name. Here is the complete checklist — miss NONE of these:
| File | What to rename |
|------|---------------|
| `config.py` | `AgentMetadata.name` — the display name shown in TUI agent selection |
| `config.py` | `AgentMetadata.description` — agent description |
| `agent.py` | Module docstring (line 1) |
| `agent.py` | `class OldNameAgent:` → `class NewNameAgent:` |
| `agent.py` | `GraphSpec(id="old-name-graph")` → `GraphSpec(id="new-name-graph")` — shown in TUI status bar |
| `agent.py` | Storage path: `Path.home() / ".hive" / "agents" / "old_name"` → `"new_name"` |
| `__main__.py` | Module docstring (line 1) |
| `__main__.py` | `from .agent import ... OldNameAgent` → `NewNameAgent` |
| `__main__.py` | CLI help string in `def cli()` docstring |
| `__main__.py` | All `OldNameAgent()` instantiations |
| `__main__.py` | Storage path (duplicated from agent.py) |
| `__main__.py` | Shell banner string (e.g. `"=== Old Name Agent ==="`) |
| `__init__.py` | Package docstring |
| `__init__.py` | `from .agent import OldNameAgent` import |
| `__init__.py` | `__all__` list entry |
**If starting from a template and no modifications were made in Steps 2-5**, the nodes and edges are already registered. Skip to validation (`mcp__agent-builder__validate_graph()`). If modifications were made, re-register the changed nodes/edges (the MCP tools handle duplicates by overwriting).
**FOR EACH approved node**, call:
```
mcp__agent-builder__add_node(
node_id="...",
name="...",
description="...",
node_type="event_loop",
input_keys='["key1", "key2"]',
output_keys='["key1"]',
tools='["tool1"]',
system_prompt="...",
client_facing=True/False,
nullable_output_keys='["key"]',
max_node_visits=1
)
```
**FOR EACH approved edge**, call:
```
mcp__agent-builder__add_edge(
edge_id="source-to-target",
source="source-node-id",
target="target-node-id",
condition="on_success",
condition_expr="",
priority=1
)
```
**VALIDATE the graph:**
```
mcp__agent-builder__validate_graph()
```
- If invalid: Fix the issues and re-validate
- If valid: Continue to 6b
### 6b: Write Python package files
**EXPORT the graph data:**
```
mcp__agent-builder__export_graph()
```
**THEN write the Python package files** using the exported data. Create these files in `exports/AGENT_NAME/`:
1. `config.py` - Runtime configuration with model settings
2. `nodes/__init__.py` - All NodeSpec definitions
3. `agent.py` - Goal, edges, graph config, and agent class
4. `__init__.py` - Package exports
5. `__main__.py` - CLI interface
6. `mcp_servers.json` - MCP server configurations
7. `README.md` - Usage documentation
**IMPORTANT entry_points format:**
- MUST be: `{"start": "first-node-id"}`
- NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
- NOT: `{"first-node-id"}` (WRONG - this is a set)
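For orientation, a sketch of how the correct shape might sit in the generated `agent.py`. Whether `entry_points` is passed on the graph spec is an assumption; only the `{"start": ...}` shape above is normative:
```python
# Sketch only: entry_points placement on GraphSpec is assumed, not confirmed.
graph = GraphSpec(
    id="research-agent-graph",
    entry_points={"start": "intake"},  # the literal key "start" maps to the first node id
)
```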
**IMPORTANT mcp_servers.json format:**
```json
{
"hive-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../tools",
"description": "Hive tools MCP server"
}
}
```
- NO `"mcpServers"` wrapper (that's Claude Desktop format, NOT hive format)
- `cwd` MUST be `"../../tools"` (relative from `exports/AGENT_NAME/` to `tools/`)
- `command` MUST be `"uv"` with `"args": ["run", "python", ...]` (NOT bare `"python"` which fails on Mac)
**Use the example agent** at `.claude/skills/hive-create/examples/deep_research_agent/` as a template for file structure and patterns. It demonstrates: STEP 1/STEP 2 prompts, client-facing nodes, feedback loops, nullable_output_keys, and data tools.
**AFTER writing all files, tell the user:**
> Agent package created: `exports/AGENT_NAME/`
>
> **Files generated:**
>
> - `__init__.py` - Package exports
> - `agent.py` - Goal, nodes, edges, agent class
> - `config.py` - Runtime configuration
> - `__main__.py` - CLI interface
> - `nodes/__init__.py` - Node definitions
> - `mcp_servers.json` - MCP server config
> - `README.md` - Usage documentation
---
## STEP 7: Verify and Test
**RUN validation:**
```bash
cd /path/to/hive && PYTHONPATH=exports uv run python -m AGENT_NAME validate
```
- If valid: Agent is complete!
- If errors: Fix the issues and re-run
**TELL the user the agent is ready** and display the next steps box:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ✅ AGENT BUILD COMPLETE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ NEXT STEPS: │
│ │
│ 1. SET UP CREDENTIALS (if agent uses tools like web_search, send_email): │
│ │
│ /hive-credentials --agent AGENT_NAME │
│ │
│ 2. RUN YOUR AGENT: │
│ │
│ hive tui │
│ │
│ Then select your agent from the list and press Enter. │
│ │
│ 3. DEBUG ANY ISSUES: │
│ │
│ /hive-debugger │
│ │
│ The debugger monitors runtime logs, identifies retry loops, │
│ tool failures, and missing outputs, and provides fix recommendations. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## REFERENCE: Node Types
| Type | tools param | Use when |
| ------------ | ----------------------- | --------------------------------------- |
| `event_loop` | `'["tool1"]'` or `'[]'` | LLM-powered work with or without tools |
| `function` | N/A | Deterministic Python operations, no LLM |
---
## REFERENCE: NodeSpec Fields
| Field | Default | Description |
| ---------------------- | ------- | --------------------------------------------------------------------- |
| `client_facing` | `False` | Streams output to user, blocks for input between turns |
| `nullable_output_keys` | `[]` | Output keys that may remain unset (mutually exclusive outputs) |
| `max_node_visits` | `1` | Max executions per run. Set >1 for feedback loop targets. 0=unlimited |
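Putting the three fields together, here is a sketch of a review node that sits at the end of a feedback loop (field names are the documented ones; the id, values, and prompt are illustrative):

```python
review_node = NodeSpec(
    id="review",
    name="Review Findings",
    description="Present findings for approval",
    node_type="event_loop",
    client_facing=True,  # streams to the user and blocks for their input
    input_keys=["findings", "sources"],
    output_keys=["approved_findings", "feedback"],
    nullable_output_keys=["approved_findings", "feedback"],  # exactly one is set per pass
    max_node_visits=3,  # revisited on each feedback-loop iteration
    system_prompt="...",  # use the STEP 1/STEP 2 pattern described below
)
```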
---
## REFERENCE: Edge Conditions & Priority
| Condition | When edge is followed |
| ------------- | ------------------------------------- |
| `on_success` | Source node completed successfully |
| `on_failure` | Source node failed |
| `always` | Always, regardless of success/failure |
| `conditional` | When condition_expr evaluates to True |
**Priority:** Positive = forward edge (evaluated first). Negative = feedback edge (loops back to an earlier node). Multiple `on_success` edges from the same source = parallel execution (fan-out).
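For instance, the review loop from the sample graph is two `add_edge` calls: one forward edge and one negative-priority feedback edge, with the condition expressions from the edge table above:
```
mcp__agent-builder__add_edge(
    edge_id="review-to-report",
    source="review",
    target="report",
    condition="conditional",
    condition_expr="approved_findings is not None",
    priority=1
)
mcp__agent-builder__add_edge(
    edge_id="review-to-research",
    source="review",
    target="research",
    condition="conditional",
    condition_expr="feedback is not None",
    priority=-1
)
```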
---
## REFERENCE: System Prompt Best Practice
For **internal** event_loop nodes (not client-facing), instruct the LLM to use `set_output`:
```
Use set_output(key, value) to store your results. For example:
- set_output("search_results", <your results as a JSON string>)
Do NOT return raw JSON. Use the set_output tool to produce outputs.
```
For **client-facing** event_loop nodes, use the STEP 1/STEP 2 pattern:
```
**STEP 1 — Respond to the user (text only, NO tool calls):**
[Present information, ask questions, etc.]
**STEP 2 — After the user responds, call set_output:**
- set_output("key", "value based on user's response")
```
This prevents the LLM from calling `set_output` before the user has had a chance to respond. The "NO tool calls" instruction in STEP 1 ensures the node blocks for user input before proceeding.
---
## EventLoopNode Runtime
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. Both direct `GraphExecutor` and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically. No manual `node_registry` setup is needed.
```python
# Direct execution
from pathlib import Path

from framework.graph.executor import GraphExecutor
from framework.runtime.core import Runtime

storage_path = Path.home() / ".hive" / "agents" / "my_agent"
storage_path.mkdir(parents=True, exist_ok=True)
runtime = Runtime(storage_path)  # never None — see the warning below

# llm, tools, and tool_executor are assumed to be configured elsewhere
executor = GraphExecutor(
    runtime=runtime,
    llm=llm,
    tools=tools,
    tool_executor=tool_executor,
    storage_path=storage_path,
)
result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
```
**DO NOT pass `runtime=None` to `GraphExecutor`** — it will crash with `'NoneType' object has no attribute 'start_run'`.
---
## REFERENCE: Framework Capabilities for Qualification
Use this reference during STEP 2 to give accurate, honest assessments.
### What the Framework Does Well (The Good)
| Capability | Description |
|------------|-------------|
| Multi-turn conversations | Client-facing nodes stream to users and block for input |
| Human-in-the-loop review | Approval checkpoints with feedback loops back to earlier nodes |
| Tool orchestration | LLM can call multiple tools, framework handles execution |
| Structured outputs | `set_output` produces validated, typed outputs |
| Parallel execution | Fan-out/fan-in for concurrent node execution |
| Context management | Automatic compaction and spillover for large data |
| Error recovery | Retry logic, judges, and feedback edges for self-correction |
| Session persistence | State saved to disk, resumable sessions |
### Framework Limitations (The Bad)
| Limitation | Impact | Workaround |
|------------|--------|------------|
| LLM latency | 2-10+ seconds per turn | Not suitable for real-time/low-latency needs |
| Context window limits | ~128K tokens max | Use data tools for spillover, design for chunking |
| Cost per run | LLM API calls cost money | Budget planning, caching where possible |
| Rate limits | API throttling on heavy usage | Backoff, queue management |
| Node boundaries lose context | Outputs must be serialized | Prefer fewer, richer nodes |
| Single-threaded within node | One LLM call at a time per node | Use fan-out for parallelism |
### Not Designed For (The Ugly)
| Use Case | Why It's Problematic | Alternative |
|----------|---------------------|-------------|
| Long-running daemons | Framework is request-response, not persistent | External scheduler + agent |
| Sub-second responses | LLM latency is inherent | Traditional code, no LLM |
| Processing millions of items | Context windows and rate limits | Batch processing + sampling |
| Real-time streaming data | No built-in pub/sub or streaming input | Custom MCP server + agent |
| Guaranteed determinism | LLM outputs vary | Function nodes for deterministic parts |
| Offline/air-gapped | Requires LLM API access | Local models (not currently supported) |
| Multi-user concurrency | Single-user session model | Separate agent instances per user |
### Tool Availability Reality Check
**Before promising any capability, check `list_mcp_tools()`.** Common gaps:
- **Email**: May not have `send_email` — check before promising email automation
- **Calendar**: May not have calendar APIs — check before promising scheduling
- **Database**: May not have SQL tools — check before promising data queries
- **File system**: Has data tools but not arbitrary filesystem access
- **External APIs**: Depends entirely on what MCP servers are registered
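In practice the reality check is a single call made before any promises:
```
mcp__agent-builder__list_mcp_tools()
```
Scan the returned list for the exact tool names the use case needs (for example `send_email` or `web_search`) and qualify the design against what is actually there.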
---
## COMMON MISTAKES TO AVOID
1. **Skipping use case qualification** - A responsible engineer qualifies the use case BEFORE building. Be transparent about what works, what doesn't, and what's problematic
2. **Hiding limitations** - Don't oversell the framework. If a tool doesn't exist or a capability is missing, say so upfront
3. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
4. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
5. **Skipping validation** - Always validate nodes and graph before proceeding
6. **Not waiting for approval** - Always ask user before major steps
7. **Displaying this file** - Execute the steps, don't show documentation
8. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
9. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
10. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
11. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
12. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
13. **Wrong mcp_servers.json format** - Use flat format (no `"mcpServers"` wrapper), `cwd` must be `"../../tools"`, and `command` must be `"uv"` with args `["run", "python", ...]`
@@ -70,7 +70,9 @@ def tui(mock, verbose, debug):
try:
from framework.tui.app import AdenTUI
except ImportError:
click.echo("TUI requires the 'textual' package. Install with: pip install textual")
click.echo(
"TUI requires the 'textual' package. Install with: pip install textual"
)
sys.exit(1)
from pathlib import Path
@@ -88,6 +90,9 @@ def tui(mock, verbose, debug):
agent._event_bus = EventBus()
agent._tool_registry = ToolRegistry()
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
@@ -104,9 +109,6 @@ def tui(mock, verbose, debug):
tool_executor = agent._tool_registry.get_executor()
graph = agent._build_graph()
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
runtime = create_agent_runtime(
graph=graph,
goal=agent.goal,
@@ -216,7 +218,9 @@ async def _interactive_shell(verbose=False):
if "references" in output:
click.echo("--- References ---\n")
for ref in output.get("references", []):
click.echo(f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}")
click.echo(
f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}"
)
click.echo("\n")
else:
click.echo(f"\nResearch failed: {result.error}\n")
@@ -227,6 +231,7 @@ async def _interactive_shell(verbose=False):
except Exception as e:
click.echo(f"Error: {e}", err=True)
import traceback
traceback.print_exc()
finally:
await agent.stop()
@@ -166,13 +166,18 @@ class DeepResearchAgent:
edges=self.edges,
default_model=self.config.model,
max_tokens=self.config.max_tokens,
loop_config={
"max_iterations": 100,
"max_tool_calls_per_turn": 20,
"max_history_tokens": 32000,
},
)
def _setup(self, mock_mode=False) -> GraphExecutor:
"""Set up the executor with all components."""
from pathlib import Path
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
self._event_bus = EventBus()
@@ -203,6 +208,7 @@ class DeepResearchAgent:
tool_executor=tool_executor,
event_bus=self._event_bus,
storage_path=storage_path,
loop_config=self._graph.loop_config,
)
return self._executor
@@ -0,0 +1,21 @@
"""Runtime configuration."""
from dataclasses import dataclass
from framework.config import RuntimeConfig
default_config = RuntimeConfig()
@dataclass
class AgentMetadata:
name: str = "Deep Research Agent"
version: str = "1.0.0"
description: str = (
"Interactive research agent that rigorously investigates topics through "
"multi-source search, quality evaluation, and synthesis - with TUI conversation "
"at key checkpoints for user guidance and feedback."
)
metadata = AgentMetadata()
@@ -1,8 +1,8 @@
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../tools",
"description": "Hive tools MCP server providing web_search, web_scrape, and write_to_file"
}
@@ -102,41 +102,56 @@ Should we proceed to writing the final report?
)
# Node 4: Report (client-facing)
# Writes the final report and presents it to the user.
# Writes an HTML report, serves the link to the user, and answers follow-ups.
report_node = NodeSpec(
id="report",
name="Write & Deliver Report",
description="Write a cited report from the findings and present it to the user",
description="Write a cited HTML report from the findings and present it to the user",
node_type="event_loop",
client_facing=True,
input_keys=["findings", "sources", "research_brief"],
output_keys=["delivery_status"],
system_prompt="""\
Write a comprehensive research report and present it to the user.
Write a comprehensive research report as an HTML file and present it to the user.
**STEP 1 — Write and present the report (text only, NO tool calls):**
**STEP 1 — Write the HTML report (tool calls, NO text to user yet):**
Report structure:
1. **Executive Summary** (2-3 paragraphs)
2. **Findings** (organized by theme, with [n] citations)
3. **Analysis** (synthesis, implications, areas of debate)
4. **Conclusion** (key takeaways, confidence assessment)
5. **References** (numbered list of sources cited)
1. Compose a complete, self-contained HTML document with embedded CSS styling.
Use a clean, readable design: max-width container, pleasant typography,
numbered citation links, a table of contents, and a references section.
Requirements:
- Every factual claim must cite its source with [n] notation
- Be objective — present multiple viewpoints where sources disagree
- Distinguish well-supported conclusions from speculation
- Answer the original research questions from the brief
Report structure inside the HTML:
- Title & date
- Executive Summary (2-3 paragraphs)
- Table of Contents
- Findings (organized by theme, with [n] citation links)
- Analysis (synthesis, implications, areas of debate)
- Conclusion (key takeaways, confidence assessment)
- References (numbered list with clickable URLs)
End by asking the user if they have questions or want to save the report.
Requirements:
- Every factual claim must cite its source with [n] notation
- Be objective — present multiple viewpoints where sources disagree
- Distinguish well-supported conclusions from speculation
- Answer the original research questions from the brief
**STEP 2 — After the user responds:**
2. Save the HTML file:
save_data(filename="report.html", data=<your_html>)
3. Get the clickable link:
serve_file_to_user(filename="report.html", label="Research Report")
**STEP 2 — Present the link to the user (text only, NO tool calls):**
Tell the user the report is ready and include the file:// URI from
serve_file_to_user so they can click it to open. Give a brief summary
of what the report covers. Ask if they have questions.
**STEP 3 — After the user responds:**
- Answer follow-up questions from the research material
- If they want to save, use write_to_file tool
- When the user is satisfied: set_output("delivery_status", "completed")
""",
tools=["write_to_file"],
tools=["save_data", "serve_file_to_user", "load_data", "list_data_files"],
)
__all__ = [
@@ -1,10 +1,10 @@
---
name: setup-credentials
name: hive-credentials
description: Set up and install credentials for an agent. Detects missing credentials from agent config, collects them from the user, and stores them securely in the local encrypted store at ~/.hive/credentials.
license: Apache-2.0
metadata:
author: hive
version: "2.2"
version: "2.3"
type: utility
---
@@ -31,95 +31,50 @@ Determine which agent needs credentials. The user will either:
Locate the agent's directory under `exports/{agent_name}/`.
### Step 2: Detect Required Credentials (Bash-First)
### Step 2: Detect Missing Credentials
Use bash commands to determine what the agent needs and what's already configured. This avoids Python import issues and works even when `HIVE_CREDENTIAL_KEY` is not set.
Use the `check_missing_credentials` MCP tool to detect what the agent needs and what's already configured. This tool loads the agent, inspects its required tools and node types, maps them to credentials via `CREDENTIAL_SPECS`, and checks both the encrypted store and environment variables.
#### Step 2a: Read Agent Requirements
Extract `required_tools` and node types from the agent config:
```bash
# Get required tools
jq -r '.required_tools[]?' exports/{agent_name}/agent.json 2>/dev/null
# Get node types from graph nodes
jq -r '.graph.nodes[]?.node_type' exports/{agent_name}/agent.json 2>/dev/null | sort -u
```
check_missing_credentials(agent_path="exports/{agent_name}")
```
Map the extracted tools and node types to credentials by reading the spec files directly:
The tool returns a JSON response:
```bash
# Read all credential specs — each file defines tools, node_types, env_var, and credential_id
cat tools/src/aden_tools/credentials/llm.py tools/src/aden_tools/credentials/search.py tools/src/aden_tools/credentials/email.py tools/src/aden_tools/credentials/integrations.py
```json
{
"agent": "exports/{agent_name}",
"missing": [
{
"credential_name": "brave_search",
"env_var": "BRAVE_SEARCH_API_KEY",
"description": "Brave Search API key for web search",
"help_url": "https://brave.com/search/api/",
"tools": ["web_search"]
}
],
"available": [
{
"credential_name": "anthropic",
"env_var": "ANTHROPIC_API_KEY",
"source": "encrypted_store"
}
],
"total_missing": 1,
"ready": false
}
```
For each `CredentialSpec`, match its `tools` and `node_types` lists against the agent's required tools and node types. Extract the `env_var`, `credential_id`, and `credential_group` for every match. This is the list of needed credentials.
#### Step 2b: Check Existing Credential Sources
For each needed credential, check three sources. A credential is "found" if it exists in ANY of them:
**1. Encrypted store metadata index** (unencrypted JSON — no decryption key needed):
```bash
cat ~/.hive/credentials/metadata/index.json 2>/dev/null | jq -r '.credentials | keys[]'
```
If a credential ID appears in this list, it is stored in the encrypted store.
**2. Environment variables:**
```bash
# Check each needed env var, e.g.:
printenv ANTHROPIC_API_KEY > /dev/null 2>&1 && echo "ANTHROPIC_API_KEY: set" || echo "ANTHROPIC_API_KEY: not set"
printenv BRAVE_SEARCH_API_KEY > /dev/null 2>&1 && echo "BRAVE_SEARCH_API_KEY: set" || echo "BRAVE_SEARCH_API_KEY: not set"
```
**3. Project `.env` file:**
```bash
# Check each needed env var, e.g.:
grep -q '^ANTHROPIC_API_KEY=' .env 2>/dev/null && echo "ANTHROPIC_API_KEY: in .env" || echo "ANTHROPIC_API_KEY: not in .env"
grep -q '^BRAVE_SEARCH_API_KEY=' .env 2>/dev/null && echo "BRAVE_SEARCH_API_KEY: in .env" || echo "BRAVE_SEARCH_API_KEY: not in .env"
```
#### Step 2c: HIVE_CREDENTIAL_KEY Check
If any credentials were found in the encrypted store metadata index, verify the encryption key is available. The key is typically persisted to shell config by a previous setup-credentials run.
Check both the current session AND shell config files:
```bash
# Check 1: Current session
printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
# Check 2: Shell config files (where setup-credentials persists it)
# Note: check each file individually to avoid non-zero exit when one doesn't exist
for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
```
Decision logic:
- **In current session** — no action needed, credentials in the store are usable
- **In shell config but NOT in current session** — the key is persisted but this shell hasn't sourced it. Run `source ~/.zshrc` (or `~/.bashrc`), then re-check. Credentials in the store are usable after sourcing.
- **Not in session AND not in shell config** — the key was never persisted. Warn the user that credentials in the store cannot be decrypted. Help fix the key situation (recover/re-persist), do NOT re-collect credential values that are already stored.
#### Step 2d: Compute Missing & Group
Diff the "needed" credentials against the "found" credentials to get the truly missing list.
Group related credentials by their `credential_group` field from the spec files. Credentials that share the same non-empty `credential_group` value should be presented as a single setup step rather than asking for each one individually.
**If nothing is missing and there's no HIVE_CREDENTIAL_KEY issue:** Report all credentials as configured and skip Steps 3-5. Example:
**If `ready` is true (nothing missing):** Report all credentials as configured and skip Steps 3-5. Example:
```
All required credentials are already configured:
✓ anthropic (ANTHROPIC_API_KEY) — found in encrypted store
✓ brave_search (BRAVE_SEARCH_API_KEY) — found in environment
✓ anthropic (ANTHROPIC_API_KEY)
✓ brave_search (BRAVE_SEARCH_API_KEY)
Your agent is ready to run!
```
**If credentials are missing:** Continue to Step 3 with only the missing ones.
**If credentials are missing:** Continue to Step 3 with the `missing` list.
### Step 3: Present Auth Options for Each Missing Credential
@@ -153,7 +108,7 @@ Present the available options using AskUserQuestion:
Choose how to configure HUBSPOT_ACCESS_TOKEN:
1) Aden Platform (OAuth) (Recommended)
Secure OAuth2 flow via integration.adenhq.com
Secure OAuth2 flow via hive.adenhq.com
- Quick setup with automatic token refresh
- No need to manage API keys manually
@@ -170,6 +125,22 @@ Choose how to configure HUBSPOT_ACCESS_TOKEN:
### Step 4: Execute Auth Flow Based on User Choice
#### Prerequisite: Ensure HIVE_CREDENTIAL_KEY Is Available
Before storing any credentials, verify `HIVE_CREDENTIAL_KEY` is set (needed to encrypt/decrypt the local store). Check both the current session and shell config:
```bash
# Check current session
printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
# Check shell config files
for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
```
- **In current session** — proceed to store credentials
- **In shell config but NOT in current session** — run `source ~/.zshrc` (or `~/.bashrc`) first, then proceed
- **Not set anywhere** — `EncryptedFileStorage` will auto-generate one. After storing, tell the user to persist it: `export HIVE_CREDENTIAL_KEY="{generated_key}"` in their shell profile
#### Option 1: Aden Platform (OAuth)
This is the recommended flow for supported integrations (HubSpot, etc.).
@@ -195,7 +166,7 @@ If not set, guide user to get one from Aden (this is where they do OAuth):
from aden_tools.credentials import open_browser, get_aden_setup_url
# Open browser to Aden - user will sign up and connect integrations there
url = get_aden_setup_url() # https://integration.adenhq.com/setup
url = get_aden_setup_url() # https://hive.adenhq.com
success, msg = open_browser(url)
print("Please sign in to Aden and connect your integrations (HubSpot, etc.).")
@@ -272,7 +243,7 @@ print(f"Synced credentials: {synced}")
# If the required credential wasn't synced, the user hasn't authorized it on Aden yet
if "hubspot" not in synced:
print("HubSpot not found in your Aden account.")
print("Please visit https://integration.adenhq.com to connect HubSpot, then try again.")
print("Please visit https://hive.adenhq.com to connect HubSpot, then try again.")
```
For more control over the sync process:
@@ -442,28 +413,38 @@ config_path.write_text(json.dumps(config, indent=2))
### Step 6: Verify All Credentials
Run validation again to confirm everything is set:
Use the `verify_credentials` MCP tool to confirm everything is properly configured:
```python
runner = AgentRunner.load("exports/{agent_name}")
validation = runner.validate()
assert not validation.missing_credentials, "Still missing credentials!"
```
verify_credentials(agent_path="exports/{agent_name}")
```
Report the result to the user.
The tool returns:
```json
{
"agent": "exports/{agent_name}",
"ready": true,
"missing_credentials": [],
"warnings": [],
"errors": []
}
```
If `ready` is true, report success. If `missing_credentials` is non-empty, identify what failed and loop back to Step 3 for the remaining credentials.
## Health Check Reference
Health checks validate credentials by making lightweight API calls:
| Credential | Endpoint | What It Checks |
| --------------- | --------------------------------------- | ---------------------------------- |
| `anthropic` | `POST /v1/messages` | API key validity |
| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity |
| `google_search` | `GET /customsearch/v1?q=test&num=1` | API key + CSE ID validity |
| `github` | `GET /user` | Token validity, user identity |
| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes |
| `resend` | `GET /domains` | API key validity |
| Credential | Endpoint | What It Checks |
| --------------- | --------------------------------------- | --------------------------------- |
| `anthropic` | `POST /v1/messages` | API key validity |
| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity |
| `google_search` | `GET /customsearch/v1?q=test&num=1` | API key + CSE ID validity |
| `github` | `GET /user` | Token validity, user identity |
| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes |
| `resend` | `GET /domains` | API key validity |
```python
from aden_tools.credentials import check_credential_health, HealthCheckResult
@@ -560,60 +541,27 @@ token = store.get_key("hubspot", "access_token")
## Example Session
```
User: /setup-credentials for my research-agent
User: /hive-credentials for my research-agent
Agent: Let me check what credentials your research-agent needs.
[Reads agent config]
$ jq -r '.required_tools[]?' exports/research-agent/agent.json
web_search
google_search
$ jq -r '.graph.nodes[]?.node_type' exports/research-agent/agent.json | sort -u
event_loop
[Maps tools/nodes to credentials using lookup table]
Needed: anthropic, brave_search, google_search, google_cse
[Checks encrypted store metadata index]
$ cat ~/.hive/credentials/metadata/index.json | jq -r '.credentials | keys[]'
anthropic
brave_search
[Checks environment variables]
$ printenv ANTHROPIC_API_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
not set
$ printenv BRAVE_SEARCH_API_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
not set
$ printenv GOOGLE_API_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
not set
$ printenv GOOGLE_CSE_ID > /dev/null 2>&1 && echo "set" || echo "not set"
not set
[Checks HIVE_CREDENTIAL_KEY since credentials found in store]
$ printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
session: not set
$ for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
/Users/user/.zshrc
[Key is in shell config but not current session — sourcing it]
$ source ~/.zshrc
[Computes missing credentials]
Found:
✓ anthropic (ANTHROPIC_API_KEY) — in encrypted store
✓ brave_search (BRAVE_SEARCH_API_KEY) — in encrypted store
Missing:
✗ google_search (GOOGLE_API_KEY)
✗ google_cse (GOOGLE_CSE_ID)
[Calls check_missing_credentials(agent_path="exports/research-agent")]
→ Returns:
available: anthropic (encrypted_store), brave_search (encrypted_store)
missing: google_search (GOOGLE_API_KEY), google_cse (GOOGLE_CSE_ID)
ready: false
Agent: 2 of 4 required credentials are already configured. Only Google Custom
Search needs setup (2 values as a single group).
Search needs setup (2 values).
--- Setting up Google Custom Search (google_search + google_cse) ---
This requires two values that work together.
[Checks HIVE_CREDENTIAL_KEY before storing]
$ printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
set
First, the Google API Key:
1. Go to https://console.cloud.google.com/apis/credentials
2. Create a new project (or select an existing one)
@@ -640,10 +588,31 @@ Now, the Custom Search Engine ID:
✓ Google Custom Search credentials valid
[Calls verify_credentials(agent_path="exports/research-agent")]
→ Returns: ready: true, missing_credentials: []
All credentials are now configured:
✓ anthropic (ANTHROPIC_API_KEY) — already in encrypted store
✓ brave_search (BRAVE_SEARCH_API_KEY) — already in encrypted store
✓ google_search (GOOGLE_API_KEY) — stored in encrypted store
✓ google_cse (GOOGLE_CSE_ID) — stored in encrypted store
Your agent is ready to run!
┌─────────────────────────────────────────────────────────────────────────────┐
│ ✅ CREDENTIALS CONFIGURED │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ NEXT STEPS: │
│ │
│ 1. RUN YOUR AGENT: │
│ │
│             PYTHONPATH=core:exports python -m research_agent tui           │
│ │
│ 2. IF YOU ENCOUNTER ISSUES, USE THE DEBUGGER: │
│ │
│ /hive-debugger │
│ │
│ The debugger analyzes runtime logs, identifies retry loops, tool │
│ failures, stalled execution, and provides actionable fix suggestions. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,933 @@
---
name: hive-debugger
type: utility
description: Interactive debugging companion for Hive agents - identifies runtime issues and proposes solutions
version: 1.0.0
requires:
- hive-concepts
tags:
- debugging
- runtime-logs
- agent-development
---
# Hive Debugger
An interactive debugging companion that helps developers identify and fix runtime issues in Hive agents. The debugger analyzes runtime logs at three levels (L1/L2/L3), categorizes issues, and provides actionable fix recommendations.
## When to Use This Skill
Use `/hive-debugger` when:
- Your agent is failing or producing unexpected results
- You need to understand why a specific node is retrying repeatedly
- Tool calls are failing and you need to identify the root cause
- Agent execution is stalled or taking too long
- You want to monitor agent behavior in real-time during development
This skill works alongside agents running in TUI mode and provides supervisor-level insights into execution behavior.
---
## Prerequisites
Before using this skill, ensure:
1. You have an exported agent in `exports/{agent_name}/`
2. The agent has been run at least once (logs exist)
3. Runtime logging is enabled (default in Hive framework)
4. You have access to the agent's working directory at `~/.hive/agents/{agent_name}/`
---
## Workflow
### Stage 1: Setup & Context Gathering
**Objective:** Understand the agent being debugged
**What to do:**
1. **Ask the developer which agent needs debugging:**
- Get agent name (e.g., "twitter_outreach", "deep_research_agent")
- Confirm the agent exists in `exports/{agent_name}/`
2. **Determine agent working directory:**
- Calculate: `~/.hive/agents/{agent_name}/`
- Verify this directory exists and contains session logs
3. **Read agent configuration:**
- Read file: `exports/{agent_name}/agent.json`
- Extract goal information from the JSON:
- `goal.id` - The goal identifier
- `goal.success_criteria` - What success looks like
- `goal.constraints` - Rules the agent must follow
- Extract graph information:
- List of node IDs from `graph.nodes`
- List of edges from `graph.edges`
4. **Store context for the debugging session:**
- agent_name
- agent_work_dir (e.g., `/home/user/.hive/agents/twitter_outreach`)
- goal_id
- success_criteria
- constraints
- node_ids
**Example:**
```
Developer: "My twitter_outreach agent keeps failing"
You: "I'll help debug the twitter_outreach agent. Let me gather context..."
[Read exports/twitter_outreach/agent.json]
Context gathered:
- Agent: twitter_outreach
- Goal: twitter-outreach-multi-loop
- Working Directory: /home/user/.hive/agents/twitter_outreach
- Success Criteria: ["Successfully send 5 personalized outreach messages"]
- Constraints: ["Must verify handle exists", "Must personalize message"]
- Nodes: ["intake-collector", "profile-analyzer", "message-composer", "outreach-sender"]
```
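If you want to eyeball the same fields from a shell, `jq` over the paths in step 3 works. A sketch — it assumes each node object in `agent.json` carries an `id` field:
```bash
# Goal metadata (paths from step 3)
jq '.goal | {id, success_criteria, constraints}' exports/{agent_name}/agent.json

# Node ids from the graph (assumes an "id" field per node)
jq -r '.graph.nodes[]?.id' exports/{agent_name}/agent.json
```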
---
### Stage 2: Mode Selection
**Objective:** Choose the debugging approach that best fits the situation
**What to do:**
Ask the developer which debugging mode they want to use. Use AskUserQuestion with these options:
1. **Real-time Monitoring Mode**
- Description: Monitor active TUI session continuously, poll logs every 5-10 seconds, alert on new issues immediately
- Best for: Live debugging sessions where you want to catch issues as they happen
- Note: Requires agent to be currently running
2. **Post-Mortem Analysis Mode**
- Description: Analyze completed or failed runs in detail, deep dive into specific session
- Best for: Understanding why a past execution failed
- Note: Most common mode for debugging
3. **Historical Trends Mode**
- Description: Analyze patterns across multiple runs, identify recurring issues
- Best for: Finding systemic problems that happen repeatedly
- Note: Useful for agents that have run many times
**Implementation:**
```
Use AskUserQuestion to present these options and let the developer choose.
Store the selected mode for the session.
```
---
### Stage 3: Triage (L1 Analysis)
**Objective:** Identify which sessions need attention
**What to do:**
1. **Query high-level run summaries** using the MCP tool:
```
query_runtime_logs(
agent_work_dir="{agent_work_dir}",
status="needs_attention",
limit=20
)
```
2. **Analyze the results:**
- Look for runs with `needs_attention: true`
- Check `attention_summary.categories` for issue types
- Note the `run_id` of problematic sessions
- Check `status` field: "degraded", "failure", "in_progress"
3. **Attention flag triggers to understand:**
From runtime_logger.py, runs are flagged when any of the following hold (see the sketch after the example output below):
- retry_count > 3
- escalate_count > 2
- latency_ms > 60000
- tokens_used > 100000
- total_steps > 20
4. **Present findings to developer:**
- Summarize how many runs need attention
- List the most recent problematic runs
- Show attention categories for each
- Ask which run they want to investigate (if multiple)
**Example Output:**
```
Found 2 runs needing attention:
1. session_20260206_115718_e22339c5 (30 minutes ago)
Status: degraded
Categories: missing_outputs, retry_loops
2. session_20260206_103422_9f8d1b2a (2 hours ago)
Status: failure
Categories: tool_failures, high_latency
Which run would you like to investigate?
```
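As referenced above, the documented triggers reduce to a simple predicate over the L1 summary fields. A minimal sketch — field names are the ones documented here, not the literal runtime_logger.py source:
```python
def needs_attention(run: dict) -> bool:
    """Any one documented trigger is enough to flag a run."""
    return (
        run.get("retry_count", 0) > 3
        or run.get("escalate_count", 0) > 2
        or run.get("latency_ms", 0) > 60_000
        or run.get("tokens_used", 0) > 100_000
        or run.get("total_steps", 0) > 20
    )
```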
---
### Stage 4: Diagnosis (L2 Analysis)
**Objective:** Identify which nodes failed and what patterns exist
**What to do:**
1. **Query per-node details** using the MCP tool:
```
query_runtime_log_details(
agent_work_dir="{agent_work_dir}",
run_id="{selected_run_id}",
needs_attention_only=True
)
```
2. **Categorize issues** using the Issue Taxonomy:
**10 Issue Categories:**
| Category | Detection Pattern | Meaning |
|----------|------------------|---------|
| **Missing Outputs** | `exit_status != "success"`, `attention_reasons` contains "missing_outputs" | Node didn't call set_output with required keys |
| **Tool Errors** | `tool_error_count > 0`, `attention_reasons` contains "tool_failures" | Tool calls failed (API errors, timeouts, auth issues) |
| **Retry Loops** | `retry_count > 3`, `verdict_counts.RETRY > 5` | Judge repeatedly rejecting outputs |
| **Guard Failures** | `guard_reject_count > 0` | Output validation failed (wrong types, missing keys) |
| **Stalled Execution** | `total_steps > 20`, `verdict_counts.CONTINUE > 10` | EventLoopNode not making progress |
| **High Latency** | `latency_ms > 60000`, `avg_step_latency > 5000` | Slow tool calls or LLM responses |
| **Client-Facing Issues** | `client_input_requested` but no `user_input_received` | Premature set_output before user input |
| **Edge Routing Errors** | `exit_status == "no_valid_edge"`, `attention_reasons` contains "routing_issue" | No edges match current state |
| **Memory/Context Issues** | `tokens_used > 100000`, `context_overflow_count > 0` | Conversation history too long |
| **Constraint Violations** | Compare output against goal constraints | Agent violated goal-level rules |
3. **Analyze each flagged node:**
- Node ID and name
- Exit status
- Retry count
- Verdict distribution (ACCEPT/RETRY/ESCALATE/CONTINUE)
- Attention reasons
- Total steps executed
4. **Present diagnosis to developer:**
- List problematic nodes
- Categorize each issue
- Highlight the most severe problems
- Show evidence (retry counts, error types)
**Example Output:**
```
Diagnosis for session_20260206_115718_e22339c5:
Problem Node: intake-collector
├─ Exit Status: escalate
├─ Retry Count: 5 (HIGH)
├─ Verdict Counts: {RETRY: 5, ESCALATE: 1}
├─ Attention Reasons: ["high_retry_count", "missing_outputs"]
├─ Total Steps: 8
└─ Categories: Missing Outputs + Retry Loops
Root Issue: The intake-collector node is stuck in a retry loop because it's not setting required outputs.
```
---
### Stage 5: Root Cause Analysis (L3 Analysis)
**Objective:** Understand exactly what went wrong by examining detailed logs
**What to do:**
1. **Query detailed tool/LLM logs** using the MCP tool:
```
query_runtime_log_raw(
agent_work_dir="{agent_work_dir}",
run_id="{run_id}",
node_id="{problem_node_id}"
)
```
2. **Analyze based on issue category:**
**For Missing Outputs:**
- Check `step.tool_calls` for set_output usage
- Look for conditional logic that skipped set_output
- Check if LLM is calling other tools instead
**For Tool Errors:**
- Check `step.tool_results` for error messages
- Identify error types: rate limits, auth failures, timeouts, network errors
- Note which specific tool is failing
**For Retry Loops:**
- Check `step.verdict_feedback` from judge
- Look for repeated failure reasons
- Identify if it's the same issue every time
**For Guard Failures:**
- Check `step.guard_results` for validation errors
- Identify missing keys or type mismatches
- Compare actual output to expected schema
**For Stalled Execution:**
- Check `step.llm_response_text` for repetition
- Look for LLM stuck in same action loop
- Check if tool calls are succeeding but not progressing
3. **Extract evidence:**
- Specific error messages
- Tool call arguments and results
- LLM response text
- Judge feedback
- Step-by-step progression
4. **Formulate root cause explanation:**
- Clearly state what is happening
- Explain why it's happening
- Show evidence from logs
**Example Output:**
```
Root Cause Analysis for intake-collector:
Step-by-step breakdown:
Step 3:
- Tool Call: web_search(query="@RomuloNevesOf")
- Result: Found Twitter profile information
- Verdict: RETRY
- Feedback: "Missing required output 'twitter_handles'. You found the handle but didn't call set_output."
Step 4:
- Tool Call: web_search(query="@RomuloNevesOf twitter")
- Result: Found additional Twitter information
- Verdict: RETRY
- Feedback: "Still missing 'twitter_handles'. Use set_output to save your findings."
Steps 5-7: Similar pattern continues...
ROOT CAUSE: The node is successfully finding Twitter handles via web_search, but the LLM is not calling set_output to save the results. It keeps searching for more information instead of completing the task.
```
---
### Stage 6: Fix Recommendations
**Objective:** Provide actionable solutions the developer can implement
**What to do:**
Based on the issue category identified, provide specific fix recommendations using these templates:
#### Template 1: Missing Outputs (Client-Facing Nodes)
```markdown
## Issue: Premature set_output in Client-Facing Node
**Root Cause:** Node called set_output before receiving user input
**Fix:** Use STEP 1/STEP 2 prompt pattern
**File to edit:** `exports/{agent_name}/nodes/{node_name}.py`
**Changes:**
1. Update the system_prompt to include explicit step guidance:
```python
system_prompt = """
STEP 1: Analyze the user input and decide what action to take.
DO NOT call set_output in this step.
STEP 2: After receiving feedback or completing analysis,
ONLY THEN call set_output with your results.
"""
```
2. If some inputs are optional (like feedback on retry edges), add nullable_output_keys:
```python
nullable_output_keys=["feedback"]
```
**Verification:**
- Run the agent with test input
- Verify the client-facing node waits for user input before calling set_output
```
#### Template 2: Retry Loops
```markdown
## Issue: Judge Repeatedly Rejecting Outputs
**Root Cause:** {Insert specific reason from verdict_feedback}
**Fix Options:**
**Option A - If outputs are actually correct:** Adjust judge evaluation rules
- File: `exports/{agent_name}/agent.json`
- Update `evaluation_rules` section to accept the current output format
- Example: If judge expects list but gets string, update rule to accept both
**Option B - If prompt is ambiguous:** Clarify node instructions
- File: `exports/{agent_name}/nodes/{node_name}.py`
- Make system_prompt more explicit about output format and requirements
- Add examples of correct outputs
**Option C - If tool is unreliable:** Add retry logic with fallback
- Consider using alternative tools
- Add manual fallback option
- Update prompt to handle tool failures gracefully
**Verification:**
- Run the node with test input
- Confirm judge accepts output on first try
- Check that retry_count stays at 0
```
#### Template 3: Tool Errors
```markdown
## Issue: {tool_name} Failing with {error_type}
**Root Cause:** {Insert specific error message from logs}
**Fix Strategy:**
**If API rate limit:**
1. Add exponential backoff in tool retry logic
2. Reduce API call frequency
3. Consider caching results
**If auth failure:**
1. Check credentials using:
```bash
/hive-credentials --agent {agent_name}
```
2. Verify API key environment variables
3. Update `mcp_servers.json` if needed
**If timeout:**
1. Increase timeout in `mcp_servers.json`:
```json
{
"timeout_ms": 60000
}
```
2. Consider using faster alternative tools
3. Break large requests into smaller chunks
**Verification:**
- Test tool call manually
- Confirm successful response
- Monitor for recurring errors
```
#### Template 4: Edge Routing Errors
```markdown
## Issue: No Valid Edge from Node {node_id}
**Root Cause:** No edge condition matched the current state
**File to edit:** `exports/{agent_name}/agent.json`
**Analysis:**
- Current node output: {show actual output keys}
- Existing edge conditions: {list edge conditions}
- Why no match: {explain the mismatch}
**Fix:**
Add the missing edge to the graph:
```json
{
"edge_id": "{node_id}_to_{target_node}",
"source": "{node_id}",
"target": "{target_node}",
"condition": "on_success"
}
```
**Alternative:** Update existing edge condition to cover this case
**Verification:**
- Run agent with same input
- Verify edge is traversed successfully
- Check that execution continues to next node
```
#### Template 5: Stalled Execution
```markdown
## Issue: EventLoopNode Not Making Progress
**Root Cause:** {Insert analysis - e.g., "LLM repeating same failed action"}
**File to edit:** `exports/{agent_name}/nodes/{node_name}.py`
**Fix:** Update system_prompt to guide LLM out of loops
**Add this guidance:**
```python
system_prompt = """
{existing prompt}
IMPORTANT: If a tool call fails multiple times:
1. Try an alternative approach or different tool
2. If no alternatives work, call set_output with partial results
3. DO NOT retry the same failed action more than 3 times
Progress is more important than perfection. Move forward even with incomplete data.
"""
```
**Additional fix:** Cap iterations so the node cannot loop forever — lower `max_iterations` in the graph's `loop_config`, and bound re-entries in the node configuration:
```python
# In node configuration
max_node_visits=3  # Prevent getting stuck across repeated visits
```
**Verification:**
- Run node with same input that caused stall
- Verify it exits after reasonable attempts (< 10 steps)
- Confirm it calls set_output eventually
```
#### Template 6: Checkpoint Recovery (Post-Fix Resume)
```markdown
## Recovery Strategy: Resume from Last Clean Checkpoint
**Situation:** You've fixed the issue, but the failed session is stuck mid-execution
**Solution:** Resume execution from a checkpoint before the failure
### Option A: Auto-Resume from Latest Checkpoint (Recommended)
Use CLI arguments to auto-resume when launching TUI:
```bash
PYTHONPATH=core:exports python -m {agent_name} --tui \
--resume-session {session_id}
```
This will:
- Load session state from `state.json`
- Continue from where it paused/failed
- Apply your fixes immediately
### Option B: Resume from Specific Checkpoint (Time-Travel)
If you need to go back to an earlier point:
```bash
PYTHONPATH=core:exports python -m {agent_name} --tui \
--resume-session {session_id} \
--checkpoint {checkpoint_id}
```
Example:
```bash
PYTHONPATH=core:exports python -m deep_research_agent --tui \
--resume-session session_20260208_143022_abc12345 \
--checkpoint cp_node_complete_intake_143030
```
### Option C: Use TUI Commands
Alternatively, launch TUI normally and use commands:
```bash
# Launch TUI
PYTHONPATH=core:exports python -m {agent_name} --tui
# In TUI, use commands:
/resume {session_id} # Resume from session state
/recover {session_id} {checkpoint_id} # Recover from specific checkpoint
```
### When to Use Each Option:
**Use `/resume` (or --resume-session) when:**
- You fixed credentials and want to retry
- Agent paused and you want to continue
- Agent failed and you want to retry from last state
**Use `/recover` (or --resume-session + --checkpoint) when:**
- You need to go back to an earlier checkpoint
- You want to try a different path from a specific point
- Debugging requires time-travel to earlier state
### Find Available Checkpoints:
```bash
# In TUI:
/sessions {session_id}
# This shows all checkpoints with timestamps:
Available Checkpoints: (3)
1. cp_node_complete_intake_143030
2. cp_node_complete_research_143115
3. cp_pause_research_143130
```
**Verification:**
- Use `--resume-session` to test your fix immediately
- No need to re-run from the beginning
- Session continues with your code changes applied
```
**Selecting the right template:**
- Match the issue category from Stage 4
- Customize with specific details from Stage 5
- Include actual error messages and code snippets
- Provide file paths and line numbers when possible
- **Always include recovery commands** (Template 6) after providing fix recommendations
---
### Stage 7: Verification Support
**Objective:** Help the developer confirm their fixes work
**What to do:**
1. **Suggest appropriate tests based on fix type:**
**For node-level fixes:**
```bash
# Use hive-test to run goal-based tests
/hive-test --agent {agent_name} --goal {goal_id}
# Or run specific test scenarios
/hive-test --agent {agent_name} --scenario {specific_input}
```
**For quick manual tests:**
```bash
# Launch the interactive TUI dashboard
hive tui
```
Then use arrow keys to select the agent from the list and press Enter to run it.
2. **Provide MCP tool queries to validate the fix:**
**Check if issue is resolved:**
```
query_runtime_logs(
agent_work_dir="~/.hive/agents/{agent_name}",
status="needs_attention",
limit=5
)
# Should show 0 results if fully fixed
```
**Verify specific node behavior:**
```
query_runtime_log_details(
agent_work_dir="~/.hive/agents/{agent_name}",
run_id="{new_run_id}",
node_id="{fixed_node_id}"
)
# Should show exit_status="success", retry_count=0
```
3. **Monitor for regression:**
- Run the agent multiple times
- Check for similar issues reappearing
- Verify fix works across different inputs
4. **Provide verification checklist:**
```
Verification Checklist:
□ Applied recommended fix to code
□ Ran agent with test input
□ Checked runtime logs show no attention flags
□ Verified specific node completes successfully
□ Tested with multiple inputs
□ No regression of original issue
□ Agent meets success criteria
```
**Example interaction:**
```
Developer: "I applied the fix to intake-collector. How do I verify it works?"
You: "Great! Let's verify the fix with these steps:
1. Launch the TUI dashboard:
hive tui
Then select your agent from the list and press Enter to run it.
2. After it completes, check the logs:
[Use query_runtime_logs to check for attention flags]
3. Verify the specific node:
[Use query_runtime_log_details for intake-collector]
Expected results:
- No 'needs_attention' flags
- intake-collector shows exit_status='success'
- retry_count should be 0
Let me know when you've run it and I'll help check the logs!"
```
---
## MCP Tool Usage Guide
### Three Levels of Observability
**L1: query_runtime_logs** - Session-level summaries
- **When to use:** Initial triage, identifying problematic runs, monitoring trends
- **Returns:** List of runs with status, attention flags, timestamps
- **Example:**
```
query_runtime_logs(
agent_work_dir="/home/user/.hive/twitter_outreach",
status="needs_attention",
limit=20
)
```
**L2: query_runtime_log_details** - Node-level details
- **When to use:** Diagnosing which nodes failed, understanding retry patterns
- **Returns:** Per-node completion details, retry counts, verdicts
- **Example:**
```
query_runtime_log_details(
agent_work_dir="/home/user/.hive/twitter_outreach",
run_id="session_20260206_115718_e22339c5",
needs_attention_only=True
)
```
**L3: query_runtime_log_raw** - Step-level details
- **When to use:** Root cause analysis, understanding exact failures
- **Returns:** Full tool calls, LLM responses, judge feedback
- **Example:**
```
query_runtime_log_raw(
agent_work_dir="/home/user/.hive/twitter_outreach",
run_id="session_20260206_115718_e22339c5",
node_id="intake-collector"
)
```
### Query Patterns
**Pattern 1: Top-Down Investigation** (Most common)
```
1. L1: Find problematic runs
2. L2: Identify failing nodes
3. L3: Analyze specific failures
```
**Pattern 2: Node-Specific Debugging**
```
1. L2: Get details for specific node across all runs
2. L3: Deep dive into worst failures
```
**Pattern 3: Real-time Monitoring**
```
Loop every 10 seconds:
1. L1: Check for new needs_attention runs
2. If found: Alert and drill into L2
```
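A runnable sketch of Pattern 3, assuming the two MCP tools are available to Python as plain callables (`query_runtime_logs`, `query_runtime_log_details`) and that L1 results carry a `run_id` field; inside a skill session you would issue the MCP calls directly instead:
```python
import time

# query_runtime_logs / query_runtime_log_details: assumed Python bindings
# for the MCP tools described above.

def monitor(agent_work_dir: str, poll_seconds: int = 10) -> None:
    """Poll L1 summaries; drill into L2 when a new flagged run appears."""
    seen: set[str] = set()
    while True:
        runs = query_runtime_logs(  # L1: session-level summaries
            agent_work_dir=agent_work_dir,
            status="needs_attention",
            limit=5,
        )
        for run in runs:
            run_id = run["run_id"]
            if run_id in seen:
                continue  # already reported
            seen.add(run_id)
            print(f"New flagged run: {run_id}")
            details = query_runtime_log_details(  # L2: node-level details
                agent_work_dir=agent_work_dir,
                run_id=run_id,
                needs_attention_only=True,
            )
            print(details)
        time.sleep(poll_seconds)
```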
---
## Complete Example Walkthrough
### Scenario: twitter_outreach Agent Failing
**Developer says:** "My twitter_outreach agent keeps failing. It's not finding Twitter handles correctly."
#### Stage 1: Setup
```
You: "I'll help debug the twitter_outreach agent. Let me gather context..."
[Read exports/twitter_outreach/agent.json]
Context:
- Agent: twitter_outreach
- Goal: twitter-outreach-multi-loop
- Working Dir: ~/.hive/agents/twitter_outreach
- Success Criteria: ["Successfully send 5 personalized outreach messages"]
- Constraints: ["Must verify handle exists", "Must personalize message"]
- Nodes: intake-collector, profile-analyzer, message-composer, outreach-sender
```
#### Stage 2: Mode Selection
```
You: "Which debugging approach would you prefer?"
[Present 3 mode options using AskUserQuestion]
Developer selects: "Post-Mortem Analysis Mode"
```
#### Stage 3: Triage (L1)
```
You: "Querying runtime logs for problematic runs..."
[Call query_runtime_logs with status="needs_attention"]
Results:
- Found 1 run needing attention
- session_20260206_115718_e22339c5 (30 min ago)
- Status: degraded
- Categories: missing_outputs, retry_loops
```
#### Stage 4: Diagnosis (L2)
```
You: "Analyzing node-level details..."
[Call query_runtime_log_details for the session]
Results:
- Problem Node: intake-collector
- Exit Status: escalate
- Retry Count: 5
- Verdict Counts: {RETRY: 5, ESCALATE: 1}
- Attention Reasons: ["high_retry_count", "missing_outputs"]
- Total Steps: 8
Diagnosis: intake-collector has retry loop + missing outputs issue
Category: Missing Outputs + Retry Loops
```
#### Stage 5: Root Cause (L3)
```
You: "Drilling into detailed logs for intake-collector..."
[Call query_runtime_log_raw for node intake-collector]
Evidence:
Step 3: web_search("@RomuloNevesOf") → Found profile
Verdict: RETRY
Feedback: "Missing required output 'twitter_handles'. You found the handle but didn't call set_output."
Step 4: web_search("@RomuloNevesOf twitter") → Found more info
Verdict: RETRY
Feedback: "Still missing 'twitter_handles'. Use set_output to save your findings."
Steps 5-7: Same pattern repeats...
ROOT CAUSE: Node successfully finds handles but LLM doesn't call set_output. It keeps searching instead of completing.
```
#### Stage 6: Fix Recommendation
```
You: "Here's the fix for this issue:
## Issue: intake-collector Not Setting Outputs
**Root Cause:** Node successfully finds Twitter handles but doesn't call set_output
**Category:** Missing Outputs
**File to edit:** exports/twitter_outreach/nodes/intake_collector.py
**Changes:**
Update the system_prompt to emphasize the output requirement:
```python
system_prompt = """
You are the Intake Collector for the Twitter outreach agent.
Your task:
1. Extract Twitter handle from user input (URL or @mention)
2. Verify it's a valid Twitter handle format
3. **CRITICAL: Call set_output(twitter_handles=[handle]) immediately after finding valid handle**
IMPORTANT: You MUST call set_output even if you found the handle on the first try.
Do not continue searching once you have a valid handle - set output and finish.
Output format:
{
"twitter_handles": ["@handle1", "@handle2", ...]
}
"""
```
**Why this works:** The explicit instruction to call set_output immediately prevents the LLM from continuing to search after finding the handle.
````
#### Stage 7: Verification
```
Developer: "I updated the prompt. How do I test it?"
You: "Let's verify the fix:
1. Launch the TUI dashboard:
hive tui
Then select your agent and press Enter to run it.
2. I'll check the logs after it runs..."
[Developer runs agent]
Developer: "It completed! Can you check if it worked?"
You: "Checking now..."
[Call query_runtime_log_details for the new run]
Results:
- Node: intake-collector
- Exit Status: success ✓
- Retry Count: 0 ✓
- Total Steps: 2 ✓
- No attention flags ✓
"Perfect! The fix worked. The intake-collector now:
- Completes successfully on first try
- No retries needed
- Calls set_output properly
Your agent should now work correctly!"
```
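To script this check instead of eyeballing it, a small helper along these lines works, assuming the L2 details expose per-node records with the `exit_status`, `retry_count`, and `attention_reasons` fields shown above (the field names are assumptions):
```python
def verify_fix(node_details: dict, node_id: str = "intake-collector") -> bool:
    """Return True if the node now completes cleanly (sketch; field names
    mirror the L2 output shown above and may differ in practice)."""
    node = node_details[node_id]
    healthy = (
        node["exit_status"] == "success"
        and node["retry_count"] == 0
        and not node.get("attention_reasons")
    )
    print("fix verified" if healthy else f"still failing: {node}")
    return healthy
```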
---
## Tips for Effective Debugging
1. **Always start with L1 logs** - Don't jump straight to detailed logs
2. **Focus on attention flags** - They highlight the real issues
3. **Compare verdict_feedback across steps** - Patterns reveal root causes
4. **Check tool error messages carefully** - They often contain the exact problem
5. **Consider the agent's goal** - Fixes should align with success criteria
6. **Test fixes immediately** - Quick verification prevents wasted effort
7. **Look for patterns across multiple runs** - One-time failures might be transient
## Common Pitfalls to Avoid
1. **Don't recommend code you haven't verified exists** - Always read files first
2. **Don't assume tool capabilities** - Check MCP server configs
3. **Don't ignore edge conditions** - Missing edges cause routing failures
4. **Don't overlook judge configuration** - Mismatched expectations cause retry loops
5. **Don't forget nullable_output_keys** - Optional inputs need explicit marking
---
## Storage Locations Reference
**New unified storage (default):**
- Logs: `~/.hive/agents/{agent_name}/sessions/session_YYYYMMDD_HHMMSS_{uuid}/logs/`
- State: `~/.hive/agents/{agent_name}/sessions/{session_id}/state.json`
- Conversations: `~/.hive/agents/{agent_name}/sessions/{session_id}/conversations/`
**Old storage (deprecated, still supported):**
- Logs: `~/.hive/agents/{agent_name}/runtime_logs/runs/{run_id}/`
The MCP tools automatically check both locations.
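A sketch of that two-location lookup, based on the layouts above (the real resolution logic lives inside the MCP tools):
```python
from pathlib import Path

def find_log_dirs(agent_name: str) -> list[Path]:
    """Collect log directories from both storage layouts."""
    agent_dir = Path.home() / ".hive" / "agents" / agent_name
    found = []
    # New unified storage: one logs/ directory per session
    for session in sorted((agent_dir / "sessions").glob("session_*")):
        if (session / "logs").is_dir():
            found.append(session / "logs")
    # Old storage (deprecated): one directory per run
    runs = agent_dir / "runtime_logs" / "runs"
    if runs.is_dir():
        found.extend(p for p in sorted(runs.iterdir()) if p.is_dir())
    return found
```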
---
**Remember:** Your role is to be a debugging companion and thought partner. Guide the developer through the investigation, explain what you find, and provide actionable fixes. Don't just report errors - help the developer understand and solve them.
@@ -1,19 +1,19 @@
---
name: building-agents-patterns
name: hive-patterns
description: Best practices, patterns, and examples for building goal-driven agents. Includes client-facing interaction, feedback edges, judge patterns, fan-out/fan-in, context management, and anti-patterns.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: reference
part_of: building-agents
part_of: hive
---
# Building Agents - Patterns & Best Practices
Design patterns, examples, and best practices for building robust goal-driven agents.
**Prerequisites:** Complete agent structure using `building-agents-construction`.
**Prerequisites:** Complete agent structure using `hive-create`.
## Practical Example: Hybrid Workflow
@@ -97,6 +97,7 @@ research_node = NodeSpec(
```
**How it works:**
- Client-facing nodes stream LLM text to the user and block for input after each response
- User input is injected via `node.inject_event(text)`
- When the LLM calls `set_output` to produce structured outputs, the judge evaluates and ACCEPTs
@@ -107,13 +108,13 @@ research_node = NodeSpec(
### When to Use client_facing
| Scenario | client_facing | Why |
|----------|:---:|-----|
| Gathering user requirements | Yes | Need user input |
| Human review/approval checkpoint | Yes | Need human decision |
| Data processing (scanning, scoring) | No | Runs autonomously |
| Report generation | No | No user input needed |
| Final confirmation before action | Yes | Need explicit approval |
| Scenario | client_facing | Why |
| ----------------------------------- | :-----------: | ---------------------- |
| Gathering user requirements | Yes | Need user input |
| Human review/approval checkpoint | Yes | Need human decision |
| Data processing (scanning, scoring) | No | Runs autonomously |
| Report generation | No | No user input needed |
| Final confirmation before action | Yes | Need explicit approval |
> **Legacy Note:** The `pause_nodes` / `entry_points` pattern still works for backward compatibility but `client_facing=True` is preferred for new agents.
@@ -158,22 +159,24 @@ EdgeSpec(
```
**Key concepts:**
- `nullable_output_keys`: Lists output keys that may remain unset. The node sets exactly one of the mutually exclusive keys per execution.
- `max_node_visits`: Must be >1 on the feedback target (extractor) so it can re-execute. Default is 1.
- `priority`: Positive = forward edge (evaluated first). Negative = feedback edge. The executor tries forward edges first; if none match, falls back to feedback edges.
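A runnable sketch of these routing rules, using stand-in dataclasses: the framework's real `NodeSpec`/`EdgeSpec` carry more fields, and the `condition_expr` string is illustrative.
```python
from dataclasses import dataclass

# Stand-ins for illustration only; the framework's real specs differ.
@dataclass
class NodeSpec:
    id: str
    max_node_visits: int = 1

@dataclass
class EdgeSpec:
    id: str
    source: str
    target: str
    priority: int = 1            # positive = forward, negative = feedback
    condition_expr: str | None = None

extractor = NodeSpec(id="extractor", max_node_visits=3)  # >1 so feedback can re-run it

edges = [
    EdgeSpec(id="extractor-to-reviewer", source="extractor", target="reviewer",
             priority=1),        # forward edge: evaluated first
    EdgeSpec(id="reviewer-to-extractor", source="reviewer", target="extractor",
             priority=-1,        # feedback edge: used only if no forward edge matches
             condition_expr="outputs.get('rejection_reason') is not None"),
]

# Executor-style ordering: forward edges first, feedback edges as fallback.
forward = [e for e in edges if e.priority > 0]
feedback = [e for e in edges if e.priority < 0]
```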
### Routing Decision Table
| Pattern | Old Approach | New Approach |
|---------|-------------|--------------|
| Conditional branching | `router` node | Conditional edges with `condition_expr` |
| Binary approve/reject | `pause_nodes` + resume | `client_facing=True` + `nullable_output_keys` |
| Loop-back on rejection | Manual entry_points | Feedback edge with `priority=-1` |
| Multi-way routing | Router with routes dict | Multiple conditional edges with priorities |
| Pattern | Old Approach | New Approach |
| ---------------------- | ----------------------- | --------------------------------------------- |
| Conditional branching | `router` node | Conditional edges with `condition_expr` |
| Binary approve/reject | `pause_nodes` + resume | `client_facing=True` + `nullable_output_keys` |
| Loop-back on rejection | Manual entry_points | Feedback edge with `priority=-1` |
| Multi-way routing | Router with routes dict | Multiple conditional edges with priorities |
## Judge Patterns
**Core Principle: The judge is the SOLE mechanism for acceptance decisions.** Never add ad-hoc framework gating to compensate for LLM behavior. If the LLM calls `set_output` prematurely, fix the system prompt or use a custom judge. Anti-patterns to avoid:
- Output rollback logic
- `_user_has_responded` flags
- Premature set_output rejection
@@ -184,6 +187,7 @@ Judges control when an event_loop node's loop exits. Choose based on validation
### Implicit Judge (Default)
When no judge is configured, the implicit judge ACCEPTs when:
- The LLM finishes its response with no tool calls
- All required output keys have been set via `set_output`
@@ -219,11 +223,11 @@ class SchemaJudge:
### When to Use Which Judge
| Judge | Use When | Example |
|-------|----------|---------|
| Judge | Use When | Example |
| --------------- | ------------------------------------- | ---------------------- |
| Implicit (None) | Output keys are sufficient validation | Simple data extraction |
| SchemaJudge | Need structural validation of outputs | API response parsing |
| Custom | Domain-specific validation logic | Score must be 0.0-1.0 |
| SchemaJudge | Need structural validation of outputs | API response parsing |
| Custom | Domain-specific validation logic | Score must be 0.0-1.0 |
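As a concrete instance of the Custom row, here is a minimal range-checking judge. The `(verdict, feedback)` shape mirrors the RETRY/ACCEPT feedback seen in runtime logs, but the exact judge interface is an assumption:
```python
class ScoreRangeJudge:
    """Custom judge sketch: ACCEPT only when 'score' is set and in [0.0, 1.0]."""

    def evaluate(self, outputs: dict) -> tuple[str, str]:
        score = outputs.get("score")
        if score is None:
            return "RETRY", "Missing required output 'score'."
        if not 0.0 <= float(score) <= 1.0:
            return "RETRY", f"Score {score} is outside [0.0, 1.0]."
        return "ACCEPT", "Score is within range."
```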
## Fan-Out / Fan-In (Parallel Execution)
@@ -244,6 +248,7 @@ EdgeSpec(id="scorer-to-extractor", source="scorer", target="extractor",
```
**Requirements:**
- Parallel event_loop nodes must have **disjoint output_keys** (no key written by both)
- Only one parallel branch may contain a `client_facing` node
- Fan-in node receives outputs from all completed branches in shared memory
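One way to enforce the disjoint-keys rule before wiring a fan-out, sketched below with hypothetical branch names and keys:
```python
from itertools import combinations

def check_disjoint_outputs(branches: dict[str, set[str]]) -> None:
    """Raise if any two parallel branches declare the same output key."""
    for (a, keys_a), (b, keys_b) in combinations(branches.items(), 2):
        overlap = keys_a & keys_b
        if overlap:
            raise ValueError(f"nodes {a!r} and {b!r} both write {sorted(overlap)}")

check_disjoint_outputs({"scanner": {"scan_results"}, "scorer": {"scores"}})
```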
@@ -253,6 +258,7 @@ EdgeSpec(id="scorer-to-extractor", source="scorer", target="extractor",
### Tiered Compaction
EventLoopNode automatically manages context window usage with tiered compaction:
1. **Pruning** — Old tool results replaced with compact placeholders (zero-cost, no LLM call)
2. **Normal compaction** — LLM summarizes older messages
3. **Aggressive compaction** — Keeps only recent messages + summary
@@ -265,17 +271,20 @@ The framework automatically truncates large tool results and saves full content
For explicit data management, use the data tools (real MCP tools, not synthetic):
```python
# save_data, load_data, list_data_files are real MCP tools
# Each takes a data_dir parameter since the MCP server is shared
# save_data, load_data, list_data_files, serve_file_to_user are real MCP tools
# data_dir is auto-injected by the framework — the LLM never sees it
# Saving large results
save_data(filename="sources.json", data=large_json_string, data_dir="/path/to/spillover")
save_data(filename="sources.json", data=large_json_string)
# Reading with pagination (line-based offset/limit)
load_data(filename="sources.json", data_dir="/path/to/spillover", offset=0, limit=50)
load_data(filename="sources.json", offset=0, limit=50)
# Listing available files
list_data_files(data_dir="/path/to/spillover")
list_data_files()
# Serving a file to the user as a clickable link
serve_file_to_user(filename="report.html", label="Research Report")
```
Add data tools to nodes that handle large tool results:
@@ -287,7 +296,7 @@ research_node = NodeSpec(
)
```
The `data_dir` is passed by the framework (from the node's spillover directory). The LLM sees `data_dir` in truncation messages and uses it when calling `load_data`.
`data_dir` is a framework context parameter — auto-injected at call time. `GraphExecutor.execute()` sets it per-execution via `ToolRegistry.set_execution_context(data_dir=...)` (using `contextvars` for concurrency safety), ensuring it matches the session-scoped spillover directory.
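A stripped-down sketch of that injection pattern using `contextvars` directly; the real `ToolRegistry`/`GraphExecutor` wiring differs, and the file handling is illustrative only:
```python
import contextvars
from pathlib import Path

# One ContextVar per execution context; concurrent executions don't collide.
_data_dir: contextvars.ContextVar[str | None] = contextvars.ContextVar(
    "data_dir", default=None
)

def set_execution_context(data_dir: str) -> None:
    _data_dir.set(data_dir)  # called by the executor, once per execution

def save_data(filename: str, data: str) -> str:
    data_dir = _data_dir.get()  # auto-injected; the LLM never passes this
    if data_dir is None:
        raise RuntimeError("execution context not set")
    path = Path(data_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(data)
    return str(path)

set_execution_context("/tmp/spillover")       # per-session spillover dir
save_data("sources.json", '{"sources": []}')  # data_dir resolved implicitly
```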
## Anti-Patterns
@@ -304,18 +313,19 @@ The `data_dir` is passed by the framework (from the node's spillover directory).
A common mistake is splitting work into too many small single-purpose nodes. Each node boundary requires serializing outputs, losing in-context information, and adding edge complexity.
| Bad (8 thin nodes) | Good (4 rich nodes) |
|---------------------|---------------------|
| parse-query | intake (client-facing) |
| search-sources | research (search + fetch + analyze) |
| fetch-content | review (client-facing) |
| evaluate-sources | report (write + deliver) |
| synthesize-findings | |
| write-report | |
| quality-check | |
| save-report | |
| Bad (8 thin nodes) | Good (4 rich nodes) |
| ------------------- | ----------------------------------- |
| parse-query | intake (client-facing) |
| search-sources | research (search + fetch + analyze) |
| fetch-content | review (client-facing) |
| evaluate-sources | report (write + deliver) |
| synthesize-findings | |
| write-report | |
| quality-check | |
| save-report | |
**Why fewer nodes are better:**
- The LLM retains full context of its work within a single node
- A research node that searches, fetches, and analyzes keeps all source material in its conversation history
- Fewer edges means simpler graph and fewer failure points
@@ -324,6 +334,7 @@ A common mistake is splitting work into too many small single-purpose nodes. Eac
### MCP Tools - Correct Usage
**MCP tools OK for:**
- `test_node` — Validate node configuration with mock inputs
- `validate_graph` — Check graph structure
- `configure_loop` — Set event loop parameters
@@ -356,7 +367,7 @@ When agent is complete, transition to testing phase:
### Pre-Testing Checklist
- [ ] Agent structure validates: `uv run python -m agent_name validate`
- [ ] All nodes defined in nodes/__init__.py
- [ ] All nodes defined in nodes/**init**.py
- [ ] All edges connect valid nodes with correct priorities
- [ ] Feedback edge targets have `max_node_visits > 1`
- [ ] Client-facing nodes have meaningful system prompts
@@ -364,10 +375,10 @@ When agent is complete, transition to testing phase:
## Related Skills
- **building-agents-core** — Fundamental concepts (node types, edges, event loop architecture)
- **building-agents-construction** — Step-by-step building process
- **testing-agent** — Test and validate agents
- **agent-workflow** — Complete workflow orchestrator
- **hive-concepts** — Fundamental concepts (node types, edges, event loop architecture)
- **hive-create** — Step-by-step building process
- **hive-test** — Test and validate agents
- **hive** — Complete workflow orchestrator
---
@@ -1,11 +1,11 @@
---
name: testing-agent
name: hive-test
description: Run goal-based evaluation tests for agents. Use when you need to verify an agent meets its goals, debug failing tests, or iterate on agent improvements based on test results.
---
# Testing Workflow
This skill provides tools for testing agents built with the building-agents skill.
This skill provides tools for testing agents built with the hive-create skill.
## Workflow Overview
@@ -61,7 +61,7 @@ mcp__agent-builder__debug_test(
# Testing Agents with MCP Tools
Run goal-based evaluation tests for agents built with the building-agents skill.
Run goal-based evaluation tests for agents built with the hive-create skill.
**Key Principle: MCP tools provide guidelines, Claude writes tests directly**
- ✅ Get guidelines: `generate_constraint_tests`, `generate_success_tests` → returns templates and guidelines
@@ -279,7 +279,7 @@ if missing_creds:
```
┌─────────────────────────────────────────────────────────────────────────┐
│ GOAL STAGE │
│ (building-agents skill) │
│ (hive-create skill) │
│ │
│ 1. User defines goal with success_criteria and constraints │
│ 2. Goal written to agent.py immediately │
@@ -289,7 +289,7 @@ if missing_creds:
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT STAGE │
│ (building-agents skill) │
│ (hive-create skill) │
│ │
│ Build nodes + edges, written immediately to files │
│ Constraint tests can run during development: │
@@ -608,7 +608,7 @@ Edit(
)
# 4. May need to regenerate agent nodes if goal changed significantly
# This requires going back to building-agents skill
# This requires going back to hive-create skill
```
#### EDGE_CASE → Add Test and Fix
@@ -1027,17 +1027,17 @@ async def test_client_facing_node(mock_mode):
assert result.success or result.paused_at is not None
```
## Integration with building-agents
## Integration with hive-create
### Handoff Points
| Scenario | From | To | Action |
|----------|------|-----|--------|
| Agent built, ready to test | building-agents | testing-agent | Generate success tests |
| LOGIC_ERROR found | testing-agent | building-agents | Update goal, rebuild |
| IMPLEMENTATION_ERROR found | testing-agent | Direct fix | Edit agent files, re-run tests |
| EDGE_CASE found | testing-agent | testing-agent | Add edge case test |
| All tests pass | testing-agent | Done | Agent validated ✅ |
| Agent built, ready to test | hive-create | hive-test | Generate success tests |
| LOGIC_ERROR found | hive-test | hive-create | Update goal, rebuild |
| IMPLEMENTATION_ERROR found | hive-test | Direct fix | Edit agent files, re-run tests |
| EDGE_CASE found | hive-test | hive-test | Add edge case test |
| All tests pass | hive-test | Done | Agent validated ✅ |
### Iteration Speed Comparison
@@ -4,7 +4,7 @@ This example walks through testing a YouTube research agent that finds relevant
## Prerequisites
- Agent built with building-agents skill at `exports/youtube-research/`
- Agent built with hive-create skill at `exports/youtube-research/`
- Goal defined with success criteria and constraints
## Step 1: Load the Goal
@@ -283,11 +283,11 @@ result = debug_test(
Since this is an **IMPLEMENTATION_ERROR**, we:
1. **Don't restart** the Goal → Agent → Eval flow
2. **Fix the agent** using building-agents skill:
2. **Fix the agent** using hive-create skill:
- Modify `filter_node` to handle null results
3. **Re-run Eval** (tests only)
### Fix in building-agents:
### Fix in hive-create:
```python
# Update the filter_node to handle null
@@ -1,32 +1,53 @@
---
name: agent-workflow
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates building-agents-* and testing-agent skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
name: hive
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates hive-* skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: workflow-orchestrator
orchestrates:
- building-agents-core
- building-agents-construction
- building-agents-patterns
- testing-agent
- setup-credentials
- hive-concepts
- hive-create
- hive-patterns
- hive-test
- hive-credentials
- hive-debugger
---
# Agent Development Workflow
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT explore the codebase or read source files. ROUTE to the correct skill IMMEDIATELY.**
When this skill is loaded, **ALWAYS use the AskUserQuestion tool** to present options:
```
Use AskUserQuestion with these options:
- "Build a new agent" → Then invoke /hive-create
- "Test an existing agent" → Then invoke /hive-test
- "Learn agent concepts" → Then invoke /hive-concepts
- "Optimize agent design" → Then invoke /hive-patterns
- "Set up credentials" → Then invoke /hive-credentials
- "Debug a failing agent" → Then invoke /hive-debugger
- "Other" (please describe what you want to achieve)
```
**DO NOT:** Read source files, explore the codebase, search for code, or do any investigation before routing. The sub-skills handle all of that.
---
Complete Standard Operating Procedure (SOP) for building production-ready goal-driven agents.
## Overview
This workflow orchestrates specialized skills to take you from initial concept to production-ready agent:
1. **Understand Concepts** → `/building-agents-core` (optional)
2. **Build Structure** → `/building-agents-construction`
3. **Optimize Design** → `/building-agents-patterns` (optional)
4. **Setup Credentials** → `/setup-credentials` (if agent uses tools requiring API keys)
5. **Test & Validate** → `/testing-agent`
1. **Understand Concepts** → `/hive-concepts` (optional)
2. **Build Structure** → `/hive-create`
3. **Optimize Design** → `/hive-patterns` (optional)
4. **Setup Credentials** → `/hive-credentials` (if agent uses tools requiring API keys)
5. **Test & Validate** → `/hive-test`
6. **Debug Issues** → `/hive-debugger` (if agent fails at runtime)
## When to Use This Workflow
@@ -37,26 +58,26 @@ Use this meta-skill when:
- Want consistent, repeatable agent builds
**Skip this workflow** if:
- You only need to test an existing agent → use `/testing-agent` directly
- You only need to test an existing agent → use `/hive-test` directly
- You know exactly which phase you're in → use specific skill directly
## Quick Decision Tree
```
"Need to understand agent concepts" → building-agents-core
"Build a new agent" → building-agents-construction
"Optimize my agent design" → building-agents-patterns
"Need client-facing nodes or feedback loops" → building-agents-patterns
"Set up API keys for my agent" → setup-credentials
"Test my agent" → testing-agent
"Need to understand agent concepts" → hive-concepts
"Build a new agent" → hive-create
"Optimize my agent design" → hive-patterns
"Need client-facing nodes or feedback loops" → hive-patterns
"Set up API keys for my agent" → hive-credentials
"Test my agent" → hive-test
"My agent is failing/stuck/has errors" → hive-debugger
"Not sure what I need" → Read phases below, then decide
"Agent has structure but needs implementation" → See agent directory STATUS.md
```
## Phase 0: Understand Concepts (Optional)
**Duration**: 5-10 minutes
**Skill**: `/building-agents-core`
**Skill**: `/hive-concepts`
**Input**: Questions about agent architecture
### When to Use
@@ -77,9 +98,8 @@ Use this meta-skill when:
## Phase 1: Build Agent Structure
**Duration**: 15-30 minutes
**Skill**: `/building-agents-construction`
**Input**: User requirements ("Build an agent that...")
**Skill**: `/hive-create`
**Input**: User requirements ("Build an agent that...") or a template to start from
### What This Phase Does
@@ -121,7 +141,7 @@ You're ready for Phase 2 when:
### Common Outputs
The building-agents-construction skill produces:
The hive-create skill produces:
```
exports/agent_name/
├── __init__.py (package exports)
@@ -141,15 +161,14 @@ exports/agent_name/
→ You may need to add Python functions or MCP tools (not covered by current skills)
**If want to optimize design:**
→ Proceed to Phase 1.5 (building-agents-patterns)
→ Proceed to Phase 1.5 (hive-patterns)
**If ready to test:**
→ Proceed to Phase 2
## Phase 1.5: Optimize Design (Optional)
**Duration**: 10-15 minutes
**Skill**: `/building-agents-patterns`
**Skill**: `/hive-patterns`
**Input**: Completed agent structure
### When to Use
@@ -173,22 +192,21 @@ exports/agent_name/
## Phase 2: Test & Validate
**Duration**: 20-40 minutes
**Skill**: `/testing-agent`
**Skill**: `/hive-test`
**Input**: Working agent from Phase 1
### What This Phase Does
Creates comprehensive test suite:
- Constraint tests (verify hard requirements)
- Success criteria tests (measure goal achievement)
- Edge case tests (handle failures gracefully)
- Integration tests (end-to-end workflows)
Guides the creation and execution of a comprehensive test suite:
- Constraint tests
- Success criteria tests
- Edge case tests
- Integration tests
### Process
1. **Analyze agent** - Read goal, constraints, success criteria
2. **Generate tests** - Create pytest files in `exports/agent_name/tests/`
2. **Generate tests** - The calling agent writes pytest files in `exports/agent_name/tests/` using hive-test guidelines and templates
3. **User approval** - Review and approve each test
4. **Run evaluation** - Execute tests and collect results
5. **Debug failures** - Identify and fix issues
@@ -251,9 +269,9 @@ You're done when:
```
User: "Build an agent that monitors files"
→ Use /building-agents-construction
→ Use /hive-create
→ Agent structure created
→ Use /testing-agent
→ Use /hive-test
→ Tests created and passing
→ Done: Production-ready agent
```
@@ -262,19 +280,32 @@ User: "Build an agent that monitors files"
```
User: "Build an agent (first time)"
→ Use /building-agents-core (understand concepts)
→ Use /building-agents-construction (build structure)
→ Use /building-agents-patterns (optimize design)
→ Use /testing-agent (validate)
→ Use /hive-concepts (understand concepts)
→ Use /hive-create (build structure)
→ Use /hive-patterns (optimize design)
→ Use /hive-test (validate)
→ Done: Production-ready agent
```
### Pattern 1c: Build from Template
```
User: "Build an agent based on the deep research template"
→ Use /hive-create
→ Select "From a template" path
→ Pick template, name new agent
→ Review/modify goal, nodes, graph
→ Agent exported with customizations
→ Use /hive-test
→ Done: Customized agent
```
### Pattern 2: Test Existing Agent
```
User: "Test my agent at exports/my_agent"
→ Skip Phase 1
→ Use /testing-agent directly
→ Use /hive-test directly
→ Tests created
→ Done: Validated agent
```
@@ -283,10 +314,10 @@ User: "Test my agent at exports/my_agent"
```
User: "Build an agent"
→ Use /building-agents-construction (Phase 1)
→ Use /hive-create (Phase 1)
→ Implementation needed (see STATUS.md)
→ [User implements functions]
→ Use /testing-agent (Phase 2)
→ Use /hive-test (Phase 2)
→ Tests reveal bugs
→ [Fix bugs manually]
→ Re-run tests
@@ -297,45 +328,57 @@ User: "Build an agent"
```
User: "Build an agent with human review and feedback loops"
→ Use /building-agents-core (learn event loop, client-facing nodes)
→ Use /building-agents-construction (build structure with feedback edges)
→ Use /building-agents-patterns (implement client-facing + feedback patterns)
→ Use /testing-agent (validate review flows and edge routing)
→ Use /hive-concepts (learn event loop, client-facing nodes)
→ Use /hive-create (build structure with feedback edges)
→ Use /hive-patterns (implement client-facing + feedback patterns)
→ Use /hive-test (validate review flows and edge routing)
→ Done: Agent with HITL checkpoints and review loops
```
## Skill Dependencies
```
agent-workflow (meta-skill)
hive (meta-skill)
├── building-agents-core (foundational)
├── hive-concepts (foundational)
│ ├── Architecture concepts (event loop, judges)
│ ├── Node types (event_loop, function)
│ ├── Edge routing and priority
│ ├── Tool discovery procedures
│ └── Workflow overview
├── building-agents-construction (procedural)
├── hive-create (procedural)
│ ├── Creates package structure
│ ├── Defines goal
│ ├── Adds nodes (event_loop, function)
│ ├── Connects edges with priority routing
│ ├── Finalizes agent class
│ └── Requires: building-agents-core
│ └── Requires: hive-concepts
├── building-agents-patterns (reference)
├── hive-patterns (reference)
│ ├── Client-facing interaction patterns
│ ├── Feedback edges and review loops
│ ├── Judge patterns (implicit, SchemaJudge)
│ ├── Fan-out/fan-in parallel execution
│ └── Context management and anti-patterns
└── testing-agent
    ├── Reads agent goal
    ├── Generates tests
    ├── Runs evaluation
    └── Reports results
├── hive-credentials (utility)
│   ├── Detects missing credentials
│   ├── Offers auth method choices (Aden OAuth, direct API key)
│   ├── Stores securely in ~/.hive/credentials
│   └── Validates with health checks
├── hive-test (validation)
│ ├── Reads agent goal
│ ├── Generates tests
│ ├── Runs evaluation
│ └── Reports results
└── hive-debugger (troubleshooting)
├── Monitors runtime logs (L1/L2/L3)
├── Identifies retry loops, tool failures
├── Categorizes issues (10 categories)
└── Provides fix recommendations
```
## Troubleshooting
@@ -351,7 +394,7 @@ agent-workflow (meta-skill)
- Check for STATUS.md or IMPLEMENTATION_GUIDE.md in agent directory
- Implementation may be needed (Python functions or MCP tools)
- This is expected - building-agents-construction creates structure, not implementation
- This is expected - hive-create creates structure, not implementation
- See implementation guide for completion options
### "Tests are failing"
@@ -359,9 +402,16 @@ agent-workflow (meta-skill)
- Review test output for specific failures
- Check agent goal and success criteria
- Verify constraints are met
- Use `/testing-agent` to debug and iterate
- Use `/hive-test` to debug and iterate
- Fix agent code and re-run tests
### "Agent is failing at runtime"
- Use `/hive-debugger` to analyze runtime logs
- The debugger identifies retry loops, tool failures, and stalled execution
- Get actionable fix recommendations with code changes
- Monitor the agent in real-time during TUI sessions
### "Not sure which phase I'm in"
Run these checks:
@@ -420,10 +470,10 @@ You're done with the workflow when:
## Additional Resources
- **building-agents-core**: See `.claude/skills/building-agents-core/SKILL.md`
- **building-agents-construction**: See `.claude/skills/building-agents-construction/SKILL.md`
- **building-agents-patterns**: See `.claude/skills/building-agents-patterns/SKILL.md`
- **testing-agent**: See `.claude/skills/testing-agent/SKILL.md`
- **hive-concepts**: See `.claude/skills/hive-concepts/SKILL.md`
- **hive-create**: See `.claude/skills/hive-create/SKILL.md`
- **hive-patterns**: See `.claude/skills/hive-patterns/SKILL.md`
- **hive-test**: See `.claude/skills/hive-test/SKILL.md`
- **Agent framework docs**: See `core/README.md`
- **Example agents**: See `exports/` directory
@@ -431,36 +481,46 @@ You're done with the workflow when:
This workflow provides a proven path from concept to production-ready agent:
1. **Learn** with `/building-agents-core` → Understand fundamentals (optional)
2. **Build** with `/building-agents-construction` → Get validated structure
3. **Optimize** with `/building-agents-patterns` → Apply best practices (optional)
4. **Test** with `/testing-agent` → Get verified functionality
1. **Learn** with `/hive-concepts` → Understand fundamentals (optional)
2. **Build** with `/hive-create` → Get validated structure
3. **Optimize** with `/hive-patterns` → Apply best practices (optional)
4. **Configure** with `/hive-credentials` → Set up API keys (if needed)
5. **Test** with `/hive-test` → Get verified functionality
6. **Debug** with `/hive-debugger` → Fix runtime issues (if needed)
The workflow is **flexible** - skip phases as needed, iterate freely, and adapt to your specific requirements. The goal is **production-ready agents** built with **consistent, repeatable processes**.
## Skill Selection Guide
**Choose building-agents-core when:**
**Choose hive-concepts when:**
- First time building agents
- Need to understand event loop architecture
- Validating tool availability
- Learning about node types, edges, and judges
**Choose building-agents-construction when:**
**Choose hive-create when:**
- Actually building an agent
- Have clear requirements
- Ready to write code
- Want step-by-step guidance
- Want to start from an existing template and customize it
**Choose building-agents-patterns when:**
**Choose hive-patterns when:**
- Agent structure complete
- Need client-facing nodes or feedback edges
- Implementing review loops or fan-out/fan-in
- Want judge patterns or context management
- Want best practices
**Choose testing-agent when:**
**Choose hive-test when:**
- Agent structure complete
- Ready to validate functionality
- Need comprehensive test coverage
- Testing feedback loops, output keys, or fan-out
**Choose hive-debugger when:**
- Agent is failing or stuck at runtime
- Seeing retry loops or escalations
- Tool calls are failing
- Need to understand why a node isn't completing
- Want real-time monitoring of agent execution
@@ -1,6 +1,6 @@
# Example: File Monitor Agent
This example shows the complete agent-workflow in action for building a file monitoring agent.
This example shows the complete /hive workflow in action for building a file monitoring agent.
## Initial Request
@@ -12,7 +12,7 @@ User: "Build an agent that monitors ~/Downloads and copies new files to ~/Docume
### Step 1: Create Structure
Agent invokes `/building-agents` skill and:
Agent invokes `/hive-create` skill and:
1. Creates `exports/file_monitor_agent/` package
2. Writes skeleton files (__init__.py, __main__.py, agent.py, etc.)
@@ -107,7 +107,7 @@ exports/file_monitor_agent/
### Step 1: Analyze Agent
Agent invokes `/testing-agent` skill and:
Agent invokes `/hive-test` skill and:
1. Reads goal from `exports/file_monitor_agent/agent.py`
2. Identifies 4 success criteria to test
@@ -1 +0,0 @@
../../.claude/skills/agent-workflow
@@ -1 +0,0 @@
../../.claude/skills/building-agents-construction
@@ -1 +0,0 @@
../../.claude/skills/building-agents-core
@@ -1 +0,0 @@
../../.claude/skills/building-agents-patterns
@@ -0,0 +1 @@
../../.claude/skills/hive
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
@@ -0,0 +1 @@
../../.claude/skills/hive-create
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
@@ -0,0 +1 @@
../../.claude/skills/hive-test
@@ -1 +0,0 @@
../../.claude/skills/testing-agent
@@ -74,3 +74,4 @@ exports/*
docs/github-issues/*
core/tests/*dumps/*
screenshots/*
@@ -1,41 +0,0 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Initial project structure
- React frontend (honeycomb) with Vite and TypeScript
- Node.js backend (hive) with Express and TypeScript
- Docker Compose configuration for local development
- Configuration system via `config.yaml`
- GitHub Actions CI/CD workflows
- Comprehensive documentation
### Changed
- N/A
### Deprecated
- N/A
### Removed
- N/A
### Fixed
- tools: Fixed web_scrape tool attempting to parse non-HTML content (PDF, JSON) as HTML (#487)
### Security
- N/A
## [0.1.0] - 2025-01-13
### Added
- Initial release
[Unreleased]: https://github.com/adenhq/hive/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/adenhq/hive/releases/tag/v0.1.0
@@ -1,10 +1,10 @@
# Contributing to Aden Agent Framework
Thank you for your interest in contributing to the Aden Agent Framework! This document provides guidelines and information for contributors. We're especially looking for help building tools, integrations([check #2805](https://github.com/adenhq/hive/issues/2805)), and example agents for the framework. If you're interested in extending its functionality, this is the perfect place to start.
Thank you for your interest in contributing to the Aden Agent Framework! This document provides guidelines and information for contributors. We're especially looking for help building tools, integrations ([check #2805](https://github.com/adenhq/hive/issues/2805)), and example agents for the framework. If you're interested in extending its functionality, this is the perfect place to start.
## Code of Conduct
By participating in this project, you agree to abide by our [Code of Conduct](CODE_OF_CONDUCT.md).
By participating in this project, you agree to abide by our [Code of Conduct](docs/CODE_OF_CONDUCT.md).
## Issue Assignment Policy
@@ -35,9 +35,16 @@ You may submit PRs without prior assignment for:
1. Fork the repository
2. Clone your fork: `git clone https://github.com/YOUR_USERNAME/hive.git`
3. Create a feature branch: `git checkout -b feature/your-feature-name`
4. Make your changes
5. Run checks and tests:
3. Add the upstream repository: `git remote add upstream https://github.com/adenhq/hive.git`
4. Sync with upstream to ensure you're starting from the latest code:
```bash
git fetch upstream
git checkout main
git merge upstream/main
```
5. Create a feature branch: `git checkout -b feature/your-feature-name`
6. Make your changes
7. Run checks and tests:
```bash
make check # Lint and format checks (ruff check + ruff format --check on core/ and tools/)
make test # Core tests (cd core && pytest tests/ -v)
@@ -152,4 +159,4 @@ By submitting a Pull Request, you agree that your contributions will be licensed
Feel free to open an issue for questions or join our [Discord community](https://discord.com/invite/MXE49hrKDk).
Thank you for contributing!
Thank you for contributing!
@@ -4,9 +4,11 @@ help: ## Show this help
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-15s\033[0m %s\n", $$1, $$2}'
lint: ## Run ruff linter (with auto-fix)
lint: ## Run ruff linter and formatter (with auto-fix)
cd core && ruff check --fix .
cd tools && ruff check --fix .
cd core && ruff format .
cd tools && ruff format .
format: ## Run ruff formatter
cd core && ruff format .
@@ -1,47 +0,0 @@
## Summary
- **Added HubSpot integration** — new HubSpot MCP tool with search, get, create, and update operations for contacts, companies, and deals. Includes OAuth2 provider for HubSpot credentials and credential store adapter for the tools layer.
- **Replaced web_scrape tool with Playwright + stealth** — swapped httpx/BeautifulSoup for a headless Chromium browser using `playwright` (async API) and `playwright-stealth`, enabling JS-rendered page scraping and bot detection evasion
- **Added empty response retry logic** — LLM provider now detects empty responses (e.g. Gemini returning 200 with no content on rate limit) and retries with exponential backoff, preventing hallucinated output from the cleanup LLM
- **Added context-aware input compaction** — LLM nodes now estimate input token count before calling the model and progressively truncate the largest values if they exceed the context window budget
- **Increased rate limit retries to 10** with verbose `[retry]` and `[compaction]` logging that includes model name, finish reason, and attempt count
- **Interactive quickstart onboarding** — `quickstart.sh` rewritten as bee-themed interactive wizard that detects existing API keys (including Claude Code subscription), lets user pick ONE default LLM provider, and saves configuration to `~/.hive/configuration.json`
- **Fixed lint errors** across `hubspot_tool.py` (line length) and `agent_builder_server.py` (unused variable)
## Changed files
### HubSpot Integration
- `tools/src/aden_tools/tools/hubspot_tool/` — New MCP tool: contacts, companies, and deals CRUD
- `tools/src/aden_tools/tools/__init__.py` — Registered HubSpot tools
- `tools/src/aden_tools/credentials/integrations.py` — HubSpot credential integration
- `tools/src/aden_tools/credentials/__init__.py` — Updated credential exports
- `core/framework/credentials/oauth2/hubspot_provider.py` — HubSpot OAuth2 provider
- `core/framework/credentials/oauth2/__init__.py` — Registered HubSpot OAuth2 provider
- `core/framework/runner/runner.py` — Updated runner for credential support
### Web Scrape Rewrite
- `tools/src/aden_tools/tools/web_scrape_tool/web_scrape_tool.py` — Playwright async rewrite
- `tools/src/aden_tools/tools/web_scrape_tool/README.md` — Updated docs
- `tools/pyproject.toml` — Added `playwright`, `playwright-stealth` deps
- `tools/Dockerfile` — Added `playwright install chromium --with-deps`
### LLM Reliability
- `core/framework/llm/litellm.py` — Empty response retry + max retries 10 + verbose logging
- `core/framework/graph/node.py` — Input compaction via `_compact_inputs()`, `_estimate_tokens()`, `_get_context_limit()`
### Quickstart & Setup
- `quickstart.sh` — Interactive bee-themed onboarding wizard with single provider selection
- `~/.hive/configuration.json` — New user config file for default LLM provider/model
### Fixes
- `core/framework/mcp/agent_builder_server.py` — Removed unused variable
- `tools/src/aden_tools/tools/hubspot_tool/hubspot_tool.py` — Fixed E501 line length violations
## Test plan
- [ ] Run `make lint` — passes clean
- [ ] Run `./quickstart.sh` and verify interactive flow works, config saved to `~/.hive/configuration.json`
- [ ] Run `pytest tests/tools/test_web_scrape_tool.py -v`
- [ ] Run agent against a JS-heavy site and verify `web_scrape` returns rendered content
- [ ] Set `HUBSPOT_ACCESS_TOKEN` and verify HubSpot tool CRUD operations work
- [ ] Trigger rate limit and verify `[retry]` logs appear with correct attempt counts
- [ ] Run agent with large inputs and verify `[compaction]` logs show truncation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
@@ -1,5 +1,5 @@
<p align="center">
<img width="100%" alt="Hive Banner" src="https://storage.googleapis.com/aden-prod-assets/website/aden-title-card.png" />
<img width="100%" alt="Hive Banner" src="https://github.com/user-attachments/assets/a027429b-5d3c-4d34-88e4-0feaeaabbab3" />
</p>
<p align="center">
@@ -13,16 +13,20 @@
<a href="docs/i18n/ko.md">한국어</a>
</p>
[![Apache 2.0 License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/adenhq/hive/blob/main/LICENSE)
[![Y Combinator](https://img.shields.io/badge/Y%20Combinator-Aden-orange)](https://www.ycombinator.com/companies/aden)
[![Discord](https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb)](https://discord.com/invite/MXE49hrKDk)
[![Twitter Follow](https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5)](https://x.com/aden_hq)
[![LinkedIn](https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff)](https://www.linkedin.com/company/teamaden/)
<p align="center">
<a href="https://github.com/adenhq/hive/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache 2.0 License" /></a>
<a href="https://www.ycombinator.com/companies/aden"><img src="https://img.shields.io/badge/Y%20Combinator-Aden-orange" alt="Y Combinator" /></a>
<a href="https://discord.com/invite/MXE49hrKDk"><img src="https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb" alt="Discord" /></a>
<a href="https://x.com/aden_hq"><img src="https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5" alt="Twitter Follow" /></a>
<a href="https://www.linkedin.com/company/teamaden/"><img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff" alt="LinkedIn" /></a>
<img src="https://img.shields.io/badge/MCP-102_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
<p align="center">
<img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
<img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
<img src="https://img.shields.io/badge/Goal--Driven-Development-purple?style=flat-square" alt="Goal-Driven" />
<img src="https://img.shields.io/badge/Headless-Development-purple?style=flat-square" alt="Headless" />
<img src="https://img.shields.io/badge/Human--in--the--Loop-orange?style=flat-square" alt="HITL" />
<img src="https://img.shields.io/badge/Production--Ready-red?style=flat-square" alt="Production" />
</p>
@@ -30,15 +34,16 @@
<img src="https://img.shields.io/badge/OpenAI-supported-412991?style=flat-square&logo=openai" alt="OpenAI" />
<img src="https://img.shields.io/badge/Anthropic-supported-d4a574?style=flat-square" alt="Anthropic" />
<img src="https://img.shields.io/badge/Google_Gemini-supported-4285F4?style=flat-square&logo=google" alt="Gemini" />
<img src="https://img.shields.io/badge/MCP-19_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
## Overview
Build reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Build autonomous, reliable, self-improving AI agents without hardcoding workflows. Define your goal through conversation with a coding agent, and the framework generates a node graph with dynamically created connection code. When things break, the framework captures failure data, evolves the agent through the coding agent, and redeploys. Built-in human-in-the-loop nodes, credential management, and real-time monitoring give you control without sacrificing adaptability.
Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.
https://github.com/user-attachments/assets/846c0cc7-ffd6-47fa-b4b7-495494857a55
## Who Is Hive For?
Hive is designed for developers and teams who want to build **production-grade AI agents** without manually wiring complex workflows.
@@ -58,37 +63,23 @@ Hive may not be the best fit if you're only experimenting with simple agent ch
Use Hive when you need:
- Long-running, autonomous agents
- Multi-agent coordination
- Strong guardrails, process, and controls
- Continuous improvement based on failures
- Strong monitoring, safety, and budget controls
- Multi-agent coordination
- A framework that evolves with your goals
## What is Aden
<p align="center">
<img width="100%" alt="Aden Architecture" src="docs/assets/aden-architecture-diagram.jpg" />
</p>
Aden is a platform for building, deploying, operating, and adapting AI agents:
- **Build** - A Coding Agent generates specialized Worker Agents (Sales, Marketing, Ops) from natural language goals
- **Deploy** - Headless deployment with CI/CD integration and full API lifecycle management
- **Operate** - Real-time monitoring, observability, and runtime guardrails keep agents reliable
- **Adapt** - Continuous evaluation, supervision, and adaptation ensure agents improve over time
- **Infra** - Shared memory, LLM integrations, tools, and skills power every agent
## Quick Links
- **[Documentation](https://docs.adenhq.com/)** - Complete guides and API reference
- **[Self-Hosting Guide](https://docs.adenhq.com/getting-started/quickstart)** - Deploy Hive on your infrastructure
- **[Changelog](https://github.com/adenhq/hive/releases)** - Latest updates and releases
<!-- - **[Roadmap](https://adenhq.com/roadmap)** - Upcoming features and plans -->
- **[Roadmap](docs/roadmap.md)** - Upcoming features and plans
- **[Report Issues](https://github.com/adenhq/hive/issues)** - Bug reports and feature requests
- **[Contributing](CONTRIBUTING.md)** - How to contribute and submit PRs
## Quick Start
## Prerequisites
### Prerequisites
- Python 3.11+ for agent development
- Claude Code or Cursor for utilizing agent skills
@@ -107,45 +98,53 @@ cd hive
```
This sets up:
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
- All required Python dependencies
- **credential store** - Encrypted API key storage (`~/.hive/credentials`)
- **LLM provider** - Interactive default model configuration
- All required Python dependencies with `uv`
### Build Your First Agent
```bash
# Build an agent using Claude Code
claude> /building-agents-construction
claude> /hive
# Test your agent
claude> /testing-agent
claude> /hive-debugger
# Run your agent
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
# (at separate terminal) Launch the interactive dashboard
hive tui
# Or run directly
hive run exports/your_agent_name --input '{"key": "value"}'
```
**[📖 Complete Setup Guide](ENVIRONMENT_SETUP.md)** - Detailed instructions for agent development
### Cursor IDE Support
Skills are also available in Cursor. To enable:
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
2. Run `MCP: Enable` to enable MCP servers
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
4. Type `/` in Agent chat and search for skills (e.g., `/building-agents-construction`)
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
## Features
- **Goal-Driven Development** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **Adaptiveness** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **Dynamic Node Connections** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **Human-in-the-Loop** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Interactive TUI Dashboard** - Terminal-based dashboard with live graph view, event log, and chat interface for agent interaction
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
## Integration
<a href="https://github.com/adenhq/hive/tree/main/tools/src/aden_tools/tools"><img width="100%" alt="Integration" src="https://github.com/user-attachments/assets/a1573f93-cf02-4bb8-b3d5-b305b05b1e51" /></a>
Hive is built to be model-agnostic and system-agnostic.
- **LLM flexibility** - Hive Framework is designed to support various types of LLMs, including hosted and local models through LiteLLM-compatible providers.
- **Business system connectivity** - Hive Framework is designed to connect to all kinds of business systems as tools, such as CRM, support, messaging, data, file, and internal APIs via MCP.
## Why Aden
Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
@@ -182,67 +181,60 @@ flowchart LR
style V6 fill:#fff,stroke:#ed8c00,stroke-width:1px,color:#cc5d00
```
### The Aden Advantage
### The Hive Advantage
| Traditional Frameworks | Aden |
| Traditional Frameworks | Hive |
| -------------------------- | -------------------------------------- |
| Hardcode agent workflows | Describe goals in natural language |
| Manual graph definition | Auto-generated agent graphs |
| Reactive error handling | Outcome-evaluation and adaptiveness |
| Reactive error handling | Outcome-evaluation and adaptiveness |
| Static tool configurations | Dynamic SDK-wrapped nodes |
| Separate monitoring setup | Built-in real-time observability |
| DIY budget management | Integrated cost controls & degradation |
### How It Works
1. **Define Your Goal** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the agent graph, connection code, and test cases
3. **Workers Execute** → SDK-wrapped nodes run with full observability and tool access
1. **[Define Your Goal](docs/key_concepts/goals_outcome.md)** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the [agent graph](docs/key_concepts/graph.md), connection code, and test cases
3. **[Workers Execute](docs/key_concepts/worker_agent.md)** → SDK-wrapped nodes run with full observability and tool access
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **Adaptiveness** → On failure, the system evolves the graph and redeploys automatically
5. **[Adaptiveness](docs/key_concepts/evolution.md)** → On failure, the system evolves the graph and redeploys automatically
## Run pre-built Agents (Coming Soon)
## Run Agents
### Run a sample agent
Aden Hive provides a list of featured agents that you can use and build on top of.
### Run an agent shared by others
Put the agent in `exports/` and run `PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'`
For building and running goal-driven agents with the framework:
The `hive` CLI is the primary interface for running agents.
```bash
# One-time setup
./quickstart.sh
# Browse and run agents interactively (Recommended)
hive tui
# This sets up:
# - framework package (core runtime)
# - aden_tools package (MCP tools)
# - All Python dependencies
# Run a specific agent directly
hive run exports/my_agent --input '{"task": "Your input here"}'
# Build new agents using Claude Code skills
claude> /building-agents-construction
# Run a specific agent with the TUI dashboard
hive run exports/my_agent --tui
# Test agents
claude> /testing-agent
# Run agents
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
# Interactive REPL
hive shell
```
See [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) for complete setup instructions.
The TUI scans both `exports/` and `examples/templates/` for available agents.
> **Using Python directly (alternative):** You can also run agents with `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
See [environment-setup.md](docs/environment-setup.md) for complete setup instructions.
## Documentation
- **[Developer Guide](DEVELOPER.md)** - Comprehensive guide for developers
- **[Developer Guide](docs/developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [TUI Guide](docs/tui-selection-guide.md) - Interactive dashboard usage
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
## Roadmap
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [ROADMAP.md](ROADMAP.md) for details.
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [roadmap.md](docs/roadmap.md) for details.
```mermaid
flowchart TD
@@ -332,11 +324,12 @@ end
classDef done fill:#9e9e9e,color:#fff,stroke:#757575
```
## Contributing
We welcome contributions from the community! We're especially looking for help building tools, integrations, and example agents for the framework ([check #2805](https://github.com/adenhq/hive/issues/2805)). If you're interested in extending its functionality, this is the perfect place to start. Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
1. Find or create an issue and get assigned
2. Fork the repository
@@ -369,10 +362,6 @@ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENS
## Frequently Asked Questions (FAQ)
**Q: Does Hive depend on LangChain or other agent frameworks?**
No. Hive is built from the ground up with no dependencies on LangChain, CrewAI, or other agent frameworks. The framework is designed to be lean and flexible, generating agent graphs dynamically rather than relying on predefined components.
**Q: What LLM providers does Hive support?**
Hive supports 100+ LLM providers through LiteLLM integration, including OpenAI (GPT-4, GPT-4o), Anthropic (Claude models), Google Gemini, DeepSeek, Mistral, Groq, and many more. Simply set the appropriate API key environment variable and specify the model name.
@@ -383,37 +372,25 @@ Yes! Hive supports local models through LiteLLM. Simply use the model name forma
**Q: What makes Hive different from other agent frameworks?**
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys. This self-improving loop is unique to Aden.
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, [evolves the agent graph](docs/key_concepts/evolution.md), and redeploys. This self-improving loop is unique to Aden.
**Q: Is Hive open-source?**
Yes, Hive is fully open-source under the Apache License 2.0. We actively encourage community contributions and collaboration.
**Q: Does Hive collect data from users?**
Hive collects telemetry data for monitoring and observability purposes, including token usage, latency metrics, and cost tracking. Content capture (prompts and responses) is configurable and stored with team-scoped data isolation. All data stays within your infrastructure when self-hosted.
**Q: What deployment options does Hive support?**
Hive supports self-hosted deployments via Python packages. See the [Environment Setup Guide](ENVIRONMENT_SETUP.md) for installation instructions. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.
**Q: Can Hive handle complex, production-scale use cases?**
Yes. Hive is explicitly designed for production environments with features like automatic failure recovery, real-time observability, cost controls, and horizontal scaling support. The framework handles both simple automations and complex multi-agent workflows.
**Q: Does Hive support human-in-the-loop workflows?**
Yes, Hive fully supports human-in-the-loop workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What monitoring and debugging tools does Hive provide?**
Hive includes comprehensive observability features: real-time WebSocket streaming for live agent execution monitoring, TimescaleDB-powered analytics for cost and performance metrics, health check endpoints for Kubernetes integration, and MCP tools for agent execution, including file operations, web search, data processing, and more.
**Q: What programming languages does Hive support?**
The Hive framework is built in Python. A JavaScript/TypeScript SDK is on the roadmap.
**Q: Can Hive agents interact with external tools and APIs?**
Yes. Aden's SDK-wrapped nodes provide built-in tool access, and the framework supports flexible tool ecosystems. Agents can integrate with external APIs, databases, and services through the node architecture.
@@ -423,7 +400,7 @@ Hive provides granular budget controls including spending limits, throttles, and
**Q: Where can I find examples and documentation?**
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [developer guide](docs/developer-guide.md).
**Q: How can I contribute to Aden?**
@@ -437,10 +414,6 @@ Aden's adaptation loop begins working from the first execution. When an agent fa
Hive focuses on generating agents that run real business processes, rather than generic agents. This vision emphasizes outcome-driven design, adaptability, and an easy-to-use set of tools and integrations.
**Q: Does Aden offer enterprise support?**
For enterprise inquiries, contact the Aden team through [adenhq.com](https://adenhq.com) or join our [Discord community](https://discord.com/invite/MXE49hrKDk) for support and discussions.
---
<p align="center">
+1
View File
@@ -1,4 +1,5 @@
exports/
docs/
.agent-builder-sessions/
.pytest_cache/
**/__pycache__/
+1 -1
View File
@@ -145,7 +145,7 @@ uv run python -m framework test-debug <agent_path> <test_name>
uv run python -m framework test-list <goal_id>
```
For detailed testing workflows, see the [hive-test skill](../.claude/skills/hive-test/SKILL.md).
### Analyzing Agent Behavior with Builder
+2 -2
View File
@@ -4,8 +4,8 @@
"name": "tools",
"description": "Aden tools including web search, file operations, and PDF reading",
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../tools",
"env": {
"BRAVE_SEARCH_API_KEY": "${BRAVE_SEARCH_API_KEY}"
+7
View File
@@ -44,6 +44,13 @@ def _configure_paths():
if exports_str not in sys.path:
sys.path.insert(0, exports_str)
# Add examples/templates/ to sys.path so template agents are importable
templates_dir = project_root / "examples" / "templates"
if templates_dir.is_dir():
templates_str = str(templates_dir)
if templates_str not in sys.path:
sys.path.insert(0, templates_str)
# Ensure core/ is also in sys.path (for non-editable-install scenarios)
core_str = str(project_root / "core")
if (project_root / "core").is_dir() and core_str not in sys.path:
+64
View File
@@ -0,0 +1,64 @@
"""Shared Hive configuration utilities.
Centralises reading of ~/.hive/configuration.json so that the runner
and every agent template share one implementation instead of copy-pasting
helper functions.
"""
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from framework.graph.edge import DEFAULT_MAX_TOKENS
# ---------------------------------------------------------------------------
# Low-level config file access
# ---------------------------------------------------------------------------
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
def get_hive_config() -> dict[str, Any]:
"""Load hive configuration from ~/.hive/configuration.json."""
if not HIVE_CONFIG_FILE.exists():
return {}
try:
with open(HIVE_CONFIG_FILE) as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return {}
# ---------------------------------------------------------------------------
# Derived helpers
# ---------------------------------------------------------------------------
def get_preferred_model() -> str:
"""Return the user's preferred LLM model string (e.g. 'anthropic/claude-sonnet-4-20250514')."""
llm = get_hive_config().get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
return "anthropic/claude-sonnet-4-20250514"
def get_max_tokens() -> int:
"""Return the configured max_tokens, falling back to DEFAULT_MAX_TOKENS."""
return get_hive_config().get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
# ---------------------------------------------------------------------------
# RuntimeConfig shared across agent templates
# ---------------------------------------------------------------------------
@dataclass
class RuntimeConfig:
"""Agent runtime configuration loaded from ~/.hive/configuration.json."""
model: str = field(default_factory=get_preferred_model)
temperature: float = 0.7
max_tokens: int = field(default_factory=get_max_tokens)
api_key: str | None = None
api_base: str | None = None
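# --- Usage sketch (illustration, not part of the diff) ---
# Assumes this module is importable as framework.config, matching the
# import used by GraphSpec._resolve_max_tokens below.
import json
from pathlib import Path
from framework.config import RuntimeConfig, get_preferred_model

cfg_path = Path.home() / ".hive" / "configuration.json"
cfg_path.parent.mkdir(exist_ok=True)
cfg_path.write_text(json.dumps(
    {"llm": {"provider": "anthropic", "model": "claude-sonnet-4-20250514", "max_tokens": 4096}}
))
print(get_preferred_model())  # -> "anthropic/claude-sonnet-4-20250514"
rc = RuntimeConfig()          # model and max_tokens resolve from the file
print(rc.model, rc.max_tokens)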
+19 -4
View File
@@ -143,19 +143,34 @@ class AdenCredentialResponse:
def from_dict(
cls, data: dict[str, Any], integration_id: str | None = None
) -> AdenCredentialResponse:
"""Create from API response dictionary."""
"""Create from API response dictionary or normalized credential dict."""
expires_at = None
if data.get("expires_at"):
expires_at = datetime.fromisoformat(data["expires_at"].replace("Z", "+00:00"))
resolved_integration_id = (
integration_id
or data.get("integration_id")
or data.get("alias")
or data.get("provider", "")
)
resolved_integration_type = data.get("integration_type") or data.get("provider", "")
metadata = data.get("metadata")
if metadata is None and data.get("email"):
metadata = {"email": data.get("email")}
if metadata is None:
metadata = {}
return cls(
integration_id=integration_id or data.get("alias", data.get("provider", "")),
integration_type=data.get("provider", ""),
integration_id=resolved_integration_id,
integration_type=resolved_integration_type,
access_token=data["access_token"],
token_type=data.get("token_type", "Bearer"),
expires_at=expires_at,
scopes=data.get("scopes", []),
metadata={"email": data.get("email")} if data.get("email") else {},
metadata=metadata,
)
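# --- Resolution-order sketch (illustration, not part of the diff; assumes
# AdenCredentialResponse from this module is in scope) ---
raw = {
    "provider": "google",
    "alias": "work-gmail",
    "integration_id": "gmail-primary",
    "access_token": "ya29.placeholder",
    "email": "dev@example.com",
}
cred = AdenCredentialResponse.from_dict(raw)
assert cred.integration_id == "gmail-primary"  # explicit id beats alias/provider
assert cred.integration_type == "google"       # no integration_type -> provider
assert cred.metadata == {"email": "dev@example.com"}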
+2 -1
View File
@@ -9,7 +9,7 @@ from framework.graph.client_io import (
from framework.graph.code_sandbox import CodeSandbox, safe_eval, safe_exec
from framework.graph.context_handoff import ContextHandoff, HandoffContext
from framework.graph.conversation import ConversationStore, Message, NodeConversation
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.edge import DEFAULT_MAX_TOKENS, EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.event_loop_node import (
EventLoopNode,
JudgeProtocol,
@@ -58,6 +58,7 @@ __all__ = [
"EdgeSpec",
"EdgeCondition",
"GraphSpec",
"DEFAULT_MAX_TOKENS",
# Executor (fixed graph)
"GraphExecutor",
# Plan (flexible execution)
+85
View File
@@ -0,0 +1,85 @@
"""
Checkpoint Configuration - Controls checkpoint behavior during execution.
"""
from dataclasses import dataclass
@dataclass
class CheckpointConfig:
"""
Configuration for checkpoint behavior during graph execution.
Controls when checkpoints are created, how they're stored,
and when they're pruned.
"""
# Enable/disable checkpointing
enabled: bool = True
# When to checkpoint
checkpoint_on_node_start: bool = True
checkpoint_on_node_complete: bool = True
# Pruning (time-based)
checkpoint_max_age_days: int = 7 # Prune checkpoints older than 1 week
prune_every_n_nodes: int = 10 # Check for pruning every N nodes
# Performance
async_checkpoint: bool = True # Don't block execution on checkpoint writes
# What to include in checkpoints
include_full_memory: bool = True
include_metrics: bool = True
def should_checkpoint_node_start(self) -> bool:
"""Check if should checkpoint before node execution."""
return self.enabled and self.checkpoint_on_node_start
def should_checkpoint_node_complete(self) -> bool:
"""Check if should checkpoint after node execution."""
return self.enabled and self.checkpoint_on_node_complete
def should_prune_checkpoints(self, nodes_executed: int) -> bool:
"""
Check if should prune checkpoints based on execution progress.
Args:
nodes_executed: Number of nodes executed so far
Returns:
True if should check for old checkpoints and prune them
"""
return (
self.enabled
and self.prune_every_n_nodes > 0
and nodes_executed % self.prune_every_n_nodes == 0
)
# Default configuration for most agents
DEFAULT_CHECKPOINT_CONFIG = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=True,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
prune_every_n_nodes=10,
async_checkpoint=True,
)
# Minimal configuration (only checkpoint at node completion)
MINIMAL_CHECKPOINT_CONFIG = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
prune_every_n_nodes=20,
async_checkpoint=True,
)
# Disabled configuration (no checkpointing)
DISABLED_CHECKPOINT_CONFIG = CheckpointConfig(
enabled=False,
)
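# --- Usage sketch (illustration, not part of the diff) ---
cfg = CheckpointConfig(checkpoint_on_node_start=False, prune_every_n_nodes=5)
assert cfg.should_checkpoint_node_start() is False
assert cfg.should_checkpoint_node_complete() is True
assert cfg.should_prune_checkpoints(10) is True   # 10 % 5 == 0
assert cfg.should_prune_checkpoints(12) is False
assert DISABLED_CHECKPOINT_CONFIG.should_checkpoint_node_complete() is False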
+41 -7
View File
@@ -24,10 +24,12 @@ given the current goal, context, and execution state.
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
from pydantic import BaseModel, Field, model_validator
from framework.graph.safe_eval import safe_eval
DEFAULT_MAX_TOKENS = 8192
class EdgeCondition(StrEnum):
"""When an edge should be traversed."""
@@ -156,6 +158,10 @@ class EdgeSpec(BaseModel):
memory: dict[str, Any],
) -> bool:
"""Evaluate a conditional expression."""
import logging
logger = logging.getLogger(__name__)
if not self.condition_expr:
return True
@@ -172,12 +178,24 @@ class EdgeSpec(BaseModel):
try:
# Safe evaluation using AST-based whitelist
return bool(safe_eval(self.condition_expr, context))
result = bool(safe_eval(self.condition_expr, context))
# Log the evaluation for visibility
# Extract the variable names used in the expression for debugging
expr_vars = {
k: repr(context[k])
for k in context
if k not in ("output", "memory", "result", "true", "false")
and k in self.condition_expr
}
logger.info(
" Edge %s: condition '%s'%s (vars: %s)",
self.id,
self.condition_expr,
result,
expr_vars or "none matched",
)
return result
except Exception as e:
# Log the error for debugging
import logging
logger = logging.getLogger(__name__)
logger.warning(f" ⚠ Condition evaluation failed: {self.condition_expr}")
logger.warning(f" Error: {e}")
logger.warning(f" Available context keys: {list(context.keys())}")
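# --- Evaluation sketch (illustration, not part of the diff) ---
# Because the executor now writes node outputs into shared memory before
# edge evaluation, expressions can reference keys directly (e.g. "score > 80").
from framework.graph.safe_eval import safe_eval  # import path as used above

context = {"score": 85, "output": {"score": 85}, "memory": {"score": 85}}
assert bool(safe_eval("score > 80", context)) is True
# The log line above would then read roughly:
#   Edge review_to_publish: condition 'score > 80' -> True (vars: {'score': '85'})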
@@ -408,7 +426,7 @@ class GraphSpec(BaseModel):
# Default LLM settings
default_model: str = "claude-haiku-4-5-20251001"
max_tokens: int = 1024
max_tokens: int = Field(default=None) # resolved by _resolve_max_tokens validator
# Cleanup LLM for JSON extraction fallback (fast/cheap model preferred)
# If not set, uses CEREBRAS_API_KEY -> cerebras/llama-3.3-70b or
@@ -419,12 +437,28 @@ class GraphSpec(BaseModel):
max_steps: int = Field(default=100, description="Maximum node executions before timeout")
max_retries_per_node: int = 3
# EventLoopNode configuration (from configure_loop)
loop_config: dict[str, Any] = Field(
default_factory=dict,
description="EventLoopNode configuration (max_iterations, max_tool_calls_per_turn, etc.)",
)
# Metadata
description: str = ""
created_by: str = "" # "human" or "builder_agent"
model_config = {"extra": "allow"}
@model_validator(mode="before")
@classmethod
def _resolve_max_tokens(cls, values: Any) -> Any:
"""Resolve max_tokens from the global config store when not explicitly set."""
if isinstance(values, dict) and values.get("max_tokens") is None:
from framework.config import get_max_tokens
values["max_tokens"] = get_max_tokens()
return values
def get_node(self, node_id: str) -> Any | None:
"""Get a node by ID."""
for node in self.nodes:
+544 -124
View File
@@ -74,6 +74,11 @@ class LoopConfig:
max_history_tokens: int = 32_000
store_prefix: str = ""
# Overflow margin for max_tool_calls_per_turn. Tool calls are only
# discarded when the count exceeds max_tool_calls_per_turn * (1 + margin).
# Default 0.5 means 50% wiggle room (e.g. limit=10 → hard cutoff at 15).
tool_call_overflow_margin: float = 0.5
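# Worked example (illustration, not part of the diff):
#   max_tool_calls_per_turn = 10, tool_call_overflow_margin = 0.5
#   hard_limit = int(10 * (1 + 0.5)) = 15
#   -> calls 1..15 execute this turn; calls 16+ are discarded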
# --- Tool result context management ---
# When a tool result exceeds this character count, it is truncated in the
# conversation context. If *spillover_dir* is set the full result is
@@ -144,7 +149,7 @@ class EventLoopNode(NodeProtocol):
1. Try to restore from durable state (crash recovery)
2. If no prior state, init from NodeSpec.system_prompt + input_keys
3. Loop: drain injection queue -> stream LLM -> execute tools
-> if client_facing + no real tools: block for user input
-> if client_facing + ask_user called: block for user input
-> judge evaluates (acceptance criteria)
(each add_* and set_output writes through to store immediately)
4. Publish events to EventBus at each stage
@@ -152,11 +157,11 @@ class EventLoopNode(NodeProtocol):
6. Terminate when judge returns ACCEPT, shutdown signaled, or max iterations
7. Build output dict from OutputAccumulator
Client-facing blocking: When ``client_facing=True`` and the LLM finishes
without real tool calls (stop_reason != tool_call), the node blocks via
``_await_user_input()`` until ``inject_event()`` or ``signal_shutdown()``
is called. After user input, the judge evaluates; the judge is the
sole mechanism for acceptance decisions.
Client-facing blocking: When ``client_facing=True``, a synthetic
``ask_user`` tool is injected. The node blocks for user input ONLY
when the LLM explicitly calls ``ask_user()``. Text-only turns
without ``ask_user`` flow through without blocking, allowing the LLM
to stream progress updates and summaries freely.
Always returns NodeResult with retryable=False semantics. The executor
must NOT retry event loop nodes -- retry is handled internally by the
@@ -205,9 +210,28 @@ class EventLoopNode(NodeProtocol):
stream_id = ctx.node_id
node_id = ctx.node_id
# Verdict counters for runtime logging
_accept_count = _retry_count = _escalate_count = _continue_count = 0
# 1. Guard: LLM required
if ctx.llm is None:
return NodeResult(success=False, error="LLM provider not available")
error_msg = "LLM provider not available"
# Log guard failure
if ctx.runtime_logger:
ctx.runtime_logger.log_node_complete(
node_id=node_id,
node_name=ctx.node_spec.name,
node_type="event_loop",
success=False,
error=error_msg,
exit_status="guard_failure",
total_steps=0,
tokens_used=0,
input_tokens=0,
output_tokens=0,
latency_ms=0,
)
return NodeResult(success=False, error=error_msg)
# 2. Restore or create new conversation + accumulator
conversation, accumulator, start_iteration = await self._restore(ctx)
@@ -228,11 +252,13 @@ class EventLoopNode(NodeProtocol):
if initial_message:
await conversation.add_user_message(initial_message)
# 3. Build tool list: node tools + synthetic set_output tool
# 3. Build tool list: node tools + synthetic set_output + ask_user tools
tools = list(ctx.available_tools)
set_output_tool = self._build_set_output_tool(ctx.node_spec.output_keys)
if set_output_tool:
tools.append(set_output_tool)
if ctx.node_spec.client_facing:
tools.append(self._build_ask_user_tool())
logger.info(
"[%s] Tools available (%d): %s | client_facing=%s | judge=%s",
@@ -251,9 +277,28 @@ class EventLoopNode(NodeProtocol):
# 6. Main loop
for iteration in range(start_iteration, self._config.max_iterations):
# 6a. Check pause
iter_start = time.time()
# 6a. Check pause (no current-iteration data yet — only log_node_complete needed)
if await self._check_pause(ctx, conversation, iteration):
latency_ms = int((time.time() - start_time) * 1000)
if ctx.runtime_logger:
ctx.runtime_logger.log_node_complete(
node_id=node_id,
node_name=ctx.node_spec.name,
node_type="event_loop",
success=True,
total_steps=iteration,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
exit_status="paused",
accept_count=_accept_count,
retry_count=_retry_count,
escalate_count=_escalate_count,
continue_count=_continue_count,
)
return NodeResult(
success=True,
output=accumulator.to_dict(),
@@ -278,25 +323,73 @@ class EventLoopNode(NodeProtocol):
iteration,
len(conversation.messages),
)
(
assistant_text,
real_tool_results,
outputs_set,
turn_tokens,
) = await self._run_single_turn(ctx, conversation, tools, iteration, accumulator)
logger.info(
"[%s] iter=%d: LLM done — text=%d chars, real_tools=%d, "
"outputs_set=%s, tokens=%s, accumulator=%s",
node_id,
iteration,
len(assistant_text),
len(real_tool_results),
outputs_set or "[]",
turn_tokens,
{k: ("set" if v is not None else "None") for k, v in accumulator.to_dict().items()},
)
total_input_tokens += turn_tokens.get("input", 0)
total_output_tokens += turn_tokens.get("output", 0)
try:
(
assistant_text,
real_tool_results,
outputs_set,
turn_tokens,
logged_tool_calls,
user_input_requested,
) = await self._run_single_turn(ctx, conversation, tools, iteration, accumulator)
logger.info(
"[%s] iter=%d: LLM done — text=%d chars, real_tools=%d, "
"outputs_set=%s, tokens=%s, accumulator=%s",
node_id,
iteration,
len(assistant_text),
len(real_tool_results),
outputs_set or "[]",
turn_tokens,
{
k: ("set" if v is not None else "None")
for k, v in accumulator.to_dict().items()
},
)
total_input_tokens += turn_tokens.get("input", 0)
total_output_tokens += turn_tokens.get("output", 0)
except Exception as e:
# LLM call crashed - log partial step with error
import traceback
iter_latency_ms = int((time.time() - iter_start) * 1000)
latency_ms = int((time.time() - start_time) * 1000)
error_msg = f"LLM call failed: {e}"
stack_trace = traceback.format_exc()
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=node_id,
node_type="event_loop",
step_index=iteration,
error=error_msg,
stacktrace=stack_trace,
is_partial=True,
input_tokens=0,
output_tokens=0,
latency_ms=iter_latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=node_id,
node_name=ctx.node_spec.name,
node_type="event_loop",
success=False,
error=error_msg,
stacktrace=stack_trace,
total_steps=iteration + 1,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
exit_status="failure",
accept_count=_accept_count,
retry_count=_retry_count,
escalate_count=_escalate_count,
continue_count=_continue_count,
)
# Re-raise to maintain existing error handling
raise
# 6e'. Feed actual API token count back for accurate estimation
turn_input = turn_tokens.get("input", 0)
@@ -312,7 +405,12 @@ class EventLoopNode(NodeProtocol):
# outputs are already set, accept immediately. This prevents
# wasted iterations when the LLM has genuinely finished its
# work (e.g. after calling set_output in a previous turn).
truly_empty = not assistant_text and not real_tool_results and not outputs_set
truly_empty = (
not assistant_text
and not real_tool_results
and not outputs_set
and not user_input_requested
)
if truly_empty and accumulator is not None:
missing = self._get_missing_output_keys(
accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
@@ -339,6 +437,38 @@ class EventLoopNode(NodeProtocol):
if self._is_stalled(recent_responses):
await self._publish_stalled(stream_id, node_id)
latency_ms = int((time.time() - start_time) * 1000)
_continue_count += 1
if ctx.runtime_logger:
iter_latency_ms = int((time.time() - iter_start) * 1000)
ctx.runtime_logger.log_step(
node_id=node_id,
node_type="event_loop",
step_index=iteration,
verdict="CONTINUE",
verdict_feedback="Stall detected before judge evaluation",
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
latency_ms=iter_latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=node_id,
node_name=ctx.node_spec.name,
node_type="event_loop",
success=False,
error="Node stalled",
total_steps=iteration + 1,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
exit_status="stalled",
accept_count=_accept_count,
retry_count=_retry_count,
escalate_count=_escalate_count,
continue_count=_continue_count,
)
return NodeResult(
success=False,
error=(
@@ -355,18 +485,48 @@ class EventLoopNode(NodeProtocol):
# 6h. Client-facing input blocking
#
# For client_facing nodes, block for user input whenever the
# LLM finishes without making real tool calls (i.e. the LLM's
# stop_reason is not tool_call). set_output is separated from
# real tools by _run_single_turn, so this correctly treats
# set_output-only turns as conversational boundaries.
# For client_facing nodes, block for user input only when the
# LLM explicitly called ask_user(). Text-only turns without
# ask_user flow through without blocking, allowing progress
# updates and summaries to stream freely.
#
# After user input, always fall through to judge evaluation
# (6i). The judge handles all acceptance decisions.
if ctx.node_spec.client_facing and not real_tool_results:
if ctx.node_spec.client_facing and user_input_requested:
if self._shutdown:
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
latency_ms = int((time.time() - start_time) * 1000)
_continue_count += 1
if ctx.runtime_logger:
iter_latency_ms = int((time.time() - iter_start) * 1000)
ctx.runtime_logger.log_step(
node_id=node_id,
node_type="event_loop",
step_index=iteration,
verdict="CONTINUE",
verdict_feedback="Shutdown signaled (client-facing)",
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
latency_ms=iter_latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=node_id,
node_name=ctx.node_spec.name,
node_type="event_loop",
success=True,
total_steps=iteration + 1,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
exit_status="success",
accept_count=_accept_count,
retry_count=_retry_count,
escalate_count=_escalate_count,
continue_count=_continue_count,
)
return NodeResult(
success=True,
output=accumulator.to_dict(),
@@ -380,6 +540,37 @@ class EventLoopNode(NodeProtocol):
if not got_input:
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
latency_ms = int((time.time() - start_time) * 1000)
_continue_count += 1
if ctx.runtime_logger:
iter_latency_ms = int((time.time() - iter_start) * 1000)
ctx.runtime_logger.log_step(
node_id=node_id,
node_type="event_loop",
step_index=iteration,
verdict="CONTINUE",
verdict_feedback="No input received (shutdown during wait)",
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
latency_ms=iter_latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=node_id,
node_name=ctx.node_spec.name,
node_type="event_loop",
success=True,
total_steps=iteration + 1,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
exit_status="success",
accept_count=_accept_count,
retry_count=_retry_count,
escalate_count=_escalate_count,
continue_count=_continue_count,
)
return NodeResult(
success=True,
output=accumulator.to_dict(),
@@ -397,75 +588,207 @@ class EventLoopNode(NodeProtocol):
)
logger.info("[%s] iter=%d: 6i should_judge=%s", node_id, iteration, should_judge)
if should_judge:
verdict = await self._evaluate(
ctx,
conversation,
accumulator,
assistant_text,
real_tool_results,
iteration,
)
fb_preview = (verdict.feedback or "")[:200]
logger.info(
"[%s] iter=%d: judge verdict=%s feedback=%r",
node_id,
iteration,
verdict.action,
fb_preview,
)
if verdict.action == "ACCEPT":
# Check for missing output keys
missing = self._get_missing_output_keys(
accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
if not should_judge:
# Gap C: unjudged iteration — log as CONTINUE
_continue_count += 1
if ctx.runtime_logger:
iter_latency_ms = int((time.time() - iter_start) * 1000)
ctx.runtime_logger.log_step(
node_id=node_id,
node_type="event_loop",
step_index=iteration,
verdict="CONTINUE",
verdict_feedback="Unjudged (judge_every_n_turns skip)",
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
latency_ms=iter_latency_ms,
)
if missing and self._judge is not None:
hint = (
f"Missing required output keys: {missing}. "
"Use set_output to provide them."
)
logger.info(
"[%s] iter=%d: ACCEPT but missing keys %s",
node_id,
iteration,
missing,
)
await conversation.add_user_message(hint)
continue
continue
# Write outputs to shared memory
for key, value in accumulator.to_dict().items():
ctx.memory.write(key, value, validate=False)
# Judge evaluation (should_judge is always True here)
verdict = await self._evaluate(
ctx,
conversation,
accumulator,
assistant_text,
real_tool_results,
iteration,
)
fb_preview = (verdict.feedback or "")[:200]
logger.info(
"[%s] iter=%d: judge verdict=%s feedback=%r",
node_id,
iteration,
verdict.action,
fb_preview,
)
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
latency_ms = int((time.time() - start_time) * 1000)
return NodeResult(
if verdict.action == "ACCEPT":
# Check for missing output keys
missing = self._get_missing_output_keys(
accumulator, ctx.node_spec.output_keys, ctx.node_spec.nullable_output_keys
)
if missing and self._judge is not None:
hint = (
f"Missing required output keys: {missing}. Use set_output to provide them."
)
logger.info(
"[%s] iter=%d: ACCEPT but missing keys %s",
node_id,
iteration,
missing,
)
await conversation.add_user_message(hint)
# Gap D: log ACCEPT-with-missing-keys as RETRY
_retry_count += 1
if ctx.runtime_logger:
iter_latency_ms = int((time.time() - iter_start) * 1000)
ctx.runtime_logger.log_step(
node_id=node_id,
node_type="event_loop",
step_index=iteration,
verdict="RETRY",
verdict_feedback=(f"Judge accepted but missing output keys: {missing}"),
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
latency_ms=iter_latency_ms,
)
continue
# Exit point 5: Judge ACCEPT — log step + log_node_complete
# Write outputs to shared memory
for key, value in accumulator.to_dict().items():
ctx.memory.write(key, value, validate=False)
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
latency_ms = int((time.time() - start_time) * 1000)
_accept_count += 1
if ctx.runtime_logger:
iter_latency_ms = int((time.time() - iter_start) * 1000)
ctx.runtime_logger.log_step(
node_id=node_id,
node_type="event_loop",
step_index=iteration,
verdict="ACCEPT",
verdict_feedback=verdict.feedback,
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
latency_ms=iter_latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=node_id,
node_name=ctx.node_spec.name,
node_type="event_loop",
success=True,
output=accumulator.to_dict(),
total_steps=iteration + 1,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
exit_status="success",
accept_count=_accept_count,
retry_count=_retry_count,
escalate_count=_escalate_count,
continue_count=_continue_count,
)
return NodeResult(
success=True,
output=accumulator.to_dict(),
tokens_used=total_input_tokens + total_output_tokens,
latency_ms=latency_ms,
)
elif verdict.action == "ESCALATE":
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
latency_ms = int((time.time() - start_time) * 1000)
return NodeResult(
elif verdict.action == "ESCALATE":
# Exit point 6: Judge ESCALATE — log step + log_node_complete
await self._publish_loop_completed(stream_id, node_id, iteration + 1)
latency_ms = int((time.time() - start_time) * 1000)
_escalate_count += 1
if ctx.runtime_logger:
iter_latency_ms = int((time.time() - iter_start) * 1000)
ctx.runtime_logger.log_step(
node_id=node_id,
node_type="event_loop",
step_index=iteration,
verdict="ESCALATE",
verdict_feedback=verdict.feedback,
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
latency_ms=iter_latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=node_id,
node_name=ctx.node_spec.name,
node_type="event_loop",
success=False,
error=f"Judge escalated: {verdict.feedback}",
output=accumulator.to_dict(),
total_steps=iteration + 1,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
exit_status="escalated",
accept_count=_accept_count,
retry_count=_retry_count,
escalate_count=_escalate_count,
continue_count=_continue_count,
)
return NodeResult(
success=False,
error=f"Judge escalated: {verdict.feedback}",
output=accumulator.to_dict(),
tokens_used=total_input_tokens + total_output_tokens,
latency_ms=latency_ms,
)
elif verdict.action == "RETRY":
if verdict.feedback:
await conversation.add_user_message(f"[Judge feedback]: {verdict.feedback}")
continue
elif verdict.action == "RETRY":
_retry_count += 1
if ctx.runtime_logger:
iter_latency_ms = int((time.time() - iter_start) * 1000)
ctx.runtime_logger.log_step(
node_id=node_id,
node_type="event_loop",
step_index=iteration,
verdict="RETRY",
verdict_feedback=verdict.feedback,
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
latency_ms=iter_latency_ms,
)
if verdict.feedback:
await conversation.add_user_message(f"[Judge feedback]: {verdict.feedback}")
continue
# 7. Max iterations exhausted
await self._publish_loop_completed(stream_id, node_id, self._config.max_iterations)
latency_ms = int((time.time() - start_time) * 1000)
if ctx.runtime_logger:
ctx.runtime_logger.log_node_complete(
node_id=node_id,
node_name=ctx.node_spec.name,
node_type="event_loop",
success=False,
error=f"Max iterations ({self._config.max_iterations}) reached without acceptance",
total_steps=self._config.max_iterations,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
exit_status="failure",
accept_count=_accept_count,
retry_count=_retry_count,
escalate_count=_escalate_count,
continue_count=_continue_count,
)
return NodeResult(
success=False,
error=(f"Max iterations ({self._config.max_iterations}) reached without acceptance"),
@@ -496,8 +819,8 @@ class EventLoopNode(NodeProtocol):
async def _await_user_input(self, ctx: NodeContext) -> bool:
"""Block until user input arrives or shutdown is signaled.
Called when a client_facing node produces text without tool calls,
a natural conversational turn boundary.
Called when a client_facing node explicitly calls ask_user(),
an intentional conversational turn boundary.
Returns True if input arrived, False if shutdown was signaled.
"""
@@ -523,16 +846,23 @@ class EventLoopNode(NodeProtocol):
tools: list[Tool],
iteration: int,
accumulator: OutputAccumulator,
) -> tuple[str, list[dict], list[str], dict[str, int]]:
) -> tuple[str, list[dict], list[str], dict[str, int], list[dict], bool]:
"""Run a single LLM turn with streaming and tool execution.
Returns (assistant_text, real_tool_results, outputs_set, token_counts).
Returns (assistant_text, real_tool_results, outputs_set, token_counts, logged_tool_calls,
user_input_requested).
``real_tool_results`` contains only results from actual tools (web_search,
etc.), NOT from the synthetic ``set_output`` tool. ``outputs_set`` lists
the output keys written via ``set_output`` during this turn. This
separation lets the caller treat set_output as a framework concern
rather than a tool-execution concern.
etc.), NOT from the synthetic ``set_output`` or ``ask_user`` tools.
``outputs_set`` lists the output keys written via ``set_output`` during
this turn. ``user_input_requested`` is True if the LLM called
``ask_user`` during this turn. This separation lets the caller treat
synthetic tools as framework concerns rather than tool-execution concerns.
``logged_tool_calls`` accumulates ALL tool calls across inner iterations
(real tools, set_output, and discarded calls) for L3 logging. Unlike
``real_tool_results`` which resets each inner iteration, this list grows
across the entire turn.
"""
stream_id = ctx.node_id
node_id = ctx.node_id
@@ -541,6 +871,10 @@ class EventLoopNode(NodeProtocol):
final_text = ""
# Track output keys set via set_output across all inner iterations
outputs_set_this_turn: list[str] = []
user_input_requested = False
# Accumulate ALL tool calls across inner iterations for L3 logging.
# Unlike real_tool_results (reset each inner iteration), this persists.
logged_tool_calls: list[dict] = []
# Inner tool loop: stream may produce tool calls requiring re-invocation
while True:
@@ -611,15 +945,25 @@ class EventLoopNode(NodeProtocol):
# If no tool calls, turn is complete
if not tool_calls:
return final_text, [], outputs_set_this_turn, token_counts
return (
final_text,
[],
outputs_set_this_turn,
token_counts,
logged_tool_calls,
user_input_requested,
)
# Execute tool calls — separate real tools from set_output
real_tool_results: list[dict] = []
limit_hit = False
executed_in_batch = 0
hard_limit = int(
self._config.max_tool_calls_per_turn * (1 + self._config.tool_call_overflow_margin)
)
for tc in tool_calls:
tool_call_count += 1
if tool_call_count > self._config.max_tool_calls_per_turn:
if tool_call_count > hard_limit:
limit_hit = True
break
executed_in_batch += 1
@@ -652,24 +996,42 @@ class EventLoopNode(NodeProtocol):
if isinstance(value, str):
try:
parsed = json.loads(value)
if isinstance(parsed, (list, dict)):
if isinstance(parsed, (list, dict, bool, int, float)):
value = parsed
except (json.JSONDecodeError, TypeError):
pass
await accumulator.set(tc.tool_input["key"], value)
outputs_set_this_turn.append(tc.tool_input["key"])
else:
# --- Real tool execution ---
result = await self._execute_tool(tc)
result = self._truncate_tool_result(result, tc.tool_name)
real_tool_results.append(
logged_tool_calls.append(
{
"tool_use_id": tc.tool_use_id,
"tool_name": tc.tool_name,
"tool_name": "set_output",
"tool_input": tc.tool_input,
"content": result.content,
"is_error": result.is_error,
}
)
elif tc.tool_name == "ask_user":
# --- Framework-level ask_user handling ---
user_input_requested = True
result = ToolResult(
tool_use_id=tc.tool_use_id,
content="Waiting for user input...",
is_error=False,
)
else:
# --- Real tool execution ---
result = await self._execute_tool(tc)
result = self._truncate_tool_result(result, tc.tool_name)
tool_entry = {
"tool_use_id": tc.tool_use_id,
"tool_name": tc.tool_name,
"tool_input": tc.tool_input,
"content": result.content,
"is_error": result.is_error,
}
real_tool_results.append(tool_entry)
logged_tool_calls.append(tool_entry)
# Record tool result in conversation (both real and set_output
# go into the conversation for LLM context continuity)
@@ -695,17 +1057,16 @@ class EventLoopNode(NodeProtocol):
# corresponding tool results, causing the LLM to repeat them
# in the next turn (infinite loop).
if limit_hit:
max_tc = self._config.max_tool_calls_per_turn
skipped = tool_calls[executed_in_batch:]
logger.warning(
"Max tool calls per turn (%d) exceeded — discarding %d remaining call(s): %s",
max_tc,
"Hard tool call limit (%d) exceeded — discarding %d remaining call(s): %s",
hard_limit,
len(skipped),
", ".join(tc.tool_name for tc in skipped),
)
discard_msg = (
f"Tool call discarded: max tool calls per turn "
f"({max_tc}) exceeded. Consolidate your work and "
f"Tool call discarded: hard limit of {hard_limit} tool calls "
f"per turn exceeded. Consolidate your work and "
f"use fewer tool calls."
)
for tc in skipped:
@@ -716,14 +1077,15 @@ class EventLoopNode(NodeProtocol):
)
# Discarded calls go into real_tool_results so the
# caller sees they were attempted (for judge context).
real_tool_results.append(
{
"tool_use_id": tc.tool_use_id,
"tool_name": tc.tool_name,
"content": discard_msg,
"is_error": True,
}
)
discard_entry = {
"tool_use_id": tc.tool_use_id,
"tool_name": tc.tool_name,
"tool_input": tc.tool_input,
"content": discard_msg,
"is_error": True,
}
real_tool_results.append(discard_entry)
logged_tool_calls.append(discard_entry)
# Prune old tool results NOW to prevent context bloat on the
# next turn. The char-based token estimator underestimates
# actual API tokens, so the standard compaction check in the
@@ -741,7 +1103,14 @@ class EventLoopNode(NodeProtocol):
)
# Limit hit — return from this turn so the judge can
# evaluate instead of looping back for another stream.
return final_text, real_tool_results, outputs_set_this_turn, token_counts
return (
final_text,
real_tool_results,
outputs_set_this_turn,
token_counts,
logged_tool_calls,
user_input_requested,
)
# --- Mid-turn pruning: prevent context blowup within a single turn ---
if conversation.usage_ratio() >= 0.6:
@@ -757,12 +1126,51 @@ class EventLoopNode(NodeProtocol):
conversation.usage_ratio() * 100,
)
# If ask_user was called, return immediately so the outer loop
# can block for user input instead of re-invoking the LLM.
if user_input_requested:
return (
final_text,
real_tool_results,
outputs_set_this_turn,
token_counts,
logged_tool_calls,
user_input_requested,
)
# Tool calls processed -- loop back to stream with updated conversation
# -------------------------------------------------------------------
# set_output synthetic tool
# Synthetic tools: set_output, ask_user
# -------------------------------------------------------------------
def _build_ask_user_tool(self) -> Tool:
"""Build the synthetic ask_user tool for explicit user-input requests.
Client-facing nodes call ask_user() when they need to pause and wait
for user input. Text-only turns WITHOUT ask_user flow through without
blocking, allowing progress updates and summaries to stream freely.
"""
return Tool(
name="ask_user",
description=(
"Call this tool when you need to wait for the user's response. "
"Use it after greeting the user, asking a question, or requesting "
"approval. Do NOT call it when you are just providing a status "
"update or summary that doesn't require a response."
),
parameters={
"type": "object",
"properties": {
"question": {
"type": "string",
"description": "Optional: the question or prompt shown to the user.",
},
},
"required": [],
},
)
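# --- Illustration (not part of the diff): the tool call a blocking turn
# emits. Only the tool name and optional "question" parameter are defined
# above; the dict shape is an assumed simplification.
#   {"tool_name": "ask_user",
#    "tool_input": {"question": "Do you approve this deployment plan?"}}
# The node records "Waiting for user input...", sets user_input_requested,
# and returns from the turn so the outer loop can block in _await_user_input().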
def _build_set_output_tool(self, output_keys: list[str] | None) -> Tool | None:
"""Build the synthetic set_output tool for explicit output declaration."""
if not output_keys:
@@ -1048,7 +1456,7 @@ class EventLoopNode(NodeProtocol):
truncated = (
f"[Result from {tool_name}: {len(result.content)} chars — "
f"too large for context, saved to '{filename}'. "
f"Use load_data(filename='{filename}', data_dir='{spill_dir}') "
f"Use load_data(filename='{filename}') "
f"to read the full result.]\n\n"
f"Preview:\n{preview}"
)
@@ -1268,11 +1676,9 @@ class EventLoopNode(NodeProtocol):
# 5. Spillover files hint
if self._config.spillover_dir:
spill = self._config.spillover_dir
parts.append(
"NOTE: Large tool results were saved to files. "
f"Use load_data(filename='<filename>', data_dir='{spill}') "
"to read them."
"Use load_data(filename='<filename>') to read them."
)
# 6. Tool call history (prevent re-calling tools)
@@ -1357,7 +1763,19 @@ class EventLoopNode(NodeProtocol):
conversation: NodeConversation,
iteration: int,
) -> bool:
"""Check if pause has been requested. Returns True if paused."""
"""
Check if pause has been requested. Returns True if paused.
Note: This check happens BEFORE starting iteration N, after completing N-1.
If paused, the node exits having completed {iteration} iterations (0 to iteration-1).
"""
# Check executor-level pause event (for /pause command, Ctrl+Z)
if ctx.pause_event and ctx.pause_event.is_set():
completed = iteration # 0-indexed: iteration=3 means 3 iterations completed (0,1,2)
logger.info(f"⏸ Pausing after {completed} iteration(s) completed (executor-level)")
return True
# Check context-level pause flags (legacy/alternative methods)
pause_requested = ctx.input_data.get("pause_requested", False)
if not pause_requested:
try:
@@ -1365,8 +1783,10 @@ class EventLoopNode(NodeProtocol):
except (PermissionError, KeyError):
pause_requested = False
if pause_requested:
logger.info(f"Pause requested at iteration {iteration}")
completed = iteration
logger.info(f"⏸ Pausing after {completed} iteration(s) completed (context-level)")
return True
return False
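# --- Pause sketch (illustration, not part of the diff) ---
import asyncio

pause_event = asyncio.Event()  # wired into NodeContext as ctx.pause_event
pause_event.set()
# The next _check_pause() call, made before iteration N starts, returns True;
# at that point iterations 0..N-1 have completed, matching the docstring above.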
# -------------------------------------------------------------------
+480 -7
View File
@@ -17,6 +17,7 @@ from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.goal import Goal
from framework.graph.node import (
@@ -32,7 +33,10 @@ from framework.graph.node import (
from framework.graph.output_cleaner import CleansingConfig, OutputCleaner
from framework.graph.validator import OutputValidator
from framework.llm.provider import LLMProvider, Tool
from framework.observability import set_trace_context
from framework.runtime.core import Runtime
from framework.schemas.checkpoint import Checkpoint
from framework.storage.checkpoint_store import CheckpointStore
@dataclass
@@ -131,7 +135,9 @@ class GraphExecutor:
parallel_config: ParallelExecutionConfig | None = None,
event_bus: Any | None = None,
stream_id: str = "",
runtime_logger: Any = None,
storage_path: str | Path | None = None,
loop_config: dict[str, Any] | None = None,
):
"""
Initialize the executor.
@@ -148,7 +154,9 @@ class GraphExecutor:
parallel_config: Configuration for parallel execution behavior
event_bus: Optional event bus for emitting node lifecycle events
stream_id: Stream ID for event correlation
runtime_logger: Optional RuntimeLogger for per-graph-run logging
storage_path: Optional base path for conversation persistence
loop_config: Optional EventLoopNode configuration (max_iterations, etc.)
"""
self.runtime = runtime
self.llm = llm
@@ -160,7 +168,9 @@ class GraphExecutor:
self.logger = logging.getLogger(__name__)
self._event_bus = event_bus
self._stream_id = stream_id
self.runtime_logger = runtime_logger
self._storage_path = Path(storage_path) if storage_path else None
self._loop_config = loop_config or {}
# Initialize output cleaner
self.cleansing_config = cleansing_config or CleansingConfig()
@@ -173,6 +183,9 @@ class GraphExecutor:
self.enable_parallel_execution = enable_parallel_execution
self._parallel_config = parallel_config or ParallelExecutionConfig()
# Pause/resume control
self._pause_requested = asyncio.Event()
def _validate_tools(self, graph: GraphSpec) -> list[str]:
"""
Validate that all tools declared by nodes are available.
@@ -202,6 +215,7 @@ class GraphExecutor:
goal: Goal,
input_data: dict[str, Any] | None = None,
session_state: dict[str, Any] | None = None,
checkpoint_config: "CheckpointConfig | None" = None,
) -> ExecutionResult:
"""
Execute a graph for a goal.
@@ -215,6 +229,9 @@ class GraphExecutor:
Returns:
ExecutionResult with output and metrics
"""
# Add agent_id to trace context for correlation
set_trace_context(agent_id=graph.id)
# Validate graph
errors = graph.validate()
if errors:
@@ -240,6 +257,12 @@ class GraphExecutor:
# Initialize execution state
memory = SharedMemory()
# Initialize checkpoint store if checkpointing is enabled
checkpoint_store: CheckpointStore | None = None
if checkpoint_config and checkpoint_config.enabled and self._storage_path:
checkpoint_store = CheckpointStore(self._storage_path)
self.logger.info("✓ Checkpointing enabled")
# Restore session state if provided
if session_state and "memory" in session_state:
memory_data = session_state["memory"]
@@ -267,8 +290,110 @@ class GraphExecutor:
node_visit_counts: dict[str, int] = {} # Track visits for feedback loops
_is_retry = False # True when looping back for a retry (not a new visit)
# Restore node_visit_counts from session state if available
if session_state and "node_visit_counts" in session_state:
node_visit_counts = dict(session_state["node_visit_counts"])
if node_visit_counts:
self.logger.info(f"📥 Restored node visit counts: {node_visit_counts}")
# If resuming at a specific node (paused_at), that node was counted
# but never completed, so decrement its count
paused_at = session_state.get("paused_at")
if (
paused_at
and paused_at in node_visit_counts
and node_visit_counts[paused_at] > 0
):
old_count = node_visit_counts[paused_at]
node_visit_counts[paused_at] -= 1
self.logger.info(
f"📥 Decremented visit count for paused node '{paused_at}': "
f"{old_count} -> {node_visit_counts[paused_at]}"
)
# Determine entry point (may differ if resuming)
current_node_id = graph.get_entry_point(session_state)
# Check if resuming from checkpoint
if session_state and session_state.get("resume_from_checkpoint") and checkpoint_store:
checkpoint_id = session_state["resume_from_checkpoint"]
try:
checkpoint = await checkpoint_store.load_checkpoint(checkpoint_id)
if checkpoint:
self.logger.info(
f"🔄 Resuming from checkpoint: {checkpoint_id} "
f"(node: {checkpoint.current_node})"
)
# Restore memory from checkpoint
for key, value in checkpoint.shared_memory.items():
memory.write(key, value, validate=False)
# Start from checkpoint's next node or current node
current_node_id = (
checkpoint.next_node or checkpoint.current_node or graph.entry_node
)
# Restore execution path
path.extend(checkpoint.execution_path)
self.logger.info(
f"📥 Restored memory with {len(checkpoint.shared_memory)} keys, "
f"resuming at node: {current_node_id}"
)
else:
self.logger.warning(
f"Checkpoint {checkpoint_id} not found, resuming from normal entry point"
)
# Check if resuming from paused_at (fallback to session state)
paused_at = session_state.get("paused_at") if session_state else None
if paused_at and graph.get_node(paused_at) is not None:
current_node_id = paused_at
self.logger.info(f"🔄 Resuming from paused node: {paused_at}")
else:
current_node_id = graph.get_entry_point(session_state)
except Exception as e:
self.logger.error(
f"Failed to load checkpoint {checkpoint_id}: {e}, "
f"resuming from normal entry point"
)
# Check if resuming from paused_at (fallback to session state)
paused_at = session_state.get("paused_at") if session_state else None
if paused_at and graph.get_node(paused_at) is not None:
current_node_id = paused_at
self.logger.info(f"🔄 Resuming from paused node: {paused_at}")
else:
current_node_id = graph.get_entry_point(session_state)
else:
# Check if resuming from paused_at (session state resume)
paused_at = session_state.get("paused_at") if session_state else None
node_ids = [n.id for n in graph.nodes]
self.logger.info(f"🔍 Debug: paused_at={paused_at}, available node IDs={node_ids}")
if paused_at and graph.get_node(paused_at) is not None:
# Resume from paused_at node directly (works for any node, not just pause_nodes)
current_node_id = paused_at
# Restore execution path from session state if available
if session_state:
execution_path = session_state.get("execution_path", [])
if execution_path:
path.extend(execution_path)
self.logger.info(
f"🔄 Resuming from paused node: {paused_at} "
f"(restored path: {execution_path})"
)
else:
self.logger.info(f"🔄 Resuming from paused node: {paused_at}")
else:
self.logger.info(f"🔄 Resuming from paused node: {paused_at}")
else:
# Fall back to normal entry point logic
self.logger.warning(
f"⚠ paused_at={paused_at} is not a valid node, falling back to entry point"
)
current_node_id = graph.get_entry_point(session_state)
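# --- Illustration (not part of the diff): session_state keys the resume
# logic above understands ---
#   {
#     "memory": {...},                       # restored into SharedMemory
#     "execution_path": ["intake", "review"],
#     "node_visit_counts": {"intake": 1, "review": 1},
#     "paused_at": "review",                 # fallback resume point
#     "resume_from_checkpoint": "ckpt_42",   # preferred: full checkpoint restore
#   }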
steps = 0
if session_state and current_node_id != graph.entry_node:
@@ -281,14 +406,70 @@ class GraphExecutor:
input_data=input_data or {},
)
if self.runtime_logger:
# Extract session_id from storage_path if available (for unified sessions)
session_id = ""
if self._storage_path and self._storage_path.name.startswith("session_"):
session_id = self._storage_path.name
self.runtime_logger.start_run(goal_id=goal.id, session_id=session_id)
self.logger.info(f"🚀 Starting execution: {goal.name}")
self.logger.info(f" Goal: {goal.description}")
self.logger.info(f" Entry node: {graph.entry_node}")
# Set per-execution data_dir so data tools (save_data, load_data, etc.)
# and spillover files share the same session-scoped directory.
_ctx_token = None
if self._storage_path:
from framework.runner.tool_registry import ToolRegistry
_ctx_token = ToolRegistry.set_execution_context(
data_dir=str(self._storage_path / "data"),
)
try:
while steps < graph.max_steps:
steps += 1
# Check for pause request
if self._pause_requested.is_set():
self.logger.info("⏸ Pause detected - stopping at node boundary")
# Create session state for pause
saved_memory = memory.read_all()
pause_session_state: dict[str, Any] = {
"memory": saved_memory, # Include memory for resume
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
}
# Create a pause checkpoint
if checkpoint_store:
pause_checkpoint = self._create_checkpoint(
checkpoint_type="pause",
current_node=current_node_id,
execution_path=path,
memory=memory,
next_node=current_node_id,
is_clean=True,
)
await checkpoint_store.save_checkpoint(pause_checkpoint)
pause_session_state["latest_checkpoint_id"] = pause_checkpoint.checkpoint_id
pause_session_state["resume_from_checkpoint"] = (
pause_checkpoint.checkpoint_id
)
# Return with paused status
return ExecutionResult(
success=False,
output=saved_memory,
path=path,
paused_at=current_node_id,
error="Execution paused by user request",
session_state=pause_session_state,
node_visit_counts=dict(node_visit_counts),
)
# Get current node
node_spec = graph.get_node(current_node_id)
if node_spec is None:
@@ -367,6 +548,27 @@ class GraphExecutor:
description=f"Validation errors for {current_node_id}: {validation_errors}",
)
# CHECKPOINT: node_start
if (
checkpoint_store
and checkpoint_config
and checkpoint_config.should_checkpoint_node_start()
):
checkpoint = self._create_checkpoint(
checkpoint_type="node_start",
current_node=node_spec.id,
execution_path=list(path),
memory=memory,
is_clean=(sum(node_retry_counts.values()) == 0),
)
if checkpoint_config.async_checkpoint:
# Non-blocking checkpoint save
asyncio.create_task(checkpoint_store.save_checkpoint(checkpoint))
else:
# Blocking checkpoint save
await checkpoint_store.save_checkpoint(checkpoint)
# Emit node-started event (skip event_loop nodes — they emit their own)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_started(
@@ -383,6 +585,18 @@ class GraphExecutor:
stream_id=self._stream_id, node_id=current_node_id, iterations=1
)
# Ensure runtime logging has an L2 entry for this node
if self.runtime_logger:
self.runtime_logger.ensure_node_logged(
node_id=node_spec.id,
node_name=node_spec.name,
node_type=node_spec.node_type,
success=result.success,
error=result.error,
tokens_used=result.tokens_used,
latency_ms=result.latency_ms,
)
if result.success:
# Validate output before accepting it.
# Skip for event_loop nodes — their judge system is
@@ -428,6 +642,13 @@ class GraphExecutor:
if len(value_str) > 200:
value_str = value_str[:200] + "..."
self.logger.info(f" {key}: {value_str}")
# Write node outputs to memory BEFORE edge evaluation
# This enables direct key access in conditional expressions (e.g., "score > 80")
# Without this, conditional edges can only use output['key'] syntax
if result.output:
for key, value in result.output.items():
memory.write(key, value, validate=False)
else:
self.logger.error(f" ✗ Failed: {result.error}")
@@ -513,13 +734,29 @@ class GraphExecutor:
total_retries_count = sum(node_retry_counts.values())
nodes_failed = list(node_retry_counts.keys())
if self.runtime_logger:
await self.runtime_logger.end_run(
status="failure",
duration_ms=total_latency,
node_path=path,
execution_quality="failed",
)
# Save memory for potential resume
saved_memory = memory.read_all()
failure_session_state = {
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
}
return ExecutionResult(
success=False,
error=(
f"Node '{node_spec.name}' failed after "
f"{max_retries} attempts: {result.error}"
),
output=memory.read_all(),
output=saved_memory,
steps_executed=steps,
total_tokens=total_tokens,
total_latency_ms=total_latency,
@@ -530,6 +767,7 @@ class GraphExecutor:
had_partial_failures=len(nodes_failed) > 0,
execution_quality="failed",
node_visit_counts=dict(node_visit_counts),
session_state=failure_session_state,
)
# Check if we just executed a pause node - if so, save state and return
@@ -555,6 +793,14 @@ class GraphExecutor:
nodes_failed = [nid for nid, count in node_retry_counts.items() if count > 0]
exec_quality = "degraded" if total_retries_count > 0 else "clean"
if self.runtime_logger:
await self.runtime_logger.end_run(
status="success",
duration_ms=total_latency,
node_path=path,
execution_quality=exec_quality,
)
return ExecutionResult(
success=True,
output=saved_memory,
@@ -644,6 +890,39 @@ class GraphExecutor:
break
next_spec = graph.get_node(next_node)
self.logger.info(f" → Next: {next_spec.name if next_spec else next_node}")
# CHECKPOINT: node_complete (after determining next node)
if (
checkpoint_store
and checkpoint_config
and checkpoint_config.should_checkpoint_node_complete()
):
checkpoint = self._create_checkpoint(
checkpoint_type="node_complete",
current_node=node_spec.id,
execution_path=list(path),
memory=memory,
next_node=next_node,
is_clean=(sum(node_retry_counts.values()) == 0),
)
if checkpoint_config.async_checkpoint:
asyncio.create_task(checkpoint_store.save_checkpoint(checkpoint))
else:
await checkpoint_store.save_checkpoint(checkpoint)
# Periodic checkpoint pruning
if (
checkpoint_store
and checkpoint_config
and checkpoint_config.should_prune_checkpoints(len(path))
):
asyncio.create_task(
checkpoint_store.prune_checkpoints(
max_age_days=checkpoint_config.checkpoint_max_age_days
)
)
current_node_id = next_node
# Update input_data for next node
@@ -678,6 +957,14 @@ class GraphExecutor:
),
)
if self.runtime_logger:
await self.runtime_logger.end_run(
status="success" if exec_quality != "failed" else "failure",
duration_ms=total_latency,
node_path=path,
execution_quality=exec_quality,
)
return ExecutionResult(
success=True,
output=output,
@@ -693,7 +980,55 @@ class GraphExecutor:
node_visit_counts=dict(node_visit_counts),
)
except asyncio.CancelledError:
# Handle cancellation (e.g., TUI quit) - save as paused instead of failed
self.logger.info("⏸ Execution cancelled - saving state for resume")
# Save memory and state for resume
saved_memory = memory.read_all()
session_state_out: dict[str, Any] = {
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
}
# Calculate quality metrics
total_retries_count = sum(node_retry_counts.values())
nodes_failed = [nid for nid, count in node_retry_counts.items() if count > 0]
exec_quality = "degraded" if total_retries_count > 0 else "clean"
if self.runtime_logger:
await self.runtime_logger.end_run(
status="paused",
duration_ms=total_latency,
node_path=path,
execution_quality=exec_quality,
)
# Return with paused status
return ExecutionResult(
success=False,
error="Execution paused by user",
output=saved_memory,
steps_executed=steps,
total_tokens=total_tokens,
total_latency_ms=total_latency,
path=path,
paused_at=current_node_id, # Save where we were
session_state=session_state_out,
total_retries=total_retries_count,
nodes_with_failures=nodes_failed,
retry_details=dict(node_retry_counts),
had_partial_failures=len(nodes_failed) > 0,
execution_quality=exec_quality,
node_visit_counts=dict(node_visit_counts),
)
except Exception as e:
import traceback
stack_trace = traceback.format_exc()
self.runtime.report_problem(
severity="critical",
description=str(e),
@@ -703,13 +1038,63 @@ class GraphExecutor:
narrative=f"Failed at step {steps}: {e}",
)
# Log the crashing node to L2 with full stack trace
if self.runtime_logger and node_spec is not None:
self.runtime_logger.ensure_node_logged(
node_id=node_spec.id,
node_name=node_spec.name,
node_type=node_spec.node_type,
success=False,
error=str(e),
stacktrace=stack_trace,
)
# Calculate quality metrics even for exceptions
total_retries_count = sum(node_retry_counts.values())
nodes_failed = list(node_retry_counts.keys())
if self.runtime_logger:
await self.runtime_logger.end_run(
status="failure",
duration_ms=total_latency,
node_path=path,
execution_quality="failed",
)
# Save memory and state for potential resume
saved_memory = memory.read_all()
session_state_out: dict[str, Any] = {
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
}
# Mark latest checkpoint for resume on failure
if checkpoint_store:
try:
checkpoints = await checkpoint_store.list_checkpoints()
if checkpoints:
# Find latest clean checkpoint
index = await checkpoint_store.load_index()
if index:
latest_clean = index.get_latest_clean_checkpoint()
if latest_clean:
session_state_out["resume_from_checkpoint"] = (
latest_clean.checkpoint_id
)
session_state_out["latest_checkpoint_id"] = (
latest_clean.checkpoint_id
)
self.logger.info(
f"💾 Marked checkpoint for resume: {latest_clean.checkpoint_id}"
)
except Exception as checkpoint_err:
self.logger.warning(f"Failed to mark checkpoint for resume: {checkpoint_err}")
return ExecutionResult(
success=False,
error=str(e),
output=saved_memory,
steps_executed=steps,
path=path,
total_retries=total_retries_count,
@@ -718,8 +1103,15 @@ class GraphExecutor:
had_partial_failures=len(nodes_failed) > 0,
execution_quality="failed",
node_visit_counts=dict(node_visit_counts),
session_state=session_state_out,
)
finally:
if _ctx_token is not None:
from framework.runner.tool_registry import ToolRegistry
ToolRegistry.reset_execution_context(_ctx_token)
def _build_context(
self,
node_spec: NodeSpec,
@@ -751,6 +1143,8 @@ class GraphExecutor:
goal_context=goal.to_prompt_context(),
goal=goal, # Pass Goal object for LLM-powered routers
max_tokens=max_tokens,
runtime_logger=self.runtime_logger,
pause_event=self._pause_requested, # Pass pause event for granular control
)
# Valid node types - no ambiguous "llm" type allowed
@@ -845,19 +1239,24 @@ class GraphExecutor:
# When a tool result exceeds max_tool_result_chars, the full
# content is written to spillover_dir and the agent gets a
# truncated preview with instructions to use load_data().
# Uses storage_path/data which is session-scoped, matching the
# data_dir set via execution context for data tools.
spillover = None
if self._storage_path:
spillover = str(self._storage_path / "data")
lc = self._loop_config
default_max_iter = 100 if node_spec.client_facing else 50
node = EventLoopNode(
event_bus=self._event_bus,
judge=None, # implicit judge: accept when output_keys are filled
config=LoopConfig(
max_iterations=100 if node_spec.client_facing else 50,
max_tool_calls_per_turn=10,
stall_detection_threshold=3,
max_history_tokens=32000,
max_tool_result_chars=3_000,
max_iterations=lc.get("max_iterations", default_max_iter),
max_tool_calls_per_turn=lc.get("max_tool_calls_per_turn", 10),
tool_call_overflow_margin=lc.get("tool_call_overflow_margin", 0.5),
stall_detection_threshold=lc.get("stall_detection_threshold", 3),
max_history_tokens=lc.get("max_history_tokens", 32000),
max_tool_result_chars=lc.get("max_tool_result_chars", 3_000),
spillover_dir=spillover,
),
tool_executor=self.tool_executor,
@@ -1147,6 +1546,18 @@ class GraphExecutor:
result = await node_impl.execute(ctx)
last_result = result
# Ensure L2 entry for this branch node
if self.runtime_logger:
self.runtime_logger.ensure_node_logged(
node_id=node_spec.id,
node_name=node_spec.name,
node_type=node_spec.node_type,
success=result.success,
error=result.error,
tokens_used=result.tokens_used,
latency_ms=result.latency_ms,
)
# Emit node-completed event (skip event_loop nodes)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_completed(
@@ -1182,9 +1593,24 @@ class GraphExecutor:
return branch, last_result
except Exception as e:
import traceback
stack_trace = traceback.format_exc()
branch.status = "failed"
branch.error = str(e)
self.logger.error(f" ✗ Branch {branch.node_id}: exception - {e}")
# Log the crashing branch node to L2 with full stack trace
if self.runtime_logger and node_spec is not None:
self.runtime_logger.ensure_node_logged(
node_id=node_spec.id,
node_name=node_spec.name,
node_type=node_spec.node_type,
success=False,
error=str(e),
stacktrace=stack_trace,
)
return branch, e
# Execute all branches concurrently
@@ -1231,3 +1657,50 @@ class GraphExecutor:
def register_function(self, node_id: str, func: Callable) -> None:
"""Register a function as a node."""
self.node_registry[node_id] = FunctionNode(func)
def request_pause(self) -> None:
"""
Request graceful pause of the current execution.
The execution will pause at the next node boundary after the current
node completes. A checkpoint will be saved at the pause point, allowing
the execution to be resumed later.
This method is safe to call from any thread.
"""
self._pause_requested.set()
self.logger.info("⏸ Pause requested - will pause at next node boundary")
def _create_checkpoint(
self,
checkpoint_type: str,
current_node: str,
execution_path: list[str],
memory: SharedMemory,
next_node: str | None = None,
is_clean: bool = True,
) -> Checkpoint:
"""
Create a checkpoint from current execution state.
Args:
checkpoint_type: Type of checkpoint (node_start, node_complete)
current_node: Current node ID
execution_path: Nodes executed so far
memory: SharedMemory instance
next_node: Next node to execute (for node_complete checkpoints)
is_clean: Whether execution was clean up to this point
Returns:
New Checkpoint instance
"""
return Checkpoint.create(
checkpoint_type=checkpoint_type,
session_id=self._storage_path.name if self._storage_path else "unknown",
current_node=current_node,
execution_path=execution_path,
shared_memory=memory.read_all(),
next_node=next_node,
is_clean=is_clean,
)
+158 -4
@@ -477,6 +477,12 @@ class NodeContext:
attempt: int = 1
max_attempts: int = 3
# Runtime logging (optional)
runtime_logger: Any = None # RuntimeLogger | None — uses Any to avoid import
# Pause control (optional) - asyncio.Event for pause requests
pause_event: Any = None # asyncio.Event | None
@dataclass
class NodeResult:
@@ -854,6 +860,8 @@ Keep the same JSON structure but with shorter content values.
)
start = time.time()
_step_index = 0
_captured_tool_calls: list[dict] = []
try:
# Build messages
@@ -893,6 +901,16 @@ Keep the same JSON structure but with shorter content values.
if len(str(result.content)) > 150:
result_str += "..."
logger.info(f" ✓ Tool result: {result_str}")
# Capture for runtime logging
_captured_tool_calls.append(
{
"tool_use_id": tool_use.id,
"tool_name": tool_use.name,
"tool_input": tool_use.input,
"content": result.content,
"is_error": result.is_error,
}
)
return result
response = ctx.llm.complete_with_tools(
@@ -1072,6 +1090,29 @@ Keep the same JSON structure but with shorter content values.
f"Pydantic validation failed after "
f"{max_validation_retries} retries: {err}"
)
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type=ctx.node_spec.node_type,
step_index=_step_index,
llm_text=response.content,
tool_calls=_captured_tool_calls,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type=ctx.node_spec.node_type,
success=False,
error=error_msg,
total_steps=_step_index + 1,
tokens_used=total_input_tokens + total_output_tokens,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
latency_ms=latency_ms,
)
return NodeResult(
success=False,
error=error_msg,
@@ -1161,12 +1202,36 @@ Keep the same JSON structure but with shorter content values.
)
# Return failure instead of writing garbage to all keys
_extraction_error = (
f"Output extraction failed: {e}. LLM returned non-JSON response. "
f"Expected keys: {ctx.node_spec.output_keys}"
)
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type=ctx.node_spec.node_type,
step_index=_step_index,
llm_text=response.content,
tool_calls=_captured_tool_calls,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type=ctx.node_spec.node_type,
success=False,
error=_extraction_error,
total_steps=_step_index + 1,
tokens_used=response.input_tokens + response.output_tokens,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
latency_ms=latency_ms,
)
return NodeResult(
success=False,
error=(
f"Output extraction failed: {e}. LLM returned non-JSON response. "
f"Expected keys: {ctx.node_spec.output_keys}"
),
error=_extraction_error,
output={},
tokens_used=response.input_tokens + response.output_tokens,
latency_ms=latency_ms,
@@ -1184,6 +1249,29 @@ Keep the same JSON structure but with shorter content values.
ctx.memory.write(key, stripped_content, validate=False)
output[key] = stripped_content
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type=ctx.node_spec.node_type,
step_index=_step_index,
llm_text=response.content,
tool_calls=_captured_tool_calls,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type=ctx.node_spec.node_type,
success=True,
total_steps=_step_index + 1,
tokens_used=response.input_tokens + response.output_tokens,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
latency_ms=latency_ms,
)
return NodeResult(
success=True,
output=output,
@@ -1199,6 +1287,15 @@ Keep the same JSON structure but with shorter content values.
error=str(e),
latency_ms=latency_ms,
)
if ctx.runtime_logger:
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type=ctx.node_spec.node_type,
success=False,
error=str(e),
latency_ms=latency_ms,
)
return NodeResult(success=False, error=str(e), latency_ms=latency_ms)
def _parse_output(self, content: str, node_spec: NodeSpec) -> dict[str, Any]:
@@ -1591,6 +1688,9 @@ class RouterNode(NodeProtocol):
async def execute(self, ctx: NodeContext) -> NodeResult:
"""Execute routing logic."""
import time as _time
start = _time.time()
ctx.runtime.set_node(ctx.node_id)
# Build options from routes
@@ -1635,10 +1735,30 @@ class RouterNode(NodeProtocol):
summary=f"Routing to {chosen_route[1]}",
)
latency_ms = int((_time.time() - start) * 1000)
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type="router",
step_index=0,
llm_text=f"Route: {chosen_route[0]} -> {chosen_route[1]}",
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type="router",
success=True,
total_steps=1,
latency_ms=latency_ms,
)
return NodeResult(
success=True,
next_node=chosen_route[1],
route_reason=f"Chose route: {chosen_route[0]}",
latency_ms=latency_ms,
)
async def _llm_route(
@@ -1800,6 +1920,22 @@ class FunctionNode(NodeProtocol):
else:
output = {"result": result}
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type="function",
step_index=0,
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type="function",
success=True,
total_steps=1,
latency_ms=latency_ms,
)
return NodeResult(success=True, output=output, latency_ms=latency_ms)
except Exception as e:
@@ -1810,4 +1946,22 @@ class FunctionNode(NodeProtocol):
error=str(e),
latency_ms=latency_ms,
)
if ctx.runtime_logger:
ctx.runtime_logger.log_step(
node_id=ctx.node_id,
node_type="function",
step_index=0,
latency_ms=latency_ms,
)
ctx.runtime_logger.log_node_complete(
node_id=ctx.node_id,
node_name=ctx.node_spec.name,
node_type="function",
success=False,
error=str(e),
total_steps=1,
latency_ms=latency_ms,
)
return NodeResult(success=False, error=str(e), latency_ms=latency_ms)
+134 -7
@@ -9,20 +9,37 @@ Usage:
import json
import os
import sys
from datetime import datetime
from pathlib import Path
from typing import Annotated
from mcp.server import FastMCP
# Ensure exports/ is on sys.path so AgentRunner can import agent modules.
_framework_dir = Path(__file__).resolve().parent.parent # core/framework/ -> core/
_project_root = _framework_dir.parent # core/ -> project root
_exports_dir = _project_root / "exports"
if _exports_dir.is_dir() and str(_exports_dir) not in sys.path:
sys.path.insert(0, str(_exports_dir))
del _framework_dir, _project_root, _exports_dir
from framework.graph import Constraint, EdgeCondition, EdgeSpec, Goal, NodeSpec, SuccessCriterion
from framework.graph.plan import Plan
from mcp.server import FastMCP # noqa: E402
from pydantic import ValidationError # noqa: E402
from framework.graph import ( # noqa: E402
Constraint,
EdgeCondition,
EdgeSpec,
Goal,
NodeSpec,
SuccessCriterion,
)
from framework.graph.plan import Plan # noqa: E402
# Testing framework imports
from framework.testing.prompts import (
from framework.testing.prompts import ( # noqa: E402
PYTEST_TEST_FILE_HEADER,
)
from framework.utils.io import atomic_write
from framework.utils.io import atomic_write # noqa: E402
# Initialize MCP server
mcp = FastMCP("agent-builder")
@@ -569,7 +586,11 @@ def add_node(
str, "JSON object mapping conditions to target node IDs for router nodes"
] = "{}",
client_facing: Annotated[
bool, "If True, node streams output to user and blocks for input between turns"
bool,
"If True, an ask_user() tool is injected so the LLM can explicitly request user input. "
"The node blocks ONLY when ask_user() is called — text-only turns stream freely. "
"Set True for nodes that interact with users (intake, review, approval). "
"Nodes that do autonomous work (research, data processing, API calls) MUST be False.",
] = False,
nullable_output_keys: Annotated[
str, "JSON array of output keys that may remain unset (for mutually exclusive outputs)"
@@ -650,6 +671,14 @@ def add_node(
"EventLoopNode supports tool use, streaming, and judge-based evaluation."
)
# Warn about client_facing on nodes with tools (likely autonomous work)
if node_type == "event_loop" and client_facing and tools_list:
warnings.append(
f"Node '{node_id}' is client_facing=True but has tools {tools_list}. "
"Nodes with tools typically do autonomous work and should be "
"client_facing=False. Only set True if this node needs user approval."
)
# nullable_output_keys must be a subset of output_keys
if nullable_output_keys_list:
invalid_nullable = [k for k in nullable_output_keys_list if k not in output_keys_list]
@@ -1360,6 +1389,17 @@ def validate_graph() -> str:
f"Node '{dn['node_id']}' uses deprecated type '{dn['type']}'. Use 'event_loop' instead."
)
# Warn if all event_loop nodes are client_facing (common misconfiguration)
el_nodes = [n for n in session.nodes if n.node_type == "event_loop"]
cf_el_nodes = [n for n in el_nodes if n.client_facing]
if len(el_nodes) > 1 and len(cf_el_nodes) == len(el_nodes):
warnings.append(
f"ALL {len(el_nodes)} event_loop nodes are client_facing=True. "
"This injects ask_user() on every node. Only nodes that need user "
"interaction (intake, review, approval) should be client_facing. Set "
"client_facing=False on autonomous processing nodes."
)
# Collect summary info
event_loop_nodes = [n.id for n in session.nodes if n.node_type == "event_loop"]
client_facing_nodes = [n.id for n in session.nodes if n.client_facing]
@@ -1817,6 +1857,85 @@ def export_graph() -> str:
)
@mcp.tool()
def import_from_export(
agent_json_path: Annotated[str, "Path to the agent.json file to import"],
) -> str:
"""
Import an agent definition from an exported agent.json file into the current build session.
Reads the agent.json, parses goal/nodes/edges, and populates the current session.
This is the reverse of export_graph().
Args:
agent_json_path: Path to the agent.json file to import
Returns:
JSON summary of what was imported (goal name, node count, edge count)
"""
session = get_session()
path = Path(agent_json_path)
if not path.exists():
return json.dumps({"success": False, "error": f"File not found: {agent_json_path}"})
try:
data = json.loads(path.read_text())
except json.JSONDecodeError as e:
return json.dumps({"success": False, "error": f"Invalid JSON: {e}"})
try:
# Parse goal (same pattern as BuildSession.from_dict lines 88-99)
goal_data = data.get("goal")
if goal_data:
session.goal = Goal(
id=goal_data["id"],
name=goal_data["name"],
description=goal_data["description"],
success_criteria=[
SuccessCriterion(**sc) for sc in goal_data.get("success_criteria", [])
],
constraints=[Constraint(**c) for c in goal_data.get("constraints", [])],
)
# Parse nodes (same pattern as BuildSession.from_dict line 102)
graph_data = data.get("graph", {})
nodes_data = graph_data.get("nodes", [])
session.nodes = [NodeSpec(**n) for n in nodes_data]
# Parse edges (same pattern as BuildSession.from_dict lines 105-118)
edges_data = graph_data.get("edges", [])
session.edges = []
for e in edges_data:
condition_str = e.get("condition")
if isinstance(condition_str, str):
condition_map = {
"always": EdgeCondition.ALWAYS,
"on_success": EdgeCondition.ON_SUCCESS,
"on_failure": EdgeCondition.ON_FAILURE,
"conditional": EdgeCondition.CONDITIONAL,
"llm_decide": EdgeCondition.LLM_DECIDE,
}
e["condition"] = condition_map.get(condition_str, EdgeCondition.ON_SUCCESS)
session.edges.append(EdgeSpec(**e))
except (KeyError, TypeError, ValueError, ValidationError) as e:
return json.dumps({"success": False, "error": f"Malformed agent.json: {e}"})
# Persist updated session
_save_session(session)
return json.dumps(
{
"success": True,
"goal": session.goal.name if session.goal else None,
"nodes_count": len(session.nodes),
"edges_count": len(session.edges),
"node_ids": [n.id for n in session.nodes],
"edge_ids": [e.id for e in session.edges],
}
)
@mcp.tool()
def get_session_status() -> str:
"""Get the current status of the build session."""
@@ -1853,12 +1972,19 @@ def configure_loop(
max_history_tokens: Annotated[
int, "Maximum conversation history tokens before compaction (default 32000)"
] = 32000,
tool_call_overflow_margin: Annotated[
float,
"Overflow margin for max_tool_calls_per_turn. "
"Tool calls are only discarded when count exceeds "
"max_tool_calls_per_turn * (1 + margin). Default 0.5 (50% wiggle room)",
] = 0.5,
) -> str:
"""Configure event loop parameters for EventLoopNode execution.
These settings control how EventLoopNodes behave at runtime:
- max_iterations: prevents infinite loops
- max_tool_calls_per_turn: limits tool calls per LLM response
- tool_call_overflow_margin: wiggle room before tool calls are discarded (default 50%)
- stall_detection_threshold: detects when LLM repeats itself
- max_history_tokens: triggers conversation compaction
"""
@@ -1867,6 +1993,7 @@ def configure_loop(
session.loop_config = {
"max_iterations": max_iterations,
"max_tool_calls_per_turn": max_tool_calls_per_turn,
"tool_call_overflow_margin": tool_call_overflow_margin,
"stall_detection_threshold": stall_detection_threshold,
"max_history_tokens": max_history_tokens,
}
@@ -2189,7 +2316,7 @@ def test_node(
)
else:
cf_note = (
"Node is client-facing: will block for user input between turns. "
"Node is client-facing: has ask_user() tool, blocks when LLM calls it. "
if node_spec.client_facing
else ""
)
+236
@@ -0,0 +1,236 @@
# Observability - Structured Logging
## Configuration via Environment Variables
Control logging format using environment variables:
```bash
# JSON logging (production) - Machine-parseable, one line per log
export LOG_FORMAT=json
python -m my_agent run
# Human-readable (development) - Color-coded, easy to read
# Default if LOG_FORMAT is not set
python -m my_agent run
```
**Alternative:** Set `ENV=production` to automatically use JSON format:
```bash
export ENV=production
python -m my_agent run
```
---
## Overview
The Hive framework provides automatic structured logging with trace context propagation. Logs include correlation IDs (`trace_id`, `execution_id`) that automatically follow your agent execution flow.
**Features:**
- **Zero developer friction**: Standard `logger.info()` calls automatically get trace context
- **ContextVar-based propagation**: Thread-safe and async-safe for concurrent executions
- **Dual output modes**: JSON for production, human-readable for development
- **Automatic correlation**: `trace_id` and `execution_id` propagate through all logs
## Quick Start
Logging is automatically configured when you use `AgentRunner`. No setup required:
```python
from framework.runner import AgentRunner
runner = AgentRunner(graph=my_graph, goal=my_goal)
result = await runner.run({"input": "data"})
# Logs automatically include trace_id, execution_id, agent_id, etc.
```
## Programmatic Configuration
Configure logging explicitly in your code:
```python
from framework.observability import configure_logging
# Human-readable (development)
configure_logging(level="DEBUG", format="human")
# JSON (production)
configure_logging(level="INFO", format="json")
# Auto-detect from environment
configure_logging(level="INFO", format="auto")
```
### Configuration Options
- **level**: `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"`
- **format**:
- `"json"` - Machine-parseable JSON (one line per log entry)
- `"human"` - Human-readable with colors
- `"auto"` - Detects from `LOG_FORMAT` env var or `ENV=production`
## Log Format Examples
### JSON Format (Machine-parseable)
```json
{"timestamp": "2026-01-28T15:01:02.671126+00:00", "level": "info", "logger": "framework.runtime", "message": "Starting agent execution", "trace_id": "54e80d7b5bd6409dbc3217e5cd16a4fd", "execution_id": "b4c348ec54e80d7b5bd6409dbc3217e50", "agent_id": "sales-agent", "goal_id": "qualify-leads"}
```
**Features:**
- `trace_id` and `execution_id` are 32 hex chars (W3C/OTel-aligned, no prefixes)
- Compact single-line format (easy to stream/parse)
- All trace context fields included automatically
### Human-Readable Format (Development)
```
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Starting agent execution
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] Processing input data [node_id:input-processor]
[INFO ] [trace:12345678 | exec:a1b2c3d4 | agent:sales-agent] LLM call completed [latency_ms:1250] [tokens_used:450]
```
**Features:**
- Color-coded log levels
- Shortened IDs for readability (first 8 chars of the trace ID, last 8 of the execution ID)
- Context prefix shows trace correlation
## Trace Context Fields
When the framework sets trace context, these fields are included in all logs. IDs are 32 hex characters (W3C/OTel-aligned, no prefixes).
- **trace_id**: Trace identifier
- **execution_id**: Run/session correlation
- **agent_id**: Agent/graph identifier
- **goal_id**: Goal being pursued
- **node_id**: Current node (when set)
## Custom Log Fields
Add custom fields using the `extra` parameter:
```python
import logging
logger = logging.getLogger("my_module")
# Add custom fields
logger.info("LLM call completed", extra={
"latency_ms": 1250,
"tokens_used": 450,
"model": "claude-3-5-sonnet-20241022",
"node_id": "web-search"
})
```
These fields appear in both JSON and human-readable formats.
## Usage in Your Code
### Standard Logging (Recommended)
Just use Python's standard logging - context is automatic:
```python
import logging
logger = logging.getLogger(__name__)
def my_function():
# This log automatically includes trace_id, execution_id, etc.
logger.info("Processing data")
try:
result = do_work()
logger.info("Work completed", extra={"result_count": len(result)})
except Exception as e:
logger.error("Work failed", exc_info=True)
```
### Framework-Managed Context
The framework automatically sets trace context at key points:
- **Runtime.start_run()**: Sets `trace_id`, `execution_id`, `goal_id`
- **GraphExecutor.execute()**: Adds `agent_id`
- **Node execution**: Adds `node_id`
Propagation is automatic via ContextVar.
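For example, a coroutine spawned inside a node inherits the same context with no explicit plumbing (a minimal sketch; `node_body` and `summarize` are illustrative names, not framework APIs):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def summarize() -> None:
    # No IDs passed in - this log still carries trace_id, execution_id,
    # agent_id, and node_id from the ambient ContextVar context.
    logger.info("Summarizing results")

async def node_body() -> None:
    # asyncio copies the current context into each task it creates,
    # so child tasks inherit the framework-set trace context.
    await asyncio.gather(summarize(), summarize())
```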
## Advanced Usage
### Manual Context Management
If you need to set trace context manually (rare):
```python
from framework.observability import set_trace_context, get_trace_context
# Set context (32-hex, no prefixes)
set_trace_context(
trace_id="54e80d7b5bd6409dbc3217e5cd16a4fd",
execution_id="b4c348ec54e80d7b5bd6409dbc3217e50",
agent_id="my-agent"
)
# Get current context
context = get_trace_context()
print(context["execution_id"])
# Clear context (usually not needed)
from framework.observability import clear_trace_context
clear_trace_context()
```
### Testing
For tests, you may want to configure logging explicitly:
```python
import pytest
from framework.observability import configure_logging
@pytest.fixture(autouse=True)
def setup_logging():
configure_logging(level="DEBUG", format="human")
```
## Best Practices
1. **Production**: Use JSON format (`LOG_FORMAT=json` or `ENV=production`)
2. **Development**: Use human-readable format (default)
3. **Don't manually set context**: Let the framework manage it
4. **Use standard logging**: No special APIs needed - just `logger.info()`
5. **Add custom fields**: Use `extra` dict for additional metadata
## Troubleshooting
### Logs missing trace context
Ensure `configure_logging()` has been called (usually automatic via `AgentRunner._setup()`).
### JSON logs not appearing
Check environment variables:
```bash
echo $LOG_FORMAT
echo $ENV
```
Or explicitly set:
```python
configure_logging(format="json")
```
### Context not propagating
ContextVar automatically propagates through async calls. If context seems lost, check:
- Are you in the same async execution context?
- Has `set_trace_context()` been called for this execution?
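A quick diagnostic is to dump the current context at the point where logs look wrong (a minimal sketch using the public helpers):

```python
from framework.observability import get_trace_context

context = get_trace_context()
if not context:
    # Empty dict means this code runs outside the execution context that
    # called set_trace_context() - e.g., a raw thread started without
    # copying the current context.
    print("No trace context here")
else:
    print(context.get("trace_id"), context.get("execution_id"))
```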
## See Also
- [Logging Implementation](../observability/logging.py) - Source code
- [AgentRunner](../runner/runner.py) - Where logging is configured
- [Runtime Core](../runtime/core.py) - Where trace context is set
+23
@@ -0,0 +1,23 @@
"""
Observability module for automatic trace correlation and structured logging.
This module provides zero-friction observability:
- Automatic trace context propagation via ContextVar
- Structured JSON logging for production
- Human-readable logging for development
- No manual ID passing required
"""
from framework.observability.logging import (
clear_trace_context,
configure_logging,
get_trace_context,
set_trace_context,
)
__all__ = [
"configure_logging",
"get_trace_context",
"set_trace_context",
"clear_trace_context",
]
+302
@@ -0,0 +1,302 @@
"""
Structured logging with automatic trace context propagation.
Key Features:
- Zero developer friction: Standard logger.info() calls get automatic context
- ContextVar-based propagation: Thread-safe and async-safe
- Dual output modes: JSON for production, human-readable for development
- Correlation IDs: trace_id follows entire request flow automatically
Architecture:
    Runtime.start_run()       -> generates trace_id, sets context once
        | (automatic propagation via ContextVar)
    GraphExecutor.execute()   -> adds agent_id to context
        | (automatic propagation)
    Node.execute()            -> adds node_id to context
        | (automatic propagation)
    User code: logger.info("message") gets ALL context automatically!
import json
import logging
import os
import re
from contextvars import ContextVar
from datetime import UTC, datetime
from typing import Any
# Context variable for trace propagation
# ContextVar is thread-safe and async-safe - perfect for concurrent agent execution
trace_context: ContextVar[dict[str, Any] | None] = ContextVar("trace_context", default=None)
# ANSI escape code pattern (matches \033[...m or \x1b[...m)
ANSI_ESCAPE_PATTERN = re.compile(r"\x1b\[[0-9;]*m|\033\[[0-9;]*m")
def strip_ansi_codes(text: str) -> str:
"""Remove ANSI escape codes from text for clean JSON logging."""
return ANSI_ESCAPE_PATTERN.sub("", text)
class StructuredFormatter(logging.Formatter):
"""
JSON formatter for structured logging.
Produces machine-parseable log entries with:
- Standard fields (timestamp, level, logger, message)
- Trace context (trace_id, execution_id, agent_id, etc.) - AUTOMATIC
- Custom fields from extra dict
"""
def format(self, record: logging.LogRecord) -> str:
"""Format log record as JSON."""
# Get trace context for correlation - AUTOMATIC!
context = trace_context.get() or {}
# Strip ANSI codes from message for clean JSON output
message = strip_ansi_codes(record.getMessage())
# Build base log entry
log_entry = {
"timestamp": datetime.now(UTC).isoformat(),
"level": record.levelname.lower(),
"logger": record.name,
"message": message,
}
# Add trace context (trace_id, execution_id, agent_id, etc.) - AUTOMATIC!
log_entry.update(context)
# Add custom fields from extra (optional)
event = getattr(record, "event", None)
if event is not None:
if isinstance(event, str):
log_entry["event"] = strip_ansi_codes(str(event))
else:
log_entry["event"] = event
latency_ms = getattr(record, "latency_ms", None)
if latency_ms is not None:
log_entry["latency_ms"] = latency_ms
tokens_used = getattr(record, "tokens_used", None)
if tokens_used is not None:
log_entry["tokens_used"] = tokens_used
node_id = getattr(record, "node_id", None)
if node_id is not None:
log_entry["node_id"] = node_id
model = getattr(record, "model", None)
if model is not None:
log_entry["model"] = model
# Add exception info if present (strip ANSI codes from exception text too)
if record.exc_info:
exception_text = self.formatException(record.exc_info)
log_entry["exception"] = strip_ansi_codes(exception_text)
return json.dumps(log_entry)
class HumanReadableFormatter(logging.Formatter):
"""
Human-readable formatter for development.
Provides colorized logs with trace context for local debugging.
Includes trace_id prefix for correlation - AUTOMATIC!
"""
COLORS = {
"DEBUG": "\033[36m", # Cyan
"INFO": "\033[32m", # Green
"WARNING": "\033[33m", # Yellow
"ERROR": "\033[31m", # Red
"CRITICAL": "\033[35m", # Magenta
}
RESET = "\033[0m"
def format(self, record: logging.LogRecord) -> str:
"""Format log record as human-readable string."""
# Get trace context - AUTOMATIC!
context = trace_context.get() or {}
trace_id = context.get("trace_id", "")
execution_id = context.get("execution_id", "")
agent_id = context.get("agent_id", "")
# Build context prefix
prefix_parts = []
if trace_id:
prefix_parts.append(f"trace:{trace_id[:8]}")
if execution_id:
prefix_parts.append(f"exec:{execution_id[-8:]}")
if agent_id:
prefix_parts.append(f"agent:{agent_id}")
context_prefix = f"[{' | '.join(prefix_parts)}] " if prefix_parts else ""
# Get color
color = self.COLORS.get(record.levelname, "")
reset = self.RESET
# Format log level (5 chars wide for alignment)
level = f"{record.levelname:<8}"
# Add event if present
event = ""
record_event = getattr(record, "event", None)
if record_event is not None:
event = f" [{record_event}]"
# Format message: [LEVEL] [trace context] message
return f"{color}[{level}]{reset} {context_prefix}{record.getMessage()}{event}"
def configure_logging(
level: str = "INFO",
format: str = "auto", # "json", "human", or "auto"
) -> None:
"""
Configure structured logging for the application.
This should be called ONCE at application startup, typically in:
- AgentRunner._setup()
- Main entry point
- Test fixtures
Args:
level: Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
format: Output format:
- "json": Machine-parseable JSON (for production)
- "human": Human-readable with colors (for development)
- "auto": JSON if LOG_FORMAT=json or ENV=production, else human
Examples:
# Development mode (human-readable)
configure_logging(level="DEBUG", format="human")
# Production mode (JSON)
configure_logging(level="INFO", format="json")
# Auto-detect from environment
configure_logging(level="INFO", format="auto")
"""
# Auto-detect format
if format == "auto":
# Use JSON if LOG_FORMAT=json or ENV=production
log_format_env = os.getenv("LOG_FORMAT", "").lower()
env = os.getenv("ENV", "development").lower()
if log_format_env == "json" or env == "production":
format = "json"
else:
format = "human"
# Select formatter
if format == "json":
formatter = StructuredFormatter()
# Disable colors in third-party libraries when using JSON format
_disable_third_party_colors()
else:
formatter = HumanReadableFormatter()
# Configure handler
handler = logging.StreamHandler()
handler.setFormatter(formatter)
# Configure root logger
root_logger = logging.getLogger()
root_logger.handlers.clear()
root_logger.addHandler(handler)
root_logger.setLevel(level.upper())
# When in JSON mode, configure known third-party loggers to use JSON formatter
# This ensures libraries like LiteLLM, httpcore also output clean JSON
if format == "json":
third_party_loggers = [
"LiteLLM",
"httpcore",
"httpx",
"openai",
]
for logger_name in third_party_loggers:
logger = logging.getLogger(logger_name)
# Clear existing handlers so records propagate to root and use our formatter there
logger.handlers.clear()
logger.propagate = True # Still propagate to root for consistency
def _disable_third_party_colors() -> None:
"""Disable color output in third-party libraries for clean JSON logging."""
# Set NO_COLOR environment variable (common convention for disabling colors)
os.environ["NO_COLOR"] = "1"
os.environ["FORCE_COLOR"] = "0"
# Disable LiteLLM debug/verbose output colors if available
try:
import litellm
# LiteLLM respects NO_COLOR, but we can also suppress debug info
if hasattr(litellm, "suppress_debug_info"):
litellm.suppress_debug_info = True # type: ignore[attr-defined]
except (ImportError, AttributeError):
pass
def set_trace_context(**kwargs: Any) -> None:
"""
Set trace context for current execution.
Context is stored in a ContextVar and AUTOMATICALLY propagates
through async calls within the same execution context.
This is called by the framework at key points:
- Runtime.start_run(): Sets trace_id, execution_id, goal_id
- GraphExecutor.execute(): Adds agent_id
- Node execution: Adds node_id
Developers/agents NEVER call this directly - it's framework-managed.
Args:
**kwargs: Context fields (trace_id, execution_id, agent_id, etc.)
Example (framework code):
# In Runtime.start_run()
trace_id = uuid.uuid4().hex # 32 hex, W3C Trace Context compliant
execution_id = uuid.uuid4().hex # 32 hex, OTel-aligned for correlation
set_trace_context(
trace_id=trace_id,
execution_id=execution_id,
goal_id=goal_id
)
# All subsequent logs in this execution get these fields automatically!
"""
current = trace_context.get() or {}
trace_context.set({**current, **kwargs})
def get_trace_context() -> dict:
"""
Get current trace context.
Returns:
Dict with trace_id, execution_id, agent_id, etc.
Empty dict if no context set.
"""
context = trace_context.get() or {}
return context.copy()
def clear_trace_context() -> None:
"""
Clear trace context.
Useful for:
- Cleanup between test runs
- Starting a completely new execution context
- Manual context management (rare)
Note: Framework typically doesn't need to call this - ContextVar
is execution-scoped and cleans itself up automatically.
"""
trace_context.set(None)
+525 -32
@@ -33,11 +33,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
type=str,
help="Input context from JSON file",
)
run_parser.add_argument(
"--mock",
action="store_true",
help="Run in mock mode (no real LLM calls)",
)
run_parser.add_argument(
"--output",
"-o",
@@ -68,6 +63,18 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
default=None,
help="LLM model to use (any LiteLLM-compatible name)",
)
run_parser.add_argument(
"--resume-session",
type=str,
default=None,
help="Resume from a specific session ID",
)
run_parser.add_argument(
"--checkpoint",
type=str,
default=None,
help="Resume from a specific checkpoint (requires --resume-session)",
)
run_parser.set_defaults(func=cmd_run)
# info command
@@ -186,6 +193,144 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
)
shell_parser.set_defaults(func=cmd_shell)
# tui command (interactive agent dashboard)
tui_parser = subparsers.add_parser(
"tui",
help="Launch interactive TUI dashboard",
description="Browse available agents and launch the terminal dashboard.",
)
tui_parser.add_argument(
"--model",
"-m",
type=str,
default=None,
help="LLM model to use (any LiteLLM-compatible name)",
)
tui_parser.set_defaults(func=cmd_tui)
# sessions command group (checkpoint/resume management)
sessions_parser = subparsers.add_parser(
"sessions",
help="Manage agent sessions",
description="List, inspect, and manage agent execution sessions.",
)
sessions_subparsers = sessions_parser.add_subparsers(
dest="sessions_cmd",
help="Session management commands",
)
# sessions list
sessions_list_parser = sessions_subparsers.add_parser(
"list",
help="List agent sessions",
description="List all sessions for an agent.",
)
sessions_list_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_list_parser.add_argument(
"--status",
choices=["all", "active", "failed", "completed", "paused"],
default="all",
help="Filter by session status (default: all)",
)
sessions_list_parser.add_argument(
"--has-checkpoints",
action="store_true",
help="Show only sessions with checkpoints",
)
sessions_list_parser.set_defaults(func=cmd_sessions_list)
# sessions show
sessions_show_parser = sessions_subparsers.add_parser(
"show",
help="Show session details",
description="Display detailed information about a specific session.",
)
sessions_show_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_show_parser.add_argument(
"session_id",
type=str,
help="Session ID to inspect",
)
sessions_show_parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
sessions_show_parser.set_defaults(func=cmd_sessions_show)
# sessions checkpoints
sessions_checkpoints_parser = sessions_subparsers.add_parser(
"checkpoints",
help="List session checkpoints",
description="List all checkpoints for a session.",
)
sessions_checkpoints_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_checkpoints_parser.add_argument(
"session_id",
type=str,
help="Session ID",
)
sessions_checkpoints_parser.set_defaults(func=cmd_sessions_checkpoints)
# pause command
pause_parser = subparsers.add_parser(
"pause",
help="Pause running session",
description="Request graceful pause of a running agent session.",
)
pause_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
pause_parser.add_argument(
"session_id",
type=str,
help="Session ID to pause",
)
pause_parser.set_defaults(func=cmd_pause)
# resume command
resume_parser = subparsers.add_parser(
"resume",
help="Resume session from checkpoint",
description="Resume a paused or failed session from a checkpoint.",
)
resume_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
resume_parser.add_argument(
"session_id",
type=str,
help="Session ID to resume",
)
resume_parser.add_argument(
"--checkpoint",
"-c",
type=str,
help="Specific checkpoint ID to resume from (default: latest)",
)
resume_parser.add_argument(
"--tui",
action="store_true",
help="Resume in TUI dashboard mode",
)
resume_parser.set_defaults(func=cmd_resume)
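# Illustrative invocations once implemented (CLI entry-point name assumed,
# shown as <cli>; the flags are the ones registered above):
#   <cli> sessions list exports/my_agent --status paused
#   <cli> resume exports/my_agent <session_id> --checkpoint <id> --tui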
def cmd_run(args: argparse.Namespace) -> int:
"""Run an exported agent."""
@@ -228,7 +373,6 @@ def cmd_run(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=True,
)
@@ -244,7 +388,11 @@ def cmd_run(args: argparse.Namespace) -> int:
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
app = AdenTUI(runner._agent_runtime)
app = AdenTUI(
runner._agent_runtime,
resume_session=getattr(args, "resume_session", None),
resume_checkpoint=getattr(args, "checkpoint", None),
)
# TUI handles execution via ChatRepl — user submits input,
# ChatRepl calls runtime.trigger_and_wait(). No auto-launch.
@@ -266,7 +414,6 @@ def cmd_run(args: argparse.Namespace) -> int:
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=False,
)
@@ -985,8 +1132,215 @@ def cmd_shell(args: argparse.Namespace) -> int:
return 0
def cmd_tui(args: argparse.Namespace) -> int:
"""Browse agents and launch the interactive TUI dashboard."""
import logging
from framework.runner import AgentRunner
from framework.tui.app import AdenTUI
logging.basicConfig(level=logging.WARNING, format="%(message)s")
exports_dir = Path("exports")
examples_dir = Path("examples/templates")
has_exports = _has_agents(exports_dir)
has_examples = _has_agents(examples_dir)
if not has_exports and not has_examples:
print("No agents found in exports/ or examples/templates/", file=sys.stderr)
return 1
# Determine which directory to browse
if has_exports and has_examples:
print("\nAgent sources:\n")
print(" 1. Your Agents (exports/)")
print(" 2. Sample Agents (examples/templates/)")
print()
try:
choice = input("Select source (number): ").strip()
if choice == "1":
agents_dir = exports_dir
elif choice == "2":
agents_dir = examples_dir
else:
print("Invalid selection")
return 1
except (EOFError, KeyboardInterrupt):
print()
return 1
elif has_exports:
agents_dir = exports_dir
else:
agents_dir = examples_dir
# Let user pick an agent
agent_path = _select_agent(agents_dir)
if not agent_path:
return 1
# Launch TUI (same pattern as cmd_run --tui)
async def run_with_tui():
try:
runner = AgentRunner.load(
agent_path,
model=args.model,
enable_tui=True,
)
except Exception as e:
print(f"Error loading agent: {e}")
return
if runner._agent_runtime is None:
runner._setup()
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
app = AdenTUI(runner._agent_runtime)
try:
await app.run_async()
except Exception as e:
import traceback
traceback.print_exc()
print(f"TUI error: {e}")
await runner.cleanup_async()
asyncio.run(run_with_tui())
print("TUI session ended.")
return 0
def _extract_python_agent_metadata(agent_path: Path) -> tuple[str, str]:
"""Extract name and description from a Python-based agent's config.py.
Uses AST parsing to safely extract values without executing code.
Returns (name, description) tuple, with fallbacks if parsing fails.
"""
import ast
config_path = agent_path / "config.py"
fallback_name = agent_path.name.replace("_", " ").title()
fallback_desc = "(Python-based agent)"
if not config_path.exists():
return fallback_name, fallback_desc
try:
with open(config_path) as f:
tree = ast.parse(f.read())
# Find AgentMetadata class definition
for node in ast.walk(tree):
if isinstance(node, ast.ClassDef) and node.name == "AgentMetadata":
name = fallback_name
desc = fallback_desc
# Extract default values from class body
for item in node.body:
if isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name):
field_name = item.target.id
if item.value:
# Handle simple string constants
if isinstance(item.value, ast.Constant):
if field_name == "name":
name = item.value.value
elif field_name == "description":
desc = item.value.value
# f-strings (ast.JoinedStr) cannot be evaluated statically - skip, use fallback
elif isinstance(item.value, ast.JoinedStr):
pass
elif isinstance(item.value, ast.BinOp):
# String concatenation with + - try to evaluate
try:
result = _eval_string_binop(item.value)
if result and field_name == "name":
name = result
elif result and field_name == "description":
desc = result
except Exception:
pass
return name, desc
return fallback_name, fallback_desc
except Exception:
return fallback_name, fallback_desc
def _eval_string_binop(node) -> str | None:
"""Recursively evaluate a BinOp of string constants."""
import ast
if isinstance(node, ast.Constant) and isinstance(node.value, str):
return node.value
elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
left = _eval_string_binop(node.left)
right = _eval_string_binop(node.right)
if left is not None and right is not None:
return left + right
return None
def _is_valid_agent_dir(path: Path) -> bool:
"""Check if a directory contains a valid agent (agent.json or agent.py)."""
if not path.is_dir():
return False
return (path / "agent.json").exists() or (path / "agent.py").exists()
def _has_agents(directory: Path) -> bool:
"""Check if a directory contains any valid agents (folders with agent.json or agent.py)."""
if not directory.exists():
return False
return any(_is_valid_agent_dir(p) for p in directory.iterdir())
def _getch() -> str:
"""Read a single character from stdin without waiting for Enter."""
try:
if sys.platform == "win32":
import msvcrt
ch = msvcrt.getch()
return ch.decode("utf-8", errors="ignore")
else:
import termios
import tty
fd = sys.stdin.fileno()
old_settings = termios.tcgetattr(fd)
try:
tty.setraw(fd)
ch = sys.stdin.read(1)
finally:
termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
return ch
except Exception:
return ""
def _read_key() -> str:
"""Read a key, handling arrow key escape sequences."""
ch = _getch()
if ch == "\x1b": # Escape sequence start
ch2 = _getch()
if ch2 == "[":
ch3 = _getch()
if ch3 == "C": # Right arrow
return "RIGHT"
elif ch3 == "D": # Left arrow
return "LEFT"
return ch
def _select_agent(agents_dir: Path) -> str | None:
"""Let user select an agent from available agents."""
"""Let user select an agent from available agents with pagination."""
AGENTS_PER_PAGE = 10
if not agents_dir.exists():
print(f"Directory not found: {agents_dir}", file=sys.stderr)
# fixes issue #696, creates an exports folder if it does not exist
@@ -996,37 +1350,126 @@ def _select_agent(agents_dir: Path) -> str | None:
agents = []
for path in agents_dir.iterdir():
if path.is_dir() and (path / "agent.json").exists():
if _is_valid_agent_dir(path):
agents.append(path)
if not agents:
print(f"No agents found in {agents_dir}", file=sys.stderr)
return None
print(f"\nAvailable agents in {agents_dir}:\n")
for i, agent_path in enumerate(agents, 1):
# Pagination setup
page = 0
total_pages = (len(agents) + AGENTS_PER_PAGE - 1) // AGENTS_PER_PAGE
while True:
start_idx = page * AGENTS_PER_PAGE
end_idx = min(start_idx + AGENTS_PER_PAGE, len(agents))
page_agents = agents[start_idx:end_idx]
# Show page header with indicator
if total_pages > 1:
print(f"\nAvailable agents in {agents_dir} (Page {page + 1}/{total_pages}):\n")
else:
print(f"\nAvailable agents in {agents_dir}:\n")
# Display agents for current page (with global numbering)
for i, agent_path in enumerate(page_agents, start_idx + 1):
try:
agent_json = agent_path / "agent.json"
if agent_json.exists():
with open(agent_json) as f:
data = json.load(f)
agent_meta = data.get("agent", {})
name = agent_meta.get("name", agent_path.name)
desc = agent_meta.get("description", "")
else:
# Python-based agent - extract from config.py
name, desc = _extract_python_agent_metadata(agent_path)
desc = desc[:50] + "..." if len(desc) > 50 else desc
print(f" {i}. {name}")
print(f" {desc}")
except Exception as e:
print(f" {i}. {agent_path.name} (error: {e})")
# Build navigation options
nav_options = []
if total_pages > 1:
nav_options.append("←/→ or p/n=navigate")
nav_options.append("q=quit")
print()
if total_pages > 1:
print(f" [{', '.join(nav_options)}]")
print()
# Show prompt
print("Select agent (number), use arrows to navigate, or q to quit: ", end="", flush=True)
try:
from framework.runner import AgentRunner
key = _read_key()
runner = AgentRunner.load(agent_path)
info = runner.info()
desc = info.description[:50] + "..." if len(info.description) > 50 else info.description
print(f" {i}. {info.name}")
print(f" {desc}")
runner.cleanup()
except Exception as e:
print(f" {i}. {agent_path.name} (error: {e})")
if key == "RIGHT" and page < total_pages - 1:
page += 1
print() # Newline before redrawing
elif key == "LEFT" and page > 0:
page -= 1
print()
elif key == "q":
print()
return None
elif key in ("n", ">") and page < total_pages - 1:
page += 1
print()
elif key in ("p", "<") and page > 0:
page -= 1
print()
elif key.isdigit():
# Build number with support for backspace
buffer = key
print(key, end="", flush=True)
print()
try:
choice = input("Select agent (number): ").strip()
idx = int(choice) - 1
if 0 <= idx < len(agents):
return str(agents[idx])
print("Invalid selection")
return None
except (ValueError, EOFError, KeyboardInterrupt):
return None
while True:
ch = _getch()
if ch in ("\r", "\n"):
# Enter pressed - submit
print()
break
elif ch in ("\x7f", "\x08"):
# Backspace (DEL or BS)
if buffer:
buffer = buffer[:-1]
# Erase character: move back, print space, move back
print("\b \b", end="", flush=True)
elif ch.isdigit():
buffer += ch
print(ch, end="", flush=True)
elif ch == "\x1b":
# Escape - cancel input
print()
buffer = ""
break
elif ch == "\x03":
# Ctrl+C
print()
return None
# Ignore other characters
if buffer:
try:
idx = int(buffer) - 1
if 0 <= idx < len(agents):
return str(agents[idx])
print("Invalid selection")
except ValueError:
print("Invalid input")
elif key == "\r" or key == "\n":
print() # Just pressed enter, redraw
else:
print()
print("Invalid input")
except (EOFError, KeyboardInterrupt):
print()
return None
def _interactive_multi(agents_dir: Path) -> int:
@@ -1042,7 +1485,7 @@ def _interactive_multi(agents_dir: Path) -> int:
# Register all agents
for path in agents_dir.iterdir():
if path.is_dir() and (path / "agent.json").exists():
if _is_valid_agent_dir(path):
try:
orchestrator.register(path.name, path)
agent_count += 1
@@ -1128,3 +1571,53 @@ def _interactive_multi(agents_dir: Path) -> int:
orchestrator.cleanup()
return 0
def cmd_sessions_list(args: argparse.Namespace) -> int:
"""List agent sessions."""
print("⚠ Sessions list command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Status filter: {args.status}")
print(f"Has checkpoints: {args.has_checkpoints}")
return 1
def cmd_sessions_show(args: argparse.Namespace) -> int:
"""Show detailed session information."""
print("⚠ Session show command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_sessions_checkpoints(args: argparse.Namespace) -> int:
"""List checkpoints for a session."""
print("⚠ Session checkpoints command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_pause(args: argparse.Namespace) -> int:
"""Pause a running session."""
print("⚠ Pause command not yet implemented")
print("This will be available once executor pause integration is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_resume(args: argparse.Namespace) -> int:
"""Resume a session from checkpoint."""
print("⚠ Resume command not yet implemented")
print("This will be available once checkpoint resume integration is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
if args.checkpoint:
print(f"Checkpoint: {args.checkpoint}")
if args.tui:
print("Mode: TUI")
return 1
+45 -24
@@ -8,8 +8,15 @@ from dataclasses import dataclass, field
from pathlib import Path
from typing import TYPE_CHECKING, Any
from framework.config import get_hive_config, get_preferred_model
from framework.graph import Goal
from framework.graph.edge import AsyncEntryPointSpec, EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.edge import (
DEFAULT_MAX_TOKENS,
AsyncEntryPointSpec,
EdgeCondition,
EdgeSpec,
GraphSpec,
)
from framework.graph.executor import ExecutionResult, GraphExecutor
from framework.graph.node import NodeSpec
from framework.llm.provider import LLMProvider, Tool
@@ -19,6 +26,8 @@ from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.core import Runtime
from framework.runtime.execution_stream import EntryPointSpec
from framework.runtime.runtime_log_store import RuntimeLogStore
from framework.runtime.runtime_logger import RuntimeLogger
if TYPE_CHECKING:
from framework.runner.protocol import AgentMessage, CapabilityResponse
@@ -26,9 +35,6 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__)
# Configuration paths
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
def _ensure_credential_key_env() -> None:
"""Load HIVE_CREDENTIAL_KEY from shell config if not already in environment.
@@ -58,17 +64,6 @@ def _ensure_credential_key_env() -> None:
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
def get_hive_config() -> dict[str, Any]:
"""Load hive configuration from ~/.hive/configuration.json."""
if not HIVE_CONFIG_FILE.exists():
return {}
try:
with open(HIVE_CONFIG_FILE) as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return {}
def get_claude_code_token() -> str | None:
"""
Get the OAuth token from Claude Code subscription.
@@ -266,11 +261,7 @@ class AgentRunner:
@staticmethod
def _resolve_default_model() -> str:
"""Resolve the default model from ~/.hive/configuration.json."""
config = get_hive_config()
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
return "anthropic/claude-sonnet-4-20250514"
return get_preferred_model()
def __init__(
self,
@@ -306,9 +297,9 @@ class AgentRunner:
self._storage_path = storage_path
self._temp_dir = None
else:
# Use persistent storage in ~/.hive by default
# Use persistent storage in ~/.hive/agents/{agent_name}/ per RUNTIME_LOGGING.md spec
home = Path.home()
default_storage = home / ".hive" / "storage" / agent_path.name
default_storage = home / ".hive" / "agents" / agent_path.name
default_storage.mkdir(parents=True, exist_ok=True)
self._storage_path = default_storage
self._temp_dir = None
@@ -393,7 +384,7 @@ class AgentRunner:
Args:
agent_path: Path to agent folder
mock_mode: If True, use mock LLM responses
storage_path: Path for runtime storage (defaults to ~/.hive/storage/{name})
storage_path: Path for runtime storage (defaults to ~/.hive/agents/{name})
model: LLM model to use (reads from agent's default_config if None)
enable_tui: If True, forces use of AgentRuntime with EventBus
@@ -423,7 +414,11 @@ class AgentRunner:
if agent_config and hasattr(agent_config, "model"):
model = agent_config.model
max_tokens = getattr(agent_config, "max_tokens", 1024) if agent_config else 1024
if agent_config and hasattr(agent_config, "max_tokens"):
max_tokens = agent_config.max_tokens
else:
hive_config = get_hive_config()
max_tokens = hive_config.get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
# Build GraphSpec from module-level variables
graph = GraphSpec(
@@ -560,6 +555,11 @@ class AgentRunner:
def _setup(self) -> None:
"""Set up runtime, LLM, and executor."""
# Configure structured logging (auto-detects JSON vs human-readable)
from framework.observability import configure_logging
configure_logging(level="INFO", format="auto")
# Set up session context for tools (workspace_id, agent_id, session_id)
workspace_id = "default" # Could be derived from storage path
agent_id = self.graph.id or "unknown"
@@ -691,6 +691,10 @@ class AgentRunner:
# Create runtime
self._runtime = Runtime(storage_path=self._storage_path)
# Create runtime logger
log_store = RuntimeLogStore(base_path=self._storage_path / "runtime_logs")
runtime_logger = RuntimeLogger(store=log_store, agent_id=self.graph.id)
# Create executor
self._executor = GraphExecutor(
runtime=self._runtime,
@@ -698,6 +702,8 @@ class AgentRunner:
tools=tools,
tool_executor=tool_executor,
approval_callback=self._approval_callback,
runtime_logger=runtime_logger,
loop_config=self.graph.loop_config,
)
def _setup_agent_runtime(self, tools: list, tool_executor: Callable | None) -> None:
@@ -731,6 +737,19 @@ class AgentRunner:
)
# Create AgentRuntime with all entry points
log_store = RuntimeLogStore(base_path=self._storage_path / "runtime_logs")
# Enable checkpointing by default for resumable sessions
from framework.graph.checkpoint_config import CheckpointConfig
checkpoint_config = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False, # Only checkpoint after nodes complete
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
async_checkpoint=True, # Non-blocking
)
self._agent_runtime = create_agent_runtime(
graph=self.graph,
goal=self.goal,
@@ -739,6 +758,8 @@ class AgentRunner:
llm=self._llm,
tools=tools,
tool_executor=tool_executor,
runtime_log_store=log_store,
checkpoint_config=checkpoint_config,
)
async def run(
+35 -5
@@ -1,5 +1,6 @@
"""Tool discovery and registration for agent runner."""
import contextvars
import importlib.util
import inspect
import json
@@ -13,6 +14,13 @@ from framework.llm.provider import Tool, ToolResult, ToolUse
logger = logging.getLogger(__name__)
# Per-execution context overrides. Each asyncio task (and thus each
# concurrent graph execution) gets its own copy, so there are no races
# when multiple ExecutionStreams run in parallel.
_execution_context: contextvars.ContextVar[dict[str, Any] | None] = contextvars.ContextVar(
"_execution_context", default=None
)
@dataclass
class RegisteredTool:
@@ -36,7 +44,7 @@ class ToolRegistry:
# Framework-internal context keys injected into tool calls.
# Stripped from LLM-facing schemas (the LLM doesn't know these values)
# and auto-injected at call time for tools that accept them.
CONTEXT_PARAMS = frozenset({"workspace_id", "agent_id", "session_id"})
CONTEXT_PARAMS = frozenset({"workspace_id", "agent_id", "session_id", "data_dir"})
def __init__(self):
self._tools: dict[str, RegisteredTool] = {}
@@ -262,6 +270,24 @@ class ToolRegistry:
"""
self._session_context.update(context)
@staticmethod
def set_execution_context(**context) -> contextvars.Token:
"""Set per-execution context overrides (concurrency-safe via contextvars).
Values set here take precedence over session context. Each asyncio
task gets its own copy, so concurrent executions don't interfere.
Returns a token that must be passed to :meth:`reset_execution_context`
to restore the previous state.
"""
current = _execution_context.get() or {}
return _execution_context.set({**current, **context})
@staticmethod
def reset_execution_context(token: contextvars.Token) -> None:
"""Restore execution context to its previous state."""
_execution_context.reset(token)
def load_mcp_config(self, config_path: Path) -> None:
"""
Load and register MCP servers from a config file.
@@ -359,11 +385,15 @@ class ToolRegistry:
):
def executor(inputs: dict) -> Any:
try:
# Only inject session context params the tool accepts
# Build base context: session < execution (execution wins)
base_context = dict(registry_ref._session_context)
exec_ctx = _execution_context.get()
if exec_ctx:
base_context.update(exec_ctx)
# Only inject context params the tool accepts
filtered_context = {
k: v
for k, v in registry_ref._session_context.items()
if k in tool_params
k: v for k, v in base_context.items() if k in tool_params
}
merged_inputs = {**filtered_context, **inputs}
result = client_ref.call_tool(tool_name, merged_inputs)
@@ -0,0 +1,842 @@
# Resumable Sessions Design
## Problem Statement
Currently, when an agent encounters a failure during execution (e.g., a credential validation failure, an API error, or a tool failure), the entire session is lost. This creates a poor user experience, especially when:
1. The agent has completed significant work before the failure
2. The failure is recoverable (e.g., adding missing credentials)
3. The user wants to retry from the exact failure point without redoing work
## Design Goals
1. **Crash Recovery**: Sessions can resume after process crashes or errors
2. **Partial Completion**: Preserve work done by nodes that completed successfully
3. **Flexible Resume Points**: Resume from exact failure point or previous checkpoints
4. **State Consistency**: Guarantee consistent SharedMemory and conversation state
5. **Minimal Overhead**: Checkpointing shouldn't significantly impact performance
6. **User Control**: Users can inspect, modify, and resume sessions explicitly
## Architecture
### 1. Checkpoint System
#### Checkpoint Types
**Automatic Checkpoints** (saved automatically by framework):
- `node_start`: Before each node begins execution
- `node_complete`: After each node successfully completes
- `edge_transition`: Before traversing to next node
- `loop_iteration`: At each iteration in EventLoopNode (optional)
**Manual Checkpoints** (triggered by agent designer):
- `safe_point`: Explicitly marked safe points in graph
- `user_checkpoint`: Before awaiting user input in client-facing nodes
#### Checkpoint Data Structure
```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Checkpoint:
"""Single checkpoint in execution timeline."""
# Identity
checkpoint_id: str # Format: checkpoint_{timestamp}_{uuid_short}
session_id: str
checkpoint_type: str # "node_start", "node_complete", etc.
# Timestamps
created_at: str # ISO 8601
# Execution state
current_node: str | None
next_node: str | None # For edge_transition checkpoints
execution_path: list[str] # Nodes executed so far
# Memory state (snapshot)
shared_memory: dict[str, Any] # Full SharedMemory._data
# Per-node conversation state references
# (actual conversations stored separately, reference by node_id)
conversation_states: dict[str, str] # {node_id: conversation_checkpoint_id}
# Output accumulator state
accumulated_outputs: dict[str, Any]
# Execution metrics (for resuming quality tracking)
metrics_snapshot: dict[str, Any]
# Metadata
is_clean: bool # True if no failures/retries before this checkpoint
can_resume_from: bool # False if checkpoint is in unstable state
description: str # Human-readable checkpoint description
```
#### Storage Structure
```
~/.hive/agents/{agent_name}/
└── sessions/
└── session_YYYYMMDD_HHMMSS_{uuid}/
├── state.json # Session state (existing)
├── checkpoints/
│ ├── index.json # Checkpoint index/manifest
│ ├── checkpoint_1.json # Individual checkpoints
│ ├── checkpoint_2.json
│ └── checkpoint_N.json
├── conversations/ # Per-node conversation state (existing)
│ ├── node_id_1/
│ │ ├── parts/
│ │ ├── meta.json
│ │ └── cursor.json
│ └── node_id_2/...
├── data/ # Spillover artifacts (existing)
└── logs/ # L1/L2/L3 logs (existing)
```
**Checkpoint Index Format** (`checkpoints/index.json`):
```json
{
"session_id": "session_20260208_143022_abc12345",
"checkpoints": [
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"created_at": "2026-02-08T14:30:30.123Z",
"current_node": "collector",
"is_clean": true,
"can_resume_from": true,
"description": "Completed collector node successfully"
},
{
"checkpoint_id": "checkpoint_20260208_143045_abc789",
"type": "node_start",
"created_at": "2026-02-08T14:30:45.456Z",
"current_node": "analyzer",
"is_clean": true,
"can_resume_from": true,
"description": "Starting analyzer node"
}
],
"latest_checkpoint_id": "checkpoint_20260208_143045_abc789",
"total_checkpoints": 2
}
```
### 2. Resume Mechanism
#### Resume Flow
```python
# High-level resume flow
async def resume_session(
session_id: str,
checkpoint_id: str | None = None, # None = resume from latest
modifications: dict[str, Any] | None = None, # Override memory values
) -> ExecutionResult:
"""
Resume a session from a checkpoint.
Args:
session_id: Session to resume
checkpoint_id: Specific checkpoint (None = latest)
modifications: Optional memory/state modifications before resume
Returns:
ExecutionResult with resumed execution
"""
# 1. Load session state
session_state = await session_store.read_state(session_id)
# 2. Verify session is resumable
if not session_state.is_resumable:
raise ValueError(f"Session {session_id} is not resumable")
# 3. Load checkpoint
checkpoint = await checkpoint_store.load_checkpoint(
session_id,
checkpoint_id or session_state.progress.resume_from
)
# 4. Restore state
# - Restore SharedMemory from checkpoint.shared_memory
# - Restore per-node conversations from checkpoint.conversation_states
# - Restore output accumulator from checkpoint.accumulated_outputs
# - Apply modifications if provided
# 5. Resume execution from checkpoint.next_node or checkpoint.current_node
result = await executor.execute(
graph=graph,
goal=goal,
memory=restored_memory,
entry_point=checkpoint.next_node or checkpoint.current_node,
session_state=restored_session_state,
)
# 6. Update session state with resumed execution
await session_store.write_state(session_id, updated_state)
return result
```
#### Checkpoint Restoration
```python
@dataclass
class CheckpointStore:
"""Manages checkpoint storage and retrieval."""
async def save_checkpoint(
self,
session_id: str,
checkpoint: Checkpoint,
) -> None:
"""Save a checkpoint atomically."""
# 1. Write checkpoint file: checkpoints/checkpoint_{id}.json
# 2. Update index: checkpoints/index.json
# 3. Use atomic write for crash safety
async def load_checkpoint(
self,
session_id: str,
checkpoint_id: str | None = None,
) -> Checkpoint | None:
"""Load a checkpoint by ID or latest."""
# 1. Read checkpoint index
# 2. Find checkpoint by ID (or latest if None)
# 3. Load and deserialize checkpoint file
async def list_checkpoints(
self,
session_id: str,
checkpoint_type: str | None = None,
is_clean: bool | None = None,
) -> list[Checkpoint]:
"""List all checkpoints for a session with optional filters."""
async def delete_checkpoint(
self,
session_id: str,
checkpoint_id: str,
) -> bool:
"""Delete a specific checkpoint."""
async def prune_checkpoints(
self,
session_id: str,
keep_count: int = 10,
keep_clean_only: bool = False,
) -> int:
"""Prune old checkpoints, keeping most recent N."""
```
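One way the `prune_checkpoints` body could look, as a sketch (assuming `created_at` sorts lexicographically, which holds for the ISO 8601 timestamps in the schema above):

```python
async def prune_checkpoints(
    self,
    session_id: str,
    keep_count: int = 10,
    keep_clean_only: bool = False,
) -> int:
    """Sketch: keep the newest N checkpoints, delete the rest."""
    all_cps = await self.list_checkpoints(session_id)
    # Candidates to retain; optionally restrict retention to clean checkpoints
    keep = [cp for cp in all_cps if cp.is_clean] if keep_clean_only else list(all_cps)
    keep.sort(key=lambda cp: cp.created_at, reverse=True)  # newest first (ISO 8601)
    keep_ids = {cp.checkpoint_id for cp in keep[:keep_count]}
    removed = 0
    for cp in all_cps:
        if cp.checkpoint_id not in keep_ids:
            if await self.delete_checkpoint(session_id, cp.checkpoint_id):
                removed += 1
    return removed
```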
### 3. GraphExecutor Integration
#### Modified Execution Loop
```python
# In GraphExecutor.execute()
async def execute(
self,
graph: GraphSpec,
goal: Goal,
memory: SharedMemory | None = None,
entry_point: str = "start",
session_state: dict[str, Any] | None = None,
checkpoint_config: CheckpointConfig | None = None,
) -> ExecutionResult:
"""
Execute graph with checkpointing support.
New parameters:
checkpoint_config: Configuration for checkpointing behavior
"""
# Initialize checkpoint store
checkpoint_store = CheckpointStore(storage_path / "checkpoints")
# Restore from checkpoint if session_state indicates resume
if session_state and session_state.get("resume_from"):
checkpoint = await checkpoint_store.load_checkpoint(
session_id,
session_state["resume_from"]
)
memory = self._restore_memory_from_checkpoint(checkpoint)
entry_point = checkpoint.next_node or checkpoint.current_node
current_node = entry_point
while current_node:
# CHECKPOINT: node_start
if checkpoint_config and checkpoint_config.checkpoint_on_node_start:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="node_start",
current_node=current_node,
memory=memory,
# ... other state
)
try:
# Execute node
result = await self._execute_node(current_node, memory, context)
# CHECKPOINT: node_complete
if checkpoint_config and checkpoint_config.checkpoint_on_node_complete:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="node_complete",
current_node=current_node,
memory=memory,
# ... other state
)
except Exception as e:
# On failure, mark current checkpoint as resume point
await self._mark_failure_checkpoint(
checkpoint_store,
current_node=current_node,
error=str(e),
)
raise
# Find next edge
next_node = self._find_next_node(current_node, result, memory)
# CHECKPOINT: edge_transition
if next_node and checkpoint_config and checkpoint_config.checkpoint_on_edge:
await self._save_checkpoint(
checkpoint_store,
checkpoint_type="edge_transition",
current_node=current_node,
next_node=next_node,
memory=memory,
# ... other state
)
current_node = next_node
```
### 4. EventLoopNode Integration
#### Conversation State Checkpointing
EventLoopNode already has conversation persistence via `ConversationStore`. For resumability:
```python
class EventLoopNode:
async def execute(self, ctx: NodeContext) -> NodeResult:
"""Execute with checkpoint support."""
# Try to restore from checkpoint
if ctx.checkpoint_id:
conversation = await self._restore_conversation(ctx.checkpoint_id)
output_accumulator = await OutputAccumulator.restore(self.store)
else:
# Fresh start
conversation = await self._initialize_conversation(ctx)
output_accumulator = OutputAccumulator(store=self.store)
# Event loop with periodic checkpointing
iteration = 0
while iteration < self.config.max_iterations:
# Optional: checkpoint every N iterations
if self.config.checkpoint_every_n_iterations:
if iteration % self.config.checkpoint_every_n_iterations == 0:
await self._save_loop_checkpoint(
conversation,
output_accumulator,
iteration,
)
# ... rest of event loop
iteration += 1
```
**Note**: EventLoopNode conversation state is already persisted to disk after each turn via `ConversationStore`, so it's naturally resumable. We just need to:
1. Track which conversation checkpoint to restore from
2. Ensure output accumulator state is also restored
### 5. User-Facing API
#### MCP Tools for Resume
```python
# In tools/src/aden_tools/tools/session_management/
@tool
async def list_resumable_sessions(
agent_work_dir: str,
status: str = "failed", # "failed", "paused", "cancelled"
limit: int = 20,
) -> dict:
"""
List sessions that can be resumed.
Returns:
{
"sessions": [
{
"session_id": "session_20260208_143022_abc12345",
"status": "failed",
"error": "Missing API key: OPENAI_API_KEY",
"failed_at_node": "analyzer",
"last_checkpoint": "checkpoint_20260208_143045_abc789",
"created_at": "2026-02-08T14:30:22Z",
"updated_at": "2026-02-08T14:30:45Z"
}
],
"total": 1
}
"""
@tool
async def list_session_checkpoints(
agent_work_dir: str,
session_id: str,
checkpoint_type: str = "", # Filter by type
clean_only: bool = False, # Only show clean checkpoints
) -> dict:
"""
List all checkpoints for a session.
Returns:
{
"session_id": "session_20260208_143022_abc12345",
"checkpoints": [
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"created_at": "2026-02-08T14:30:30Z",
"current_node": "collector",
"is_clean": true,
"can_resume_from": true,
"description": "Completed collector node successfully"
},
...
]
}
"""
@tool
async def inspect_checkpoint(
agent_work_dir: str,
session_id: str,
checkpoint_id: str,
include_memory: bool = False, # Include full memory state
) -> dict:
"""
Inspect a checkpoint's detailed state.
Returns:
{
"checkpoint_id": "checkpoint_20260208_143030_xyz123",
"type": "node_complete",
"current_node": "collector",
"execution_path": ["start", "collector"],
"accumulated_outputs": {
"twitter_handles": ["@user1", "@user2"]
},
"memory": {...}, # If include_memory=True
"metrics_snapshot": {
"total_retries": 2,
"nodes_with_failures": []
}
}
"""
@tool
async def resume_session(
agent_work_dir: str,
session_id: str,
checkpoint_id: str = "", # Empty = latest checkpoint
memory_modifications: str = "{}", # JSON string of memory overrides
) -> dict:
"""
Resume a session from a checkpoint.
Args:
agent_work_dir: Path to agent workspace
session_id: Session to resume
checkpoint_id: Specific checkpoint (empty = latest)
memory_modifications: JSON object with memory key overrides
Returns:
{
"session_id": "session_20260208_143022_abc12345",
"resumed_from": "checkpoint_20260208_143045_abc789",
"status": "active", # Now actively running
"message": "Session resumed successfully from checkpoint_20260208_143045_abc789"
}
"""
```
#### CLI Commands
```bash
# List resumable sessions
hive sessions list --agent twitter_outreach --status failed
# Show checkpoints for a session
hive sessions checkpoints session_20260208_143022_abc12345
# Inspect a checkpoint
hive sessions inspect session_20260208_143022_abc12345 checkpoint_20260208_143045_abc789
# Resume a session
hive sessions resume session_20260208_143022_abc12345
# Resume from specific checkpoint
hive sessions resume session_20260208_143022_abc12345 --checkpoint checkpoint_20260208_143030_xyz123
# Resume with memory modifications (e.g., after adding credentials)
hive sessions resume session_20260208_143022_abc12345 --set api_key=sk-...
```
### 6. Configuration
#### CheckpointConfig
```python
@dataclass
class CheckpointConfig:
"""Configuration for checkpoint behavior."""
# When to checkpoint
checkpoint_on_node_start: bool = True
checkpoint_on_node_complete: bool = True
checkpoint_on_edge: bool = False # Usually redundant with node_start
checkpoint_on_loop_iteration: bool = False # Can be expensive
checkpoint_every_n_iterations: int = 0 # 0 = disabled
# Pruning
max_checkpoints_per_session: int = 100
prune_after_node_count: int = 10 # Prune every N nodes
keep_clean_checkpoints_only: bool = False
# Performance
async_checkpoint: bool = True # Don't block execution on checkpoint writes
# What to include
include_conversation_snapshots: bool = True
include_full_memory: bool = True
```
#### Agent-Level Configuration
```python
# In agent.py or config.py
class MyAgent(Agent):
def get_checkpoint_config(self) -> CheckpointConfig:
"""Override to customize checkpoint behavior."""
return CheckpointConfig(
checkpoint_on_node_start=True,
checkpoint_on_node_complete=True,
checkpoint_every_n_iterations=5, # Checkpoint every 5 iterations in loops
max_checkpoints_per_session=50,
)
```
## Implementation Plan
### Phase 1: Core Checkpoint Infrastructure (Week 1)
1. **Create checkpoint schemas**
- `Checkpoint` dataclass
- `CheckpointIndex` for manifest
- Serialization/deserialization
2. **Implement CheckpointStore**
- `save_checkpoint()` with atomic writes
- `load_checkpoint()` with deserialization
- `list_checkpoints()` with filtering
- `prune_checkpoints()` for cleanup
3. **Update SessionState schema**
- Add `resume_from_checkpoint_id` field
- Add `checkpoints_enabled` flag
### Phase 2: GraphExecutor Integration (Week 2)
1. **Modify GraphExecutor**
- Add `CheckpointConfig` parameter
- Implement checkpoint saving at node boundaries
- Implement checkpoint restoration logic
- Handle memory state snapshots
2. **Update execution loop**
- Checkpoint before node execution
- Checkpoint after successful completion
- Mark failure checkpoints on errors
### Phase 3: EventLoopNode Integration (Week 3)
1. **Enhance conversation restoration**
- Link checkpoints to conversation states
- Ensure OutputAccumulator is checkpointed
- Test loop resumption from middle of execution
2. **Add optional loop iteration checkpoints**
- Configurable iteration frequency
- Balance between granularity and performance
### Phase 4: User-Facing Features (Week 4)
1. **Implement MCP tools**
- `list_resumable_sessions`
- `list_session_checkpoints`
- `inspect_checkpoint`
- `resume_session`
2. **Add CLI commands**
- `hive sessions list`
- `hive sessions checkpoints`
- `hive sessions inspect`
- `hive sessions resume`
3. **Update TUI**
- Show resumable sessions in UI
- Allow resume from TUI interface
### Phase 5: Testing & Documentation (Week 5)
1. **Write comprehensive tests**
- Unit tests for CheckpointStore
- Integration tests for resume flow
- Edge case testing (concurrent checkpoints, corruption, etc.)
2. **Performance testing**
- Measure checkpoint overhead
- Optimize async checkpoint writing
- Test with large memory states
3. **Documentation**
- Update skills with resume patterns
- Document checkpoint configuration
- Add troubleshooting guide
## Performance Considerations
### Checkpoint Overhead
**Estimated overhead per checkpoint**:
- Memory serialization: ~5-10ms for typical state (< 1MB)
- File I/O: ~10-20ms for atomic write
- Total: ~15-30ms per checkpoint
**Mitigation strategies**:
1. **Async checkpointing**: Don't block execution on writes (sketched below)
2. **Selective checkpointing**: Only checkpoint at important boundaries
3. **Incremental checkpoints**: Store deltas instead of full state (future)
4. **Compression**: Compress large memory states before writing
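A minimal sketch of strategy 1, assuming the checkpoint is serialized on the hot path and handed to a background task for the disk write; `schedule_checkpoint_write` and its arguments are illustrative, not the final API:

```python
import asyncio
import json
from pathlib import Path

def schedule_checkpoint_write(checkpoint_dict: dict, path: Path) -> asyncio.Task:
    """Sketch: non-blocking checkpoint write. Call from a running event loop."""
    # Serialize now, while the in-memory state is still consistent
    payload = json.dumps(checkpoint_dict)

    async def _write() -> None:
        tmp = path.with_suffix(".tmp")
        await asyncio.to_thread(tmp.write_text, payload)
        await asyncio.to_thread(tmp.replace, path)  # atomic rename for crash safety

    # Execution continues immediately; the write completes in the background
    return asyncio.create_task(_write())
```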
### Storage Size
**Typical checkpoint size**:
- Small memory state (< 100KB): ~50-100KB per checkpoint
- Medium memory state (< 1MB): ~500KB-1MB per checkpoint
- Large memory state (> 1MB): ~1-5MB per checkpoint
**Mitigation strategies**:
1. **Pruning**: Keep only N most recent checkpoints
2. **Clean-only retention**: Only keep checkpoints from clean execution
3. **Compression**: Use gzip for checkpoint files (sketched below)
4. **Archiving**: Move old checkpoints to archive storage
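A sketch of strategy 3, compressing only when the serialized payload crosses a size threshold (mirroring the `compression_threshold_bytes` idea in Future Enhancements); the names here are assumptions:

```python
import gzip
import json
from pathlib import Path

def write_checkpoint_file(path: Path, checkpoint_dict: dict, threshold: int = 100_000) -> Path:
    """Sketch: gzip checkpoints larger than `threshold` bytes."""
    payload = json.dumps(checkpoint_dict).encode("utf-8")
    if len(payload) > threshold:
        # Large state: compress and mark with a .gz suffix
        target = path.with_suffix(path.suffix + ".gz")
        target.write_bytes(gzip.compress(payload))
        return target
    path.write_bytes(payload)
    return path
```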
## Error Handling
### Checkpoint Save Failures
**Scenarios**:
- Disk full
- Permission errors
- Serialization failures
- Concurrent writes
**Handling**:
```python
try:
await checkpoint_store.save_checkpoint(session_id, checkpoint)
except CheckpointSaveError as e:
# Log warning but don't fail execution
logger.warning(f"Failed to save checkpoint: {e}")
# Continue execution without checkpoint
```
### Checkpoint Load Failures
**Scenarios**:
- Checkpoint file corrupted
- Checkpoint format incompatible
- Referenced conversation state missing
**Handling**:
```python
try:
checkpoint = await checkpoint_store.load_checkpoint(session_id, checkpoint_id)
except CheckpointLoadError as e:
# Try to find previous valid checkpoint
checkpoints = await checkpoint_store.list_checkpoints(session_id)
for cp in reversed(checkpoints):
try:
checkpoint = await checkpoint_store.load_checkpoint(session_id, cp.checkpoint_id)
logger.info(f"Fell back to checkpoint {cp.checkpoint_id}")
break
except CheckpointLoadError:
continue
else:
raise ValueError(f"No valid checkpoints found for session {session_id}")
```
### Resume Failures
**Scenarios**:
- Checkpoint state inconsistent with current graph
- Node no longer exists in updated agent code
- Memory keys missing required values
**Handling**:
1. **Validation**: Verify checkpoint compatibility before resume (see the sketch after this list)
2. **Graceful degradation**: Resume from earlier checkpoint if possible
3. **User notification**: Clear error messages about why resume failed
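A sketch of the validation step, assuming `GraphSpec` exposes its node ids as an iterable (the attribute access here is an assumption about the schema):

```python
def validate_checkpoint_compatibility(checkpoint, graph) -> list[str]:
    """Sketch: return a list of problems; empty means safe to resume."""
    known = set(graph.nodes)  # assumed: iterable of node ids
    problems = [
        f"node '{node_id}' in execution_path no longer exists in the graph"
        for node_id in checkpoint.execution_path
        if node_id not in known
    ]
    resume_node = checkpoint.next_node or checkpoint.current_node
    if resume_node and resume_node not in known:
        problems.append(f"resume node '{resume_node}' not found in graph")
    return problems
```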
## Migration Path
### Backward Compatibility
**Existing sessions** (without checkpoints):
- Can still be executed normally
- Checkpoint system is opt-in per agent
- No breaking changes to existing APIs
**Enabling checkpoints**:
```python
# Option 1: Agent-level default
class MyAgent(Agent):
checkpoint_config = CheckpointConfig(
checkpoint_on_node_complete=True,
)
# Option 2: Runtime override
runtime = create_agent_runtime(
agent=my_agent,
checkpoint_config=CheckpointConfig(...),
)
# Option 3: Per-execution
result = await executor.execute(
graph=graph,
goal=goal,
checkpoint_config=CheckpointConfig(...),
)
```
### Gradual Rollout
1. **Phase 1**: Core infrastructure, no user-facing features
2. **Phase 2**: Opt-in for specific agents via config
3. **Phase 3**: User-facing MCP tools and CLI
4. **Phase 4**: Enable by default for all new agents
5. **Phase 5**: TUI integration
## Future Enhancements
### 1. Incremental Checkpoints
Instead of full state snapshots, store only deltas:
```python
@dataclass
class IncrementalCheckpoint:
"""Checkpoint with only changed state."""
base_checkpoint_id: str # Parent checkpoint
memory_delta: dict[str, Any] # Only changed keys
added_outputs: dict[str, Any] # Only new outputs
```
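Computing the delta against the base checkpoint is a dictionary diff; a sketch (deleted keys would need a tombstone list, omitted here for brevity):

```python
def compute_memory_delta(base: dict, current: dict) -> dict:
    # Only keys that were added or whose values changed since the base checkpoint
    return {k: v for k, v in current.items() if k not in base or base[k] != v}
```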
### 2. Distributed Checkpointing
For long-running agents, checkpoint to cloud storage:
```python
checkpoint_config = CheckpointConfig(
storage_backend="s3", # or "gcs", "azure"
storage_url="s3://my-bucket/checkpoints/",
)
```
### 3. Checkpoint Compression
Compress large memory states:
```python
checkpoint_config = CheckpointConfig(
compress=True,
compression_threshold_bytes=100_000, # Compress if > 100KB
)
```
### 4. Smart Checkpoint Selection
Use heuristics to decide when to checkpoint:
```python
class SmartCheckpointStrategy:
def should_checkpoint(self, context: ExecutionContext) -> bool:
# Checkpoint after expensive nodes
if context.node_latency_ms > 30_000:
return True
# Checkpoint before risky operations
if context.node_id in ["api_call", "external_tool"]:
return True
# Checkpoint after significant memory changes
if context.memory_delta_size > 10:
return True
return False
```
## Security Considerations
### 1. Sensitive Data in Checkpoints
**Problem**: Checkpoints may contain sensitive data (API keys, credentials, PII)
**Mitigation**:
```python
from dataclasses import dataclass, field

@dataclass
class CheckpointConfig:
# Exclude sensitive keys from checkpoint
exclude_memory_keys: list[str] = field(default_factory=lambda: [
"api_key",
"credentials",
"access_token",
])
# Encrypt checkpoint files
encrypt_checkpoints: bool = True
encryption_key_source: str = "keychain" # or "env_var", "file"
```
### 2. Checkpoint Tampering
**Problem**: Malicious modification of checkpoint files
**Mitigation**:
```python
@dataclass
class Checkpoint:
# Add cryptographic signature
signature: str # HMAC of checkpoint content
def verify_signature(self, secret_key: str) -> bool:
"""Verify checkpoint hasn't been tampered with."""
...
```
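A minimal sketch of such a signature scheme: HMAC-SHA256 over the canonical JSON of the checkpoint payload, verified with a constant-time comparison. Key management (keychain, env var) is out of scope here:

```python
import hashlib
import hmac
import json

def sign_checkpoint(payload: dict, secret_key: bytes) -> str:
    """Sketch: HMAC over canonical (sorted, compact) JSON of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret_key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_checkpoint(payload: dict, signature: str, secret_key: bytes) -> bool:
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(sign_checkpoint(payload, secret_key), signature)
```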
## References
- [RUNTIME_LOGGING.md](./RUNTIME_LOGGING.md) - Current logging system
- [session_state.py](../schemas/session_state.py) - Session state schema
- [session_store.py](../storage/session_store.py) - Session storage
- [executor.py](../graph/executor.py) - Graph executor
- [event_loop_node.py](../graph/event_loop_node.py) - EventLoop implementation
+698
View File
@@ -0,0 +1,698 @@
# Runtime Logging System
## Overview
The Hive framework uses a **three-level observability system** for tracking agent execution at different granularities:
- **L1 (Summary)**: High-level run outcomes - success/failure, execution quality, attention flags
- **L2 (Details)**: Per-node completion details - retries, verdicts, latency, attention reasons
- **L3 (Tool Logs)**: Step-by-step execution - tool calls, LLM responses, judge feedback
This layered approach enables efficient debugging: start with L1 to identify problematic runs, drill into L2 to find failing nodes, and analyze L3 for root cause details.
---
## Storage Architecture
### Current Structure (Unified Sessions)
**Default since 2026-02-06**
```
~/.hive/agents/{agent_name}/
└── sessions/
└── session_YYYYMMDD_HHMMSS_{uuid}/
├── state.json # Session state and metadata
├── logs/ # Runtime logs (L1/L2/L3)
│ ├── summary.json # L1: Run outcome
│ ├── details.jsonl # L2: Per-node results
│ └── tool_logs.jsonl # L3: Step-by-step execution
├── conversations/ # Per-node EventLoop state
└── data/ # Spillover artifacts
```
**Key characteristics:**
- All session data colocated in one directory
- Consistent ID format: `session_YYYYMMDD_HHMMSS_{short_uuid}`
- Logs written incrementally (JSONL for L2/L3)
- Single source of truth: `state.json`
### Legacy Structure (Deprecated)
**Read-only for backward compatibility**
```
~/.hive/agents/{agent_name}/
├── runtime_logs/
│ └── runs/
│ └── {run_id}/
│ ├── summary.json # L1
│ ├── details.jsonl # L2
│ └── tool_logs.jsonl # L3
├── sessions/
│ └── exec_{stream_id}_{uuid}/
│ ├── conversations/
│ └── data/
├── runs/ # Deprecated
│ └── run_start_*.json
└── summaries/ # Deprecated
└── run_start_*.json
```
**Migration status:**
- ✅ New sessions write to unified structure only
- ✅ Old sessions remain readable
- ❌ No new writes to `runs/`, `summaries/`, `runtime_logs/runs/`
- ⚠️ Deprecation warnings emitted when reading old locations
---
## Components
### RuntimeLogger
**Location:** `core/framework/runtime/runtime_logger.py`
**Responsibilities:**
- Receives execution events from GraphExecutor
- Tracks per-node execution details
- Aggregates attention flags
- Coordinates with RuntimeLogStore
**Key methods:**
```python
def start_run(goal_id: str, session_id: str = "") -> str:
"""Initialize a new run. Uses session_id as run_id if provided."""
def log_step(node_id: str, step_index: int, tool_calls: list, ...):
"""Record one LLM step (L3). Appends to tool_logs.jsonl immediately."""
def log_node_complete(node_id: str, exit_status: str, ...):
"""Record node completion (L2). Appends to details.jsonl immediately."""
async def end_run(status: str):
"""Finalize run, aggregate L2→L1, write summary.json."""
```
**Attention flag triggers:**
```python
# From runtime_logger.py:190-203
needs_attention = any([
retry_count > 3,
escalate_count > 2,
latency_ms > 60000,
tokens_used > 100000,
total_steps > 20,
])
```
### RuntimeLogStore
**Location:** `core/framework/runtime/runtime_log_store.py`
**Responsibilities:**
- Manages log file I/O
- Handles both old and new storage paths
- Provides incremental append for L2/L3 (crash-safe)
- Atomic writes for L1
**Storage path resolution:**
```python
def _get_run_dir(run_id: str) -> Path:
"""Determine log directory based on run_id format.
- session_* → {storage_root}/sessions/{run_id}/logs/
- Other → {base_path}/runtime_logs/runs/{run_id}/ (deprecated)
"""
```
**Key methods:**
```python
def ensure_run_dir(run_id: str):
"""Create log directory immediately at start_run()."""
def append_step(run_id: str, step: NodeStepLog):
"""Append L3 entry to tool_logs.jsonl. Thread-safe sync write."""
def append_node_detail(run_id: str, detail: NodeDetail):
"""Append L2 entry to details.jsonl. Thread-safe sync write."""
async def save_summary(run_id: str, summary: RunSummaryLog):
"""Write L1 summary.json atomically at end_run()."""
```
**File format:**
- **L1 (summary.json)**: Standard JSON, written once at end
- **L2 (details.jsonl)**: JSONL (one object per line), appended per node
- **L3 (tool_logs.jsonl)**: JSONL (one object per line), appended per step (append discipline sketched below)
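A sketch of that append discipline: one compact JSON object per line, written and flushed under a lock so concurrent steps can't interleave partial lines. The module-level lock is an assumption about how the thread-safe sync writes are implemented:

```python
import json
import threading
from pathlib import Path

_append_lock = threading.Lock()

def append_jsonl(path: Path, record: dict) -> None:
    """Sketch: thread-safe, immediately-persisted JSONL append."""
    line = json.dumps(record, separators=(",", ":"))
    with _append_lock:
        with path.open("a", encoding="utf-8") as f:
            f.write(line + "\n")
            f.flush()  # persist immediately, not buffered
```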
### Runtime Log Schemas
**Location:** `core/framework/runtime/runtime_log_schemas.py`
**L1: RunSummaryLog**
```python
@dataclass
class RunSummaryLog:
run_id: str
goal_id: str
status: str # "success", "failure", "degraded", "in_progress"
started_at: str # ISO 8601
ended_at: str | None
needs_attention: bool
attention_summary: AttentionSummary
total_nodes_executed: int
nodes_with_failures: list[str]
execution_quality: str # "clean", "degraded", "failed"
total_latency_ms: int
# ... additional metrics
```
**L2: NodeDetail**
```python
@dataclass
class NodeDetail:
node_id: str
exit_status: str # "success", "escalate", "no_valid_edge"
retry_count: int
verdict_counts: dict[str, int] # {ACCEPT: 1, RETRY: 3, ...}
total_steps: int
latency_ms: int
needs_attention: bool
attention_reasons: list[str]
# ... tool error tracking, token counts
```
**L3: NodeStepLog**
```python
@dataclass
class NodeStepLog:
node_id: str
step_index: int
tool_calls: list[dict]
tool_results: list[dict]
verdict: str # "ACCEPT", "RETRY", "ESCALATE", "CONTINUE"
verdict_feedback: str
llm_response_text: str
tokens_used: int
latency_ms: int
# ... detailed execution state
# Trace context (OTel-aligned; empty if observability context not set):
trace_id: str # From set_trace_context (OTel trace)
span_id: str # 16 hex chars per step (OTel span)
parent_span_id: str # Optional; for nested span hierarchy
execution_id: str # Session/run correlation id
```
L3 entries include `trace_id`, `span_id`, and `execution_id` for correlation and **OpenTelemetry (OTel) compatibility**. When the framework sets trace context (e.g. via `Runtime.start_run()` or `StreamRuntime.start_run()`), these fields are populated automatically so L3 data can be exported to OTel backends without schema changes.
**L2: NodeDetail** also includes `trace_id` and `span_id`; **L1: RunSummaryLog** includes `trace_id` and `execution_id` for the same correlation.
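As a sketch of how that context gets seeded (mirroring the `Runtime.start_run()` change in this PR), the framework generates the ids and calls `set_trace_context`; the `goal_id` value below is just an example:

```python
import uuid

from framework.observability import set_trace_context

# Generate OTel/W3C-aligned ids and seed the ambient trace context.
# L1/L2/L3 records written during this run pick these up automatically.
set_trace_context(
    trace_id=uuid.uuid4().hex,      # 32 hex chars (OTel trace id)
    execution_id=uuid.uuid4().hex,  # session/run correlation id
    goal_id="twitter-outreach-multi-loop",  # example goal id
)
```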
---
## Querying Logs (MCP Tools)
### Tools Location
**MCP Server:** `tools/src/aden_tools/tools/runtime_logs_tool/runtime_logs_tool.py`
Three MCP tools provide access to the logging system:
### L1: query_runtime_logs
**Purpose:** Find problematic runs
```python
query_runtime_logs(
agent_work_dir: str, # e.g., "~/.hive/agents/twitter_outreach"
status: str = "", # "needs_attention", "success", "failure", "degraded"
limit: int = 20
) -> dict # {"runs": [...], "total": int}
```
**Returns:**
```json
{
"runs": [
{
"run_id": "session_20260206_115718_e22339c5",
"status": "degraded",
"needs_attention": true,
"attention_summary": {
"total_attention_flags": 3,
"categories": ["missing_outputs", "retry_loops"]
},
"started_at": "2026-02-06T11:57:18Z"
}
],
"total": 1
}
```
**Common queries:**
```python
# Find all problematic runs
query_runtime_logs(agent_work_dir, status="needs_attention")
# Get recent runs regardless of status
query_runtime_logs(agent_work_dir, limit=10)
# Check for failures
query_runtime_logs(agent_work_dir, status="failure")
```
### L2: query_runtime_log_details
**Purpose:** Identify which nodes failed
```python
query_runtime_log_details(
agent_work_dir: str,
run_id: str, # From L1 query
needs_attention_only: bool = False,
node_id: str = "" # Filter to specific node
) -> dict # {"run_id": str, "nodes": [...]}
```
**Returns:**
```json
{
"run_id": "session_20260206_115718_e22339c5",
"nodes": [
{
"node_id": "intake-collector",
"exit_status": "escalate",
"retry_count": 5,
"verdict_counts": {"RETRY": 5, "ESCALATE": 1},
"attention_reasons": ["high_retry_count", "missing_outputs"],
"total_steps": 8,
"latency_ms": 12500,
"needs_attention": true
}
]
}
```
**Common queries:**
```python
# Get all problematic nodes
query_runtime_log_details(agent_work_dir, run_id, needs_attention_only=True)
# Analyze specific node across run
query_runtime_log_details(agent_work_dir, run_id, node_id="intake-collector")
# Full node breakdown
query_runtime_log_details(agent_work_dir, run_id)
```
### L3: query_runtime_log_raw
**Purpose:** Root cause analysis
```python
query_runtime_log_raw(
agent_work_dir: str,
run_id: str,
step_index: int = -1, # Specific step or -1 for all
node_id: str = "" # Filter to specific node
) -> dict # {"run_id": str, "steps": [...]}
```
**Returns:**
```json
{
"run_id": "session_20260206_115718_e22339c5",
"steps": [
{
"node_id": "intake-collector",
"step_index": 3,
"tool_calls": [
{
"tool": "web_search",
"args": {"query": "@RomuloNevesOf"}
}
],
"tool_results": [
{
"status": "success",
"data": "..."
}
],
"verdict": "RETRY",
"verdict_feedback": "Missing required output 'twitter_handles'. You found the handle but didn't call set_output.",
"llm_response_text": "I found the Twitter profile...",
"tokens_used": 1234,
"latency_ms": 2500
}
]
}
```
**Common queries:**
```python
# All steps for a problematic node
query_runtime_log_raw(agent_work_dir, run_id, node_id="intake-collector")
# Specific step analysis
query_runtime_log_raw(agent_work_dir, run_id, step_index=5)
# Full execution trace
query_runtime_log_raw(agent_work_dir, run_id)
```
---
## Usage Patterns
### Pattern 1: Top-Down Investigation
**Use case:** Debug a failing agent
```python
# 1. Find problematic runs (L1)
result = query_runtime_logs(
agent_work_dir="~/.hive/agents/twitter_outreach",
status="needs_attention"
)
run_id = result["runs"][0]["run_id"]
# 2. Identify failing nodes (L2)
details = query_runtime_log_details(
agent_work_dir="~/.hive/agents/twitter_outreach",
run_id=run_id,
needs_attention_only=True
)
problem_node = details["nodes"][0]["node_id"]
# 3. Analyze root cause (L3)
raw = query_runtime_log_raw(
agent_work_dir="~/.hive/agents/twitter_outreach",
run_id=run_id,
node_id=problem_node
)
# Examine verdict_feedback, tool_results, etc.
```
### Pattern 2: Node-Specific Debugging
**Use case:** Investigate why a specific node keeps failing
```python
# Get recent runs
runs = query_runtime_logs("~/.hive/agents/my_agent", limit=10)
# For each run, check specific node
for run in runs["runs"]:
node_details = query_runtime_log_details(
"~/.hive/agents/my_agent",
run["run_id"],
node_id="problematic-node"
)
# Analyze retry patterns, error types
```
### Pattern 3: Real-Time Monitoring
**Use case:** Watch for issues during development
```python
import time
while True:
result = query_runtime_logs(
agent_work_dir="~/.hive/agents/my_agent",
status="needs_attention",
limit=1
)
if result["total"] > 0:
new_issue = result["runs"][0]
print(f"⚠️ New issue detected: {new_issue['run_id']}")
# Alert or drill into L2/L3
time.sleep(10) # Poll every 10 seconds
```
---
## Integration Points
### GraphExecutor → RuntimeLogger
**Location:** `core/framework/graph/executor.py`
```python
# Executor creates logger and passes session_id
logger = RuntimeLogger(store, agent_id)
run_id = logger.start_run(goal_id, session_id=execution_id)
# During execution
logger.log_step(node_id, step_index, tool_calls, ...)
logger.log_node_complete(node_id, exit_status, ...)
# At completion
await logger.end_run(status="success")
```
### EventLoopNode → RuntimeLogger
**Location:** `core/framework/graph/event_loop_node.py`
```python
# EventLoopNode logs each step
self._logger.log_step(
node_id=self.id,
step_index=step_count,
tool_calls=current_tool_calls,
tool_results=current_tool_results,
verdict=verdict,
verdict_feedback=feedback,
...
)
```
### AgentRuntime → RuntimeLogger
**Location:** `core/framework/runtime/agent_runtime.py`
```python
# Runtime initializes logger with storage path
log_store = RuntimeLogStore(base_path / "runtime_logs")
logger = RuntimeLogger(log_store, agent_id)
# Passes session_id from ExecutionStream
logger.start_run(goal_id, session_id=execution_id)
```
---
## File Format Details
### L1: summary.json
**Written:** Once at end_run()
**Format:** Standard JSON
```json
{
"run_id": "session_20260206_115718_e22339c5",
"goal_id": "twitter-outreach-multi-loop",
"status": "degraded",
"started_at": "2026-02-06T11:57:18.593081",
"ended_at": "2026-02-06T11:58:45.123456",
"needs_attention": true,
"attention_summary": {
"total_attention_flags": 3,
"categories": ["missing_outputs", "retry_loops"],
"nodes_with_attention": ["intake-collector"]
},
"total_nodes_executed": 4,
"nodes_with_failures": ["intake-collector"],
"execution_quality": "degraded",
"total_latency_ms": 86530,
"total_retries": 5
}
```
### L2: details.jsonl
**Written:** Incrementally (append per node completion)
**Format:** JSONL (one JSON object per line)
```jsonl
{"node_id":"intake-collector","exit_status":"escalate","retry_count":5,"verdict_counts":{"RETRY":5,"ESCALATE":1},"total_steps":8,"latency_ms":12500,"needs_attention":true,"attention_reasons":["high_retry_count","missing_outputs"],"tool_error_count":0,"tokens_used":9876}
{"node_id":"profile-analyzer","exit_status":"success","retry_count":0,"verdict_counts":{"ACCEPT":1},"total_steps":2,"latency_ms":5432,"needs_attention":false,"attention_reasons":[],"tool_error_count":0,"tokens_used":3456}
```
### L3: tool_logs.jsonl
**Written:** Incrementally (append per step)
**Format:** JSONL (one JSON object per line)
Each line includes **trace context** when the framework has set it (via the observability module): `trace_id`, `span_id`, `parent_span_id` (optional), and `execution_id`. These align with OpenTelemetry/W3C TraceContext so L3 data can be exported to OTel backends without schema changes.
```jsonl
{"node_id":"intake-collector","step_index":3,"trace_id":"54e80d7b5bd6409dbc3217e5cd16a4fd","span_id":"a1b2c3d4e5f67890","execution_id":"b4c348ec54e80d7b5bd6409dbc3217e50","tool_calls":[...],"verdict":"RETRY",...}
```
**Why JSONL?**
- Incremental append during execution (crash-safe)
- No need to parse entire file to add one line
- Data persisted immediately, not buffered
- Easy to stream/process line-by-line (see the reader sketch below)
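A sketch of the tolerant read path this format enables: stream a JSONL log and skip any line that fails to parse (e.g. a partial line left by a crash mid-write), as the MCP query tools do:

```python
import json
import logging
from pathlib import Path
from typing import Iterator

logger = logging.getLogger(__name__)

def iter_jsonl(path: Path) -> Iterator[dict]:
    """Sketch: yield parsed records, skipping corrupt lines."""
    with path.open(encoding="utf-8") as f:
        for lineno, raw in enumerate(f, 1):
            raw = raw.strip()
            if not raw:
                continue
            try:
                yield json.loads(raw)
            except json.JSONDecodeError:
                # Likely a partial write from a crash; warn and continue
                logger.warning("Skipping corrupt line %d in %s", lineno, path)
```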
---
## Attention Flags System
### Automatic Detection
The runtime logger automatically flags issues based on execution metrics:
| Trigger | Threshold | Attention Reason | Category |
|---------|-----------|------------------|----------|
| High retries | `retry_count > 3` | `high_retry_count` | Retry Loops |
| Escalations | `escalate_count > 2` | `escalation_pattern` | Guard Failures |
| High latency | `latency_ms > 60000` | `high_latency` | High Latency |
| Token usage | `tokens_used > 100000` | `high_token_usage` | Memory/Context |
| Stalled steps | `total_steps > 20` | `excessive_steps` | Stalled Execution |
| Tool errors | `tool_error_count > 0` | `tool_failures` | Tool Errors |
| Missing outputs | `exit_status != "success"` | `missing_outputs` | Missing Outputs |
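As a sketch, the table above translates into `attention_reasons` on an L2 record roughly like this, using the thresholds quoted from `runtime_logger.py` earlier:

```python
def attention_reasons(detail: dict) -> list[str]:
    """Sketch: derive attention reasons from L2 metrics per the table above."""
    reasons = []
    if detail.get("retry_count", 0) > 3:
        reasons.append("high_retry_count")
    if detail.get("escalate_count", 0) > 2:
        reasons.append("escalation_pattern")
    if detail.get("latency_ms", 0) > 60_000:
        reasons.append("high_latency")
    if detail.get("tokens_used", 0) > 100_000:
        reasons.append("high_token_usage")
    if detail.get("total_steps", 0) > 20:
        reasons.append("excessive_steps")
    if detail.get("tool_error_count", 0) > 0:
        reasons.append("tool_failures")
    if detail.get("exit_status") != "success":
        reasons.append("missing_outputs")
    return reasons
```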
### Attention Categories
Used by the `/hive-debugger` skill for issue categorization:
1. **Missing Outputs**: Node didn't set required output keys
2. **Tool Errors**: Tool calls failed (API errors, timeouts)
3. **Retry Loops**: Judge repeatedly rejecting outputs
4. **Guard Failures**: Output validation failed
5. **Stalled Execution**: EventLoopNode not making progress
6. **High Latency**: Slow tool calls or LLM responses
7. **Client-Facing Issues**: Premature set_output before user input
8. **Edge Routing Errors**: No edges match current state
9. **Memory/Context Issues**: Conversation history too long
10. **Constraint Violations**: Agent violated goal-level rules
---
## Migration Guide
### Reading Old Logs
The system automatically handles both old and new formats:
```python
# MCP tools check both locations automatically
result = query_runtime_logs("~/.hive/agents/old_agent")
# Returns logs from both:
# - ~/.hive/agents/old_agent/runtime_logs/runs/*/
# - ~/.hive/agents/old_agent/sessions/session_*/logs/
```
### Deprecation Warnings
When reading from old locations, deprecation warnings are emitted:
```
DeprecationWarning: Reading logs from deprecated location for run_id=20260101T120000_abc12345.
New sessions use unified storage at sessions/session_*/logs/
```
### Migration Script (Optional)
For migrating existing old logs to new format, see:
- `EXECUTION_STORAGE_REDESIGN.md` - Migration strategy
- Future: `scripts/migrate_to_unified_sessions.py`
---
## Performance Characteristics
### Write Performance
- **L3 append**: ~1-2ms per step (sync I/O, thread-safe)
- **L2 append**: ~1-2ms per node (sync I/O, thread-safe)
- **L1 write**: ~5-10ms at end_run (atomic, async)
**Overhead:** < 5% of total execution time for typical agents
### Read Performance
- **L1 summary**: ~1-5ms (single JSON file)
- **L2 details**: ~10-50ms (JSONL, depends on node count)
- **L3 raw logs**: ~50-500ms (JSONL, depends on step count)
**Optimization:** Use filters (node_id, step_index) to reduce data read
### Storage Size
Typical session with 5 nodes, 20 steps:
- **L1 (summary.json)**: ~2-5 KB
- **L2 (details.jsonl)**: ~5-10 KB (1-2 KB per node)
- **L3 (tool_logs.jsonl)**: ~50-200 KB (2-10 KB per step)
**Total per session:** ~60-215 KB
**Compression:** Consider archiving old sessions after 90 days
---
## Troubleshooting
### Issue: Logs not appearing
**Symptom:** MCP tools return empty results
**Check:**
1. Verify storage path exists: `~/.hive/agents/{agent_name}/`
2. Check session directories: `ls ~/.hive/agents/{agent_name}/sessions/`
3. Verify logs directory exists: `ls ~/.hive/agents/{agent_name}/sessions/session_*/logs/`
4. Check file permissions
### Issue: Corrupt JSONL files
**Symptom:** Partial data or JSON decode errors
**Cause:** Process crash during write (rare, but possible)
**Recovery:**
```python
# MCP tools skip corrupt lines automatically
query_runtime_log_details(agent_work_dir, run_id)
# Logs warning but continues with valid lines
```
### Issue: High disk usage
**Symptom:** Storage growing too large
**Solution:**
```bash
# Archive old sessions
cd ~/.hive/agents/{agent_name}/sessions/
find . -name "session_2025*" -type d -exec tar -czf archive.tar.gz {} +
rm -rf session_2025*
# Or set up automatic cleanup (future feature)
```
---
## References
**Implementation:**
- `core/framework/runtime/runtime_logger.py` - Logger implementation
- `core/framework/runtime/runtime_log_store.py` - Storage layer
- `core/framework/runtime/runtime_log_schemas.py` - Data schemas
- `tools/src/aden_tools/tools/runtime_logs_tool/runtime_logs_tool.py` - MCP query tools
**Documentation:**
- `EXECUTION_STORAGE_REDESIGN.md` - Unified session storage design
- `/.claude/skills/hive-debugger/SKILL.md` - Interactive debugging skill
**Related:**
- `core/framework/schemas/session_state.py` - Session state schema
- `core/framework/storage/session_store.py` - Session state storage
- `core/framework/graph/executor.py` - GraphExecutor integration
+35 -1
View File
@@ -12,12 +12,14 @@ from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING, Any
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.executor import ExecutionResult
from framework.runtime.event_bus import EventBus
from framework.runtime.execution_stream import EntryPointSpec, ExecutionStream
from framework.runtime.outcome_aggregator import OutcomeAggregator
from framework.runtime.shared_state import SharedStateManager
from framework.storage.concurrent import ConcurrentStorage
from framework.storage.session_store import SessionStore
if TYPE_CHECKING:
from framework.graph.edge import GraphSpec
@@ -100,6 +102,8 @@ class AgentRuntime:
tools: list["Tool"] | None = None,
tool_executor: Callable | None = None,
config: AgentRuntimeConfig | None = None,
runtime_log_store: Any = None,
checkpoint_config: CheckpointConfig | None = None,
):
"""
Initialize agent runtime.
@@ -112,18 +116,26 @@ class AgentRuntime:
tools: Available tools
tool_executor: Function to execute tools
config: Optional runtime configuration
runtime_log_store: Optional RuntimeLogStore for per-execution logging
checkpoint_config: Optional checkpoint configuration for resumable sessions
"""
self.graph = graph
self.goal = goal
self._config = config or AgentRuntimeConfig()
self._runtime_log_store = runtime_log_store
self._checkpoint_config = checkpoint_config
# Initialize storage
storage_path_obj = Path(storage_path) if isinstance(storage_path, str) else storage_path
self._storage = ConcurrentStorage(
base_path=storage_path,
base_path=storage_path_obj,
cache_ttl=self._config.cache_ttl,
batch_interval=self._config.batch_interval,
)
# Initialize SessionStore for unified sessions (always enabled)
self._session_store = SessionStore(storage_path_obj)
# Initialize shared components
self._state_manager = SharedStateManager()
self._event_bus = EventBus(max_history=self._config.max_history)
@@ -212,6 +224,9 @@ class AgentRuntime:
tool_executor=self._tool_executor,
result_retention_max=self._config.execution_result_max,
result_retention_ttl_seconds=self._config.execution_result_ttl_seconds,
runtime_log_store=self._runtime_log_store,
session_store=self._session_store,
checkpoint_config=self._checkpoint_config,
)
await stream.start()
self._streams[ep_id] = stream
@@ -448,11 +463,15 @@ def create_agent_runtime(
tools: list["Tool"] | None = None,
tool_executor: Callable | None = None,
config: AgentRuntimeConfig | None = None,
runtime_log_store: Any = None,
enable_logging: bool = True,
checkpoint_config: CheckpointConfig | None = None,
) -> AgentRuntime:
"""
Create and configure an AgentRuntime with entry points.
Convenience factory that creates runtime and registers entry points.
Runtime logging is enabled by default for observability.
Args:
graph: Graph specification
@@ -463,10 +482,23 @@ def create_agent_runtime(
tools: Available tools
tool_executor: Tool executor function
config: Runtime configuration
runtime_log_store: Optional RuntimeLogStore for per-execution logging.
If None and enable_logging=True, creates one automatically.
enable_logging: Whether to enable runtime logging (default: True).
Set to False to disable logging entirely.
checkpoint_config: Optional checkpoint configuration for resumable sessions.
If None, uses default checkpointing behavior.
Returns:
Configured AgentRuntime (not yet started)
"""
# Auto-create runtime log store if logging is enabled and not provided
if enable_logging and runtime_log_store is None:
from framework.runtime.runtime_log_store import RuntimeLogStore
storage_path_obj = Path(storage_path) if isinstance(storage_path, str) else storage_path
runtime_log_store = RuntimeLogStore(storage_path_obj / "runtime_logs")
runtime = AgentRuntime(
graph=graph,
goal=goal,
@@ -475,6 +507,8 @@ def create_agent_runtime(
tools=tools,
tool_executor=tool_executor,
config=config,
runtime_log_store=runtime_log_store,
checkpoint_config=checkpoint_config,
)
for spec in entry_points:
+9
View File
@@ -13,6 +13,7 @@ from datetime import datetime
from pathlib import Path
from typing import Any
from framework.observability import set_trace_context
from framework.schemas.decision import Decision, DecisionType, Option, Outcome
from framework.schemas.run import Run, RunStatus
from framework.storage.backend import FileStorage
@@ -79,6 +80,14 @@ class Runtime:
The run ID
"""
run_id = f"run_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
trace_id = uuid.uuid4().hex
execution_id = uuid.uuid4().hex # 32 hex, OTel/W3C-aligned for logs
set_trace_context(
trace_id=trace_id,
execution_id=execution_id,
goal_id=goal_id,
)
self._current_run = Run(
id=run_id,
+205 -6
View File
@@ -17,6 +17,7 @@ from dataclasses import dataclass, field
from datetime import datetime
from typing import TYPE_CHECKING, Any
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.executor import ExecutionResult, GraphExecutor
from framework.runtime.shared_state import IsolationLevel, SharedStateManager
from framework.runtime.stream_runtime import StreamRuntime, StreamRuntimeAdapter
@@ -28,6 +29,7 @@ if TYPE_CHECKING:
from framework.runtime.event_bus import EventBus
from framework.runtime.outcome_aggregator import OutcomeAggregator
from framework.storage.concurrent import ConcurrentStorage
from framework.storage.session_store import SessionStore
logger = logging.getLogger(__name__)
@@ -112,6 +114,9 @@ class ExecutionStream:
tool_executor: Callable | None = None,
result_retention_max: int | None = 1000,
result_retention_ttl_seconds: float | None = None,
runtime_log_store: Any = None,
session_store: "SessionStore | None" = None,
checkpoint_config: CheckpointConfig | None = None,
):
"""
Initialize execution stream.
@@ -128,6 +133,9 @@ class ExecutionStream:
llm: LLM provider for nodes
tools: Available tools
tool_executor: Function to execute tools
runtime_log_store: Optional RuntimeLogStore for per-execution logging
session_store: Optional SessionStore for unified session storage
checkpoint_config: Optional checkpoint configuration for resumable sessions
"""
self.stream_id = stream_id
self.entry_spec = entry_spec
@@ -142,6 +150,9 @@ class ExecutionStream:
self._tool_executor = tool_executor
self._result_retention_max = result_retention_max
self._result_retention_ttl_seconds = result_retention_ttl_seconds
self._runtime_log_store = runtime_log_store
self._checkpoint_config = checkpoint_config
self._session_store = session_store
# Create stream-scoped runtime
self._runtime = StreamRuntime(
@@ -221,6 +232,13 @@ class ExecutionStream:
await task
except asyncio.CancelledError:
pass
except RuntimeError as e:
# Task may be attached to a different event loop (e.g., when TUI
# uses a separate loop). Log and continue cleanup.
if "attached to a different loop" in str(e):
logger.warning(f"Task cleanup skipped (different event loop): {e}")
else:
raise
self._execution_tasks.clear()
self._active_executions.clear()
@@ -275,8 +293,21 @@ class ExecutionStream:
if not self._running:
raise RuntimeError(f"ExecutionStream '{self.stream_id}' is not running")
# Generate execution ID
execution_id = f"exec_{self.stream_id}_{uuid.uuid4().hex[:8]}"
# Generate execution ID using unified session format
if self._session_store:
execution_id = self._session_store.generate_session_id()
else:
# Fallback to old format if SessionStore not available (shouldn't happen)
import warnings
warnings.warn(
"SessionStore not available, using deprecated exec_* ID format. "
"Please ensure AgentRuntime is properly initialized.",
DeprecationWarning,
stacklevel=2,
)
execution_id = f"exec_{self.stream_id}_{uuid.uuid4().hex[:8]}"
if correlation_id is None:
correlation_id = execution_id
@@ -330,9 +361,28 @@ class ExecutionStream:
# Create runtime adapter for this execution
runtime_adapter = StreamRuntimeAdapter(self._runtime, execution_id)
# Start run to set trace context (CRITICAL for observability)
runtime_adapter.start_run(
goal_id=self.goal.id,
goal_description=self.goal.description,
input_data=ctx.input_data,
)
# Create per-execution runtime logger
runtime_logger = None
if self._runtime_log_store:
from framework.runtime.runtime_logger import RuntimeLogger
runtime_logger = RuntimeLogger(
store=self._runtime_log_store, agent_id=self.graph.id
)
# Create executor for this execution.
# Scope storage by execution_id so each execution gets
# fresh conversations and spillover directories.
# Each execution gets its own storage under sessions/{exec_id}/
# so conversations, spillover, and data files are all scoped
# to this execution. The executor sets data_dir via execution
# context (contextvars) so data tools and spillover share the
# same session-scoped directory.
exec_storage = self._storage.base_path / "sessions" / execution_id
executor = GraphExecutor(
runtime=runtime_adapter,
@@ -342,10 +392,15 @@ class ExecutionStream:
event_bus=self._event_bus,
stream_id=self.stream_id,
storage_path=exec_storage,
runtime_logger=runtime_logger,
loop_config=self.graph.loop_config,
)
# Track executor so inject_input() can reach EventLoopNode instances
self._active_executors[execution_id] = executor
# Write initial session state
await self._write_session_state(execution_id, ctx)
# Create modified graph with entry point
# We need to override the entry_node to use our entry point
modified_graph = self._create_modified_graph()
@@ -356,6 +411,7 @@ class ExecutionStream:
goal=self.goal,
input_data=ctx.input_data,
session_state=ctx.session_state,
checkpoint_config=self._checkpoint_config,
)
# Clean up executor reference
@@ -364,12 +420,22 @@ class ExecutionStream:
# Store result with retention
self._record_execution_result(execution_id, result)
# End run to complete trace (for observability)
runtime_adapter.end_run(
success=result.success,
narrative=f"Execution {'succeeded' if result.success else 'failed'}",
output_data=result.output,
)
# Update context
ctx.completed_at = datetime.now()
ctx.status = "completed" if result.success else "failed"
if result.paused_at:
ctx.status = "paused"
# Write final session state
await self._write_session_state(execution_id, ctx, result=result)
# Emit completion/failure event
if self._event_bus:
if result.success:
@@ -390,8 +456,42 @@ class ExecutionStream:
logger.debug(f"Execution {execution_id} completed: success={result.success}")
except asyncio.CancelledError:
ctx.status = "cancelled"
raise
# Execution was cancelled
# The executor catches CancelledError and returns a paused result,
# but if cancellation happened before executor started, we won't have a result
logger.info(f"Execution {execution_id} cancelled")
# Check if we have a result (executor completed and returned)
try:
_ = result # Check if result variable exists
has_result = True
except NameError:
has_result = False
result = ExecutionResult(
success=False,
error="Execution cancelled",
)
# Update context status based on result
if has_result and result.paused_at:
ctx.status = "paused"
ctx.completed_at = datetime.now()
else:
ctx.status = "cancelled"
# Clean up executor reference
self._active_executors.pop(execution_id, None)
# Store result with retention
self._record_execution_result(execution_id, result)
# Write session state
if has_result and result.paused_at:
await self._write_session_state(execution_id, ctx, result=result)
else:
await self._write_session_state(execution_id, ctx, error="Execution cancelled")
# Don't re-raise - we've handled it and saved state
except Exception as e:
ctx.status = "failed"
@@ -406,6 +506,19 @@ class ExecutionStream:
),
)
# Write error session state
await self._write_session_state(execution_id, ctx, error=str(e))
# End run with failure (for observability)
try:
runtime_adapter.end_run(
success=False,
narrative=f"Execution failed: {str(e)}",
output_data={},
)
except Exception:
pass # Don't let end_run errors mask the original error
# Emit failure event
if self._event_bus:
await self._event_bus.emit_execution_failed(
@@ -429,6 +542,92 @@ class ExecutionStream:
self._completion_events.pop(execution_id, None)
self._execution_tasks.pop(execution_id, None)
async def _write_session_state(
self,
execution_id: str,
ctx: ExecutionContext,
result: ExecutionResult | None = None,
error: str | None = None,
) -> None:
"""
Write state.json for a session.
Args:
execution_id: Session/execution ID
ctx: Execution context
result: Optional execution result (if completed)
error: Optional error message (if failed)
"""
# Only write if session_store is available
if not self._session_store:
return
from framework.schemas.session_state import SessionState, SessionStatus
try:
# Determine status
if result:
if result.paused_at:
status = SessionStatus.PAUSED
elif result.success:
status = SessionStatus.COMPLETED
else:
status = SessionStatus.FAILED
elif error:
# Check if this is a cancellation
if ctx.status == "cancelled" or "cancelled" in error.lower():
status = SessionStatus.CANCELLED
else:
status = SessionStatus.FAILED
else:
status = SessionStatus.ACTIVE
# Create SessionState
if result:
# Create from execution result
state = SessionState.from_execution_result(
session_id=execution_id,
goal_id=self.goal.id,
result=result,
stream_id=self.stream_id,
correlation_id=ctx.correlation_id,
started_at=ctx.started_at.isoformat(),
input_data=ctx.input_data,
agent_id=self.graph.id,
entry_point=self.entry_spec.id,
)
else:
# Create initial state
from framework.schemas.session_state import SessionTimestamps
now = datetime.now().isoformat()
state = SessionState(
session_id=execution_id,
stream_id=self.stream_id,
correlation_id=ctx.correlation_id,
goal_id=self.goal.id,
agent_id=self.graph.id,
entry_point=self.entry_spec.id,
status=status,
timestamps=SessionTimestamps(
started_at=ctx.started_at.isoformat(),
updated_at=now,
),
input_data=ctx.input_data,
)
# Handle error case
if error:
state.result.error = error
# Write state.json
await self._session_store.write_state(execution_id, state)
logger.debug(f"Wrote state.json for session {execution_id} (status={status})")
except Exception as e:
# Log but don't fail the execution
logger.error(f"Failed to write state.json for {execution_id}: {e}")
def _create_modified_graph(self) -> "GraphSpec":
"""Create a graph with the entry point overridden."""
# Use the existing graph but override entry_node
@@ -0,0 +1,142 @@
"""Pydantic models for the three-level runtime logging system.
Level 1 - SUMMARY: Per graph run pass/fail, token counts, timing
Level 2 - DETAILS: Per node completion results and attention flags
Level 3 - TOOL LOGS: Per step within any node (tool calls, LLM text, tokens)
"""
from __future__ import annotations
from typing import Any
from pydantic import BaseModel, Field
# ---------------------------------------------------------------------------
# Level 3: Tool logs (most granular) — per step within any node
# ---------------------------------------------------------------------------
class ToolCallLog(BaseModel):
"""A single tool call within a step."""
tool_use_id: str
tool_name: str
tool_input: dict[str, Any] = Field(default_factory=dict)
result: str = ""
is_error: bool = False
class NodeStepLog(BaseModel):
"""Full tool and LLM details for one step within a node.
For EventLoopNode, each iteration is a step. For single-step nodes
(LLMNode, FunctionNode, RouterNode), step_index is 0.
OTel-aligned fields (trace_id, span_id, execution_id) enable correlation
and future OpenTelemetry export without schema changes.
"""
node_id: str
node_type: str = "" # "event_loop"|"llm_tool_use"|"llm_generate"|"function"|"router"
step_index: int = 0 # iteration number for event_loop, 0 for single-step nodes
llm_text: str = ""
tool_calls: list[ToolCallLog] = Field(default_factory=list)
input_tokens: int = 0
output_tokens: int = 0
latency_ms: int = 0
# EventLoopNode only:
verdict: str = "" # "ACCEPT"|"RETRY"|"ESCALATE"|"CONTINUE"
verdict_feedback: str = ""
# Error tracking:
error: str = "" # Error message if step failed
stacktrace: str = "" # Full stack trace if exception occurred
is_partial: bool = False # True if step didn't complete normally
# OTel / trace context (from observability; empty if not set):
trace_id: str = "" # OTel trace id (e.g. from set_trace_context)
span_id: str = "" # OTel span id (16 hex chars per step)
parent_span_id: str = "" # Optional; for nested span hierarchy
execution_id: str = "" # Session/run correlation id
# ---------------------------------------------------------------------------
# Level 2: Per-node completion details
# ---------------------------------------------------------------------------
class NodeDetail(BaseModel):
"""Per-node completion result and attention flags.
OTel-aligned fields (trace_id, span_id) tie L2 to the same trace as L3.
"""
node_id: str
node_name: str = ""
node_type: str = ""
success: bool = True
error: str | None = None
stacktrace: str = "" # Full stack trace if exception occurred
total_steps: int = 0
tokens_used: int = 0 # combined input+output from NodeResult
input_tokens: int = 0
output_tokens: int = 0
latency_ms: int = 0
attempt: int = 1 # retry attempt number
# EventLoopNode-specific:
exit_status: str = "" # "success"|"failure"|"stalled"|"escalated"|"paused"|"guard_failure"
accept_count: int = 0
retry_count: int = 0
escalate_count: int = 0
continue_count: int = 0
needs_attention: bool = False
attention_reasons: list[str] = Field(default_factory=list)
# OTel / trace context (from observability; empty if not set):
trace_id: str = ""
span_id: str = "" # Optional node-level span for hierarchy
# ---------------------------------------------------------------------------
# Level 1: Run summary — one per full graph execution
# ---------------------------------------------------------------------------
class RunSummaryLog(BaseModel):
"""Run-level summary for a full graph execution.
OTel-aligned fields (trace_id, execution_id) tie L1 to the same trace as L2/L3.
"""
run_id: str
agent_id: str = ""
goal_id: str = ""
status: str = "" # "success"|"failure"|"degraded"
total_nodes_executed: int = 0
node_path: list[str] = Field(default_factory=list)
total_input_tokens: int = 0
total_output_tokens: int = 0
needs_attention: bool = False
attention_reasons: list[str] = Field(default_factory=list)
started_at: str = "" # ISO timestamp
duration_ms: int = 0
execution_quality: str = "" # "clean"|"degraded"|"failed"
# OTel / trace context (from observability; empty if not set):
trace_id: str = ""
execution_id: str = ""
# ---------------------------------------------------------------------------
# Container models for file serialization
# ---------------------------------------------------------------------------
class RunDetailsLog(BaseModel):
"""Level 2 container: all node details for a run."""
run_id: str
nodes: list[NodeDetail] = Field(default_factory=list)
class RunToolLogs(BaseModel):
"""Level 3 container: all step logs for a run."""
run_id: str
steps: list[NodeStepLog] = Field(default_factory=list)
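To make the three levels concrete, here is a minimal sketch (not part of the diff; node IDs and token counts are illustrative) that builds one record per level and serializes it the way the JSONL store appends it:

from framework.runtime.runtime_log_schemas import (
    NodeDetail,
    NodeStepLog,
    RunSummaryLog,
    ToolCallLog,
)

step = NodeStepLog(
    node_id="research",
    node_type="event_loop",
    step_index=0,
    llm_text="Searching for sources...",
    tool_calls=[
        ToolCallLog(
            tool_use_id="tu_1",
            tool_name="web_search",
            tool_input={"query": "agent frameworks"},
            result="3 hits",
        )
    ],
    input_tokens=1200,
    output_tokens=350,
    latency_ms=900,
)
detail = NodeDetail(node_id="research", node_type="event_loop", total_steps=1, tokens_used=1550)
summary = RunSummaryLog(run_id="run_1", status="success", total_nodes_executed=1)

# One JSON object per model: exactly one JSONL line in the store.
print(step.model_dump_json())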
@@ -0,0 +1,297 @@
"""File-based storage for runtime logs.
Each run gets its own directory under ``runs/``. There is no shared mutable
index: ``list_runs()`` scans the directory and loads summary.json from each run.
This eliminates concurrency issues when parallel EventLoopNodes write
simultaneously.
L2 (details) and L3 (tool logs) use JSONL (one JSON object per line) for
incremental append-on-write. This provides crash resilience: data is on
disk as soon as it's logged, not only at end_run(). L1 (summary) is still
written once at end as a regular JSON file since it aggregates L2.
Storage layout (current)::
{base_path}/
sessions/
{session_id}/
logs/
summary.json # Level 1 — written once at end
details.jsonl # Level 2 — appended per node completion
tool_logs.jsonl # Level 3 — appended per step
"""
from __future__ import annotations
import asyncio
import json
import logging
from datetime import UTC, datetime
from pathlib import Path
from framework.runtime.runtime_log_schemas import (
NodeDetail,
NodeStepLog,
RunDetailsLog,
RunSummaryLog,
RunToolLogs,
)
logger = logging.getLogger(__name__)
class RuntimeLogStore:
"""Persists runtime logs at three levels. Thread-safe via per-run directories."""
def __init__(self, base_path: Path) -> None:
self._base_path = base_path
# Note: _runs_dir is determined per-run_id by _get_run_dir()
def _get_run_dir(self, run_id: str) -> Path:
"""Determine run directory path based on run_id format.
- New format (session_*): {storage_root}/sessions/{run_id}/logs/
- Old format (anything else): {base_path}/runs/{run_id}/ (deprecated)
"""
if run_id.startswith("session_"):
is_runtime_logs = self._base_path.name == "runtime_logs"
root = self._base_path.parent if is_runtime_logs else self._base_path
return root / "sessions" / run_id / "logs"
import warnings
warnings.warn(
f"Reading logs from deprecated location for run_id={run_id}. "
"New sessions use unified storage at sessions/session_*/logs/",
DeprecationWarning,
stacklevel=3,
)
return self._base_path / "runs" / run_id
# -------------------------------------------------------------------
# Incremental write (sync — called from locked sections)
# -------------------------------------------------------------------
def ensure_run_dir(self, run_id: str) -> None:
"""Create the run directory immediately. Called by start_run()."""
run_dir = self._get_run_dir(run_id)
run_dir.mkdir(parents=True, exist_ok=True)
def append_step(self, run_id: str, step: NodeStepLog) -> None:
"""Append one JSONL line to tool_logs.jsonl. Sync."""
path = self._get_run_dir(run_id) / "tool_logs.jsonl"
line = json.dumps(step.model_dump(), ensure_ascii=False) + "\n"
with open(path, "a", encoding="utf-8") as f:
f.write(line)
def append_node_detail(self, run_id: str, detail: NodeDetail) -> None:
"""Append one JSONL line to details.jsonl. Sync."""
path = self._get_run_dir(run_id) / "details.jsonl"
line = json.dumps(detail.model_dump(), ensure_ascii=False) + "\n"
with open(path, "a", encoding="utf-8") as f:
f.write(line)
def read_node_details_sync(self, run_id: str) -> list[NodeDetail]:
"""Read details.jsonl back into a list of NodeDetail. Sync.
Used by end_run() to aggregate L2 into L1. Skips corrupt lines.
"""
path = self._get_run_dir(run_id) / "details.jsonl"
return _read_jsonl_as_models(path, NodeDetail)
# -------------------------------------------------------------------
# Summary write (async — called from end_run)
# -------------------------------------------------------------------
async def save_summary(self, run_id: str, summary: RunSummaryLog) -> None:
"""Write summary.json atomically. Called once at end_run()."""
run_dir = self._get_run_dir(run_id)
await asyncio.to_thread(run_dir.mkdir, parents=True, exist_ok=True)
await self._write_json(run_dir / "summary.json", summary.model_dump())
# -------------------------------------------------------------------
# Read
# -------------------------------------------------------------------
async def load_summary(self, run_id: str) -> RunSummaryLog | None:
"""Load Level 1 summary for a specific run."""
data = await self._read_json(self._get_run_dir(run_id) / "summary.json")
return RunSummaryLog(**data) if data is not None else None
async def load_details(self, run_id: str) -> RunDetailsLog | None:
"""Load Level 2 details from details.jsonl for a specific run."""
path = self._get_run_dir(run_id) / "details.jsonl"
def _read() -> RunDetailsLog | None:
if not path.exists():
return None
nodes = _read_jsonl_as_models(path, NodeDetail)
return RunDetailsLog(run_id=run_id, nodes=nodes)
return await asyncio.to_thread(_read)
async def load_tool_logs(self, run_id: str) -> RunToolLogs | None:
"""Load Level 3 tool logs from tool_logs.jsonl for a specific run."""
path = self._get_run_dir(run_id) / "tool_logs.jsonl"
def _read() -> RunToolLogs | None:
if not path.exists():
return None
steps = _read_jsonl_as_models(path, NodeStepLog)
return RunToolLogs(run_id=run_id, steps=steps)
return await asyncio.to_thread(_read)
async def list_runs(
self,
status: str = "",
needs_attention: bool | None = None,
limit: int = 20,
) -> list[RunSummaryLog]:
"""Scan both old and new directory structures, load summaries, filter, and sort.
Scans:
- Old: base_path/runs/{run_id}/
- New: base_path/sessions/{session_id}/logs/
Directories without summary.json are treated as in-progress runs and
get a synthetic summary with status="in_progress".
"""
entries = await asyncio.to_thread(self._scan_run_dirs)
summaries: list[RunSummaryLog] = []
for run_id in entries:
summary = await self.load_summary(run_id)
if summary is None:
# In-progress run: no summary.json yet. Synthesize one.
run_dir = self._get_run_dir(run_id)
if not run_dir.is_dir():
continue
summary = RunSummaryLog(
run_id=run_id,
status="in_progress",
started_at=_infer_started_at(run_id),
)
if status and status != "needs_attention" and summary.status != status:
continue
if status == "needs_attention" and not summary.needs_attention:
continue
if needs_attention is not None and summary.needs_attention != needs_attention:
continue
summaries.append(summary)
# Sort by started_at descending (most recent first)
summaries.sort(key=lambda s: s.started_at, reverse=True)
return summaries[:limit]
# -------------------------------------------------------------------
# Internal helpers
# -------------------------------------------------------------------
def _scan_run_dirs(self) -> list[str]:
"""Return list of run_id directory names from both old and new locations.
Scans:
- New: base_path/sessions/{session_id}/logs/ (preferred)
- Old: base_path/runs/{run_id}/ (deprecated, backward compatibility)
Returns run_ids/session_ids. Includes all directories, not just those
with summary.json, so in-progress runs are visible.
"""
run_ids = []
# Scan new location: base_path/sessions/{session_id}/logs/
# Determine the correct base path for sessions
is_runtime_logs = self._base_path.name == "runtime_logs"
root = self._base_path.parent if is_runtime_logs else self._base_path
sessions_dir = root / "sessions"
if sessions_dir.exists():
for session_dir in sessions_dir.iterdir():
if session_dir.is_dir() and session_dir.name.startswith("session_"):
logs_dir = session_dir / "logs"
if logs_dir.exists() and logs_dir.is_dir():
run_ids.append(session_dir.name)
# Scan old location: base_path/runs/ (deprecated)
old_runs_dir = self._base_path / "runs"
if old_runs_dir.exists():
old_ids = [d.name for d in old_runs_dir.iterdir() if d.is_dir()]
if old_ids:
import warnings
warnings.warn(
f"Found {len(old_ids)} runs in deprecated location. "
"Consider migrating to unified session storage.",
DeprecationWarning,
stacklevel=3,
)
run_ids.extend(old_ids)
return run_ids
@staticmethod
async def _write_json(path: Path, data: dict) -> None:
"""Write JSON atomically: write to .tmp then rename."""
tmp = path.with_suffix(".tmp")
content = json.dumps(data, indent=2, ensure_ascii=False)
def _write() -> None:
tmp.write_text(content, encoding="utf-8")
tmp.rename(path)
await asyncio.to_thread(_write)
@staticmethod
async def _read_json(path: Path) -> dict | None:
"""Read and parse a JSON file. Returns None if missing or corrupt."""
def _read() -> dict | None:
if not path.exists():
return None
try:
return json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError) as e:
logger.warning("Failed to read %s: %s", path, e)
return None
return await asyncio.to_thread(_read)
# -------------------------------------------------------------------
# Module-level helpers
# -------------------------------------------------------------------
def _read_jsonl_as_models(path: Path, model_cls: type) -> list:
"""Parse a JSONL file into a list of Pydantic model instances.
Skips blank lines and corrupt JSON lines (partial writes from crashes).
"""
results = []
if not path.exists():
return results
try:
with open(path, encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
try:
data = json.loads(line)
results.append(model_cls(**data))
except Exception as e:  # JSONDecodeError or model validation errors
logger.warning("Skipping corrupt JSONL line in %s: %s", path, e)
continue
except OSError as e:
logger.warning("Failed to read %s: %s", path, e)
return results
def _infer_started_at(run_id: str) -> str:
"""Best-effort ISO timestamp from a run_id like '20250101T120000_abc12345'."""
try:
ts_part = run_id.split("_")[0] # '20250101T120000'
dt = datetime.strptime(ts_part, "%Y%m%dT%H%M%S").replace(tzinfo=UTC)
return dt.isoformat()
except (ValueError, IndexError):
return ""
@@ -0,0 +1,326 @@
"""RuntimeLogger: captures runtime data during graph execution.
Injected into GraphExecutor as an optional parameter. Each log_step() and
log_node_complete() call writes immediately to disk (JSONL append). Only
the L1 summary is written at end_run() since it aggregates L2 data.
This provides crash resilience: L2 and L3 data survives process death
without needing end_run() to complete.
Usage::
store = RuntimeLogStore(Path(work_dir) / "runtime_logs")
runtime_logger = RuntimeLogger(store=store, agent_id="my-agent")
executor = GraphExecutor(..., runtime_logger=runtime_logger)
# After execution, logger has persisted all data to store
Safety: ``end_run()`` catches all exceptions internally and logs them via
the Python logger. Logging failure must never kill a successful run.
"""
from __future__ import annotations
import logging
import threading
import uuid
from datetime import UTC, datetime
from typing import Any
from framework.observability import get_trace_context
from framework.runtime.runtime_log_schemas import (
NodeDetail,
NodeStepLog,
RunSummaryLog,
ToolCallLog,
)
from framework.runtime.runtime_log_store import RuntimeLogStore
logger = logging.getLogger(__name__)
class RuntimeLogger:
"""Captures runtime data during graph execution.
Thread-safe: uses a lock around file appends for parallel node safety.
"""
def __init__(self, store: RuntimeLogStore, agent_id: str = "") -> None:
self._store = store
self._agent_id = agent_id
self._run_id = ""
self._goal_id = ""
self._started_at = ""
self._logged_node_ids: set[str] = set()
self._lock = threading.Lock()
def start_run(self, goal_id: str = "", session_id: str = "") -> str:
"""Start a new run. Called by GraphExecutor at graph start. Returns run_id.
Args:
goal_id: Goal ID for this run
session_id: Optional session ID. If provided, uses it as run_id (for unified sessions).
Otherwise generates a new run_id in old format.
Returns:
The run_id (same as session_id if provided)
"""
if session_id:
self._run_id = session_id
else:
ts = datetime.now(UTC).strftime("%Y%m%dT%H%M%S")
short_uuid = uuid.uuid4().hex[:8]
self._run_id = f"{ts}_{short_uuid}"
self._goal_id = goal_id
self._started_at = datetime.now(UTC).isoformat()
self._logged_node_ids = set()
self._store.ensure_run_dir(self._run_id)
return self._run_id
def log_step(
self,
node_id: str,
node_type: str,
step_index: int,
llm_text: str = "",
tool_calls: list[dict[str, Any]] | None = None,
input_tokens: int = 0,
output_tokens: int = 0,
latency_ms: int = 0,
verdict: str = "",
verdict_feedback: str = "",
error: str = "",
stacktrace: str = "",
is_partial: bool = False,
) -> None:
"""Record data for one step within a node.
Called by any node during execution. Synchronous, appends to JSONL file.
Args:
error: Error message if step failed
stacktrace: Full stack trace if exception occurred
is_partial: True if step didn't complete normally (e.g., LLM call crashed)
"""
if tool_calls is None:
tool_calls = []
call_logs = []
for tc in tool_calls:
call_logs.append(
ToolCallLog(
tool_use_id=tc.get("tool_use_id", ""),
tool_name=tc.get("tool_name", ""),
tool_input=tc.get("tool_input", {}),
result=tc.get("content", ""),
is_error=tc.get("is_error", False),
)
)
# OTel / trace context: from observability ContextVar (empty if not set)
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
execution_id = ctx.get("execution_id", "")
span_id = uuid.uuid4().hex[:16] # OTel 16-hex span_id per step
step_log = NodeStepLog(
node_id=node_id,
node_type=node_type,
step_index=step_index,
llm_text=llm_text,
tool_calls=call_logs,
input_tokens=input_tokens,
output_tokens=output_tokens,
latency_ms=latency_ms,
verdict=verdict,
verdict_feedback=verdict_feedback,
error=error,
stacktrace=stacktrace,
is_partial=is_partial,
trace_id=trace_id,
span_id=span_id,
execution_id=execution_id,
)
with self._lock:
self._store.append_step(self._run_id, step_log)
def log_node_complete(
self,
node_id: str,
node_name: str,
node_type: str,
success: bool,
error: str | None = None,
stacktrace: str = "",
total_steps: int = 0,
tokens_used: int = 0,
input_tokens: int = 0,
output_tokens: int = 0,
latency_ms: int = 0,
attempt: int = 1,
# EventLoopNode-specific kwargs:
exit_status: str = "",
accept_count: int = 0,
retry_count: int = 0,
escalate_count: int = 0,
continue_count: int = 0,
) -> None:
"""Record completion of a node.
Called after each node completes. EventLoopNode calls this with
verdict counts and exit_status. Other nodes: executor calls this
from NodeResult data.
"""
needs_attention = not success
attention_reasons: list[str] = []
if not success and error:
attention_reasons.append(f"Node {node_id} failed: {error}")
# Enhanced attention flags
if retry_count > 3:
needs_attention = True
attention_reasons.append(f"Excessive retries: {retry_count}")
if escalate_count > 2:
needs_attention = True
attention_reasons.append(f"Excessive escalations: {escalate_count}")
if latency_ms > 60000: # > 1 minute
needs_attention = True
attention_reasons.append(f"High latency: {latency_ms}ms")
if tokens_used > 100000: # High token usage
needs_attention = True
attention_reasons.append(f"High token usage: {tokens_used}")
if total_steps > 20: # Many iterations
needs_attention = True
attention_reasons.append(f"Many iterations: {total_steps}")
# OTel / trace context for L2 correlation
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
span_id = uuid.uuid4().hex[:16] # Optional node-level span
detail = NodeDetail(
node_id=node_id,
node_name=node_name,
node_type=node_type,
success=success,
error=error,
stacktrace=stacktrace,
total_steps=total_steps,
tokens_used=tokens_used,
input_tokens=input_tokens,
output_tokens=output_tokens,
latency_ms=latency_ms,
attempt=attempt,
exit_status=exit_status,
accept_count=accept_count,
retry_count=retry_count,
escalate_count=escalate_count,
continue_count=continue_count,
needs_attention=needs_attention,
attention_reasons=attention_reasons,
trace_id=trace_id,
span_id=span_id,
)
with self._lock:
self._store.append_node_detail(self._run_id, detail)
self._logged_node_ids.add(node_id)
def ensure_node_logged(
self,
node_id: str,
node_name: str,
node_type: str,
success: bool,
error: str | None = None,
stacktrace: str = "",
tokens_used: int = 0,
latency_ms: int = 0,
) -> None:
"""Fallback: ensure a node has an L2 entry.
Called by executor after each node returns. If node_id already
appears in _logged_node_ids (because the node called log_node_complete
itself), this is a no-op. Otherwise appends a basic NodeDetail.
"""
with self._lock:
if node_id in self._logged_node_ids:
return # Already logged by the node itself
# Not yet logged — create a basic entry
self.log_node_complete(
node_id=node_id,
node_name=node_name,
node_type=node_type,
success=success,
error=error,
stacktrace=stacktrace,
tokens_used=tokens_used,
latency_ms=latency_ms,
)
async def end_run(
self,
status: str,
duration_ms: int,
node_path: list[str] | None = None,
execution_quality: str = "",
) -> None:
"""Read L2 from disk, aggregate into L1, write summary.json.
Called by GraphExecutor when graph finishes. Async, writes 1 file.
Catches all exceptions internally -- logging failure must not
propagate to the caller.
"""
try:
# Read L2 back from disk to aggregate into L1
node_details = self._store.read_node_details_sync(self._run_id)
total_input = sum(nd.input_tokens for nd in node_details)
total_output = sum(nd.output_tokens for nd in node_details)
needs_attention = any(nd.needs_attention for nd in node_details)
attention_reasons: list[str] = []
for nd in node_details:
attention_reasons.extend(nd.attention_reasons)
# OTel / trace context for L1 correlation
ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")
execution_id = ctx.get("execution_id", "")
summary = RunSummaryLog(
run_id=self._run_id,
agent_id=self._agent_id,
goal_id=self._goal_id,
status=status,
total_nodes_executed=len(node_details),
node_path=node_path or [],
total_input_tokens=total_input,
total_output_tokens=total_output,
needs_attention=needs_attention,
attention_reasons=attention_reasons,
started_at=self._started_at,
duration_ms=duration_ms,
execution_quality=execution_quality,
trace_id=trace_id,
execution_id=execution_id,
)
await self._store.save_summary(self._run_id, summary)
logger.info(
"Runtime logs saved: run_id=%s status=%s nodes=%d",
self._run_id,
status,
len(node_details),
)
except Exception:
logger.exception(
"Failed to save runtime logs for run_id=%s (non-fatal)",
self._run_id,
)
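End to end, the logger is driven like this; a hedged sketch expanding the Usage block above (the runtime_logger module path is assumed, the store path is illustrative):

import asyncio
from pathlib import Path

from framework.runtime.runtime_log_store import RuntimeLogStore
from framework.runtime.runtime_logger import RuntimeLogger  # module path assumed

store = RuntimeLogStore(Path("/tmp/demo_agent/runtime_logs"))
rt_logger = RuntimeLogger(store=store, agent_id="my-agent")
run_id = rt_logger.start_run(goal_id="goal_1")  # no session_id -> old-format run_id

rt_logger.log_step(node_id="research", node_type="event_loop", step_index=0,
                   input_tokens=1200, output_tokens=350, latency_ms=900)
rt_logger.log_node_complete(node_id="research", node_name="Research",
                            node_type="event_loop", success=True,
                            total_steps=1, tokens_used=1550, latency_ms=900)

asyncio.run(rt_logger.end_run(status="success", duration_ms=950, node_path=["research"]))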
@@ -12,6 +12,7 @@ import uuid
from datetime import datetime
from typing import TYPE_CHECKING, Any
from framework.observability import set_trace_context
from framework.schemas.decision import Decision, DecisionType, Option, Outcome
from framework.schemas.run import Run, RunStatus
from framework.storage.concurrent import ConcurrentStorage
@@ -119,6 +120,16 @@ class StreamRuntime:
"""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
run_id = f"run_{self.stream_id}_{timestamp}_{uuid.uuid4().hex[:8]}"
trace_id = uuid.uuid4().hex
otel_execution_id = uuid.uuid4().hex # 32 hex, OTel/W3C-aligned for logs
set_trace_context(
trace_id=trace_id,
execution_id=otel_execution_id,
run_id=run_id,
goal_id=goal_id,
stream_id=self.stream_id,
)
run = Run(
id=run_id,
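Downstream code reads the same ContextVar back; this is how RuntimeLogger stamps these ids onto L1/L2/L3 records. A minimal sketch:

from framework.observability import get_trace_context

ctx = get_trace_context()
trace_id = ctx.get("trace_id", "")          # 32-hex id set by set_trace_context above
execution_id = ctx.get("execution_id", "")  # empty string if no context was set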
@@ -0,0 +1,178 @@
"""
Checkpoint Schema - Execution state snapshots for resumability.
Checkpoints capture the execution state at strategic points (node boundaries,
iterations) to enable crash recovery and resume-from-failure scenarios.
"""
from datetime import datetime
from typing import Any
from pydantic import BaseModel, Field
class Checkpoint(BaseModel):
"""
Single checkpoint in execution timeline.
Captures complete execution state at a specific point to enable
resuming from that exact point after failures or pauses.
"""
# Identity
checkpoint_id: str # Format: cp_{type}_{node_id}_{timestamp}
checkpoint_type: str # "node_start" | "node_complete" | "loop_iteration"
session_id: str
# Timestamps
created_at: str # ISO 8601 format
# Execution state
current_node: str | None = None
next_node: str | None = None # For edge_transition checkpoints
execution_path: list[str] = Field(default_factory=list) # Nodes executed so far
# State snapshots
shared_memory: dict[str, Any] = Field(default_factory=dict) # Full SharedMemory._data
accumulated_outputs: dict[str, Any] = Field(default_factory=dict) # Outputs accumulated so far
# Execution metrics (for resuming quality tracking)
metrics_snapshot: dict[str, Any] = Field(default_factory=dict)
# Metadata
is_clean: bool = True # True if no failures/retries before this checkpoint
description: str = "" # Human-readable checkpoint description
model_config = {"extra": "allow"}
@classmethod
def create(
cls,
checkpoint_type: str,
session_id: str,
current_node: str,
execution_path: list[str],
shared_memory: dict[str, Any],
next_node: str | None = None,
accumulated_outputs: dict[str, Any] | None = None,
metrics_snapshot: dict[str, Any] | None = None,
is_clean: bool = True,
description: str = "",
) -> "Checkpoint":
"""
Create a new checkpoint with generated ID and timestamp.
Args:
checkpoint_type: Type of checkpoint (node_start, node_complete, etc.)
session_id: Session this checkpoint belongs to
current_node: Node ID at checkpoint time
execution_path: List of node IDs executed so far
shared_memory: Full memory state snapshot
next_node: Next node to execute (for node_complete checkpoints)
accumulated_outputs: Outputs accumulated so far
metrics_snapshot: Execution metrics at checkpoint time
is_clean: Whether execution was clean up to this point
description: Human-readable description
Returns:
New Checkpoint instance
"""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
checkpoint_id = f"cp_{checkpoint_type}_{current_node}_{timestamp}"
if not description:
description = f"{checkpoint_type.replace('_', ' ').title()}: {current_node}"
return cls(
checkpoint_id=checkpoint_id,
checkpoint_type=checkpoint_type,
session_id=session_id,
created_at=datetime.now().isoformat(),
current_node=current_node,
next_node=next_node,
execution_path=execution_path,
shared_memory=shared_memory,
accumulated_outputs=accumulated_outputs or {},
metrics_snapshot=metrics_snapshot or {},
is_clean=is_clean,
description=description,
)
class CheckpointSummary(BaseModel):
"""
Lightweight checkpoint metadata for index listings.
Used in checkpoint index to provide fast scanning without
loading full checkpoint data.
"""
checkpoint_id: str
checkpoint_type: str
created_at: str
current_node: str | None = None
next_node: str | None = None
is_clean: bool = True
description: str = ""
model_config = {"extra": "allow"}
@classmethod
def from_checkpoint(cls, checkpoint: Checkpoint) -> "CheckpointSummary":
"""Create summary from full checkpoint."""
return cls(
checkpoint_id=checkpoint.checkpoint_id,
checkpoint_type=checkpoint.checkpoint_type,
created_at=checkpoint.created_at,
current_node=checkpoint.current_node,
next_node=checkpoint.next_node,
is_clean=checkpoint.is_clean,
description=checkpoint.description,
)
class CheckpointIndex(BaseModel):
"""
Manifest of all checkpoints for a session.
Provides fast lookup and filtering without loading
full checkpoint files.
"""
session_id: str
checkpoints: list[CheckpointSummary] = Field(default_factory=list)
latest_checkpoint_id: str | None = None
total_checkpoints: int = 0
model_config = {"extra": "allow"}
def add_checkpoint(self, checkpoint: Checkpoint) -> None:
"""Add a checkpoint to the index."""
summary = CheckpointSummary.from_checkpoint(checkpoint)
self.checkpoints.append(summary)
self.latest_checkpoint_id = checkpoint.checkpoint_id
self.total_checkpoints = len(self.checkpoints)
def get_checkpoint_summary(self, checkpoint_id: str) -> CheckpointSummary | None:
"""Get checkpoint summary by ID."""
for summary in self.checkpoints:
if summary.checkpoint_id == checkpoint_id:
return summary
return None
def filter_by_type(self, checkpoint_type: str) -> list[CheckpointSummary]:
"""Filter checkpoints by type."""
return [cp for cp in self.checkpoints if cp.checkpoint_type == checkpoint_type]
def filter_by_node(self, node_id: str) -> list[CheckpointSummary]:
"""Filter checkpoints by current_node."""
return [cp for cp in self.checkpoints if cp.current_node == node_id]
def get_clean_checkpoints(self) -> list[CheckpointSummary]:
"""Get all clean checkpoints (no failures before them)."""
return [cp for cp in self.checkpoints if cp.is_clean]
def get_latest_clean_checkpoint(self) -> CheckpointSummary | None:
"""Get the most recent clean checkpoint."""
clean = self.get_clean_checkpoints()
return clean[-1] if clean else None
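A short sketch (values illustrative) of the intended flow: ``Checkpoint.create()`` generates the ID and timestamp, and the index tracks the latest entry:

from framework.schemas.checkpoint import Checkpoint, CheckpointIndex

cp = Checkpoint.create(
    checkpoint_type="node_complete",
    session_id="session_20260206_143022_abc12345",
    current_node="research",
    execution_path=["start", "research"],
    shared_memory={"notes": "draft findings"},
    next_node="write",
)
index = CheckpointIndex(session_id=cp.session_id)
index.add_checkpoint(cp)

assert index.latest_checkpoint_id == cp.checkpoint_id
latest_clean = index.get_latest_clean_checkpoint()  # cp itself: is_clean defaults to True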
@@ -0,0 +1,287 @@
"""
Session State Schema - Unified state for session execution.
This schema consolidates data from Run, ExecutionResult, and runtime logs
into a single source of truth for session status and resumability.
"""
from datetime import datetime
from enum import StrEnum
from typing import TYPE_CHECKING, Any
from pydantic import BaseModel, Field, computed_field
if TYPE_CHECKING:
from framework.graph.executor import ExecutionResult
from framework.schemas.run import Run
class SessionStatus(StrEnum):
"""Status of a session execution."""
ACTIVE = "active" # Currently executing
PAUSED = "paused" # Waiting for resume (client input, pause node)
COMPLETED = "completed" # Finished successfully
FAILED = "failed" # Finished with error
CANCELLED = "cancelled" # User/system cancelled
class SessionTimestamps(BaseModel):
"""Timestamps tracking session lifecycle."""
started_at: str # ISO 8601 format
updated_at: str # ISO 8601 format (updated on every state write)
completed_at: str | None = None
paused_at_time: str | None = None # When it was paused
model_config = {"extra": "allow"}
class SessionProgress(BaseModel):
"""Execution progress tracking."""
current_node: str | None = None
paused_at: str | None = None # Node ID where paused
resume_from: str | None = None # Entry point or node ID to resume from
steps_executed: int = 0
total_tokens: int = 0
total_latency_ms: int = 0
path: list[str] = Field(default_factory=list) # Node IDs traversed
# Quality metrics (from ExecutionResult)
total_retries: int = 0
nodes_with_failures: list[str] = Field(default_factory=list)
retry_details: dict[str, int] = Field(default_factory=dict)
had_partial_failures: bool = False
execution_quality: str = "clean" # "clean", "degraded", or "failed"
node_visit_counts: dict[str, int] = Field(default_factory=dict)
model_config = {"extra": "allow"}
class SessionResult(BaseModel):
"""Final result of session execution."""
success: bool | None = None # None if still running
error: str | None = None
output: dict[str, Any] = Field(default_factory=dict)
model_config = {"extra": "allow"}
class SessionMetrics(BaseModel):
"""Execution metrics (from Run.metrics)."""
decision_count: int = 0
problem_count: int = 0
total_input_tokens: int = 0
total_output_tokens: int = 0
nodes_executed: list[str] = Field(default_factory=list)
edges_traversed: list[str] = Field(default_factory=list)
model_config = {"extra": "allow"}
class SessionState(BaseModel):
"""
Complete state for a session execution.
This is the single source of truth for session status and resumability.
Consolidates data from ExecutionResult, ExecutionContext, Run, and runtime logs.
Version History:
- v1.0: Initial schema (2026-02-06)
- v1.1: Added checkpoint support (2026-02-08)
"""
# Schema version for forward/backward compatibility
schema_version: str = "1.1"
# Identity
session_id: str # Format: session_YYYYMMDD_HHMMSS_{uuid_8char}
stream_id: str = "" # Which ExecutionStream created this
correlation_id: str = "" # For correlating related executions
# Status
status: SessionStatus = SessionStatus.ACTIVE
# Goal/Agent context
goal_id: str
agent_id: str = ""
entry_point: str = "start"
# Timestamps
timestamps: SessionTimestamps
# Progress
progress: SessionProgress = Field(default_factory=SessionProgress)
# Result
result: SessionResult = Field(default_factory=SessionResult)
# Memory (for resumability)
memory: dict[str, Any] = Field(default_factory=dict)
# Metrics
metrics: SessionMetrics = Field(default_factory=SessionMetrics)
# Problems (from Run.problems)
problems: list[dict[str, Any]] = Field(default_factory=list)
# Decisions (from Run.decisions - can be large, so store references)
decisions: list[dict[str, Any]] = Field(default_factory=list)
# Input data (for debugging/replay)
input_data: dict[str, Any] = Field(default_factory=dict)
# Isolation level (from ExecutionContext)
isolation_level: str = "shared"
# Checkpointing (for crash recovery and resume-from-failure)
checkpoint_enabled: bool = False
latest_checkpoint_id: str | None = None
model_config = {"extra": "allow"}
@computed_field
@property
def duration_ms(self) -> int:
"""Duration of the session in milliseconds."""
if not self.timestamps.completed_at:
return 0
started = datetime.fromisoformat(self.timestamps.started_at)
completed = datetime.fromisoformat(self.timestamps.completed_at)
return int((completed - started).total_seconds() * 1000)
@computed_field
@property
def is_resumable(self) -> bool:
"""Can this session be resumed?"""
return self.status == SessionStatus.PAUSED and self.progress.resume_from is not None
@computed_field
@property
def is_resumable_from_checkpoint(self) -> bool:
"""Can this session be resumed from a checkpoint?"""
# ANY session with checkpoints can be resumed (not just failed ones)
# This enables: pause/resume, iterative execution, continuation after completion
return self.checkpoint_enabled and self.latest_checkpoint_id is not None
@classmethod
def from_execution_result(
cls,
session_id: str,
goal_id: str,
result: "ExecutionResult",
stream_id: str = "",
correlation_id: str = "",
started_at: str = "",
input_data: dict[str, Any] | None = None,
agent_id: str = "",
entry_point: str = "start",
) -> "SessionState":
"""Create SessionState from ExecutionResult."""
now = datetime.now().isoformat()
# Determine status based on execution result
if result.paused_at:
status = SessionStatus.PAUSED
elif result.success:
status = SessionStatus.COMPLETED
else:
status = SessionStatus.FAILED
return cls(
session_id=session_id,
stream_id=stream_id,
correlation_id=correlation_id,
goal_id=goal_id,
agent_id=agent_id,
entry_point=entry_point,
status=status,
timestamps=SessionTimestamps(
started_at=started_at or now,
updated_at=now,
completed_at=now if not result.paused_at else None,
paused_at_time=now if result.paused_at else None,
),
progress=SessionProgress(
current_node=result.paused_at or (result.path[-1] if result.path else None),
paused_at=result.paused_at,
resume_from=result.session_state.get("resume_from")
if result.session_state
else None,
steps_executed=result.steps_executed,
total_tokens=result.total_tokens,
total_latency_ms=result.total_latency_ms,
path=result.path,
total_retries=result.total_retries,
nodes_with_failures=result.nodes_with_failures,
retry_details=result.retry_details,
had_partial_failures=result.had_partial_failures,
execution_quality=result.execution_quality,
node_visit_counts=result.node_visit_counts,
),
result=SessionResult(
success=result.success,
error=result.error,
output=result.output,
),
memory=result.session_state.get("memory", {}) if result.session_state else {},
input_data=input_data or {},
)
@classmethod
def from_legacy_run(cls, run: "Run", session_id: str, stream_id: str = "") -> "SessionState":
"""Create SessionState from legacy Run object."""
from framework.schemas.run import RunStatus
now = datetime.now().isoformat()
# Map RunStatus to SessionStatus
status_mapping = {
RunStatus.RUNNING: SessionStatus.ACTIVE,
RunStatus.COMPLETED: SessionStatus.COMPLETED,
RunStatus.FAILED: SessionStatus.FAILED,
RunStatus.CANCELLED: SessionStatus.CANCELLED,
RunStatus.STUCK: SessionStatus.FAILED,
}
status = status_mapping.get(run.status, SessionStatus.FAILED)
return cls(
schema_version="1.0",
session_id=session_id,
stream_id=stream_id,
goal_id=run.goal_id,
status=status,
timestamps=SessionTimestamps(
started_at=run.started_at.isoformat(),
updated_at=now,
completed_at=run.completed_at.isoformat() if run.completed_at else None,
),
result=SessionResult(
success=run.status == RunStatus.COMPLETED,
output=run.output_data,
),
metrics=SessionMetrics(
decision_count=run.metrics.total_decisions,
problem_count=len(run.problems),
total_input_tokens=run.metrics.total_tokens, # Approximate
total_output_tokens=0, # Not tracked in old format
nodes_executed=run.metrics.nodes_executed,
edges_traversed=run.metrics.edges_traversed,
),
decisions=[d.model_dump() for d in run.decisions],
problems=[p.model_dump() for p in run.problems],
input_data=run.input_data,
)
def to_session_state_dict(self) -> dict[str, Any]:
"""Convert to session_state format for GraphExecutor.execute()."""
return {
"paused_at": self.progress.paused_at,
"resume_from": self.progress.resume_from,
"memory": self.memory,
"next_node": None,
}
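As a sketch of the resumability contract (field values illustrative): a paused state with ``resume_from`` set reports ``is_resumable``, and ``to_session_state_dict()`` produces the payload GraphExecutor expects:

from datetime import datetime

from framework.schemas.session_state import (
    SessionState,
    SessionStatus,
    SessionTimestamps,
)

now = datetime.now().isoformat()
state = SessionState(
    session_id="session_20260206_143022_abc12345",
    goal_id="goal_1",
    status=SessionStatus.PAUSED,
    timestamps=SessionTimestamps(started_at=now, updated_at=now),
)
state.progress.paused_at = "research"
state.progress.resume_from = "research"

assert state.is_resumable
print(state.to_session_state_dict())  # {'paused_at': 'research', 'resume_from': 'research', ...}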
@@ -1,7 +1,10 @@
"""
File-based storage backend for runtime data.
Stores runs as JSON files with indexes for efficient querying.
DEPRECATED: This storage backend is deprecated for new sessions.
New sessions use unified storage at sessions/{session_id}/state.json.
This module is kept for backward compatibility with old run data only.
Uses Pydantic's built-in serialization.
"""
@@ -14,21 +17,24 @@ from framework.utils.io import atomic_write
class FileStorage:
"""
Simple file-based storage for runs.
DEPRECATED: File-based storage for old runs only.
Directory structure:
New sessions use unified storage at sessions/{session_id}/state.json.
This class is kept for backward compatibility with old run data.
Old directory structure (deprecated):
{base_path}/
runs/
{run_id}.json # Full run data
indexes/
runs/ # DEPRECATED - no longer written
{run_id}.json
summaries/ # DEPRECATED - no longer written
{run_id}.json
indexes/ # DEPRECATED - no longer written or read
by_goal/
{goal_id}.json # List of run IDs for this goal
{goal_id}.json
by_status/
{status}.json # List of run IDs with this status
{status}.json
by_node/
{node_id}.json # List of run IDs that used this node
summaries/
{run_id}.json # Run summary (for quick loading)
{node_id}.json
"""
def __init__(self, base_path: str | Path):
@@ -36,16 +42,14 @@ class FileStorage:
self._ensure_dirs()
def _ensure_dirs(self) -> None:
"""Create directory structure if it doesn't exist."""
dirs = [
self.base_path / "runs",
self.base_path / "indexes" / "by_goal",
self.base_path / "indexes" / "by_status",
self.base_path / "indexes" / "by_node",
self.base_path / "summaries",
]
for d in dirs:
d.mkdir(parents=True, exist_ok=True)
"""Create directory structure if it doesn't exist.
DEPRECATED: All directories (runs/, summaries/, indexes/) are deprecated.
New sessions use unified storage at sessions/{session_id}/state.json.
This method is now a no-op. Tests should not rely on this.
"""
# No-op: do not create deprecated directories
pass
def _validate_key(self, key: str) -> None:
"""
@@ -84,23 +88,22 @@ class FileStorage:
# === RUN OPERATIONS ===
def save_run(self, run: Run) -> None:
"""Save a run to storage."""
# Save full run using Pydantic's model_dump_json
run_path = self.base_path / "runs" / f"{run.id}.json"
with atomic_write(run_path) as f:
f.write(run.model_dump_json(indent=2))
"""Save a run to storage.
# Save summary
summary = RunSummary.from_run(run)
summary_path = self.base_path / "summaries" / f"{run.id}.json"
with atomic_write(summary_path) as f:
f.write(summary.model_dump_json(indent=2))
DEPRECATED: This method is now a no-op.
New sessions use unified storage at sessions/{session_id}/state.json.
Tests should not rely on FileStorage - use unified session storage instead.
"""
import warnings
# Update indexes
self._add_to_index("by_goal", run.goal_id, run.id)
self._add_to_index("by_status", run.status.value, run.id)
for node_id in run.metrics.nodes_executed:
self._add_to_index("by_node", node_id, run.id)
warnings.warn(
"FileStorage.save_run() is deprecated. "
"New sessions use unified storage at sessions/{session_id}/state.json. "
"This write has been skipped.",
DeprecationWarning,
stacklevel=2,
)
# No-op: do not write to deprecated locations
def load_run(self, run_id: str) -> Run | None:
"""Load a run from storage."""
@@ -148,17 +151,53 @@ class FileStorage:
# === QUERY OPERATIONS ===
def get_runs_by_goal(self, goal_id: str) -> list[str]:
"""Get all run IDs for a goal."""
"""Get all run IDs for a goal.
DEPRECATED: Indexes are deprecated. For new sessions, scan sessions/*/state.json instead.
This method only returns old run IDs from deprecated indexes.
"""
import warnings
warnings.warn(
"FileStorage.get_runs_by_goal() is deprecated. "
"For new sessions, scan sessions/*/state.json instead.",
DeprecationWarning,
stacklevel=2,
)
return self._get_index("by_goal", goal_id)
def get_runs_by_status(self, status: str | RunStatus) -> list[str]:
"""Get all run IDs with a status."""
"""Get all run IDs with a status.
DEPRECATED: Indexes are deprecated. For new sessions, scan sessions/*/state.json instead.
This method only returns old run IDs from deprecated indexes.
"""
import warnings
warnings.warn(
"FileStorage.get_runs_by_status() is deprecated. "
"For new sessions, scan sessions/*/state.json instead.",
DeprecationWarning,
stacklevel=2,
)
if isinstance(status, RunStatus):
status = status.value
return self._get_index("by_status", status)
def get_runs_by_node(self, node_id: str) -> list[str]:
"""Get all run IDs that executed a node."""
"""Get all run IDs that executed a node.
DEPRECATED: Indexes are deprecated. For new sessions, scan sessions/*/state.json instead.
This method only returns old run IDs from deprecated indexes.
"""
import warnings
warnings.warn(
"FileStorage.get_runs_by_node() is deprecated. "
"For new sessions, scan sessions/*/state.json instead.",
DeprecationWarning,
stacklevel=2,
)
return self._get_index("by_node", node_id)
def list_all_runs(self) -> list[str]:
@@ -167,8 +206,22 @@ class FileStorage:
return [f.stem for f in runs_dir.glob("*.json")]
def list_all_goals(self) -> list[str]:
"""List all goal IDs that have runs."""
"""List all goal IDs that have runs.
DEPRECATED: Indexes are deprecated. For new sessions, scan sessions/*/state.json instead.
This method only returns goals from old run IDs in deprecated indexes.
"""
import warnings
warnings.warn(
"FileStorage.list_all_goals() is deprecated. "
"For new sessions, scan sessions/*/state.json instead.",
DeprecationWarning,
stacklevel=2,
)
goals_dir = self.base_path / "indexes" / "by_goal"
if not goals_dir.exists():
return []
return [f.stem for f in goals_dir.glob("*.json")]
# === INDEX OPERATIONS ===
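What callers now observe, as a sketch (the FileStorage module path is an assumption; everything else mirrors the methods above):

import warnings

from framework.storage.file_storage import FileStorage  # module path assumed

storage = FileStorage("/tmp/old_runs")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    storage.get_runs_by_goal("goal_1")  # reads only the deprecated index

assert any(issubclass(w.category, DeprecationWarning) for w in caught)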
@@ -0,0 +1,325 @@
"""
Checkpoint Store - Manages checkpoint storage with atomic writes.
Handles saving, loading, listing, and pruning of execution checkpoints
for session resumability.
"""
import asyncio
import logging
from datetime import datetime, timedelta
from pathlib import Path
from framework.schemas.checkpoint import Checkpoint, CheckpointIndex, CheckpointSummary
from framework.utils.io import atomic_write
logger = logging.getLogger(__name__)
class CheckpointStore:
"""
Manages checkpoint storage with atomic writes.
Stores checkpoints in a session's checkpoints/ directory with
an index for fast lookup and filtering.
Directory structure:
checkpoints/
index.json # Checkpoint manifest
cp_{type}_{node}_{timestamp}.json # Individual checkpoints
"""
def __init__(self, base_path: Path):
"""
Initialize checkpoint store.
Args:
base_path: Session directory (e.g., ~/.hive/agents/agent_name/sessions/session_ID/)
"""
self.base_path = Path(base_path)
self.checkpoints_dir = self.base_path / "checkpoints"
self.index_path = self.checkpoints_dir / "index.json"
self._index_lock = asyncio.Lock()
async def save_checkpoint(self, checkpoint: Checkpoint) -> None:
"""
Atomically save checkpoint and update index.
Uses temp file + rename for crash safety. Updates index
after checkpoint is persisted.
Args:
checkpoint: Checkpoint to save
Raises:
OSError: If file write fails
"""
def _write():
# Ensure directory exists
self.checkpoints_dir.mkdir(parents=True, exist_ok=True)
# Write checkpoint file atomically
checkpoint_path = self.checkpoints_dir / f"{checkpoint.checkpoint_id}.json"
with atomic_write(checkpoint_path) as f:
f.write(checkpoint.model_dump_json(indent=2))
logger.debug(f"Saved checkpoint {checkpoint.checkpoint_id}")
# Write checkpoint file (blocking I/O in thread)
await asyncio.to_thread(_write)
# Update index (with lock to prevent concurrent modifications)
async with self._index_lock:
await self._update_index_add(checkpoint)
async def load_checkpoint(
self,
checkpoint_id: str | None = None,
) -> Checkpoint | None:
"""
Load checkpoint by ID or latest.
Args:
checkpoint_id: Checkpoint ID to load, or None for latest
Returns:
Checkpoint object, or None if not found
"""
def _read(checkpoint_id: str) -> Checkpoint | None:
checkpoint_path = self.checkpoints_dir / f"{checkpoint_id}.json"
if not checkpoint_path.exists():
logger.warning(f"Checkpoint file not found: {checkpoint_path}")
return None
try:
return Checkpoint.model_validate_json(checkpoint_path.read_text())
except Exception as e:
logger.error(f"Failed to load checkpoint {checkpoint_id}: {e}")
return None
# Load index to get checkpoint ID if not provided
if checkpoint_id is None:
index = await self.load_index()
if not index or not index.latest_checkpoint_id:
logger.warning("No checkpoints found in index")
return None
checkpoint_id = index.latest_checkpoint_id
return await asyncio.to_thread(_read, checkpoint_id)
async def load_index(self) -> CheckpointIndex | None:
"""
Load checkpoint index.
Returns:
CheckpointIndex or None if not found
"""
def _read() -> CheckpointIndex | None:
if not self.index_path.exists():
return None
try:
return CheckpointIndex.model_validate_json(self.index_path.read_text())
except Exception as e:
logger.error(f"Failed to load checkpoint index: {e}")
return None
return await asyncio.to_thread(_read)
async def list_checkpoints(
self,
checkpoint_type: str | None = None,
is_clean: bool | None = None,
) -> list[CheckpointSummary]:
"""
List checkpoints with optional filters.
Args:
checkpoint_type: Filter by type (node_start, node_complete)
is_clean: Filter by clean status
Returns:
List of CheckpointSummary objects
"""
index = await self.load_index()
if not index:
return []
checkpoints = index.checkpoints
# Apply filters
if checkpoint_type:
checkpoints = [cp for cp in checkpoints if cp.checkpoint_type == checkpoint_type]
if is_clean is not None:
checkpoints = [cp for cp in checkpoints if cp.is_clean == is_clean]
return checkpoints
async def delete_checkpoint(self, checkpoint_id: str) -> bool:
"""
Delete a specific checkpoint.
Args:
checkpoint_id: Checkpoint ID to delete
Returns:
True if deleted, False if not found
"""
def _delete(checkpoint_id: str) -> bool:
checkpoint_path = self.checkpoints_dir / f"{checkpoint_id}.json"
if not checkpoint_path.exists():
logger.warning(f"Checkpoint file not found: {checkpoint_path}")
return False
try:
checkpoint_path.unlink()
logger.info(f"Deleted checkpoint {checkpoint_id}")
return True
except Exception as e:
logger.error(f"Failed to delete checkpoint {checkpoint_id}: {e}")
return False
# Delete checkpoint file
deleted = await asyncio.to_thread(_delete, checkpoint_id)
if deleted:
# Update index (with lock)
async with self._index_lock:
await self._update_index_remove(checkpoint_id)
return deleted
async def prune_checkpoints(
self,
max_age_days: int = 7,
) -> int:
"""
Prune checkpoints older than max_age_days.
Args:
max_age_days: Maximum age in days (default 7)
Returns:
Number of checkpoints deleted
"""
index = await self.load_index()
if not index or not index.checkpoints:
return 0
# Calculate cutoff datetime
cutoff = datetime.now() - timedelta(days=max_age_days)
# Find old checkpoints
old_checkpoints = []
for cp in index.checkpoints:
try:
created = datetime.fromisoformat(cp.created_at)
if created < cutoff:
old_checkpoints.append(cp.checkpoint_id)
except Exception as e:
logger.warning(f"Failed to parse timestamp for {cp.checkpoint_id}: {e}")
# Delete old checkpoints
deleted_count = 0
for checkpoint_id in old_checkpoints:
if await self.delete_checkpoint(checkpoint_id):
deleted_count += 1
if deleted_count > 0:
logger.info(f"Pruned {deleted_count} checkpoints older than {max_age_days} days")
return deleted_count
async def checkpoint_exists(self, checkpoint_id: str) -> bool:
"""
Check if a checkpoint exists.
Args:
checkpoint_id: Checkpoint ID
Returns:
True if checkpoint exists
"""
def _check(checkpoint_id: str) -> bool:
checkpoint_path = self.checkpoints_dir / f"{checkpoint_id}.json"
return checkpoint_path.exists()
return await asyncio.to_thread(_check, checkpoint_id)
async def _update_index_add(self, checkpoint: Checkpoint) -> None:
"""
Update index after adding a checkpoint.
Should be called with _index_lock held.
Args:
checkpoint: Checkpoint that was added
"""
def _write(index: CheckpointIndex):
# Ensure directory exists
self.checkpoints_dir.mkdir(parents=True, exist_ok=True)
# Write index atomically
with atomic_write(self.index_path) as f:
f.write(index.model_dump_json(indent=2))
# Load or create index
index = await self.load_index()
if not index:
index = CheckpointIndex(
session_id=checkpoint.session_id,
checkpoints=[],
)
# Add checkpoint to index
index.add_checkpoint(checkpoint)
# Write updated index
await asyncio.to_thread(_write, index)
logger.debug(f"Updated index with checkpoint {checkpoint.checkpoint_id}")
async def _update_index_remove(self, checkpoint_id: str) -> None:
"""
Update index after removing a checkpoint.
Should be called with _index_lock held.
Args:
checkpoint_id: Checkpoint ID that was removed
"""
def _write(index: CheckpointIndex):
with atomic_write(self.index_path) as f:
f.write(index.model_dump_json(indent=2))
# Load index
index = await self.load_index()
if not index:
return
# Remove checkpoint from index
index.checkpoints = [cp for cp in index.checkpoints if cp.checkpoint_id != checkpoint_id]
# Update totals
index.total_checkpoints = len(index.checkpoints)
# Update latest_checkpoint_id if we removed the latest
if index.latest_checkpoint_id == checkpoint_id:
index.latest_checkpoint_id = (
index.checkpoints[-1].checkpoint_id if index.checkpoints else None
)
# Write updated index
await asyncio.to_thread(_write, index)
logger.debug(f"Removed checkpoint {checkpoint_id} from index")
@@ -0,0 +1,213 @@
"""
Session Store - Unified session storage with state.json.
Handles reading and writing session state to the new unified structure:
sessions/session_YYYYMMDD_HHMMSS_{uuid}/state.json
"""
import asyncio
import logging
import uuid
from datetime import datetime
from pathlib import Path
from framework.schemas.session_state import SessionState
from framework.utils.io import atomic_write
logger = logging.getLogger(__name__)
class SessionStore:
"""
Unified session storage with state.json.
Manages sessions in the new structure:
{base_path}/sessions/session_YYYYMMDD_HHMMSS_{uuid}/
state.json # Single source of truth
conversations/ # Per-node EventLoop state
artifacts/ # Spillover data
logs/ # L1/L2/L3 observability
summary.json
details.jsonl
tool_logs.jsonl
"""
def __init__(self, base_path: Path):
"""
Initialize session store.
Args:
base_path: Base path for storage (e.g., ~/.hive/agents/twitter_outreach)
"""
self.base_path = Path(base_path)
self.sessions_dir = self.base_path / "sessions"
def generate_session_id(self) -> str:
"""
Generate session ID in format: session_YYYYMMDD_HHMMSS_{uuid}.
Returns:
Session ID string (e.g., "session_20260206_143022_abc12345")
"""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
short_uuid = uuid.uuid4().hex[:8]
return f"session_{timestamp}_{short_uuid}"
def get_session_path(self, session_id: str) -> Path:
"""
Get path to session directory.
Args:
session_id: Session ID
Returns:
Path to session directory
"""
return self.sessions_dir / session_id
def get_state_path(self, session_id: str) -> Path:
"""
Get path to state.json file.
Args:
session_id: Session ID
Returns:
Path to state.json
"""
return self.get_session_path(session_id) / "state.json"
async def write_state(self, session_id: str, state: SessionState) -> None:
"""
Atomically write state.json for a session.
Uses temp file + rename for crash safety.
Args:
session_id: Session ID
state: SessionState to write
"""
def _write():
state_path = self.get_state_path(session_id)
state_path.parent.mkdir(parents=True, exist_ok=True)
with atomic_write(state_path) as f:
f.write(state.model_dump_json(indent=2))
await asyncio.to_thread(_write)
logger.debug(f"Wrote state.json for session {session_id}")
async def read_state(self, session_id: str) -> SessionState | None:
"""
Read state.json for a session.
Args:
session_id: Session ID
Returns:
SessionState or None if not found
"""
def _read():
state_path = self.get_state_path(session_id)
if not state_path.exists():
return None
return SessionState.model_validate_json(state_path.read_text())
return await asyncio.to_thread(_read)
async def list_sessions(
self,
status: str | None = None,
goal_id: str | None = None,
limit: int = 100,
) -> list[SessionState]:
"""
List sessions, optionally filtered by status or goal.
Args:
status: Optional status filter (e.g., "paused", "completed")
goal_id: Optional goal ID filter
limit: Maximum number of sessions to return
Returns:
List of SessionState objects
"""
def _scan():
sessions = []
if not self.sessions_dir.exists():
return sessions
for session_dir in self.sessions_dir.iterdir():
if not session_dir.is_dir():
continue
state_path = session_dir / "state.json"
if not state_path.exists():
continue
try:
state = SessionState.model_validate_json(state_path.read_text())
# Apply filters
if status and state.status != status:
continue
if goal_id and state.goal_id != goal_id:
continue
sessions.append(state)
except Exception as e:
logger.warning(f"Failed to load {state_path}: {e}")
continue
# Sort by updated_at descending (most recent first)
sessions.sort(key=lambda s: s.timestamps.updated_at, reverse=True)
return sessions[:limit]
return await asyncio.to_thread(_scan)
async def delete_session(self, session_id: str) -> bool:
"""
Delete a session and all its data.
Args:
session_id: Session ID to delete
Returns:
True if deleted, False if not found
"""
def _delete():
import shutil
session_path = self.get_session_path(session_id)
if not session_path.exists():
return False
shutil.rmtree(session_path)
logger.info(f"Deleted session {session_id}")
return True
return await asyncio.to_thread(_delete)
async def session_exists(self, session_id: str) -> bool:
"""
Check if a session exists.
Args:
session_id: Session ID
Returns:
True if session exists
"""
def _check():
return self.get_state_path(session_id).exists()
return await asyncio.to_thread(_check)
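Typical round trip, as a sketch (base path illustrative; the import path matches the one used by StateWriter below):

import asyncio
from datetime import datetime
from pathlib import Path

from framework.schemas.session_state import SessionState, SessionTimestamps
from framework.storage.session_store import SessionStore

async def main() -> None:
    store = SessionStore(Path("/tmp/demo_agent"))
    session_id = store.generate_session_id()
    now = datetime.now().isoformat()
    state = SessionState(
        session_id=session_id,
        goal_id="goal_1",
        timestamps=SessionTimestamps(started_at=now, updated_at=now),
    )
    await store.write_state(session_id, state)
    active = await store.list_sessions(status="active")  # default status is ACTIVE
    print(session_id, len(active))

asyncio.run(main())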
@@ -0,0 +1,179 @@
"""
State Writer - Dual-write adapter for migration period.
Writes execution state to both old (Run/RunSummary) and new (state.json) formats
to maintain backward compatibility during the transition period.
"""
import logging
import os
from datetime import datetime
from framework.schemas.run import Problem, Run, RunMetrics, RunStatus
from framework.schemas.session_state import SessionState, SessionStatus
from framework.storage.concurrent import ConcurrentStorage
from framework.storage.session_store import SessionStore
logger = logging.getLogger(__name__)
class StateWriter:
"""
Writes execution state to both old and new formats during migration.
During the dual-write phase:
- New format (state.json) is written when USE_UNIFIED_SESSIONS=true
- Old format (Run/RunSummary) is always written for backward compatibility
"""
def __init__(self, old_storage: ConcurrentStorage, session_store: SessionStore):
"""
Initialize state writer.
Args:
old_storage: ConcurrentStorage for old format (runs/, summaries/)
session_store: SessionStore for new format (sessions/*/state.json)
"""
self.old = old_storage
self.new = session_store
self.dual_write_enabled = os.getenv("USE_UNIFIED_SESSIONS", "false").lower() == "true"
async def write_execution_state(
self,
session_id: str,
state: SessionState,
) -> None:
"""
Write execution state to both old and new formats.
Args:
session_id: Session ID
state: SessionState to write
"""
# Write to new format if enabled
if self.dual_write_enabled:
try:
await self.new.write_state(session_id, state)
logger.debug(f"Wrote state.json for session {session_id}")
except Exception as e:
logger.error(f"Failed to write state.json for {session_id}: {e}")
# Don't fail - old format is still written
# Always write to old format for backward compatibility
try:
run = self._convert_to_run(state)
await self.old.save_run(run)
logger.debug(f"Wrote Run object for session {session_id}")
except Exception as e:
logger.error(f"Failed to write Run object for {session_id}: {e}")
# This is more critical - reraise if old format fails
raise
def _convert_to_run(self, state: SessionState) -> Run:
"""
Convert SessionState to legacy Run object.
Args:
state: SessionState to convert
Returns:
Run object
"""
# Map SessionStatus to RunStatus
status_mapping = {
SessionStatus.ACTIVE: RunStatus.RUNNING,
SessionStatus.PAUSED: RunStatus.RUNNING, # Paused is still "running" in old format
SessionStatus.COMPLETED: RunStatus.COMPLETED,
SessionStatus.FAILED: RunStatus.FAILED,
SessionStatus.CANCELLED: RunStatus.CANCELLED,
}
run_status = status_mapping.get(state.status, RunStatus.FAILED)
# Convert timestamps
started_at = datetime.fromisoformat(state.timestamps.started_at)
completed_at = (
datetime.fromisoformat(state.timestamps.completed_at)
if state.timestamps.completed_at
else None
)
# Build RunMetrics
metrics = RunMetrics(
total_decisions=state.metrics.decision_count,
successful_decisions=state.metrics.decision_count
- len(state.progress.nodes_with_failures), # Approximate
failed_decisions=len(state.progress.nodes_with_failures),
total_tokens=state.metrics.total_input_tokens + state.metrics.total_output_tokens,
total_latency_ms=state.progress.total_latency_ms,
nodes_executed=state.metrics.nodes_executed,
edges_traversed=state.metrics.edges_traversed,
)
# Convert problems (SessionState stores as dicts, Run expects Problem objects)
problems = []
for p_dict in state.problems:
# Handle both old Problem objects and new dict format
if isinstance(p_dict, dict):
problems.append(Problem(**p_dict))
else:
problems.append(p_dict)
# Convert decisions (SessionState stores as dicts, Run expects Decision objects)
from framework.schemas.decision import Decision
decisions = []
for d_dict in state.decisions:
# Handle both old Decision objects and new dict format
if isinstance(d_dict, dict):
try:
decisions.append(Decision(**d_dict))
except Exception:
# Skip invalid decisions
continue
else:
decisions.append(d_dict)
# Create Run object
run = Run(
id=state.session_id, # Use session_id as run_id
goal_id=state.goal_id,
started_at=started_at,
status=run_status,
completed_at=completed_at,
decisions=decisions,
problems=problems,
metrics=metrics,
goal_description="", # Not stored in SessionState
input_data=state.input_data,
output_data=state.result.output,
)
return run
async def read_state(
self,
session_id: str,
prefer_new: bool = True,
) -> SessionState | None:
"""
Read execution state from either format.
Args:
session_id: Session ID
prefer_new: If True, try new format first (default)
Returns:
SessionState or None if not found
"""
if prefer_new:
# Try new format first
state = await self.new.read_state(session_id)
if state:
return state
# Fall back to old format
run = await self.old.load_run(session_id)
if run:
return SessionState.from_legacy_run(run, session_id)
return None
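A hedged wiring sketch for the dual-write phase. The module path framework.storage.state_writer and the ConcurrentStorage/SessionStore constructors are assumptions; note that USE_UNIFIED_SESSIONS is read once, at construction time.
import os
from pathlib import Path

from framework.storage.concurrent import ConcurrentStorage
from framework.storage.session_store import SessionStore
from framework.storage.state_writer import StateWriter  # module path assumed

os.environ["USE_UNIFIED_SESSIONS"] = "true"  # must be set before construction

base = Path(".aden/storage")
writer = StateWriter(
    old_storage=ConcurrentStorage(base),  # legacy runs/ and summaries/
    session_store=SessionStore(base),     # new sessions/*/state.json
)
# await writer.write_execution_state(session_id, state)
#   - state.json failures are logged and swallowed
#   - legacy Run failures are re-raised (old format stays authoritative)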
+126 -4
@@ -1,16 +1,19 @@
import logging
import platform
import subprocess
import time
from textual.app import App, ComposeResult
from textual.binding import Binding
from textual.containers import Container, Horizontal, Vertical
from textual.widgets import Footer, Label
from textual.widgets import Footer, Input, Label
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import AgentEvent, EventType
from framework.tui.widgets.chat_repl import ChatRepl
from framework.tui.widgets.graph_view import GraphOverview
from framework.tui.widgets.log_pane import LogPane
from framework.tui.widgets.selectable_rich_log import SelectableRichLog
class StatusBar(Container):
@@ -202,21 +205,50 @@ class AdenTUI(App):
BINDINGS = [
Binding("q", "quit", "Quit"),
Binding("ctrl+c", "ctrl_c", "Interrupt", show=False, priority=True),
Binding("super+c", "ctrl_c", "Copy", show=False, priority=True),
Binding("ctrl+s", "screenshot", "Screenshot (SVG)", show=True, priority=True),
Binding("ctrl+z", "pause_execution", "Pause", show=True, priority=True),
Binding("ctrl+r", "show_sessions", "Sessions", show=True, priority=True),
Binding("tab", "focus_next", "Next Panel", show=True),
Binding("shift+tab", "focus_previous", "Previous Panel", show=False),
]
def __init__(self, runtime: AgentRuntime):
def __init__(
self,
runtime: AgentRuntime,
resume_session: str | None = None,
resume_checkpoint: str | None = None,
):
super().__init__()
self.runtime = runtime
self.log_pane = LogPane()
self.graph_view = GraphOverview(runtime)
self.chat_repl = ChatRepl(runtime)
self.chat_repl = ChatRepl(runtime, resume_session, resume_checkpoint)
self.status_bar = StatusBar(graph_id=runtime.graph.id)
self.is_ready = False
def open_url(self, url: str, *, new_tab: bool = True) -> None:
"""Override to use native `open` for file:// URLs on macOS."""
if url.startswith("file://") and platform.system() == "Darwin":
path = url.removeprefix("file://")
subprocess.Popen(["open", path])
else:
super().open_url(url, new_tab=new_tab)
def action_ctrl_c(self) -> None:
# Check if any SelectableRichLog has an active selection to copy
for widget in self.query(SelectableRichLog):
if widget.selection is not None:
text = widget.copy_selection()
if text:
widget.clear_selection()
self.notify("Copied to clipboard", severity="information", timeout=2)
return
self.notify("Press [b]q[/b] to quit", severity="warning", timeout=3)
def compose(self) -> ComposeResult:
yield self.status_bar
@@ -503,9 +535,99 @@ class AdenTUI(App):
except Exception as e:
self.notify(f"Screenshot failed: {e}", severity="error", timeout=5)
def action_pause_execution(self) -> None:
"""Immediately pause execution by cancelling task (bound to Ctrl+Z)."""
try:
chat_repl = self.query_one(ChatRepl)
if not chat_repl._current_exec_id:
self.notify(
"No active execution to pause",
severity="information",
timeout=3,
)
return
# Find and cancel the execution task - executor will catch and save state
task_cancelled = False
for stream in self.runtime._streams.values():
exec_id = chat_repl._current_exec_id
task = stream._execution_tasks.get(exec_id)
if task and not task.done():
task.cancel()
task_cancelled = True
self.notify(
"⏸ Execution paused - state saved",
severity="information",
timeout=3,
)
break
if not task_cancelled:
self.notify(
"Execution already completed",
severity="information",
timeout=2,
)
except Exception as e:
self.notify(
f"Error pausing execution: {e}",
severity="error",
timeout=5,
)
def action_show_sessions(self) -> None:
"""Show sessions list (bound to Ctrl+R)."""
# Send /sessions command to chat input
try:
chat_repl = self.query_one(ChatRepl)
chat_input = chat_repl.query_one("#chat-input", Input)
chat_input.value = "/sessions"
# Prefill only; the user submits with Enter
self.notify(
"💡 Type /sessions in the chat to see all sessions",
severity="information",
timeout=3,
)
except Exception:
self.notify(
"Use /sessions command to see all sessions",
severity="information",
timeout=3,
)
async def on_unmount(self) -> None:
"""Cleanup on app shutdown."""
"""Cleanup on app shutdown - cancel execution which will save state."""
self.is_ready = False
# Cancel any active execution - the executor will catch CancelledError
# and save current state as paused; we wait briefly below for that save
try:
import asyncio
chat_repl = self.query_one(ChatRepl)
if chat_repl._current_exec_id:
# Find the stream with this execution
for stream in self.runtime._streams.values():
exec_id = chat_repl._current_exec_id
task = stream._execution_tasks.get(exec_id)
if task and not task.done():
# Cancel the task - executor will catch and save state
task.cancel()
try:
# Wait for executor to save state (may take a few seconds)
# Longer timeout for quit to ensure state is properly saved
await asyncio.wait_for(task, timeout=5.0)
except (TimeoutError, asyncio.CancelledError):
# Expected - task was cancelled
# If timeout, state may not be fully saved
pass
except Exception:
# Ignore other exceptions during cleanup
pass
break
except Exception:
pass
try:
if hasattr(self, "_subscription_id"):
self.runtime.unsubscribe_from_events(self._subscription_id)
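A hedged launch sketch tying the new constructor parameters to the CLI flags referenced in chat_repl (--resume-session / --checkpoint). The module path framework.tui.app is assumed, and building the AgentRuntime is left to a hypothetical helper.
from framework.runtime.agent_runtime import AgentRuntime
from framework.tui.app import AdenTUI  # module path assumed

def launch(runtime: AgentRuntime) -> None:
    app = AdenTUI(
        runtime,
        resume_session="session_20260208_143022",  # auto-runs /resume on mount
        resume_checkpoint=None,  # set a "cp_..." id to auto-run /recover instead
    )
    app.run()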
+626 -11
@@ -15,14 +15,17 @@ Client-facing input:
"""
import asyncio
import re
import threading
from pathlib import Path
from typing import Any
from textual.app import ComposeResult
from textual.containers import Vertical
from textual.widgets import Input, Label, RichLog
from textual.widgets import Input, Label
from framework.runtime.agent_runtime import AgentRuntime
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
class ChatRepl(Vertical):
@@ -67,13 +70,20 @@ class ChatRepl(Vertical):
}
"""
def __init__(self, runtime: AgentRuntime):
def __init__(
self,
runtime: AgentRuntime,
resume_session: str | None = None,
resume_checkpoint: str | None = None,
):
super().__init__()
self.runtime = runtime
self._current_exec_id: str | None = None
self._streaming_snapshot: str = ""
self._waiting_for_input: bool = False
self._input_node_id: str | None = None
self._resume_session = resume_session
self._resume_checkpoint = resume_checkpoint
# Dedicated event loop for agent execution.
# Keeps blocking runtime code (LLM calls, MCP tools) off
@@ -87,22 +97,621 @@ class ChatRepl(Vertical):
self._agent_thread.start()
def compose(self) -> ComposeResult:
yield RichLog(id="chat-history", highlight=True, markup=True, auto_scroll=False, wrap=True)
yield RichLog(
id="chat-history",
highlight=True,
markup=True,
auto_scroll=False,
wrap=True,
min_width=0,
)
yield Label("Agent is processing...", id="processing-indicator")
yield Input(placeholder="Enter input for agent...", id="chat-input")
# Regex for file:// URIs that are NOT already inside Rich [link=...] markup
_FILE_URI_RE = re.compile(r"(?<!\[link=)(file://[^\s)\]>*]+)")
def _linkify(self, text: str) -> str:
"""Convert bare file:// URIs to clickable Rich [link=...] markup with short display text."""
def _shorten(match: re.Match) -> str:
uri = match.group(1)
filename = uri.rsplit("/", 1)[-1] if "/" in uri else uri
return f"[link={uri}]{filename}[/link]"
return self._FILE_URI_RE.sub(_shorten, text)
def _write_history(self, content: str) -> None:
"""Write to chat history, only auto-scrolling if user is at the bottom."""
history = self.query_one("#chat-history", RichLog)
was_at_bottom = history.is_vertical_scroll_end
history.write(content)
history.write(self._linkify(content))
if was_at_bottom:
history.scroll_end(animate=False)
async def _handle_command(self, command: str) -> None:
"""Handle slash commands for session and checkpoint operations."""
parts = command.split(maxsplit=2)
cmd = parts[0].lower()
if cmd == "/help":
self._write_history("""[bold cyan]Available Commands:[/bold cyan]
[bold]/sessions[/bold] - List all sessions for this agent
[bold]/sessions[/bold] <session_id> - Show session details and checkpoints
[bold]/resume[/bold] - Resume latest paused/failed session
[bold]/resume[/bold] <session_id> - Resume session from where it stopped
[bold]/recover[/bold] <session_id> <cp_id> - Recover from specific checkpoint
[bold]/pause[/bold] - Pause current execution (Ctrl+Z)
[bold]/help[/bold] - Show this help message
[dim]Examples:[/dim]
/sessions [dim]# List all sessions[/dim]
/sessions session_20260208_143022 [dim]# Show session details[/dim]
/resume [dim]# Resume latest session (from state)[/dim]
/resume session_20260208_143022 [dim]# Resume specific session (from state)[/dim]
/recover session_20260208_143022 cp_xxx [dim]# Recover from specific checkpoint[/dim]
/pause [dim]# Pause (or Ctrl+Z)[/dim]
""")
elif cmd == "/sessions":
session_id = parts[1].strip() if len(parts) > 1 else None
await self._cmd_sessions(session_id)
elif cmd == "/resume":
# Resume from session state (not checkpoint-based)
if len(parts) < 2:
session_id = await self._find_latest_resumable_session()
if not session_id:
self._write_history("[bold red]No resumable sessions found[/bold red]")
self._write_history(" Tip: Use [bold]/sessions[/bold] to see all sessions")
return
else:
session_id = parts[1].strip()
await self._cmd_resume(session_id)
elif cmd == "/recover":
# Recover from specific checkpoint
if len(parts) < 3:
self._write_history(
"[bold red]Error:[/bold red] /recover requires session_id and checkpoint_id"
)
self._write_history(" Usage: [bold]/recover <session_id> <checkpoint_id>[/bold]")
self._write_history(
" Tip: Use [bold]/sessions <session_id>[/bold] to see checkpoints"
)
return
session_id = parts[1].strip()
checkpoint_id = parts[2].strip()
await self._cmd_recover(session_id, checkpoint_id)
elif cmd == "/pause":
await self._cmd_pause()
else:
self._write_history(
f"[bold red]Unknown command:[/bold red] {cmd}\n"
"Type [bold]/help[/bold] for available commands"
)
async def _cmd_sessions(self, session_id: str | None) -> None:
"""List sessions or show details of a specific session."""
try:
# Get storage path from runtime
storage_path = self.runtime._storage.base_path
if session_id:
# Show details of specific session including checkpoints
await self._show_session_details(storage_path, session_id)
else:
# List all sessions
await self._list_sessions(storage_path)
except Exception as e:
self._write_history(f"[bold red]Error:[/bold red] {e}")
self._write_history(" Could not access session data")
async def _find_latest_resumable_session(self) -> str | None:
"""Find the most recent paused or failed session."""
try:
storage_path = self.runtime._storage.base_path
sessions_dir = storage_path / "sessions"
if not sessions_dir.exists():
return None
# Get all sessions, most recent first
session_dirs = sorted(
[d for d in sessions_dir.iterdir() if d.is_dir()],
key=lambda d: d.name,
reverse=True,
)
# Find first paused, failed, or cancelled session
import json
for session_dir in session_dirs:
state_file = session_dir / "state.json"
if not state_file.exists():
continue
with open(state_file) as f:
state = json.load(f)
status = state.get("status", "").lower()
# Check if resumable (any non-completed status)
if status in ["paused", "failed", "cancelled", "active"]:
return session_dir.name
return None
except Exception:
return None
async def _list_sessions(self, storage_path: Path) -> None:
"""List all sessions for the agent."""
self._write_history("[bold cyan]Available Sessions:[/bold cyan]")
# Find all session directories
sessions_dir = storage_path / "sessions"
if not sessions_dir.exists():
self._write_history("[dim]No sessions found.[/dim]")
self._write_history(" Sessions will appear here after running the agent")
return
session_dirs = sorted(
[d for d in sessions_dir.iterdir() if d.is_dir()],
key=lambda d: d.name,
reverse=True, # Most recent first
)
if not session_dirs:
self._write_history("[dim]No sessions found.[/dim]")
return
self._write_history(f"[dim]Found {len(session_dirs)} session(s)[/dim]\n")
for session_dir in session_dirs[:10]: # Show last 10 sessions
session_id = session_dir.name
state_file = session_dir / "state.json"
if not state_file.exists():
continue
# Read session state
try:
import json
with open(state_file) as f:
state = json.load(f)
status = state.get("status", "unknown").upper()
# Status with color
if status == "COMPLETED":
status_colored = f"[green]{status}[/green]"
elif status == "FAILED":
status_colored = f"[red]{status}[/red]"
elif status == "PAUSED":
status_colored = f"[yellow]{status}[/yellow]"
elif status == "CANCELLED":
status_colored = f"[dim yellow]{status}[/dim yellow]"
else:
status_colored = f"[dim]{status}[/dim]"
# Check for checkpoints
checkpoint_dir = session_dir / "checkpoints"
checkpoint_count = 0
if checkpoint_dir.exists():
checkpoint_files = list(checkpoint_dir.glob("cp_*.json"))
checkpoint_count = len(checkpoint_files)
# Session line
self._write_history(f"📋 [bold]{session_id}[/bold]")
self._write_history(f" Status: {status_colored} Checkpoints: {checkpoint_count}")
if checkpoint_count > 0:
self._write_history(f" [dim]Resume: /resume {session_id}[/dim]")
self._write_history("") # Blank line
except Exception as e:
self._write_history(f" [dim red]Error reading: {e}[/dim red]")
async def _show_session_details(self, storage_path: Path, session_id: str) -> None:
"""Show detailed information about a specific session."""
self._write_history(f"[bold cyan]Session Details:[/bold cyan] {session_id}\n")
session_dir = storage_path / "sessions" / session_id
if not session_dir.exists():
self._write_history("[bold red]Error:[/bold red] Session not found")
self._write_history(f" Path: {session_dir}")
self._write_history(" Tip: Use [bold]/sessions[/bold] to see available sessions")
return
state_file = session_dir / "state.json"
if not state_file.exists():
self._write_history("[bold red]Error:[/bold red] Session state not found")
return
try:
import json
with open(state_file) as f:
state = json.load(f)
# Basic info
status = state.get("status", "unknown").upper()
if status == "COMPLETED":
status_colored = f"[green]{status}[/green]"
elif status == "FAILED":
status_colored = f"[red]{status}[/red]"
elif status == "PAUSED":
status_colored = f"[yellow]{status}[/yellow]"
elif status == "CANCELLED":
status_colored = f"[dim yellow]{status}[/dim yellow]"
else:
status_colored = status
self._write_history(f"Status: {status_colored}")
if "started_at" in state:
self._write_history(f"Started: {state['started_at']}")
if "completed_at" in state:
self._write_history(f"Completed: {state['completed_at']}")
# Execution path
if "execution_path" in state and state["execution_path"]:
self._write_history("\n[bold]Execution Path:[/bold]")
for node_id in state["execution_path"]:
self._write_history(f"{node_id}")
# Checkpoints
checkpoint_dir = session_dir / "checkpoints"
if checkpoint_dir.exists():
checkpoint_files = sorted(checkpoint_dir.glob("cp_*.json"))
if checkpoint_files:
self._write_history(
f"\n[bold]Available Checkpoints:[/bold] ({len(checkpoint_files)})"
)
# Load and show checkpoints
for i, cp_file in enumerate(checkpoint_files[-5:], 1): # Last 5
try:
with open(cp_file) as f:
cp_data = json.load(f)
cp_id = cp_data.get("checkpoint_id", cp_file.stem)
cp_type = cp_data.get("checkpoint_type", "unknown")
current_node = cp_data.get("current_node", "unknown")
is_clean = cp_data.get("is_clean", False)
clean_marker = "✓" if is_clean else "⚠"
self._write_history(f" {i}. {clean_marker} [cyan]{cp_id}[/cyan]")
self._write_history(f" Type: {cp_type}, Node: {current_node}")
except Exception:
pass
# Quick actions
if checkpoint_dir.exists() and list(checkpoint_dir.glob("cp_*.json")):
self._write_history("\n[bold]Quick Actions:[/bold]")
self._write_history(
f" [dim]/resume {session_id}[/dim] - Resume from latest checkpoint"
)
except Exception as e:
self._write_history(f"[bold red]Error:[/bold red] {e}")
import traceback
self._write_history(f"[dim]{traceback.format_exc()}[/dim]")
async def _cmd_resume(self, session_id: str) -> None:
"""Resume a session from its last state (session state, not checkpoint)."""
try:
storage_path = self.runtime._storage.base_path
session_dir = storage_path / "sessions" / session_id
# Verify session exists
if not session_dir.exists():
self._write_history(f"[bold red]Error:[/bold red] Session not found: {session_id}")
self._write_history(" Use [bold]/sessions[/bold] to see available sessions")
return
# Load session state
state_file = session_dir / "state.json"
if not state_file.exists():
self._write_history("[bold red]Error:[/bold red] Session state not found")
return
import json
with open(state_file) as f:
state = json.load(f)
# Resume from session state (not checkpoint)
progress = state.get("progress", {})
paused_at = progress.get("paused_at") or progress.get("resume_from")
if paused_at:
# Has paused_at - resume from there
resume_session_state = {
"paused_at": paused_at,
"memory": state.get("memory", {}),
"execution_path": progress.get("path", []),
"node_visit_counts": progress.get("node_visit_counts", {}),
}
resume_info = f"From node: [cyan]{paused_at}[/cyan]"
else:
# No paused_at - just retry with same input
resume_session_state = {}
resume_info = "Retrying with same input"
# Display resume info
self._write_history(f"[bold cyan]🔄 Resuming session[/bold cyan] {session_id}")
self._write_history(f" {resume_info}")
if paused_at:
self._write_history(" [dim](Using session state, not checkpoint)[/dim]")
# Check if already executing
if self._current_exec_id is not None:
self._write_history(
"[bold yellow]Warning:[/bold yellow] An execution is already running"
)
self._write_history(" Wait for it to complete or use /pause first")
return
# Get original input data from session state
input_data = state.get("input_data", {})
# Show indicator
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Resuming from session state...")
indicator.display = True
# Update placeholder
chat_input = self.query_one("#chat-input", Input)
chat_input.placeholder = "Commands: /pause, /sessions (agent resuming...)"
# Trigger execution with resume state
try:
entry_points = self.runtime.get_entry_points()
if not entry_points:
self._write_history("[bold red]Error:[/bold red] No entry points available")
return
# Submit execution with resume state and original input data
future = asyncio.run_coroutine_threadsafe(
self.runtime.trigger(
entry_points[0].id,
input_data=input_data,
session_state=resume_session_state,
),
self._agent_loop,
)
exec_id = await asyncio.wrap_future(future)
self._current_exec_id = exec_id
self._write_history(
f"[green]✓[/green] Resume started (execution: {exec_id[:12]}...)"
)
self._write_history(" Agent is continuing from where it stopped...")
except Exception as e:
self._write_history(f"[bold red]Error starting resume:[/bold red] {e}")
indicator.display = False
chat_input.placeholder = "Enter input for agent..."
except Exception as e:
self._write_history(f"[bold red]Error:[/bold red] {e}")
import traceback
self._write_history(f"[dim]{traceback.format_exc()}[/dim]")
async def _cmd_recover(self, session_id: str, checkpoint_id: str) -> None:
"""Recover a session from a specific checkpoint (time-travel debugging)."""
try:
storage_path = self.runtime._storage.base_path
session_dir = storage_path / "sessions" / session_id
# Verify session exists
if not session_dir.exists():
self._write_history(f"[bold red]Error:[/bold red] Session not found: {session_id}")
self._write_history(" Use [bold]/sessions[/bold] to see available sessions")
return
# Verify checkpoint exists
checkpoint_file = session_dir / "checkpoints" / f"{checkpoint_id}.json"
if not checkpoint_file.exists():
self._write_history(
f"[bold red]Error:[/bold red] Checkpoint not found: {checkpoint_id}"
)
self._write_history(
f" Use [bold]/sessions {session_id}[/bold] to see available checkpoints"
)
return
# Display recover info
self._write_history(f"[bold cyan]⏪ Recovering session[/bold cyan] {session_id}")
self._write_history(f" From checkpoint: [cyan]{checkpoint_id}[/cyan]")
self._write_history(
" [dim](Checkpoint-based recovery for time-travel debugging)[/dim]"
)
# Check if already executing
if self._current_exec_id is not None:
self._write_history(
"[bold yellow]Warning:[/bold yellow] An execution is already running"
)
self._write_history(" Wait for it to complete or use /pause first")
return
# Create session_state for checkpoint recovery
recover_session_state = {
"resume_from_checkpoint": checkpoint_id,
}
# Show indicator
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Recovering from checkpoint...")
indicator.display = True
# Update placeholder
chat_input = self.query_one("#chat-input", Input)
chat_input.placeholder = "Commands: /pause, /sessions (agent recovering...)"
# Trigger execution with checkpoint recovery
try:
entry_points = self.runtime.get_entry_points()
if not entry_points:
self._write_history("[bold red]Error:[/bold red] No entry points available")
return
# Submit execution with checkpoint recovery state
future = asyncio.run_coroutine_threadsafe(
self.runtime.trigger(
entry_points[0].id,
input_data={},
session_state=recover_session_state,
),
self._agent_loop,
)
exec_id = await asyncio.wrap_future(future)
self._current_exec_id = exec_id
self._write_history(
f"[green]✓[/green] Recovery started (execution: {exec_id[:12]}...)"
)
self._write_history(" Agent is continuing from checkpoint...")
except Exception as e:
self._write_history(f"[bold red]Error starting recovery:[/bold red] {e}")
indicator.display = False
chat_input.placeholder = "Enter input for agent..."
except Exception as e:
self._write_history(f"[bold red]Error:[/bold red] {e}")
import traceback
self._write_history(f"[dim]{traceback.format_exc()}[/dim]")
async def _cmd_pause(self) -> None:
"""Immediately pause execution by cancelling task (same as Ctrl+Z)."""
# Check if there's a current execution
if not self._current_exec_id:
self._write_history("[bold yellow]No active execution to pause[/bold yellow]")
self._write_history(" Start an execution first, then use /pause during execution")
return
# Find and cancel the execution task - executor will catch and save state
task_cancelled = False
for stream in self.runtime._streams.values():
exec_id = self._current_exec_id
task = stream._execution_tasks.get(exec_id)
if task and not task.done():
task.cancel()
task_cancelled = True
self._write_history("[bold green]⏸ Execution paused - state saved[/bold green]")
self._write_history(" Resume later with: [bold]/resume[/bold]")
break
if not task_cancelled:
self._write_history("[bold yellow]Execution already completed[/bold yellow]")
def on_mount(self) -> None:
"""Add welcome message when widget mounts."""
"""Add welcome message and check for resumable sessions."""
history = self.query_one("#chat-history", RichLog)
history.write("[bold cyan]Chat REPL Ready[/bold cyan] — Type your input below\n")
history.write(
"[bold cyan]Chat REPL Ready[/bold cyan] — "
"Type your input or use [bold]/help[/bold] for commands\n"
)
# Auto-trigger resume/recover if CLI args provided
if self._resume_session:
if self._resume_checkpoint:
# Use /recover for checkpoint-based recovery
history.write(
"\n[bold cyan]🔄 Auto-recovering from checkpoint "
"(--resume-session + --checkpoint)[/bold cyan]"
)
self.call_later(self._cmd_recover, self._resume_session, self._resume_checkpoint)
else:
# Use /resume for session state resume
history.write(
"\n[bold cyan]🔄 Auto-resuming session (--resume-session)[/bold cyan]"
)
self.call_later(self._cmd_resume, self._resume_session)
return # Skip normal startup messages
# Check for resumable sessions
self._check_and_show_resumable_sessions()
history.write(
"[dim]Quick start: /sessions to see previous sessions, "
"/pause to pause execution[/dim]\n"
)
def _check_and_show_resumable_sessions(self) -> None:
"""Check for non-terminated sessions and prompt user."""
try:
storage_path = self.runtime._storage.base_path
sessions_dir = storage_path / "sessions"
if not sessions_dir.exists():
return
# Find non-terminated sessions (paused, failed, cancelled, active)
resumable = []
session_dirs = sorted(
[d for d in sessions_dir.iterdir() if d.is_dir()],
key=lambda d: d.name,
reverse=True, # Most recent first
)
import json
for session_dir in session_dirs[:5]: # Check last 5 sessions
state_file = session_dir / "state.json"
if not state_file.exists():
continue
try:
with open(state_file) as f:
state = json.load(f)
status = state.get("status", "").lower()
# Non-terminated statuses
if status in ["paused", "failed", "cancelled", "active"]:
resumable.append(
{
"session_id": session_dir.name,
"status": status.upper(),
}
)
except Exception:
continue
if resumable:
self._write_history("\n[bold yellow]⚠ Non-terminated sessions found:[/bold yellow]")
for i, session in enumerate(resumable[:3], 1): # Show top 3
status = session["status"]
session_id = session["session_id"]
# Color code status
if status == "PAUSED":
status_colored = f"[yellow]{status}[/yellow]"
elif status == "FAILED":
status_colored = f"[red]{status}[/red]"
elif status == "CANCELLED":
status_colored = f"[dim yellow]{status}[/dim yellow]"
else:
status_colored = f"[dim]{status}[/dim]"
self._write_history(f" {i}. {session_id[:32]}... [{status_colored}]")
self._write_history("\n[bold cyan]What would you like to do?[/bold cyan]")
self._write_history(" • Type [bold]/resume[/bold] to continue the latest session")
self._write_history(
f" • Type [bold]/resume {resumable[0]['session_id']}[/bold] "
"for specific session"
)
self._write_history(" • Or just type your input to start a new session\n")
except Exception:
# Silently fail - don't block TUI startup
pass
async def on_input_submitted(self, message: Input.Submitted) -> None:
"""Handle input submission — either start new execution or inject input."""
@@ -110,15 +719,21 @@ class ChatRepl(Vertical):
if not user_input:
return
# Handle commands (starting with /) - ALWAYS process commands first
# Commands work during execution, during client-facing input, anytime
if user_input.startswith("/"):
await self._handle_command(user_input)
message.input.value = ""
return
# Client-facing input: route to the waiting node
if self._waiting_for_input and self._input_node_id:
self._write_history(f"[bold green]You:[/bold green] {user_input}")
message.input.value = ""
# Disable input while agent processes the response
# Keep input enabled for commands (but change placeholder)
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = True
chat_input.placeholder = "Enter input for agent..."
chat_input.placeholder = "Commands: /pause, /sessions (agent processing...)"
self._waiting_for_input = False
indicator = self.query_one("#processing-indicator", Label)
@@ -171,9 +786,9 @@ class ChatRepl(Vertical):
indicator.update("Thinking...")
indicator.display = True
# Disable input while the agent is working
# Keep input enabled for commands during execution
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = True
chat_input.placeholder = "Commands available: /pause, /sessions, /help"
# Submit execution to the dedicated agent loop so blocking
# runtime code (LLM, MCP tools) never touches Textual's loop.
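Under the hood, /resume reduces to a runtime.trigger() call whose session_state payload is assembled from state.json. A hedged headless equivalent, with field names copied from _cmd_resume above:
import json
from pathlib import Path

def build_resume_payload(session_dir: Path) -> tuple[dict, dict]:
    """Return (input_data, session_state) as /resume would construct them."""
    state = json.loads((session_dir / "state.json").read_text())
    progress = state.get("progress", {})
    paused_at = progress.get("paused_at") or progress.get("resume_from")
    session_state: dict = {}
    if paused_at:
        session_state = {
            "paused_at": paused_at,
            "memory": state.get("memory", {}),
            "execution_path": progress.get("path", []),
            "node_visit_counts": progress.get("node_visit_counts", {}),
        }
    return state.get("input_data", {}), session_state

# input_data, session_state = build_resume_payload(base / "sessions" / session_id)
# exec_id = await runtime.trigger(entry_point_id, input_data=input_data,
#                                 session_state=session_state)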
+1 -1
@@ -4,10 +4,10 @@ Graph/Tree Overview Widget - Displays real agent graph structure.
from textual.app import ComposeResult
from textual.containers import Vertical
from textual.widgets import RichLog
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import EventType
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
class GraphOverview(Vertical):
+1 -1
@@ -7,9 +7,9 @@ from datetime import datetime
from textual.app import ComposeResult
from textual.containers import Container
from textual.widgets import RichLog
from framework.runtime.event_bus import AgentEvent, EventType
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
class LogPane(Container):
@@ -0,0 +1,206 @@
"""
SelectableRichLog - RichLog with mouse-driven text selection and clipboard copy.
Drop-in replacement for RichLog. Click-and-drag to select text, which is
visually highlighted. Press Ctrl+C to copy selection to clipboard (handled
by app.py). Press Escape or single-click to clear selection.
"""
from __future__ import annotations
import subprocess
import sys
from rich.segment import Segment as RichSegment
from rich.style import Style
from textual.geometry import Offset
from textual.selection import Selection
from textual.strip import Strip
from textual.widgets import RichLog
# Highlight style for selected text
_HIGHLIGHT_STYLE = Style(bgcolor="blue", color="white")
class SelectableRichLog(RichLog):
"""RichLog with mouse-driven text selection."""
DEFAULT_CSS = """
SelectableRichLog {
pointer: text;
}
"""
def __init__(self, **kwargs) -> None:
super().__init__(**kwargs)
self._sel_anchor: Offset | None = None
self._sel_end: Offset | None = None
self._selecting: bool = False
# -- Internal helpers --
def _apply_highlight(self, strip: Strip) -> Strip:
"""Apply highlight with correct precedence (highlight wins over base style)."""
segments = []
for text, style, control in strip._segments:
if control:
segments.append(RichSegment(text, style, control))
else:
new_style = (style + _HIGHLIGHT_STYLE) if style else _HIGHLIGHT_STYLE
segments.append(RichSegment(text, new_style, control))
return Strip(segments, strip.cell_length)
# -- Selection helpers --
@property
def selection(self) -> Selection | None:
"""Build a Selection from current anchor/end, or None if no selection."""
if self._sel_anchor is None or self._sel_end is None:
return None
if self._sel_anchor == self._sel_end:
return None
return Selection.from_offsets(self._sel_anchor, self._sel_end)
def _mouse_to_content(self, event_x: int, event_y: int) -> Offset:
"""Convert viewport mouse coords to content (line, col) coords."""
scroll_x, scroll_y = self.scroll_offset
return Offset(scroll_x + event_x, scroll_y + event_y)
def clear_selection(self) -> None:
"""Clear any active selection."""
had_selection = self._sel_anchor is not None
self._sel_anchor = None
self._sel_end = None
self._selecting = False
if had_selection:
self.refresh()
# -- Mouse handlers (left button only) --
def on_mouse_down(self, event) -> None:
"""Start selection on left mouse button."""
if event.button != 1:
return
self._sel_anchor = self._mouse_to_content(event.x, event.y)
self._sel_end = self._sel_anchor
self._selecting = True
self.capture_mouse()
self.refresh()
def on_mouse_move(self, event) -> None:
"""Extend selection while dragging."""
if not self._selecting:
return
self._sel_end = self._mouse_to_content(event.x, event.y)
self.refresh()
def on_mouse_up(self, event) -> None:
"""End selection on mouse release."""
if not self._selecting:
return
self._selecting = False
self.release_mouse()
# Single-click (no drag) clears selection
if self._sel_anchor == self._sel_end:
self.clear_selection()
# -- Keyboard handlers --
def on_key(self, event) -> None:
"""Clear selection on Escape."""
if event.key == "escape":
self.clear_selection()
# -- Rendering with highlight --
def render_line(self, y: int) -> Strip:
"""Override to apply selection highlight on top of the base strip."""
strip = super().render_line(y)
sel = self.selection
if sel is None:
return strip
# Determine which content line this viewport row corresponds to
_, scroll_y = self.scroll_offset
content_y = scroll_y + y
span = sel.get_span(content_y)
if span is None:
return strip
start_x, end_x = span
cell_len = strip.cell_length
if cell_len == 0:
return strip
scroll_x, _ = self.scroll_offset
# -1 means "to end of content line" — use viewport end
if end_x == -1:
end_x = cell_len
else:
# Convert content-space x to viewport-space x
end_x = end_x - scroll_x
# Convert content-space x to viewport-space x
start_x = start_x - scroll_x
# Clamp to viewport strip bounds
start_x = max(0, start_x)
end_x = min(end_x, cell_len)
if start_x >= end_x:
return strip
# Divide strip into [before, selected, after] and highlight the middle
parts = strip.divide([start_x, end_x])
if len(parts) < 2:
return strip
highlighted_parts: list[Strip] = []
for i, part in enumerate(parts):
if i == 1:
highlighted_parts.append(self._apply_highlight(part))
else:
highlighted_parts.append(part)
return Strip.join(highlighted_parts)
# -- Text extraction & clipboard --
def get_selected_text(self) -> str | None:
"""Extract the plain text of the current selection, or None."""
sel = self.selection
if sel is None:
return None
# Build full text from all lines
all_text = "\n".join(strip.text for strip in self.lines)
extracted = sel.extract(all_text)
return extracted if extracted else None
def copy_selection(self) -> str | None:
"""Copy selected text to system clipboard. Returns text or None."""
text = self.get_selected_text()
if not text:
return None
_copy_to_clipboard(text)
return text
def _copy_to_clipboard(text: str) -> None:
"""Copy text to system clipboard using platform-native tools."""
try:
if sys.platform == "darwin":
subprocess.run(["pbcopy"], input=text.encode(), check=True, timeout=5)
elif sys.platform.startswith("linux"):
subprocess.run(
["xclip", "-selection", "clipboard"],
input=text.encode(),
check=True,
timeout=5,
)
except (subprocess.SubprocessError, FileNotFoundError):
pass
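The _copy_to_clipboard helper above silently no-ops when xclip is absent. A hedged variant that also tries wl-copy on Wayland (wl-copy is an assumption, not in the diff):
import subprocess
import sys

def copy_to_clipboard_portable(text: str) -> bool:
    """Best-effort clipboard copy; returns True on success."""
    if sys.platform == "darwin":
        candidates = [["pbcopy"]]
    elif sys.platform.startswith("linux"):
        candidates = [["wl-copy"], ["xclip", "-selection", "clipboard"]]
    else:
        return False
    for cmd in candidates:
        try:
            subprocess.run(cmd, input=text.encode(), check=True, timeout=5)
            return True
        except (subprocess.SubprocessError, FileNotFoundError):
            continue
    return False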
+1 -1
@@ -1,6 +1,6 @@
[project]
name = "framework"
version = "0.1.0"
version = "0.4.2"
description = "Goal-driven agent runtime with Builder-friendly observability"
readme = "README.md"
requires-python = ">=3.11"
+11 -1
@@ -1,10 +1,20 @@
"""Tests for the BuilderQuery interface - how Builder analyzes agent runs."""
"""Tests for the BuilderQuery interface - how Builder analyzes agent runs.
DEPRECATED: These tests rely on the deprecated FileStorage backend,
which both BuilderQuery and Runtime still use under the hood.
New code should use unified session storage instead.
"""
from pathlib import Path
import pytest
from framework import BuilderQuery, Runtime
from framework.schemas.run import RunStatus
# Mark all tests in this module as skipped - they rely on deprecated FileStorage
pytestmark = pytest.mark.skip(reason="Tests rely on deprecated FileStorage backend")
def create_successful_run(runtime: Runtime, goal_id: str = "test_goal") -> str:
"""Helper to create a successful run with decisions."""
+20
@@ -26,6 +26,11 @@ def create_test_run(
)
@pytest.mark.skip(
reason="FileStorage.save_run() is deprecated and now a no-op. "
"ConcurrentStorage wraps FileStorage, so these tests no longer work. "
"New sessions use unified storage at sessions/{session_id}/state.json"
)
@pytest.mark.asyncio
async def test_cache_invalidation_on_save(tmp_path: Path):
"""Test that summary cache is invalidated when a run is saved.
@@ -62,6 +67,11 @@ async def test_cache_invalidation_on_save(tmp_path: Path):
await storage.stop()
@pytest.mark.skip(
reason="FileStorage.save_run() is deprecated and now a no-op. "
"ConcurrentStorage wraps FileStorage, so these tests no longer work. "
"New sessions use unified storage at sessions/{session_id}/state.json"
)
@pytest.mark.asyncio
async def test_batched_write_cache_consistency(tmp_path: Path):
"""Test that cache is only updated after successful batched write.
@@ -104,6 +114,11 @@ async def test_batched_write_cache_consistency(tmp_path: Path):
await storage.stop()
@pytest.mark.skip(
reason="FileStorage.save_run() is deprecated and now a no-op. "
"ConcurrentStorage wraps FileStorage, so these tests no longer work. "
"New sessions use unified storage at sessions/{session_id}/state.json"
)
@pytest.mark.asyncio
async def test_immediate_write_updates_cache(tmp_path: Path):
"""Test that immediate writes still update cache correctly."""
@@ -129,6 +144,11 @@ async def test_immediate_write_updates_cache(tmp_path: Path):
await storage.stop()
@pytest.mark.skip(
reason="FileStorage.save_run() is deprecated and now a no-op. "
"ConcurrentStorage wraps FileStorage, so these tests no longer work. "
"New sessions use unified storage at sessions/{session_id}/state.json"
)
@pytest.mark.asyncio
async def test_summary_cache_invalidated_on_multiple_saves(tmp_path: Path):
"""Test that summary cache is invalidated on each save, not just the first."""
@@ -0,0 +1,344 @@
"""
Regression tests for conditional edge direct key access (Issue #3599).
Verifies that node outputs are written to memory before edge evaluation,
enabling direct key access in conditional expressions (e.g., 'score > 80')
instead of requiring output['score'] > 80 syntax.
"""
import pytest
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.executor import GraphExecutor
from framework.graph.goal import Goal
from framework.graph.node import NodeContext, NodeProtocol, NodeResult, NodeSpec
from framework.runtime.core import Runtime
class SimpleRuntime(Runtime):
"""Minimal runtime for testing."""
def start_run(self, **kwargs):
return "test-run"
def end_run(self, **kwargs):
pass
def report_problem(self, **kwargs):
pass
def decide(self, **kwargs):
return "test-decision"
def record_outcome(self, **kwargs):
pass
def set_node(self, **kwargs):
pass
class ScoreNode(NodeProtocol):
"""Node that outputs a score value."""
async def execute(self, ctx: NodeContext) -> NodeResult:
return NodeResult(success=True, output={"score": 85})
class HighScoreNode(NodeProtocol):
"""Consumer node for high scores."""
async def execute(self, ctx: NodeContext) -> NodeResult:
return NodeResult(success=True, output={"result": "high_score_path"})
class MultiKeyNode(NodeProtocol):
"""Node that outputs multiple keys."""
async def execute(self, ctx: NodeContext) -> NodeResult:
return NodeResult(success=True, output={"x": 100, "y": 50})
class ConsumerNode(NodeProtocol):
"""Generic consumer node."""
async def execute(self, ctx: NodeContext) -> NodeResult:
return NodeResult(success=True, output={"processed": True})
@pytest.mark.asyncio
async def test_direct_key_access_in_conditional_edge():
"""
Verify direct key access works in conditional edges (e.g., 'score > 80').
This is the core regression test for issue #3599. Before the fix,
node outputs were only written to memory during input mapping (after
edge evaluation), causing NameError when edges tried to access keys directly.
"""
goal = Goal(
id="test-direct-key",
name="Test Direct Key Access",
description="Test that direct key access works in conditional edges",
)
nodes = [
NodeSpec(
id="score_node",
name="ScoreNode",
description="Outputs a score",
node_type="function",
output_keys=["score"],
),
NodeSpec(
id="high_score_node",
name="HighScoreNode",
description="Handles high scores",
node_type="function",
input_keys=["score"],
output_keys=["result"],
),
]
# Edge with DIRECT key access: 'score > 80' (not 'output["score"] > 80')
edges = [
EdgeSpec(
id="score_to_high",
source="score_node",
target="high_score_node",
condition=EdgeCondition.CONDITIONAL,
condition_expr="score > 80", # Direct key access
)
]
graph = GraphSpec(
id="test-graph",
goal_id="test-direct-key",
entry_node="score_node",
nodes=nodes,
edges=edges,
terminal_nodes=["high_score_node"],
)
runtime = SimpleRuntime(storage_path="/tmp/test")
executor = GraphExecutor(runtime=runtime)
executor.register_node("score_node", ScoreNode())
executor.register_node("high_score_node", HighScoreNode())
result = await executor.execute(graph, goal, {})
# Verify the edge was followed (high_score_node executed)
assert result.success, "Execution should succeed"
assert "high_score_node" in result.path, (
f"Expected high_score_node in path. "
f"Condition 'score > 80' should evaluate to True (score=85). "
f"Path: {result.path}"
)
@pytest.mark.asyncio
async def test_backward_compatibility_output_syntax():
"""
Verify backward compatibility: output['key'] syntax still works.
The fix should not break existing code that uses the explicit
output dictionary syntax in conditional expressions.
"""
goal = Goal(
id="test-backward-compat",
name="Test Backward Compatibility",
description="Test that output['key'] syntax still works",
)
nodes = [
NodeSpec(
id="score_node",
name="ScoreNode",
description="Outputs a score",
node_type="function",
output_keys=["score"],
),
NodeSpec(
id="consumer_node",
name="ConsumerNode",
description="Consumer",
node_type="function",
input_keys=["score"],
output_keys=["processed"],
),
]
# Edge with OLD syntax: output['score'] > 80
edges = [
EdgeSpec(
id="score_to_consumer",
source="score_node",
target="consumer_node",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output['score'] > 80", # Old explicit syntax
)
]
graph = GraphSpec(
id="test-graph-compat",
goal_id="test-backward-compat",
entry_node="score_node",
nodes=nodes,
edges=edges,
terminal_nodes=["consumer_node"],
)
runtime = SimpleRuntime(storage_path="/tmp/test")
executor = GraphExecutor(runtime=runtime)
executor.register_node("score_node", ScoreNode())
executor.register_node("consumer_node", ConsumerNode())
result = await executor.execute(graph, goal, {})
# Verify backward compatibility maintained
assert result.success, "Execution should succeed"
assert "consumer_node" in result.path, (
f"Expected consumer_node in path. "
f"Old syntax output['score'] > 80 should still work. "
f"Path: {result.path}"
)
@pytest.mark.asyncio
async def test_multiple_keys_in_expression():
"""
Verify multiple direct keys work in complex expressions.
Tests that expressions like 'x > y and y < 100' work correctly
when both x and y are written to memory before edge evaluation.
"""
goal = Goal(
id="test-multi-key",
name="Test Multiple Keys",
description="Test multiple keys in conditional expression",
)
nodes = [
NodeSpec(
id="multi_key_node",
name="MultiKeyNode",
description="Outputs multiple keys",
node_type="function",
output_keys=["x", "y"],
),
NodeSpec(
id="consumer_node",
name="ConsumerNode",
description="Consumer",
node_type="function",
input_keys=["x", "y"],
output_keys=["processed"],
),
]
# Complex expression with multiple direct keys
edges = [
EdgeSpec(
id="multi_to_consumer",
source="multi_key_node",
target="consumer_node",
condition=EdgeCondition.CONDITIONAL,
condition_expr="x > y and y < 100", # Multiple keys
)
]
graph = GraphSpec(
id="test-graph-multi",
goal_id="test-multi-key",
entry_node="multi_key_node",
nodes=nodes,
edges=edges,
terminal_nodes=["consumer_node"],
)
runtime = SimpleRuntime(storage_path="/tmp/test")
executor = GraphExecutor(runtime=runtime)
executor.register_node("multi_key_node", MultiKeyNode())
executor.register_node("consumer_node", ConsumerNode())
result = await executor.execute(graph, goal, {})
# Verify multiple keys work correctly
assert result.success, "Execution should succeed"
assert "consumer_node" in result.path, (
f"Expected consumer_node in path. "
f"Condition 'x > y and y < 100' should be True (x=100, y=50). "
f"Path: {result.path}"
)
@pytest.mark.asyncio
async def test_negative_case_condition_false():
"""
Verify conditions correctly evaluate to False when not met.
Tests that when a condition fails, the edge is NOT followed
and execution doesn't proceed to the target node.
"""
goal = Goal(
id="test-negative",
name="Test Negative Case",
description="Test condition evaluates to False correctly",
)
class LowScoreNode(NodeProtocol):
"""Node that outputs a LOW score."""
async def execute(self, ctx: NodeContext) -> NodeResult:
return NodeResult(success=True, output={"score": 30})
nodes = [
NodeSpec(
id="low_score_node",
name="LowScoreNode",
description="Outputs low score",
node_type="function",
output_keys=["score"],
),
NodeSpec(
id="high_score_handler",
name="HighScoreHandler",
description="Should NOT execute",
node_type="function",
input_keys=["score"],
output_keys=["result"],
),
]
# Condition should be FALSE (30 is not > 80)
edges = [
EdgeSpec(
id="low_to_high",
source="low_score_node",
target="high_score_handler",
condition=EdgeCondition.CONDITIONAL,
condition_expr="score > 80", # Should be False
)
]
graph = GraphSpec(
id="test-graph-negative",
goal_id="test-negative",
entry_node="low_score_node",
nodes=nodes,
edges=edges,
terminal_nodes=["high_score_handler"],
)
runtime = SimpleRuntime(storage_path="/tmp/test")
executor = GraphExecutor(runtime=runtime)
executor.register_node("low_score_node", LowScoreNode())
executor.register_node("high_score_handler", HighScoreNode())
result = await executor.execute(graph, goal, {})
# Verify condition correctly evaluated to False
assert result.success, "Execution should succeed"
assert "high_score_handler" not in result.path, (
f"high_score_handler should NOT be in path. "
f"Condition 'score > 80' should be False (score=30). "
f"Path: {result.path}"
)
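The regression tests above pin down the #3599 contract. As a compact restatement, both spellings below should now select the same edge, since node outputs land in memory before edge evaluation:
from framework.graph.edge import EdgeCondition, EdgeSpec

direct = EdgeSpec(
    id="e_direct", source="score_node", target="high_score_node",
    condition=EdgeCondition.CONDITIONAL,
    condition_expr="score > 80",            # new: direct key access
)
explicit = EdgeSpec(
    id="e_explicit", source="score_node", target="high_score_node",
    condition=EdgeCondition.CONDITIONAL,
    condition_expr="output['score'] > 80",  # old: still supported
)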
+5 -11
@@ -8,7 +8,6 @@ Set HIVE_TEST_LLM_MODEL=<model> to override the real model.
from __future__ import annotations
import asyncio
import os
from collections.abc import AsyncIterator, Callable
from dataclasses import dataclass
@@ -508,7 +507,7 @@ async def test_event_loop_set_output():
assert result.success
if USE_MOCK_LLM:
assert result.output == {"lead_score": "87", "company": "TechCorp"}
assert result.output == {"lead_score": 87, "company": "TechCorp"}
else:
assert "lead_score" in result.output
assert "company" in result.output
@@ -549,7 +548,7 @@ async def test_event_loop_missing_output_keys_retried():
assert "score" in result.output
assert "reason" in result.output
if USE_MOCK_LLM:
assert result.output["score"] == "87"
assert result.output["score"] == 87
assert result.output["reason"] == "good fit"
@@ -920,7 +919,7 @@ async def test_context_handoff_between_nodes(runtime):
assert "lead_score" in result.output
assert "strategy" in result.output
if USE_MOCK_LLM:
assert result.output["lead_score"] == "92"
assert result.output["lead_score"] == 92
assert result.output["strategy"] == "premium"
@@ -952,14 +951,9 @@ async def test_client_facing_node_streams_output():
config=LoopConfig(max_iterations=5),
)
# client_facing + text-only blocks for user input; use shutdown to unblock
async def auto_shutdown():
await asyncio.sleep(0.05)
node.signal_shutdown()
task = asyncio.create_task(auto_shutdown())
# Text-only on client_facing no longer blocks (no ask_user called),
# so the node completes without needing a shutdown workaround.
result = await node.execute(ctx)
await task
assert result.success
+122 -33
@@ -316,7 +316,7 @@ class TestSetOutput:
result = await node.execute(ctx)
assert result.success is True
assert result.output["result"] == "42"
assert result.output["result"] == 42
@pytest.mark.asyncio
async def test_set_output_rejects_invalid_key(self, runtime, node_spec, memory):
@@ -447,14 +447,9 @@ class TestEventBusLifecycle:
ctx = build_ctx(runtime, spec, memory, llm)
node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=5))
# client_facing + text-only blocks for user input; use shutdown to unblock
async def auto_shutdown():
await asyncio.sleep(0.05)
node.signal_shutdown()
task = asyncio.create_task(auto_shutdown())
# Text-only on client_facing no longer blocks (no ask_user), so
# the node completes without needing shutdown.
await node.execute(ctx)
await task
assert EventType.CLIENT_OUTPUT_DELTA in received_types
assert EventType.LLM_TEXT_DELTA not in received_types
@@ -480,11 +475,38 @@ class TestClientFacingBlocking:
)
@pytest.mark.asyncio
async def test_client_facing_blocks_on_text(self, runtime, memory, client_spec):
"""client_facing + text-only response blocks until inject_event."""
async def test_text_only_no_blocking(self, runtime, memory, client_spec):
"""client_facing + text-only (no ask_user) should NOT block."""
llm = MockStreamingLLM(
scenarios=[
text_scenario("Hello!"),
text_scenario("Hello! Here is your status update."),
]
)
bus = EventBus()
node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=5))
ctx = build_ctx(runtime, client_spec, memory, llm)
# Should complete without blocking — no ask_user called, no output_keys required
result = await node.execute(ctx)
assert result.success is True
assert llm._call_index >= 1
@pytest.mark.asyncio
async def test_ask_user_triggers_blocking(self, runtime, memory, client_spec):
"""client_facing + ask_user() blocks until inject_event."""
# Give the node an output key so the judge doesn't auto-accept
# after the user responds — it needs set_output first.
client_spec.output_keys = ["answer"]
llm = MockStreamingLLM(
scenarios=[
# Turn 1: LLM greets user and calls ask_user
tool_call_scenario(
"ask_user", {"question": "What do you need?"}, tool_use_id="ask_1"
),
# Turn 2: after user responds, LLM processes and sets output
tool_call_scenario("set_output", {"key": "answer", "value": "help provided"}),
# Turn 3: text finish (implicit judge accepts — output key set)
text_scenario("Got your message."),
]
)
@@ -495,20 +517,19 @@ class TestClientFacingBlocking:
async def user_responds():
await asyncio.sleep(0.05)
await node.inject_event("I need help")
await asyncio.sleep(0.05)
node.signal_shutdown()
user_task = asyncio.create_task(user_responds())
result = await node.execute(ctx)
await user_task
assert result.success is True
# LLM should have been called at least twice (first response + after inject)
# LLM called at least twice: once for ask_user turn, once after user responded
assert llm._call_index >= 2
assert result.output["answer"] == "help provided"
@pytest.mark.asyncio
async def test_client_facing_does_not_block_on_tools(self, runtime, memory):
"""client_facing + tool calls should NOT block — judge evaluates normally."""
"""client_facing + tool calls (no ask_user) should NOT block."""
spec = NodeSpec(
id="chat",
name="Chat",
@@ -517,10 +538,9 @@ class TestClientFacingBlocking:
output_keys=["result"],
client_facing=True,
)
# Scenario 1: LLM calls set_output (tool call present → no blocking, judge RETRYs)
# Scenario 2: LLM produces text (implicit judge sees output key set → ACCEPT)
# But scenario 2 is text-only on client_facing → would block.
# So we need shutdown to handle that case.
# Scenario 1: LLM calls set_output
# Scenario 2: LLM produces text → implicit judge ACCEPTs (output key set)
# No ask_user called, so no blocking occurs.
llm = MockStreamingLLM(
scenarios=[
tool_call_scenario("set_output", {"key": "result", "value": "done"}),
@@ -530,18 +550,8 @@ class TestClientFacingBlocking:
node = EventLoopNode(config=LoopConfig(max_iterations=5))
ctx = build_ctx(runtime, spec, memory, llm)
# After set_output, implicit judge RETRYs (tool calls present).
# Next turn: text-only on client_facing → blocks.
# But implicit judge should ACCEPT first (output key is set, no tools).
# Actually, client_facing check happens BEFORE judge, so it blocks.
# Use shutdown as safety net.
async def auto_shutdown():
await asyncio.sleep(0.1)
node.signal_shutdown()
task = asyncio.create_task(auto_shutdown())
# Should complete without blocking — no ask_user called
result = await node.execute(ctx)
await task
assert result.success is True
assert result.output["result"] == "done"
@@ -567,7 +577,11 @@ class TestClientFacingBlocking:
@pytest.mark.asyncio
async def test_signal_shutdown_unblocks(self, runtime, memory, client_spec):
"""signal_shutdown should unblock a waiting client_facing node."""
llm = MockStreamingLLM(scenarios=[text_scenario("Waiting...")])
llm = MockStreamingLLM(
scenarios=[
tool_call_scenario("ask_user", {"question": "Waiting..."}, tool_use_id="ask_1"),
]
)
bus = EventBus()
node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=10))
ctx = build_ctx(runtime, client_spec, memory, llm)
@@ -584,8 +598,12 @@ class TestClientFacingBlocking:
@pytest.mark.asyncio
async def test_client_input_requested_event_published(self, runtime, memory, client_spec):
"""CLIENT_INPUT_REQUESTED should be published when blocking."""
llm = MockStreamingLLM(scenarios=[text_scenario("Hello!")])
"""CLIENT_INPUT_REQUESTED should be published when ask_user blocks."""
llm = MockStreamingLLM(
scenarios=[
tool_call_scenario("ask_user", {"question": "Hello!"}, tool_use_id="ask_1"),
]
)
bus = EventBus()
received = []
@@ -611,6 +629,77 @@ class TestClientFacingBlocking:
assert len(received) >= 1
assert received[0].type == EventType.CLIENT_INPUT_REQUESTED
@pytest.mark.asyncio
async def test_ask_user_with_real_tools(self, runtime, memory):
"""ask_user alongside real tool calls still triggers blocking."""
spec = NodeSpec(
id="chat",
name="Chat",
description="chat node",
node_type="event_loop",
output_keys=[],
client_facing=True,
)
# LLM calls a real tool AND ask_user in the same turn
llm = MockStreamingLLM(
scenarios=[
[
ToolCallEvent(
tool_use_id="tool_1", tool_name="search", tool_input={"q": "test"}
),
ToolCallEvent(tool_use_id="ask_1", tool_name="ask_user", tool_input={}),
FinishEvent(
stop_reason="tool_calls", input_tokens=10, output_tokens=5, model="mock"
),
],
text_scenario("Done"),
]
)
def my_executor(tool_use: ToolUse) -> ToolResult:
return ToolResult(tool_use_id=tool_use.id, content="result", is_error=False)
node = EventLoopNode(
tool_executor=my_executor,
config=LoopConfig(max_iterations=5),
)
ctx = build_ctx(
runtime, spec, memory, llm, tools=[Tool(name="search", description="", parameters={})]
)
async def unblock():
await asyncio.sleep(0.05)
await node.inject_event("user input")
task = asyncio.create_task(unblock())
result = await node.execute(ctx)
await task
assert result.success is True
assert llm._call_index >= 2
@pytest.mark.asyncio
async def test_ask_user_not_available_non_client_facing(self, runtime, memory):
"""ask_user tool should NOT be injected for non-client-facing nodes."""
spec = NodeSpec(
id="internal",
name="Internal",
description="internal node",
node_type="event_loop",
output_keys=[],
)
llm = MockStreamingLLM(scenarios=[text_scenario("thinking...")])
node = EventLoopNode(config=LoopConfig(max_iterations=2))
ctx = build_ctx(runtime, spec, memory, llm)
await node.execute(ctx)
# Verify ask_user was NOT in the tools passed to the LLM
assert llm._call_index >= 1
for call in llm.stream_calls:
tool_names = [t.name for t in (call["tools"] or [])]
assert "ask_user" not in tool_names
# ===========================================================================
# Tool execution
+12
@@ -37,6 +37,10 @@ class TestRuntimeBasics:
runtime.end_run(success=True)
assert runtime.current_run is None
@pytest.mark.skip(
reason="FileStorage.save_run() is deprecated and now a no-op. "
"New sessions use unified storage at sessions/{session_id}/state.json"
)
def test_run_saved_on_end(self, tmp_path: Path):
"""Run is saved to storage when ended."""
runtime = Runtime(tmp_path)
@@ -341,6 +345,10 @@ class TestConvenienceMethods:
class TestNarrativeGeneration:
"""Test automatic narrative generation."""
@pytest.mark.skip(
reason="FileStorage.save_run() and get_runs_by_goal() are deprecated. "
"New sessions use unified storage at sessions/{session_id}/state.json"
)
def test_default_narrative_success(self, tmp_path: Path):
"""Test default narrative for successful run."""
runtime = Runtime(tmp_path)
@@ -360,6 +368,10 @@ class TestNarrativeGeneration:
run = runtime.storage.load_run(runtime.storage.get_runs_by_goal("test_goal")[0])
assert "completed successfully" in run.narrative
@pytest.mark.skip(
reason="FileStorage.save_run() and get_runs_by_goal() are deprecated. "
"New sessions use unified storage at sessions/{session_id}/state.json"
)
def test_default_narrative_failure(self, tmp_path: Path):
"""Test default narrative for failed run."""
runtime = Runtime(tmp_path)
File diff suppressed because it is too large.
+16 -1
@@ -1,4 +1,9 @@
"""Tests for the storage module - FileStorage and ConcurrentStorage backends."""
"""Tests for the storage module - FileStorage and ConcurrentStorage backends.
DEPRECATED: FileStorage and ConcurrentStorage are deprecated.
New sessions use unified storage at sessions/{session_id}/state.json.
These tests are kept for backward compatibility verification only.
"""
import json
import time
@@ -38,6 +43,7 @@ def create_test_run(
# === FILESTORAGE TESTS ===
@pytest.mark.skip(reason="FileStorage is deprecated - use unified session storage")
class TestFileStorageBasics:
"""Test basic FileStorage operations."""
@@ -57,6 +63,7 @@ class TestFileStorageBasics:
assert storage.base_path == tmp_path
@pytest.mark.skip(reason="FileStorage is deprecated - use unified session storage")
class TestFileStorageRunOperations:
"""Test FileStorage run CRUD operations."""
@@ -155,6 +162,7 @@ class TestFileStorageRunOperations:
assert result is False
@pytest.mark.skip(reason="FileStorage is deprecated - use unified session storage")
class TestFileStorageIndexing:
"""Test FileStorage index operations."""
@@ -259,6 +267,7 @@ class TestFileStorageIndexing:
assert storage.get_runs_by_node("nonexistent") == []
@pytest.mark.skip(reason="FileStorage is deprecated - use unified session storage")
class TestFileStorageListOperations:
"""Test FileStorage list operations."""
@@ -323,6 +332,7 @@ class TestCacheEntry:
# === CONCURRENTSTORAGE TESTS ===
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
class TestConcurrentStorageBasics:
"""Test basic ConcurrentStorage operations."""
@@ -367,6 +377,7 @@ class TestConcurrentStorageBasics:
assert storage._running is False
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
class TestConcurrentStorageRunOperations:
"""Test ConcurrentStorage run operations."""
@@ -471,6 +482,7 @@ class TestConcurrentStorageRunOperations:
await storage.stop()
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
class TestConcurrentStorageQueryOperations:
"""Test ConcurrentStorage query operations."""
@@ -526,6 +538,7 @@ class TestConcurrentStorageQueryOperations:
await storage.stop()
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
class TestConcurrentStorageCacheManagement:
"""Test ConcurrentStorage cache management."""
@@ -565,6 +578,7 @@ class TestConcurrentStorageCacheManagement:
assert stats["valid_entries"] == 1
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
class TestConcurrentStorageSyncAPI:
"""Test ConcurrentStorage synchronous API for backward compatibility."""
@@ -598,6 +612,7 @@ class TestConcurrentStorageSyncAPI:
assert loaded is None
@pytest.mark.skip(reason="ConcurrentStorage is deprecated - wraps deprecated FileStorage")
class TestConcurrentStorageStats:
"""Test ConcurrentStorage statistics."""
+9 -9
@@ -13,26 +13,26 @@ The Aden server handles OAuth2 authorization code flows (user login, consent, to
```
┌─────────────────────────────────────────────────────────────────┐
│ Local Agent Environment
│ Local Agent Environment │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ CredentialStore │ │
│ │ CredentialStore │ │
│ │ ┌────────────────────┐ ┌────────────────────────────┐ │ │
│ │ │EncryptedFileStorage│ │ AdenSyncProvider │ │ │
│ │ │ (local cache) │ │ - Fetches from Aden │ │ │
│ │ │ ~/.hive/creds │ │ - Delegates refresh │ │ │
│ │ │ ~/.hive/credentials│ │ - Delegates refresh │ │ │
│ │ └────────────────────┘ │ - Reports usage │ │ │
│ │ └─────────────┬──────────────┘ │ │
│ └────────────────────────────────────────┼─────────────────┘ │
│ │
└───────────────────────────────────────────┼─────────────────────
│ │ │
└───────────────────────────────────────────┼─────────────────────┘
│ HTTPS
┌─────────────────────────────────────────────────────────────────┐
│ Aden Server
│ Aden Server │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Integration Management │ │
│ │ Integration Management │ │
│ │ - HubSpot, GitHub, Slack, etc. │ │
│ │ - Handles OAuth2 auth code flow │ │
│ │ - Stores refresh tokens securely │ │
+172
@@ -0,0 +1,172 @@
# Agent Runtime
Unified execution system for all Hive agents. Every agent — single-entry or multi-entry, headless or TUI — runs through the same runtime stack.
## Topology
```
AgentRunner.load(agent_path)
|
AgentRunner
(factory + public API)
|
_setup_agent_runtime()
|
AgentRuntime
(lifecycle + orchestration)
/ | \
Stream A Stream B Stream C ← one per entry point
| | |
GraphExecutor GraphExecutor GraphExecutor
| | |
Node → Node → Node (graph traversal)
```
Single-entry agents get a `"default"` entry point automatically. There is no separate code path.
## Components
| Component | File | Role |
| --- | --- | --- |
| `AgentRunner` | `runner/runner.py` | Load agents, configure tools/LLM, expose high-level API |
| `AgentRuntime` | `runtime/agent_runtime.py` | Lifecycle management, entry point routing, event bus |
| `ExecutionStream` | `runtime/execution_stream.py` | Per-entry-point execution queue, session persistence |
| `GraphExecutor` | `graph/executor.py` | Node traversal, tool dispatch, checkpointing |
| `EventBus` | `runtime/event_bus.py` | Pub/sub for execution events (streaming, I/O) |
| `SharedStateManager` | `runtime/shared_state.py` | Cross-stream state with isolation levels |
| `OutcomeAggregator` | `runtime/outcome_aggregator.py` | Goal progress tracking across streams |
| `SessionStore` | `storage/session_store.py` | Session state persistence (`sessions/{id}/state.json`) |
## Programming Interface
### AgentRunner (high-level)
```python
from framework.runner import AgentRunner
# Load and run
runner = AgentRunner.load("exports/my_agent", model="anthropic/claude-sonnet-4-20250514")
result = await runner.run({"query": "hello"})
# Resume from paused session
result = await runner.run({"query": "continue"}, session_state=saved_state)
# Lifecycle
await runner.start() # Start the runtime
await runner.stop() # Stop the runtime
exec_id = await runner.trigger("default", {}) # Non-blocking trigger
progress = await runner.get_goal_progress() # Goal evaluation
entry_points = runner.get_entry_points() # List entry points
# Context manager
async with AgentRunner.load("exports/my_agent") as runner:
result = await runner.run({"query": "hello"})
# Cleanup
runner.cleanup() # Synchronous
await runner.cleanup_async() # Asynchronous
```
### AgentRuntime (lower-level)
```python
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
# Create runtime with entry points
runtime = create_agent_runtime(
graph=graph,
goal=goal,
storage_path=Path("~/.hive/agents/my_agent"),
entry_points=[
EntryPointSpec(id="default", name="Default", entry_node="start", trigger_type="manual"),
],
llm=llm,
tools=tools,
tool_executor=tool_executor,
checkpoint_config=checkpoint_config,
)
# Lifecycle
await runtime.start()
await runtime.stop()
# Execution
exec_id = await runtime.trigger("default", {"query": "hello"}) # Non-blocking
result = await runtime.trigger_and_wait("default", {"query": "hello"}) # Blocking
result = await runtime.trigger_and_wait("default", {}, session_state=state) # Resume
# Client-facing node I/O
await runtime.inject_input(node_id="chat", content="user response")
# Events
sub_id = runtime.subscribe_to_events(
event_types=[EventType.CLIENT_OUTPUT_DELTA],
handler=my_handler,
)
runtime.unsubscribe_from_events(sub_id)
# Inspection
runtime.is_running # bool
runtime.event_bus # EventBus
runtime.state_manager # SharedStateManager
runtime.get_stats() # Runtime statistics
```
## Execution Flow
1. `AgentRunner.run()` calls `AgentRuntime.trigger_and_wait()`
2. `AgentRuntime` routes to the `ExecutionStream` for the entry point
3. `ExecutionStream` creates a `GraphExecutor` and calls `execute()`
4. `GraphExecutor` traverses nodes, dispatches tools, manages checkpoints
5. `ExecutionResult` flows back up through the stack
6. `ExecutionStream` writes session state to disk
## Session Resume
All execution paths support session resume:
```python
# First run (agent pauses at a client-facing node)
result = await runner.run({"query": "start task"})
# result.paused_at = "review-node"
# result.session_state = {"memory": {...}, "paused_at": "review-node", ...}
# Resume
result = await runner.run({"input": "approved"}, session_state=result.session_state)
```
Session state flows: `AgentRunner.run()` → `AgentRuntime.trigger_and_wait()` → `ExecutionStream.execute()` → `GraphExecutor.execute()`.
Checkpoints are saved at node boundaries (`sessions/{id}/checkpoints/`) for crash recovery.
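Because the same state dict is what gets written to `sessions/{id}/state.json`, a caller can persist it across processes as well. A minimal sketch, assuming `result.session_state` is directly JSON-serializable (file name is illustrative):
```python
import json

# Save the paused session before the process exits
with open("paused_session.json", "w") as f:
    json.dump(result.session_state, f)

# Later, in a fresh process: reload and resume where the agent paused
with open("paused_session.json") as f:
    saved_state = json.load(f)
result = await runner.run({"input": "approved"}, session_state=saved_state)
```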
## Event Bus
The `EventBus` provides real-time execution visibility:
| Event | When |
| --- | --- |
| `NODE_STARTED` | Node begins execution |
| `NODE_COMPLETED` | Node finishes |
| `TOOL_CALL_STARTED` | Tool invocation begins |
| `TOOL_CALL_COMPLETED` | Tool invocation finishes |
| `CLIENT_OUTPUT_DELTA` | Agent streams text to user |
| `CLIENT_INPUT_REQUESTED` | Agent needs user input |
| `EXECUTION_COMPLETED` | Full execution finishes |
In headless mode, `AgentRunner` subscribes to `CLIENT_OUTPUT_DELTA` and `CLIENT_INPUT_REQUESTED` to print output and read stdin. In TUI mode, `AdenTUI` subscribes to route events to UI widgets.
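A sketch of that headless wiring, reusing `subscribe_to_events` and `inject_input` from the runtime API above (the handler bodies and the `event.data` payload fields are assumptions, not documented API):
```python
import sys

def print_delta(event):
    # Stream agent text to the terminal as it arrives
    sys.stdout.write(event.data.get("delta", ""))  # payload field assumed
    sys.stdout.flush()

async def forward_stdin(event):
    # The agent is blocked on a client-facing node: read a line and inject it
    user_text = input("> ")
    await runtime.inject_input(node_id=event.data["node_id"], content=user_text)  # payload field assumed

out_sub = runtime.subscribe_to_events(
    event_types=[EventType.CLIENT_OUTPUT_DELTA], handler=print_delta
)
in_sub = runtime.subscribe_to_events(
    event_types=[EventType.CLIENT_INPUT_REQUESTED], handler=forward_stdin
)
```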
## Storage Layout
```
~/.hive/agents/{agent_name}/
sessions/
session_YYYYMMDD_HHMMSS_{uuid}/
state.json # Session state (status, memory, progress)
checkpoints/ # Node-boundary snapshots
logs/
summary.json # Execution summary
details.jsonl # Detailed event log
tool_logs.jsonl # Tool call log
runtime_logs/ # Cross-session runtime logs
```
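For example, the most recent session's persisted state can be inspected directly; a sketch assuming the layout above (the `status` and `paused_at` fields follow the session-state examples earlier in this document):
```python
import json
from pathlib import Path

agent_dir = Path.home() / ".hive" / "agents" / "my_agent"
# session_YYYYMMDD_HHMMSS_{uuid} names sort chronologically
latest = sorted((agent_dir / "sessions").iterdir())[-1]
state = json.loads((latest / "state.json").read_text())
print(state.get("status"), state.get("paused_at"))
```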
+35 -14
@@ -5,12 +5,31 @@ Aden Hive is a Python-based agent framework. Configuration is handled through en
## Configuration Overview
```
Environment variables (API keys, runtime flags)
Agent config.py (per-agent settings: model, tools, storage)
pyproject.toml (package metadata and dependencies)
.mcp.json (MCP server connections)
~/.hive/configuration.json (global defaults: provider, model, max_tokens)
Environment variables (API keys, runtime flags)
Agent config.py (per-agent settings: model, tools, storage)
pyproject.toml (package metadata and dependencies)
.mcp.json (MCP server connections)
```
## Global Configuration (~/.hive/configuration.json)
The `quickstart.sh` script creates this file during setup. It stores the default LLM provider, model, and max_tokens used by all agents unless overridden in an agent's own `config.py`.
```json
{
"llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 8192,
"api_key_env_var": "ANTHROPIC_API_KEY"
},
"created_at": "2026-01-15T12:00:00+00:00"
}
```
The default `max_tokens` value (8192) is defined as `DEFAULT_MAX_TOKENS` in `framework.graph.edge` and re-exported from `framework.graph`. Each agent's `RuntimeConfig` reads from this file at startup. To change defaults, either re-run `quickstart.sh` or edit the file directly.
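To make that precedence concrete, a hypothetical resolver (illustrative only, not framework API) could read:
```python
import json
from pathlib import Path

DEFAULT_MAX_TOKENS = 8192  # mirrors framework.graph.DEFAULT_MAX_TOKENS described above

def resolve_llm_defaults(agent_config: dict) -> dict:
    """Agent config.py values win, then ~/.hive/configuration.json, then built-ins."""
    global_path = Path.home() / ".hive" / "configuration.json"
    global_llm = {}
    if global_path.exists():
        global_llm = json.loads(global_path.read_text()).get("llm", {})
    return {
        "model": agent_config.get("model", global_llm.get("model")),
        "max_tokens": agent_config.get(
            "max_tokens", global_llm.get("max_tokens", DEFAULT_MAX_TOKENS)
        ),
    }
```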
## Environment Variables
### LLM Providers (at least one required for real execution)
@@ -61,14 +80,16 @@ Each agent package in `exports/` contains its own `config.py`:
```python
# exports/my_agent/config.py
CONFIG = {
"model": "claude-haiku-4-5-20251001", # Default LLM model
"max_tokens": 4096,
"model": "anthropic/claude-sonnet-4-5-20250929", # Default LLM model
"max_tokens": 8192, # default: DEFAULT_MAX_TOKENS from framework.graph
"temperature": 0.7,
"tools": ["web_search", "pdf_read"], # MCP tools to enable
"storage_path": "/tmp/my_agent", # Runtime data location
}
```
If `model` or `max_tokens` are omitted, the agent loads defaults from `~/.hive/configuration.json`.
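A minimal `config.py` can therefore omit both keys entirely (values below are illustrative):
```python
# exports/my_agent/config.py: model and max_tokens are omitted on purpose;
# both fall back to ~/.hive/configuration.json at startup
CONFIG = {
    "temperature": 0.7,
    "tools": ["web_search"],
}
```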
### Agent Graph Specification
Agent behavior is defined in `agent.json` (or constructed in `agent.py`):
@@ -96,14 +117,14 @@ MCP (Model Context Protocol) servers are configured in `.mcp.json` at the projec
{
"mcpServers": {
"agent-builder": {
"command": "core/.venv/bin/python",
"args": ["-m", "framework.mcp.agent_builder_server"],
"cwd": "."
"command": "uv",
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
"cwd": "core"
},
"tools": {
"command": "tools/.venv/bin/python",
"args": ["-m", "aden_tools.mcp_server", "--stdio"],
"cwd": "."
"command": "uv",
"args": ["run", "mcp_server.py", "--stdio"],
"cwd": "tools"
}
}
}
@@ -152,7 +173,7 @@ Add to `.vscode/settings.json`:
1. **Never commit API keys** - Use environment variables or `.env` files
2. **`.env` is git-ignored** - Copy `.env.example` to `.env` at the project root and fill in your values
3. **Mock mode for testing** - Set `MOCK_MODE=1` to avoid LLM calls during development
3. **Use real provider keys in non-production environments** - validate configuration with low-risk inputs before production rollout
4. **Credential isolation** - Each tool validates its own credentials at runtime
## Troubleshooting
@@ -187,4 +208,4 @@ Run from the project root with PYTHONPATH:
PYTHONPATH=exports uv run python -m my_agent validate
```
See [Environment Setup](../ENVIRONMENT_SETUP.md) for detailed installation instructions.
See [Environment Setup](./environment-setup.md) for detailed installation instructions.
+48 -60
@@ -20,12 +20,12 @@ This guide covers everything you need to know to develop with the Aden Agent Fra
Aden Agent Framework is a Python-based system for building goal-driven, self-improving AI agents.
| Package | Directory | Description | Tech Stack |
| ------------- | ---------- | --------------------------------------- | ------------ |
| **framework** | `/core` | Core runtime, graph executor, protocols | Python 3.11+ |
| **tools** | `/tools` | MCP tools for agent capabilities | Python 3.11+ |
| Package | Directory | Description | Tech Stack |
| ------------- | ---------- | ----------------------------------------- | ------------ |
| **framework** | `/core` | Core runtime, graph executor, protocols | Python 3.11+ |
| **tools** | `/tools` | MCP tools for agent capabilities | Python 3.11+ |
| **exports** | `/exports` | Agent packages (user-created, gitignored) | Python 3.11+ |
| **skills** | `.claude` | Claude Code skills for building/testing | Markdown |
| **skills** | `.claude` | Claude Code skills for building/testing | Markdown |
### Key Principles
@@ -101,11 +101,20 @@ Get API keys:
This installs agent-related Claude Code skills:
- `/building-agents-core` - Fundamental agent concepts
- `/building-agents-construction` - Step-by-step agent building
- `/building-agents-patterns` - Best practices and design patterns
- `/testing-agent` - Test and validate agents
- `/agent-workflow` - End-to-end guided workflow
- `/hive` - Complete workflow for building agents
- `/hive-create` - Step-by-step agent building
- `/hive-concepts` - Fundamental agent concepts
- `/hive-patterns` - Best practices and design patterns
- `/hive-test` - Test and validate agents
### Cursor IDE Support
Skills are also available in Cursor. To enable:
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
2. Run `MCP: Enable` to enable MCP servers
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`)
### Verify Setup
@@ -115,7 +124,7 @@ uv run python -c "import framework; print('✓ framework OK')"
uv run python -c "import aden_tools; print('✓ aden_tools OK')"
uv run python -c "import litellm; print('✓ litellm OK')"
# Run an agent (after building one via /building-agents-construction)
# Run an agent (after building one via /hive-create)
PYTHONPATH=exports uv run python -m your_agent_name validate
```
@@ -140,21 +149,11 @@ hive/ # Repository root
├── .claude/ # Claude Code Skills
│ └── skills/ # Skills for building
│ ├── building-agents-core/
| | ├── SKILL.md # Main skill definition
| └── examples
│ ├── building-agents-patterns/
| | ├── SKILL.md
│ | └── examples
│ ├── building-agents-construction/
| | ├── SKILL.md
│ | └── examples
│ ├── testing-agent/ # Skills for testing agents
│ │ ├── SKILL.md
│ | └── examples
│ └── agent-workflow/ # Complete workflow
| ├── SKILL.md
│ └── examples
│ ├── hive/ # Complete workflow
│ ├── hive-create/ # Step-by-step build guide
│ ├── hive-concepts/ # Fundamental concepts
│ ├── hive-patterns/ # Best practices
│ └── hive-test/ # Test and validate agents
├── core/ # CORE FRAMEWORK PACKAGE
│ ├── framework/ # Main package code
@@ -164,10 +163,12 @@ hive/ # Repository root
│ │ ├── llm/ # LLM provider integrations (Anthropic, OpenAI, etc.)
│ │ ├── mcp/ # MCP server integration
│ │ ├── runner/ # AgentRunner - loads and runs agents
│ │ ├── observability/ # Structured logging - human-readable and machine-parseable tracing
│ │ ├── runtime/ # Runtime environment
│ │ ├── schemas/ # Data schemas
│ │ ├── storage/ # File-based persistence
│ │ ├── testing/ # Testing utilities
│ │ ├── tui/ # Terminal UI dashboard
│ │ └── __init__.py
│ ├── pyproject.toml # Package metadata and dependencies
│ ├── README.md # Framework documentation
@@ -188,7 +189,10 @@ hive/ # Repository root
│ └── README.md # Tools documentation
├── exports/ # AGENT PACKAGES (user-created, gitignored)
│ └── your_agent_name/ # Created via /building-agents-construction
│ └── your_agent_name/ # Created via /hive-create
├── examples/ # Example agents
│ └── templates/ # Pre-built template agents
├── docs/ # Documentation
│ ├── getting-started.md # Quick start guide
@@ -202,14 +206,11 @@ hive/ # Repository root
│ └── auto-close-duplicates.ts # GitHub duplicate issue closer
├── quickstart.sh # Interactive setup wizard
├── ENVIRONMENT_SETUP.md # Complete Python setup guide
├── README.md # Project overview
├── DEVELOPER.md # This file
├── CONTRIBUTING.md # Contribution guidelines
├── CHANGELOG.md # Version history
├── ROADMAP.md # Product roadmap
├── LICENSE # Apache 2.0 License
├── CODE_OF_CONDUCT.md # Community guidelines
├── docs/CODE_OF_CONDUCT.md # Community guidelines
└── SECURITY.md # Security policy
```
@@ -226,10 +227,10 @@ The fastest way to build agents is using the Claude Code skills:
./quickstart.sh
# Build a new agent
claude> /building-agents-construction
claude> /hive
# Test the agent
claude> /testing-agent
claude> /hive-test
```
### Agent Development Workflow
@@ -237,7 +238,7 @@ claude> /testing-agent
1. **Define Your Goal**
```
claude> /building-agents-construction
claude> /hive
Enter goal: "Build an agent that processes customer support tickets"
```
@@ -260,7 +261,7 @@ claude> /testing-agent
5. **Test the Agent**
```
claude> /testing-agent
claude> /hive-test
```
### Manual Agent Development
@@ -300,22 +301,19 @@ If you prefer to build agents manually:
### Running Agents
```bash
# Validate agent structure
PYTHONPATH=exports uv run python -m agent_name validate
# Browse and run agents interactively (Recommended)
hive tui
# Show agent information
PYTHONPATH=exports uv run python -m agent_name info
# Run a specific agent
hive run exports/my_agent --input '{"ticket_content": "My login is broken", "customer_id": "CUST-123"}'
# Run agent with input
PYTHONPATH=exports uv run python -m agent_name run --input '{
"ticket_content": "My login is broken",
"customer_id": "CUST-123"
}'
# Run with TUI dashboard
hive run exports/my_agent --tui
# Run in mock mode (no LLM calls)
PYTHONPATH=exports uv run python -m agent_name run --mock --input '{...}'
```
> **Using Python directly:** `PYTHONPATH=exports uv run python -m agent_name run --input '{...}'`
---
## Testing Agents
@@ -324,7 +322,7 @@ PYTHONPATH=exports uv run python -m agent_name run --mock --input '{...}'
```bash
# Run tests for an agent
claude> /testing-agent
claude> /hive-test
```
This generates and runs:
@@ -542,7 +540,7 @@ uv add <package>
```bash
# Option 1: Use Claude Code skill (recommended)
claude> /building-agents-construction
claude> /hive
# Option 2: Create manually
# Note: exports/ is initially empty (gitignored). Create your agent directory:
@@ -628,16 +626,10 @@ echo 'ANTHROPIC_API_KEY=your-key-here' >> .env
### Debugging Agent Execution
```python
# Add debug logging to your agent
import logging
logging.basicConfig(level=logging.DEBUG)
```bash
# Run with verbose output
PYTHONPATH=exports uv run python -m agent_name run --input '{...}' --verbose
hive run exports/my_agent --verbose --input '{"task": "..."}'
# Use mock mode to test without LLM calls
PYTHONPATH=exports uv run python -m agent_name run --mock --input '{...}'
```
---
@@ -657,8 +649,6 @@ kill -9 <PID>
# Or change ports in config.yaml and regenerate
```
### Environment Variables Not Loading
```bash
@@ -672,8 +662,6 @@ echo $ANTHROPIC_API_KEY
# Then add your API keys
```
---
## Getting Help
@@ -9,8 +9,8 @@ Complete setup guide for building and running goal-driven agents with the Aden A
./quickstart.sh
```
> **Note for Windows Users:**
> Running the setup script on native Windows shells (PowerShell / Git Bash) may sometimes fail due to Python App Execution Aliases.
> It is **strongly recommended to use WSL (Windows Subsystem for Linux)** for a smoother setup experience.
This will:
@@ -18,6 +18,8 @@ This will:
- Check Python version (requires 3.11+)
- Install the core framework package (`framework`)
- Install the tools package (`aden_tools`)
- Initialize encrypted credential store (`~/.hive/credentials`)
- Configure default LLM provider
- Fix package compatibility issues (openai + litellm)
- Verify all installations
@@ -39,17 +41,22 @@ Windows users should use **WSL (Windows Subsystem for Linux)** to set up and run
If you are using Alpine Linux (e.g., inside a Docker container), you must install system dependencies and use a virtual environment before running the setup script:
1. Install System Dependencies:
```bash
apk update
apk add bash git python3 py3-pip nodejs npm curl build-base python3-dev linux-headers libffi-dev
```
2. Set up Virtual Environment (Required for Python 3.12+):
```
uv venv
source .venv/bin/activate
# uv handles pip/setuptools/wheel automatically
```
3. Run the Quickstart Script:
```
./quickstart.sh
```
@@ -87,7 +94,7 @@ uv run python -c "import aden_tools; print('✓ aden_tools OK')"
uv run python -c "import litellm; print('✓ litellm OK')"
```
> **Windows Tip:**
> On Windows, if the verification commands fail, ensure you are running them in **WSL** or after **disabling Python App Execution Aliases** in Windows Settings → Apps → App Execution Aliases.
## Requirements
@@ -105,23 +112,38 @@ uv run python -c "import litellm; print('✓ litellm OK')"
- Internet connection (for LLM API calls)
- For Windows users: WSL 2 is recommended for full compatibility.
### API Keys (Optional)
### API Keys
For running agents with real LLMs:
```bash
export ANTHROPIC_API_KEY="your-key-here"
```
Windows (PowerShell):
```powershell
$env:ANTHROPIC_API_KEY="your-key-here"
```
We recommend using `quickstart.sh` to set up LLM API credentials and `/hive-credentials` for tool credentials.
## Running Agents
All agent commands must be run from the project root with `PYTHONPATH` set:
The `hive` CLI is the primary interface for running agents:
```bash
# Browse and run agents interactively (Recommended)
hive tui
# Run a specific agent
hive run exports/my_agent --input '{"task": "Your input here"}'
# Run with TUI dashboard
hive run exports/my_agent --tui
```
### CLI Command Reference
| Command | Description |
|---------|-------------|
| `hive tui` | Browse agents and launch TUI dashboard |
| `hive run <path>` | Execute an agent (`--tui`, `--model`, `--mock`, `--quiet`, `--verbose`) |
| `hive shell [path]` | Interactive REPL (`--multi`, `--no-approve`) |
| `hive info <path>` | Show agent details |
| `hive validate <path>` | Validate agent structure |
| `hive list [dir]` | List available agents |
| `hive dispatch [dir]` | Multi-agent orchestration |
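For example, combining the flags from the table above (agent path and input are placeholders):
```bash
# Run with an explicit model and verbose logging
hive run exports/my_agent --model anthropic/claude-sonnet-4-5-20250929 --verbose --input '{"task": "triage inbox"}'

# Open a multi-line interactive shell against the same agent
hive shell exports/my_agent --multi
```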
### Using Python directly (alternative)
```bash
# From /hive/ directory
@@ -135,24 +157,6 @@ $env:PYTHONPATH="core;exports"
python -m agent_name COMMAND
```
### Example: Support Ticket Agent
```bash
# Validate agent structure
PYTHONPATH=exports uv run python -m your_agent_name validate
# Show agent information
PYTHONPATH=exports uv run python -m your_agent_name info
# Run agent with input
PYTHONPATH=exports uv run python -m your_agent_name run --input '{
"task": "Your input here"
}'
# Run in mock mode (no LLM calls)
PYTHONPATH=exports uv run python -m your_agent_name run --mock --input '{...}'
```
## Building and Running New Agents
Build and run an agent using Claude Code CLI with the agent building skills:
@@ -165,16 +169,25 @@ Build and run an agent using Claude Code CLI with the agent building skills:
This verifies agent-related Claude Code skills are available:
- `/building-agents-construction` - Step-by-step build guide
- `/building-agents-core` - Fundamental concepts
- `/building-agents-patterns` - Best practices
- `/testing-agent` - Test and validate agents
- `/agent-workflow` - Complete workflow
- `/hive` - Complete workflow for building agents
- `/hive-create` - Step-by-step build guide
- `/hive-concepts` - Fundamental concepts
- `/hive-patterns` - Best practices
- `/hive-test` - Test and validate agents
### Cursor IDE Support
Skills are also available in Cursor. To enable:
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
2. Run `MCP: Enable` to enable MCP servers
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`)
### 2. Build an Agent
```
claude> /building-agents-construction
claude> /hive
```
Follow the prompts to:
@@ -189,7 +202,7 @@ This step creates the initial agent structure required for further development.
### 3. Define Agent Logic
```
claude> /building-agents-core
claude> /hive-concepts
```
Follow the prompts to:
@@ -204,7 +217,7 @@ This step establishes the core concepts and rules needed before building an agen
### 4. Apply Agent Patterns
```
claude> /building-agents-patterns
claude> /hive-patterns
```
Follow the prompts to:
@@ -219,8 +232,9 @@ This step helps optimize agent design before final testing.
### 5. Test Your Agent
```
claude> /testing-agent
claude> /hive-test
```
Follow the prompts to:
1. Generate test guidelines for constraints and success criteria
@@ -230,21 +244,6 @@ Follow the prompts to:
This step verifies that the agent meets its goals before production use.
### 6. Agent Development Workflow (End-to-End)
```
claude> /agent-workflow
```
Follow the guided flow to:
1. Understand core agent concepts (optional)
2. Build the agent structure step by step
3. Apply best-practice design patterns (optional)
4. Test and validate the agent against its goals
This workflow orchestrates all agent-building skills to take you from idea → production-ready agent.
## Troubleshooting
### "externally-managed-environment" error (PEP 668)
@@ -362,13 +361,18 @@ hive/
│ ├── .venv/ # Created by quickstart.sh
│ └── pyproject.toml
├── exports/ # Agent packages (user-created, gitignored)
│ └── your_agent_name/ # Created via /building-agents-construction
├── exports/ # Agent packages (user-created, gitignored)
│ └── your_agent_name/ # Created via /hive-create
├── examples/
│ └── templates/ # Pre-built template agents
```
## Separate Virtual Environments
The project uses **separate virtual environments** for `core` and `tools` packages to:
Hive primarily uses **uv** to create and manage separate virtual environments for `core` and `tools`.
The project uses separate virtual environments to:
- Isolate dependencies and avoid conflicts
- Allow independent development and testing of each package
@@ -376,11 +380,18 @@ The project uses **separate virtual environments** for `core` and `tools` packag
### How It Works
When you run `./quickstart.sh` or `uv sync` in each directory:
When you run `./quickstart.sh`, `uv` sets up:
1. **core/.venv/** - Contains the `framework` package and its dependencies (anthropic, litellm, mcp, etc.)
2. **tools/.venv/** - Contains the `aden_tools` package and its dependencies (beautifulsoup4, pandas, etc.)
If you need to refresh environments manually, use `uv`:
```bash
cd core && uv sync
cd ../tools && uv sync
```
### Cross-Package Imports
The `core` and `tools` packages are **intentionally independent**:
@@ -389,38 +400,34 @@ The `core` and `tools` packages are **intentionally independent**:
- **Communication via MCP**: Tools are exposed to agents through MCP servers, not direct Python imports
- **Runtime integration**: The agent runner loads tools via the MCP protocol at runtime
If you need to use both packages in a single script (e.g., for testing), you have two options:
If you need to use both packages in a single script (e.g., for testing), prefer `uv run` with `PYTHONPATH`:
```bash
# Option 1: Install both in a shared environment
uv venv
source .venv/bin/activate
uv pip install -e core/ -e tools/
# Option 2: Use PYTHONPATH (for quick testing)
PYTHONPATH=tools/src uv run python your_script.py
```
### MCP Server Configuration
The `.mcp.json` at project root configures MCP servers to use their respective virtual environments:
The `.mcp.json` at project root configures MCP servers to run through `uv run` in each package directory:
```json
{
"mcpServers": {
"agent-builder": {
"command": "core/.venv/bin/python",
"args": ["-m", "framework.mcp.agent_builder_server"]
"command": "uv",
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
"cwd": "core"
},
"tools": {
"command": "tools/.venv/bin/python",
"args": ["-m", "aden_tools.mcp_server", "--stdio"]
"command": "uv",
"args": ["run", "mcp_server.py", "--stdio"],
"cwd": "tools"
}
}
}
```
This ensures each MCP server runs with its correct dependencies.
This ensures each MCP server runs with the correct project environment managed by `uv`.
### Why PYTHONPATH is Required
@@ -446,7 +453,7 @@ This design allows agents in `exports/` to be:
### 2. Build Agent (Claude Code)
```
claude> /building-agents-construction
claude> /hive
Enter goal: "Build an agent that processes customer support tickets"
```
@@ -459,13 +466,17 @@ PYTHONPATH=exports uv run python -m your_agent_name validate
### 4. Test Agent
```
claude> /testing-agent
claude> /hive-test
```
### 5. Run Agent
```bash
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
# Interactive dashboard
hive tui
# Or run directly
hive run exports/your_agent_name --input '{"task": "..."}'
```
## IDE Setup
@@ -513,11 +524,11 @@ export AGENT_STORAGE_PATH="/custom/storage"
## Additional Resources
- **Framework Documentation:** [core/README.md](core/README.md)
- **Tools Documentation:** [tools/README.md](tools/README.md)
- **Example Agents:** [exports/](exports/)
- **Agent Building Guide:** [.claude/skills/building-agents-construction/SKILL.md](.claude/skills/building-agents-construction/SKILL.md)
- **Testing Guide:** [.claude/skills/testing-agent/SKILL.md](.claude/skills/testing-agent/SKILL.md)
- **Framework Documentation:** [core/README.md](../core/README.md)
- **Tools Documentation:** [tools/README.md](../tools/README.md)
- **Example Agents:** [exports/](../exports/)
- **Agent Building Guide:** [.claude/skills/hive-create/SKILL.md](../.claude/skills/hive-create/SKILL.md)
- **Testing Guide:** [.claude/skills/hive-test/SKILL.md](../.claude/skills/hive-test/SKILL.md)
## Contributing
@@ -526,7 +537,7 @@ When contributing agent packages:
1. Place agents in `exports/agent_name/`
2. Follow the standard agent structure (see existing agents)
3. Include README.md with usage instructions
4. Add tests if using `/testing-agent`
4. Add tests if using `/hive-test`
5. Document required environment variables
## Support
+34 -33
@@ -33,10 +33,11 @@ uv run python -c "import framework; import aden_tools; print('✓ Setup complete
# Setup already done via quickstart.sh above
# Start Claude Code and build an agent
claude> /building-agents-construction
claude> /hive
```
Follow the interactive prompts to:
1. Define your agent's goal
2. Design the workflow (nodes and edges)
3. Generate the agent package
@@ -52,7 +53,7 @@ mkdir -p exports/my_agent
# Create your agent structure
cd exports/my_agent
# Create agent.json, tools.py, README.md (see DEVELOPER.md for structure)
# Create agent.json, tools.py, README.md (see developer-guide.md for structure)
# Validate the agent
PYTHONPATH=exports uv run python -m my_agent validate
@@ -87,27 +88,31 @@ hive/
│ │ ├── runtime/ # Runtime environment
│ │ ├── schemas/ # Data schemas
│ │ ├── storage/ # File-based persistence
│ │ ├── testing/ # Testing utilities
│ │ └── tui/ # Terminal UI dashboard
│ └── pyproject.toml # Package metadata
├── tools/ # MCP Tools Package
│ ├── mcp_server.py # MCP server entry point
│ └── src/aden_tools/ # Tools for agent capabilities
── tools/ # Individual tool implementations
├── web_search_tool/
├── web_scrape_tool/
└── file_system_toolkits/
│ └── mcp_server.py # HTTP MCP server
├── exports/ # Agent Packages (user-generated, not in repo)
│ └── your_agent/ # Your agents created via /building-agents
│ └── your_agent/ # Your agents created via /hive
├── examples/
│ └── templates/ # Pre-built template agents
├── .claude/ # Claude Code Skills
│ └── skills/
│ ├── agent-workflow/
│ ├── building-agents-construction/
│ ├── building-agents-core/
│ ├── building-agents-patterns/
│ └── testing-agent/
│ ├── hive/
│ ├── hive-create/
│ ├── hive-concepts/
│ ├── hive-patterns/
│ └── hive-test/
└── docs/ # Documentation
```
@@ -115,19 +120,15 @@ hive/
## Running an Agent
```bash
# Validate agent structure
PYTHONPATH=exports uv run python -m my_agent validate
# Browse and run agents interactively (Recommended)
hive tui
# Show agent information
PYTHONPATH=exports uv run python -m my_agent info
# Run a specific agent
hive run exports/my_agent --input '{"task": "Your input here"}'
# Run agent with input
PYTHONPATH=exports uv run python -m my_agent run --input '{
"task": "Your input here"
}'
# Run with TUI dashboard
hive run exports/my_agent --tui
# Run in mock mode (no LLM calls)
PYTHONPATH=exports uv run python -m my_agent run --mock --input '{...}'
```
## API Keys Setup
@@ -142,6 +143,7 @@ export BRAVE_SEARCH_API_KEY="your-key-here" # Optional, for web search
```
Get your API keys:
- **Anthropic**: [console.anthropic.com](https://console.anthropic.com/)
- **OpenAI**: [platform.openai.com](https://platform.openai.com/)
- **Brave Search**: [brave.com/search/api](https://brave.com/search/api/)
@@ -150,7 +152,7 @@ Get your API keys:
```bash
# Using Claude Code
claude> /testing-agent
claude> /hive-test
# Or manually
PYTHONPATH=exports uv run python -m my_agent test
@@ -162,11 +164,12 @@ PYTHONPATH=exports uv run python -m my_agent test --type success
## Next Steps
1. **Detailed Setup**: See [ENVIRONMENT_SETUP.md](../ENVIRONMENT_SETUP.md)
2. **Developer Guide**: See [DEVELOPER.md](../DEVELOPER.md)
3. **Build Agents**: Use `/building-agents` skill in Claude Code
4. **Custom Tools**: Learn to integrate MCP servers
5. **Join Community**: [Discord](https://discord.com/invite/MXE49hrKDk)
1. **TUI Dashboard**: Run `hive tui` to explore agents interactively
2. **Detailed Setup**: See [environment-setup.md](./environment-setup.md)
3. **Developer Guide**: See [developer-guide.md](./developer-guide.md)
4. **Build Agents**: Use `/hive` skill in Claude Code
5. **Custom Tools**: Learn to integrate MCP servers
6. **Join Community**: [Discord](https://discord.com/invite/MXE49hrKDk)
## Troubleshooting
@@ -192,8 +195,6 @@ uv pip install -e .
# Verify API key is set
echo $ANTHROPIC_API_KEY
# Run in mock mode to test without API
PYTHONPATH=exports uv run python -m my_agent run --mock --input '{...}'
```
### Package Installation Issues
@@ -209,4 +210,4 @@ pip uninstall -y framework tools
- **Documentation**: Check the `/docs` folder
- **Issues**: [github.com/adenhq/hive/issues](https://github.com/adenhq/hive/issues)
- **Discord**: [discord.com/invite/MXE49hrKDk](https://discord.com/invite/MXE49hrKDk)
- **Build Agents**: Use `/building-agents` skill to create agents
- **Build Agents**: Use `/hive` skill to create agents
+17 -19
@@ -78,6 +78,7 @@ cd hive
```
Esto instala:
- **framework** - Runtime del agente principal y ejecutor de grafos
- **aden_tools** - 19 herramientas MCP para capacidades de agentes
- Todas las dependencias requeridas
@@ -89,16 +90,16 @@ Esto instala:
./quickstart.sh
# Construir un agente usando Claude Code
claude> /building-agents-construction
claude> /hive
# Probar tu agente
claude> /testing-agent
claude> /hive-test
# Ejecutar tu agente
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
```
**[📖 Guía de Configuración Completa](ENVIRONMENT_SETUP.md)** - Instrucciones detalladas para desarrollo de agentes
**[📖 Guía de Configuración Completa](../environment-setup.md)** - Instrucciones detalladas para desarrollo de agentes
## Características
@@ -162,14 +163,14 @@ flowchart LR
### La Ventaja de Aden
| Frameworks Tradicionales | Aden |
|--------------------------|------|
| Codificar flujos de trabajo de agentes | Describir objetivos en lenguaje natural |
| Definición manual de grafos | Grafos de agentes auto-generados |
| Manejo reactivo de errores | Auto-evolución proactiva |
| Configuraciones de herramientas estáticas | Nodos dinámicos envueltos en SDK |
| Configuración de monitoreo separada | Observabilidad en tiempo real integrada |
| Gestión de presupuesto DIY | Controles de costos y degradación integrados |
| Frameworks Tradicionales | Aden |
| ----------------------------------------- | -------------------------------------------- |
| Codificar flujos de trabajo de agentes | Describir objetivos en lenguaje natural |
| Definición manual de grafos | Grafos de agentes auto-generados |
| Manejo reactivo de errores | Auto-evolución proactiva |
| Configuraciones de herramientas estáticas | Nodos dinámicos envueltos en SDK |
| Configuración de monitoreo separada | Observabilidad en tiempo real integrada |
| Gestión de presupuesto DIY | Controles de costos y degradación integrados |
### Cómo Funciona
@@ -213,10 +214,7 @@ hive/
├── docs/ # Documentación y guías
├── scripts/ # Scripts de construcción y utilidades
├── .claude/ # Habilidades de Claude Code para construir agentes
├── ENVIRONMENT_SETUP.md # Guía de configuración de Python para desarrollo de agentes
├── DEVELOPER.md # Guía del desarrollador
├── CONTRIBUTING.md # Directrices de contribución
└── ROADMAP.md # Hoja de ruta del producto
```
## Desarrollo
@@ -235,20 +233,20 @@ Para construir y ejecutar agentes orientados a objetivos con el framework:
# - Todas las dependencias
# Construir nuevos agentes usando habilidades de Claude Code
claude> /building-agents-construction
claude> /hive
# Probar agentes
claude> /testing-agent
claude> /hive-test
# Ejecutar agentes
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
```
Consulta [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) para instrucciones de configuración completas.
Consulta [environment-setup.md](../environment-setup.md) para instrucciones de configuración completas.
## Documentación
- **[Guía del Desarrollador](DEVELOPER.md)** - Guía completa para desarrolladores
- **[Guía del Desarrollador](../developer-guide.md)** - Guía completa para desarrolladores
- [Primeros Pasos](docs/getting-started.md) - Instrucciones de configuración rápida
- [Guía de Configuración](docs/configuration.md) - Todas las opciones de configuración
- [Visión General de Arquitectura](docs/architecture/README.md) - Diseño y estructura del sistema
@@ -257,7 +255,7 @@ Consulta [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) para instrucciones de conf
El Framework de Agentes Aden tiene como objetivo ayudar a los desarrolladores a construir agentes auto-adaptativos orientados a resultados. Encuentra nuestra hoja de ruta aquí
[ROADMAP.md](ROADMAP.md)
[roadmap.md](../roadmap.md)
```mermaid
timeline
+20 -23
@@ -62,8 +62,8 @@ Aden एक ऐसा प्लेटफ़ॉर्म है जो AI एज
# त्वरित लिंक (Quick Links)
- **[डाक्यूमेंटेशन](https://docs.adenhq.com/)** - पूर्ण गाइड्स और API संदर्भ
- **[सेल्फ-होस्टिंग गाइड](https://docs.adenhq.com/getting-started/quickstart)** -
Hive को अपने इंफ़्रास्ट्रक्चर पर डिप्लॉय करें
- **[चेंजलॉग](https://github.com/adenhq/hive/releases)** - नवीनतम अपडेट और रिलीज़
<!-- - **[Hoja de Ruta](https://adenhq.com/roadmap)** - Funciones y planes próximos -->
- **[इशू रिपोर्ट करें](https://github.com/adenhq/hive/issues)** - बग रिपोर्ट और फ़ीचर अनुरोध
@@ -87,6 +87,7 @@ cd hive
```
यह इंस्टॉल करता है:
- **framework** - मुख्य एजेंट रनटाइम और ग्राफ़ एक्ज़ीक्यूटर
- **aden_tools** - एजेंट क्षमताओं के लिए 19 MCP टूल्स
- सभी आवश्यक डिपेंडेंसीज़
@@ -98,16 +99,16 @@ Claude Code की क्षमताएँ इंस्टॉल करें (
./quickstart.sh
# Claude Code का उपयोग करके एक एजेंट बनाएँ
claude> /building-agents-construction
claude> /hive
# अपने एजेंट का परीक्षण करें
claude> /testing-agent
claude> /hive-test
# अपने एजेंट को चलाएँ
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
```
**[📖 पूर्ण कॉन्फ़िगरेशन गाइड](ENVIRONMENT_SETUP.md)** - एजेंट विकास के लिए विस्तृत निर्देश
**[📖 पूर्ण कॉन्फ़िगरेशन गाइड](../environment-setup.md)** - एजेंट विकास के लिए विस्तृत निर्देश
## विशेषताएँ
@@ -171,14 +172,14 @@ flowchart LR
### Aden की बढ़त
| पारंपरिक फ़्रेमवर्क्स | Aden |
|--------------------------|------|
| एजेंट वर्कफ़्लो को हार्डकोड करना | प्राकृतिक भाषा में लक्ष्यों का वर्णन |
| ग्राफ़ की मैन्युअल परिभाषा | स्वतः-उत्पन्न एजेंट ग्राफ़ |
| त्रुटियों का प्रतिक्रियात्मक प्रबंधन | प्रॉएक्टिव स्वयं-विकास |
| स्थिर टूल कॉन्फ़िगरेशन | SDK-रैप्ड डायनेमिक नोड्स |
| अलग मॉनिटरिंग सेटअप | एकीकृत रीयल-टाइम ऑब्ज़र्वेबिलिटी |
| DIY बजट प्रबंधन | एकीकृत लागत नियंत्रण और डिग्रेडेशन नीतियाँ |
| पारंपरिक फ़्रेमवर्क्स | Aden |
| ------------------------------------ | ------------------------------------------ |
| एजेंट वर्कफ़्लो को हार्डकोड करना | प्राकृतिक भाषा में लक्ष्यों का वर्णन |
| ग्राफ़ की मैन्युअल परिभाषा | स्वतः-उत्पन्न एजेंट ग्राफ़ |
| त्रुटियों का प्रतिक्रियात्मक प्रबंधन | प्रॉएक्टिव स्वयं-विकास |
| स्थिर टूल कॉन्फ़िगरेशन | SDK-रैप्ड डायनेमिक नोड्स |
| अलग मॉनिटरिंग सेटअप | एकीकृत रीयल-टाइम ऑब्ज़र्वेबिलिटी |
| DIY बजट प्रबंधन | एकीकृत लागत नियंत्रण और डिग्रेडेशन नीतियाँ |
### यह कैसे काम करता है
@@ -222,10 +223,7 @@ hive/
├── docs/ # दस्तावेज़ और मार्गदर्शिकाएँ
├── scripts/ # बिल्ड स्क्रिप्ट्स और यूटिलिटीज़
├── .claude/ # एजेंट बनाने के लिए Claude Code क्षमताएँ
├── ENVIRONMENT_SETUP.md # एजेंट डेवलपमेंट के लिए Python सेटअप गाइड
├── DEVELOPER.md # डेवलपर गाइड
├── CONTRIBUTING.md # योगदान दिशानिर्देश
└── ROADMAP.md # प्रोडक्ट रोडमैप
```
## विकास
@@ -244,20 +242,20 @@ hive/
# - सभी डिपेंडेंसीज़
# Claude Code क्षमताओं का उपयोग करके नए एजेंट बनाएं
claude> /building-agents-construction
claude> /hive
# एजेंट का परीक्षण करें
claude> /testing-agent
claude> /hive-test
# एजेंट चलाएँ
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
```
पूरी कॉन्फ़िगरेशन निर्देशों के लिए ENVIRONMENT_SETUP.md देखें।
पूरी कॉन्फ़िगरेशन निर्देशों के लिए [environment-setup.md](../environment-setup.md) देखें।
## दस्तावेज़ीकरण
- **[डेवलपर गाइड](DEVELOPER.md)** - डेवलपर्स के लिए पूर्ण मार्गदर्शिका
- **[डेवलपर गाइड](../developer-guide.md)** - डेवलपर्स के लिए पूर्ण मार्गदर्शिका
- [शुरुआत करें](docs/getting-started.md) - त्वरित कॉन्फ़िगरेशन निर्देश
- [कॉन्फ़िगरेशन गाइड](docs/configuration.md) - सभी कॉन्फ़िगरेशन विकल्प
- [आर्किटेक्चर का अवलोकन](docs/architecture/README.md) - सिस्टम का डिज़ाइन और संरचना
@@ -266,7 +264,7 @@ PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
एडेन एजेंट फ़्रेमवर्क का उद्देश्य डेवलपर्स को परिणाम-उन्मुख, स्वयं-अनुकूलित एजेंट बनाने में मदद करना है। हमारी रोडमैप यहाँ देखें।
[ROADMAP.md](ROADMAP.md)
[roadmap.md](../roadmap.md)
```mermaid
timeline
@@ -293,6 +291,7 @@ timeline
- LinkedIn - [कंपनी पेज](https://www.linkedin.com/company/teamaden/)
## योगदान करें
हम योगदान का स्वागत करते हैं! कृपया देखें [CONTRIBUTING.md](CONTRIBUTING.md) दिशानिर्देशों के लिए.
**महत्वपूर्ण:** कृपया PR भेजने से पहले किसी issue को अपने नाम असाइन करवाने का अनुरोध करें। उसे क्लेम करने के लिए issue पर टिप्पणी करें, और कोई मेंटेनर 24 घंटों के भीतर उसे आपको असाइन कर देगा। इससे डुप्लिकेट काम से बचाव होता है।
@@ -352,5 +351,3 @@ timeline
<p align="center">
सैन फ्रांसिस्को में 🔥 जुनून के साथ बनाया गया
</p>
+52 -54
View File
@@ -35,28 +35,28 @@
## 概要
ワークフローをハードコーディングせずに、信頼性の高い自己改善型AIエージェントを構築できます。コーディングエージェントとの会話を通じて目標を定義すると、フレームワークが動的に作成された接続コードを持つノードグラフを生成します。問題が発生すると、フレームワークは障害データをキャプチャし、コーディングエージェントを通じてエージェントを進化させ、再デプロイします。組み込みのヒューマンインザループノード、認証情報管理、リアルタイムモニタリングにより、適応性を損なうことなく制御を維持できます。
ワークフローをハードコーディングせずに、信頼性の高い自己改善型 AI エージェントを構築できます。コーディングエージェントとの会話を通じて目標を定義すると、フレームワークが動的に作成された接続コードを持つノードグラフを生成します。問題が発生すると、フレームワークは障害データをキャプチャし、コーディングエージェントを通じてエージェントを進化させ、再デプロイします。組み込みのヒューマンインザループノード、認証情報管理、リアルタイムモニタリングにより、適応性を損なうことなく制御を維持できます。
完全なドキュメント、例、ガイドについては [adenhq.com](https://adenhq.com) をご覧ください。
## Adenとは
## Aden とは
<p align="center">
<img width="100%" alt="Aden Architecture" src="../assets/aden-architecture-diagram.jpg" />
</p>
Adenは、AIエージェントの構築、デプロイ、運用、適応のためのプラットフォームです:
Aden は、AI エージェントの構築、デプロイ、運用、適応のためのプラットフォームです:
- **構築** - コーディングエージェントが自然言語の目標から専門的なワーカーエージェント(セールス、マーケティング、オペレーション)を生成
- **デプロイ** - CI/CD統合と完全なAPIライフサイクル管理を備えたヘッドレスデプロイメント
- **デプロイ** - CI/CD 統合と完全な API ライフサイクル管理を備えたヘッドレスデプロイメント
- **運用** - リアルタイムモニタリング、可観測性、ランタイムガードレールがエージェントの信頼性を維持
- **適応** - 継続的な評価、監督、適応により、エージェントは時間とともに改善
- **インフラ** - 共有メモリ、LLM統合、ツール、スキルがすべてのエージェントを支援
- **インフラ** - 共有メモリ、LLM 統合、ツール、スキルがすべてのエージェントを支援
## クイックリンク
- **[ドキュメント](https://docs.adenhq.com/)** - 完全なガイドとAPIリファレンス
- **[セルフホスティングガイド](https://docs.adenhq.com/getting-started/quickstart)** - インフラストラクチャへのHiveデプロイ
- **[ドキュメント](https://docs.adenhq.com/)** - 完全なガイドと API リファレンス
- **[セルフホスティングガイド](https://docs.adenhq.com/getting-started/quickstart)** - インフラストラクチャへの Hive デプロイ
- **[変更履歴](https://github.com/adenhq/hive/releases)** - 最新の更新とリリース
<!-- - **[ロードマップ](https://adenhq.com/roadmap)** - 今後の機能と計画 -->
- **[問題を報告](https://github.com/adenhq/hive/issues)** - バグレポートと機能リクエスト
@@ -80,8 +80,9 @@ cd hive
```
これにより以下がインストールされます:
- **framework** - コアエージェントランタイムとグラフエグゼキュータ
- **aden_tools** - エージェント機能のための19個のMCPツール
- **aden_tools** - エージェント機能のための 19 個の MCP ツール
- すべての必要な依存関係
### 最初のエージェントを構築
@@ -91,31 +92,31 @@ cd hive
./quickstart.sh
# Claude Codeを使用してエージェントを構築
claude> /building-agents-construction
claude> /hive
# エージェントをテスト
claude> /testing-agent
claude> /hive-test
# エージェントを実行
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
```
**[📖 完全セットアップガイド](ENVIRONMENT_SETUP.md)** - エージェント開発の詳細な手順
**[📖 完全セットアップガイド](../environment-setup.md)** - エージェント開発の詳細な手順
## 機能
- **目標駆動開発** - 自然言語で目標を定義;コーディングエージェントがそれを達成するためのエージェントグラフと接続コードを生成
- **自己適応エージェント** - フレームワークが障害をキャプチャし、目標を更新し、エージェントグラフを更新
- **動的ノード接続** - 事前定義されたエッジなし;接続コードは目標に基づいて任意の対応LLMによって生成
- **SDKラップノード** - すべてのノードが共有メモリ、ローカルRLMメモリ、モニタリング、ツール、LLMアクセスを標準装備
- **動的ノード接続** - 事前定義されたエッジなし;接続コードは目標に基づいて任意の対応 LLM によって生成
- **SDK ラップノード** - すべてのノードが共有メモリ、ローカル RLM メモリ、モニタリング、ツール、LLM アクセスを標準装備
- **ヒューマンインザループ** - 設定可能なタイムアウトとエスカレーションを備えた、人間の入力のために実行を一時停止する介入ノード
- **リアルタイム可観測性** - エージェント実行、決定、ノード間通信のライブモニタリングのためのWebSocketストリーミング
- **リアルタイム可観測性** - エージェント実行、決定、ノード間通信のライブモニタリングのための WebSocket ストリーミング
- **コストと予算管理** - 支出制限、スロットル、自動モデル劣化ポリシーを設定
- **本番環境対応** - セルフホスト可能、スケールと信頼性のために構築
## なぜAdenか
## なぜ Aden
従来のエージェントフレームワークでは、ワークフローを手動で設計し、エージェントの相互作用を定義し、障害を事後的に処理する必要があります。Adenはこのパラダイムを逆転させます—**結果を記述すれば、システムが自ら構築します**。
従来のエージェントフレームワークでは、ワークフローを手動で設計し、エージェントの相互作用を定義し、障害を事後的に処理する必要があります。Aden はこのパラダイムを逆転させます—**結果を記述すれば、システムが自ら構築します**。
```mermaid
flowchart LR
@@ -162,34 +163,34 @@ flowchart LR
style STORE fill:#ed8c00,stroke:#cc5d00,stroke-width:2px,color:#fff
```
### Adenの優位性
### Aden の優位性
| 従来のフレームワーク | Aden |
|----------------------|------|
| エージェントワークフローをハードコード | 自然言語で目標を記述 |
| 手動でグラフを定義 | 自動生成されるエージェントグラフ |
| 事後的なエラー処理 | プロアクティブな自己進化 |
| 静的なツール設定 | 動的なSDKラップノード |
| 別途モニタリング設定 | 組み込みのリアルタイム可観測性 |
| DIY予算管理 | 統合されたコスト制御と劣化 |
| 従来のフレームワーク | Aden |
| -------------------------------------- | -------------------------------- |
| エージェントワークフローをハードコード | 自然言語で目標を記述 |
| 手動でグラフを定義 | 自動生成されるエージェントグラフ |
| 事後的なエラー処理 | プロアクティブな自己進化 |
| 静的なツール設定 | 動的な SDK ラップノード |
| 別途モニタリング設定 | 組み込みのリアルタイム可観測性 |
| DIY 予算管理 | 統合されたコスト制御と劣化 |
### 仕組み
1. **目標を定義** → 達成したいことを平易な言葉で記述
2. **コーディングエージェントが生成** → エージェントグラフ、接続コード、テストケースを作成
3. **ワーカーが実行** → SDKラップノードが完全な可観測性とツールアクセスで実行
3. **ワーカーが実行** → SDK ラップノードが完全な可観測性とツールアクセスで実行
4. **コントロールプレーンが監視** → リアルタイムメトリクス、予算執行、ポリシー管理
5. **自己改善** → 障害時、システムがグラフを進化させ自動的に再デプロイ
## Adenの比較
## Aden の比較
Adenはエージェント開発に根本的に異なるアプローチを採用しています。ほとんどのフレームワークがワークフローをハードコードするか、エージェントグラフを手動で定義することを要求するのに対し、Adenは**コーディングエージェントを使用して自然言語の目標からエージェントシステム全体を生成**します。エージェントが失敗した場合、フレームワークは単にエラーをログに記録するだけでなく—**自動的にエージェントグラフを進化させ**、再デプロイします。
Aden はエージェント開発に根本的に異なるアプローチを採用しています。ほとんどのフレームワークがワークフローをハードコードするか、エージェントグラフを手動で定義することを要求するのに対し、Aden は**コーディングエージェントを使用して自然言語の目標からエージェントシステム全体を生成**します。エージェントが失敗した場合、フレームワークは単にエラーをログに記録するだけでなく—**自動的にエージェントグラフを進化させ**、再デプロイします。
> **注意:** 詳細なフレームワーク比較表とよくある質問については、英語の[README.md](README.md)を参照してください。
### Adenを選ぶべきとき
### Aden を選ぶべきとき
Adenを選択する場合:
Aden を選択する場合:
- 手動介入なしに**失敗から自己改善する**エージェントが必要
- ワークフローではなく結果を記述する**目標駆動開発**が必要
@@ -200,7 +201,7 @@ Adenを選択する場合:
他のフレームワークを選択する場合:
- **型安全で予測可能なワークフロー**PydanticAI、Mastra
- **RAGとドキュメント処理**LlamaIndex、Haystack
- **RAG とドキュメント処理**LlamaIndex、Haystack
- **エージェント創発の研究**(CAMEL)
- **リアルタイム音声/マルチモーダル**TEN Framework
- **シンプルなコンポーネント連鎖**LangChain、Swarm
@@ -215,15 +216,12 @@ hive/
├── docs/ # ドキュメントとガイド
├── scripts/ # ビルドとユーティリティスクリプト
├── .claude/ # エージェント構築用のClaude Codeスキル
├── ENVIRONMENT_SETUP.md # エージェント開発用のPythonセットアップガイド
├── DEVELOPER.md # 開発者ガイド
├── CONTRIBUTING.md # 貢献ガイドライン
└── ROADMAP.md # プロダクトロードマップ
```
## 開発
### Pythonエージェント開発
### Python エージェント開発
フレームワークで目標駆動エージェントを構築および実行するには:
@@ -237,29 +235,29 @@ hive/
# - すべての依存関係
# Claude Codeスキルを使用して新しいエージェントを構築
claude> /building-agents-construction
claude> /hive
# エージェントをテスト
claude> /testing-agent
claude> /hive-test
# エージェントを実行
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
```
完全なセットアップ手順については、[ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md)を参照してください。
完全なセットアップ手順については、[environment-setup.md](../environment-setup.md)を参照してください。
## ドキュメント
- **[開発者ガイド](DEVELOPER.md)** - 開発者向け総合ガイド
- **[開発者ガイド](../developer-guide.md)** - 開発者向け総合ガイド
- [はじめに](docs/getting-started.md) - クイックセットアップ手順
- [設定ガイド](docs/configuration.md) - すべての設定オプション
- [アーキテクチャ概要](docs/architecture/README.md) - システム設計と構造
## ロードマップ
Adenエージェントフレームワークは、開発者が結果志向で自己適応するエージェントを構築できるよう支援することを目指しています。ロードマップはこちらをご覧ください
Aden エージェントフレームワークは、開発者が結果志向で自己適応するエージェントを構築できるよう支援することを目指しています。ロードマップはこちらをご覧ください
[ROADMAP.md](ROADMAP.md)
[roadmap.md](../roadmap.md)
```mermaid
timeline
@@ -289,9 +287,9 @@ timeline
貢献を歓迎します!ガイドラインについては[CONTRIBUTING.md](CONTRIBUTING.md)をご覧ください。
**重要:** PRを提出する前に、まずIssueにアサインされてください。Issueにコメントして担当を申請すると、メンテナーが24時間以内にアサインします。これにより重複作業を防ぐことができます。
**重要:** PR を提出する前に、まず Issue にアサインされてください。Issue にコメントして担当を申請すると、メンテナーが 24 時間以内にアサインします。これにより重複作業を防ぐことができます。
1. Issueを見つけるか作成し、アサインを受ける
1. Issue を見つけるか作成し、アサインを受ける
2. リポジトリをフォーク
3. 機能ブランチを作成 (`git checkout -b feature/amazing-feature`)
4. 変更をコミット (`git commit -m 'Add amazing feature'`)
@@ -310,31 +308,31 @@ timeline
## ライセンス
このプロジェクトはApache License 2.0の下でライセンスされています - 詳細は[LICENSE](LICENSE)ファイルをご覧ください。
このプロジェクトは Apache License 2.0 の下でライセンスされています - 詳細は[LICENSE](LICENSE)ファイルをご覧ください。
## よくある質問 (FAQ)
> **注意:** よくある質問の完全版については、英語の[README.md](README.md)を参照してください。
**Q: AdenLangChainや他のエージェントフレームワークに依存していますか?**
**Q: AdenLangChain や他のエージェントフレームワークに依存していますか?**
いいえ。AdenLangChain、CrewAI、その他のエージェントフレームワークに依存せずにゼロから構築されています。フレームワークは軽量で柔軟に設計されており、事前定義されたコンポーネントに依存するのではなく、エージェントグラフを動的に生成します。
いいえ。AdenLangChain、CrewAI、その他のエージェントフレームワークに依存せずにゼロから構築されています。フレームワークは軽量で柔軟に設計されており、事前定義されたコンポーネントに依存するのではなく、エージェントグラフを動的に生成します。
**Q: AdenはどのLLMプロバイダーをサポートしていますか?**
**Q: Aden はどの LLM プロバイダーをサポートしていますか?**
AdenLiteLLM統合を通じて100以上のLLMプロバイダーをサポートしており、OpenAIGPT-4、GPT-4o)、AnthropicClaudeモデル)、Google Gemini、Mistral、Groqなどが含まれます。適切なAPIキー環境変数を設定し、モデル名を指定するだけです。
AdenLiteLLM 統合を通じて 100 以上の LLM プロバイダーをサポートしており、OpenAIGPT-4、GPT-4o)、AnthropicClaude モデル)、Google Gemini、Mistral、Groq などが含まれます。適切な API キー環境変数を設定し、モデル名を指定するだけです。
**Q: Adenはオープンソースですか?**
**Q: Aden はオープンソースですか?**
はい、AdenApache License 2.0の下で完全にオープンソースです。コミュニティの貢献とコラボレーションを積極的に奨励しています。
はい、AdenApache License 2.0 の下で完全にオープンソースです。コミュニティの貢献とコラボレーションを積極的に奨励しています。
**Q: Adenは他のエージェントフレームワークと何が違いますか?**
**Q: Aden は他のエージェントフレームワークと何が違いますか?**
Adenはコーディングエージェントを使用して自然言語の目標からエージェントシステム全体を生成します—ワークフローをハードコードしたり、グラフを手動で定義したりする必要はありません。エージェントが失敗すると、フレームワークは自動的に障害データをキャプチャし、エージェントグラフを進化させ、再デプロイします。この自己改善ループはAden独自のものです。
Aden はコーディングエージェントを使用して自然言語の目標からエージェントシステム全体を生成します—ワークフローをハードコードしたり、グラフを手動で定義したりする必要はありません。エージェントが失敗すると、フレームワークは自動的に障害データをキャプチャし、エージェントグラフを進化させ、再デプロイします。この自己改善ループは Aden 独自のものです。
**Q: Adenはヒューマンインザループワークフローをサポートしていますか?**
**Q: Aden はヒューマンインザループワークフローをサポートしていますか?**
はい、Adenは人間の入力のために実行を一時停止する介入ノードを通じて、ヒューマンインザループワークフローを完全にサポートしています。設定可能なタイムアウトとエスカレーションポリシーが含まれており、人間の専門家とAIエージェントのシームレスなコラボレーションを可能にします。
はい、Aden は人間の入力のために実行を一時停止する介入ノードを通じて、ヒューマンインザループワークフローを完全にサポートしています。設定可能なタイムアウトとエスカレーションポリシーが含まれており、人間の専門家と AI エージェントのシームレスなコラボレーションを可能にします。
---
+10 -13
@@ -91,16 +91,16 @@ cd hive
./quickstart.sh
# Claude Code를 사용해 에이전트 빌드
claude> /building-agents
claude> /hive
# 에이전트 테스트
claude> /testing-agent
claude> /hive-test
# 에이전트 실행
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
```
**[📖 전체 설정 가이드](ENVIRONMENT_SETUP.md)** - 에이전트 개발을 위한 상세한 설명
**[📖 전체 설정 가이드](../environment-setup.md)** - 에이전트 개발을 위한 상세한 설명
## 주요 기능
@@ -226,10 +226,7 @@ hive/
├── docs/ # 문서 및 가이드
├── scripts/ # 빌드 및 유틸리티 스크립트
├── .claude/ # 에이전트 생성을 위한 Claude Code 스킬
├── ENVIRONMENT_SETUP.md # 에이전트 개발을 위한 Python 환경 설정 가이드
├── DEVELOPER.md # 개발자 가이드
├── CONTRIBUTING.md # 기여 가이드라인
└── ROADMAP.md # 제품 로드맵
```
## 개발
@@ -248,20 +245,20 @@ hive/
# - 모든 의존성
# Claude Code 스킬을 사용해 새 에이전트 생성
claude> /building-agents
claude> /hive
# 에이전트 테스트
claude> /testing-agent
claude> /hive-test
# 에이전트 실행
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
```
전체 설정 방법은 [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) 를 참고하세요.
전체 설정 방법은 [environment-setup.md](../environment-setup.md) 를 참고하세요.
## 문서
- **[개발자 가이드](DEVELOPER.md)** - 개발자를 위한 종합 가이드
- **[개발자 가이드](../developer-guide.md)** - 개발자를 위한 종합 가이드
- [시작하기](docs/getting-started.md) - 빠른 설정 방법
- [설정 가이드](docs/configuration.md) - 모든 설정 옵션 안내
- [아키텍처 개요](docs/architecture/README.md) - 시스템 설계 및 구조
@@ -271,7 +268,7 @@ PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
Aden Agent Framework는 개발자가 결과 중심(outcome-oriented) 이며 자기 적응형(self-adaptive) 에이전트를 구축할 수 있도록 돕는 것을 목표로 합니다.
자세한 로드맵은 아래 문서에서 확인할 수 있습니다.
[ROADMAP.md](ROADMAP.md)
[roadmap.md](../roadmap.md)
```mermaid
timeline
@@ -352,7 +349,7 @@ Aden은 모니터링과 관측성을 위해 토큰 사용량, 지연 시간 메
**Q: Aden은 어떤 배포 방식을 지원하나요?**
Aden은 Python 패키지를 통한 셀프 호스팅 배포를 지원합니다. 설치 방법은 [환경 설정 가이드](ENVIRONMENT_SETUP.md)를 참조하세요. 클라우드 배포 옵션과 Kubernetes 대응 설정은 로드맵에 포함되어 있습니다.
Aden은 Python 패키지를 통한 셀프 호스팅 배포를 지원합니다. 설치 방법은 [환경 설정 가이드](../environment-setup.md)를 참조하세요. 클라우드 배포 옵션과 Kubernetes 대응 설정은 로드맵에 포함되어 있습니다.
**Q: Aden은 복잡한 프로덕션 규모의 사용 사례도 처리할 수 있나요?**
@@ -380,7 +377,7 @@ Aden은 지출 한도, 호출 제한, 자동 모델 다운그레이드 정책
**Q: 예제와 문서는 어디에서 확인할 수 있나요?**
전체 가이드, API 레퍼런스, 시작 튜토리얼은 [docs.adenhq.com](https://docs.adenhq.com/) 에서 확인하실 수 있습니다. 또한 저장소의 `docs/` 디렉터리와 종합적인 [DEVELOPER.md](DEVELOPER.md) 가이드도 함께 제공됩니다.
전체 가이드, API 레퍼런스, 시작 튜토리얼은 [docs.adenhq.com](https://docs.adenhq.com/) 에서 확인하실 수 있습니다. 또한 저장소의 `docs/` 디렉터리와 종합적인 [developer-guide.md](../developer-guide.md) 가이드도 함께 제공됩니다.
**Q: Aden에 기여하려면 어떻게 해야 하나요?**
+17 -19
@@ -80,6 +80,7 @@ cd hive
```
Isto instala:
- **framework** - Runtime do agente principal e executor de grafos
- **aden_tools** - 19 ferramentas MCP para capacidades de agentes
- Todas as dependências necessárias
@@ -91,16 +92,16 @@ Isto instala:
./quickstart.sh
# Construir um agente usando Claude Code
claude> /building-agents-construction
claude> /hive
# Test your agent
claude> /testing-agent
claude> /hive-test
# Run your agent
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
```
**[📖 Full Setup Guide](ENVIRONMENT_SETUP.md)** - Detailed instructions for agent development
**[📖 Full Setup Guide](../environment-setup.md)** - Detailed instructions for agent development
## Features
@@ -164,14 +165,14 @@ flowchart LR
### The Aden Advantage
| Traditional Frameworks     | Aden                                      |
| -------------------------- | ----------------------------------------- |
| Hardcode agent workflows   | Describe goals in natural language        |
| Manual graph definition    | Auto-generated agent graphs               |
| Reactive error handling    | Proactive self-evolution                  |
| Static tool configurations | Dynamic SDK-wrapped nodes                 |
| Separate monitoring setup  | Built-in real-time observability          |
| DIY budget management      | Integrated cost controls and degradation  |
### How It Works
@@ -215,10 +216,7 @@ hive/
├── docs/                 # Documentation and guides
├── scripts/              # Build and utility scripts
├── .claude/              # Claude Code skills for building agents
├── ENVIRONMENT_SETUP.md  # Python setup guide for agent development
├── DEVELOPER.md          # Developer guide
├── CONTRIBUTING.md       # Contribution guidelines
└── ROADMAP.md            # Product roadmap
```
## Development
@@ -237,20 +235,20 @@ To build and run goal-oriented agents with the framework:
# - All dependencies
# Build new agents using Claude Code skills
claude> /building-agents-construction
claude> /hive
# Test agents
claude> /testing-agent
claude> /hive-test
# Run agents
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
```
See [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) for complete setup instructions.
See [environment-setup.md](../environment-setup.md) for complete setup instructions.
## Documentation
- **[Developer Guide](DEVELOPER.md)** - Comprehensive guide for developers
- **[Developer Guide](../developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
@@ -259,7 +257,7 @@ See [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) for complete setup instructions
The Aden Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. Find our roadmap here
[ROADMAP.md](ROADMAP.md)
[roadmap.md](../roadmap.md)
```mermaid
timeline
+17 -19
@@ -80,6 +80,7 @@ cd hive
```
This installs:
- **framework** - Core agent runtime and graph executor
- **aden_tools** - 19 MCP tools for agent capabilities
- All required dependencies
@@ -91,16 +92,16 @@ cd hive
./quickstart.sh
# Build an agent using Claude Code
claude> /building-agents-construction
claude> /hive
# Test your agent
claude> /testing-agent
claude> /hive-test
# Run your agent
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
```
**[📖 Full Setup Guide](ENVIRONMENT_SETUP.md)** - Detailed instructions for agent development
**[📖 Full Setup Guide](../environment-setup.md)** - Detailed instructions for agent development
## Features
@@ -164,14 +165,14 @@ flowchart LR
### The Aden Advantage
| Traditional Frameworks     | Aden                                      |
| -------------------------- | ----------------------------------------- |
| Hardcode agent workflows   | Describe goals in natural language        |
| Manual graph definition    | Auto-generated agent graphs               |
| Reactive error handling    | Proactive self-evolution                  |
| Static tool configurations | Dynamic SDK-wrapped nodes                 |
| Separate monitoring setup  | Built-in real-time observability          |
| DIY budget management      | Integrated cost controls and degradation  |
### How It Works
@@ -215,10 +216,7 @@ hive/
├── docs/                 # Documentation and guides
├── scripts/              # Build and utility scripts
├── .claude/              # Claude Code skills for building agents
├── ENVIRONMENT_SETUP.md  # Python setup guide for agent development
├── DEVELOPER.md          # Developer guide
├── CONTRIBUTING.md       # Contribution guidelines
└── ROADMAP.md            # Product roadmap
```
## Development
@@ -237,20 +235,20 @@ hive/
# - All dependencies
# Build new agents using Claude Code skills
claude> /building-agents-construction
claude> /hive
# Test agents
claude> /testing-agent
claude> /hive-test
# Run agents
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
```
See [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) for complete setup instructions.
See [environment-setup.md](../environment-setup.md) for complete setup instructions.
## Documentation
- **[Developer Guide](DEVELOPER.md)** - Comprehensive guide for developers
- **[Developer Guide](../developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
@@ -259,7 +257,7 @@ PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
The Aden Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. Find our roadmap here
[ROADMAP.md](ROADMAP.md)
[roadmap.md](../roadmap.md)
```mermaid
timeline
+16 -18
@@ -80,6 +80,7 @@ cd hive
```
This installs:
- **framework** - Core agent runtime and graph executor
- **aden_tools** - 19 MCP tools for agent capabilities
- All required dependencies
@@ -91,16 +92,16 @@ cd hive
./quickstart.sh
# Build an agent using Claude Code
claude> /building-agents-construction
claude> /hive
# Test your agent
claude> /testing-agent
claude> /hive-test
# Run your agent
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
```
**[📖 Full Setup Guide](ENVIRONMENT_SETUP.md)** - Detailed instructions for agent development
**[📖 Full Setup Guide](../environment-setup.md)** - Detailed instructions for agent development
## Features
@@ -164,14 +165,14 @@ flowchart LR
### The Aden Advantage
| Traditional Frameworks     | Aden                                      |
| -------------------------- | ----------------------------------------- |
| Hardcode agent workflows   | Describe goals in natural language        |
| Manual graph definition    | Auto-generated agent graphs               |
| Reactive error handling    | Proactive self-evolution                  |
| Static tool configurations | Dynamic SDK-wrapped nodes                 |
| Separate monitoring setup  | Built-in real-time observability          |
| DIY budget management      | Integrated cost controls and degradation  |
### How It Works
@@ -215,10 +216,7 @@ hive/
├── docs/                 # Documentation and guides
├── scripts/              # Build and utility scripts
├── .claude/              # Claude Code skills for building agents
├── ENVIRONMENT_SETUP.md  # Python setup guide for agent development
├── DEVELOPER.md          # Developer guide
├── CONTRIBUTING.md       # Contribution guidelines
└── ROADMAP.md            # Product roadmap
```
## Development
@@ -237,20 +235,20 @@ hive/
# - All dependencies
# Build new agents using Claude Code skills
claude> /building-agents-construction
claude> /hive
# Test agents
claude> /testing-agent
claude> /hive-test
# Run agents
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
```
See [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) for complete setup instructions.
See [environment-setup.md](../environment-setup.md) for complete setup instructions.
## Documentation
- **[Developer Guide](DEVELOPER.md)** - Comprehensive guide for developers
- **[Developer Guide](../developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
@@ -259,7 +257,7 @@ PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
The Aden Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See our roadmap here
[ROADMAP.md](ROADMAP.md)
[roadmap.md](../roadmap.md)
```mermaid
timeline
+49
@@ -0,0 +1,49 @@
# Evolution
## Evolution Is the Mechanism; Adaptiveness Is the Result
Agents don't just fail; they fail inevitably. Real-world variables—private LinkedIn profiles, shifting API schemas, or LLM hallucinations—are impossible to predict in a vacuum. The first version of any agent is merely a "happy path" draft.
Evolution is how Hive handles this. When an agent fails, the framework captures what went wrong — which node failed, which success criteria weren't met, what the agent tried and why it didn't work. Then a coding agent (Claude Code, Cursor, or similar) uses that failure data to generate an improved version of the agent. The new version gets deployed, runs, encounters new edge cases, and the cycle continues.
Over generations, the agent gets more reliable. Not because someone sat down and anticipated every possible failure, but because each failure teaches the next version something specific.
## How It Works
The evolution loop has four stages:
**1. Execute** — The worker agent runs against real inputs. Sessions produce outcomes, decisions, and metrics.
**2. Evaluate** — The framework checks outcomes against the goal's success criteria and constraints. Did the agent produce the desired result? Which criteria were satisfied and which weren't? Were any constraints violated?
**3. Diagnose** — Failure data is structured and specific. It's not just "the agent failed" — it's "node `draft_message` failed to produce personalized content because the research node returned insufficient data about the prospect's recent activity." The decision log, problem reports, and execution trace provide the full picture.
**4. Regenerate** — A coding agent receives the diagnosis and the current agent code. It modifies the graph — adding nodes, adjusting prompts, changing edge conditions, adding tools — to address the specific failure. The new version is deployed and the cycle restarts.
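To make the hand-off between stages concrete, here is a minimal sketch of one generation. These names (`Diagnosis`, `evolve_once`, `rewrite_with_coding_agent`) are hypothetical stand-ins, not Hive's API.
```python
from dataclasses import dataclass, field

@dataclass
class Diagnosis:
    failed_node: str                 # e.g. "draft_message"
    unmet_criteria: list[str]        # which success criteria fell short
    trace: list[str] = field(default_factory=list)  # decision-log excerpt

def rewrite_with_coding_agent(code: str, diag: Diagnosis) -> str:
    # Stand-in for Claude Code / Cursor producing the next agent version.
    return code + f"\n# patched: {diag.failed_node} -> {diag.unmet_criteria}"

def evolve_once(agent_code: str, sessions: list[dict]) -> str:
    # 2. Evaluate: which sessions missed their success criteria?
    failures = [s for s in sessions if s["unmet_criteria"]]
    if not failures:
        return agent_code  # nothing to fix this generation
    # 3. Diagnose: structure the failure so the fix can be targeted.
    worst = failures[0]
    diag = Diagnosis(worst["failed_node"], worst["unmet_criteria"], worst["decisions"])
    # 4. Regenerate: hand the diagnosis and current code to a coding agent.
    return rewrite_with_coding_agent(agent_code, diag)
```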
## Adaptiveness ≠ Intelligence or Intent
An important distinction: evolution makes agents more adaptive, but not more intelligent in any general sense. The agent isn't learning to reason better — it's being rewritten to handle more situations correctly.
This is closer to how biological evolution works than how learning works. A species doesn't "learn" to survive winter — individuals that happen to have thicker fur survive, and that trait gets selected for. Similarly, agent versions that handle more edge cases correctly survive in production, and the patterns that made them successful get carried forward.
The practical implication: don't expect evolution to make an agent smarter about problems it's never seen. Evolution improves reliability on the *kinds* of problems the agent has already encountered. For genuinely novel situations, that's what human-in-the-loop is for — and every time a human steps in, that interaction becomes potential fuel for the next evolution cycle.
## What Gets Evolved
Evolution can change almost anything about an agent:
**Prompts** — The most common fix. A node's system prompt gets refined based on the specific ways the LLM misunderstood its instructions.
**Graph structure** — Adding a validation node before a critical step, splitting a node that's trying to do too much, adding a fallback path for a common failure mode.
**Edge conditions** — Adjusting routing logic based on observed patterns. If low-confidence research results consistently lead to bad drafts, add a conditional edge that routes them back for another research pass.
**Tool selection** — Swapping in a better tool, adding a new one, or removing one that causes more problems than it solves.
**Constraints and criteria** — Tightening or loosening based on what's actually achievable and what matters in practice.
## The Role of Decision Logging
Evolution depends on good data. The runtime captures every decision an agent makes: what it was trying to do, what options it considered, what it chose, and what happened as a result. This isn't overhead — it's the signal that makes evolution possible.
Without decision logging, failure analysis is guesswork. With it, the coding agent can trace a failure back to its root cause and make a targeted fix rather than a blind change.
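For illustration only, a single decision record might be as small as the dictionary below; the field names are assumptions, not Hive's wire format.
```python
decision = {
    "intent": "choose a channel for first outreach",      # what it was trying to do
    "options": ["email", "linkedin_dm", "twitter_dm"],    # what it considered
    "choice": "linkedin_dm",                              # what it chose
    "outcome": "message sent; no reply within 48 hours",  # what happened
}
```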
+101
@@ -0,0 +1,101 @@
# Goals & Outcome-Driven Development
## The Core Idea
Business processes are outcome-driven. A sales team doesn't follow a rigid script — they adapt their approach until the deal closes. A support agent doesn't execute a flowchart — they resolve the customer's issue. The outcome is what matters, not the specific steps taken to get there.
Hive is built on this principle. Instead of hardcoding agent workflows step by step, you define the outcome you want, and the framework figures out how to get there. We call this **Outcome-Driven Development (ODD)**.
## Task-Driven vs Goal-Driven vs Outcome-Driven
These three paradigms represent different levels of abstraction for building agents:
**Task-Driven Development (TDD)** asks: *"Is the code correct?"*
You define explicit steps. The agent follows them. Success means the steps ran without errors. The problem: an agent can execute every step perfectly and still produce a useless result. The steps become the goal, not the actual outcome.
**Goal-Driven Development (GDD)** asks: *"Are we solving the right problem?"*
You define what you want to achieve. The agent plans and executes toward that goal. Better than TDD because it captures intent. But goals can be vague — "improve customer satisfaction" doesn't tell you when you're done.
**Outcome-Driven Development (ODD)** asks: *"Did the system produce the desired result?"*
You define measurable success criteria, hard constraints, and the context the agent needs. The agent is evaluated against the actual outcome, not whether it followed the right steps or aimed at the right goal. This is what Hive implements.
## Goals as First-Class Citizens
In Hive, a `Goal` is not a string description. It's a structured object with three components:
### Success Criteria
Each goal has weighted success criteria that define what "done" looks like. These aren't binary pass/fail checks — they're multi-dimensional measures of quality.
```python
Goal(
id="twitter-outreach",
name="Personalized Twitter Outreach",
success_criteria=[
SuccessCriterion(
id="personalized",
description="Messages reference specific details from the prospect's profile",
metric="llm_judge",
weight=0.4
),
SuccessCriterion(
id="compliant",
description="Messages follow brand voice guidelines",
metric="llm_judge",
weight=0.3
),
SuccessCriterion(
id="actionable",
description="Each message includes a clear call to action",
metric="output_contains",
target="CTA",
weight=0.3
),
],
...
)
```
Metrics can be `output_contains`, `output_equals`, `llm_judge`, or `custom`. Weights let you express what matters most — a perfectly compliant message that isn't personalized still falls short.
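As a quick arithmetic example of how the weights roll up (the scoring below is a sketch, not Hive's actual evaluator): a message that nails compliance and the call to action but misses personalization still lands well short of a perfect score.
```python
# Per-criterion scores in [0, 1], weighted as in the goal above.
scores  = {"personalized": 0.2, "compliant": 1.0, "actionable": 1.0}
weights = {"personalized": 0.4, "compliant": 0.3, "actionable": 0.3}

total = sum(scores[k] * weights[k] for k in weights)
print(total)  # 0.68 -- the heaviest criterion was the one it missed
```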
### Constraints
Constraints define what must **not** happen. They're the guardrails.
```python
constraints=[
Constraint(
id="no_spam",
description="Never send more than 3 messages to the same person per week",
constraint_type="hard", # Violation = immediate escalation
category="safety"
),
Constraint(
id="budget_limit",
description="Total LLM cost must not exceed $5 per run",
constraint_type="soft", # Violation = warning, not a hard stop
category="cost"
),
]
```
Hard constraints are non-negotiable — violating one triggers escalation or failure. Soft constraints are preferences that the agent should respect but can bend when necessary. Constraint categories include `time`, `cost`, `safety`, `scope`, and `quality`.
### Context
Goals carry context — domain knowledge, preferences, background information that the agent needs to make good decisions. This context is injected into every LLM call the agent makes, so the agent is always reasoning with the full picture.
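The exact structure is up to the goal author, but the kind of context a goal might carry looks something like this (values are illustrative):
```python
context = {
    "brand_voice": "friendly, concise, no emojis",
    "audience": "seed-stage founders in fintech",
    "do_not_mention": ["pricing", "unreleased features"],
}
```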
## Why This Matters
When you define goals with weighted criteria and constraints, three things happen:
1. **The agent can self-correct.** Goals are injected into every LLM call, so the agent is always reasoning against its success criteria. Within a [graph execution](./graph.md), nodes use these criteria to decide whether to accept their output, retry, or escalate — self-correction in real time.
2. **Evolution has a target.** When an agent fails, the framework knows *which criteria* it fell short on, which gives the coding agent specific information to improve the next generation (see [Evolution](./evolution.md)).
3. **Humans stay in control.** Constraints define the boundaries. The agent has freedom to find creative solutions within those boundaries, but it can't cross the lines you've drawn.
The goal lifecycle flows through `DRAFT → READY → ACTIVE → COMPLETED / FAILED / SUSPENDED`, giving you visibility into where each objective stands at any point during execution.
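Written out as plain states (a sketch, not the framework's own type), that lifecycle is:
```python
from enum import Enum

class GoalStatus(Enum):
    DRAFT = "draft"
    READY = "ready"
    ACTIVE = "active"
    COMPLETED = "completed"  # terminal
    FAILED = "failed"        # terminal
    SUSPENDED = "suspended"  # paused mid-flight
```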
+78
@@ -0,0 +1,78 @@
# The Agent Graph
## Why a Graph
Real business processes aren't linear. A sales outreach might go: research a prospect, draft a message, realize the research is thin, go back and dig deeper, draft again, get human approval, send. There are loops, branches, fallbacks, and decision points.
Hive models this as a directed graph. Nodes do work, edges connect them, and shared memory lets them pass data. The framework walks this structure — running nodes, following edges, managing retries — until the agent reaches its goal or exhausts its step budget.
Edges can loop back, creating feedback cycles where an agent retries a step or takes a different path. That's intentional. A graph that only moves forward can't self-correct.
## Nodes
A node is a unit of work. Each node reads inputs from shared memory, does something, and writes outputs back. There are a handful of node types, each suited to a different kind of work:
**`event_loop`** — The workhorse. This is a multi-turn LLM loop: the model reasons about the current state, calls tools, observes results, and keeps going until it has produced the required outputs. Most of the interesting agent behavior happens in these nodes. They handle long-running tasks, manage their own context window, and can recover from crashes mid-conversation.
**`function`** — A plain Python function. No LLM involved. Use these for anything deterministic: data transformation, API calls with known parameters, validation logic, or any step where you don't want a language model making judgment calls.
**`router`** — A decision point that directs execution down different paths. Can be rule-based ("if confidence is high, go left; otherwise, go right") or LLM-powered ("given the goal and what we know so far, which path makes sense?").
**`human_input`** — A pause point where the agent stops and asks a human for input before continuing. See [Human-in-the-Loop](#human-in-the-loop) below.
There are also simpler LLM node types (`llm_tool_use` for a single LLM call with tools, `llm_generate` for pure text generation) for steps that don't need the full event loop.
### Self-Correction Within a Node
The most important behavior in an `event_loop` node is the ability to self-correct. After each iteration, the node evaluates its own output: did it produce what was needed? If yes, it's done. If not, it tries again — but this time it sees what went wrong and adjusts.
This is the **reflexion pattern**: try, evaluate, learn from the result, try again. It's cheaper and more effective than starting over. An agent that takes three attempts to get something right is still more useful than one that fails on the first try and gives up.
Within a single node, the outcomes are:
- **Accept** — Output meets the bar. Move on.
- **Retry** — Not good enough, but recoverable. Try again with feedback.
- **Escalate** — Something is fundamentally broken. Hand off to error handling.
This is self-correction *within a session* — the agent adapting in real time. It's different from [evolution](./evolution.md), which improves the agent *across sessions* by rewriting its code between generations. Both matter: reflexion handles the bumps in a single run, evolution handles the patterns that keep recurring across many runs.
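A minimal version of that loop, assuming a judge that returns one of the three outcomes above (the function names are illustrative, not Hive's API):
```python
def run_node(produce, judge, max_iterations: int = 3):
    """Try, evaluate, feed the critique back in, and try again."""
    feedback = None
    for _ in range(max_iterations):
        output = produce(feedback)         # one LLM turn; sees the prior critique
        verdict, feedback = judge(output)  # "accept" | "retry" | "escalate"
        if verdict == "accept":
            return output
        if verdict == "escalate":
            raise RuntimeError(f"escalated: {feedback}")
    raise RuntimeError("iteration budget exhausted")
```
Feeding the critique into the next attempt is what separates reflexion from a blind retry.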
## Edges
Edges control flow between nodes. Each edge has a condition:
- **On success** — follow this edge if the source node succeeded
- **On failure** — follow if the source failed (this is how you wire up fallback paths and error recovery)
- **Conditional** — follow if an expression is true (e.g., route high-confidence results one way, low-confidence results another)
- **LLM-decided** — let the LLM choose which path based on the [goal](./goals_outcome.md) and current context
Edges also handle data plumbing between nodes — mapping one node's outputs to another node's expected inputs, so each node has a clean interface without needing to know where its data came from.
When a node has multiple outgoing edges, the framework can run those branches in parallel and reconverge when they're all done. This is useful for tasks like researching a prospect from multiple sources simultaneously.
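Put together, the four edge kinds might be declared as plain data, roughly like this (an illustrative shape, not the real graph definition format):
```python
edges = [
    {"from": "research", "to": "draft", "on": "success"},
    {"from": "research", "to": "deep_research",
     "on": "condition", "expr": "confidence < 0.8"},       # conditional routing
    {"from": "draft", "to": "research", "on": "failure"},  # fallback loop
    {"from": "triage", "to": ["reply", "escalate"], "on": "llm_decided"},
]
```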
## Shared Memory
Shared memory is how nodes communicate. It's a key-value store scoped to a single [session](./worker_agent.md). Every node declares which keys it reads and which it writes, and the framework enforces those boundaries — a node can't quietly access data it hasn't declared.
Data flows through the graph in a natural way: input arrives at the start, each node reads what it needs and writes what it produces, and edges map outputs to inputs as data moves between nodes. At the end, the full memory state is the execution result.
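Declared reads and writes could be expressed as simply as this (a hypothetical shape); the framework then has everything it needs to reject an undeclared access:
```python
node_io = {
    "research": {"reads": ["prospect_url"], "writes": ["profile", "confidence"]},
    "draft":    {"reads": ["profile"],      "writes": ["message"]},
    "send":     {"reads": ["message"],      "writes": ["delivery_status"]},
}
```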
## Human-in-the-Loop
Human-in-the-loop (HITL) nodes are where the agent pauses and asks a person for input. This isn't a blunt "stop everything" — the framework supports structured questions: open-ended text, multiple choice, yes/no approvals, and multi-field forms.
When the agent hits a HITL node, it saves its entire state and presents the questions. The session can sit paused for minutes, hours, or days. When the human responds, execution picks up exactly where it left off.
This is what makes Hive agents supervisable in production. You place HITL nodes at critical decision points — before sending a message, before making a purchase, before any action that's hard to undo. The agent handles the routine work autonomously; humans weigh in on the decisions that matter. And every time a human provides input, that decision becomes data the [evolution](./evolution.md) process can learn from.
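A question set for a review gate could be as simple as the sketch below; the field names and question types are assumptions based on the kinds listed above.
```python
questions = [
    {"id": "approve_send", "type": "yes_no",
     "prompt": "Send this message to the prospect as written?"},
    {"id": "objection", "type": "multiple_choice",
     "prompt": "If not, what's the main problem?",
     "options": ["too formal", "too long", "wrong details"]},
    {"id": "edits", "type": "text",
     "prompt": "Optional: suggest a rewrite."},
]
```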
## The Shape of an Agent
A typical agent graph looks something like this:
```
intake → research → draft → [human review] → send → done
             ↑                                 |
             └─────────── on failure ──────────┘
```
An entry node where work begins. A chain of nodes that do the real work. HITL nodes at approval gates. Failure edges that loop back for another attempt. Terminal nodes where execution ends.
The framework tracks everything as it walks the graph: which nodes ran, how many retries each needed, how much the LLM calls cost, how long each step took. This metadata feeds into the [worker agent runtime](./worker_agent.md) for monitoring and into the [evolution](./evolution.md) process for improvement.
+51
@@ -0,0 +1,51 @@
# The Worker Agent
## What a Worker Agent Is
A worker agent is a specialized AI agent built to perform a specific business process. It's not a general-purpose assistant — it's purpose-built, like hiring someone for a defined role. A sales outreach agent knows how to research prospects, craft personalized messages, and follow up. A support triage agent knows how to categorize tickets, pull customer context, and route to the right team.
In Hive, a **Coding Agent** (like Claude Code or Cursor) generates worker agents from a natural language goal description. You describe what you want the agent to do, and the coding agent produces the graph, nodes, edges, and configuration. The worker agent is the thing that actually runs.
## Sessions
A session is a single execution of a worker agent against a specific input. If your outreach agent processes 50 prospects, that's 50 sessions.
Each session is isolated — it has its own shared memory, its own execution state, and its own history. This matters because sessions can be long-running. An agent might start researching a prospect, pause for human approval, wait hours or days, and then resume to send the message. The session preserves everything across that gap.
Sessions also make debugging straightforward. Every decision the agent made, every tool it called, every retry it attempted — it's all captured in the session. When something goes wrong, you can trace exactly what happened.
## Iterations
Within a session, nodes (especially `event_loop` nodes) work in iterations. An iteration is one turn of the loop: the LLM reasons about the current state, possibly calls tools, observes results, and produces output. Then the judge evaluates: is this good enough?
If not, the node iterates again. The LLM sees what went wrong and adjusts its approach. This is how agents self-correct without human intervention — through rapid iteration within a single node, not by restarting the whole process.
Iterations have limits. You set a maximum per node to prevent runaway loops. If a node can't produce acceptable output within its iteration budget, it fails and the graph's error-handling edges take over.
## Headless Execution
A lot of business processes need to run continuously — monitoring inboxes, processing incoming leads, watching for events. These agents run **headless**: no UI, no human sitting at a terminal, just the agent doing its job in the background.
Headless doesn't mean unsupervised. HITL (human-in-the-loop) nodes still pause execution and wait for human input when the agent hits a decision it shouldn't make alone. The difference is that instead of a live conversation, the agent sends a notification, waits for a response through whatever channel you've configured, and resumes when the human weighs in.
This is the operational model Hive is designed for: agents that run 24/7 as part of your business infrastructure, with humans stepping in only when needed. The goal is to automate the routine and escalate the exceptions.
## The Runtime
The worker agent runtime manages the lifecycle: starting sessions, executing the graph, handling pauses and resumes, tracking costs, and collecting metrics. It coordinates everything the agent needs — LLM access, tool execution, shared memory, credential management — so individual nodes can focus on their specific job.
Key things the runtime handles:
**Cost tracking** — Every LLM call is metered. You set budget constraints on the goal, and the runtime enforces them. An agent can't silently burn through your API credits.
**Decision logging** — Every meaningful choice the agent makes is recorded: what it was trying to do, what options it considered, what it chose, and what happened. This isn't just for debugging — it's the raw material that evolution uses to improve future generations.
**Event streaming** — The runtime emits events as the agent works. You can wire these up to dashboards, logs, or alerting systems to monitor agents in real time.
**Crash recovery** — If execution is interrupted (process crash, deployment, anything), the runtime can resume from the last checkpoint. Conversation state and memory are persisted, so the agent picks up where it left off rather than starting over.
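To make the monitoring hook concrete, an event consumer might look like the sketch below. The event shapes and field names are assumptions, not the runtime's documented API.
```python
def page_oncall(event: dict) -> None:
    print("ALERT:", event)  # stand-in for your real alerting integration

def on_event(event: dict) -> None:
    # Route runtime events to whatever observability stack you run.
    if event["type"] == "llm_call":
        print(f"{event['node']}: {event['tokens']} tokens, ${event['cost']:.4f}")
    elif event["type"] == "node_failed":
        print(f"retrying {event['node']}: {event['reason']}")
    elif event["type"] == "constraint_violation":
        page_oncall(event)  # hard violations should wake someone up
```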
## The Big Picture
The worker agent model is Hive's answer to a simple question: how do you run AI agents like you'd run a team?
You hire for a role (define the goal), you onboard them with context (provide tools, credentials, domain knowledge), you set expectations (success criteria and constraints), you let them work independently (headless execution), and you check in when something unusual comes up (HITL). When they're not performing well, you don't debug them line by line — you evolve them (see [Evolution](./evolution.md)).
