Compare commits

...

177 Commits

Author SHA1 Message Date
Richard Tang cd014e41e4 docs: update links in the README.md 2026-02-06 12:44:34 -08:00
Richard Tang 830f11c47d docs: add key concept section 2026-02-06 12:41:22 -08:00
Timothy @aden b22be7a6cb Merge pull request #3818 from TimothyZhang7/main
(micro-fix)(skills): cursor skill symlinks to claude skill
2026-02-06 09:32:23 -08:00
Timothy @aden 5179677e8f Merge pull request #3744 from adenhq/chore/update-hive-credential
(micro-fix): update hive-credentials
2026-02-05 18:55:19 -08:00
bryan 2c25b2eae7 Merge branch 'main' into chore/update-hive-credential 2026-02-05 18:45:11 -08:00
RichardTang-Aden f6705fe2d3 Merge pull request #3746 from RichardTang-Aden/integration-ci
(micro-fix)(chore): fix format
2026-02-05 18:36:32 -08:00
Richard Tang c2771fed20 chore: fix format 2026-02-05 18:30:50 -08:00
RichardTang-Aden fc781eccd9 Merge pull request #3745 from RichardTang-Aden/integration-ci
(micro-fix)(chore): fix lint
2026-02-05 18:15:38 -08:00
bryan d5a25ae081 update hive-credentials 2026-02-05 18:13:25 -08:00
Richard Tang 23b6fb6391 chore: fix lint 2026-02-05 18:12:47 -08:00
Timothy 433967f0cf fix: cursor skill symlinks to claude skill 2026-02-05 18:11:24 -08:00
RichardTang-Aden 2a876c2a10 Merge pull request #3743 from RichardTang-Aden/integration-ci
feat(ci): add integration credential specs and CI validation
2026-02-05 18:06:22 -08:00
Richard Tang ff0adeaba7 docs: update outdated skill references 2026-02-05 18:00:06 -08:00
Richard Tang 846edbf256 docs: update documentation structure 2026-02-05 18:00:04 -08:00
Richard Tang c68dd48f6d feat: add slack credential spec and contribution doc 2026-02-05 17:39:44 -08:00
Richard Tang 50c0a5da9e feat: integration credentials implementation check 2026-02-05 17:06:34 -08:00
Timothy @aden 2f0e5c42f1 Merge pull request #3724 from TimothyZhang7/main
docs(hive): hive commands rebrand
2026-02-05 15:06:25 -08:00
Timothy @aden 903288468a Merge pull request #3725 from adenhq/chore/gmail-to-google
(micro-fix): changing gmail to google
2026-02-05 14:54:18 -08:00
bryan 9e3bba6f59 updated tests 2026-02-05 14:52:19 -08:00
bryan bc16f0752f changing gmail to google 2026-02-05 14:46:38 -08:00
Timothy 86badd70fa docs(hive): hive commands rebrand 2026-02-05 14:35:50 -08:00
Timothy @aden ce5379516c Merge pull request #3722 from TimothyZhang7/main
docs(templates): put example templates in there
2026-02-05 14:31:50 -08:00
Timothy a50078bbf2 chore: moves the templates 2026-02-05 14:25:49 -08:00
Timothy 2cef168442 fix: aden hive url 2026-02-05 14:08:18 -08:00
Timothy @aden 0a1a9e3545 Merge pull request #3720 from TimothyZhang7/feature/example-agent-registry
docs(skills): Rename skills to hive-* namespace and improve create workflow
2026-02-05 13:59:45 -08:00
Timothy 3c8682d80c fix: mention of skill in readme 2026-02-05 13:59:02 -08:00
Timothy ecc5a1608f fix: make sure of the skill ordering 2026-02-05 13:54:20 -08:00
RichardTang-Aden bc81b55600 Merge pull request #3713 from adenhq/update/gmail-send-tool
(micro-fix): created gmail send tool
2026-02-05 13:15:08 -08:00
Timothy 28b628c1b4 fix: update skill names and examples 2026-02-05 13:13:19 -08:00
Timothy 148264ac73 fix: skill problems 2026-02-05 13:11:18 -08:00
bryan 4046e4e379 created gmail send tool 2026-02-05 13:10:47 -08:00
Timothy 28298d9af2 fix: streamline the executor configuration and data tool usage 2026-02-05 12:50:00 -08:00
Anshumaan Saraf 34782a6b85 docs(CONTRIBUTING): add upstream sync steps (#3477)
Fixes #2692

Added steps to configure the upstream remote and sync the main branch
before creating a feature branch. This helps contributors avoid starting
from stale code and reduces merge conflicts.
2026-02-05 16:28:07 +08:00
Patrick d25d94e71b docs(aden-credential-sync): typo (#3601) 2026-02-05 16:11:13 +08:00
Timothy @aden 51f1b449cd Merge pull request #3584 from TimothyZhang7/main
fix: gap between lint and format
2026-02-04 21:05:22 -08:00
Timothy 804e47dde4 fix: gap between lint and format 2026-02-04 21:02:50 -08:00
Timothy @aden 582c810d15 Merge pull request #3583 from TimothyZhang7/main
fix: test case
2026-02-04 20:59:58 -08:00
Timothy cede629718 fix: test case 2026-02-04 20:53:53 -08:00
Timothy @aden 10941dc7fc Merge pull request #3579 from TimothyZhang7/fix/do-not-mention-deprecated-nodes
Release / Create Release (push) Waiting to run
fix: mentions of deprecated nodes in agent builder
2026-02-04 20:46:10 -08:00
Timothy c1c16878e4 fix: mentions of deprecated nodes in agent builder 2026-02-04 20:42:16 -08:00
Timothy @aden 80a41b434b Merge pull request #3240 from levxn/main
Integration: Advanced Slack MCP tools Integration (~45+ tools), compatibility and working checked with all other existing tools
2026-02-04 20:34:24 -08:00
Timothy 9a8e117f1d chore: fix lint and tests 2026-02-04 20:30:47 -08:00
Bryan @ Aden 878603033a Merge pull request #3573 from TimothyZhang7/feature/quickstart-credential-store
feat: credential store init, add textual dep, standardize uv commands
2026-02-04 20:20:52 -08:00
Timothy 1c6f17e8db Merge remote-tracking branch 'origin/main' into feature/quickstart-credential-store 2026-02-04 20:03:44 -08:00
Timothy 8f32ef8064 chore: uv for all 2026-02-04 19:57:41 -08:00
Bryan @ Aden e12bc96e21 Merge pull request #3557 from TimothyZhang7/feature/tui-dashboard
feat: TUI dashboard, EventLoopNode refinements, auto-creation, data tools, and runner overhaul
2026-02-04 19:04:41 -08:00
RichardTang-Aden 2355d3d729 Merge pull request #3525 from Acid-OP/fix/on-failure-edge-routing
fix: follow ON_FAILURE edges when node fails after max retries
2026-02-04 17:00:35 -08:00
Richard Tang a093a59cb0 test: add tests for ON_FAILURE edge routing after max retries
Covers:
- ON_FAILURE edge followed when node fails after max retries
- Original termination behavior preserved when no ON_FAILURE edge exists
- ON_FAILURE edge not followed on success (only ON_SUCCESS fires)
- ON_FAILURE routing with max_retries=0 (no retries)
- Failure handler appears in execution path and node_visit_counts
2026-02-04 16:58:30 -08:00
Gaurav Kapur d7917988c3 fix: follow ON_FAILURE edges when node fails after max retries 2026-02-04 16:51:46 -08:00
Timothy @aden ae566a2027 Merge pull request #2652 from mubarakar95/feature/tui-dashboard
feat: Implement production-ready TUI with interactive agent execution, real-time monitoring, and screenshot support
2026-02-04 15:43:54 -08:00
Timothy b15473d3f3 fix: graph validation 2026-02-04 15:32:53 -08:00
RichardTang-Aden 265bf885ec Merge pull request #3556 from RichardTang-Aden/remove-unused-scripts
chore(scripts): remove deprecated setup scripts
2026-02-04 15:07:12 -08:00
Richard Tang e318281989 refactor: clean up the use of setup-python 2026-02-04 14:49:39 -08:00
Timothy 3e2a11d60d feat: integrate agent builder with tui 2026-02-04 14:47:28 -08:00
Timothy @aden 4b9f73310e Merge branch 'main' into feature/tui-dashboard 2026-02-04 10:53:43 -08:00
Levin b17c26116d Merge branch 'adenhq:main' into main 2026-02-04 23:46:35 +05:30
Timothy @aden 3114af75e4 Merge pull request #3536 from TimothyZhang7/feature/coding-agent-reskill
feat: update skills and agent builder tools, bump pinned ruff version
2026-02-04 10:11:58 -08:00
Levin 7a6d10639b Merge branch 'main' into main 2026-02-04 23:39:17 +05:30
Timothy 6ff29ea6aa fix: max retry go back to 0 for event loop node 2026-02-04 10:08:40 -08:00
Timothy a23f01973a feat: update skills and agent builder tools, bump pinned ruff version 2026-02-04 09:52:28 -08:00
bryan 0aaa3a3eca uv.lock update 2026-02-04 07:57:22 -08:00
bryan 82f05d1102 Merge branch 'main' into feature/tui-dashboard 2026-02-04 07:49:08 -08:00
Bryan @ Aden 8ff6d9c8bd Merge pull request #3423 from adenhq/event-loop-arch
Event Loop Architecture: Streaming Multi-Turn Agent Nodes
2026-02-04 07:43:56 -08:00
bryan a2e102fe15 windows lint fix 2026-02-03 20:00:11 -08:00
Timothy 119280da1a Merge remote-tracking branch 'upstream/main' into event-loop-arch
Resolve conflict in tools/mcp_server.py: take main's
CredentialStoreAdapter.default() which encapsulates the same
CompositeStorage logic our branch had inline.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:42:47 -08:00
RichardTang-Aden 4d49f74d5a Merge pull request #3372 from ranjithkumar9343/ranjithkumar9343-patch-1
docs: add Windows compatibility warning
2026-02-03 19:36:50 -08:00
Timothy 6a42b9c66b fix: resolve CI failures in lint and tests
- Fix max_node_visits blocking executor retries: the visit count was
  incremented on every loop iteration including retries, causing nodes
  with max_node_visits=1 (default) to be skipped on retry. Added
  _is_retry flag to distinguish retries from new visits via edge
  traversal.

- Fix 20 UP042 lint errors: replace (str, Enum) with StrEnum across
  14 files. Python 3.11+ StrEnum is preferred and enforced by ruff.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:36:35 -08:00
RichardTang-Aden fc4a39480a Merge pull request #3455 from adenhq/feat/non-oauth-setup
update credentials store to work with non-oauth keys
2026-02-03 19:31:46 -08:00
Timothy b98afb01c8 chore: lint 2026-02-03 18:01:39 -08:00
Timothy ccd6bb7656 chore: lint 2026-02-03 17:57:02 -08:00
bryan ea30e5c631 consolidate workspace to uv monorepo 2026-02-03 17:47:57 -08:00
Timothy d16a3c3b22 chore: give comand line to demo 2026-02-03 17:38:21 -08:00
Timothy a03bd78c2e fix: formulate tool call results 2026-02-03 17:25:44 -08:00
bryan 3cca41aab1 updating tests 2026-02-03 15:17:19 -08:00
bryan d19aaed946 fixing linter issues 2026-02-03 14:55:51 -08:00
RichardTang-Aden 9a7db8cf94 Merge pull request #3450 from adenhq/adenhq-patch-1
(micro-fix): Update issue templates
2026-02-03 14:53:09 -08:00
bryan f50630c551 update credentials store to work with non-oauth keys 2026-02-03 14:49:12 -08:00
Timothy 0ef2e64733 fix: use mcp tools properly 2026-02-03 14:36:09 -08:00
Aden HQ 3a8e121d43 Update issue templates 2026-02-03 14:24:13 -08:00
bryan 23e249144d Merge remote-tracking branch 'origin/main' into feature/tui-dashboard
# Conflicts:
#	.github/workflows/ci.yml
2026-02-03 12:26:51 -08:00
bryan 25014bfa89 update ci to use uv, updated linting 2026-02-03 12:14:13 -08:00
bryan 78ea585779 tui upgrades 2026-02-03 11:55:22 -08:00
bryan ac13c11f89 Merge remote-tracking branch 'origin/event-loop-arch' into feature/tui-dashboard 2026-02-03 11:06:44 -08:00
Timothy 51d341b88c fix: tool pruning logic 2026-02-03 10:29:35 -08:00
Timothy 7dd70b8e31 feat: tool truncation 2026-02-03 08:50:14 -08:00
ranjithkumar9343 84b332d989 docs: add Windows compatibility warning
Added a note for Windows users to use WSL/Git Bash to prevent setup errors.
2026-02-03 17:43:36 +05:30
Levin fd1826a267 Merge branch 'adenhq:main' into main 2026-02-03 16:11:05 +05:30
Anjali Yadav bcc6848275 fix(mcp): handle missing exports directory in test generation tools (#3066)
* fix(mcp): handle missing exports path in test generation tools

* fix(mcp): centralize agent path validation across test tools

* fix: remove duplicate if blocks and improve error hint message

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-03 18:15:34 +08:00
Hundao 75dd053a40 fix(ci): migrate remaining CI jobs from pip to uv (#3366)
Closes #3363
2026-02-03 18:11:32 +08:00
Yogesh Sakharam Diwate 20f2aa09f2 docs(readme): update architecture diagram src path (#3348) 2026-02-03 14:58:29 +08:00
RichardTang-Aden fb8c810b3d Merge pull request #3293 from Antiarin/feat/llm-datetime-context
feat: inject runtime datetime into LLM system prompts
2026-02-02 20:07:00 -08:00
levxn b99b6c5cd3 latest upstream 2026-02-03 09:35:38 +05:30
RichardTang-Aden ad21cf4243 Merge pull request #3327 from adenhq/fix-for-quickstart
fix: replace the use of PYTHON_CMD in quickstart.sh
2026-02-02 19:35:19 -08:00
Richard Tang 1e45cfff67 feat: Check for litellm import in both CORE_PYTHON and TOOLS_PYTHON environments. 2026-02-02 19:27:32 -08:00
Richard Tang 0280600a47 fix: fix error when test import installation 2026-02-02 19:24:13 -08:00
RichardTang-Aden 571ad518dc Merge pull request #3064 from kuldeepgaur02/fix/quickstart-provider-selection
fix: silent exit when selecting non-Anthropic LLM provider
2026-02-02 19:03:16 -08:00
RichardTang-Aden fe37a25cf1 Merge pull request #3322 from RichardTang-Aden/main
fix: Resolve quickstart.sh compatibility issues and migrate from pip to uv
2026-02-02 18:55:00 -08:00
Timothy e06138628c chore: remove local claude settings 2026-02-02 18:48:18 -08:00
Timothy 1ed0edd158 Merge remote-tracking branch 'upstream/main' into event-loop-arch
# Conflicts:
#	.claude/settings.local.json
2026-02-02 18:46:07 -08:00
Richard Tang 49dbc46082 feat: migrate from pip to uv 2026-02-02 18:45:09 -08:00
Richard Tang a16a4adc09 feat: add message when LLM key is not available 2026-02-02 18:41:20 -08:00
Richard Tang b4ab1cbd56 fix: fix quickstart competibility 2026-02-02 18:34:09 -08:00
Timothy 6faa63f0d0 fix: loop prevention in feedback edges 2026-02-02 18:26:45 -08:00
bryan f4737dcfe7 Merge remote-tracking branch 'origin/event-loop-arch' into feature/tui-dashboard 2026-02-02 18:25:46 -08:00
Richard Tang 2b44af427f fix: quickstart.sh competibility fix 2026-02-02 18:21:40 -08:00
RichardTang-Aden 11f7401bc2 Merge branch 'adenhq:main' into main 2026-02-02 18:00:40 -08:00
RichardTang-Aden db7b5180dd Merge pull request #3270 from adenhq/bot-detecting-issue-size
feat: Edit bot prompt to be able to decide on the technical size of issue
2026-02-02 17:49:52 -08:00
Timothy @aden 5b4e56252c Merge pull request #2820 from lakshitaa-chellaramani/feature/github-tool
Feature/GitHub tool
2026-02-02 17:49:34 -08:00
Timothy e3c71f77de chore: fix ruff format 2026-02-02 17:37:37 -08:00
Timothy b09824faec chore: fix lint 2026-02-02 17:36:02 -08:00
RichardTang-Aden c69bc24598 Merge pull request #3301 from adenhq/add-example-structure
docs: sample agent folder, remove docker file in Readme
2026-02-02 16:47:04 -08:00
Richard Tang 0cf17e1c63 feat: sample agent folder, remove docker file in Readme 2026-02-02 14:15:58 -08:00
Timothy @aden feac803491 Merge pull request #3256 from adenhq/feat/integration-tests
Feat/integration tests
2026-02-02 13:23:42 -08:00
Timothy 4aacec30d8 fix: text delta granularity, tool limit problem 2026-02-02 13:21:50 -08:00
RichardTang-Aden b459a2f7a9 Merge pull request #918 from Siddharth2624/fix-malformed-json-tool-args
Handle malformed JSON tool arguments in LiteLLMProvider
2026-02-02 13:04:13 -08:00
bryan ca7f6d3514 fixes to linting 2026-02-02 12:52:11 -08:00
Antiarin ca8ede65f0 feat: inject runtime datetime into LLM system prompts 2026-02-03 02:10:14 +05:30
bryan b033c56ae5 Merge remote-tracking branch 'origin/main' into feature/tui-dashboard 2026-02-02 12:29:27 -08:00
Richard Tang 9a177c46e1 feat: edit bot prompt to be able to decide on the technical size of issue 2026-02-02 11:39:44 -08:00
bryan d49e858d32 lint update 2026-02-02 11:12:09 -08:00
Timothy @aden 20bea9cd7f Merge pull request #2273 from krish341360/fix/concurrent-storage-race-condition
Release / Create Release (push) Waiting to run
fix: race condition in ConcurrentStorage and cache invalidation bug
2026-02-02 10:57:45 -08:00
bryan d7afa5dcf2 wp-12 2026-02-02 10:41:12 -08:00
Timothy 22e816bf86 chore: update gitignore 2026-02-02 10:30:03 -08:00
krish341360 a7709d489c style: apply ruff formatting to test file 2026-02-02 23:53:27 +05:30
Timothy @aden 3240616808 Merge pull request #3250 from adenhq/feat/validation-client-facing
(micro-fix): added graph validation for client-facing nodes [WP-10]
2026-02-02 10:02:38 -08:00
krish341360 18dfc997b8 fix: resolve lint errors in test file 2026-02-02 23:24:21 +05:30
Timothy @aden 92d0b6addf Merge pull request #3050 from Rockysahu704/rocky-first-contribution
docs: clearify who Hive is for and when to use it
2026-02-02 09:51:05 -08:00
Timothy @aden b9f83d4d61 Merge pull request #3244 from TimothyZhang7/feature/aden-sync-by-provider
Feature/aden sync by provider
2026-02-02 09:39:00 -08:00
levxn 694feaffd2 phase 3 tools implemented, totals now to 45+ tools for Slack for multipurpose integration including CRM support 2026-02-02 22:44:10 +05:30
Timothy @aden 9c16826ad3 Merge pull request #3137 from adenhq/feat/clientIO-gateway
implemented clientIO gateway [WP-9]
2026-02-02 07:29:03 -08:00
levxn eb68e2143b slack tools add ons and latest upstream commit 2026-02-02 19:07:05 +05:30
Harsh Kishorani f305745295 feat(setup): add native PowerShell setup script for Windows (#746)
* feat(setup): add PowerShell setup script with venv for Windows

* docs: restore PEP 668 troubleshooting section

* docs: restore Alpine Linux setup section

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-02 17:02:19 +08:00
Timothy df4d0ad3fd feat: aden provider credential store by provider 2026-02-01 20:34:21 -08:00
bryan 9034d1dc71 lint fix 2026-02-01 20:26:36 -08:00
bryan 537172d8ce implemented clientIO gateway [WP-9] 2026-02-01 20:23:26 -08:00
Timothy 20b2e4b3dd fix: robust compaction logic 2026-02-01 19:59:27 -08:00
RichardTang-Aden fc22586752 Merge pull request #3128 from adenhq/fix/tests
(micro-fix): fixed pytests and warnings
2026-02-01 19:53:07 -08:00
Richard Tang 646440eba3 chore: update developer doc 2026-02-01 19:49:35 -08:00
Richard Tang 53e5579326 fix: remove requirements.txt 2026-02-01 19:45:32 -08:00
Richard Tang 29a1630d0f feat: add tool tests in CI 2026-02-01 19:38:33 -08:00
bryan 171f4ab2ae fixed pytests and warnings 2026-02-01 19:11:44 -08:00
Timothy @aden a86043a2ec Merge pull request #3127 from TimothyZhang7/feature/event-loop-wp8
Feature/event loop wp8
2026-02-01 19:07:33 -08:00
Timothy 3947da2cf1 Merge upstream/event-loop-arch into feature/event-loop-wp8
Brings in upstream changes: email tool, csv/pdf fixes, docs updates,
agent builder export atomicity fix, JSON extraction validation bugfix.
No conflicts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:06:30 -08:00
Timothy 17caab6563 feature: remove hard failure on schema mismatch for context hand off 2026-02-01 18:55:41 -08:00
Timothy @aden a5ae071a03 Merge pull request #723 from trinh31201/bugfix/json-extraction-validation
micro-fix(graph): validate LLM JSON extraction to prevent empty/fabricated data
2026-02-01 18:51:51 -08:00
Timothy 94d31743b0 fix: sync with wp7 2026-02-01 18:14:04 -08:00
Timothy 70db618c6e feat: event loop node implementation 2026-02-01 17:16:18 -08:00
levxn 960a4549ef latest upstream v1 2026-02-01 19:03:39 +05:30
Kuldeep Raj Gour 363a650dfa fix: silent exit when selecting non-Anthropic LLM provider
prompt_choice used return codes to pass selections. Combined with set -e,
non-zero returns (options 2-6) caused immediate script exit.

Fix: Use global variable PROMPT_CHOICE instead of return codes.
2026-02-01 15:51:46 +05:45
Rocky Sahu b6e2634537 docs: clearify who Hive is for and when to use it 2026-02-01 14:38:02 +05:30
Siddharth Varshney 9f424f2fc0 Remove unused Fake* classes and unrelated note block
- Remove unused FakeFunction, FakeToolCall, FakeMessage, FakeChoice, FakeResponse classes from test_litellm_provider.py
- Remove unrelated note block from building-production-ai-agents.md
- Fix lint issues (trailing whitespace)
2026-01-31 20:56:52 +00:00
levxn 25989d9f90 slack add ons v3, now ~30 tools 2026-02-01 01:29:32 +05:30
Lakshitaa Chellaramani 0715fc5498 Merge branch 'main' into feature/github-tool 2026-01-31 23:31:22 +05:30
lakshitaa f9fddd6663 fix(github-tool): Address PR feedback - security and integration fixes
Addresses all blockers and suggestions from code review:

**Blockers fixed:**
1. Register tools in tools/__init__.py - Added import, registration call,
   and all 13 tool names to return list
2. Add credential spec - Created GitHub entry in credentials/integrations.py
   with env_var, tools list, help URL, and health check config
3. Move tests to correct location - Relocated from
   tools/src/.../github_tool/tests/ to tools/tests/tools/test_github_tool.py
4. Removed .claude/settings.local.json from PR

**Security improvements:**
1. URL parameter sanitization - Added _sanitize_path_param() to reject
   path traversal attempts (/ or ..) in owner, repo, branch, username params
2. Error message sanitization - Added _sanitize_error_message() to prevent
   token leaks from httpx.RequestError exceptions

All 38 tests passing.
2026-01-31 23:26:33 +05:30
levxn 684da96a83 slack add ons v2, now 15 tools in total 2026-01-31 20:39:38 +05:30
levxn abae7979cb excluded a file not needed 2026-01-31 18:37:35 +05:30
levxn 49bce57fcf slack bot integration v1 2026-01-31 18:35:27 +05:30
Richard Tang 63d017fc21 fix: bash version support 2026-01-30 20:32:56 -08:00
Timothy c52ce6bb49 Merge branch 'feature/event-loop-framework' into test/wp1-wp2-wp6-combined 2026-01-30 16:34:12 -08:00
Timothy bcddd4ce77 Merge branch 'feature/credential-manager-aden-provider' into test/wp1-wp2-wp6-combined 2026-01-30 16:30:54 -08:00
Timothy 017872f71b feat: emit bus events 2026-01-30 16:27:39 -08:00
lakshitaa bfb660275e feat(tools): Add GitHub tool for repository and issue management
Implements comprehensive GitHub REST API v3 integration with 15 MCP tools
for managing repositories, issues, pull requests, code search, and branches.

Features:
- Repository management (list, get, search repos)
- Issue operations (create, update, close, list issues)
- Pull request management (create, list, get PRs)
- Code search across GitHub
- Branch operations (list, get branch info)

Technical details:
- 15 MCP tools organized in 5 categories
- 38 comprehensive tests with mocking (all passing)
- Full credential store support (env var + CredentialStoreAdapter)
- Proper error handling (timeout, network, API errors)
- Follows HubSpot/Slack tool patterns exactly

Files:
- tools/src/aden_tools/tools/github_tool/github_tool.py (757 lines)
- tools/src/aden_tools/tools/github_tool/tests/test_github_tool.py (628 lines)
- tools/src/aden_tools/tools/github_tool/README.md (646 lines)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-31 04:32:46 +05:30
lakshitaa d6ae48bc58 Merge upstream/main 2026-01-31 03:19:12 +05:30
Timothy 7e670ce0a8 feat: event loop WP1-4 2026-01-30 11:43:19 -08:00
mubarakar95 d32308b6d2 Add TUI enhancements: screenshot feature, header polish, and keybinding updates
- Implement SVG screenshot functionality (Ctrl+S)
- Remove header icon and disable expansion
- Hide borders in screenshots for clean output
- Change command palette to Ctrl+O
- Make screenshot shortcut work globally (priority binding)
2026-01-30 16:56:34 +05:30
mubarakar95 604d16e353 Enhance TUI: Fix rendering, polish layout, and clean up header 2026-01-30 15:40:15 +05:30
mubarakar95 db577785d6 feat: Implement fully functional TUI dashboard
- Fix ScreenStackError crash by moving runtime init inside async context
- Implement selectable logging with TextArea widget
- Create interactive ChatREPL for agent execution
- Optimize 3-pane layout: logs/graph on left (60%), chat on right (40%)
- Add thread-safe event handling with call_from_thread
- Add TUI selection guide documentation
All features tested and working.
2026-01-30 11:49:04 +05:30
mubarakar95 c9ae3a0541 feat: finalize TUI with minimal stable mode
The TUI feature is fully functional with a minimal, stable interface:
- Header with title
- Central display area
- Footer with keybindings

This provides a foundation for future enhancements. Custom widgets
(LogPane, GraphOverview, ChatRepl) are available in the codebase
and can be integrated once Textual rendering issues on Windows are
resolved.

The --tui flag successfully launches the TUI dashboard for any agent:
  hive run <agent_path> --tui

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-30 01:31:59 +05:30
mubarakar95 ed95dab9f3 fix: implement lazy widget loading and store references
Defer widget creation from __init__ to compose() and store references
as instance variables to prevent garbage collection and initialization
order issues. This resolves ScreenStackError during TUI startup.

Changes:
- Move LogPane, GraphOverview, ChatRepl creation to compose()
- Store widgets as instance variables (self.log_pane, etc)
- Restore Horizontal/Vertical container layout

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-30 01:27:56 +05:30
mubarakar95 a6536cef94 fix: restore Horizontal/Vertical layout containers in TUI compose
The compose() method was missing the proper container structure
for the layout, which caused initialization to fail. Restored the
Horizontal/Vertical container layout with proper pane structure.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-30 01:22:41 +05:30
mubarakar95 3ccc81e81c feat: add interactive TUI dashboard for agent execution
Implement a Textual-based terminal UI for the hive framework that displays:
- Agent execution status and progress
- Log output in real-time
- Graph visualization of agent execution flow
- Interactive REPL for user input/feedback

Changes:
- Add core/framework/tui/ module with AdenTUI app and custom widgets
- Add LogPane widget for streaming log output
- Add GraphOverview widget for execution graph visualization
- Add ChatRepl widget for interactive user input
- Add TUILogHandler for capturing framework logs to TUI
- Update cli.py to support --tui flag for launching dashboard
- Update runner.py to enable AgentRuntime when TUI is active
- Fix missing Textual container imports (Horizontal, Vertical, Container)
- Fix race conditions in async TUI initialization
- Fix threading issues in app event handling

The TUI is launched via: hive run <agent_path> --tui

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-30 01:19:28 +05:30
krish341360 94197cbcb9 fix: race condition in ConcurrentStorage and cache invalidation bug
- Fix race condition: cache now updates only after successful write
- Fix cache invalidation: summary cache invalidated on save_run()
- Add 4 tests to verify the fixes
2026-01-29 16:35:38 +05:30
trinh31201 3ee6d98905 fix(graph): validate LLM JSON extraction to prevent empty/fabricated data 2026-01-29 01:04:08 +07:00
Siddharth Varshney a96cd546c8 Merge branch 'main' into fix-malformed-json-tool-args 2026-01-28 15:35:33 +05:30
Siddharth Varshney eb33d4f1c2 Remove duplicate malformed JSON tool-call test 2026-01-28 09:57:27 +00:00
Siddharth Varshney 4253956326 Handle malformed JSON tool arguments safely 2026-01-28 09:49:17 +00:00
Siddharth Varshney d6b05bf337 Handle malformed JSON tool arguments in LiteLLMProvider 2026-01-26 23:27:32 +00:00
221 changed files with 41751 additions and 4250 deletions
-40
View File
@@ -1,40 +0,0 @@
{
"permissions": {
"allow": [
"Bash(npm install:*)",
"Bash(npm test:*)",
"Skill(building-agents-construction)",
"Skill(building-agents-construction:*)",
"Bash(PYTHONPATH=core:exports pytest:*)",
"mcp__agent-builder__create_session",
"mcp__agent-builder__get_session_status",
"mcp__agent-builder__set_goal",
"mcp__agent-builder__list_mcp_servers",
"mcp__agent-builder__test_node",
"mcp__agent-builder__add_node",
"mcp__agent-builder__add_edge",
"mcp__agent-builder__validate_graph",
"Bash(ruff check:*)",
"Bash(PYTHONPATH=core:exports python:*)",
"mcp__agent-builder__list_tests",
"mcp__agent-builder__generate_constraint_tests",
"Bash(python -m agent:*)",
"Bash(python agent.py:*)",
"Bash(python -c:*)",
"Bash(done)",
"Bash(xargs cat:*)",
"mcp__agent-builder__list_mcp_tools",
"mcp__agent-builder__add_mcp_server",
"mcp__agent-builder__check_missing_credentials",
"mcp__agent-builder__store_credential",
"mcp__agent-builder__list_stored_credentials",
"mcp__agent-builder__delete_stored_credential",
"mcp__agent-builder__verify_credentials",
"Bash(PYTHONPATH=/home/timothy/oss/hive/core:/home/timothy/oss/hive/exports python:*)",
"Bash(PYTHONPATH=core:exports:tools/src python -m hubspot_input:*)",
"mcp__agent-builder__export_graph"
]
},
"enabledMcpjsonServers": ["agent-builder", "tools"],
"enableAllProjectMcpServers": true
}
@@ -1,361 +0,0 @@
---
name: building-agents-construction
description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: procedural
part_of: building-agents
requires: building-agents-core
---
# Agent Construction - EXECUTE THESE STEPS
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.**
When this skill is loaded, IMMEDIATELY begin executing Step 1. Do not explain what you will do - just do it.
---
## STEP 1: Initialize Build Environment
**EXECUTE THESE TOOL CALLS NOW:**
1. Register the hive-tools MCP server:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="python",
args='["mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
```
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
```
mcp__agent-builder__create_session(name="AGENT_NAME")
```
3. Discover available tools:
```
mcp__agent-builder__list_mcp_tools()
```
4. Create the package directory:
```
mkdir -p exports/AGENT_NAME/nodes
```
**AFTER completing these calls**, tell the user:
> ✅ Build environment initialized
>
> - Session created
> - Available tools: [list the tools from step 3]
>
> Proceeding to define the agent goal...
**THEN immediately proceed to STEP 2.**
---
## STEP 2: Define and Approve Goal
**PROPOSE a goal to the user.** Based on what they asked for, propose:
- Goal ID (kebab-case)
- Goal name
- Goal description
- 3-5 success criteria (each with: id, description, metric, target, weight)
- 2-4 constraints (each with: id, description, constraint_type, category)
**FORMAT your proposal as a clear summary, then ask for approval:**
> **Proposed Goal: [Name]**
>
> [Description]
>
> **Success Criteria:**
>
> 1. [criterion 1]
> 2. [criterion 2]
> ...
>
> **Constraints:**
>
> 1. [constraint 1]
> 2. [constraint 2]
> ...
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve this goal definition?",
"header": "Goal",
"options": [
{"label": "Approve", "description": "Goal looks good, proceed"},
{"label": "Modify", "description": "I want to change something"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 3
- If **Modify**: Ask what they want to change, update proposal, ask again
---
## STEP 3: Design Node Workflow
**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist.
**DESIGN the workflow** as a series of nodes. For each node, determine:
- node_id (kebab-case)
- name
- description
- node_type: `"llm_generate"` (no tools) or `"llm_tool_use"` (uses tools)
- input_keys (what data this node receives)
- output_keys (what data this node produces)
- tools (ONLY tools that exist - empty list for llm_generate)
- system_prompt
**PRESENT the workflow to the user:**
> **Proposed Workflow: [N] nodes**
>
> 1. **[node-id]** - [description]
>
> - Type: [llm_generate/llm_tool_use]
> - Input: [keys]
> - Output: [keys]
> - Tools: [tools or "none"]
>
> 2. **[node-id]** - [description]
> ...
>
> **Flow:** node1 → node2 → node3 → ...
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve this workflow design?",
"header": "Workflow",
"options": [
{"label": "Approve", "description": "Workflow looks good, proceed to build nodes"},
{"label": "Modify", "description": "I want to change the workflow"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Proceed to STEP 4
- If **Modify**: Ask what they want to change, update design, ask again
---
## STEP 4: Build Nodes One by One
**FOR EACH node in the approved workflow:**
1. **Call** `mcp__agent-builder__add_node(...)` with the node details
- input_keys and output_keys must be JSON strings: `'["key1", "key2"]'`
- tools must be a JSON string: `'["tool1"]'` or `'[]'`
2. **Call** `mcp__agent-builder__test_node(...)` to validate:
```
mcp__agent-builder__test_node(
node_id="the-node-id",
test_input='{"key": "test value"}',
mock_llm_response='{"output_key": "test output"}'
)
```
3. **Check result:**
- If valid: Tell user "✅ Node [id] validated" and continue to next node
- If invalid: Show errors, fix the node, re-validate
4. **Show progress** after each node:
```
mcp__agent-builder__get_session_status()
```
> ✅ Node [X] of [Y] complete: [node-id]
**AFTER all nodes are added and validated**, proceed to STEP 5.
---
## STEP 5: Connect Edges
**DETERMINE the edges** based on the workflow flow. For each connection:
- edge_id (kebab-case)
- source (node that outputs)
- target (node that receives)
- condition: `"on_success"`, `"always"`, `"on_failure"`, or `"conditional"`
- condition_expr (Python expression, only if conditional)
- priority (integer, lower = higher priority)
**FOR EACH edge, call:**
```
mcp__agent-builder__add_edge(
edge_id="source-to-target",
source="source-node-id",
target="target-node-id",
condition="on_success",
condition_expr="",
priority=1
)
```
**AFTER all edges are added, validate the graph:**
```
mcp__agent-builder__validate_graph()
```
- If valid: Tell user "✅ Graph structure validated" and proceed to STEP 6
- If invalid: Show errors, fix edges, re-validate
---
## STEP 6: Generate Agent Package
**EXPORT the graph data:**
```
mcp__agent-builder__export_graph()
```
This returns JSON with all the goal, nodes, edges, and MCP server configurations.
**THEN write the Python package files** using the exported data. Create these files in `exports/AGENT_NAME/`:
1. `config.py` - Runtime configuration with model settings
2. `nodes/__init__.py` - All NodeSpec definitions
3. `agent.py` - Goal, edges, graph config, and agent class
4. `__init__.py` - Package exports
5. `__main__.py` - CLI interface
6. `mcp_servers.json` - MCP server configurations
7. `README.md` - Usage documentation
**IMPORTANT entry_points format:**
- MUST be: `{"start": "first-node-id"}`
- NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
- NOT: `{"first-node-id"}` (WRONG - this is a set)
**Use the example agent** at `.claude/skills/building-agents-construction/examples/online_research_agent/` as a template for file structure and patterns.
**AFTER writing all files, tell the user:**
> ✅ Agent package created: `exports/AGENT_NAME/`
>
> **Files generated:**
>
> - `__init__.py` - Package exports
> - `agent.py` - Goal, nodes, edges, agent class
> - `config.py` - Runtime configuration
> - `__main__.py` - CLI interface
> - `nodes/__init__.py` - Node definitions
> - `mcp_servers.json` - MCP server config
> - `README.md` - Usage documentation
>
> **Test your agent:**
>
> ```bash
> cd /home/timothy/oss/hive
> PYTHONPATH=core:exports python -m AGENT_NAME validate
> PYTHONPATH=core:exports python -m AGENT_NAME info
> ```
---
## STEP 7: Verify and Test
**RUN validation:**
```bash
cd /home/timothy/oss/hive && PYTHONPATH=core:exports python -m AGENT_NAME validate
```
- If valid: Agent is complete!
- If errors: Fix the issues and re-run
**SHOW final session summary:**
```
mcp__agent-builder__get_session_status()
```
**TELL the user the agent is ready** and suggest next steps:
- Run with mock mode to test without API calls
- Use `/testing-agent` skill for comprehensive testing
- Use `/setup-credentials` if the agent needs API keys
---
## REFERENCE: Node Types
| Type | tools param | Use when |
| -------------- | ---------------------- | ---------------------------------------------- |
| `llm_generate` | `'[]'` | Pure reasoning, JSON output, no external calls |
| `llm_tool_use` | `'["tool1", "tool2"]'` | Needs to call MCP tools |
---
## REFERENCE: Edge Conditions
| Condition | When edge is followed |
| ------------- | ------------------------------------- |
| `on_success` | Source node completed successfully |
| `on_failure` | Source node failed |
| `always` | Always, regardless of success/failure |
| `conditional` | When condition_expr evaluates to True |
---
## REFERENCE: System Prompt Best Practice
For nodes with JSON output, include this in the system_prompt:
```
CRITICAL: Return ONLY raw JSON. NO markdown, NO code blocks.
Just the JSON object starting with { and ending with }.
Return this exact structure:
{
"key1": "...",
"key2": "..."
}
```
---
## COMMON MISTAKES TO AVOID
1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
3. **Skipping validation** - Always validate nodes and graph before proceeding
4. **Not waiting for approval** - Always ask user before major steps
5. **Displaying this file** - Execute the steps, don't show documentation
@@ -1,80 +0,0 @@
# Online Research Agent
Deep-dive research agent that searches 10+ sources and produces comprehensive narrative reports with citations.
## Features
- Generates multiple search queries from a topic
- Searches and fetches 15+ web sources
- Evaluates and ranks sources by relevance
- Synthesizes findings into themes
- Writes narrative report with numbered citations
- Quality checks for uncited claims
- Saves report to local markdown file
## Usage
### CLI
```bash
# Show agent info
python -m online_research_agent info
# Validate structure
python -m online_research_agent validate
# Run research on a topic
python -m online_research_agent run --topic "impact of AI on healthcare"
# Interactive shell
python -m online_research_agent shell
```
### Python API
```python
from online_research_agent import default_agent
# Simple usage
result = await default_agent.run({"topic": "climate change solutions"})
# Check output
if result.success:
print(f"Report saved to: {result.output['file_path']}")
print(result.output['final_report'])
```
## Workflow
```
parse-query → search-sources → fetch-content → evaluate-sources
write-report ← synthesize-findings
quality-check → save-report
```
## Output
Reports are saved to `./research_reports/` as markdown files with:
1. Executive Summary
2. Introduction
3. Key Findings (by theme)
4. Analysis
5. Conclusion
6. References
## Requirements
- Python 3.11+
- LLM provider API key (Groq, Cerebras, etc.)
- Internet access for web search/fetch
## Configuration
Edit `config.py` to change:
- `model`: LLM model (default: groq/moonshotai/kimi-k2-instruct-0905)
- `temperature`: Generation temperature (default: 0.7)
- `max_tokens`: Max tokens per response (default: 16384)
@@ -1,23 +0,0 @@
"""
Online Research Agent - Deep-dive research with narrative reports.
Research any topic by searching multiple sources, synthesizing information,
and producing a well-structured narrative report with citations.
"""
from .agent import OnlineResearchAgent, default_agent, goal, nodes, edges
from .config import RuntimeConfig, AgentMetadata, default_config, metadata
__version__ = "1.0.0"
__all__ = [
"OnlineResearchAgent",
"default_agent",
"goal",
"nodes",
"edges",
"RuntimeConfig",
"AgentMetadata",
"default_config",
"metadata",
]
@@ -1,429 +0,0 @@
"""Agent graph construction for Online Research Agent."""
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from .config import default_config, metadata
from .nodes import (
parse_query_node,
search_sources_node,
fetch_content_node,
evaluate_sources_node,
synthesize_findings_node,
write_report_node,
quality_check_node,
save_report_node,
)
# Goal definition
goal = Goal(
id="comprehensive-online-research",
name="Comprehensive Online Research",
description="Research any topic by searching multiple sources, synthesizing information, and producing a well-structured narrative report with citations.",
success_criteria=[
SuccessCriterion(
id="source-coverage",
description="Query 10+ diverse sources",
metric="source_count",
target=">=10",
weight=0.20,
),
SuccessCriterion(
id="relevance",
description="All sources directly address the query",
metric="relevance_score",
target="90%",
weight=0.25,
),
SuccessCriterion(
id="synthesis",
description="Synthesize findings into coherent narrative",
metric="coherence_score",
target="85%",
weight=0.25,
),
SuccessCriterion(
id="citations",
description="Include citations for all claims",
metric="citation_coverage",
target="100%",
weight=0.15,
),
SuccessCriterion(
id="actionable",
description="Report answers the user's question",
metric="answer_completeness",
target="90%",
weight=0.15,
),
],
constraints=[
Constraint(
id="no-hallucination",
description="Only include information found in sources",
constraint_type="quality",
category="accuracy",
),
Constraint(
id="source-attribution",
description="Every factual claim must cite its source",
constraint_type="quality",
category="accuracy",
),
Constraint(
id="recency-preference",
description="Prefer recent sources when relevant",
constraint_type="quality",
category="relevance",
),
Constraint(
id="no-paywalled",
description="Avoid sources that require payment to access",
constraint_type="functional",
category="accessibility",
),
],
)
# Node list
nodes = [
parse_query_node,
search_sources_node,
fetch_content_node,
evaluate_sources_node,
synthesize_findings_node,
write_report_node,
quality_check_node,
save_report_node,
]
# Edge definitions
edges = [
EdgeSpec(
id="parse-to-search",
source="parse-query",
target="search-sources",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
EdgeSpec(
id="search-to-fetch",
source="search-sources",
target="fetch-content",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
EdgeSpec(
id="fetch-to-evaluate",
source="fetch-content",
target="evaluate-sources",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
EdgeSpec(
id="evaluate-to-synthesize",
source="evaluate-sources",
target="synthesize-findings",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
EdgeSpec(
id="synthesize-to-write",
source="synthesize-findings",
target="write-report",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
EdgeSpec(
id="write-to-quality",
source="write-report",
target="quality-check",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
EdgeSpec(
id="quality-to-save",
source="quality-check",
target="save-report",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
]
# Graph configuration
entry_node = "parse-query"
entry_points = {"start": "parse-query"}
pause_nodes = []
terminal_nodes = ["save-report"]
class OnlineResearchAgent:
"""
Online Research Agent - Deep-dive research with narrative reports.
Uses AgentRuntime for multi-entrypoint support with HITL pause/resume.
"""
def __init__(self, config=None):
self.config = config or default_config
self.goal = goal
self.nodes = nodes
self.edges = edges
self.entry_node = entry_node
self.entry_points = entry_points
self.pause_nodes = pause_nodes
self.terminal_nodes = terminal_nodes
self._runtime: AgentRuntime | None = None
self._graph: GraphSpec | None = None
def _build_entry_point_specs(self) -> list[EntryPointSpec]:
"""Convert entry_points dict to EntryPointSpec list."""
specs = []
for ep_id, node_id in self.entry_points.items():
if ep_id == "start":
trigger_type = "manual"
name = "Start"
elif "_resume" in ep_id:
trigger_type = "resume"
name = f"Resume from {ep_id.replace('_resume', '')}"
else:
trigger_type = "manual"
name = ep_id.replace("-", " ").title()
specs.append(
EntryPointSpec(
id=ep_id,
name=name,
entry_node=node_id,
trigger_type=trigger_type,
isolation_level="shared",
)
)
return specs
def _create_runtime(self, mock_mode=False) -> AgentRuntime:
"""Create AgentRuntime instance."""
import json
from pathlib import Path
# Persistent storage in ~/.hive for telemetry and run history
storage_path = Path.home() / ".hive" / "online_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
tool_registry = ToolRegistry()
# Load MCP servers (always load, needed for tool validation)
agent_dir = Path(__file__).parent
mcp_config_path = agent_dir / "mcp_servers.json"
if mcp_config_path.exists():
with open(mcp_config_path) as f:
mcp_servers = json.load(f)
for server_config in mcp_servers.get("servers", []):
# Resolve relative cwd paths
cwd = server_config.get("cwd")
if cwd and not Path(cwd).is_absolute():
server_config["cwd"] = str(agent_dir / cwd)
tool_registry.register_mcp_server(server_config)
llm = None
if not mock_mode:
# LiteLLMProvider uses environment variables for API keys
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
self._graph = GraphSpec(
id="online-research-agent-graph",
goal_id=self.goal.id,
version="1.0.0",
entry_node=self.entry_node,
entry_points=self.entry_points,
terminal_nodes=self.terminal_nodes,
pause_nodes=self.pause_nodes,
nodes=self.nodes,
edges=self.edges,
default_model=self.config.model,
max_tokens=self.config.max_tokens,
)
# Create AgentRuntime with all entry points
self._runtime = create_agent_runtime(
graph=self._graph,
goal=self.goal,
storage_path=storage_path,
entry_points=self._build_entry_point_specs(),
llm=llm,
tools=list(tool_registry.get_tools().values()),
tool_executor=tool_registry.get_executor(),
)
return self._runtime
async def start(self, mock_mode=False) -> None:
"""Start the agent runtime."""
if self._runtime is None:
self._create_runtime(mock_mode=mock_mode)
await self._runtime.start()
async def stop(self) -> None:
"""Stop the agent runtime."""
if self._runtime is not None:
await self._runtime.stop()
async def trigger(
self,
entry_point: str,
input_data: dict,
correlation_id: str | None = None,
session_state: dict | None = None,
) -> str:
"""
Trigger execution at a specific entry point (non-blocking).
Args:
entry_point: Entry point ID (e.g., "start", "pause-node_resume")
input_data: Input data for the execution
correlation_id: Optional ID to correlate related executions
session_state: Optional session state to resume from (with paused_at, memory)
Returns:
Execution ID for tracking
"""
if self._runtime is None or not self._runtime.is_running:
raise RuntimeError("Agent runtime not started. Call start() first.")
return await self._runtime.trigger(
entry_point, input_data, correlation_id, session_state=session_state
)
async def trigger_and_wait(
self,
entry_point: str,
input_data: dict,
timeout: float | None = None,
session_state: dict | None = None,
) -> ExecutionResult | None:
"""
Trigger execution and wait for completion.
Args:
entry_point: Entry point ID
input_data: Input data for the execution
timeout: Maximum time to wait (seconds)
session_state: Optional session state to resume from (with paused_at, memory)
Returns:
ExecutionResult or None if timeout
"""
if self._runtime is None or not self._runtime.is_running:
raise RuntimeError("Agent runtime not started. Call start() first.")
return await self._runtime.trigger_and_wait(
entry_point, input_data, timeout, session_state=session_state
)
async def run(
self, context: dict, mock_mode=False, session_state=None
) -> ExecutionResult:
"""
Run the agent (convenience method for simple single execution).
For more control, use start() + trigger_and_wait() + stop().
"""
await self.start(mock_mode=mock_mode)
try:
# Determine entry point based on session_state
if session_state and "paused_at" in session_state:
paused_node = session_state["paused_at"]
resume_key = f"{paused_node}_resume"
if resume_key in self.entry_points:
entry_point = resume_key
else:
entry_point = "start"
else:
entry_point = "start"
result = await self.trigger_and_wait(
entry_point, context, session_state=session_state
)
return result or ExecutionResult(success=False, error="Execution timeout")
finally:
await self.stop()
async def get_goal_progress(self) -> dict:
"""Get goal progress across all executions."""
if self._runtime is None:
raise RuntimeError("Agent runtime not started")
return await self._runtime.get_goal_progress()
def get_stats(self) -> dict:
"""Get runtime statistics."""
if self._runtime is None:
return {"running": False}
return self._runtime.get_stats()
def info(self):
"""Get agent information."""
return {
"name": metadata.name,
"version": metadata.version,
"description": metadata.description,
"goal": {
"name": self.goal.name,
"description": self.goal.description,
},
"nodes": [n.id for n in self.nodes],
"edges": [e.id for e in self.edges],
"entry_node": self.entry_node,
"entry_points": self.entry_points,
"pause_nodes": self.pause_nodes,
"terminal_nodes": self.terminal_nodes,
"multi_entrypoint": True,
}
def validate(self):
"""Validate agent structure."""
errors = []
warnings = []
node_ids = {node.id for node in self.nodes}
for edge in self.edges:
if edge.source not in node_ids:
errors.append(f"Edge {edge.id}: source '{edge.source}' not found")
if edge.target not in node_ids:
errors.append(f"Edge {edge.id}: target '{edge.target}' not found")
if self.entry_node not in node_ids:
errors.append(f"Entry node '{self.entry_node}' not found")
for terminal in self.terminal_nodes:
if terminal not in node_ids:
errors.append(f"Terminal node '{terminal}' not found")
for pause in self.pause_nodes:
if pause not in node_ids:
errors.append(f"Pause node '{pause}' not found")
# Validate entry points
for ep_id, node_id in self.entry_points.items():
if node_id not in node_ids:
errors.append(
f"Entry point '{ep_id}' references unknown node '{node_id}'"
)
return {
"valid": len(errors) == 0,
"errors": errors,
"warnings": warnings,
}
# Create default instance
default_agent = OnlineResearchAgent()
@@ -1,396 +0,0 @@
"""Node definitions for Online Research Agent."""
from framework.graph import NodeSpec
# Node 1: Parse Query
parse_query_node = NodeSpec(
id="parse-query",
name="Parse Query",
description="Analyze the research topic and generate 3-5 diverse search queries to cover different aspects",
node_type="llm_generate",
input_keys=["topic"],
output_keys=["search_queries", "research_focus", "key_aspects"],
output_schema={
"research_focus": {
"type": "string",
"required": True,
"description": "Brief statement of what we're researching",
},
"key_aspects": {
"type": "array",
"required": True,
"description": "List of 3-5 key aspects to investigate",
},
"search_queries": {
"type": "array",
"required": True,
"description": "List of 3-5 search queries",
},
},
system_prompt="""\
You are a research query strategist. Given a research topic, analyze it and generate search queries.
Your task:
1. Understand the core research question
2. Identify 3-5 key aspects to investigate
3. Generate 3-5 diverse search queries that will find comprehensive information
CRITICAL: Return ONLY raw JSON. NO markdown, NO code blocks.
Return this JSON structure:
{
"research_focus": "Brief statement of what we're researching",
"key_aspects": ["aspect1", "aspect2", "aspect3"],
"search_queries": [
"query 1 - broad overview",
"query 2 - specific angle",
"query 3 - recent developments",
"query 4 - expert opinions",
"query 5 - data/statistics"
]
}
""",
tools=[],
max_retries=3,
)
# Node 2: Search Sources
search_sources_node = NodeSpec(
id="search-sources",
name="Search Sources",
description="Execute web searches using the generated queries to find 15+ source URLs",
node_type="llm_tool_use",
input_keys=["search_queries", "research_focus"],
output_keys=["source_urls", "search_results_summary"],
output_schema={
"source_urls": {
"type": "array",
"required": True,
"description": "List of source URLs found",
},
"search_results_summary": {
"type": "string",
"required": True,
"description": "Brief summary of what was found",
},
},
system_prompt="""\
You are a research assistant executing web searches. Use the web_search tool to find sources.
Your task:
1. Execute each search query using web_search tool
2. Collect URLs from search results
3. Aim for 15+ diverse sources
After searching, return JSON with found sources:
{
"source_urls": ["url1", "url2", ...],
"search_results_summary": "Brief summary of what was found"
}
""",
tools=["web_search"],
max_retries=3,
)
# Node 3: Fetch Content
fetch_content_node = NodeSpec(
id="fetch-content",
name="Fetch Content",
description="Fetch and extract content from the discovered source URLs",
node_type="llm_tool_use",
input_keys=["source_urls", "research_focus"],
output_keys=["fetched_sources", "fetch_errors"],
output_schema={
"fetched_sources": {
"type": "array",
"required": True,
"description": "List of fetched source objects with url, title, content",
},
"fetch_errors": {
"type": "array",
"required": True,
"description": "List of URLs that failed to fetch",
},
},
system_prompt="""\
You are a content fetcher. Use web_scrape tool to retrieve content from URLs.
Your task:
1. Fetch content from each source URL using web_scrape tool
2. Extract the main content relevant to the research focus
3. Track any URLs that failed to fetch
After fetching, return JSON:
{
"fetched_sources": [
{"url": "...", "title": "...", "content": "extracted text..."},
...
],
"fetch_errors": ["url that failed", ...]
}
""",
tools=["web_scrape"],
max_retries=3,
)
# Node 4: Evaluate Sources
evaluate_sources_node = NodeSpec(
id="evaluate-sources",
name="Evaluate Sources",
description="Score sources for relevance and quality, filter to top 10",
node_type="llm_generate",
input_keys=["fetched_sources", "research_focus", "key_aspects"],
output_keys=["ranked_sources", "source_analysis"],
output_schema={
"ranked_sources": {
"type": "array",
"required": True,
"description": "List of ranked sources with scores",
},
"source_analysis": {
"type": "string",
"required": True,
"description": "Overview of source quality and coverage",
},
},
system_prompt="""\
You are a source evaluator. Assess each source for quality and relevance.
Scoring criteria:
- Relevance to research focus (1-10)
- Source credibility (1-10)
- Information depth (1-10)
- Recency if relevant (1-10)
Your task:
1. Score each source
2. Rank by combined score
3. Select top 10 sources
4. Note what each source uniquely contributes
Return JSON:
{
"ranked_sources": [
{"url": "...", "title": "...", "content": "...", "score": 8.5, "unique_value": "..."},
...
],
"source_analysis": "Overview of source quality and coverage"
}
""",
tools=[],
max_retries=3,
)
# Node 5: Synthesize Findings
synthesize_findings_node = NodeSpec(
id="synthesize-findings",
name="Synthesize Findings",
description="Extract key facts from sources and identify common themes",
node_type="llm_generate",
input_keys=["ranked_sources", "research_focus", "key_aspects"],
output_keys=["key_findings", "themes", "source_citations"],
output_schema={
"key_findings": {
"type": "array",
"required": True,
"description": "List of key findings with sources and confidence",
},
"themes": {
"type": "array",
"required": True,
"description": "List of themes with descriptions and supporting sources",
},
"source_citations": {
"type": "object",
"required": True,
"description": "Map of facts to supporting URLs",
},
},
system_prompt="""\
You are a research synthesizer. Analyze multiple sources to extract insights.
Your task:
1. Identify key facts from each source
2. Find common themes across sources
3. Note contradictions or debates
4. Build a citation map (fact -> source URL)
Return JSON:
{
"key_findings": [
{"finding": "...", "sources": ["url1", "url2"], "confidence": "high/medium/low"},
...
],
"themes": [
{"theme": "...", "description": "...", "supporting_sources": ["url1", ...]},
...
],
"source_citations": {
"fact or claim": ["supporting url1", "url2"],
...
}
}
""",
tools=[],
max_retries=3,
)
# Node 6: Write Report
write_report_node = NodeSpec(
id="write-report",
name="Write Report",
description="Generate a narrative report with proper citations",
node_type="llm_generate",
input_keys=[
"key_findings",
"themes",
"source_citations",
"research_focus",
"ranked_sources",
],
output_keys=["report_content", "references"],
output_schema={
"report_content": {
"type": "string",
"required": True,
"description": "Full markdown report text with citations",
},
"references": {
"type": "array",
"required": True,
"description": "List of reference objects with number, url, title",
},
},
system_prompt="""\
You are a research report writer. Create a well-structured narrative report.
Report structure:
1. Executive Summary (2-3 paragraphs)
2. Introduction (context and scope)
3. Key Findings (organized by theme)
4. Analysis (synthesis and implications)
5. Conclusion
6. References (numbered list of all sources)
Citation format: Use numbered citations like [1], [2] that correspond to the References section.
IMPORTANT:
- Every factual claim MUST have a citation
- Write in clear, professional prose
- Be objective and balanced
- Highlight areas of consensus and debate
Return JSON:
{
"report_content": "Full markdown report text with citations...",
"references": [
{"number": 1, "url": "...", "title": "..."},
...
]
}
""",
tools=[],
max_retries=3,
)
# Node 7: Quality Check
quality_check_node = NodeSpec(
id="quality-check",
name="Quality Check",
description="Verify all claims have citations and report is coherent",
node_type="llm_generate",
input_keys=["report_content", "references", "source_citations"],
output_keys=["quality_score", "issues", "final_report"],
output_schema={
"quality_score": {
"type": "number",
"required": True,
"description": "Quality score 0-1",
},
"issues": {
"type": "array",
"required": True,
"description": "List of issues found and fixed",
},
"final_report": {
"type": "string",
"required": True,
"description": "Corrected full report",
},
},
system_prompt="""\
You are a quality assurance reviewer. Check the research report for issues.
Check for:
1. Uncited claims (factual statements without [n] citation)
2. Broken citations (references to non-existent numbers)
3. Coherence (logical flow between sections)
4. Completeness (all key aspects covered)
5. Accuracy (claims match source content)
If issues found, fix them in the final report.
Return JSON:
{
"quality_score": 0.95,
"issues": [
{"type": "uncited_claim", "location": "paragraph 3", "fixed": true},
...
],
"final_report": "Corrected full report with all issues fixed..."
}
""",
tools=[],
max_retries=3,
)
# Node 8: Save Report
save_report_node = NodeSpec(
id="save-report",
name="Save Report",
description="Write the final report to a local markdown file",
node_type="llm_tool_use",
input_keys=["final_report", "references", "research_focus"],
output_keys=["file_path", "save_status"],
output_schema={
"file_path": {
"type": "string",
"required": True,
"description": "Path where report was saved",
},
"save_status": {
"type": "string",
"required": True,
"description": "Status of save operation",
},
},
system_prompt="""\
You are a file manager. Save the research report to disk.
Your task:
1. Generate a filename from the research focus (slugified, with date)
2. Use the write_to_file tool to save the report as markdown
3. Save to the ./research_reports/ directory
Filename format: research_YYYY-MM-DD_topic-slug.md
Return JSON:
{
"file_path": "research_reports/research_2026-01-23_topic-name.md",
"save_status": "success"
}
""",
tools=["write_to_file"],
max_retries=3,
)
__all__ = [
"parse_query_node",
"search_sources_node",
"fetch_content_node",
"evaluate_sources_node",
"synthesize_findings_node",
"write_report_node",
"quality_check_node",
"save_report_node",
]
@@ -1,303 +0,0 @@
---
name: building-agents-core
description: Core concepts for goal-driven agents - architecture, node types, tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals.
license: Apache-2.0
metadata:
author: hive
version: "1.0"
type: foundational
part_of: building-agents
---
# Building Agents - Core Concepts
Foundational knowledge for building goal-driven agents as Python packages.
## Architecture: Python Services (Not JSON Configs)
Agents are built as Python packages:
```
exports/my_agent/
├── __init__.py # Package exports
├── __main__.py # CLI (run, info, validate, shell)
├── agent.py # Graph construction (goal, edges, agent class)
├── nodes/__init__.py # Node definitions (NodeSpec)
├── config.py # Runtime config
└── README.md # Documentation
```
**Key Principle: Agent is visible and editable during build**
- ✅ Files created immediately as components are approved
- ✅ User can watch files grow in their editor
- ✅ No session state - just direct file writes
- ✅ No "export" step - agent is ready when build completes
## Core Concepts
### Goal
Success criteria and constraints (written to agent.py)
```python
goal = Goal(
id="research-goal",
name="Technical Research Agent",
description="Research technical topics thoroughly",
success_criteria=[
SuccessCriterion(
id="completeness",
description="Cover all aspects of topic",
metric="coverage_score",
target=">=0.9",
weight=0.4,
),
# 3-5 success criteria total
],
constraints=[
Constraint(
id="accuracy",
description="All information must be verified",
constraint_type="hard",
category="quality",
),
# 1-5 constraints total
],
)
```
### Node
Unit of work (written to nodes/__init__.py)
**Node Types:**
- `llm_generate` - Text generation, parsing
- `llm_tool_use` - Actions requiring tools
- `router` - Conditional branching
- `function` - Deterministic operations
```python
search_node = NodeSpec(
id="search-web",
name="Search Web",
description="Search for information online",
node_type="llm_tool_use",
input_keys=["query"],
output_keys=["search_results"],
system_prompt="Search the web for: {query}",
tools=["web_search"],
max_retries=3,
)
```
### Edge
Connection between nodes (written to agent.py)
**Edge Conditions:**
- `on_success` - Proceed if node succeeds
- `on_failure` - Handle errors
- `always` - Always proceed
- `conditional` - Based on expression
```python
EdgeSpec(
id="search-to-analyze",
source="search-web",
target="analyze-results",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
)
```
### Pause/Resume
Multi-turn conversations
- **Pause nodes** - Stop execution, wait for user input
- **Resume entry points** - Continue from pause with user's response
```python
# Example pause/resume configuration
pause_nodes = ["request-clarification"]
entry_points = {
"start": "analyze-request",
"request-clarification_resume": "process-clarification"
}
```
## Tool Discovery & Validation
**CRITICAL:** Before adding a node with tools, you MUST verify the tools exist.
Tools are provided by MCP servers. Never assume a tool exists - always discover dynamically.
### Step 1: Register MCP Server (if not already done)
```python
mcp__agent-builder__add_mcp_server(
name="tools",
transport="stdio",
command="python",
args='["mcp_server.py", "--stdio"]',
cwd="../tools"
)
```
### Step 2: Discover Available Tools
```python
# List all tools from all registered servers
mcp__agent-builder__list_mcp_tools()
# Or list tools from a specific server
mcp__agent-builder__list_mcp_tools(server_name="tools")
```
This returns available tools with their descriptions and parameters:
```json
{
"success": true,
"tools_by_server": {
"tools": [
{
"name": "web_search",
"description": "Search the web...",
"parameters": ["query"]
},
{
"name": "web_scrape",
"description": "Scrape a URL...",
"parameters": ["url"]
}
]
},
"total_tools": 14
}
```
### Step 3: Validate Before Adding Nodes
Before writing a node with `tools=[...]`:
1. Call `list_mcp_tools()` to get available tools
2. Check each tool in your node exists in the response
3. If a tool doesn't exist:
- **DO NOT proceed** with the node
- Inform the user: "The tool 'X' is not available. Available tools are: ..."
- Ask if they want to use an alternative or proceed without the tool
### Tool Validation Anti-Patterns
**Never assume a tool exists** - always call `list_mcp_tools()` first
**Never write a node with unverified tools** - validate before writing
**Never silently drop tools** - if a tool doesn't exist, inform the user
**Never guess tool names** - use exact names from discovery response
### Example Validation Flow
```python
# 1. User requests: "Add a node that searches the web"
# 2. Discover available tools
tools_response = mcp__agent-builder__list_mcp_tools()
# 3. Check if web_search exists
available = [t["name"] for tools in tools_response["tools_by_server"].values() for t in tools]
if "web_search" not in available:
# Inform user and ask how to proceed
print("'web_search' not available. Available tools:", available)
else:
# Proceed with node creation
# ...
```
## Workflow Overview: Incremental File Construction
```
1. CREATE PACKAGE → mkdir + write skeletons
2. DEFINE GOAL → Write to agent.py + config.py
3. FOR EACH NODE:
- Propose design
- User approves
- Write to nodes/__init__.py IMMEDIATELY ← FILE WRITTEN
- (Optional) Validate with test_node ← MCP VALIDATION
- User can open file and see it
4. CONNECT EDGES → Update agent.py ← FILE WRITTEN
- (Optional) Validate with validate_graph ← MCP VALIDATION
5. FINALIZE → Write agent class to agent.py ← FILE WRITTEN
6. DONE - Agent ready at exports/my_agent/
```
**Files written immediately. MCP tools optional for validation/testing bookkeeping.**
### The Key Difference
**OLD (Bad):**
```
MCP add_node → Session State → MCP add_node → Session State → ...
MCP export_graph
Files appear
```
**NEW (Good):**
```
Write node to file → (Optional: MCP test_node) → Write node to file → ...
↓ ↓
File visible File visible
immediately immediately
```
**Bottom line:** Use Write/Edit for construction, MCP for validation if needed.
## When to Use This Skill
Use building-agents-core when:
- Starting a new agent project and need to understand fundamentals
- Need to understand agent architecture before building
- Want to validate tool availability before proceeding
- Learning about node types, edges, and graph execution
**Next Steps:**
- Ready to build? → Use `building-agents-construction` skill
- Need patterns and examples? → Use `building-agents-patterns` skill
## MCP Tools for Validation
After writing files, optionally use MCP tools for validation:
**test_node** - Validate node configuration with mock inputs
```python
mcp__agent-builder__test_node(
node_id="search-web",
test_input='{"query": "test query"}',
mock_llm_response='{"results": "mock output"}'
)
```
**validate_graph** - Check graph structure
```python
mcp__agent-builder__validate_graph()
# Returns: unreachable nodes, missing connections, etc.
```
**create_session** - Track session state for bookkeeping
```python
mcp__agent-builder__create_session(session_name="my-build")
```
**Key Point:** Files are written FIRST. MCP tools are for validation only.
## Related Skills
- **building-agents-construction** - Step-by-step building process
- **building-agents-patterns** - Best practices and examples
- **agent-workflow** - Complete workflow orchestrator
- **testing-agent** - Test and validate completed agents
@@ -1,497 +0,0 @@
---
name: building-agents-patterns
description: Best practices, patterns, and examples for building goal-driven agents. Includes pause/resume architecture, hybrid workflows, anti-patterns, and handoff to testing. Use when optimizing agent design.
license: Apache-2.0
metadata:
author: hive
version: "1.0"
type: reference
part_of: building-agents
---
# Building Agents - Patterns & Best Practices
Design patterns, examples, and best practices for building robust goal-driven agents.
**Prerequisites:** Complete agent structure using `building-agents-construction`.
## Practical Example: Hybrid Workflow
How to build a node using both direct file writes and optional MCP validation:
```python
# 1. WRITE TO FILE FIRST (Primary - makes it visible)
node_code = '''
search_node = NodeSpec(
id="search-web",
node_type="llm_tool_use",
input_keys=["query"],
output_keys=["search_results"],
system_prompt="Search the web for: {query}",
tools=["web_search"],
)
'''
Edit(
file_path="exports/research_agent/nodes/__init__.py",
old_string="# Nodes will be added here",
new_string=node_code
)
print("✅ Added search_node to nodes/__init__.py")
print("📁 Open exports/research_agent/nodes/__init__.py to see it!")
# 2. OPTIONALLY VALIDATE WITH MCP (Secondary - bookkeeping)
validation = mcp__agent-builder__test_node(
node_id="search-web",
test_input='{"query": "python tutorials"}',
mock_llm_response='{"search_results": [...mock results...]}'
)
print(f"✓ Validation: {validation['success']}")
```
**User experience:**
- Immediately sees node in their editor (from step 1)
- Gets validation feedback (from step 2)
- Can edit the file directly if needed
This combines visibility (files) with validation (MCP tools).
## Pause/Resume Architecture
For agents needing multi-turn conversations with user interaction:
### Basic Pause/Resume Flow
```python
# Define pause nodes - execution stops at these nodes
pause_nodes = ["request-clarification", "await-approval"]
# Define entry points - where to resume from each pause
entry_points = {
"start": "analyze-request", # Initial entry
"request-clarification_resume": "process-clarification", # Resume from clarification
"await-approval_resume": "execute-action", # Resume from approval
}
```
### Example: Multi-Turn Research Agent
```python
# Nodes
nodes = [
NodeSpec(id="analyze-request", ...),
NodeSpec(id="request-clarification", ...), # PAUSE NODE
NodeSpec(id="process-clarification", ...),
NodeSpec(id="generate-results", ...),
NodeSpec(id="await-approval", ...), # PAUSE NODE
NodeSpec(id="execute-action", ...),
]
# Edges with resume flows
edges = [
EdgeSpec(
id="analyze-to-clarify",
source="analyze-request",
target="request-clarification",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_clarification == true",
),
# When resumed, goes to process-clarification
EdgeSpec(
id="clarify-to-process",
source="request-clarification",
target="process-clarification",
condition=EdgeCondition.ALWAYS,
),
EdgeSpec(
id="results-to-approval",
source="generate-results",
target="await-approval",
condition=EdgeCondition.ALWAYS,
),
# When resumed, goes to execute-action
EdgeSpec(
id="approval-to-execute",
source="await-approval",
target="execute-action",
condition=EdgeCondition.ALWAYS,
),
]
# Configuration
pause_nodes = ["request-clarification", "await-approval"]
entry_points = {
"start": "analyze-request",
"request-clarification_resume": "process-clarification",
"await-approval_resume": "execute-action",
}
```
### Running Pause/Resume Agents
```python
# Initial run - will pause at first pause node
result1 = await agent.run(
context={"query": "research topic"},
session_state=None
)
# Check if paused
if result1.paused_at:
print(f"Paused at: {result1.paused_at}")
# Resume with user input
result2 = await agent.run(
context={"user_response": "clarification details"},
session_state=result1.session_state # Pass previous state
)
```
## Anti-Patterns
### What NOT to Do
**Don't rely on `export_graph`** - Write files immediately, not at end
```python
# BAD: Building in session state, exporting at end
mcp__agent-builder__add_node(...)
mcp__agent-builder__add_node(...)
mcp__agent-builder__export_graph() # Files appear only now
# GOOD: Writing files immediately
Write(file_path="...", content=node_code) # File visible now
Write(file_path="...", content=node_code) # File visible now
```
**Don't hide code in session** - Write to files as components approved
```python
# BAD: Accumulating changes invisibly
session.add_component(component1)
session.add_component(component2)
# User can't see anything yet
# GOOD: Incremental visibility
Edit(file_path="...", ...) # User sees change 1
Edit(file_path="...", ...) # User sees change 2
```
**Don't wait to write files** - Agent visible from first step
```python
# BAD: Building everything before writing
design_all_nodes()
design_all_edges()
write_everything_at_once()
# GOOD: Write as you go
write_package_structure() # Visible
write_goal() # Visible
write_node_1() # Visible
write_node_2() # Visible
```
**Don't batch everything** - Write incrementally
```python
# BAD: Batching all nodes
nodes = [design_node_1(), design_node_2(), ...]
write_all_nodes(nodes)
# GOOD: One at a time with user feedback
write_node_1() # User approves
write_node_2() # User approves
write_node_3() # User approves
```
### MCP Tools - Correct Usage
**MCP tools OK for:**
`test_node` - Validate node configuration with mock inputs
`validate_graph` - Check graph structure
`create_session` - Track session state for bookkeeping
✅ Other validation tools
**Just don't:** Use MCP as the primary construction method or rely on export_graph
## Best Practices
### 1. Show Progress After Each Write
```python
# After writing a node
print("✅ Added analyze_request_node to nodes/__init__.py")
print("📊 Progress: 1/6 nodes added")
print("📁 Open exports/my_agent/nodes/__init__.py to see it!")
```
### 2. Let User Open Files During Build
```python
# Encourage file inspection
print("✅ Goal written to agent.py")
print("")
print("💡 Tip: Open exports/my_agent/agent.py in your editor to see the goal!")
```
### 3. Write Incrementally - One Component at a Time
```python
# Good flow
write_package_structure()
show_user("Package created")
write_goal()
show_user("Goal written")
for node in nodes:
get_approval(node)
write_node(node)
show_user(f"Node {node.id} written")
```
### 4. Test As You Build
```python
# After adding several nodes
print("💡 You can test current state with:")
print(" PYTHONPATH=core:exports python -m my_agent validate")
print(" PYTHONPATH=core:exports python -m my_agent info")
```
### 5. Keep User Informed
```python
# Clear status updates
print("🔨 Creating package structure...")
print("✅ Package created: exports/my_agent/")
print("")
print("📝 Next: Define agent goal")
```
## Continuous Monitoring Agents
For agents that run continuously without terminal nodes:
```python
# No terminal nodes - loops forever
terminal_nodes = []
# Workflow loops back to start
edges = [
EdgeSpec(id="monitor-to-check", source="monitor", target="check-condition"),
EdgeSpec(id="check-to-wait", source="check-condition", target="wait"),
EdgeSpec(id="wait-to-monitor", source="wait", target="monitor"), # Loop
]
# Entry node only
entry_node = "monitor"
entry_points = {"start": "monitor"}
pause_nodes = []
```
**Example: File Monitor**
```python
nodes = [
NodeSpec(id="list-files", ...),
NodeSpec(id="check-new-files", node_type="router", ...),
NodeSpec(id="process-files", ...),
NodeSpec(id="wait-interval", node_type="function", ...),
]
edges = [
EdgeSpec(id="list-to-check", source="list-files", target="check-new-files"),
EdgeSpec(
id="check-to-process",
source="check-new-files",
target="process-files",
condition=EdgeCondition.CONDITIONAL,
condition_expr="new_files_count > 0",
),
EdgeSpec(
id="check-to-wait",
source="check-new-files",
target="wait-interval",
condition=EdgeCondition.CONDITIONAL,
condition_expr="new_files_count == 0",
),
EdgeSpec(id="process-to-wait", source="process-files", target="wait-interval"),
EdgeSpec(id="wait-to-list", source="wait-interval", target="list-files"), # Loop back
]
terminal_nodes = [] # No terminal - runs forever
```
## Complex Routing Patterns
### Multi-Condition Router
```python
router_node = NodeSpec(
id="decision-router",
node_type="router",
input_keys=["analysis_result"],
output_keys=["decision"],
system_prompt="""
Based on the analysis result, decide the next action:
- If confidence > 0.9: route to "execute"
- If 0.5 <= confidence <= 0.9: route to "review"
- If confidence < 0.5: route to "clarify"
Return: {"decision": "execute|review|clarify"}
""",
)
# Edges for each route
edges = [
EdgeSpec(
id="router-to-execute",
source="decision-router",
target="execute-action",
condition=EdgeCondition.CONDITIONAL,
condition_expr="decision == 'execute'",
priority=1,
),
EdgeSpec(
id="router-to-review",
source="decision-router",
target="human-review",
condition=EdgeCondition.CONDITIONAL,
condition_expr="decision == 'review'",
priority=2,
),
EdgeSpec(
id="router-to-clarify",
source="decision-router",
target="request-clarification",
condition=EdgeCondition.CONDITIONAL,
condition_expr="decision == 'clarify'",
priority=3,
),
]
```
## Error Handling Patterns
### Graceful Failure with Fallback
```python
# Primary node with error handling
nodes = [
NodeSpec(id="api-call", max_retries=3, ...),
NodeSpec(id="fallback-cache", ...),
NodeSpec(id="report-error", ...),
]
edges = [
# Success path
EdgeSpec(
id="api-success",
source="api-call",
target="process-results",
condition=EdgeCondition.ON_SUCCESS,
),
# Fallback on failure
EdgeSpec(
id="api-to-fallback",
source="api-call",
target="fallback-cache",
condition=EdgeCondition.ON_FAILURE,
priority=1,
),
# Report if fallback also fails
EdgeSpec(
id="fallback-to-error",
source="fallback-cache",
target="report-error",
condition=EdgeCondition.ON_FAILURE,
priority=1,
),
]
```
## Performance Optimization
### Parallel Node Execution
```python
# Use multiple edges from same source for parallel execution
edges = [
EdgeSpec(
id="start-to-search1",
source="start",
target="search-source-1",
condition=EdgeCondition.ALWAYS,
),
EdgeSpec(
id="start-to-search2",
source="start",
target="search-source-2",
condition=EdgeCondition.ALWAYS,
),
EdgeSpec(
id="start-to-search3",
source="start",
target="search-source-3",
condition=EdgeCondition.ALWAYS,
),
# Converge results
EdgeSpec(
id="search1-to-merge",
source="search-source-1",
target="merge-results",
),
EdgeSpec(
id="search2-to-merge",
source="search-source-2",
target="merge-results",
),
EdgeSpec(
id="search3-to-merge",
source="search-source-3",
target="merge-results",
),
]
```
## Handoff to Testing
When agent is complete, transition to testing phase:
```python
print("""
✅ Agent complete: exports/my_agent/
Next steps:
1. Switch to testing-agent skill
2. Generate and approve tests
3. Run evaluation
4. Debug any failures
Command: "Test the agent at exports/my_agent/"
""")
```
### Pre-Testing Checklist
Before handing off to testing-agent:
- [ ] Agent structure validates: `python -m agent_name validate`
- [ ] All nodes defined in nodes/__init__.py
- [ ] All edges connect valid nodes
- [ ] Entry node specified
- [ ] Agent can be imported: `from exports.agent_name import default_agent`
- [ ] README.md with usage instructions
- [ ] CLI commands work (info, validate)
## Related Skills
- **building-agents-core** - Fundamental concepts
- **building-agents-construction** - Step-by-step building
- **testing-agent** - Test and validate agents
- **agent-workflow** - Complete workflow orchestrator
---
**Remember: Agent is actively constructed, visible the whole time. No hidden state. No surprise exports. Just transparent, incremental file building.**
+399
View File
@@ -0,0 +1,399 @@
---
name: hive-concepts
description: Core concepts for goal-driven agents - architecture, node types (event_loop, function), tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: foundational
part_of: hive
---
# Building Agents - Core Concepts
Foundational knowledge for building goal-driven agents as Python packages.
## Architecture: Python Services (Not JSON Configs)
Agents are built as Python packages:
```
exports/my_agent/
├── __init__.py # Package exports
├── __main__.py # CLI (run, info, validate, shell)
├── agent.py # Graph construction (goal, edges, agent class)
├── nodes/__init__.py # Node definitions (NodeSpec)
├── config.py # Runtime config
└── README.md # Documentation
```
**Key Principle: Agent is visible and editable during build**
- Files created immediately as components are approved
- User can watch files grow in their editor
- No session state - just direct file writes
- No "export" step - agent is ready when build completes
## Core Concepts
### Goal
Success criteria and constraints (written to agent.py)
```python
goal = Goal(
id="research-goal",
name="Technical Research Agent",
description="Research technical topics thoroughly",
success_criteria=[
SuccessCriterion(
id="completeness",
description="Cover all aspects of topic",
metric="coverage_score",
target=">=0.9",
weight=0.4,
),
# 3-5 success criteria total
],
constraints=[
Constraint(
id="accuracy",
description="All information must be verified",
constraint_type="hard",
category="quality",
),
# 1-5 constraints total
],
)
```
### Node
Unit of work (written to nodes/__init__.py)
**Node Types:**
- `event_loop` — Multi-turn streaming loop with tool execution and judge-based evaluation. Works with or without tools.
- `function` — Deterministic Python operations. No LLM involved.
```python
search_node = NodeSpec(
id="search-web",
name="Search Web",
description="Search for information and extract results",
node_type="event_loop",
input_keys=["query"],
output_keys=["search_results"],
system_prompt="Search the web for: {query}. Use the web_search tool to find results, then call set_output to store them.",
tools=["web_search"],
)
```
**NodeSpec Fields for Event Loop Nodes:**
| Field | Default | Description |
|-------|---------|-------------|
| `client_facing` | `False` | If True, streams output to user and blocks for input between turns |
| `nullable_output_keys` | `[]` | Output keys that may remain unset (for mutually exclusive outputs) |
| `max_node_visits` | `1` | Max times this node executes per run. Set >1 for feedback loop targets |
### Edge
Connection between nodes (written to agent.py)
**Edge Conditions:**
- `on_success` — Proceed if node succeeds (most common)
- `on_failure` — Handle errors
- `always` — Always proceed
- `conditional` — Based on expression evaluating node output
**Edge Priority:**
Priority controls evaluation order when multiple edges leave the same node. Higher priority edges are evaluated first. Use negative priority for feedback edges (edges that loop back to earlier nodes).
```python
# Forward edge (evaluated first)
EdgeSpec(
id="review-to-campaign",
source="review",
target="campaign-builder",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('approved_contacts') is not None",
priority=1,
)
# Feedback edge (evaluated after forward edges)
EdgeSpec(
id="review-feedback",
source="review",
target="extractor",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('redo_extraction') is not None",
priority=-1,
)
```
### Client-Facing Nodes
For multi-turn conversations with the user, set `client_facing=True` on a node. The node will:
- Stream its LLM output directly to the end user
- Block for user input between conversational turns
- Resume when new input is injected via `inject_event()`
```python
intake_node = NodeSpec(
id="intake",
name="Intake",
description="Gather requirements from the user",
node_type="event_loop",
client_facing=True,
input_keys=[],
output_keys=["repo_url", "project_url"],
system_prompt="You are the intake agent. Ask the user for the repo URL and project URL.",
)
```
> **Legacy Note:** The old `pause_nodes` / `entry_points` pattern still works but `client_facing=True` is preferred for new agents.
**STEP 1 / STEP 2 Prompt Pattern:** For client-facing nodes, structure the system prompt with two explicit phases:
```python
system_prompt="""\
**STEP 1 — Respond to the user (text only, NO tool calls):**
[Present information, ask questions, etc.]
**STEP 2 — After the user responds, call set_output:**
[Call set_output with the structured outputs]
"""
```
This prevents the LLM from calling `set_output` prematurely before the user has had a chance to respond.
### Node Design: Fewer, Richer Nodes
Prefer fewer nodes that do more work over many thin single-purpose nodes:
- **Bad**: 8 thin nodes (parse query → search → fetch → evaluate → synthesize → write → check → save)
- **Good**: 4 rich nodes (intake → research → review → report)
Why: Each node boundary requires serializing outputs and passing context. Fewer nodes means the LLM retains full context of its work within the node. A research node that searches, fetches, and analyzes keeps all the source material in its conversation history.
### nullable_output_keys for Cross-Edge Inputs
When a node receives inputs that only arrive on certain edges (e.g., `feedback` only comes from a review → research feedback loop, not from intake → research), mark those keys as `nullable_output_keys`:
```python
research_node = NodeSpec(
id="research",
input_keys=["research_brief", "feedback"],
nullable_output_keys=["feedback"], # Not present on first visit
max_node_visits=3,
...
)
```
## Event Loop Architecture Concepts
### How EventLoopNode Works
An event loop node runs a multi-turn loop:
1. LLM receives system prompt + conversation history
2. LLM responds (text and/or tool calls)
3. Tool calls are executed, results added to conversation
4. Judge evaluates: ACCEPT (exit loop), RETRY (loop again), or ESCALATE
5. Repeat until judge ACCEPTs or max_iterations reached
### EventLoopNode Runtime
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. You do NOT need to manually register them. Both `GraphExecutor` (direct) and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically.
```python
# Direct execution — executor auto-creates EventLoopNodes
from framework.graph.executor import GraphExecutor
from framework.runtime.core import Runtime
runtime = Runtime(storage_path)
executor = GraphExecutor(
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
storage_path=storage_path,
)
result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
# TUI execution — AgentRuntime also works
from framework.runtime.agent_runtime import create_agent_runtime
runtime = create_agent_runtime(
graph=graph, goal=goal, storage_path=storage_path,
entry_points=[...], llm=llm, tools=tools, tool_executor=tool_executor,
)
```
### set_output
Nodes produce structured outputs by calling `set_output(key, value)` — a synthetic tool injected by the framework. When the LLM calls `set_output`, the value is stored in the output accumulator and made available to downstream nodes via shared memory.
`set_output` is NOT a real tool — it is excluded from `real_tool_results`. For client-facing nodes, this means a turn where the LLM only calls `set_output` (no other tools) is treated as a conversational boundary and will block for user input.
### JudgeProtocol
**The judge is the SOLE mechanism for acceptance decisions.** Do not add ad-hoc framework gating, output rollback, or premature rejection logic. If the LLM calls `set_output` too early, fix it with better prompts or a custom judge — not framework-level guards.
The judge controls when a node's loop exits:
- **Implicit judge** (default, no judge configured): ACCEPTs when the LLM finishes with no tool calls and all required output keys are set
- **SchemaJudge**: Validates outputs against a Pydantic model
- **Custom judges**: Implement `evaluate(context) -> JudgeVerdict`
### LoopConfig
Controls loop behavior:
- `max_iterations` (default 50) — prevents infinite loops
- `max_tool_calls_per_turn` (default 10) — limits tool calls per LLM response
- `tool_call_overflow_margin` (default 0.5) — wiggle room before discarding extra tool calls (50% means hard cutoff at 150% of limit)
- `stall_detection_threshold` (default 3) — detects repeated identical responses
- `max_history_tokens` (default 32000) — triggers conversation compaction
### Data Tools (Spillover Management)
When tool results exceed the context window, the framework automatically saves them to a spillover directory and truncates with a hint. Nodes that produce or consume large data should include the data tools:
- `save_data(filename, data)` — Write data to a file in the data directory
- `load_data(filename, offset=0, limit=50)` — Read data with line-based pagination
- `list_data_files()` — List available data files
- `serve_file_to_user(filename, label="")` — Get a clickable file:// URI for the user
Note: `data_dir` is a framework-injected context parameter — the LLM never sees or passes it. `GraphExecutor.execute()` sets it per-execution via `contextvars`, so data tools and spillover always share the same session-scoped directory.
These are real MCP tools (not synthetic). Add them to nodes that handle large tool results:
```python
research_node = NodeSpec(
...
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
)
```
### Fan-Out / Fan-In
Multiple ON_SUCCESS edges from the same source create parallel execution. All branches run concurrently via `asyncio.gather()`. Parallel event_loop nodes must have disjoint `output_keys`.
### max_node_visits
Controls how many times a node can execute in one graph run. Default is 1. Set higher for nodes that are targets of feedback edges (review-reject loops). Set 0 for unlimited (guarded by max_steps).
## Tool Discovery & Validation
**CRITICAL:** Before adding a node with tools, you MUST verify the tools exist.
Tools are provided by MCP servers. Never assume a tool exists - always discover dynamically.
### Step 1: Register MCP Server (if not already done)
```python
mcp__agent-builder__add_mcp_server(
name="tools",
transport="stdio",
command="python",
args='["mcp_server.py", "--stdio"]',
cwd="../tools"
)
```
### Step 2: Discover Available Tools
```python
# List all tools from all registered servers
mcp__agent-builder__list_mcp_tools()
# Or list tools from a specific server
mcp__agent-builder__list_mcp_tools(server_name="tools")
```
### Step 3: Validate Before Adding Nodes
Before writing a node with `tools=[...]`:
1. Call `list_mcp_tools()` to get available tools
2. Check each tool in your node exists in the response
3. If a tool doesn't exist:
- **DO NOT proceed** with the node
- Inform the user: "The tool 'X' is not available. Available tools are: ..."
- Ask if they want to use an alternative or proceed without the tool
### Tool Validation Anti-Patterns
- **Never assume a tool exists** - always call `list_mcp_tools()` first
- **Never write a node with unverified tools** - validate before writing
- **Never silently drop tools** - if a tool doesn't exist, inform the user
- **Never guess tool names** - use exact names from discovery response
## Workflow Overview: Incremental File Construction
```
1. CREATE PACKAGE → mkdir + write skeletons
2. DEFINE GOAL → Write to agent.py + config.py
3. FOR EACH NODE:
- Propose design (event_loop for LLM work, function for deterministic)
- User approves
- Write to nodes/__init__.py IMMEDIATELY
- (Optional) Validate with test_node
4. CONNECT EDGES → Update agent.py
- Use priority for feedback edges (negative priority)
- (Optional) Validate with validate_graph
5. FINALIZE → Write agent class to agent.py
6. DONE - Agent ready at exports/my_agent/
```
**Files written immediately. MCP tools optional for validation/testing bookkeeping.**
## When to Use This Skill
Use hive-concepts when:
- Starting a new agent project and need to understand fundamentals
- Need to understand agent architecture before building
- Want to validate tool availability before proceeding
- Learning about node types, edges, and graph execution
**Next Steps:**
- Ready to build? → Use `hive-create` skill
- Need patterns and examples? → Use `hive-patterns` skill
## MCP Tools for Validation
After writing files, optionally use MCP tools for validation:
**test_node** - Validate node configuration with mock inputs
```python
mcp__agent-builder__test_node(
node_id="search-web",
test_input='{"query": "test query"}',
mock_llm_response='{"results": "mock output"}'
)
```
**validate_graph** - Check graph structure
```python
mcp__agent-builder__validate_graph()
# Returns: unreachable nodes, missing connections, event_loop validation, etc.
```
**configure_loop** - Set event loop parameters
```python
mcp__agent-builder__configure_loop(
max_iterations=50,
max_tool_calls_per_turn=10,
stall_detection_threshold=3,
max_history_tokens=32000
)
```
**Key Point:** Files are written FIRST. MCP tools are for validation only.
## Related Skills
- **hive-create** - Step-by-step building process
- **hive-patterns** - Best practices: judges, feedback edges, fan-out, context management
- **hive** - Complete workflow orchestrator
- **hive-test** - Test and validate completed agents
+516
View File
@@ -0,0 +1,516 @@
---
name: hive-create
description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent.
license: Apache-2.0
metadata:
author: hive
version: "2.1"
type: procedural
part_of: hive
requires: hive-concepts
---
# Agent Construction - EXECUTE THESE STEPS
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.**
**CRITICAL: DO NOT explore the codebase, read source files, or search for code before starting.** All context you need is in this skill file. When this skill is loaded, IMMEDIATELY begin executing Step 1 — call the MCP tools listed in Step 1 as your FIRST action. Do not explain what you will do, do not investigate the project structure, do not read any files — just execute Step 1 now.
---
## STEP 1: Initialize Build Environment
**EXECUTE THESE TOOL CALLS NOW** (silent setup — no user interaction needed):
1. Register the hive-tools MCP server:
```
mcp__agent-builder__add_mcp_server(
name="hive-tools",
transport="stdio",
command="python",
args='["mcp_server.py", "--stdio"]',
cwd="tools",
description="Hive tools MCP server"
)
```
2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case):
```
mcp__agent-builder__create_session(name="AGENT_NAME")
```
3. Discover available tools:
```
mcp__agent-builder__list_mcp_tools()
```
4. Create the package directory:
```bash
mkdir -p exports/AGENT_NAME/nodes
```
**Save the tool list for step 3** — you will need it for node design in STEP 3.
**THEN immediately proceed to STEP 2** (do NOT display setup results to the user — just move on).
---
## STEP 2: Define Goal Together with User
**DO NOT propose a complete goal on your own.** Instead, collaborate with the user to define it.
**START by asking the user to help shape the goal:**
> I've set up the build environment and discovered [N] available tools. Let's define the goal for your agent together.
>
> To get started, can you help me understand:
>
> 1. **What should this agent accomplish?** (the core purpose)
> 2. **How will we know it succeeded?** (what does "done" look like)
> 3. **Are there any hard constraints?** (things it must never do, quality bars, etc.)
**WAIT for the user to respond.** Use their input to draft:
- Goal ID (kebab-case)
- Goal name
- Goal description
- 3-5 success criteria (each with: id, description, metric, target, weight)
- 2-4 constraints (each with: id, description, constraint_type, category)
**PRESENT the draft goal for approval:**
> **Proposed Goal: [Name]**
>
> [Description]
>
> **Success Criteria:**
>
> 1. [criterion 1]
> 2. [criterion 2]
> ...
>
> **Constraints:**
>
> 1. [constraint 1]
> 2. [constraint 2]
> ...
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve this goal definition?",
"header": "Goal",
"options": [
{"label": "Approve", "description": "Goal looks good, proceed to workflow design"},
{"label": "Modify", "description": "I want to change something"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 3
- If **Modify**: Ask what they want to change, update the draft, ask again
---
## STEP 3: Design Conceptual Nodes
**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist.
**DESIGN the workflow** as a series of nodes. For each node, determine:
- node_id (kebab-case)
- name
- description
- node_type: `"event_loop"` (recommended for all LLM work) or `"function"` (deterministic, no LLM)
- input_keys (what data this node receives)
- output_keys (what data this node produces)
- tools (ONLY tools that exist from Step 1 — empty list if no tools needed)
- client_facing: True if this node interacts with the user
- nullable_output_keys (for mutually exclusive outputs or feedback-only inputs)
- max_node_visits (>1 if this node is a feedback loop target)
**Prefer fewer, richer nodes** (4 nodes > 8 thin nodes). Each node boundary requires serializing outputs. A research node that searches, fetches, and analyzes keeps all source material in its conversation history.
**PRESENT the nodes to the user for review:**
> **Proposed Nodes ([N] total):**
>
> | # | Node ID | Type | Description | Tools | Client-Facing |
> | --- | ---------- | ---------- | ----------------------------- | ---------------------- | :-----------: |
> | 1 | `intake` | event_loop | Gather requirements from user | — | Yes |
> | 2 | `research` | event_loop | Search and analyze sources | web_search, web_scrape | No |
> | 3 | `review` | event_loop | Present findings for approval | — | Yes |
> | 4 | `report` | event_loop | Generate final report | save_data | No |
>
> **Data Flow:**
>
> - `intake` produces: `research_brief`
> - `research` receives: `research_brief` → produces: `findings`, `sources`
> - `review` receives: `findings`, `sources` → produces: `approved_findings` or `feedback`
> - `report` receives: `approved_findings` → produces: `final_report`
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve these nodes?",
"header": "Nodes",
"options": [
{"label": "Approve", "description": "Nodes look good, proceed to graph design"},
{"label": "Modify", "description": "I want to change the nodes"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Proceed to STEP 4
- If **Modify**: Ask what they want to change, update design, ask again
---
## STEP 4: Design Full Graph and Review
**DETERMINE the edges** connecting the approved nodes. For each edge:
- edge_id (kebab-case)
- source → target
- condition: `on_success`, `on_failure`, `always`, or `conditional`
- condition_expr (Python expression, only if conditional)
- priority (positive = forward, negative = feedback/loop-back)
**RENDER the complete graph as ASCII art.** Make it large and clear — the user needs to see and understand the full workflow at a glance.
**IMPORTANT: Make the ASCII art BIG and READABLE.** Use a box-and-arrow style with generous spacing. Do NOT make it tiny or compressed. Example format:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ AGENT: Research Agent │
│ │
│ Goal: Thoroughly research technical topics and produce verified reports │
└─────────────────────────────────────────────────────────────────────────────┘
┌───────────────────────┐
│ INTAKE │
│ (client-facing) │
│ │
│ in: topic │
│ out: research_brief │
└───────────┬───────────┘
│ on_success
┌───────────────────────┐
│ RESEARCH │
│ │
│ tools: web_search, │
│ web_scrape │
│ │
│ in: research_brief │
│ [feedback] │
│ out: findings, │
│ sources │
└───────────┬───────────┘
│ on_success
┌───────────────────────┐
│ REVIEW │
│ (client-facing) │
│ │
│ in: findings, │
│ sources │
│ out: approved_findings│
│ OR feedback │
└───────┬───────┬───────┘
│ │
approved │ │ feedback (priority: -1)
│ │
▼ └──────────────────┐
┌───────────────────────┐ │
│ REPORT │ │
│ │ │
│ tools: save_data │ │
│ │ │
│ in: approved_ │ │
│ findings │ │
│ out: final_report │ │
└───────────────────────┘ │
┌──────────────────────────┘
│ loops back to RESEARCH
▼ (max_node_visits: 3)
EDGES:
──────
1. intake → research [on_success, priority: 1]
2. research → review [on_success, priority: 1]
3. review → report [conditional: approved_findings is not None, priority: 1]
4. review → research [conditional: feedback is not None, priority: -1]
```
**PRESENT the graph and edges to the user:**
> Here is the complete workflow graph:
>
> [ASCII art above]
>
> **Edge Summary:**
>
> | # | Edge | Condition | Priority |
> | --- | ----------------- | -------------------------------------------- | -------- |
> | 1 | intake → research | on_success | 1 |
> | 2 | research → review | on_success | 1 |
> | 3 | review → report | conditional: `approved_findings is not None` | 1 |
> | 4 | review → research | conditional: `feedback is not None` | -1 |
**THEN call AskUserQuestion:**
```
AskUserQuestion(questions=[{
"question": "Do you approve this workflow graph?",
"header": "Graph",
"options": [
{"label": "Approve", "description": "Graph looks good, proceed to build the agent"},
{"label": "Modify", "description": "I want to change the graph"}
],
"multiSelect": false
}])
```
**WAIT for user response.**
- If **Approve**: Proceed to STEP 5
- If **Modify**: Ask what they want to change, update the graph, re-render, ask again
---
## STEP 5: Build the Agent
**NOW — and only now — write the actual code.** The user has approved the goal, nodes, and graph.
### 5a: Register nodes and edges with MCP
**FOR EACH approved node**, call:
```
mcp__agent-builder__add_node(
node_id="...",
name="...",
description="...",
node_type="event_loop",
input_keys='["key1", "key2"]',
output_keys='["key1"]',
tools='["tool1"]',
system_prompt="...",
client_facing=True/False,
nullable_output_keys='["key"]',
max_node_visits=1
)
```
**FOR EACH approved edge**, call:
```
mcp__agent-builder__add_edge(
edge_id="source-to-target",
source="source-node-id",
target="target-node-id",
condition="on_success",
condition_expr="",
priority=1
)
```
**VALIDATE the graph:**
```
mcp__agent-builder__validate_graph()
```
- If invalid: Fix the issues and re-validate
- If valid: Continue to 5b
### 5b: Write Python package files
**EXPORT the graph data:**
```
mcp__agent-builder__export_graph()
```
**THEN write the Python package files** using the exported data. Create these files in `exports/AGENT_NAME/`:
1. `config.py` - Runtime configuration with model settings
2. `nodes/__init__.py` - All NodeSpec definitions
3. `agent.py` - Goal, edges, graph config, and agent class
4. `__init__.py` - Package exports
5. `__main__.py` - CLI interface
6. `mcp_servers.json` - MCP server configurations
7. `README.md` - Usage documentation
**IMPORTANT entry_points format:**
- MUST be: `{"start": "first-node-id"}`
- NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
- NOT: `{"first-node-id"}` (WRONG - this is a set)
**IMPORTANT mcp_servers.json format:**
```json
{
"hive-tools": {
"transport": "stdio",
"command": "python",
"args": ["mcp_server.py", "--stdio"],
"cwd": "../../tools",
"description": "Hive tools MCP server"
}
}
```
- NO `"mcpServers"` wrapper (that's Claude Desktop format, NOT hive format)
- `cwd` MUST be `"../../tools"` (relative from `exports/AGENT_NAME/` to `tools/`)
**Use the example agent** at `.claude/skills/hive-create/examples/deep_research_agent/` as a template for file structure and patterns. It demonstrates: STEP 1/STEP 2 prompts, client-facing nodes, feedback loops, nullable_output_keys, and data tools.
**AFTER writing all files, tell the user:**
> Agent package created: `exports/AGENT_NAME/`
>
> **Files generated:**
>
> - `__init__.py` - Package exports
> - `agent.py` - Goal, nodes, edges, agent class
> - `config.py` - Runtime configuration
> - `__main__.py` - CLI interface
> - `nodes/__init__.py` - Node definitions
> - `mcp_servers.json` - MCP server config
> - `README.md` - Usage documentation
---
## STEP 6: Verify and Test
**RUN validation:**
```bash
cd /home/timothy/oss/hive && PYTHONPATH=exports uv run python -m AGENT_NAME validate
```
- If valid: Agent is complete!
- If errors: Fix the issues and re-run
**TELL the user the agent is ready** and suggest next steps:
- Run with mock mode to test without API calls
- Use `/hive-test` skill for comprehensive testing
- Use `/hive-credentials` if the agent needs API keys
---
## REFERENCE: Node Types
| Type | tools param | Use when |
| ------------ | ----------------------- | --------------------------------------- |
| `event_loop` | `'["tool1"]'` or `'[]'` | LLM-powered work with or without tools |
| `function` | N/A | Deterministic Python operations, no LLM |
---
## REFERENCE: NodeSpec Fields
| Field | Default | Description |
| ---------------------- | ------- | --------------------------------------------------------------------- |
| `client_facing` | `False` | Streams output to user, blocks for input between turns |
| `nullable_output_keys` | `[]` | Output keys that may remain unset (mutually exclusive outputs) |
| `max_node_visits` | `1` | Max executions per run. Set >1 for feedback loop targets. 0=unlimited |
---
## REFERENCE: Edge Conditions & Priority
| Condition | When edge is followed |
| ------------- | ------------------------------------- |
| `on_success` | Source node completed successfully |
| `on_failure` | Source node failed |
| `always` | Always, regardless of success/failure |
| `conditional` | When condition_expr evaluates to True |
**Priority:** Positive = forward edge (evaluated first). Negative = feedback edge (loops back to earlier node). Multiple ON_SUCCESS edges from same source = parallel execution (fan-out).
---
## REFERENCE: System Prompt Best Practice
For **internal** event_loop nodes (not client-facing), instruct the LLM to use `set_output`:
```
Use set_output(key, value) to store your results. For example:
- set_output("search_results", <your results as a JSON string>)
Do NOT return raw JSON. Use the set_output tool to produce outputs.
```
For **client-facing** event_loop nodes, use the STEP 1/STEP 2 pattern:
```
**STEP 1 — Respond to the user (text only, NO tool calls):**
[Present information, ask questions, etc.]
**STEP 2 — After the user responds, call set_output:**
- set_output("key", "value based on user's response")
```
This prevents the LLM from calling `set_output` before the user has had a chance to respond. The "NO tool calls" instruction in STEP 1 ensures the node blocks for user input before proceeding.
---
## EventLoopNode Runtime
EventLoopNodes are **auto-created** by `GraphExecutor` at runtime. Both direct `GraphExecutor` and `AgentRuntime` / `create_agent_runtime()` handle event_loop nodes automatically. No manual `node_registry` setup is needed.
```python
# Direct execution
from framework.graph.executor import GraphExecutor
from framework.runtime.core import Runtime
storage_path = Path.home() / ".hive" / "my_agent"
storage_path.mkdir(parents=True, exist_ok=True)
runtime = Runtime(storage_path)
executor = GraphExecutor(
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
storage_path=storage_path,
)
result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
```
**DO NOT pass `runtime=None` to `GraphExecutor`** — it will crash with `'NoneType' object has no attribute 'start_run'`.
---
## COMMON MISTAKES TO AVOID
1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first
2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list
3. **Skipping validation** - Always validate nodes and graph before proceeding
4. **Not waiting for approval** - Always ask user before major steps
5. **Displaying this file** - Execute the steps, don't show documentation
6. **Too many thin nodes** - Prefer fewer, richer nodes (4 nodes > 8 nodes)
7. **Missing STEP 1/STEP 2 in client-facing prompts** - Client-facing nodes need explicit phases to prevent premature set_output
8. **Forgetting nullable_output_keys** - Mark input_keys that only arrive on certain edges (e.g., feedback) as nullable on the receiving node
9. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
10. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
11. **Wrong mcp_servers.json format** - Use flat format (no `"mcpServers"` wrapper), and `cwd` must be `"../../tools"` not `"tools"`
@@ -0,0 +1,24 @@
"""
Deep Research Agent - Interactive, rigorous research with TUI conversation.
Research any topic through multi-source web search, quality evaluation,
and synthesis. Features client-facing TUI interaction at key checkpoints
for user guidance and iterative deepening.
"""
from .agent import DeepResearchAgent, default_agent, goal, nodes, edges
from .config import RuntimeConfig, AgentMetadata, default_config, metadata
__version__ = "1.0.0"
__all__ = [
"DeepResearchAgent",
"default_agent",
"goal",
"nodes",
"edges",
"RuntimeConfig",
"AgentMetadata",
"default_config",
"metadata",
]
@@ -1,5 +1,5 @@
"""
CLI entry point for Online Research Agent.
CLI entry point for Deep Research Agent.
Uses AgentRuntime for multi-entrypoint support with HITL pause/resume.
"""
@@ -10,7 +10,7 @@ import logging
import sys
import click
from .agent import default_agent, OnlineResearchAgent
from .agent import default_agent, DeepResearchAgent
def setup_logging(verbose=False, debug=False):
@@ -28,7 +28,7 @@ def setup_logging(verbose=False, debug=False):
@click.group()
@click.version_option(version="1.0.0")
def cli():
"""Online Research Agent - Deep-dive research with narrative reports."""
"""Deep Research Agent - Interactive, rigorous research with TUI conversation."""
pass
@@ -59,6 +59,85 @@ def run(topic, mock, quiet, verbose, debug):
sys.exit(0 if result.success else 1)
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def tui(mock, verbose, debug):
"""Launch the TUI dashboard for interactive research."""
setup_logging(verbose=verbose, debug=debug)
try:
from framework.tui.app import AdenTUI
except ImportError:
click.echo(
"TUI requires the 'textual' package. Install with: pip install textual"
)
sys.exit(1)
from pathlib import Path
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import create_agent_runtime
from framework.runtime.event_bus import EventBus
from framework.runtime.execution_stream import EntryPointSpec
async def run_with_tui():
agent = DeepResearchAgent()
# Build graph and tools
agent._event_bus = EventBus()
agent._tool_registry = ToolRegistry()
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock:
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
tools = list(agent._tool_registry.get_tools().values())
tool_executor = agent._tool_registry.get_executor()
graph = agent._build_graph()
runtime = create_agent_runtime(
graph=graph,
goal=agent.goal,
storage_path=storage_path,
entry_points=[
EntryPointSpec(
id="start",
name="Start Research",
entry_node="intake",
trigger_type="manual",
isolation_level="isolated",
),
],
llm=llm,
tools=tools,
tool_executor=tool_executor,
)
await runtime.start()
try:
app = AdenTUI(runtime)
await app.run_async()
finally:
await runtime.stop()
asyncio.run(run_with_tui())
@cli.command()
@click.option("--json", "output_json", is_flag=True)
def info(output_json):
@@ -71,6 +150,7 @@ def info(output_json):
click.echo(f"Version: {info_data['version']}")
click.echo(f"Description: {info_data['description']}")
click.echo(f"\nNodes: {', '.join(info_data['nodes'])}")
click.echo(f"Client-facing: {', '.join(info_data['client_facing_nodes'])}")
click.echo(f"Entry: {info_data['entry_node']}")
click.echo(f"Terminal: {', '.join(info_data['terminal_nodes'])}")
@@ -81,6 +161,9 @@ def validate():
validation = default_agent.validate()
if validation["valid"]:
click.echo("Agent is valid")
if validation["warnings"]:
for warning in validation["warnings"]:
click.echo(f" WARNING: {warning}")
else:
click.echo("Agent has errors:")
for error in validation["errors"]:
@@ -91,7 +174,7 @@ def validate():
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
def shell(verbose):
"""Interactive research session."""
"""Interactive research session (CLI, no TUI)."""
asyncio.run(_interactive_shell(verbose))
@@ -99,10 +182,10 @@ async def _interactive_shell(verbose=False):
"""Async interactive shell."""
setup_logging(verbose=verbose)
click.echo("=== Online Research Agent ===")
click.echo("=== Deep Research Agent ===")
click.echo("Enter a topic to research (or 'quit' to exit):\n")
agent = OnlineResearchAgent()
agent = DeepResearchAgent()
await agent.start()
try:
@@ -118,7 +201,7 @@ async def _interactive_shell(verbose=False):
if not topic.strip():
continue
click.echo("\nResearching... (this may take a few minutes)\n")
click.echo("\nResearching...\n")
result = await agent.trigger_and_wait("start", {"topic": topic})
@@ -128,16 +211,16 @@ async def _interactive_shell(verbose=False):
if result.success:
output = result.output
if "file_path" in output:
click.echo(f"\nReport saved to: {output['file_path']}\n")
if "final_report" in output:
click.echo("\n--- Report Preview ---\n")
preview = (
output["final_report"][:500] + "..."
if len(output.get("final_report", "")) > 500
else output.get("final_report", "")
)
click.echo(preview)
if "report_content" in output:
click.echo("\n--- Report ---\n")
click.echo(output["report_content"])
click.echo("\n")
if "references" in output:
click.echo("--- References ---\n")
for ref in output.get("references", []):
click.echo(
f" [{ref.get('number', '?')}] {ref.get('title', '')} - {ref.get('url', '')}"
)
click.echo("\n")
else:
click.echo(f"\nResearch failed: {result.error}\n")
@@ -0,0 +1,311 @@
"""Agent graph construction for Deep Research Agent."""
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult, GraphExecutor
from framework.runtime.event_bus import EventBus
from framework.runtime.core import Runtime
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from .config import default_config, metadata
from .nodes import (
intake_node,
research_node,
review_node,
report_node,
)
# Goal definition
goal = Goal(
id="rigorous-interactive-research",
name="Rigorous Interactive Research",
description=(
"Research any topic by searching diverse sources, analyzing findings, "
"and producing a cited report — with user checkpoints to guide direction."
),
success_criteria=[
SuccessCriterion(
id="source-diversity",
description="Use multiple diverse, authoritative sources",
metric="source_count",
target=">=5",
weight=0.25,
),
SuccessCriterion(
id="citation-coverage",
description="Every factual claim in the report cites its source",
metric="citation_coverage",
target="100%",
weight=0.25,
),
SuccessCriterion(
id="user-satisfaction",
description="User reviews findings before report generation",
metric="user_approval",
target="true",
weight=0.25,
),
SuccessCriterion(
id="report-completeness",
description="Final report answers the original research questions",
metric="question_coverage",
target="90%",
weight=0.25,
),
],
constraints=[
Constraint(
id="no-hallucination",
description="Only include information found in fetched sources",
constraint_type="quality",
category="accuracy",
),
Constraint(
id="source-attribution",
description="Every claim must cite its source with a numbered reference",
constraint_type="quality",
category="accuracy",
),
Constraint(
id="user-checkpoint",
description="Present findings to the user before writing the final report",
constraint_type="functional",
category="interaction",
),
],
)
# Node list
nodes = [
intake_node,
research_node,
review_node,
report_node,
]
# Edge definitions
edges = [
# intake -> research
EdgeSpec(
id="intake-to-research",
source="intake",
target="research",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
# research -> review
EdgeSpec(
id="research-to-review",
source="research",
target="review",
condition=EdgeCondition.ON_SUCCESS,
priority=1,
),
# review -> research (feedback loop)
EdgeSpec(
id="review-to-research-feedback",
source="review",
target="research",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == True",
priority=1,
),
# review -> report (user satisfied)
EdgeSpec(
id="review-to-report",
source="review",
target="report",
condition=EdgeCondition.CONDITIONAL,
condition_expr="needs_more_research == False",
priority=2,
),
]
# Graph configuration
entry_node = "intake"
entry_points = {"start": "intake"}
pause_nodes = []
terminal_nodes = ["report"]
class DeepResearchAgent:
"""
Deep Research Agent 4-node pipeline with user checkpoints.
Flow: intake -> research -> review -> report
^ |
+-- feedback loop (if user wants more)
"""
def __init__(self, config=None):
self.config = config or default_config
self.goal = goal
self.nodes = nodes
self.edges = edges
self.entry_node = entry_node
self.entry_points = entry_points
self.pause_nodes = pause_nodes
self.terminal_nodes = terminal_nodes
self._executor: GraphExecutor | None = None
self._graph: GraphSpec | None = None
self._event_bus: EventBus | None = None
self._tool_registry: ToolRegistry | None = None
def _build_graph(self) -> GraphSpec:
"""Build the GraphSpec."""
return GraphSpec(
id="deep-research-agent-graph",
goal_id=self.goal.id,
version="1.0.0",
entry_node=self.entry_node,
entry_points=self.entry_points,
terminal_nodes=self.terminal_nodes,
pause_nodes=self.pause_nodes,
nodes=self.nodes,
edges=self.edges,
default_model=self.config.model,
max_tokens=self.config.max_tokens,
loop_config={
"max_iterations": 100,
"max_tool_calls_per_turn": 20,
"max_history_tokens": 32000,
},
)
def _setup(self, mock_mode=False) -> GraphExecutor:
"""Set up the executor with all components."""
from pathlib import Path
storage_path = Path.home() / ".hive" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
self._event_bus = EventBus()
self._tool_registry = ToolRegistry()
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock_mode:
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
self._graph = self._build_graph()
runtime = Runtime(storage_path)
self._executor = GraphExecutor(
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
event_bus=self._event_bus,
storage_path=storage_path,
loop_config=self._graph.loop_config,
)
return self._executor
async def start(self, mock_mode=False) -> None:
"""Set up the agent (initialize executor and tools)."""
if self._executor is None:
self._setup(mock_mode=mock_mode)
async def stop(self) -> None:
"""Clean up resources."""
self._executor = None
self._event_bus = None
async def trigger_and_wait(
self,
entry_point: str,
input_data: dict,
timeout: float | None = None,
session_state: dict | None = None,
) -> ExecutionResult | None:
"""Execute the graph and wait for completion."""
if self._executor is None:
raise RuntimeError("Agent not started. Call start() first.")
if self._graph is None:
raise RuntimeError("Graph not built. Call start() first.")
return await self._executor.execute(
graph=self._graph,
goal=self.goal,
input_data=input_data,
session_state=session_state,
)
async def run(
self, context: dict, mock_mode=False, session_state=None
) -> ExecutionResult:
"""Run the agent (convenience method for single execution)."""
await self.start(mock_mode=mock_mode)
try:
result = await self.trigger_and_wait(
"start", context, session_state=session_state
)
return result or ExecutionResult(success=False, error="Execution timeout")
finally:
await self.stop()
def info(self):
"""Get agent information."""
return {
"name": metadata.name,
"version": metadata.version,
"description": metadata.description,
"goal": {
"name": self.goal.name,
"description": self.goal.description,
},
"nodes": [n.id for n in self.nodes],
"edges": [e.id for e in self.edges],
"entry_node": self.entry_node,
"entry_points": self.entry_points,
"pause_nodes": self.pause_nodes,
"terminal_nodes": self.terminal_nodes,
"client_facing_nodes": [n.id for n in self.nodes if n.client_facing],
}
def validate(self):
"""Validate agent structure."""
errors = []
warnings = []
node_ids = {node.id for node in self.nodes}
for edge in self.edges:
if edge.source not in node_ids:
errors.append(f"Edge {edge.id}: source '{edge.source}' not found")
if edge.target not in node_ids:
errors.append(f"Edge {edge.id}: target '{edge.target}' not found")
if self.entry_node not in node_ids:
errors.append(f"Entry node '{self.entry_node}' not found")
for terminal in self.terminal_nodes:
if terminal not in node_ids:
errors.append(f"Terminal node '{terminal}' not found")
for ep_id, node_id in self.entry_points.items():
if node_id not in node_ids:
errors.append(
f"Entry point '{ep_id}' references unknown node '{node_id}'"
)
return {
"valid": len(errors) == 0,
"errors": errors,
"warnings": warnings,
}
# Create default instance
default_agent = DeepResearchAgent()
@@ -0,0 +1,46 @@
"""Runtime configuration."""
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
default_config = RuntimeConfig()
@dataclass
class AgentMetadata:
name: str = "Deep Research Agent"
version: str = "1.0.0"
description: str = (
"Interactive research agent that rigorously investigates topics through "
"multi-source search, quality evaluation, and synthesis - with TUI conversation "
"at key checkpoints for user guidance and feedback."
)
metadata = AgentMetadata()
@@ -0,0 +1,162 @@
"""Node definitions for Deep Research Agent."""
from framework.graph import NodeSpec
# Node 1: Intake (client-facing)
# Brief conversation to clarify what the user wants researched.
intake_node = NodeSpec(
id="intake",
name="Research Intake",
description="Discuss the research topic with the user, clarify scope, and confirm direction",
node_type="event_loop",
client_facing=True,
input_keys=["topic"],
output_keys=["research_brief"],
system_prompt="""\
You are a research intake specialist. The user wants to research a topic.
Have a brief conversation to clarify what they need.
**STEP 1 Read and respond (text only, NO tool calls):**
1. Read the topic provided
2. If it's vague, ask 1-2 clarifying questions (scope, angle, depth)
3. If it's already clear, confirm your understanding and ask the user to confirm
Keep it short. Don't over-ask.
**STEP 2 After the user confirms, call set_output:**
- set_output("research_brief", "A clear paragraph describing exactly what to research, \
what questions to answer, what scope to cover, and how deep to go.")
""",
tools=[],
)
# Node 2: Research
# The workhorse — searches the web, fetches content, analyzes sources.
# One node with both tools avoids the context-passing overhead of 5 separate nodes.
research_node = NodeSpec(
id="research",
name="Research",
description="Search the web, fetch source content, and compile findings",
node_type="event_loop",
max_node_visits=3,
input_keys=["research_brief", "feedback"],
output_keys=["findings", "sources", "gaps"],
nullable_output_keys=["feedback"],
system_prompt="""\
You are a research agent. Given a research brief, find and analyze sources.
If feedback is provided, this is a follow-up round focus on the gaps identified.
Work in phases:
1. **Search**: Use web_search with 3-5 diverse queries covering different angles.
Prioritize authoritative sources (.edu, .gov, established publications).
2. **Fetch**: Use web_scrape on the most promising URLs (aim for 5-8 sources).
Skip URLs that fail. Extract the substantive content.
3. **Analyze**: Review what you've collected. Identify key findings, themes,
and any contradictions between sources.
Important:
- Work in batches of 3-4 tool calls at a time to manage context
- After each batch, assess whether you have enough material
- Prefer quality over quantity 5 good sources beat 15 thin ones
- Track which URL each finding comes from (you'll need citations later)
When done, use set_output:
- set_output("findings", "Structured summary: key findings with source URLs for each claim. \
Include themes, contradictions, and confidence levels.")
- set_output("sources", [{"url": "...", "title": "...", "summary": "..."}])
- set_output("gaps", "What aspects of the research brief are NOT well-covered yet, if any.")
""",
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
)
# Node 3: Review (client-facing)
# Shows the user what was found and asks whether to dig deeper or proceed.
review_node = NodeSpec(
id="review",
name="Review Findings",
description="Present findings to user and decide whether to research more or write the report",
node_type="event_loop",
client_facing=True,
max_node_visits=3,
input_keys=["findings", "sources", "gaps", "research_brief"],
output_keys=["needs_more_research", "feedback"],
system_prompt="""\
Present the research findings to the user clearly and concisely.
**STEP 1 Present (your first message, text only, NO tool calls):**
1. **Summary** (2-3 sentences of what was found)
2. **Key Findings** (bulleted, with confidence levels)
3. **Sources Used** (count and quality assessment)
4. **Gaps** (what's still unclear or under-covered)
End by asking: Are they satisfied, or do they want deeper research? \
Should we proceed to writing the final report?
**STEP 2 After the user responds, call set_output:**
- set_output("needs_more_research", "true") if they want more
- set_output("needs_more_research", "false") if they're satisfied
- set_output("feedback", "What the user wants explored further, or empty string")
""",
tools=[],
)
# Node 4: Report (client-facing)
# Writes an HTML report, serves the link to the user, and answers follow-ups.
report_node = NodeSpec(
id="report",
name="Write & Deliver Report",
description="Write a cited HTML report from the findings and present it to the user",
node_type="event_loop",
client_facing=True,
input_keys=["findings", "sources", "research_brief"],
output_keys=["delivery_status"],
system_prompt="""\
Write a comprehensive research report as an HTML file and present it to the user.
**STEP 1 Write the HTML report (tool calls, NO text to user yet):**
1. Compose a complete, self-contained HTML document with embedded CSS styling.
Use a clean, readable design: max-width container, pleasant typography,
numbered citation links, a table of contents, and a references section.
Report structure inside the HTML:
- Title & date
- Executive Summary (2-3 paragraphs)
- Table of Contents
- Findings (organized by theme, with [n] citation links)
- Analysis (synthesis, implications, areas of debate)
- Conclusion (key takeaways, confidence assessment)
- References (numbered list with clickable URLs)
Requirements:
- Every factual claim must cite its source with [n] notation
- Be objective present multiple viewpoints where sources disagree
- Distinguish well-supported conclusions from speculation
- Answer the original research questions from the brief
2. Save the HTML file:
save_data(filename="report.html", data=<your_html>)
3. Get the clickable link:
serve_file_to_user(filename="report.html", label="Research Report")
**STEP 2 Present the link to the user (text only, NO tool calls):**
Tell the user the report is ready and include the file:// URI from
serve_file_to_user so they can click it to open. Give a brief summary
of what the report covers. Ask if they have questions.
**STEP 3 After the user responds:**
- Answer follow-up questions from the research material
- When the user is satisfied: set_output("delivery_status", "completed")
""",
tools=["save_data", "serve_file_to_user", "load_data", "list_data_files"],
)
__all__ = [
"intake_node",
"research_node",
"review_node",
"report_node",
]
@@ -1,10 +1,10 @@
---
name: setup-credentials
description: Set up and install credentials for an agent. Detects missing credentials from agent config, collects them from the user, and stores them securely in the encrypted credential store at ~/.hive/credentials.
name: hive-credentials
description: Set up and install credentials for an agent. Detects missing credentials from agent config, collects them from the user, and stores them securely in the local encrypted store at ~/.hive/credentials.
license: Apache-2.0
metadata:
author: hive
version: "2.1"
version: "2.3"
type: utility
---
@@ -31,47 +31,50 @@ Determine which agent needs credentials. The user will either:
Locate the agent's directory under `exports/{agent_name}/`.
### Step 2: Detect Required Credentials
### Step 2: Detect Missing Credentials
Read the agent's configuration to determine which tools and node types it uses:
Use the `check_missing_credentials` MCP tool to detect what the agent needs and what's already configured. This tool loads the agent, inspects its required tools and node types, maps them to credentials via `CREDENTIAL_SPECS`, and checks both the encrypted store and environment variables.
```python
from core.framework.runner import AgentRunner
runner = AgentRunner.load("exports/{agent_name}")
validation = runner.validate()
# validation.missing_credentials contains env var names
# validation.warnings contains detailed messages with help URLs
```
check_missing_credentials(agent_path="exports/{agent_name}")
```
Alternatively, check the credential store directly:
The tool returns a JSON response:
```python
from core.framework.credentials import CredentialStore
# Use encrypted storage (default: ~/.hive/credentials)
store = CredentialStore.with_encrypted_storage()
# Check what's available
available = store.list_credentials()
print(f"Available credentials: {available}")
# Check if specific credential exists
if store.is_available("hubspot"):
print("HubSpot credential found")
else:
print("HubSpot credential missing")
```json
{
"agent": "exports/{agent_name}",
"missing": [
{
"credential_name": "brave_search",
"env_var": "BRAVE_SEARCH_API_KEY",
"description": "Brave Search API key for web search",
"help_url": "https://brave.com/search/api/",
"tools": ["web_search"]
}
],
"available": [
{
"credential_name": "anthropic",
"env_var": "ANTHROPIC_API_KEY",
"source": "encrypted_store"
}
],
"total_missing": 1,
"ready": false
}
```
To see all known credential specs (for help URLs and setup instructions):
**If `ready` is true (nothing missing):** Report all credentials as configured and skip Steps 3-5. Example:
```python
from aden_tools.credentials import CREDENTIAL_SPECS
for name, spec in CREDENTIAL_SPECS.items():
print(f"{name}: env_var={spec.env_var}, aden={spec.aden_supported}")
```
All required credentials are already configured:
✓ anthropic (ANTHROPIC_API_KEY)
✓ brave_search (BRAVE_SEARCH_API_KEY)
Your agent is ready to run!
```
**If credentials are missing:** Continue to Step 3 with the `missing` list.
### Step 3: Present Auth Options for Each Missing Credential
@@ -104,8 +107,8 @@ Present the available options using AskUserQuestion:
```
Choose how to configure HUBSPOT_ACCESS_TOKEN:
1) Aden Authorization Server (Recommended)
Secure OAuth2 flow via integration.adenhq.com
1) Aden Platform (OAuth) (Recommended)
Secure OAuth2 flow via hive.adenhq.com
- Quick setup with automatic token refresh
- No need to manage API keys manually
@@ -114,7 +117,7 @@ Choose how to configure HUBSPOT_ACCESS_TOKEN:
- Requires creating a HubSpot Private App
- Full control over scopes and permissions
3) Custom Credential Store (Advanced)
3) Local Credential Setup (Advanced)
Programmatic configuration for CI/CD
- For automated deployments
- Requires manual API calls
@@ -122,7 +125,23 @@ Choose how to configure HUBSPOT_ACCESS_TOKEN:
### Step 4: Execute Auth Flow Based on User Choice
#### Option 1: Aden Authorization Server
#### Prerequisite: Ensure HIVE_CREDENTIAL_KEY Is Available
Before storing any credentials, verify `HIVE_CREDENTIAL_KEY` is set (needed to encrypt/decrypt the local store). Check both the current session and shell config:
```bash
# Check current session
printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "session: set" || echo "session: not set"
# Check shell config files
for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTIAL_KEY' "$f" && echo "$f"; done
```
- **In current session** — proceed to store credentials
- **In shell config but NOT in current session** — run `source ~/.zshrc` (or `~/.bashrc`) first, then proceed
- **Not set anywhere**`EncryptedFileStorage` will auto-generate one. After storing, tell the user to persist it: `export HIVE_CREDENTIAL_KEY="{generated_key}"` in their shell profile
#### Option 1: Aden Platform (OAuth)
This is the recommended flow for supported integrations (HubSpot, etc.).
@@ -147,7 +166,7 @@ If not set, guide user to get one from Aden (this is where they do OAuth):
from aden_tools.credentials import open_browser, get_aden_setup_url
# Open browser to Aden - user will sign up and connect integrations there
url = get_aden_setup_url() # https://integration.adenhq.com/setup
url = get_aden_setup_url() # https://hive.adenhq.com
success, msg = open_browser(url)
print("Please sign in to Aden and connect your integrations (HubSpot, etc.).")
@@ -174,7 +193,7 @@ shell_type = detect_shell() # 'bash', 'zsh', or 'unknown'
success, config_path = add_env_var_to_shell_config(
"ADEN_API_KEY",
user_provided_key,
comment="Aden authorization server API key"
comment="Aden Platform (OAuth) API key"
)
if success:
@@ -224,7 +243,7 @@ print(f"Synced credentials: {synced}")
# If the required credential wasn't synced, the user hasn't authorized it on Aden yet
if "hubspot" not in synced:
print("HubSpot not found in your Aden account.")
print("Please visit https://integration.adenhq.com to connect HubSpot, then try again.")
print("Please visit https://hive.adenhq.com to connect HubSpot, then try again.")
```
For more control over the sync process:
@@ -313,7 +332,7 @@ if not result.valid:
# 2. Continue anyway (not recommended)
```
**4.2d. Store in Encrypted Credential Store**
**4.2d. Store in Local Encrypted Store**
```python
from core.framework.credentials import CredentialStore, CredentialObject, CredentialKey
@@ -340,7 +359,7 @@ store.save_credential(cred)
export HUBSPOT_ACCESS_TOKEN="the-value"
```
#### Option 3: Custom Credential Store (Advanced)
#### Option 3: Local Credential Setup (Advanced)
For programmatic/CI/CD setups.
@@ -394,24 +413,38 @@ config_path.write_text(json.dumps(config, indent=2))
### Step 6: Verify All Credentials
Run validation again to confirm everything is set:
Use the `verify_credentials` MCP tool to confirm everything is properly configured:
```python
runner = AgentRunner.load("exports/{agent_name}")
validation = runner.validate()
assert not validation.missing_credentials, "Still missing credentials!"
```
verify_credentials(agent_path="exports/{agent_name}")
```
Report the result to the user.
The tool returns:
```json
{
"agent": "exports/{agent_name}",
"ready": true,
"missing_credentials": [],
"warnings": [],
"errors": []
}
```
If `ready` is true, report success. If `missing_credentials` is non-empty, identify what failed and loop back to Step 3 for the remaining credentials.
## Health Check Reference
Health checks validate credentials by making lightweight API calls:
| Credential | Endpoint | What It Checks |
| -------------- | --------------------------------------- | --------------------------------- |
| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes |
| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity |
| Credential | Endpoint | What It Checks |
| --------------- | --------------------------------------- | --------------------------------- |
| `anthropic` | `POST /v1/messages` | API key validity |
| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity |
| `google_search` | `GET /customsearch/v1?q=test&num=1` | API key + CSE ID validity |
| `github` | `GET /user` | Token validity, user identity |
| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes |
| `resend` | `GET /domains` | API key validity |
```python
from aden_tools.credentials import check_credential_health, HealthCheckResult
@@ -424,7 +457,7 @@ result: HealthCheckResult = check_credential_health("hubspot", token_value)
## Encryption Key (HIVE_CREDENTIAL_KEY)
The encrypted credential store requires `HIVE_CREDENTIAL_KEY` to encrypt/decrypt credentials.
The local encrypted store requires `HIVE_CREDENTIAL_KEY` to encrypt/decrypt credentials.
- If the user doesn't have one, `EncryptedFileStorage` will auto-generate one and log it
- The user MUST persist this key (e.g., in `~/.bashrc` or a secrets manager)
@@ -443,7 +476,7 @@ If `HIVE_CREDENTIAL_KEY` is not set:
- **NEVER** store credentials in plaintext files, git-tracked files, or agent configs
- **NEVER** hardcode credentials in source code
- **ALWAYS** use `SecretStr` from Pydantic when handling credential values in Python
- **ALWAYS** use the encrypted credential store (`~/.hive/credentials`) for persistence
- **ALWAYS** use the local encrypted store (`~/.hive/credentials`) for persistence
- **ALWAYS** run health checks before storing credentials (when possible)
- **ALWAYS** verify credentials were stored by re-running validation, not by reading them back
- When modifying `~/.bashrc` or `~/.zshrc`, confirm with the user first
@@ -456,7 +489,8 @@ All credential specs are defined in `tools/src/aden_tools/credentials/`:
| ----------------- | ------------- | --------------------------------------------- | -------------- |
| `llm.py` | LLM Providers | `anthropic` | No |
| `search.py` | Search Tools | `brave_search`, `google_search`, `google_cse` | No |
| `integrations.py` | Integrations | `hubspot` | Yes |
| `email.py` | Email | `resend` | No |
| `integrations.py` | Integrations | `github`, `hubspot` | No / Yes |
**Note:** Additional LLM providers (Cerebras, Groq, OpenAI) are handled by LiteLLM via environment
variables (`CEREBRAS_API_KEY`, `GROQ_API_KEY`, `OPENAI_API_KEY`) but are not yet in CREDENTIAL_SPECS.
@@ -507,66 +541,60 @@ token = store.get_key("hubspot", "access_token")
## Example Session
```
User: /setup-credentials for my hubspot-agent
User: /hive-credentials for my research-agent
Agent: Let me check what credentials your hubspot-agent needs.
Agent: Let me check what credentials your research-agent needs.
[Runs validation, finds ANTHROPIC_API_KEY and HUBSPOT_ACCESS_TOKEN missing]
[Calls check_missing_credentials(agent_path="exports/research-agent")]
→ Returns:
available: anthropic (encrypted_store), brave_search (encrypted_store)
missing: google_search (GOOGLE_API_KEY), google_cse (GOOGLE_CSE_ID)
ready: false
Agent: Your hubspot-agent requires 2 credentials:
Agent: 2 of 4 required credentials are already configured. Only Google Custom
Search needs setup (2 values).
1. ANTHROPIC_API_KEY - Anthropic API key for LLM calls
2. HUBSPOT_ACCESS_TOKEN - HubSpot access token
--- Setting up Google Custom Search (google_search + google_cse) ---
Let's set these up one at a time.
This requires two values that work together.
--- Setting up ANTHROPIC_API_KEY ---
[Checks HIVE_CREDENTIAL_KEY before storing]
$ printenv HIVE_CREDENTIAL_KEY > /dev/null 2>&1 && echo "set" || echo "not set"
set
This credential only supports direct API key entry.
First, the Google API Key:
1. Go to https://console.cloud.google.com/apis/credentials
2. Create a new project (or select an existing one)
3. Enable the "Custom Search API" from the API Library
4. Go to Credentials > Create Credentials > API Key
5. Copy the generated API key
[Shows instructions for getting Anthropic API key]
[AskUserQuestion: "Please provide your Anthropic API key:"]
[User provides key]
[Stores in credential store, exports to env]
--- Setting up HUBSPOT_ACCESS_TOKEN ---
[AskUserQuestion with 3 options]
Choose how to configure HUBSPOT_ACCESS_TOKEN:
1) Aden Authorization Server (Recommended)
2) Direct API Key
3) Custom Credential Store (Advanced)
User: 1 (Aden)
Agent: Great! Let me check if you have an Aden API key configured...
[Checks for ADEN_API_KEY, not found]
[Opens browser to integration.adenhq.com/setup]
Agent: I've opened Aden's setup page. Please:
1. Sign in or create an account
2. Connect your HubSpot account (OAuth happens on Aden's site)
3. Copy the API key shown after setup
[AskUserQuestion: "Please provide your Aden API key:"]
[AskUserQuestion: "Please provide your Google API key:"]
[User provides key]
Agent: Would you like me to save this to your shell config (~/.zshrc) for future sessions?
Now, the Custom Search Engine ID:
1. Go to https://programmablesearchengine.google.com/controlpanel/all
2. Click "Add" to create a new search engine
3. Under "What to search", select "Search the entire web"
4. Give your search engine a name
5. Click "Create"
6. Copy the Search Engine ID (cx value)
[User approves]
[Saves to ~/.zshrc and ~/.hive/configuration.json]
[AskUserQuestion: "Please provide your Google CSE ID:"]
[User provides ID]
Agent: Let me sync your credentials from Aden...
[Runs health check with both values - GET /customsearch/v1?q=test&num=1 → 200 OK]
[Stores both in local encrypted store, exports to env]
[Syncs credentials from Aden server - OAuth already done on Aden's side]
[Runs health check]
✓ Google Custom Search credentials valid
Agent: HubSpot credentials validated successfully!
[Calls verify_credentials(agent_path="exports/research-agent")]
→ Returns: ready: true, missing_credentials: []
All credentials are now configured:
- ANTHROPIC_API_KEY: Stored in encrypted credential store
- HUBSPOT_ACCESS_TOKEN: Synced from Aden (OAuth completed on Aden)
- Validation passed - your agent is ready to run!
✓ anthropic (ANTHROPIC_API_KEY) — already in encrypted store
✓ brave_search (BRAVE_SEARCH_API_KEY) — already in encrypted store
✓ google_search (GOOGLE_API_KEY) — stored in encrypted store
✓ google_cse (GOOGLE_CSE_ID) — stored in encrypted store
Your agent is ready to run!
```
+385
View File
@@ -0,0 +1,385 @@
---
name: hive-patterns
description: Best practices, patterns, and examples for building goal-driven agents. Includes client-facing interaction, feedback edges, judge patterns, fan-out/fan-in, context management, and anti-patterns.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: reference
part_of: hive
---
# Building Agents - Patterns & Best Practices
Design patterns, examples, and best practices for building robust goal-driven agents.
**Prerequisites:** Complete agent structure using `hive-create`.
## Practical Example: Hybrid Workflow
How to build a node using both direct file writes and optional MCP validation:
```python
# 1. WRITE TO FILE FIRST (Primary - makes it visible)
node_code = '''
search_node = NodeSpec(
id="search-web",
node_type="event_loop",
input_keys=["query"],
output_keys=["search_results"],
system_prompt="Search the web for: {query}. Use web_search, then call set_output to store results.",
tools=["web_search"],
)
'''
Edit(
file_path="exports/research_agent/nodes/__init__.py",
old_string="# Nodes will be added here",
new_string=node_code
)
# 2. OPTIONALLY VALIDATE WITH MCP (Secondary - bookkeeping)
validation = mcp__agent-builder__test_node(
node_id="search-web",
test_input='{"query": "python tutorials"}',
mock_llm_response='{"search_results": [...mock results...]}'
)
```
**User experience:**
- Immediately sees node in their editor (from step 1)
- Gets validation feedback (from step 2)
- Can edit the file directly if needed
## Multi-Turn Interaction Patterns
For agents needing multi-turn conversations with users, use `client_facing=True` on event_loop nodes.
### Client-Facing Nodes
A client-facing node streams LLM output to the user and blocks for user input between conversational turns. This replaces the old pause/resume pattern.
```python
# Client-facing node with STEP 1/STEP 2 prompt pattern
intake_node = NodeSpec(
id="intake",
name="Intake",
description="Gather requirements from the user",
node_type="event_loop",
client_facing=True,
input_keys=["topic"],
output_keys=["research_brief"],
system_prompt="""\
You are an intake specialist.
**STEP 1 — Read and respond (text only, NO tool calls):**
1. Read the topic provided
2. If it's vague, ask 1-2 clarifying questions
3. If it's clear, confirm your understanding
**STEP 2 — After the user confirms, call set_output:**
- set_output("research_brief", "Clear description of what to research")
""",
)
# Internal node runs without user interaction
research_node = NodeSpec(
id="research",
name="Research",
description="Search and analyze sources",
node_type="event_loop",
input_keys=["research_brief"],
output_keys=["findings", "sources"],
system_prompt="Research the topic using web_search and web_scrape...",
tools=["web_search", "web_scrape", "load_data", "save_data"],
)
```
**How it works:**
- Client-facing nodes stream LLM text to the user and block for input after each response
- User input is injected via `node.inject_event(text)`
- When the LLM calls `set_output` to produce structured outputs, the judge evaluates and ACCEPTs
- Internal nodes (non-client-facing) run their entire loop without blocking
- `set_output` is a synthetic tool — a turn with only `set_output` calls (no real tools) triggers user input blocking
**STEP 1/STEP 2 pattern:** Always structure client-facing prompts with explicit phases. STEP 1 is text-only conversation. STEP 2 calls `set_output` after user confirmation. This prevents the LLM from calling `set_output` prematurely before the user responds.
### When to Use client_facing
| Scenario | client_facing | Why |
| ----------------------------------- | :-----------: | ---------------------- |
| Gathering user requirements | Yes | Need user input |
| Human review/approval checkpoint | Yes | Need human decision |
| Data processing (scanning, scoring) | No | Runs autonomously |
| Report generation | No | No user input needed |
| Final confirmation before action | Yes | Need explicit approval |
> **Legacy Note:** The `pause_nodes` / `entry_points` pattern still works for backward compatibility but `client_facing=True` is preferred for new agents.
## Edge-Based Routing and Feedback Loops
### Conditional Edge Routing
Multiple conditional edges from the same source replace the old `router` node type. Each edge checks a condition on the node's output.
```python
# Node with mutually exclusive outputs
review_node = NodeSpec(
id="review",
name="Review",
node_type="event_loop",
client_facing=True,
output_keys=["approved_contacts", "redo_extraction"],
nullable_output_keys=["approved_contacts", "redo_extraction"],
max_node_visits=3,
system_prompt="Present the contact list to the operator. If they approve, call set_output('approved_contacts', ...). If they want changes, call set_output('redo_extraction', 'true').",
)
# Forward edge (positive priority, evaluated first)
EdgeSpec(
id="review-to-campaign",
source="review",
target="campaign-builder",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('approved_contacts') is not None",
priority=1,
)
# Feedback edge (negative priority, evaluated after forward edges)
EdgeSpec(
id="review-feedback",
source="review",
target="extractor",
condition=EdgeCondition.CONDITIONAL,
condition_expr="output.get('redo_extraction') is not None",
priority=-1,
)
```
**Key concepts:**
- `nullable_output_keys`: Lists output keys that may remain unset. The node sets exactly one of the mutually exclusive keys per execution.
- `max_node_visits`: Must be >1 on the feedback target (extractor) so it can re-execute. Default is 1.
- `priority`: Positive = forward edge (evaluated first). Negative = feedback edge. The executor tries forward edges first; if none match, falls back to feedback edges.
### Routing Decision Table
| Pattern | Old Approach | New Approach |
| ---------------------- | ----------------------- | --------------------------------------------- |
| Conditional branching | `router` node | Conditional edges with `condition_expr` |
| Binary approve/reject | `pause_nodes` + resume | `client_facing=True` + `nullable_output_keys` |
| Loop-back on rejection | Manual entry_points | Feedback edge with `priority=-1` |
| Multi-way routing | Router with routes dict | Multiple conditional edges with priorities |
## Judge Patterns
**Core Principle: The judge is the SOLE mechanism for acceptance decisions.** Never add ad-hoc framework gating to compensate for LLM behavior. If the LLM calls `set_output` prematurely, fix the system prompt or use a custom judge. Anti-patterns to avoid:
- Output rollback logic
- `_user_has_responded` flags
- Premature set_output rejection
- Interaction protocol injection into system prompts
Judges control when an event_loop node's loop exits. Choose based on validation needs.
### Implicit Judge (Default)
When no judge is configured, the implicit judge ACCEPTs when:
- The LLM finishes its response with no tool calls
- All required output keys have been set via `set_output`
Best for simple nodes where "all outputs set" is sufficient validation.
### SchemaJudge
Validates outputs against a Pydantic model. Use when you need structural validation.
```python
from pydantic import BaseModel
class ScannerOutput(BaseModel):
github_users: list[dict] # Must be a list of user objects
class SchemaJudge:
def __init__(self, output_model: type[BaseModel]):
self._model = output_model
async def evaluate(self, context: dict) -> JudgeVerdict:
missing = context.get("missing_keys", [])
if missing:
return JudgeVerdict(
action="RETRY",
feedback=f"Missing output keys: {missing}. Use set_output to provide them.",
)
try:
self._model.model_validate(context["output_accumulator"])
return JudgeVerdict(action="ACCEPT")
except ValidationError as e:
return JudgeVerdict(action="RETRY", feedback=str(e))
```
### When to Use Which Judge
| Judge | Use When | Example |
| --------------- | ------------------------------------- | ---------------------- |
| Implicit (None) | Output keys are sufficient validation | Simple data extraction |
| SchemaJudge | Need structural validation of outputs | API response parsing |
| Custom | Domain-specific validation logic | Score must be 0.0-1.0 |
## Fan-Out / Fan-In (Parallel Execution)
Multiple ON_SUCCESS edges from the same source trigger parallel execution. All branches run concurrently via `asyncio.gather()`.
```python
# Scanner fans out to Profiler and Scorer in parallel
EdgeSpec(id="scanner-to-profiler", source="scanner", target="profiler",
condition=EdgeCondition.ON_SUCCESS)
EdgeSpec(id="scanner-to-scorer", source="scanner", target="scorer",
condition=EdgeCondition.ON_SUCCESS)
# Both fan in to Extractor
EdgeSpec(id="profiler-to-extractor", source="profiler", target="extractor",
condition=EdgeCondition.ON_SUCCESS)
EdgeSpec(id="scorer-to-extractor", source="scorer", target="extractor",
condition=EdgeCondition.ON_SUCCESS)
```
**Requirements:**
- Parallel event_loop nodes must have **disjoint output_keys** (no key written by both)
- Only one parallel branch may contain a `client_facing` node
- Fan-in node receives outputs from all completed branches in shared memory
## Context Management Patterns
### Tiered Compaction
EventLoopNode automatically manages context window usage with tiered compaction:
1. **Pruning** — Old tool results replaced with compact placeholders (zero-cost, no LLM call)
2. **Normal compaction** — LLM summarizes older messages
3. **Aggressive compaction** — Keeps only recent messages + summary
4. **Emergency** — Hard reset with tool history preservation
### Spillover Pattern
The framework automatically truncates large tool results and saves full content to a spillover directory. The LLM receives a truncation message with instructions to use `load_data` to read the full result.
For explicit data management, use the data tools (real MCP tools, not synthetic):
```python
# save_data, load_data, list_data_files, serve_file_to_user are real MCP tools
# data_dir is auto-injected by the framework — the LLM never sees it
# Saving large results
save_data(filename="sources.json", data=large_json_string)
# Reading with pagination (line-based offset/limit)
load_data(filename="sources.json", offset=0, limit=50)
# Listing available files
list_data_files()
# Serving a file to the user as a clickable link
serve_file_to_user(filename="report.html", label="Research Report")
```
Add data tools to nodes that handle large tool results:
```python
research_node = NodeSpec(
...
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
)
```
`data_dir` is a framework context parameter — auto-injected at call time. `GraphExecutor.execute()` sets it per-execution via `ToolRegistry.set_execution_context(data_dir=...)` (using `contextvars` for concurrency safety), ensuring it matches the session-scoped spillover directory.
## Anti-Patterns
### What NOT to Do
- **Don't rely on `export_graph`** — Write files immediately, not at end
- **Don't hide code in session** — Write to files as components are approved
- **Don't wait to write files** — Agent visible from first step
- **Don't batch everything** — Write incrementally, one component at a time
- **Don't create too many thin nodes** — Prefer fewer, richer nodes (see below)
- **Don't add framework gating for LLM behavior** — Fix prompts or use judges instead
### Fewer, Richer Nodes
A common mistake is splitting work into too many small single-purpose nodes. Each node boundary requires serializing outputs, losing in-context information, and adding edge complexity.
| Bad (8 thin nodes) | Good (4 rich nodes) |
| ------------------- | ----------------------------------- |
| parse-query | intake (client-facing) |
| search-sources | research (search + fetch + analyze) |
| fetch-content | review (client-facing) |
| evaluate-sources | report (write + deliver) |
| synthesize-findings | |
| write-report | |
| quality-check | |
| save-report | |
**Why fewer nodes are better:**
- The LLM retains full context of its work within a single node
- A research node that searches, fetches, and analyzes keeps all source material in its conversation history
- Fewer edges means simpler graph and fewer failure points
- Data tools (`save_data`/`load_data`) handle context window limits within a single node
### MCP Tools - Correct Usage
**MCP tools OK for:**
- `test_node` — Validate node configuration with mock inputs
- `validate_graph` — Check graph structure
- `configure_loop` — Set event loop parameters
- `create_session` — Track session state for bookkeeping
**Just don't:** Use MCP as the primary construction method or rely on export_graph
## Error Handling Patterns
### Graceful Failure with Fallback
```python
edges = [
# Success path
EdgeSpec(id="api-success", source="api-call", target="process-results",
condition=EdgeCondition.ON_SUCCESS),
# Fallback on failure
EdgeSpec(id="api-to-fallback", source="api-call", target="fallback-cache",
condition=EdgeCondition.ON_FAILURE, priority=1),
# Report if fallback also fails
EdgeSpec(id="fallback-to-error", source="fallback-cache", target="report-error",
condition=EdgeCondition.ON_FAILURE, priority=1),
]
```
## Handoff to Testing
When agent is complete, transition to testing phase:
### Pre-Testing Checklist
- [ ] Agent structure validates: `uv run python -m agent_name validate`
- [ ] All nodes defined in nodes/**init**.py
- [ ] All edges connect valid nodes with correct priorities
- [ ] Feedback edge targets have `max_node_visits > 1`
- [ ] Client-facing nodes have meaningful system prompts
- [ ] Agent can be imported: `from exports.agent_name import default_agent`
## Related Skills
- **hive-concepts** — Fundamental concepts (node types, edges, event loop architecture)
- **hive-create** — Step-by-step building process
- **hive-test** — Test and validate agents
- **hive** — Complete workflow orchestrator
---
**Remember: Agent is actively constructed, visible the whole time. No hidden state. No surprise exports. Just transparent, incremental file building.**
@@ -1,11 +1,11 @@
---
name: testing-agent
name: hive-test
description: Run goal-based evaluation tests for agents. Use when you need to verify an agent meets its goals, debug failing tests, or iterate on agent improvements based on test results.
---
# Testing Workflow
This skill provides tools for testing agents built with the building-agents skill.
This skill provides tools for testing agents built with the hive-create skill.
## Workflow Overview
@@ -61,7 +61,7 @@ mcp__agent-builder__debug_test(
# Testing Agents with MCP Tools
Run goal-based evaluation tests for agents built with the building-agents skill.
Run goal-based evaluation tests for agents built with the hive-create skill.
**Key Principle: MCP tools provide guidelines, Claude writes tests directly**
- ✅ Get guidelines: `generate_constraint_tests`, `generate_success_tests` → returns templates and guidelines
@@ -279,7 +279,7 @@ if missing_creds:
```
┌─────────────────────────────────────────────────────────────────────────┐
│ GOAL STAGE │
│ (building-agents skill) │
│ (hive-create skill) │
│ │
│ 1. User defines goal with success_criteria and constraints │
│ 2. Goal written to agent.py immediately │
@@ -289,7 +289,7 @@ if missing_creds:
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT STAGE │
│ (building-agents skill) │
│ (hive-create skill) │
│ │
│ Build nodes + edges, written immediately to files │
│ Constraint tests can run during development: │
@@ -608,7 +608,7 @@ Edit(
)
# 4. May need to regenerate agent nodes if goal changed significantly
# This requires going back to building-agents skill
# This requires going back to hive-create skill
```
#### EDGE_CASE → Add Test and Fix
@@ -930,9 +930,10 @@ assert approval == "APPROVED", f"Expected APPROVED, got {approval}"
- `steps_executed: int` - Number of nodes executed
- `total_tokens: int` - Cumulative token usage
- `total_latency_ms: int` - Total execution time
- `path: list[str]` - Node IDs traversed
- `path: list[str]` - Node IDs traversed (may contain repeated IDs from feedback loops)
- `paused_at: str | None` - Node ID if HITL pause occurred
- `session_state: dict` - State for resuming
- `node_visit_counts: dict[str, int]` - How many times each node executed (useful for feedback loop testing)
### Happy Path Test
```python
@@ -975,17 +976,68 @@ async def test_performance_latency(mock_mode):
assert duration < 5.0, f"Took {{duration}}s, expected <5s"
```
## Integration with building-agents
### Testing Event Loop Nodes
Event loop nodes run multi-turn loops internally. Tests should verify:
**Output Keys Test** — All required keys are set via `set_output`:
```python
@pytest.mark.asyncio
async def test_all_output_keys_set(mock_mode):
"""Test that event_loop nodes set all required output keys."""
result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode)
assert result.success, f"Agent failed: {{result.error}}"
output = result.output or {{}}
for key in ["expected_key_1", "expected_key_2"]:
assert key in output, f"Output key '{{key}}' not set by event_loop node"
```
**Feedback Loop Test** — Verify feedback loops terminate:
```python
@pytest.mark.asyncio
async def test_feedback_loop_respects_max_visits(mock_mode):
"""Test that feedback loops terminate at max_node_visits."""
result = await default_agent.run({{"input": "trigger_rejection"}}, mock_mode=mock_mode)
assert result.success or result.error is not None
visits = getattr(result, "node_visit_counts", {{}}) or {{}}
for node_id, count in visits.items():
assert count <= 5, f"Node {{node_id}} visited {{count}} times"
```
**Fan-Out Test** — Verify parallel branches both complete:
```python
@pytest.mark.asyncio
async def test_parallel_branches_complete(mock_mode):
"""Test that fan-out branches all complete and produce outputs."""
result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode)
assert result.success
output = result.output or {{}}
# Check outputs from both parallel branches
assert "branch_a_output" in output, "Branch A output missing"
assert "branch_b_output" in output, "Branch B output missing"
```
**Client-Facing Node Test** — In mock mode, client-facing nodes may not block:
```python
@pytest.mark.asyncio
async def test_client_facing_node(mock_mode):
"""Test that client-facing nodes produce output."""
result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode)
# In mock mode, client-facing blocking is typically bypassed
assert result.success or result.paused_at is not None
```
## Integration with hive-create
### Handoff Points
| Scenario | From | To | Action |
|----------|------|-----|--------|
| Agent built, ready to test | building-agents | testing-agent | Generate success tests |
| LOGIC_ERROR found | testing-agent | building-agents | Update goal, rebuild |
| IMPLEMENTATION_ERROR found | testing-agent | Direct fix | Edit agent files, re-run tests |
| EDGE_CASE found | testing-agent | testing-agent | Add edge case test |
| All tests pass | testing-agent | Done | Agent validated ✅ |
| Agent built, ready to test | hive-create | hive-test | Generate success tests |
| LOGIC_ERROR found | hive-test | hive-create | Update goal, rebuild |
| IMPLEMENTATION_ERROR found | hive-test | Direct fix | Edit agent files, re-run tests |
| EDGE_CASE found | hive-test | hive-test | Add edge case test |
| All tests pass | hive-test | Done | Agent validated ✅ |
### Iteration Speed Comparison
@@ -4,7 +4,7 @@ This example walks through testing a YouTube research agent that finds relevant
## Prerequisites
- Agent built with building-agents skill at `exports/youtube-research/`
- Agent built with hive-create skill at `exports/youtube-research/`
- Goal defined with success criteria and constraints
## Step 1: Load the Goal
@@ -283,11 +283,11 @@ result = debug_test(
Since this is an **IMPLEMENTATION_ERROR**, we:
1. **Don't restart** the Goal → Agent → Eval flow
2. **Fix the agent** using building-agents skill:
2. **Fix the agent** using hive-create skill:
- Modify `filter_node` to handle null results
3. **Re-run Eval** (tests only)
### Fix in building-agents:
### Fix in hive-create:
```python
# Update the filter_node to handle null
@@ -1,32 +1,46 @@
---
name: agent-workflow
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates building-agents-* and testing-agent skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
name: hive
description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates hive-* skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
license: Apache-2.0
metadata:
author: hive
version: "2.0"
type: workflow-orchestrator
orchestrates:
- building-agents-core
- building-agents-construction
- building-agents-patterns
- testing-agent
- setup-credentials
- hive-concepts
- hive-create
- hive-patterns
- hive-test
- hive-credentials
---
# Agent Development Workflow
**THIS IS AN EXECUTABLE WORKFLOW. DO NOT explore the codebase or read source files. ROUTE to the correct skill IMMEDIATELY.**
When this skill is loaded, determine what the user needs and invoke the appropriate skill NOW:
- **User wants to build an agent** → Invoke `/hive-create` immediately
- **User wants to test an agent** → Invoke `/hive-test` immediately
- **User wants to learn concepts** → Invoke `/hive-concepts` immediately
- **User wants patterns/optimization** → Invoke `/hive-patterns` immediately
- **User wants to set up credentials** → Invoke `/hive-credentials` immediately
- **Unclear what user needs** → Ask the user (do NOT explore the codebase to figure it out)
**DO NOT:** Read source files, explore the codebase, search for code, or do any investigation before routing. The sub-skills handle all of that.
---
Complete Standard Operating Procedure (SOP) for building production-ready goal-driven agents.
## Overview
This workflow orchestrates specialized skills to take you from initial concept to production-ready agent:
1. **Understand Concepts**`/building-agents-core` (optional)
2. **Build Structure**`/building-agents-construction`
3. **Optimize Design**`/building-agents-patterns` (optional)
4. **Setup Credentials**`/setup-credentials` (if agent uses tools requiring API keys)
5. **Test & Validate**`/testing-agent`
1. **Understand Concepts**`/hive-concepts` (optional)
2. **Build Structure**`/hive-create`
3. **Optimize Design**`/hive-patterns` (optional)
4. **Setup Credentials**`/hive-credentials` (if agent uses tools requiring API keys)
5. **Test & Validate**`/hive-test`
## When to Use This Workflow
@@ -37,17 +51,18 @@ Use this meta-skill when:
- Want consistent, repeatable agent builds
**Skip this workflow** if:
- You only need to test an existing agent → use `/testing-agent` directly
- You only need to test an existing agent → use `/hive-test` directly
- You know exactly which phase you're in → use specific skill directly
## Quick Decision Tree
```
"Need to understand agent concepts" → building-agents-core
"Build a new agent" → building-agents-construction
"Optimize my agent design" → building-agents-patterns
"Set up API keys for my agent" → setup-credentials
"Test my agent" → testing-agent
"Need to understand agent concepts" → hive-concepts
"Build a new agent" → hive-create
"Optimize my agent design" → hive-patterns
"Need client-facing nodes or feedback loops" → hive-patterns
"Set up API keys for my agent" → hive-credentials
"Test my agent" → hive-test
"Not sure what I need" → Read phases below, then decide
"Agent has structure but needs implementation" → See agent directory STATUS.md
```
@@ -55,7 +70,7 @@ Use this meta-skill when:
## Phase 0: Understand Concepts (Optional)
**Duration**: 5-10 minutes
**Skill**: `/building-agents-core`
**Skill**: `/hive-concepts`
**Input**: Questions about agent architecture
### When to Use
@@ -63,12 +78,12 @@ Use this meta-skill when:
- First time building an agent
- Need to understand node types, edges, goals
- Want to validate tool availability
- Learning about pause/resume architecture
- Learning about event loop architecture and client-facing nodes
### What This Phase Provides
- Architecture overview (Python packages, not JSON)
- Core concepts (Goal, Node, Edge, Pause/Resume)
- Core concepts (Goal, Node, Edge, Event Loop, Judges)
- Tool discovery and validation procedures
- Workflow overview
@@ -77,7 +92,7 @@ Use this meta-skill when:
## Phase 1: Build Agent Structure
**Duration**: 15-30 minutes
**Skill**: `/building-agents-construction`
**Skill**: `/hive-create`
**Input**: User requirements ("Build an agent that...")
### What This Phase Does
@@ -106,7 +121,7 @@ Creates the complete agent architecture:
- ✅ 1-5 constraints defined
- ✅ 5-10 nodes specified in nodes/__init__.py
- ✅ 8-15 edges connecting workflow
- ✅ Validated structure (passes `python -m agent_name validate`)
- ✅ Validated structure (passes `uv run python -m agent_name validate`)
- ✅ README.md with usage instructions
- ✅ CLI commands (info, validate, run, shell)
@@ -120,7 +135,7 @@ You're ready for Phase 2 when:
### Common Outputs
The building-agents-construction skill produces:
The hive-create skill produces:
```
exports/agent_name/
├── __init__.py (package exports)
@@ -140,7 +155,7 @@ exports/agent_name/
→ You may need to add Python functions or MCP tools (not covered by current skills)
**If want to optimize design:**
→ Proceed to Phase 1.5 (building-agents-patterns)
→ Proceed to Phase 1.5 (hive-patterns)
**If ready to test:**
→ Proceed to Phase 2
@@ -148,31 +163,32 @@ exports/agent_name/
## Phase 1.5: Optimize Design (Optional)
**Duration**: 10-15 minutes
**Skill**: `/building-agents-patterns`
**Skill**: `/hive-patterns`
**Input**: Completed agent structure
### When to Use
- Want to add pause/resume functionality
- Want to add client-facing blocking or feedback edges
- Need judge patterns for output validation
- Want fan-out/fan-in (parallel execution)
- Need error handling patterns
- Want to optimize performance
- Need examples of complex routing
- Want best practices guidance
### What This Phase Provides
- Practical examples and patterns
- Pause/resume architecture
- Error handling strategies
- Client-facing interaction patterns
- Feedback edge routing with nullable output keys
- Judge patterns (implicit, SchemaJudge)
- Fan-out/fan-in parallel execution
- Context management and spillover patterns
- Anti-patterns to avoid
- Performance optimization techniques
**Skip this phase** if your agent design is straightforward.
## Phase 2: Test & Validate
**Duration**: 20-40 minutes
**Skill**: `/testing-agent`
**Skill**: `/hive-test`
**Input**: Working agent from Phase 1
### What This Phase Does
@@ -249,9 +265,9 @@ You're done when:
```
User: "Build an agent that monitors files"
→ Use /building-agents-construction
→ Use /hive-create
→ Agent structure created
→ Use /testing-agent
→ Use /hive-test
→ Tests created and passing
→ Done: Production-ready agent
```
@@ -260,10 +276,10 @@ User: "Build an agent that monitors files"
```
User: "Build an agent (first time)"
→ Use /building-agents-core (understand concepts)
→ Use /building-agents-construction (build structure)
→ Use /building-agents-patterns (optimize design)
→ Use /testing-agent (validate)
→ Use /hive-concepts (understand concepts)
→ Use /hive-create (build structure)
→ Use /hive-patterns (optimize design)
→ Use /hive-test (validate)
→ Done: Production-ready agent
```
@@ -272,7 +288,7 @@ User: "Build an agent (first time)"
```
User: "Test my agent at exports/my_agent"
→ Skip Phase 1
→ Use /testing-agent directly
→ Use /hive-test directly
→ Tests created
→ Done: Validated agent
```
@@ -281,54 +297,55 @@ User: "Test my agent at exports/my_agent"
```
User: "Build an agent"
→ Use /building-agents-construction (Phase 1)
→ Use /hive-create (Phase 1)
→ Implementation needed (see STATUS.md)
→ [User implements functions]
→ Use /testing-agent (Phase 2)
→ Use /hive-test (Phase 2)
→ Tests reveal bugs
→ [Fix bugs manually]
→ Re-run tests
→ Done: Working agent
```
### Pattern 4: Complex Agent with Patterns
### Pattern 4: Agent with Review Loops and HITL Checkpoints
```
User: "Build an agent with multi-turn conversations"
→ Use /building-agents-core (learn pause/resume)
→ Use /building-agents-construction (build structure)
→ Use /building-agents-patterns (implement pause/resume pattern)
→ Use /testing-agent (validate conversation flows)
→ Done: Complex conversational agent
User: "Build an agent with human review and feedback loops"
→ Use /hive-concepts (learn event loop, client-facing nodes)
→ Use /hive-create (build structure with feedback edges)
→ Use /hive-patterns (implement client-facing + feedback patterns)
→ Use /hive-test (validate review flows and edge routing)
→ Done: Agent with HITL checkpoints and review loops
```
## Skill Dependencies
```
agent-workflow (meta-skill)
hive (meta-skill)
├── building-agents-core (foundational)
│ ├── Architecture concepts
│ ├── Node/Edge/Goal definitions
├── hive-concepts (foundational)
│ ├── Architecture concepts (event loop, judges)
│ ├── Node types (event_loop, function)
│ ├── Edge routing and priority
│ ├── Tool discovery procedures
│ └── Workflow overview
├── building-agents-construction (procedural)
├── hive-create (procedural)
│ ├── Creates package structure
│ ├── Defines goal
│ ├── Adds nodes incrementally
│ ├── Connects edges
│ ├── Adds nodes (event_loop, function)
│ ├── Connects edges with priority routing
│ ├── Finalizes agent class
│ └── Requires: building-agents-core
│ └── Requires: hive-concepts
├── building-agents-patterns (reference)
│ ├── Best practices
│ ├── Pause/resume patterns
│ ├── Error handling
│ ├── Anti-patterns
│ └── Performance optimization
├── hive-patterns (reference)
│ ├── Client-facing interaction patterns
│ ├── Feedback edges and review loops
│ ├── Judge patterns (implicit, SchemaJudge)
│ ├── Fan-out/fan-in parallel execution
│ └── Context management and anti-patterns
└── testing-agent
└── hive-test
├── Reads agent goal
├── Generates tests
├── Runs evaluation
@@ -342,13 +359,13 @@ agent-workflow (meta-skill)
- Check node IDs match between nodes/__init__.py and agent.py
- Verify all edges reference valid node IDs
- Ensure entry_node exists in nodes list
- Run: `PYTHONPATH=core:exports python -m agent_name validate`
- Run: `PYTHONPATH=exports uv run python -m agent_name validate`
### "Agent has structure but won't run"
- Check for STATUS.md or IMPLEMENTATION_GUIDE.md in agent directory
- Implementation may be needed (Python functions or MCP tools)
- This is expected - building-agents-construction creates structure, not implementation
- This is expected - hive-create creates structure, not implementation
- See implementation guide for completion options
### "Tests are failing"
@@ -356,7 +373,7 @@ agent-workflow (meta-skill)
- Review test output for specific failures
- Check agent goal and success criteria
- Verify constraints are met
- Use `/testing-agent` to debug and iterate
- Use `/hive-test` to debug and iterate
- Fix agent code and re-run tests
### "Not sure which phase I'm in"
@@ -368,7 +385,7 @@ Run these checks:
ls exports/my_agent/agent.py
# Check if it validates
PYTHONPATH=core:exports python -m my_agent validate
PYTHONPATH=exports uv run python -m my_agent validate
# Check if tests exist
ls exports/my_agent/tests/
@@ -417,10 +434,10 @@ You're done with the workflow when:
## Additional Resources
- **building-agents-core**: See `.claude/skills/building-agents-core/SKILL.md`
- **building-agents-construction**: See `.claude/skills/building-agents-construction/SKILL.md`
- **building-agents-patterns**: See `.claude/skills/building-agents-patterns/SKILL.md`
- **testing-agent**: See `.claude/skills/testing-agent/SKILL.md`
- **hive-concepts**: See `.claude/skills/hive-concepts/SKILL.md`
- **hive-create**: See `.claude/skills/hive-create/SKILL.md`
- **hive-patterns**: See `.claude/skills/hive-patterns/SKILL.md`
- **hive-test**: See `.claude/skills/hive-test/SKILL.md`
- **Agent framework docs**: See `core/README.md`
- **Example agents**: See `exports/` directory
@@ -428,36 +445,36 @@ You're done with the workflow when:
This workflow provides a proven path from concept to production-ready agent:
1. **Learn** with `/building-agents-core` → Understand fundamentals (optional)
2. **Build** with `/building-agents-construction` → Get validated structure
3. **Optimize** with `/building-agents-patterns` → Apply best practices (optional)
4. **Test** with `/testing-agent` → Get verified functionality
1. **Learn** with `/hive-concepts` → Understand fundamentals (optional)
2. **Build** with `/hive-create` → Get validated structure
3. **Optimize** with `/hive-patterns` → Apply best practices (optional)
4. **Test** with `/hive-test` → Get verified functionality
The workflow is **flexible** - skip phases as needed, iterate freely, and adapt to your specific requirements. The goal is **production-ready agents** built with **consistent, repeatable processes**.
## Skill Selection Guide
**Choose building-agents-core when:**
**Choose hive-concepts when:**
- First time building agents
- Need to understand architecture
- Need to understand event loop architecture
- Validating tool availability
- Learning about node types and edges
- Learning about node types, edges, and judges
**Choose building-agents-construction when:**
**Choose hive-create when:**
- Actually building an agent
- Have clear requirements
- Ready to write code
- Want step-by-step guidance
**Choose building-agents-patterns when:**
**Choose hive-patterns when:**
- Agent structure complete
- Need advanced patterns
- Implementing pause/resume
- Optimizing performance
- Need client-facing nodes or feedback edges
- Implementing review loops or fan-out/fan-in
- Want judge patterns or context management
- Want best practices
**Choose testing-agent when:**
**Choose hive-test when:**
- Agent structure complete
- Ready to validate functionality
- Need comprehensive test coverage
- Debugging agent behavior
- Testing feedback loops, output keys, or fan-out
@@ -1,6 +1,6 @@
# Example: File Monitor Agent
This example shows the complete agent-workflow in action for building a file monitoring agent.
This example shows the complete /hive workflow in action for building a file monitoring agent.
## Initial Request
@@ -12,7 +12,7 @@ User: "Build an agent that monitors ~/Downloads and copies new files to ~/Docume
### Step 1: Create Structure
Agent invokes `/building-agents` skill and:
Agent invokes `/hive-create` skill and:
1. Creates `exports/file_monitor_agent/` package
2. Writes skeleton files (__init__.py, __main__.py, agent.py, etc.)
@@ -75,10 +75,10 @@ initialize → list → identify → check
### Step 5: Finalize
```bash
$ PYTHONPATH=core:exports python -m file_monitor_agent validate
$ PYTHONPATH=exports uv run python -m file_monitor_agent validate
✓ Agent is valid
$ PYTHONPATH=core:exports python -m file_monitor_agent info
$ PYTHONPATH=exports uv run python -m file_monitor_agent info
Agent: File Monitor & Copy Agent
Nodes: 7
Edges: 8
@@ -107,7 +107,7 @@ exports/file_monitor_agent/
### Step 1: Analyze Agent
Agent invokes `/testing-agent` skill and:
Agent invokes `/hive-test` skill and:
1. Reads goal from `exports/file_monitor_agent/agent.py`
2. Identifies 4 success criteria to test
@@ -131,7 +131,7 @@ Tests approved incrementally by user.
### Step 3: Run Tests
```bash
$ PYTHONPATH=core:exports pytest exports/file_monitor_agent/tests/
$ PYTHONPATH=exports uv run pytest exports/file_monitor_agent/tests/
test_constraints.py::test_preserves_originals PASSED
test_constraints.py::test_handles_errors PASSED
@@ -162,7 +162,7 @@ test_edge_cases.py::test_large_files PASSED
./RUN_AGENT.sh
# Or manually
PYTHONPATH=core:exports:tools/src python -m file_monitor_agent run
PYTHONPATH=exports uv run python -m file_monitor_agent run
```
**Capabilities:**
-1
View File
@@ -1 +0,0 @@
../../.claude/skills/agent-workflow
@@ -1 +0,0 @@
../../.claude/skills/building-agents-construction
-1
View File
@@ -1 +0,0 @@
../../.claude/skills/building-agents-core
-1
View File
@@ -1 +0,0 @@
../../.claude/skills/building-agents-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
-1
View File
@@ -1 +0,0 @@
../../.claude/skills/testing-agent
+3 -2
View File
@@ -1,9 +1,10 @@
---
name: Bug Report
about: Report a bug to help us improve
title: '[Bug]: '
labels: bug
title: "[Bug]: "
labels: bug, enhancement
assignees: ''
---
## Describe the Bug
+2 -1
View File
@@ -1,9 +1,10 @@
---
name: Feature Request
about: Suggest a new feature or enhancement
title: '[Feature]: '
title: "[Feature]: "
labels: enhancement
assignees: ''
---
## Problem Statement
@@ -0,0 +1,71 @@
---
name: Integration Request
about: Suggest a new integration
title: "[Integration]:"
labels: ''
assignees: ''
---
## Service
Name and brief description of the service and what it enables agents to do.
**Description:** [e.g., "API key for Slack Bot" — short one-liner for the credential spec]
## Credential Identity
- **credential_id:** [e.g., `slack`]
- **env_var:** [e.g., `SLACK_BOT_TOKEN`]
- **credential_key:** [e.g., `access_token`, `api_key`, `bot_token`]
## Tools
Tool function names that require this credential:
- [e.g., `slack_send_message`]
- [e.g., `slack_list_channels`]
## Auth Methods
- **Direct API key supported:** Yes / No
- **Aden OAuth supported:** Yes / No
If Aden OAuth is supported, describe the OAuth scopes/permissions required.
## How to Get the Credential
Link where users obtain the key/token:
[e.g., https://api.slack.com/apps]
Step-by-step instructions:
1. Go to ...
2. Create a ...
3. Select scopes/permissions: ...
4. Copy the key/token
## Health Check
A lightweight API call to validate the credential (no writes, no charges).
- **Endpoint:** [e.g., `https://slack.com/api/auth.test`]
- **Method:** [e.g., `GET` or `POST`]
- **Auth header:** [e.g., `Authorization: Bearer {token}` or `X-Api-Key: {key}`]
- **Parameters (if any):** [e.g., `?limit=1`]
- **200 means:** [e.g., key is valid]
- **401 means:** [e.g., invalid or expired]
- **429 means:** [e.g., rate limited but key is valid]
## Credential Group
Does this require multiple credentials configured together? (e.g., Google Custom Search needs
both an API key and a CSE ID)
- [ ] No, single credential
- [ ] Yes — list the other credential IDs in the group:
## Additional Context
Links to API docs, rate limits, free tier availability, or anything else relevant.
+42 -25
View File
@@ -21,23 +21,22 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Install dependencies
run: |
cd core
pip install -e .
pip install -r requirements-dev.txt
run: uv sync --project core --group dev
- name: Ruff lint
run: |
ruff check core/
ruff check tools/
uv run --project core ruff check core/
uv run --project core ruff check tools/
- name: Ruff format
run: |
ruff format --check core/
ruff format --check tools/
uv run --project core ruff format --check core/
uv run --project core ruff format --check tools/
test:
name: Test Python Framework
@@ -52,23 +51,19 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Install dependencies
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Install dependencies and run tests
run: |
cd core
pip install -e .
pip install -r requirements-dev.txt
uv sync
uv run pytest tests/ -v
- name: Run tests
run: |
cd core
pytest tests/ -v
validate:
name: Validate Agent Exports
test-tools:
name: Test Tools
runs-on: ubuntu-latest
needs: [lint, test]
steps:
- uses: actions/checkout@v4
@@ -76,13 +71,35 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Install dependencies and run tests
run: |
cd tools
uv sync --extra dev
uv run pytest tests/ -v
validate:
name: Validate Agent Exports
runs-on: ubuntu-latest
needs: [lint, test, test-tools]
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Install dependencies
run: |
cd core
pip install -e .
pip install -r requirements-dev.txt
uv sync
- name: Validate exported agents
run: |
@@ -105,7 +122,7 @@ jobs:
for agent_dir in "${agent_dirs[@]}"; do
if [ -f "$agent_dir/agent.json" ]; then
echo "Validating $agent_dir"
python -c "import json; json.load(open('$agent_dir/agent.json'))"
uv run python -c "import json; json.load(open('$agent_dir/agent.json'))"
validated=$((validated + 1))
fi
done
+7 -1
View File
@@ -80,7 +80,13 @@ jobs:
- help wanted: Extra attention is needed (if issue needs community input)
- backlog: Tracked for the future, but not currently planned or prioritized
You may apply multiple labels if appropriate (e.g., "bug" and "help wanted").
### 6. Estimate size (if NOT a duplicate, spam, or invalid)
Apply exactly ONE size label to help contributors match their capacity to the task:
- "size: small": Docs, typos, single-file fixes, config changes
- "size: medium": Bug fixes with tests, adding a single tool, changes within one package
- "size: large": Cross-package changes (core + tools), new modules, complex logic, architectural refactors
You may apply multiple labels if appropriate (e.g., "bug", "size: small", and "good first issue").
## Tools Available:
- mcp__github__get_issue: Get issue details
+5 -4
View File
@@ -21,18 +21,19 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Install uv
uses: astral-sh/setup-uv@v4
- name: Install dependencies
run: |
cd core
pip install -e .
pip install -r requirements-dev.txt
uv sync
- name: Run tests
run: |
cd core
pytest tests/ -v
uv run pytest tests/ -v
- name: Generate changelog
id: changelog
+6 -2
View File
@@ -54,7 +54,6 @@ __pycache__/
*.egg-info/
.eggs/
*.egg
uv.lock
# Generated runtime data
core/data/
@@ -69,4 +68,9 @@ exports/*
.agent-builder-sessions/*
.venv
.claude/settings.local.json
.venv
docs/github-issues/*
core/tests/*dumps/*
+6 -12
View File
@@ -1,20 +1,14 @@
{
"mcpServers": {
"agent-builder": {
"command": ".venv/bin/python",
"args": ["-m", "framework.mcp.agent_builder_server"],
"cwd": "core",
"env": {
"PYTHONPATH": "../tools/src"
}
"command": "uv",
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
"cwd": "core"
},
"tools": {
"command": ".venv/bin/python",
"args": ["mcp_server.py", "--stdio"],
"cwd": "tools",
"env": {
"PYTHONPATH": "src:../core"
}
"command": "uv",
"args": ["run", "mcp_server.py", "--stdio"],
"cwd": "tools"
}
}
}
+1 -1
View File
@@ -1,6 +1,6 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.8.6
rev: v0.15.0
hooks:
- id: ruff
name: ruff lint (core)
+12 -5
View File
@@ -35,9 +35,16 @@ You may submit PRs without prior assignment for:
1. Fork the repository
2. Clone your fork: `git clone https://github.com/YOUR_USERNAME/hive.git`
3. Create a feature branch: `git checkout -b feature/your-feature-name`
4. Make your changes
5. Run checks and tests:
3. Add the upstream repository: `git remote add upstream https://github.com/adenhq/hive.git`
4. Sync with upstream to ensure you're starting from the latest code:
```bash
git fetch upstream
git checkout main
git merge upstream/main
```
5. Create a feature branch: `git checkout -b feature/your-feature-name`
6. Make your changes
7. Run checks and tests:
```bash
make check # Lint and format checks (ruff check + ruff format --check on core/ and tools/)
make test # Core tests (cd core && pytest tests/ -v)
@@ -125,7 +132,7 @@ feat(component): add new feature description
> **Note:** When testing agents in `exports/`, always set PYTHONPATH:
>
> ```bash
> PYTHONPATH=core:exports python -m agent_name test
> PYTHONPATH=exports uv run python -m agent_name test
> ```
```bash
@@ -139,7 +146,7 @@ make test
cd core && pytest tests/ -v
# Run tests for a specific agent
PYTHONPATH=core:exports python -m agent_name test
PYTHONPATH=exports uv run python -m agent_name test
```
> **CI also validates** that all exported agent JSON files (`exports/*/agent.json`) are well-formed JSON. Ensure your agent exports are valid before submitting.
+5 -3
View File
@@ -4,9 +4,11 @@ help: ## Show this help
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-15s\033[0m %s\n", $$1, $$2}'
lint: ## Run ruff linter (with auto-fix)
lint: ## Run ruff linter and formatter (with auto-fix)
cd core && ruff check --fix .
cd tools && ruff check --fix .
cd core && ruff format .
cd tools && ruff format .
format: ## Run ruff formatter
cd core && ruff format .
@@ -19,8 +21,8 @@ check: ## Run all checks without modifying files (CI-safe)
cd tools && ruff format --check .
test: ## Run all tests
cd core && python -m pytest tests/ -v
cd core && uv run python -m pytest tests/ -v
install-hooks: ## Install pre-commit hooks
pip install pre-commit
uv pip install pre-commit
pre-commit install
-51
View File
@@ -1,51 +0,0 @@
## Summary
- **Added HubSpot integration** — new HubSpot MCP tool with search, get, create, and update operations for contacts, companies, and deals. Includes OAuth2 provider for HubSpot credentials and credential store adapter for the tools layer.
- **Replaced web_scrape tool with Playwright + stealth** — swapped httpx/BeautifulSoup for a headless Chromium browser using `playwright` (async API) and `playwright-stealth`, enabling JS-rendered page scraping and bot detection evasion
- **Added empty response retry logic** — LLM provider now detects empty responses (e.g. Gemini returning 200 with no content on rate limit) and retries with exponential backoff, preventing hallucinated output from the cleanup LLM
- **Added context-aware input compaction** — LLM nodes now estimate input token count before calling the model and progressively truncate the largest values if they exceed the context window budget
- **Increased rate limit retries to 10** with verbose `[retry]` and `[compaction]` logging that includes model name, finish reason, and attempt count
- **Updated setup scripts** — `scripts/setup-python.sh` now installs Playwright Chromium browser automatically for web scraping support
- **Interactive quickstart onboarding** — `quickstart.sh` rewritten as bee-themed interactive wizard that detects existing API keys (including Claude Code subscription), lets user pick ONE default LLM provider, and saves configuration to `~/.hive/configuration.json`
- **Fixed lint errors** across `hubspot_tool.py` (line length) and `agent_builder_server.py` (unused variable)
## Changed files
### HubSpot Integration
- `tools/src/aden_tools/tools/hubspot_tool/` — New MCP tool: contacts, companies, and deals CRUD
- `tools/src/aden_tools/tools/__init__.py` — Registered HubSpot tools
- `tools/src/aden_tools/credentials/integrations.py` — HubSpot credential integration
- `tools/src/aden_tools/credentials/__init__.py` — Updated credential exports
- `core/framework/credentials/oauth2/hubspot_provider.py` — HubSpot OAuth2 provider
- `core/framework/credentials/oauth2/__init__.py` — Registered HubSpot OAuth2 provider
- `core/framework/runner/runner.py` — Updated runner for credential support
### Web Scrape Rewrite
- `tools/src/aden_tools/tools/web_scrape_tool/web_scrape_tool.py` — Playwright async rewrite
- `tools/src/aden_tools/tools/web_scrape_tool/README.md` — Updated docs
- `tools/pyproject.toml` — Added `playwright`, `playwright-stealth` deps
- `tools/Dockerfile` — Added `playwright install chromium --with-deps`
- `scripts/setup-python.sh` — Added Playwright Chromium browser install step
### LLM Reliability
- `core/framework/llm/litellm.py` — Empty response retry + max retries 10 + verbose logging
- `core/framework/graph/node.py` — Input compaction via `_compact_inputs()`, `_estimate_tokens()`, `_get_context_limit()`
### Quickstart & Setup
- `quickstart.sh` — Interactive bee-themed onboarding wizard with single provider selection
- `~/.hive/configuration.json` — New user config file for default LLM provider/model
### Fixes
- `core/framework/mcp/agent_builder_server.py` — Removed unused variable
- `tools/src/aden_tools/tools/hubspot_tool/hubspot_tool.py` — Fixed E501 line length violations
## Test plan
- [ ] Run `make lint` — passes clean
- [ ] Run `./quickstart.sh` and verify interactive flow works, config saved to `~/.hive/configuration.json`
- [ ] Run `./scripts/setup-python.sh` and verify Playwright Chromium installs
- [ ] Run `pytest tests/tools/test_web_scrape_tool.py -v`
- [ ] Run agent against a JS-heavy site and verify `web_scrape` returns rendered content
- [ ] Set `HUBSPOT_ACCESS_TOKEN` and verify HubSpot tool CRUD operations work
- [ ] Trigger rate limit and verify `[retry]` logs appear with correct attempt counts
- [ ] Run agent with large inputs and verify `[compaction]` logs show truncation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
+66 -34
View File
@@ -15,7 +15,6 @@
[![Apache 2.0 License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/adenhq/hive/blob/main/LICENSE)
[![Y Combinator](https://img.shields.io/badge/Y%20Combinator-Aden-orange)](https://www.ycombinator.com/companies/aden)
[![Docker Pulls](https://img.shields.io/docker/pulls/adenhq/hive?logo=Docker&labelColor=%23528bff)](https://hub.docker.com/u/adenhq)
[![Discord](https://img.shields.io/discord/1172610340073242735?logo=discord&labelColor=%235462eb&logoColor=%23f5f5f5&color=%235462eb)](https://discord.com/invite/MXE49hrKDk)
[![Twitter Follow](https://img.shields.io/twitter/follow/teamaden?logo=X&color=%23f5f5f5)](https://x.com/aden_hq)
[![LinkedIn](https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff)](https://www.linkedin.com/company/teamaden/)
@@ -40,6 +39,30 @@ Build reliable, self-improving AI agents without hardcoding workflows. Define yo
Visit [adenhq.com](https://adenhq.com) for complete documentation, examples, and guides.
## Who Is Hive For?
Hive is designed for developers and teams who want to build **production-grade AI agents** without manually wiring complex workflows.
Hive is a good fit if you:
- Want AI agents that **execute real business processes**, not demos
- Prefer **goal-driven development** over hardcoded workflows
- Need **self-healing and adaptive agents** that improve over time
- Require **human-in-the-loop control**, observability, and cost limits
- Plan to run agents in **production environments**
Hive may not be the best fit if youre only experimenting with simple agent chains or one-off scripts.
## When Should You Use Hive?
Use Hive when you need:
- Long-running, autonomous agents
- Multi-agent coordination
- Continuous improvement based on failures
- Strong monitoring, safety, and budget controls
- A framework that evolves with your goals
## What is Aden
<p align="center">
@@ -64,11 +87,13 @@ Aden is a platform for building, deploying, operating, and adapting AI agents:
## Quick Start
### Prerequisites
## Prerequisites
- [Python 3.11+](https://www.python.org/downloads/) for agent development
- Python 3.11+ for agent development
- Claude Code or Cursor for utilizing agent skills
> **Note for Windows Users:** It is strongly recommended to use **WSL (Windows Subsystem for Linux)** or **Git Bash** to run this framework. Some core automation scripts may not execute correctly in standard Command Prompt or PowerShell.
### Installation
```bash
@@ -81,6 +106,7 @@ cd hive
```
This sets up:
- **framework** - Core agent runtime and graph executor (in `core/.venv`)
- **aden_tools** - MCP tools for agent capabilities (in `tools/.venv`)
- All required Python dependencies
@@ -89,16 +115,16 @@ This sets up:
```bash
# Build an agent using Claude Code
claude> /building-agents-construction
claude> /hive
# Test your agent
claude> /testing-agent
claude> /hive-test
# Run your agent
PYTHONPATH=core:exports python -m your_agent_name run --input '{...}'
PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'
```
**[📖 Complete Setup Guide](ENVIRONMENT_SETUP.md)** - Detailed instructions for agent development
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
### Cursor IDE Support
@@ -107,22 +133,22 @@ Skills are also available in Cursor. To enable:
1. Open Command Palette (`Cmd+Shift+P` / `Ctrl+Shift+P`)
2. Run `MCP: Enable` to enable MCP servers
3. Restart Cursor to load the MCP servers from `.cursor/mcp.json`
4. Type `/` in Agent chat and search for skills (e.g., `/building-agents-construction`)
4. Type `/` in Agent chat and search for skills (e.g., `/hive-create`)
## Features
- **Goal-Driven Development** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **Adaptiveness** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **Dynamic Node Connections** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
- **SDK-Wrapped Nodes** - Every node gets shared memory, local RLM memory, monitoring, tools, and LLM access out of the box
- **Human-in-the-Loop** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **[Human-in-the-Loop](docs/key_concepts/graph.md#human-in-the-loop)** - Intervention nodes that pause execution for human input with configurable timeouts and escalation
- **Real-time Observability** - WebSocket streaming for live monitoring of agent execution, decisions, and node-to-node communication
- **Cost & Budget Control** - Set spending limits, throttles, and automatic model degradation policies
- **Production-Ready** - Self-hostable, built for scale and reliability
## Why Aden
Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe [outcomes](docs/key_concepts/goals_outcome.md), and the system builds itself**—delivering an outcome-driven, [adaptive](docs/key_concepts/evolution.md) experience with an easy-to-use set of tools and integrations.
```mermaid
flowchart LR
@@ -162,27 +188,28 @@ flowchart LR
| -------------------------- | -------------------------------------- |
| Hardcode agent workflows | Describe goals in natural language |
| Manual graph definition | Auto-generated agent graphs |
| Reactive error handling | Outcome-evaluation and adaptiveness |
| Reactive error handling | Outcome-evaluation and adaptiveness |
| Static tool configurations | Dynamic SDK-wrapped nodes |
| Separate monitoring setup | Built-in real-time observability |
| DIY budget management | Integrated cost controls & degradation |
### How It Works
1. **Define Your Goal** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the agent graph, connection code, and test cases
3. **Workers Execute** → SDK-wrapped nodes run with full observability and tool access
1. **[Define Your Goal](docs/key_concepts/goals_outcome.md)** → Describe what you want to achieve in plain English
2. **Coding Agent Generates** → Creates the [agent graph](docs/key_concepts/graph.md), connection code, and test cases
3. **[Workers Execute](docs/key_concepts/worker_agent.md)** → SDK-wrapped nodes run with full observability and tool access
4. **Control Plane Monitors** → Real-time metrics, budget enforcement, policy management
5. **Adaptiveness** → On failure, the system evolves the graph and redeploys automatically
5. **[Adaptiveness](docs/key_concepts/evolution.md)** → On failure, the system evolves the graph and redeploys automatically
## Run pre-built Agents (Coming Soon)
### Run a sample agent
Aden Hive provides a list of featured agents that you can use and build on top of.
### Run an agent shared by others
Put the agent in `exports/` and run `PYTHONPATH=core:exports python -m your_agent_name run --input '{...}'`
Put the agent in `exports/` and run `PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}'`
For building and running goal-driven agents with the framework:
@@ -195,28 +222,32 @@ For building and running goal-driven agents with the framework:
# - aden_tools package (MCP tools)
# - All Python dependencies
# Build new agents using Claude Code skills
claude> /building-agents-construction
# Test agents
claude> /testing-agent
# Build new agents using Agent Skills
claude> /hive
# Run agents
PYTHONPATH=core:exports python -m agent_name run --input '{...}'
PYTHONPATH=exports uv run python -m agent_name run --input '{...}'
```
See [ENVIRONMENT_SETUP.md](ENVIRONMENT_SETUP.md) for complete setup instructions.
See [environment-setup.md](docs/environment-setup.md) for complete setup instructions.
## Documentation
- **[Developer Guide](DEVELOPER.md)** - Comprehensive guide for developers
- **[Developer Guide](docs/developer-guide.md)** - Comprehensive guide for developers
- [Getting Started](docs/getting-started.md) - Quick setup instructions
- [Configuration Guide](docs/configuration.md) - All configuration options
- [Architecture Overview](docs/architecture/README.md) - System design and structure
### Key Concepts
- [Goals & Outcome-Driven Development](docs/key_concepts/goals_outcome.md) - Why Hive is outcome-driven and how goals define success
- [The Agent Graph](docs/key_concepts/graph.md) - Nodes, edges, shared memory, and how agents execute
- [The Worker Agent](docs/key_concepts/worker_agent.md) - Sessions, iterations, headless execution, and the runtime
- [Evolution](docs/key_concepts/evolution.md) - How agents improve across generations through failure data
## Roadmap
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [ROADMAP.md](ROADMAP.md) for details.
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [roadmap.md](docs/roadmap.md) for details.
```mermaid
flowchart TD
@@ -306,11 +337,12 @@ end
classDef done fill:#9e9e9e,color:#fff,stroke:#757575
```
## Contributing
We welcome contributions from the community! Were especially looking for help building tools, integrations, and example agents for the framework ([check #2805](https://github.com/adenhq/hive/issues/2805)). If youre interested in extending its functionality, this is the perfect place to start. Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
**Important:** Please get assigned to an issue before submitting a PR. Comment on an issue to claim it, and a maintainer will assign you. Issues with reproducible steps and proposals are prioritized. This helps prevent duplicate work.
1. Find or create an issue and get assigned
2. Fork the repository
@@ -357,7 +389,7 @@ Yes! Hive supports local models through LiteLLM. Simply use the model name forma
**Q: What makes Hive different from other agent frameworks?**
Hive generates your entire agent system from natural language goals using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, evolves the agent graph, and redeploys. This self-improving loop is unique to Aden.
Hive generates your entire agent system from natural language [goals](docs/key_concepts/goals_outcome.md) using a coding agent—you don't hardcode workflows or manually define graphs. When agents fail, the framework automatically captures failure data, [evolves the agent graph](docs/key_concepts/evolution.md), and redeploys. This self-improving loop is unique to Aden.
**Q: Is Hive open-source?**
@@ -369,7 +401,7 @@ Hive collects telemetry data for monitoring and observability purposes, includin
**Q: What deployment options does Hive support?**
Hive supports self-hosted deployments via Python packages. See the [Environment Setup Guide](ENVIRONMENT_SETUP.md) for installation instructions. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.
Hive supports self-hosted deployments via Python packages. See the [Environment Setup Guide](docs/environment-setup.md) for installation instructions. Cloud deployment options and Kubernetes-ready configurations are on the roadmap.
**Q: Can Hive handle complex, production-scale use cases?**
@@ -377,7 +409,7 @@ Yes. Hive is explicitly designed for production environments with features like
**Q: Does Hive support human-in-the-loop workflows?**
Yes, Hive fully supports human-in-the-loop workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
Yes, Hive fully supports [human-in-the-loop](docs/key_concepts/graph.md#human-in-the-loop) workflows through intervention nodes that pause execution for human input. These include configurable timeouts and escalation policies, allowing seamless collaboration between human experts and AI agents.
**Q: What monitoring and debugging tools does Hive provide?**
@@ -397,7 +429,7 @@ Hive provides granular budget controls including spending limits, throttles, and
**Q: Where can I find examples and documentation?**
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [DEVELOPER.md](DEVELOPER.md) guide.
Visit [docs.adenhq.com](https://docs.adenhq.com/) for complete guides, API reference, and getting started tutorials. The repository also includes documentation in the `docs/` folder and a comprehensive [developer guide](docs/developer-guide.md).
**Q: How can I contribute to Aden?**
@@ -405,7 +437,7 @@ Contributions are welcome! Fork the repository, create your feature branch, impl
**Q: When will my team start seeing results from Aden's adaptive agents?**
Aden's adaptation loop begins working from the first execution. When an agent fails, the framework captures the failure data, helping developers evolve the agent graph through the coding agent. How quickly this translates to measurable results depends on the complexity of your use case, the quality of your goal definitions, and the volume of executions generating feedback.
Aden's [adaptation loop](docs/key_concepts/evolution.md) begins working from the first execution. When an agent fails, the framework captures the failure data, helping developers evolve the agent graph through the coding agent. How quickly this translates to measurable results depends on the complexity of your use case, the quality of your [goal definitions](docs/key_concepts/goals_outcome.md), and the volume of executions generating feedback.
**Q: How does Hive compare to other agent frameworks?**
+11 -11
View File
@@ -14,7 +14,7 @@ Framework provides a runtime framework that captures **decisions**, not just act
## Installation
```bash
pip install -e .
uv pip install -e .
```
## MCP Server Setup
@@ -45,13 +45,13 @@ If you prefer manual setup:
```bash
# Install framework
pip install -e .
uv pip install -e .
# Install MCP dependencies
pip install mcp fastmcp
uv pip install mcp fastmcp
# Test the server
python -m framework.mcp.agent_builder_server
uv run python -m framework.mcp.agent_builder_server
```
### Using with MCP Clients
@@ -86,13 +86,13 @@ Run an LLM-powered calculator:
```bash
# Single calculation
python -m framework calculate "2 + 3 * 4"
uv run python -m framework calculate "2 + 3 * 4"
# Interactive mode
python -m framework interactive
uv run python -m framework interactive
# Analyze runs with Builder
python -m framework analyze calculator
uv run python -m framework analyze calculator
```
### Using the Runtime
@@ -136,16 +136,16 @@ Tests are generated using MCP tools (`generate_constraint_tests`, `generate_succ
```bash
# Run tests against an agent
python -m framework test-run <agent_path> --goal <goal_id> --parallel 4
uv run python -m framework test-run <agent_path> --goal <goal_id> --parallel 4
# Debug failed tests
python -m framework test-debug <agent_path> <test_name>
uv run python -m framework test-debug <agent_path> <test_name>
# List tests for a goal
python -m framework test-list <goal_id>
uv run python -m framework test-list <goal_id>
```
For detailed testing workflows, see the [testing-agent skill](../.claude/skills/testing-agent/SKILL.md).
For detailed testing workflows, see the [hive-test skill](../.claude/skills/hive-test/SKILL.md).
### Analyzing Agent Behavior with Builder
+740
View File
@@ -0,0 +1,740 @@
#!/usr/bin/env python3
"""
EventLoopNode WebSocket Demo
Real LLM, real FileConversationStore, real EventBus.
Streams EventLoopNode execution to a browser via WebSocket.
Usage:
cd /home/timothy/oss/hive/core
python demos/event_loop_wss_demo.py
Then open http://localhost:8765 in your browser.
"""
import asyncio
import json
import logging
import sys
import tempfile
from http import HTTPStatus
from pathlib import Path
import httpx
import websockets
from bs4 import BeautifulSoup
from websockets.http11 import Request, Response
# Add core, tools, and hive root to path
_CORE_DIR = Path(__file__).resolve().parent.parent
_HIVE_DIR = _CORE_DIR.parent
sys.path.insert(0, str(_CORE_DIR)) # framework.*
sys.path.insert(0, str(_HIVE_DIR / "tools" / "src")) # aden_tools.*
sys.path.insert(0, str(_HIVE_DIR)) # core.framework.* (for aden_tools imports)
import os # noqa: E402
from aden_tools.credentials import CREDENTIAL_SPECS, CredentialStoreAdapter # noqa: E402
from core.framework.credentials import CredentialStore # noqa: E402
from framework.credentials.storage import ( # noqa: E402
CompositeStorage,
EncryptedFileStorage,
EnvVarStorage,
)
from framework.graph.event_loop_node import EventLoopNode, LoopConfig # noqa: E402
from framework.graph.node import NodeContext, NodeSpec, SharedMemory # noqa: E402
from framework.llm.litellm import LiteLLMProvider # noqa: E402
from framework.llm.provider import Tool # noqa: E402
from framework.runner.tool_registry import ToolRegistry # noqa: E402
from framework.runtime.core import Runtime # noqa: E402
from framework.runtime.event_bus import EventBus, EventType # noqa: E402
from framework.storage.conversation_store import FileConversationStore # noqa: E402
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
logger = logging.getLogger("demo")
# -------------------------------------------------------------------------
# Persistent state (shared across WebSocket connections)
# -------------------------------------------------------------------------
STORE_DIR = Path(tempfile.mkdtemp(prefix="hive_demo_"))
STORE = FileConversationStore(STORE_DIR / "conversation")
RUNTIME = Runtime(STORE_DIR / "runtime")
LLM = LiteLLMProvider(model="claude-sonnet-4-5-20250929")
# -------------------------------------------------------------------------
# Tool Registry — real tools via ToolRegistry (same pattern as GraphExecutor)
# -------------------------------------------------------------------------
TOOL_REGISTRY = ToolRegistry()
# Credential store: Aden sync (OAuth2 tokens) + encrypted files + env var fallback
_env_mapping = {name: spec.env_var for name, spec in CREDENTIAL_SPECS.items()}
_local_storage = CompositeStorage(
primary=EncryptedFileStorage(),
fallbacks=[EnvVarStorage(env_mapping=_env_mapping)],
)
if os.environ.get("ADEN_API_KEY"):
try:
from framework.credentials.aden import ( # noqa: E402
AdenCachedStorage,
AdenClientConfig,
AdenCredentialClient,
AdenSyncProvider,
)
_client = AdenCredentialClient(AdenClientConfig(base_url="https://api.adenhq.com"))
_provider = AdenSyncProvider(client=_client)
_storage = AdenCachedStorage(
local_storage=_local_storage,
aden_provider=_provider,
)
_cred_store = CredentialStore(storage=_storage, providers=[_provider], auto_refresh=True)
_synced = _provider.sync_all(_cred_store)
logger.info("Synced %d credentials from Aden", _synced)
except Exception as e:
logger.warning("Aden sync unavailable: %s", e)
_cred_store = CredentialStore(storage=_local_storage)
else:
logger.info("ADEN_API_KEY not set, using local credential storage")
_cred_store = CredentialStore(storage=_local_storage)
CREDENTIALS = CredentialStoreAdapter(_cred_store)
# Debug: log which credentials resolved
for _name in ["brave_search", "hubspot", "anthropic"]:
_val = CREDENTIALS.get(_name)
if _val:
logger.debug("credential %s: OK (len=%d)", _name, len(_val))
else:
logger.debug("credential %s: not found", _name)
# --- web_search (Brave Search API) ---
TOOL_REGISTRY.register(
name="web_search",
tool=Tool(
name="web_search",
description=(
"Search the web for current information. "
"Returns titles, URLs, and snippets from search results."
),
parameters={
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query (1-500 characters)",
},
"num_results": {
"type": "integer",
"description": "Number of results to return (1-20, default 10)",
},
},
"required": ["query"],
},
),
executor=lambda inputs: _exec_web_search(inputs),
)
def _exec_web_search(inputs: dict) -> dict:
api_key = CREDENTIALS.get("brave_search")
if not api_key:
return {"error": "brave_search credential not configured"}
query = inputs.get("query", "")
num_results = min(inputs.get("num_results", 10), 20)
resp = httpx.get(
"https://api.search.brave.com/res/v1/web/search",
params={"q": query, "count": num_results},
headers={"X-Subscription-Token": api_key, "Accept": "application/json"},
timeout=30.0,
)
if resp.status_code != 200:
return {"error": f"Brave API HTTP {resp.status_code}"}
data = resp.json()
results = [
{
"title": item.get("title", ""),
"url": item.get("url", ""),
"snippet": item.get("description", ""),
}
for item in data.get("web", {}).get("results", [])[:num_results]
]
return {"query": query, "results": results, "total": len(results)}
# --- web_scrape (httpx + BeautifulSoup, no playwright for sync compat) ---
TOOL_REGISTRY.register(
name="web_scrape",
tool=Tool(
name="web_scrape",
description=(
"Scrape and extract text content from a webpage URL. "
"Returns the page title and main text content."
),
parameters={
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "URL of the webpage to scrape",
},
"max_length": {
"type": "integer",
"description": "Maximum text length (default 50000)",
},
},
"required": ["url"],
},
),
executor=lambda inputs: _exec_web_scrape(inputs),
)
_SCRAPE_HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/131.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml",
}
def _exec_web_scrape(inputs: dict) -> dict:
url = inputs.get("url", "")
max_length = max(1000, min(inputs.get("max_length", 50000), 500000))
if not url.startswith(("http://", "https://")):
url = "https://" + url
try:
resp = httpx.get(url, timeout=30.0, follow_redirects=True, headers=_SCRAPE_HEADERS)
if resp.status_code != 200:
return {"error": f"HTTP {resp.status_code}"}
soup = BeautifulSoup(resp.text, "html.parser")
for tag in soup(["script", "style", "nav", "footer", "header", "aside", "noscript"]):
tag.decompose()
title = soup.title.get_text(strip=True) if soup.title else ""
main = (
soup.find("article")
or soup.find("main")
or soup.find(attrs={"role": "main"})
or soup.find("body")
)
text = main.get_text(separator=" ", strip=True) if main else ""
text = " ".join(text.split())
if len(text) > max_length:
text = text[:max_length] + "..."
return {"url": url, "title": title, "content": text, "length": len(text)}
except httpx.TimeoutException:
return {"error": "Request timed out"}
except Exception as e:
return {"error": f"Scrape failed: {e}"}
# --- HubSpot CRM tools (optional, requires HUBSPOT_ACCESS_TOKEN) ---
_HUBSPOT_API = "https://api.hubapi.com"
def _hubspot_headers() -> dict | None:
token = CREDENTIALS.get("hubspot")
if token:
logger.debug("HubSpot token: %s...%s (len=%d)", token[:8], token[-4:], len(token))
else:
logger.debug("HubSpot token: not found")
if not token:
return None
return {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
"Accept": "application/json",
}
def _exec_hubspot_search(inputs: dict) -> dict:
headers = _hubspot_headers()
if not headers:
return {"error": "HUBSPOT_ACCESS_TOKEN not set"}
object_type = inputs.get("object_type", "contacts")
query = inputs.get("query", "")
limit = min(inputs.get("limit", 10), 100)
body: dict = {"limit": limit}
if query:
body["query"] = query
try:
resp = httpx.post(
f"{_HUBSPOT_API}/crm/v3/objects/{object_type}/search",
headers=headers,
json=body,
timeout=30.0,
)
if resp.status_code != 200:
return {"error": f"HubSpot API HTTP {resp.status_code}: {resp.text[:200]}"}
return resp.json()
except httpx.TimeoutException:
return {"error": "Request timed out"}
except Exception as e:
return {"error": f"HubSpot error: {e}"}
TOOL_REGISTRY.register(
name="hubspot_search",
tool=Tool(
name="hubspot_search",
description=(
"Search HubSpot CRM objects (contacts, companies, or deals). "
"Returns matching records with their properties."
),
parameters={
"type": "object",
"properties": {
"object_type": {
"type": "string",
"description": "CRM object type: 'contacts', 'companies', or 'deals'",
},
"query": {
"type": "string",
"description": "Search query (name, email, domain, etc.)",
},
"limit": {
"type": "integer",
"description": "Max results (1-100, default 10)",
},
},
"required": ["object_type"],
},
),
executor=lambda inputs: _exec_hubspot_search(inputs),
)
logger.info(
"ToolRegistry loaded: %s",
", ".join(TOOL_REGISTRY.get_registered_names()),
)
# -------------------------------------------------------------------------
# HTML page (embedded)
# -------------------------------------------------------------------------
HTML_PAGE = ( # noqa: E501
"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>EventLoopNode Live Demo</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
font-family: 'SF Mono', 'Fira Code', monospace;
background: #0d1117; color: #c9d1d9;
height: 100vh; display: flex; flex-direction: column;
}
header {
background: #161b22; padding: 12px 20px;
border-bottom: 1px solid #30363d;
display: flex; align-items: center; gap: 16px;
}
header h1 { font-size: 16px; color: #58a6ff; font-weight: 600; }
.status {
font-size: 12px; padding: 3px 10px; border-radius: 12px;
background: #21262d; color: #8b949e;
}
.status.running { background: #1a4b2e; color: #3fb950; }
.status.done { background: #1a3a5c; color: #58a6ff; }
.status.error { background: #4b1a1a; color: #f85149; }
.chat { flex: 1; overflow-y: auto; padding: 16px; }
.msg {
margin: 8px 0; padding: 10px 14px; border-radius: 8px;
line-height: 1.6; white-space: pre-wrap; word-wrap: break-word;
}
.msg.user { background: #1a3a5c; color: #58a6ff; }
.msg.assistant { background: #161b22; color: #c9d1d9; }
.msg.event {
background: transparent; color: #8b949e; font-size: 11px;
padding: 4px 14px; border-left: 3px solid #30363d;
}
.msg.event.loop { border-left-color: #58a6ff; }
.msg.event.tool { border-left-color: #d29922; }
.msg.event.stall { border-left-color: #f85149; }
.input-bar {
padding: 12px 16px; background: #161b22;
border-top: 1px solid #30363d; display: flex; gap: 8px;
}
.input-bar input {
flex: 1; background: #0d1117; border: 1px solid #30363d;
color: #c9d1d9; padding: 8px 12px; border-radius: 6px;
font-family: inherit; font-size: 14px; outline: none;
}
.input-bar input:focus { border-color: #58a6ff; }
.input-bar button {
background: #238636; color: #fff; border: none;
padding: 8px 20px; border-radius: 6px; cursor: pointer;
font-family: inherit; font-weight: 600;
}
.input-bar button:hover { background: #2ea043; }
.input-bar button:disabled {
background: #21262d; color: #484f58; cursor: not-allowed;
}
.input-bar button.clear { background: #da3633; }
.input-bar button.clear:hover { background: #f85149; }
</style>
</head>
<body>
<header>
<h1>EventLoopNode Live</h1>
<span id="status" class="status">Idle</span>
<span id="iter" class="status" style="display:none">Step 0</span>
</header>
<div id="chat" class="chat"></div>
<div class="input-bar">
<input id="input" type="text"
placeholder="Ask anything..." autofocus />
<button id="go" onclick="run()">Send</button>
<button class="clear"
onclick="clearConversation()">Clear</button>
</div>
<script>
let ws = null;
let currentAssistantEl = null;
let iterCount = 0;
const chat = document.getElementById('chat');
const status = document.getElementById('status');
const iterEl = document.getElementById('iter');
const goBtn = document.getElementById('go');
const inputEl = document.getElementById('input');
inputEl.addEventListener('keydown', e => {
if (e.key === 'Enter') run();
});
function setStatus(text, cls) {
status.textContent = text;
status.className = 'status ' + cls;
}
function addMsg(text, cls) {
const el = document.createElement('div');
el.className = 'msg ' + cls;
el.textContent = text;
chat.appendChild(el);
chat.scrollTop = chat.scrollHeight;
return el;
}
function connect() {
ws = new WebSocket('ws://' + location.host + '/ws');
ws.onopen = () => {
setStatus('Ready', 'done');
goBtn.disabled = false;
};
ws.onmessage = handleEvent;
ws.onerror = () => { setStatus('Error', 'error'); };
ws.onclose = () => {
setStatus('Reconnecting...', '');
goBtn.disabled = true;
setTimeout(connect, 2000);
};
}
function handleEvent(msg) {
const evt = JSON.parse(msg.data);
if (evt.type === 'llm_text_delta') {
if (currentAssistantEl) {
currentAssistantEl.textContent += evt.content;
chat.scrollTop = chat.scrollHeight;
}
}
else if (evt.type === 'ready') {
setStatus('Ready', 'done');
if (currentAssistantEl && !currentAssistantEl.textContent)
currentAssistantEl.remove();
goBtn.disabled = false;
}
else if (evt.type === 'node_loop_iteration') {
iterCount = evt.iteration || (iterCount + 1);
iterEl.textContent = 'Step ' + iterCount;
iterEl.style.display = '';
}
else if (evt.type === 'tool_call_started') {
var info = evt.tool_name + '('
+ JSON.stringify(evt.tool_input).slice(0, 120) + ')';
addMsg('TOOL ' + info, 'event tool');
}
else if (evt.type === 'tool_call_completed') {
var preview = (evt.result || '').slice(0, 200);
var cls = evt.is_error ? 'stall' : 'tool';
addMsg('RESULT ' + evt.tool_name + ': ' + preview,
'event ' + cls);
currentAssistantEl = addMsg('', 'assistant');
}
else if (evt.type === 'result') {
setStatus('Session ended', evt.success ? 'done' : 'error');
if (evt.error) addMsg('ERROR ' + evt.error, 'event stall');
if (currentAssistantEl && !currentAssistantEl.textContent)
currentAssistantEl.remove();
goBtn.disabled = false;
}
else if (evt.type === 'node_stalled') {
addMsg('STALLED ' + evt.reason, 'event stall');
}
else if (evt.type === 'cleared') {
chat.innerHTML = '';
iterCount = 0;
iterEl.textContent = 'Step 0';
iterEl.style.display = 'none';
setStatus('Ready', 'done');
goBtn.disabled = false;
}
}
function run() {
const text = inputEl.value.trim();
if (!text || !ws || ws.readyState !== 1) return;
addMsg(text, 'user');
currentAssistantEl = addMsg('', 'assistant');
inputEl.value = '';
setStatus('Running', 'running');
goBtn.disabled = true;
ws.send(JSON.stringify({ topic: text }));
}
function clearConversation() {
if (ws && ws.readyState === 1) {
ws.send(JSON.stringify({ command: 'clear' }));
}
}
connect();
</script>
</body>
</html>"""
)
# -------------------------------------------------------------------------
# WebSocket handler
# -------------------------------------------------------------------------
async def handle_ws(websocket):
"""Persistent WebSocket: long-lived EventLoopNode with client_facing blocking."""
global STORE
# -- Event forwarding (WebSocket ← EventBus) ----------------------------
bus = EventBus()
async def forward_event(event):
try:
payload = {"type": event.type.value, **event.data}
if event.node_id:
payload["node_id"] = event.node_id
await websocket.send(json.dumps(payload))
except Exception:
pass
bus.subscribe(
event_types=[
EventType.NODE_LOOP_STARTED,
EventType.NODE_LOOP_ITERATION,
EventType.NODE_LOOP_COMPLETED,
EventType.LLM_TEXT_DELTA,
EventType.TOOL_CALL_STARTED,
EventType.TOOL_CALL_COMPLETED,
EventType.NODE_STALLED,
],
handler=forward_event,
)
# -- Per-connection state -----------------------------------------------
node = None
loop_task = None
tools = list(TOOL_REGISTRY.get_tools().values())
tool_executor = TOOL_REGISTRY.get_executor()
node_spec = NodeSpec(
id="assistant",
name="Chat Assistant",
description="A conversational assistant that remembers context across messages",
node_type="event_loop",
client_facing=True,
system_prompt=(
"You are a helpful assistant with access to tools. "
"You can search the web, scrape webpages, and query HubSpot CRM. "
"Use tools when the user asks for current information or external data. "
"You have full conversation history, so you can reference previous messages."
),
)
# -- Ready callback: subscribe to CLIENT_INPUT_REQUESTED on the bus ---
async def on_input_requested(event):
try:
await websocket.send(json.dumps({"type": "ready"}))
except Exception:
pass
bus.subscribe(
event_types=[EventType.CLIENT_INPUT_REQUESTED],
handler=on_input_requested,
)
async def start_loop(first_message: str):
"""Create an EventLoopNode and run it as a background task."""
nonlocal node, loop_task
memory = SharedMemory()
ctx = NodeContext(
runtime=RUNTIME,
node_id="assistant",
node_spec=node_spec,
memory=memory,
input_data={},
llm=LLM,
available_tools=tools,
)
node = EventLoopNode(
event_bus=bus,
config=LoopConfig(max_iterations=10_000, max_history_tokens=32_000),
conversation_store=STORE,
tool_executor=tool_executor,
)
await node.inject_event(first_message)
async def _run():
try:
result = await node.execute(ctx)
try:
await websocket.send(
json.dumps(
{
"type": "result",
"success": result.success,
"output": result.output,
"error": result.error,
"tokens": result.tokens_used,
}
)
)
except Exception:
pass
logger.info(f"Loop ended: success={result.success}, tokens={result.tokens_used}")
except websockets.exceptions.ConnectionClosed:
logger.info("Loop stopped: WebSocket closed")
except Exception as e:
logger.exception("Loop error")
try:
await websocket.send(
json.dumps(
{
"type": "result",
"success": False,
"error": str(e),
"output": {},
}
)
)
except Exception:
pass
loop_task = asyncio.create_task(_run())
async def stop_loop():
"""Signal the node and wait for the loop task to finish."""
nonlocal node, loop_task
if loop_task and not loop_task.done():
if node:
node.signal_shutdown()
try:
await asyncio.wait_for(loop_task, timeout=5.0)
except (TimeoutError, asyncio.CancelledError):
loop_task.cancel()
node = None
loop_task = None
# -- Message loop (runs for the lifetime of this WebSocket) -------------
try:
async for raw in websocket:
try:
msg = json.loads(raw)
except Exception:
continue
# Clear command
if msg.get("command") == "clear":
import shutil
await stop_loop()
await STORE.close()
conv_dir = STORE_DIR / "conversation"
if conv_dir.exists():
shutil.rmtree(conv_dir)
STORE = FileConversationStore(conv_dir)
await websocket.send(json.dumps({"type": "cleared"}))
logger.info("Conversation cleared")
continue
topic = msg.get("topic", "")
if not topic:
continue
if node is None:
# First message — spin up the loop
logger.info(f"Starting persistent loop: {topic}")
await start_loop(topic)
else:
# Subsequent message — inject into the running loop
logger.info(f"Injecting message: {topic}")
await node.inject_event(topic)
except websockets.exceptions.ConnectionClosed:
pass
finally:
await stop_loop()
logger.info("WebSocket closed, loop stopped")
# -------------------------------------------------------------------------
# HTTP handler for serving the HTML page
# -------------------------------------------------------------------------
async def process_request(connection, request: Request):
"""Serve HTML on GET /, upgrade to WebSocket on /ws."""
if request.path == "/ws":
return None # let websockets handle the upgrade
# Serve the HTML page for any other path
return Response(
HTTPStatus.OK,
"OK",
websockets.Headers({"Content-Type": "text/html; charset=utf-8"}),
HTML_PAGE.encode(),
)
# -------------------------------------------------------------------------
# Main
# -------------------------------------------------------------------------
async def main():
port = 8765
async with websockets.serve(
handle_ws,
"0.0.0.0",
port,
process_request=process_request,
):
logger.info(f"Demo running at http://localhost:{port}")
logger.info("Open in your browser and enter a topic to research.")
await asyncio.Future() # run forever
if __name__ == "__main__":
asyncio.run(main())
File diff suppressed because it is too large Load Diff
+930
View File
@@ -0,0 +1,930 @@
#!/usr/bin/env python3
"""
Two-Node ContextHandoff Demo
Demonstrates ContextHandoff between two EventLoopNode instances:
Node A (Researcher) ContextHandoff Node B (Analyst)
Real LLM, real FileConversationStore, real EventBus.
Streams both nodes to a browser via WebSocket.
Usage:
cd /home/timothy/oss/hive/core
python demos/handoff_demo.py
Then open http://localhost:8766 in your browser.
"""
import asyncio
import json
import logging
import sys
import tempfile
from http import HTTPStatus
from pathlib import Path
import httpx
import websockets
from bs4 import BeautifulSoup
from websockets.http11 import Request, Response
# Add core, tools, and hive root to path
_CORE_DIR = Path(__file__).resolve().parent.parent
_HIVE_DIR = _CORE_DIR.parent
sys.path.insert(0, str(_CORE_DIR)) # framework.*
sys.path.insert(0, str(_HIVE_DIR / "tools" / "src")) # aden_tools.*
sys.path.insert(0, str(_HIVE_DIR)) # core.framework.* (for aden_tools imports)
from aden_tools.credentials import CREDENTIAL_SPECS, CredentialStoreAdapter # noqa: E402
from core.framework.credentials import CredentialStore # noqa: E402
from framework.credentials.storage import ( # noqa: E402
CompositeStorage,
EncryptedFileStorage,
EnvVarStorage,
)
from framework.graph.context_handoff import ContextHandoff # noqa: E402
from framework.graph.conversation import NodeConversation # noqa: E402
from framework.graph.event_loop_node import EventLoopNode, LoopConfig # noqa: E402
from framework.graph.node import NodeContext, NodeSpec, SharedMemory # noqa: E402
from framework.llm.litellm import LiteLLMProvider # noqa: E402
from framework.llm.provider import Tool # noqa: E402
from framework.runner.tool_registry import ToolRegistry # noqa: E402
from framework.runtime.core import Runtime # noqa: E402
from framework.runtime.event_bus import EventBus, EventType # noqa: E402
from framework.storage.conversation_store import FileConversationStore # noqa: E402
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
logger = logging.getLogger("handoff_demo")
# -------------------------------------------------------------------------
# Persistent state
# -------------------------------------------------------------------------
STORE_DIR = Path(tempfile.mkdtemp(prefix="hive_handoff_"))
RUNTIME = Runtime(STORE_DIR / "runtime")
LLM = LiteLLMProvider(model="claude-sonnet-4-5-20250929")
# -------------------------------------------------------------------------
# Credentials
# -------------------------------------------------------------------------
# Composite credential store: encrypted files (primary) + env vars (fallback)
_env_mapping = {name: spec.env_var for name, spec in CREDENTIAL_SPECS.items()}
_composite = CompositeStorage(
primary=EncryptedFileStorage(),
fallbacks=[EnvVarStorage(env_mapping=_env_mapping)],
)
CREDENTIALS = CredentialStoreAdapter(CredentialStore(storage=_composite))
for _name in ["brave_search", "hubspot"]:
_val = CREDENTIALS.get(_name)
if _val:
logger.debug("credential %s: OK (len=%d)", _name, len(_val))
else:
logger.debug("credential %s: not found", _name)
# -------------------------------------------------------------------------
# Tool Registry — web_search + web_scrape for Node A (Researcher)
# -------------------------------------------------------------------------
TOOL_REGISTRY = ToolRegistry()
def _exec_web_search(inputs: dict) -> dict:
api_key = CREDENTIALS.get("brave_search")
if not api_key:
return {"error": "brave_search credential not configured"}
query = inputs.get("query", "")
num_results = min(inputs.get("num_results", 10), 20)
resp = httpx.get(
"https://api.search.brave.com/res/v1/web/search",
params={"q": query, "count": num_results},
headers={
"X-Subscription-Token": api_key,
"Accept": "application/json",
},
timeout=30.0,
)
if resp.status_code != 200:
return {"error": f"Brave API HTTP {resp.status_code}"}
data = resp.json()
results = [
{
"title": item.get("title", ""),
"url": item.get("url", ""),
"snippet": item.get("description", ""),
}
for item in data.get("web", {}).get("results", [])[:num_results]
]
return {"query": query, "results": results, "total": len(results)}
TOOL_REGISTRY.register(
name="web_search",
tool=Tool(
name="web_search",
description=(
"Search the web for current information. "
"Returns titles, URLs, and snippets from search results."
),
parameters={
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query (1-500 characters)",
},
"num_results": {
"type": "integer",
"description": "Number of results (1-20, default 10)",
},
},
"required": ["query"],
},
),
executor=lambda inputs: _exec_web_search(inputs),
)
_SCRAPE_HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/131.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml",
}
def _exec_web_scrape(inputs: dict) -> dict:
url = inputs.get("url", "")
max_length = max(1000, min(inputs.get("max_length", 50000), 500000))
if not url.startswith(("http://", "https://")):
url = "https://" + url
try:
resp = httpx.get(
url,
timeout=30.0,
follow_redirects=True,
headers=_SCRAPE_HEADERS,
)
if resp.status_code != 200:
return {"error": f"HTTP {resp.status_code}"}
soup = BeautifulSoup(resp.text, "html.parser")
for tag in soup(["script", "style", "nav", "footer", "header", "aside", "noscript"]):
tag.decompose()
title = soup.title.get_text(strip=True) if soup.title else ""
main = (
soup.find("article")
or soup.find("main")
or soup.find(attrs={"role": "main"})
or soup.find("body")
)
text = main.get_text(separator=" ", strip=True) if main else ""
text = " ".join(text.split())
if len(text) > max_length:
text = text[:max_length] + "..."
return {
"url": url,
"title": title,
"content": text,
"length": len(text),
}
except httpx.TimeoutException:
return {"error": "Request timed out"}
except Exception as e:
return {"error": f"Scrape failed: {e}"}
TOOL_REGISTRY.register(
name="web_scrape",
tool=Tool(
name="web_scrape",
description=(
"Scrape and extract text content from a webpage URL. "
"Returns the page title and main text content."
),
parameters={
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "URL of the webpage to scrape",
},
"max_length": {
"type": "integer",
"description": "Maximum text length (default 50000)",
},
},
"required": ["url"],
},
),
executor=lambda inputs: _exec_web_scrape(inputs),
)
logger.info(
"ToolRegistry loaded: %s",
", ".join(TOOL_REGISTRY.get_registered_names()),
)
# -------------------------------------------------------------------------
# Node Specs
# -------------------------------------------------------------------------
RESEARCHER_SPEC = NodeSpec(
id="researcher",
name="Researcher",
description="Researches a topic using web search and scraping tools",
node_type="event_loop",
input_keys=["topic"],
output_keys=["research_summary"],
system_prompt=(
"You are a thorough research assistant. Your job is to research "
"the given topic using the web_search and web_scrape tools.\n\n"
"1. Search for relevant information on the topic\n"
"2. Scrape 1-2 of the most promising URLs for details\n"
"3. Synthesize your findings into a comprehensive summary\n"
"4. Use set_output with key='research_summary' to save your "
"findings\n\n"
"Be thorough but efficient. Aim for 2-4 search/scrape calls, "
"then summarize and set_output."
),
)
ANALYST_SPEC = NodeSpec(
id="analyst",
name="Analyst",
description="Analyzes research findings and provides insights",
node_type="event_loop",
input_keys=["context"],
output_keys=["analysis"],
system_prompt=(
"You are a strategic analyst. You receive research findings from "
"a previous researcher and must:\n\n"
"1. Identify key themes and patterns\n"
"2. Assess the reliability and significance of the findings\n"
"3. Provide actionable insights and recommendations\n"
"4. Use set_output with key='analysis' to save your analysis\n\n"
"Be concise but insightful. Focus on what matters most."
),
)
# -------------------------------------------------------------------------
# HTML page
# -------------------------------------------------------------------------
HTML_PAGE = ( # noqa: E501
"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>ContextHandoff Demo</title>
<style>
* {
box-sizing: border-box;
margin: 0;
padding: 0;
}
body {
font-family: 'SF Mono', 'Fira Code', monospace;
background: #0d1117;
color: #c9d1d9;
height: 100vh;
display: flex;
flex-direction: column;
}
header {
background: #161b22;
padding: 12px 20px;
border-bottom: 1px solid #30363d;
display: flex;
align-items: center;
gap: 16px;
}
header h1 {
font-size: 16px;
color: #58a6ff;
font-weight: 600;
}
.badge {
font-size: 12px;
padding: 3px 10px;
border-radius: 12px;
background: #21262d;
color: #8b949e;
}
.badge.researcher {
background: #1a3a5c;
color: #58a6ff;
}
.badge.analyst {
background: #1a4b2e;
color: #3fb950;
}
.badge.handoff {
background: #3d1f00;
color: #d29922;
}
.badge.done {
background: #21262d;
color: #8b949e;
}
.badge.error {
background: #4b1a1a;
color: #f85149;
}
.chat {
flex: 1;
overflow-y: auto;
padding: 16px;
}
.msg {
margin: 8px 0;
padding: 10px 14px;
border-radius: 8px;
line-height: 1.6;
white-space: pre-wrap;
word-wrap: break-word;
}
.msg.user {
background: #1a3a5c;
color: #58a6ff;
}
.msg.assistant {
background: #161b22;
color: #c9d1d9;
}
.msg.assistant.analyst-msg {
border-left: 3px solid #3fb950;
}
.msg.event {
background: transparent;
color: #8b949e;
font-size: 11px;
padding: 4px 14px;
border-left: 3px solid #30363d;
}
.msg.event.loop {
border-left-color: #58a6ff;
}
.msg.event.tool {
border-left-color: #d29922;
}
.msg.event.stall {
border-left-color: #f85149;
}
.handoff-banner {
margin: 16px 0;
padding: 16px;
background: #1c1200;
border: 1px solid #d29922;
border-radius: 8px;
text-align: center;
}
.handoff-banner h3 {
color: #d29922;
font-size: 14px;
margin-bottom: 8px;
}
.handoff-banner p, .result-banner p {
color: #8b949e;
font-size: 12px;
line-height: 1.5;
max-height: 200px;
overflow-y: auto;
white-space: pre-wrap;
text-align: left;
}
.result-banner {
margin: 16px 0;
padding: 16px;
background: #0a2614;
border: 1px solid #3fb950;
border-radius: 8px;
}
.result-banner h3 {
color: #3fb950;
font-size: 14px;
margin-bottom: 8px;
text-align: center;
}
.result-banner .label {
color: #58a6ff;
font-size: 11px;
font-weight: 600;
margin-top: 10px;
margin-bottom: 2px;
}
.result-banner .tokens {
color: #484f58;
font-size: 11px;
text-align: center;
margin-top: 10px;
}
.input-bar {
padding: 12px 16px;
background: #161b22;
border-top: 1px solid #30363d;
display: flex;
gap: 8px;
}
.input-bar input {
flex: 1;
background: #0d1117;
border: 1px solid #30363d;
color: #c9d1d9;
padding: 8px 12px;
border-radius: 6px;
font-family: inherit;
font-size: 14px;
outline: none;
}
.input-bar input:focus {
border-color: #58a6ff;
}
.input-bar button {
background: #238636;
color: #fff;
border: none;
padding: 8px 20px;
border-radius: 6px;
cursor: pointer;
font-family: inherit;
font-weight: 600;
}
.input-bar button:hover {
background: #2ea043;
}
.input-bar button:disabled {
background: #21262d;
color: #484f58;
cursor: not-allowed;
}
</style>
</head>
<body>
<header>
<h1>ContextHandoff Demo</h1>
<span id="phase" class="badge">Idle</span>
<span id="iter" class="badge" style="display:none">Step 0</span>
</header>
<div id="chat" class="chat"></div>
<div class="input-bar">
<input id="input" type="text"
placeholder="Enter a research topic..." autofocus />
<button id="go" onclick="run()">Research</button>
</div>
<script>
let ws = null;
let currentAssistantEl = null;
let iterCount = 0;
let currentPhase = 'idle';
const chat = document.getElementById('chat');
const phase = document.getElementById('phase');
const iterEl = document.getElementById('iter');
const goBtn = document.getElementById('go');
const inputEl = document.getElementById('input');
inputEl.addEventListener('keydown', e => {
if (e.key === 'Enter') run();
});
function setPhase(text, cls) {
phase.textContent = text;
phase.className = 'badge ' + cls;
currentPhase = cls;
}
function addMsg(text, cls) {
const el = document.createElement('div');
el.className = 'msg ' + cls;
el.textContent = text;
chat.appendChild(el);
chat.scrollTop = chat.scrollHeight;
return el;
}
function addHandoffBanner(summary) {
const banner = document.createElement('div');
banner.className = 'handoff-banner';
const h3 = document.createElement('h3');
h3.textContent = 'Context Handoff: Researcher -> Analyst';
const p = document.createElement('p');
p.textContent = summary || 'Passing research context...';
banner.appendChild(h3);
banner.appendChild(p);
chat.appendChild(banner);
chat.scrollTop = chat.scrollHeight;
}
function addResultBanner(researcher, analyst, tokens) {
const banner = document.createElement('div');
banner.className = 'result-banner';
const h3 = document.createElement('h3');
h3.textContent = 'Pipeline Complete';
banner.appendChild(h3);
if (researcher && researcher.research_summary) {
const lbl = document.createElement('div');
lbl.className = 'label';
lbl.textContent = 'RESEARCH SUMMARY';
banner.appendChild(lbl);
const p = document.createElement('p');
p.textContent = researcher.research_summary;
banner.appendChild(p);
}
if (analyst && analyst.analysis) {
const lbl = document.createElement('div');
lbl.className = 'label';
lbl.textContent = 'ANALYSIS';
lbl.style.color = '#3fb950';
banner.appendChild(lbl);
const p = document.createElement('p');
p.textContent = analyst.analysis;
banner.appendChild(p);
}
if (tokens) {
const t = document.createElement('div');
t.className = 'tokens';
t.textContent = 'Total tokens: ' + tokens.toLocaleString();
banner.appendChild(t);
}
chat.appendChild(banner);
chat.scrollTop = chat.scrollHeight;
}
function connect() {
ws = new WebSocket('ws://' + location.host + '/ws');
ws.onopen = () => {
setPhase('Ready', 'done');
goBtn.disabled = false;
};
ws.onmessage = handleEvent;
ws.onerror = () => { setPhase('Error', 'error'); };
ws.onclose = () => {
setPhase('Reconnecting...', '');
goBtn.disabled = true;
setTimeout(connect, 2000);
};
}
function handleEvent(msg) {
const evt = JSON.parse(msg.data);
if (evt.type === 'phase') {
if (evt.phase === 'researcher') {
setPhase('Researcher', 'researcher');
} else if (evt.phase === 'handoff') {
setPhase('Handoff', 'handoff');
} else if (evt.phase === 'analyst') {
setPhase('Analyst', 'analyst');
}
iterCount = 0;
iterEl.style.display = 'none';
}
else if (evt.type === 'llm_text_delta') {
if (currentAssistantEl) {
currentAssistantEl.textContent += evt.content;
chat.scrollTop = chat.scrollHeight;
}
}
else if (evt.type === 'node_loop_iteration') {
iterCount = evt.iteration || (iterCount + 1);
iterEl.textContent = 'Step ' + iterCount;
iterEl.style.display = '';
}
else if (evt.type === 'tool_call_started') {
var info = evt.tool_name + '('
+ JSON.stringify(evt.tool_input).slice(0, 120) + ')';
addMsg('TOOL ' + info, 'event tool');
}
else if (evt.type === 'tool_call_completed') {
var preview = (evt.result || '').slice(0, 200);
var cls = evt.is_error ? 'stall' : 'tool';
addMsg(
'RESULT ' + evt.tool_name + ': ' + preview,
'event ' + cls
);
var assistCls = currentPhase === 'analyst'
? 'assistant analyst-msg' : 'assistant';
currentAssistantEl = addMsg('', assistCls);
}
else if (evt.type === 'handoff_context') {
addHandoffBanner(evt.summary);
var assistCls = 'assistant analyst-msg';
currentAssistantEl = addMsg('', assistCls);
}
else if (evt.type === 'node_result') {
if (evt.node_id === 'researcher') {
if (currentAssistantEl
&& !currentAssistantEl.textContent) {
currentAssistantEl.remove();
}
}
}
else if (evt.type === 'done') {
setPhase('Done', 'done');
iterEl.style.display = 'none';
if (currentAssistantEl
&& !currentAssistantEl.textContent) {
currentAssistantEl.remove();
}
currentAssistantEl = null;
addResultBanner(
evt.researcher, evt.analyst, evt.total_tokens
);
goBtn.disabled = false;
inputEl.placeholder = 'Enter another topic...';
}
else if (evt.type === 'error') {
setPhase('Error', 'error');
addMsg('ERROR ' + evt.message, 'event stall');
goBtn.disabled = false;
}
else if (evt.type === 'node_stalled') {
addMsg('STALLED ' + evt.reason, 'event stall');
}
}
function run() {
const text = inputEl.value.trim();
if (!text || !ws || ws.readyState !== 1) return;
chat.innerHTML = '';
addMsg(text, 'user');
currentAssistantEl = addMsg('', 'assistant');
inputEl.value = '';
goBtn.disabled = true;
ws.send(JSON.stringify({ topic: text }));
}
connect();
</script>
</body>
</html>"""
)
# -------------------------------------------------------------------------
# WebSocket handler — sequential Node A → Handoff → Node B
# -------------------------------------------------------------------------
async def handle_ws(websocket):
"""Run the two-node handoff pipeline per user message."""
try:
async for raw in websocket:
try:
msg = json.loads(raw)
except Exception:
continue
topic = msg.get("topic", "")
if not topic:
continue
logger.info(f"Starting handoff pipeline for: {topic}")
try:
await _run_pipeline(websocket, topic)
except websockets.exceptions.ConnectionClosed:
logger.info("WebSocket closed during pipeline")
return
except Exception as e:
logger.exception("Pipeline error")
try:
await websocket.send(json.dumps({"type": "error", "message": str(e)}))
except Exception:
pass
except websockets.exceptions.ConnectionClosed:
pass
async def _run_pipeline(websocket, topic: str):
"""Execute: Node A (research) → ContextHandoff → Node B (analysis)."""
import shutil
# Fresh stores for each run
run_dir = Path(tempfile.mkdtemp(prefix="hive_run_", dir=STORE_DIR))
store_a = FileConversationStore(run_dir / "node_a")
store_b = FileConversationStore(run_dir / "node_b")
# Shared event bus
bus = EventBus()
async def forward_event(event):
try:
payload = {"type": event.type.value, **event.data}
if event.node_id:
payload["node_id"] = event.node_id
await websocket.send(json.dumps(payload))
except Exception:
pass
bus.subscribe(
event_types=[
EventType.NODE_LOOP_STARTED,
EventType.NODE_LOOP_ITERATION,
EventType.NODE_LOOP_COMPLETED,
EventType.LLM_TEXT_DELTA,
EventType.TOOL_CALL_STARTED,
EventType.TOOL_CALL_COMPLETED,
EventType.NODE_STALLED,
],
handler=forward_event,
)
tools = list(TOOL_REGISTRY.get_tools().values())
tool_executor = TOOL_REGISTRY.get_executor()
# ---- Phase 1: Researcher ------------------------------------------------
await websocket.send(json.dumps({"type": "phase", "phase": "researcher"}))
node_a = EventLoopNode(
event_bus=bus,
judge=None, # implicit judge: accept when output_keys filled
config=LoopConfig(
max_iterations=20,
max_tool_calls_per_turn=10,
max_history_tokens=32_000,
),
conversation_store=store_a,
tool_executor=tool_executor,
)
ctx_a = NodeContext(
runtime=RUNTIME,
node_id="researcher",
node_spec=RESEARCHER_SPEC,
memory=SharedMemory(),
input_data={"topic": topic},
llm=LLM,
available_tools=tools,
)
result_a = await node_a.execute(ctx_a)
logger.info(
"Researcher done: success=%s, tokens=%s",
result_a.success,
result_a.tokens_used,
)
await websocket.send(
json.dumps(
{
"type": "node_result",
"node_id": "researcher",
"success": result_a.success,
"output": result_a.output,
}
)
)
if not result_a.success:
await websocket.send(
json.dumps(
{
"type": "error",
"message": f"Researcher failed: {result_a.error}",
}
)
)
return
# ---- Phase 2: Context Handoff -------------------------------------------
await websocket.send(json.dumps({"type": "phase", "phase": "handoff"}))
# Restore the researcher's conversation from store
conversation_a = await NodeConversation.restore(store_a)
if conversation_a is None:
await websocket.send(
json.dumps(
{
"type": "error",
"message": "Failed to restore researcher conversation",
}
)
)
return
handoff_engine = ContextHandoff(llm=LLM)
handoff_context = handoff_engine.summarize_conversation(
conversation=conversation_a,
node_id="researcher",
output_keys=["research_summary"],
)
formatted_handoff = ContextHandoff.format_as_input(handoff_context)
logger.info(
"Handoff: %d turns, ~%d tokens, keys=%s",
handoff_context.turn_count,
handoff_context.total_tokens_used,
list(handoff_context.key_outputs.keys()),
)
# Send handoff context to browser
await websocket.send(
json.dumps(
{
"type": "handoff_context",
"summary": handoff_context.summary[:500],
"turn_count": handoff_context.turn_count,
"tokens": handoff_context.total_tokens_used,
"key_outputs": handoff_context.key_outputs,
}
)
)
# ---- Phase 3: Analyst ---------------------------------------------------
await websocket.send(json.dumps({"type": "phase", "phase": "analyst"}))
node_b = EventLoopNode(
event_bus=bus,
judge=None, # implicit judge
config=LoopConfig(
max_iterations=10,
max_tool_calls_per_turn=5,
max_history_tokens=32_000,
),
conversation_store=store_b,
)
ctx_b = NodeContext(
runtime=RUNTIME,
node_id="analyst",
node_spec=ANALYST_SPEC,
memory=SharedMemory(),
input_data={"context": formatted_handoff},
llm=LLM,
available_tools=[],
)
result_b = await node_b.execute(ctx_b)
logger.info(
"Analyst done: success=%s, tokens=%s",
result_b.success,
result_b.tokens_used,
)
# ---- Done ---------------------------------------------------------------
await websocket.send(
json.dumps(
{
"type": "done",
"researcher": result_a.output,
"analyst": result_b.output,
"total_tokens": ((result_a.tokens_used or 0) + (result_b.tokens_used or 0)),
}
)
)
# Clean up temp stores
try:
shutil.rmtree(run_dir)
except Exception:
pass
# -------------------------------------------------------------------------
# HTTP handler
# -------------------------------------------------------------------------
async def process_request(connection, request: Request):
"""Serve HTML on GET /, upgrade to WebSocket on /ws."""
if request.path == "/ws":
return None
return Response(
HTTPStatus.OK,
"OK",
websockets.Headers({"Content-Type": "text/html; charset=utf-8"}),
HTML_PAGE.encode(),
)
# -------------------------------------------------------------------------
# Main
# -------------------------------------------------------------------------
async def main():
port = 8766
async with websockets.serve(
handle_ws,
"0.0.0.0",
port,
process_request=process_request,
):
logger.info(f"Handoff demo at http://localhost:{port}")
logger.info("Enter a research topic to start the pipeline.")
await asyncio.Future()
if __name__ == "__main__":
asyncio.run(main())
File diff suppressed because it is too large Load Diff
+1 -1
View File
@@ -9,7 +9,7 @@ for understanding the core runtime loop:
Setup -> Graph definition -> Execution -> Result
Run with:
PYTHONPATH=core python core/examples/manual_agent.py
uv run python core/examples/manual_agent.py
"""
import asyncio
+2 -2
View File
@@ -15,7 +15,7 @@ You cannot skip steps or bypass validation.
from collections.abc import Callable
from datetime import datetime
from enum import Enum
from enum import StrEnum
from pathlib import Path
from typing import Any
@@ -26,7 +26,7 @@ from framework.graph.goal import Goal
from framework.graph.node import NodeSpec
class BuildPhase(str, Enum):
class BuildPhase(StrEnum):
"""Current phase of the build process."""
INIT = "init" # Just started
+88 -6
View File
@@ -64,6 +64,8 @@ class AdenCachedStorage(CredentialStorage):
- **Reads**: Try local cache first, fallback to Aden if stale/missing
- **Writes**: Always write to local cache
- **Offline resilience**: Uses cached credentials when Aden is unreachable
- **Provider-based lookup**: Match credentials by provider name (e.g., "hubspot")
when direct ID lookup fails, since Aden uses hash-based IDs internally.
The cache TTL determines how long to trust local credentials before
checking with the Aden server for updates. This balances:
@@ -85,6 +87,7 @@ class AdenCachedStorage(CredentialStorage):
# First access fetches from Aden
# Subsequent accesses use cache until TTL expires
# Can look up by provider name OR credential ID
token = store.get_key("hubspot", "access_token")
"""
@@ -111,21 +114,24 @@ class AdenCachedStorage(CredentialStorage):
self._cache_ttl = timedelta(seconds=cache_ttl_seconds)
self._prefer_local = prefer_local
self._cache_timestamps: dict[str, datetime] = {}
# Index: provider name (e.g., "hubspot") -> credential hash ID
self._provider_index: dict[str, str] = {}
def save(self, credential: CredentialObject) -> None:
"""
Save credential to local cache.
Save credential to local cache and update provider index.
Args:
credential: The credential to save.
"""
self._local.save(credential)
self._cache_timestamps[credential.id] = datetime.now(UTC)
self._index_provider(credential)
logger.debug(f"Cached credential '{credential.id}'")
def load(self, credential_id: str) -> CredentialObject | None:
"""
Load credential from cache, with Aden fallback.
Load credential from cache, with Aden fallback and provider-based lookup.
The loading strategy depends on the `prefer_local` setting:
@@ -141,8 +147,37 @@ class AdenCachedStorage(CredentialStorage):
2. Update local cache with response
3. Fall back to local cache only if Aden fails
Provider-based lookup:
When a provider index mapping exists for the credential_id (e.g.,
"hubspot" hash ID), the Aden-synced credential is loaded first.
This ensures fresh OAuth tokens from Aden take priority over stale
local credentials (env vars, old encrypted files).
Args:
credential_id: The credential identifier.
credential_id: The credential identifier or provider name.
Returns:
CredentialObject if found, None otherwise.
"""
# Check provider index first — Aden-synced credentials take priority
resolved_id = self._provider_index.get(credential_id)
if resolved_id and resolved_id != credential_id:
result = self._load_by_id(resolved_id)
if result is not None:
logger.info(
f"Loaded credential '{credential_id}' via provider index (id='{resolved_id}')"
)
return result
# Direct lookup (exact credential_id match)
return self._load_by_id(credential_id)
def _load_by_id(self, credential_id: str) -> CredentialObject | None:
"""
Load credential by exact ID from cache, with Aden fallback.
Args:
credential_id: The exact credential identifier.
Returns:
CredentialObject if found, None otherwise.
@@ -200,15 +235,21 @@ class AdenCachedStorage(CredentialStorage):
def exists(self, credential_id: str) -> bool:
"""
Check if credential exists in local cache.
Check if credential exists in local cache (by ID or provider name).
Args:
credential_id: The credential identifier.
credential_id: The credential identifier or provider name.
Returns:
True if credential exists locally.
"""
return self._local.exists(credential_id)
if self._local.exists(credential_id):
return True
# Check provider index
resolved_id = self._provider_index.get(credential_id)
if resolved_id and resolved_id != credential_id:
return self._local.exists(resolved_id)
return False
def _is_cache_fresh(self, credential_id: str) -> bool:
"""
@@ -242,6 +283,47 @@ class AdenCachedStorage(CredentialStorage):
self._cache_timestamps.clear()
logger.debug("Invalidated all cache entries")
def _index_provider(self, credential: CredentialObject) -> None:
"""
Index a credential by its provider/integration type.
Aden credentials carry an ``_integration_type`` key whose value is
the provider name (e.g., ``hubspot``). This method maps that
provider name to the credential's hash ID so that subsequent
``load("hubspot")`` calls resolve to the correct credential.
Args:
credential: The credential to index.
"""
integration_type_key = credential.keys.get("_integration_type")
if integration_type_key is None:
return
provider_name = integration_type_key.value.get_secret_value()
if provider_name:
self._provider_index[provider_name] = credential.id
logger.debug(f"Indexed provider '{provider_name}' -> '{credential.id}'")
def rebuild_provider_index(self) -> int:
"""
Rebuild the provider index from all locally cached credentials.
Useful after loading from disk when the in-memory index is empty.
Returns:
Number of provider mappings indexed.
"""
self._provider_index.clear()
indexed = 0
for cred_id in self._local.list_all():
cred = self._local.load(cred_id)
if cred:
before = len(self._provider_index)
self._index_provider(cred)
if len(self._provider_index) > before:
indexed += 1
logger.debug(f"Rebuilt provider index with {indexed} mappings")
return indexed
def sync_all_from_aden(self) -> int:
"""
Sync all credentials from Aden server to local cache.
@@ -589,6 +589,149 @@ class TestAdenCachedStorage:
assert info["stale"]["is_fresh"] is False
assert info["stale"]["ttl_remaining_seconds"] == 0
def test_save_indexes_provider(self, cached_storage):
"""Test save builds the provider index from _integration_type key."""
cred = CredentialObject(
id="aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1",
credential_type=CredentialType.OAUTH2,
keys={
"access_token": CredentialKey(
name="access_token",
value=SecretStr("token-value"),
),
"_integration_type": CredentialKey(
name="_integration_type",
value=SecretStr("hubspot"),
),
},
)
cached_storage.save(cred)
assert cached_storage._provider_index["hubspot"] == "aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1"
def test_load_by_provider_name(self, cached_storage):
"""Test load resolves provider name to hash-based credential ID."""
hash_id = "aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1"
cred = CredentialObject(
id=hash_id,
credential_type=CredentialType.OAUTH2,
keys={
"access_token": CredentialKey(
name="access_token",
value=SecretStr("hubspot-token"),
),
"_integration_type": CredentialKey(
name="_integration_type",
value=SecretStr("hubspot"),
),
},
)
# Save builds the index
cached_storage.save(cred)
# Load by provider name should resolve to the hash ID
loaded = cached_storage.load("hubspot")
assert loaded is not None
assert loaded.id == hash_id
assert loaded.keys["access_token"].value.get_secret_value() == "hubspot-token"
def test_load_by_direct_id_still_works(self, cached_storage):
"""Test load by direct hash ID still works as before."""
hash_id = "aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1"
cred = CredentialObject(
id=hash_id,
credential_type=CredentialType.OAUTH2,
keys={
"access_token": CredentialKey(
name="access_token",
value=SecretStr("token"),
),
"_integration_type": CredentialKey(
name="_integration_type",
value=SecretStr("hubspot"),
),
},
)
cached_storage.save(cred)
# Direct ID lookup should still work
loaded = cached_storage.load(hash_id)
assert loaded is not None
assert loaded.id == hash_id
def test_exists_by_provider_name(self, cached_storage):
"""Test exists resolves provider name to hash-based credential ID."""
hash_id = "c2xhY2s6dGVzdDo5OTk="
cred = CredentialObject(
id=hash_id,
credential_type=CredentialType.OAUTH2,
keys={
"access_token": CredentialKey(
name="access_token",
value=SecretStr("slack-token"),
),
"_integration_type": CredentialKey(
name="_integration_type",
value=SecretStr("slack"),
),
},
)
cached_storage.save(cred)
assert cached_storage.exists("slack") is True
assert cached_storage.exists(hash_id) is True
assert cached_storage.exists("nonexistent") is False
def test_rebuild_provider_index(self, cached_storage, local_storage):
"""Test rebuild_provider_index reconstructs from local storage."""
# Manually save credentials to local storage (bypassing cached_storage.save)
for provider_name, hash_id in [("hubspot", "hash_hub"), ("slack", "hash_slack")]:
cred = CredentialObject(
id=hash_id,
credential_type=CredentialType.OAUTH2,
keys={
"_integration_type": CredentialKey(
name="_integration_type",
value=SecretStr(provider_name),
),
},
)
local_storage.save(cred)
# Index should be empty (we bypassed save)
assert len(cached_storage._provider_index) == 0
# Rebuild
indexed = cached_storage.rebuild_provider_index()
assert indexed == 2
assert cached_storage._provider_index["hubspot"] == "hash_hub"
assert cached_storage._provider_index["slack"] == "hash_slack"
def test_save_without_integration_type_no_index(self, cached_storage):
"""Test save does not index credentials without _integration_type key."""
cred = CredentialObject(
id="plain-cred",
credential_type=CredentialType.API_KEY,
keys={
"api_key": CredentialKey(
name="api_key",
value=SecretStr("key-value"),
),
},
)
cached_storage.save(cred)
assert "plain-cred" not in cached_storage._provider_index
assert len(cached_storage._provider_index) == 0
# =============================================================================
# Integration Tests
+2 -2
View File
@@ -8,7 +8,7 @@ containing one or more keys (e.g., api_key, access_token, refresh_token).
from __future__ import annotations
from datetime import UTC, datetime
from enum import Enum
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field, SecretStr
@@ -19,7 +19,7 @@ def _utc_now() -> datetime:
return datetime.now(UTC)
class CredentialType(str, Enum):
class CredentialType(StrEnum):
"""Types of credentials the store can manage."""
API_KEY = "api_key"
@@ -96,7 +96,7 @@ class BaseOAuth2Provider(CredentialProvider):
self._client = httpx.Client(timeout=self.config.request_timeout)
except ImportError as e:
raise ImportError(
"OAuth2 provider requires 'httpx'. Install with: pip install httpx"
"OAuth2 provider requires 'httpx'. Install with: uv pip install httpx"
) from e
return self._client
@@ -11,11 +11,11 @@ from __future__ import annotations
from dataclasses import dataclass, field
from datetime import UTC, datetime, timedelta
from enum import Enum
from enum import StrEnum
from typing import Any
class TokenPlacement(str, Enum):
class TokenPlacement(StrEnum):
"""Where to place the access token in HTTP requests."""
HEADER_BEARER = "header_bearer"
+2 -1
View File
@@ -136,7 +136,8 @@ class EncryptedFileStorage(CredentialStorage):
from cryptography.fernet import Fernet
except ImportError as e:
raise ImportError(
"Encrypted storage requires 'cryptography'. Install with: pip install cryptography"
"Encrypted storage requires 'cryptography'. "
"Install with: uv pip install cryptography"
) from e
self.base_path = Path(base_path or self.DEFAULT_PATH).expanduser()
@@ -2,7 +2,7 @@
HashiCorp Vault storage adapter.
Provides integration with HashiCorp Vault for enterprise secret management.
Requires the 'hvac' package: pip install hvac
Requires the 'hvac' package: uv pip install hvac
"""
from __future__ import annotations
@@ -66,7 +66,7 @@ class HashiCorpVaultStorage(CredentialStorage):
- AWS IAM auth method
Requirements:
pip install hvac
uv pip install hvac
"""
def __init__(
@@ -97,7 +97,7 @@ class HashiCorpVaultStorage(CredentialStorage):
import hvac
except ImportError as e:
raise ImportError(
"HashiCorp Vault support requires 'hvac'. Install with: pip install hvac"
"HashiCorp Vault support requires 'hvac'. Install with: uv pip install hvac"
) from e
self._url = url
+28
View File
@@ -1,8 +1,22 @@
"""Graph structures: Goals, Nodes, Edges, and Flexible Execution."""
from framework.graph.client_io import (
ActiveNodeClientIO,
ClientIOGateway,
InertNodeClientIO,
NodeClientIO,
)
from framework.graph.code_sandbox import CodeSandbox, safe_eval, safe_exec
from framework.graph.context_handoff import ContextHandoff, HandoffContext
from framework.graph.conversation import ConversationStore, Message, NodeConversation
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.event_loop_node import (
EventLoopNode,
JudgeProtocol,
JudgeVerdict,
LoopConfig,
OutputAccumulator,
)
from framework.graph.executor import GraphExecutor
from framework.graph.flexible_executor import ExecutorConfig, FlexibleGraphExecutor
from framework.graph.goal import Constraint, Goal, GoalStatus, SuccessCriterion
@@ -77,4 +91,18 @@ __all__ = [
"NodeConversation",
"ConversationStore",
"Message",
# Event Loop
"EventLoopNode",
"LoopConfig",
"OutputAccumulator",
"JudgeProtocol",
"JudgeVerdict",
# Context Handoff
"ContextHandoff",
"HandoffContext",
# Client I/O
"NodeClientIO",
"ActiveNodeClientIO",
"InertNodeClientIO",
"ClientIOGateway",
]
+170
View File
@@ -0,0 +1,170 @@
"""
Client I/O gateway for graph nodes.
Provides the bridge between node code and external clients:
- ActiveNodeClientIO: for client_facing=True nodes (streams output, accepts input)
- InertNodeClientIO: for client_facing=False nodes (logs internally, redirects input)
- ClientIOGateway: factory that creates the right variant per node
"""
from __future__ import annotations
import asyncio
import logging
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from framework.runtime.event_bus import EventBus
logger = logging.getLogger(__name__)
class NodeClientIO(ABC):
"""Abstract base for node client I/O."""
@abstractmethod
async def emit_output(self, content: str, is_final: bool = False) -> None:
"""Emit output content. If is_final=True, signal end of stream."""
@abstractmethod
async def request_input(self, prompt: str = "", timeout: float | None = None) -> str:
"""Request input. Behavior depends on whether the node is client-facing."""
class ActiveNodeClientIO(NodeClientIO):
"""
Client I/O for client_facing=True nodes.
- emit_output() queues content and publishes CLIENT_OUTPUT_DELTA.
- request_input() publishes CLIENT_INPUT_REQUESTED, then awaits provide_input().
- output_stream() yields queued content until the final sentinel.
"""
def __init__(
self,
node_id: str,
event_bus: EventBus | None = None,
) -> None:
self.node_id = node_id
self._event_bus = event_bus
self._output_queue: asyncio.Queue[str | None] = asyncio.Queue()
self._output_snapshot = ""
self._input_event: asyncio.Event | None = None
self._input_result: str | None = None
async def emit_output(self, content: str, is_final: bool = False) -> None:
self._output_snapshot += content
await self._output_queue.put(content)
if self._event_bus is not None:
await self._event_bus.emit_client_output_delta(
stream_id=self.node_id,
node_id=self.node_id,
content=content,
snapshot=self._output_snapshot,
)
if is_final:
await self._output_queue.put(None)
async def request_input(self, prompt: str = "", timeout: float | None = None) -> str:
if self._input_event is not None:
raise RuntimeError("request_input already pending for this node")
self._input_event = asyncio.Event()
self._input_result = None
if self._event_bus is not None:
await self._event_bus.emit_client_input_requested(
stream_id=self.node_id,
node_id=self.node_id,
prompt=prompt,
)
try:
if timeout is not None:
await asyncio.wait_for(self._input_event.wait(), timeout=timeout)
else:
await self._input_event.wait()
finally:
self._input_event = None
if self._input_result is None:
raise RuntimeError("input event was set but no input was provided")
result = self._input_result
self._input_result = None
return result
async def provide_input(self, content: str) -> None:
"""Called externally to fulfill a pending request_input()."""
if self._input_event is None:
raise RuntimeError("no pending request_input to fulfill")
self._input_result = content
self._input_event.set()
async def output_stream(self) -> AsyncIterator[str]:
"""Async iterator that yields output chunks until the final sentinel."""
while True:
chunk = await self._output_queue.get()
if chunk is None:
break
yield chunk
class InertNodeClientIO(NodeClientIO):
"""
Client I/O for client_facing=False nodes.
- emit_output() publishes NODE_INTERNAL_OUTPUT (content is not discarded).
- request_input() publishes NODE_INPUT_BLOCKED and returns a redirect string.
"""
def __init__(
self,
node_id: str,
event_bus: EventBus | None = None,
) -> None:
self.node_id = node_id
self._event_bus = event_bus
async def emit_output(self, content: str, is_final: bool = False) -> None:
if self._event_bus is not None:
await self._event_bus.emit_node_internal_output(
stream_id=self.node_id,
node_id=self.node_id,
content=content,
)
async def request_input(self, prompt: str = "", timeout: float | None = None) -> str:
if self._event_bus is not None:
await self._event_bus.emit_node_input_blocked(
stream_id=self.node_id,
node_id=self.node_id,
prompt=prompt,
)
return (
"You are an internal processing node. There is no user to interact with."
" Work with the data provided in your inputs to complete your task."
)
class ClientIOGateway:
"""Factory that creates the appropriate NodeClientIO for a node."""
def __init__(self, event_bus: EventBus | None = None) -> None:
self._event_bus = event_bus
def create_io(self, node_id: str, client_facing: bool) -> NodeClientIO:
if client_facing:
return ActiveNodeClientIO(
node_id=node_id,
event_bus=self._event_bus,
)
return InertNodeClientIO(
node_id=node_id,
event_bus=self._event_bus,
)
+191
View File
@@ -0,0 +1,191 @@
"""Context handoff: summarize a completed NodeConversation for the next graph node."""
from __future__ import annotations
import logging
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any
from framework.graph.conversation import _try_extract_key
if TYPE_CHECKING:
from framework.graph.conversation import NodeConversation
from framework.llm.provider import LLMProvider
logger = logging.getLogger(__name__)
_TRUNCATE_CHARS = 500
# ---------------------------------------------------------------------------
# Data
# ---------------------------------------------------------------------------
@dataclass
class HandoffContext:
"""Structured summary of a completed node conversation."""
source_node_id: str
summary: str
key_outputs: dict[str, Any]
turn_count: int
total_tokens_used: int
# ---------------------------------------------------------------------------
# ContextHandoff
# ---------------------------------------------------------------------------
class ContextHandoff:
"""Summarize a completed NodeConversation into a HandoffContext.
Parameters
----------
llm : LLMProvider | None
Optional LLM provider for abstractive summarization.
When *None*, all summarization uses the extractive fallback.
"""
def __init__(self, llm: LLMProvider | None = None) -> None:
self.llm = llm
# ------------------------------------------------------------------
# Public API
# ------------------------------------------------------------------
def summarize_conversation(
self,
conversation: NodeConversation,
node_id: str,
output_keys: list[str] | None = None,
) -> HandoffContext:
"""Produce a HandoffContext from *conversation*.
1. Extracts turn_count & total_tokens_used (sync properties).
2. Extracts key_outputs by scanning assistant messages most-recent-first.
3. Builds a summary via the LLM (if available) or extractive fallback.
"""
turn_count = conversation.turn_count
total_tokens_used = conversation.estimate_tokens()
messages = conversation.messages # defensive copy
# --- key outputs ---------------------------------------------------
key_outputs: dict[str, Any] = {}
if output_keys:
remaining = set(output_keys)
for msg in reversed(messages):
if msg.role != "assistant" or not remaining:
continue
for key in list(remaining):
value = _try_extract_key(msg.content, key)
if value is not None:
key_outputs[key] = value
remaining.discard(key)
# --- summary -------------------------------------------------------
if self.llm is not None:
try:
summary = self._llm_summary(messages, output_keys or [])
except Exception:
logger.warning(
"LLM summarization failed; falling back to extractive.",
exc_info=True,
)
summary = self._extractive_summary(messages)
else:
summary = self._extractive_summary(messages)
return HandoffContext(
source_node_id=node_id,
summary=summary,
key_outputs=key_outputs,
turn_count=turn_count,
total_tokens_used=total_tokens_used,
)
@staticmethod
def format_as_input(handoff: HandoffContext) -> str:
"""Render *handoff* as structured plain text for the next node's input."""
header = (
f"--- CONTEXT FROM: {handoff.source_node_id} "
f"({handoff.turn_count} turns, ~{handoff.total_tokens_used} tokens) ---"
)
sections: list[str] = [header, ""]
if handoff.key_outputs:
sections.append("KEY OUTPUTS:")
for k, v in handoff.key_outputs.items():
sections.append(f"- {k}: {v}")
sections.append("")
summary_text = handoff.summary or "No summary available."
sections.append("SUMMARY:")
sections.append(summary_text)
sections.append("")
sections.append("--- END CONTEXT ---")
return "\n".join(sections)
# ------------------------------------------------------------------
# Private helpers
# ------------------------------------------------------------------
@staticmethod
def _extractive_summary(messages: list) -> str:
"""Build a summary from key assistant messages without an LLM.
Strategy:
- Include the first assistant message (initial assessment).
- Include the last assistant message (final conclusion).
- Truncate each to ~500 chars.
"""
if not messages:
return "Empty conversation."
assistant_msgs = [m for m in messages if m.role == "assistant"]
if not assistant_msgs:
return "No assistant responses."
parts: list[str] = []
first = assistant_msgs[0].content
parts.append(first[:_TRUNCATE_CHARS])
if len(assistant_msgs) > 1:
last = assistant_msgs[-1].content
parts.append(last[:_TRUNCATE_CHARS])
return "\n\n".join(parts)
def _llm_summary(self, messages: list, output_keys: list[str]) -> str:
"""Produce a summary by calling the LLM provider."""
if self.llm is None:
raise ValueError("_llm_summary called without an LLM provider")
conversation_text = "\n".join(f"[{m.role}]: {m.content}" for m in messages)
key_hint = ""
if output_keys:
key_hint = (
"\nThe following output keys are especially important: "
+ ", ".join(output_keys)
+ ".\n"
)
system_prompt = (
"You are a concise summarizer. Given the conversation below, "
"produce a brief summary (at most ~500 tokens) that captures the "
"key decisions, findings, and outcomes. Focus on what was concluded "
"rather than the back-and-forth process." + key_hint
)
response = self.llm.complete(
messages=[{"role": "user", "content": conversation_text}],
system=system_prompt,
max_tokens=500,
)
return response.content.strip()
+216 -42
View File
@@ -75,6 +75,16 @@ class Message:
)
def _extract_spillover_filename(content: str) -> str | None:
"""Extract spillover filename from a truncated tool result.
Matches the pattern produced by EventLoopNode._truncate_tool_result():
"saved to 'tool_github_list_stargazers_abc123.txt'"
"""
match = re.search(r"saved to '([^']+)'", content)
return match.group(1) if match else None
# ---------------------------------------------------------------------------
# ConversationStore protocol (Phase 2)
# ---------------------------------------------------------------------------
@@ -108,6 +118,50 @@ class ConversationStore(Protocol):
# ---------------------------------------------------------------------------
def _try_extract_key(content: str, key: str) -> str | None:
"""Try 4 strategies to extract a *key*'s value from message content.
Strategies (in order):
1. Whole message is JSON ``json.loads``, check for key.
2. Embedded JSON via ``find_json_object`` helper.
3. Colon format: ``key: value``.
4. Equals format: ``key = value``.
"""
from framework.graph.node import find_json_object
# 1. Whole message is JSON
try:
parsed = json.loads(content)
if isinstance(parsed, dict) and key in parsed:
val = parsed[key]
return json.dumps(val) if not isinstance(val, str) else val
except (json.JSONDecodeError, TypeError):
pass
# 2. Embedded JSON via find_json_object
json_str = find_json_object(content)
if json_str:
try:
parsed = json.loads(json_str)
if isinstance(parsed, dict) and key in parsed:
val = parsed[key]
return json.dumps(val) if not isinstance(val, str) else val
except (json.JSONDecodeError, TypeError):
pass
# 3. Colon format: key: value
match = re.search(rf"\b{re.escape(key)}\s*:\s*(.+)", content)
if match:
return match.group(1).strip()
# 4. Equals format: key = value
match = re.search(rf"\b{re.escape(key)}\s*=\s*(.+)", content)
if match:
return match.group(1).strip()
return None
class NodeConversation:
"""Message history for a graph node with optional write-through persistence.
@@ -133,6 +187,7 @@ class NodeConversation:
self._messages: list[Message] = []
self._next_seq: int = 0
self._meta_persisted: bool = False
self._last_api_input_tokens: int | None = None
# --- Properties --------------------------------------------------------
@@ -205,14 +260,78 @@ class NodeConversation:
# --- Query -------------------------------------------------------------
def to_llm_messages(self) -> list[dict[str, Any]]:
"""Return messages as OpenAI-format dicts (system prompt excluded)."""
return [m.to_llm_dict() for m in self._messages]
"""Return messages as OpenAI-format dicts (system prompt excluded).
Automatically repairs orphaned tool_use blocks (assistant messages
with tool_calls that lack corresponding tool-result messages). This
can happen when a loop is cancelled mid-tool-execution.
"""
msgs = [m.to_llm_dict() for m in self._messages]
return self._repair_orphaned_tool_calls(msgs)
@staticmethod
def _repair_orphaned_tool_calls(
msgs: list[dict[str, Any]],
) -> list[dict[str, Any]]:
"""Ensure every tool_call has a matching tool-result message."""
repaired: list[dict[str, Any]] = []
for i, m in enumerate(msgs):
repaired.append(m)
tool_calls = m.get("tool_calls")
if m.get("role") != "assistant" or not tool_calls:
continue
# Collect IDs of tool results that follow this assistant message
answered: set[str] = set()
for j in range(i + 1, len(msgs)):
if msgs[j].get("role") == "tool":
tid = msgs[j].get("tool_call_id")
if tid:
answered.add(tid)
else:
break # stop at first non-tool message
# Patch any missing results
for tc in tool_calls:
tc_id = tc.get("id")
if tc_id and tc_id not in answered:
repaired.append(
{
"role": "tool",
"tool_call_id": tc_id,
"content": "ERROR: Tool execution was interrupted.",
}
)
return repaired
def estimate_tokens(self) -> int:
"""Rough token estimate: total characters / 4."""
"""Best available token estimate.
Uses actual API input token count when available (set via
:meth:`update_token_count`), otherwise falls back to the rough
``total_chars / 4`` heuristic.
"""
if self._last_api_input_tokens is not None:
return self._last_api_input_tokens
total_chars = sum(len(m.content) for m in self._messages)
return total_chars // 4
def update_token_count(self, actual_input_tokens: int) -> None:
"""Store actual API input token count for more accurate compaction.
Called by EventLoopNode after each LLM call with the ``input_tokens``
value from the API response. This value includes system prompt and
tool definitions, so it may be higher than a message-only estimate.
"""
self._last_api_input_tokens = actual_input_tokens
def usage_ratio(self) -> float:
"""Current token usage as a fraction of *max_history_tokens*.
Returns 0.0 when ``max_history_tokens`` is zero (unlimited).
"""
if self._max_history_tokens <= 0:
return 0.0
return self.estimate_tokens() / self._max_history_tokens
def needs_compaction(self) -> bool:
return self.estimate_tokens() >= self._max_history_tokens * self._compaction_threshold
@@ -244,42 +363,89 @@ class NodeConversation:
def _try_extract_key(self, content: str, key: str) -> str | None:
"""Try 4 strategies to extract a key's value from message content."""
from framework.graph.node import find_json_object
# 1. Whole message is JSON
try:
parsed = json.loads(content)
if isinstance(parsed, dict) and key in parsed:
val = parsed[key]
return json.dumps(val) if not isinstance(val, str) else val
except (json.JSONDecodeError, TypeError):
pass
# 2. Embedded JSON via find_json_object
json_str = find_json_object(content)
if json_str:
try:
parsed = json.loads(json_str)
if isinstance(parsed, dict) and key in parsed:
val = parsed[key]
return json.dumps(val) if not isinstance(val, str) else val
except (json.JSONDecodeError, TypeError):
pass
# 3. Colon format: key: value
match = re.search(rf"\b{re.escape(key)}\s*:\s*(.+)", content)
if match:
return match.group(1).strip()
# 4. Equals format: key = value
match = re.search(rf"\b{re.escape(key)}\s*=\s*(.+)", content)
if match:
return match.group(1).strip()
return None
return _try_extract_key(content, key)
# --- Lifecycle ---------------------------------------------------------
async def prune_old_tool_results(
self,
protect_tokens: int = 5000,
min_prune_tokens: int = 2000,
) -> int:
"""Replace old tool result content with compact placeholders.
Walks backward through messages. Recent tool results (within
*protect_tokens*) are kept intact. Older tool results have their
content replaced with a ~100-char placeholder that preserves the
spillover filename reference (if any). Message structure (role,
seq, tool_use_id) stays valid for the LLM API.
Error tool results are never pruned they prevent re-calling
failing tools.
Returns the number of messages pruned (0 if nothing was pruned).
"""
if not self._messages:
return 0
# Phase 1: Walk backward, classify tool results as protected vs pruneable
protected_tokens = 0
pruneable: list[int] = [] # indices into self._messages
pruneable_tokens = 0
for i in range(len(self._messages) - 1, -1, -1):
msg = self._messages[i]
if msg.role != "tool":
continue
if msg.is_error:
continue # never prune errors
if msg.content.startswith("[Pruned tool result"):
continue # already pruned
est = len(msg.content) // 4
if protected_tokens < protect_tokens:
protected_tokens += est
else:
pruneable.append(i)
pruneable_tokens += est
# Phase 2: Only prune if enough to be worthwhile
if pruneable_tokens < min_prune_tokens:
return 0
# Phase 3: Replace content with compact placeholder
count = 0
for i in pruneable:
msg = self._messages[i]
orig_len = len(msg.content)
spillover = _extract_spillover_filename(msg.content)
if spillover:
placeholder = (
f"[Pruned tool result: {orig_len} chars. "
f"Full data in '{spillover}'. "
f"Use load_data('{spillover}') to retrieve.]"
)
else:
placeholder = f"[Pruned tool result: {orig_len} chars cleared from context.]"
self._messages[i] = Message(
seq=msg.seq,
role=msg.role,
content=placeholder,
tool_use_id=msg.tool_use_id,
tool_calls=msg.tool_calls,
is_error=msg.is_error,
)
count += 1
if self._store:
await self._store.write_part(msg.seq, self._messages[i].to_storage_dict())
# Reset token estimate — content lengths changed
self._last_api_input_tokens = None
return count
async def compact(self, summary: str, keep_recent: int = 2) -> None:
"""Replace old messages with a summary, optionally keeping recent ones.
@@ -294,12 +460,18 @@ class NodeConversation:
# Clamp: must discard at least 1 message
keep_recent = max(0, min(keep_recent, len(self._messages) - 1))
if keep_recent > 0:
old_messages = self._messages[:-keep_recent]
recent_messages = self._messages[-keep_recent:]
else:
old_messages = self._messages
recent_messages = []
total = len(self._messages)
split = total - keep_recent if keep_recent > 0 else total
# Advance split past orphaned tool results at the boundary.
# Tool-role messages reference a tool_use from the preceding
# assistant message; if that assistant message falls into the
# compacted (old) portion the tool_result becomes invalid.
while split < total and self._messages[split].role == "tool":
split += 1
old_messages = list(self._messages[:split])
recent_messages = list(self._messages[split:])
# Extract protected values from messages being discarded
if self._output_keys:
@@ -330,6 +502,7 @@ class NodeConversation:
await self._store.write_cursor({"next_seq": self._next_seq})
self._messages = [summary_msg] + recent_messages
self._last_api_input_tokens = None # reset; next LLM call will recalibrate
async def clear(self) -> None:
"""Remove all messages, keep system prompt, preserve ``_next_seq``."""
@@ -337,6 +510,7 @@ class NodeConversation:
await self._store.delete_parts_before(self._next_seq)
await self._store.write_cursor({"next_seq": self._next_seq})
self._messages.clear()
self._last_api_input_tokens = None
def export_summary(self) -> str:
"""Structured summary with [STATS], [CONFIG], [RECENT_MESSAGES] sections."""
+29 -7
View File
@@ -21,7 +21,7 @@ allowing the LLM to evaluate whether proceeding along an edge makes sense
given the current goal, context, and execution state.
"""
from enum import Enum
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
@@ -29,7 +29,7 @@ from pydantic import BaseModel, Field
from framework.graph.safe_eval import safe_eval
class EdgeCondition(str, Enum):
class EdgeCondition(StrEnum):
"""When an edge should be traversed."""
ALWAYS = "always" # Always after source completes
@@ -156,6 +156,10 @@ class EdgeSpec(BaseModel):
memory: dict[str, Any],
) -> bool:
"""Evaluate a conditional expression."""
import logging
logger = logging.getLogger(__name__)
if not self.condition_expr:
return True
@@ -172,12 +176,24 @@ class EdgeSpec(BaseModel):
try:
# Safe evaluation using AST-based whitelist
return bool(safe_eval(self.condition_expr, context))
result = bool(safe_eval(self.condition_expr, context))
# Log the evaluation for visibility
# Extract the variable names used in the expression for debugging
expr_vars = {
k: repr(context[k])
for k in context
if k not in ("output", "memory", "result", "true", "false")
and k in self.condition_expr
}
logger.info(
" Edge %s: condition '%s'%s (vars: %s)",
self.id,
self.condition_expr,
result,
expr_vars or "none matched",
)
return result
except Exception as e:
# Log the error for debugging
import logging
logger = logging.getLogger(__name__)
logger.warning(f" ⚠ Condition evaluation failed: {self.condition_expr}")
logger.warning(f" Error: {e}")
logger.warning(f" Available context keys: {list(context.keys())}")
@@ -419,6 +435,12 @@ class GraphSpec(BaseModel):
max_steps: int = Field(default=100, description="Maximum node executions before timeout")
max_retries_per_node: int = 3
# EventLoopNode configuration (from configure_loop)
loop_config: dict[str, Any] = Field(
default_factory=dict,
description="EventLoopNode configuration (max_iterations, max_tool_calls_per_turn, etc.)",
)
# Metadata
description: str = ""
created_by: str = "" # "human" or "builder_agent"
File diff suppressed because it is too large Load Diff
+269 -47
View File
@@ -11,11 +11,13 @@ The executor:
import asyncio
import logging
import warnings
from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from framework.graph.edge import EdgeSpec, GraphSpec
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.goal import Goal
from framework.graph.node import (
FunctionNode,
@@ -54,6 +56,9 @@ class ExecutionResult:
had_partial_failures: bool = False # True if any node failed but eventually succeeded
execution_quality: str = "clean" # "clean", "degraded", or "failed"
# Visit tracking (for feedback/callback edges)
node_visit_counts: dict[str, int] = field(default_factory=dict) # {node_id: visit_count}
@property
def is_clean_success(self) -> bool:
"""True only if execution succeeded with no retries or failures."""
@@ -124,6 +129,10 @@ class GraphExecutor:
cleansing_config: CleansingConfig | None = None,
enable_parallel_execution: bool = True,
parallel_config: ParallelExecutionConfig | None = None,
event_bus: Any | None = None,
stream_id: str = "",
storage_path: str | Path | None = None,
loop_config: dict[str, Any] | None = None,
):
"""
Initialize the executor.
@@ -138,6 +147,10 @@ class GraphExecutor:
cleansing_config: Optional output cleansing configuration
enable_parallel_execution: Enable parallel fan-out execution (default True)
parallel_config: Configuration for parallel execution behavior
event_bus: Optional event bus for emitting node lifecycle events
stream_id: Stream ID for event correlation
storage_path: Optional base path for conversation persistence
loop_config: Optional EventLoopNode configuration (max_iterations, etc.)
"""
self.runtime = runtime
self.llm = llm
@@ -147,6 +160,10 @@ class GraphExecutor:
self.approval_callback = approval_callback
self.validator = OutputValidator()
self.logger = logging.getLogger(__name__)
self._event_bus = event_bus
self._stream_id = stream_id
self._storage_path = Path(storage_path) if storage_path else None
self._loop_config = loop_config or {}
# Initialize output cleaner
self.cleansing_config = cleansing_config or CleansingConfig()
@@ -250,6 +267,8 @@ class GraphExecutor:
total_tokens = 0
total_latency = 0
node_retry_counts: dict[str, int] = {} # Track retries per node
node_visit_counts: dict[str, int] = {} # Track visits for feedback loops
_is_retry = False # True when looping back for a retry (not a new visit)
# Determine entry point (may differ if resuming)
current_node_id = graph.get_entry_point(session_state)
@@ -269,6 +288,16 @@ class GraphExecutor:
self.logger.info(f" Goal: {goal.description}")
self.logger.info(f" Entry node: {graph.entry_node}")
# Set per-execution data_dir so data tools (save_data, load_data, etc.)
# and spillover files share the same session-scoped directory.
_ctx_token = None
if self._storage_path:
from framework.runner.tool_registry import ToolRegistry
_ctx_token = ToolRegistry.set_execution_context(
data_dir=str(self._storage_path / "data"),
)
try:
while steps < graph.max_steps:
steps += 1
@@ -278,6 +307,34 @@ class GraphExecutor:
if node_spec is None:
raise RuntimeError(f"Node not found: {current_node_id}")
# Enforce max_node_visits (feedback/callback edge support)
# Don't increment visit count on retries — retries are not new visits
if not _is_retry:
cnt = node_visit_counts.get(current_node_id, 0) + 1
node_visit_counts[current_node_id] = cnt
_is_retry = False
max_visits = getattr(node_spec, "max_node_visits", 1)
if max_visits > 0 and node_visit_counts[current_node_id] > max_visits:
self.logger.warning(
f" ⊘ Node '{node_spec.name}' visit limit reached "
f"({node_visit_counts[current_node_id]}/{max_visits}), skipping"
)
# Skip execution — follow outgoing edges using current memory
skip_result = NodeResult(success=True, output=memory.read_all())
next_node = self._follow_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
current_node_spec=node_spec,
result=skip_result,
memory=memory,
)
if next_node is None:
self.logger.info(" → No more edges after visit limit, ending")
break
current_node_id = next_node
continue
path.append(current_node_id)
# Check if pause (HITL) before execution
@@ -323,13 +380,33 @@ class GraphExecutor:
description=f"Validation errors for {current_node_id}: {validation_errors}",
)
# Emit node-started event (skip event_loop nodes — they emit their own)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_started(
stream_id=self._stream_id, node_id=current_node_id
)
# Execute node
self.logger.info(" Executing...")
result = await node_impl.execute(ctx)
# Emit node-completed event (skip event_loop nodes)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_completed(
stream_id=self._stream_id, node_id=current_node_id, iterations=1
)
if result.success:
# Validate output before accepting it
if result.output and node_spec.output_keys:
# Validate output before accepting it.
# Skip for event_loop nodes — their judge system is
# the sole acceptance mechanism (see WP-8). Empty
# strings and other flexible outputs are legitimate
# for LLM-driven nodes that already passed the judge.
if (
result.output
and node_spec.output_keys
and node_spec.node_type != "event_loop"
):
validation = self.validator.validate_all(
output=result.output,
expected_keys=node_spec.output_keys,
@@ -380,6 +457,15 @@ class GraphExecutor:
# [CORRECTED] Use node_spec.max_retries instead of hardcoded 3
max_retries = getattr(node_spec, "max_retries", 3)
# Event loop nodes handle retry internally via judge —
# executor retry is catastrophic (retry multiplication)
if node_spec.node_type == "event_loop" and max_retries > 0:
self.logger.warning(
f"EventLoopNode '{node_spec.id}' has max_retries={max_retries}. "
"Overriding to 0 — event loop nodes handle retry internally via judge."
)
max_retries = 0
if node_retry_counts[current_node_id] < max_retries:
# Retry - don't increment steps for retries
steps -= 1
@@ -395,49 +481,69 @@ class GraphExecutor:
self.logger.info(
f" ↻ Retrying ({node_retry_counts[current_node_id]}/{max_retries})..."
)
_is_retry = True
continue
else:
# Max retries exceeded - fail the execution
# Max retries exceeded - check for failure handlers
self.logger.error(
f" ✗ Max retries ({max_retries}) exceeded for node {current_node_id}"
)
self.runtime.report_problem(
severity="critical",
description=(
f"Node {current_node_id} failed after "
f"{max_retries} attempts: {result.error}"
),
)
self.runtime.end_run(
success=False,
output_data=memory.read_all(),
narrative=(
f"Failed at {node_spec.name} after "
f"{max_retries} retries: {result.error}"
),
# Check if there's an ON_FAILURE edge to follow
next_node = self._follow_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
current_node_spec=node_spec,
result=result, # result.success=False triggers ON_FAILURE
memory=memory,
)
# Calculate quality metrics
total_retries_count = sum(node_retry_counts.values())
nodes_failed = list(node_retry_counts.keys())
if next_node:
# Found a failure handler - route to it
self.logger.info(f" → Routing to failure handler: {next_node}")
current_node_id = next_node
continue # Continue execution with handler
else:
# No failure handler - terminate execution
self.runtime.report_problem(
severity="critical",
description=(
f"Node {current_node_id} failed after "
f"{max_retries} attempts: {result.error}"
),
)
self.runtime.end_run(
success=False,
output_data=memory.read_all(),
narrative=(
f"Failed at {node_spec.name} after "
f"{max_retries} retries: {result.error}"
),
)
return ExecutionResult(
success=False,
error=(
f"Node '{node_spec.name}' failed after "
f"{max_retries} attempts: {result.error}"
),
output=memory.read_all(),
steps_executed=steps,
total_tokens=total_tokens,
total_latency_ms=total_latency,
path=path,
total_retries=total_retries_count,
nodes_with_failures=nodes_failed,
retry_details=dict(node_retry_counts),
had_partial_failures=len(nodes_failed) > 0,
execution_quality="failed",
)
# Calculate quality metrics
total_retries_count = sum(node_retry_counts.values())
nodes_failed = list(node_retry_counts.keys())
return ExecutionResult(
success=False,
error=(
f"Node '{node_spec.name}' failed after "
f"{max_retries} attempts: {result.error}"
),
output=memory.read_all(),
steps_executed=steps,
total_tokens=total_tokens,
total_latency_ms=total_latency,
path=path,
total_retries=total_retries_count,
nodes_with_failures=nodes_failed,
retry_details=dict(node_retry_counts),
had_partial_failures=len(nodes_failed) > 0,
execution_quality="failed",
node_visit_counts=dict(node_visit_counts),
)
# Check if we just executed a pause node - if so, save state and return
# This must happen BEFORE determining next node, since pause nodes may have no edges
@@ -476,6 +582,7 @@ class GraphExecutor:
retry_details=dict(node_retry_counts),
had_partial_failures=len(nodes_failed) > 0,
execution_quality=exec_quality,
node_visit_counts=dict(node_visit_counts),
)
# Check if this is a terminal node - if so, we're done
@@ -596,6 +703,7 @@ class GraphExecutor:
retry_details=dict(node_retry_counts),
had_partial_failures=len(nodes_failed) > 0,
execution_quality=exec_quality,
node_visit_counts=dict(node_visit_counts),
)
except Exception as e:
@@ -622,8 +730,15 @@ class GraphExecutor:
retry_details=dict(node_retry_counts),
had_partial_failures=len(nodes_failed) > 0,
execution_quality="failed",
node_visit_counts=dict(node_visit_counts),
)
finally:
if _ctx_token is not None:
from framework.runner.tool_registry import ToolRegistry
ToolRegistry.reset_execution_context(_ctx_token)
def _build_context(
self,
node_spec: NodeSpec,
@@ -658,7 +773,15 @@ class GraphExecutor:
)
# Valid node types - no ambiguous "llm" type allowed
VALID_NODE_TYPES = {"llm_tool_use", "llm_generate", "router", "function", "human_input"}
VALID_NODE_TYPES = {
"llm_tool_use",
"llm_generate",
"router",
"function",
"human_input",
"event_loop",
}
DEPRECATED_NODE_TYPES = {"llm_tool_use": "event_loop", "llm_generate": "event_loop"}
def _get_node_implementation(
self, node_spec: NodeSpec, cleanup_llm_model: str | None = None
@@ -676,6 +799,17 @@ class GraphExecutor:
f"Use 'llm_tool_use' for nodes that call tools, 'llm_generate' for text generation."
)
# Warn on deprecated node types
if node_spec.node_type in self.DEPRECATED_NODE_TYPES:
replacement = self.DEPRECATED_NODE_TYPES[node_spec.node_type]
warnings.warn(
f"Node type '{node_spec.node_type}' is deprecated. "
f"Use '{replacement}' instead. "
f"Node: '{node_spec.id}'",
DeprecationWarning,
stacklevel=2,
)
# Create based on type
if node_spec.node_type == "llm_tool_use":
if not node_spec.tools:
@@ -713,6 +847,50 @@ class GraphExecutor:
cleanup_llm_model=cleanup_llm_model,
)
if node_spec.node_type == "event_loop":
# Auto-create EventLoopNode with sensible defaults.
# Custom configs can still be pre-registered via node_registry.
from framework.graph.event_loop_node import EventLoopNode, LoopConfig
# Create a FileConversationStore if a storage path is available
conv_store = None
if self._storage_path:
from framework.storage.conversation_store import FileConversationStore
store_path = self._storage_path / "conversations" / node_spec.id
conv_store = FileConversationStore(base_path=store_path)
# Auto-configure spillover directory for large tool results.
# When a tool result exceeds max_tool_result_chars, the full
# content is written to spillover_dir and the agent gets a
# truncated preview with instructions to use load_data().
# Uses storage_path/data which is session-scoped, matching the
# data_dir set via execution context for data tools.
spillover = None
if self._storage_path:
spillover = str(self._storage_path / "data")
lc = self._loop_config
default_max_iter = 100 if node_spec.client_facing else 50
node = EventLoopNode(
event_bus=self._event_bus,
judge=None, # implicit judge: accept when output_keys are filled
config=LoopConfig(
max_iterations=lc.get("max_iterations", default_max_iter),
max_tool_calls_per_turn=lc.get("max_tool_calls_per_turn", 10),
tool_call_overflow_margin=lc.get("tool_call_overflow_margin", 0.5),
stall_detection_threshold=lc.get("stall_detection_threshold", 3),
max_history_tokens=lc.get("max_history_tokens", 32000),
max_tool_result_chars=lc.get("max_tool_result_chars", 3_000),
spillover_dir=spillover,
),
tool_executor=self.tool_executor,
conversation_store=conv_store,
)
# Cache so inject_event() is reachable for client-facing input
self.node_registry[node_spec.id] = node
return node
# Should never reach here due to validation above
raise RuntimeError(f"Unhandled node type: {node_spec.node_type}")
@@ -740,9 +918,12 @@ class GraphExecutor:
source_node_name=current_node_spec.name if current_node_spec else current_node_id,
target_node_name=target_node_spec.name if target_node_spec else edge.target,
):
# Validate and clean output before mapping inputs
# Validate and clean output before mapping inputs.
# Use full memory state (not just result.output) because
# target input_keys may come from earlier nodes in the
# graph, not only from the immediate source node.
if self.cleansing_config.enabled and target_node_spec:
output_to_validate = result.output
output_to_validate = memory.read_all()
validation = self.output_cleaner.validate_output(
output=output_to_validate,
@@ -823,6 +1004,21 @@ class GraphExecutor:
):
traversable.append(edge)
# Priority filtering for CONDITIONAL edges:
# When multiple CONDITIONAL edges match, keep only the highest-priority
# group. This prevents mutually-exclusive conditional branches (e.g.
# forward vs. feedback) from incorrectly triggering fan-out.
# ON_SUCCESS / other edge types are unaffected.
if len(traversable) > 1:
conditionals = [e for e in traversable if e.condition == EdgeCondition.CONDITIONAL]
if len(conditionals) > 1:
max_prio = max(e.priority for e in conditionals)
traversable = [
e
for e in traversable
if e.condition != EdgeCondition.CONDITIONAL or e.priority == max_prio
]
return traversable
def _find_convergence_node(
@@ -909,13 +1105,27 @@ class GraphExecutor:
branch.status = "failed"
branch.error = f"Node {branch.node_id} not found in graph"
return branch, RuntimeError(branch.error)
effective_max_retries = node_spec.max_retries
if node_spec.node_type == "event_loop":
if effective_max_retries > 1:
self.logger.warning(
f"EventLoopNode '{node_spec.id}' has "
f"max_retries={effective_max_retries}. Overriding "
"to 1 — event loop nodes handle retry internally."
)
effective_max_retries = 1
branch.status = "running"
try:
# Validate and clean output before mapping inputs (same as _follow_edges)
# Validate and clean output before mapping inputs (same as _follow_edges).
# Use full memory state since target input_keys may come
# from earlier nodes, not just the immediate source.
if self.cleansing_config.enabled and node_spec:
mem_snapshot = memory.read_all()
validation = self.output_cleaner.validate_output(
output=source_result.output,
output=mem_snapshot,
source_node_id=source_node_spec.id if source_node_spec else "unknown",
target_node_spec=node_spec,
)
@@ -926,7 +1136,7 @@ class GraphExecutor:
f"{branch.node_id}: {validation.errors}"
)
cleaned_output = self.output_cleaner.clean_output(
output=source_result.output,
output=mem_snapshot,
source_node_id=source_node_spec.id if source_node_spec else "unknown",
target_node_spec=node_spec,
validation_errors=validation.errors,
@@ -942,19 +1152,31 @@ class GraphExecutor:
# Execute with retries
last_result = None
for attempt in range(node_spec.max_retries):
for attempt in range(effective_max_retries):
branch.retry_count = attempt
# Build context for this branch
ctx = self._build_context(node_spec, memory, goal, mapped, graph.max_tokens)
node_impl = self._get_node_implementation(node_spec, graph.cleanup_llm_model)
# Emit node-started event (skip event_loop nodes)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_started(
stream_id=self._stream_id, node_id=branch.node_id
)
self.logger.info(
f" ▶ Branch {node_spec.name}: executing (attempt {attempt + 1})"
)
result = await node_impl.execute(ctx)
last_result = result
# Emit node-completed event (skip event_loop nodes)
if self._event_bus and node_spec.node_type != "event_loop":
await self._event_bus.emit_node_loop_completed(
stream_id=self._stream_id, node_id=branch.node_id, iterations=1
)
if result.success:
# Write outputs to shared memory using async write
for key, value in result.output.items():
@@ -970,7 +1192,7 @@ class GraphExecutor:
self.logger.warning(
f" ↻ Branch {node_spec.name}: "
f"retry {attempt + 1}/{node_spec.max_retries}"
f"retry {attempt + 1}/{effective_max_retries}"
)
# All retries exhausted
@@ -979,7 +1201,7 @@ class GraphExecutor:
branch.result = last_result
self.logger.error(
f" ✗ Branch {node_spec.name}: "
f"failed after {node_spec.max_retries} attempts"
f"failed after {effective_max_retries} attempts"
)
return branch, last_result
+2 -2
View File
@@ -12,13 +12,13 @@ Goals are:
"""
from datetime import datetime
from enum import Enum
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
class GoalStatus(str, Enum):
class GoalStatus(StrEnum):
"""Lifecycle status of a goal."""
DRAFT = "draft" # Being defined
+2 -2
View File
@@ -6,11 +6,11 @@ where agents need to gather input from humans.
"""
from dataclasses import dataclass, field
from enum import Enum
from enum import StrEnum
from typing import Any
class HITLInputType(str, Enum):
class HITLInputType(StrEnum):
"""Type of input expected from human."""
FREE_TEXT = "free_text" # Open-ended text response
+56 -4
View File
@@ -16,10 +16,12 @@ Protocol:
"""
import asyncio
import inspect
import logging
from abc import ABC, abstractmethod
from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import UTC
from typing import Any
from pydantic import BaseModel, Field
@@ -153,7 +155,10 @@ class NodeSpec(BaseModel):
# Node behavior type
node_type: str = Field(
default="llm_tool_use",
description="Type: 'llm_tool_use', 'llm_generate', 'function', 'router', 'human_input'",
description=(
"Type: 'event_loop', 'function', 'router', 'human_input'. "
"Deprecated: 'llm_tool_use', 'llm_generate' (use 'event_loop' instead)."
),
)
# Data flow
@@ -205,6 +210,15 @@ class NodeSpec(BaseModel):
max_retries: int = Field(default=3)
retry_on: list[str] = Field(default_factory=list, description="Error types to retry on")
# Visit limits (for feedback/callback edges)
max_node_visits: int = Field(
default=1,
description=(
"Max times this node executes in one graph run. "
"Set >1 for feedback loops. 0 = unlimited (max_steps guards)."
),
)
# Pydantic model for output validation
output_model: type[BaseModel] | None = Field(
default=None,
@@ -218,6 +232,12 @@ class NodeSpec(BaseModel):
description="Maximum retries when Pydantic validation fails (with feedback to LLM)",
)
# Client-facing behavior
client_facing: bool = Field(
default=False,
description="If True, this node streams output to the end user and can request input.",
)
model_config = {"extra": "allow", "arbitrary_types_allowed": True}
@@ -1348,7 +1368,9 @@ Expected output keys: {output_keys}
LLM Response:
{raw_response}
Output ONLY the JSON object, nothing else."""
Output ONLY the JSON object, nothing else.
If no valid JSON object exists in the response, output exactly: {{"error": "NO_JSON_FOUND"}}
Do NOT fabricate data or return empty objects."""
try:
result = cleaner_llm.complete(
@@ -1395,6 +1417,14 @@ Output ONLY the JSON object, nothing else."""
parsed = json.loads(cleaned)
except json.JSONDecodeError:
parsed = json.loads(_fix_unescaped_newlines_in_json(cleaned))
# Validate LLM didn't return empty or fabricated data
if parsed.get("error") == "NO_JSON_FOUND":
raise ValueError("Cannot parse JSON from response")
if not parsed or parsed == {}:
raise ValueError("Cannot parse JSON from response")
if all(v is None for v in parsed.values()):
raise ValueError("Cannot parse JSON from response")
logger.info(" ✓ LLM cleaned JSON output")
return parsed
@@ -1504,6 +1534,8 @@ Output ONLY the JSON object, nothing else."""
def _build_system_prompt(self, ctx: NodeContext) -> str:
"""Build the system prompt."""
from datetime import datetime
parts = []
if ctx.node_spec.system_prompt:
@@ -1526,6 +1558,15 @@ Output ONLY the JSON object, nothing else."""
parts.append(prompt)
# Inject current datetime so LLM knows "now"
utc_dt = datetime.now(UTC)
local_dt = datetime.now().astimezone()
local_tz_name = local_dt.tzname() or "Unknown"
parts.append("\n## Runtime Context")
parts.append(f"- Current Date/Time (UTC): {utc_dt.isoformat()}")
parts.append(f"- Local Timezone: {local_tz_name}")
parts.append(f"- Current Date/Time (Local): {local_dt.isoformat()}")
if ctx.goal_context:
parts.append("\n# Goal Context")
parts.append(ctx.goal_context)
@@ -1727,8 +1768,19 @@ class FunctionNode(NodeProtocol):
start = time.time()
try:
# Call the function
result = self.func(**ctx.input_data)
# Filter input_data to only declared input_keys to prevent
# leaking extra memory keys from upstream nodes.
if ctx.node_spec.input_keys:
filtered = {
k: v for k, v in ctx.input_data.items() if k in ctx.node_spec.input_keys
}
else:
filtered = ctx.input_data
# Call the function (supports both sync and async)
result = self.func(**filtered)
if inspect.isawaitable(result):
result = await result
latency_ms = int((time.time() - start) * 1000)
+4 -1
View File
@@ -144,8 +144,11 @@ class OutputCleaner:
errors = []
warnings = []
# Check 1: Required input keys present
# Check 1: Required input keys present (skip nullable keys)
nullable = set(getattr(target_node_spec, "nullable_output_keys", None) or [])
for key in target_node_spec.input_keys:
if key in nullable:
continue
if key not in output:
errors.append(f"Missing required key: '{key}'")
continue
+6 -6
View File
@@ -11,13 +11,13 @@ The Plan is the contract between the external planner and the executor:
"""
from datetime import datetime
from enum import Enum
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
class ActionType(str, Enum):
class ActionType(StrEnum):
"""Types of actions a PlanStep can perform."""
LLM_CALL = "llm_call" # Call LLM for generation
@@ -27,7 +27,7 @@ class ActionType(str, Enum):
CODE_EXECUTION = "code_execution" # Execute dynamic code (sandboxed)
class StepStatus(str, Enum):
class StepStatus(StrEnum):
"""Status of a plan step."""
PENDING = "pending"
@@ -56,7 +56,7 @@ class StepStatus(str, Enum):
return self == StepStatus.COMPLETED
class ApprovalDecision(str, Enum):
class ApprovalDecision(StrEnum):
"""Human decision on a step requiring approval."""
APPROVE = "approve" # Execute as planned
@@ -91,7 +91,7 @@ class ApprovalResult(BaseModel):
model_config = {"extra": "allow"}
class JudgmentAction(str, Enum):
class JudgmentAction(StrEnum):
"""Actions the judge can take after evaluating a step."""
ACCEPT = "accept" # Step completed successfully, continue
@@ -423,7 +423,7 @@ class Plan(BaseModel):
}
class ExecutionStatus(str, Enum):
class ExecutionStatus(StrEnum):
"""Status of plan execution."""
COMPLETED = "completed"
-10
View File
@@ -75,16 +75,6 @@ class SafeEvalVisitor(ast.NodeVisitor):
def visit_Constant(self, node: ast.Constant) -> Any:
return node.value
# --- Number/String/Bytes/NameConstant (Python < 3.8 compat if needed) ---
def visit_Num(self, node: ast.Num) -> Any:
return node.n
def visit_Str(self, node: ast.Str) -> Any:
return node.s
def visit_NameConstant(self, node: ast.NameConstant) -> Any:
return node.value
# --- Data Structures ---
def visit_List(self, node: ast.List) -> list:
return [self.visit(elt) for elt in node.elts]
+5 -3
View File
@@ -126,14 +126,16 @@ class OutputValidator:
for key in expected_keys:
if key not in output:
errors.append(f"Missing required output key: '{key}'")
if key not in nullable_keys:
errors.append(f"Missing required output key: '{key}'")
elif not allow_empty:
value = output[key]
if value is None:
if key not in nullable_keys:
errors.append(f"Output key '{key}' is None")
elif isinstance(value, str) and len(value.strip()) == 0:
errors.append(f"Output key '{key}' is empty string")
if key not in nullable_keys:
errors.append(f"Output key '{key}' is empty string")
return ValidationResult(success=len(errors) == 0, errors=errors)
@@ -205,7 +207,7 @@ class OutputValidator:
def validate_no_hallucination(
self,
output: dict[str, Any],
max_length: int = 10000,
max_length: int = 50000,
) -> ValidationResult:
"""
Check for signs of LLM hallucination in output values.
+24 -1
View File
@@ -1,8 +1,31 @@
"""LLM provider abstraction."""
from framework.llm.provider import LLMProvider, LLMResponse
from framework.llm.stream_events import (
FinishEvent,
ReasoningDeltaEvent,
ReasoningStartEvent,
StreamErrorEvent,
StreamEvent,
TextDeltaEvent,
TextEndEvent,
ToolCallEvent,
ToolResultEvent,
)
__all__ = ["LLMProvider", "LLMResponse"]
__all__ = [
"LLMProvider",
"LLMResponse",
"StreamEvent",
"TextDeltaEvent",
"TextEndEvent",
"ToolCallEvent",
"ToolResultEvent",
"ReasoningStartEvent",
"ReasoningDeltaEvent",
"FinishEvent",
"StreamErrorEvent",
]
try:
from framework.llm.anthropic import AnthropicProvider # noqa: F401
+1 -1
View File
@@ -18,7 +18,7 @@ def _get_api_key_from_credential_store() -> str | None:
try:
from aden_tools.credentials import CredentialStoreAdapter
creds = CredentialStoreAdapter.with_env_storage()
creds = CredentialStoreAdapter.default()
if creds.is_available("anthropic"):
return creds.get("anthropic")
except ImportError:
+213 -5
View File
@@ -7,10 +7,11 @@ Groq, and local models.
See: https://docs.litellm.ai/docs/providers
"""
import asyncio
import json
import logging
import time
from collections.abc import Callable
from collections.abc import AsyncIterator, Callable
from datetime import datetime
from pathlib import Path
from typing import Any
@@ -23,6 +24,7 @@ except ImportError:
RateLimitError = Exception # type: ignore[assignment, misc]
from framework.llm.provider import LLMProvider, LLMResponse, Tool, ToolResult, ToolUse
from framework.llm.stream_events import StreamEvent
logger = logging.getLogger(__name__)
@@ -145,7 +147,7 @@ class LiteLLMProvider(LLMProvider):
if litellm is None:
raise ImportError(
"LiteLLM is not installed. Please install it with: pip install litellm"
"LiteLLM is not installed. Please install it with: uv pip install litellm"
)
def _completion_with_rate_limit_retry(self, **kwargs: Any) -> Any:
@@ -161,11 +163,24 @@ class LiteLLMProvider(LLMProvider):
content = response.choices[0].message.content if response.choices else None
has_tool_calls = bool(response.choices and response.choices[0].message.tool_calls)
if not content and not has_tool_calls:
# If the conversation ends with an assistant message,
# an empty response is expected — don't retry.
messages = kwargs.get("messages", [])
last_role = next(
(m["role"] for m in reversed(messages) if m.get("role") != "system"),
None,
)
if last_role == "assistant":
logger.debug(
"[retry] Empty response after assistant message — "
"expected, not retrying."
)
return response
finish_reason = (
response.choices[0].finish_reason if response.choices else "unknown"
)
# Dump full request to file for debugging
messages = kwargs.get("messages", [])
token_count, token_method = _estimate_tokens(model, messages)
dump_path = _dump_failed_request(
model=model,
@@ -378,11 +393,18 @@ class LiteLLMProvider(LLMProvider):
# Execute tools and add results.
for tool_call in message.tool_calls:
# Parse arguments
try:
args = json.loads(tool_call.function.arguments)
except json.JSONDecodeError:
args = {}
# Surface error to LLM and skip tool execution
current_messages.append(
{
"role": "tool",
"tool_call_id": tool_call.id,
"content": "Invalid JSON arguments provided to tool.",
}
)
continue
tool_use = ToolUse(
id=tool_call.id,
@@ -425,3 +447,189 @@ class LiteLLMProvider(LLMProvider):
},
},
}
async def stream(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 4096,
) -> AsyncIterator[StreamEvent]:
"""Stream a completion via litellm.acompletion(stream=True).
Yields StreamEvent objects as chunks arrive from the provider.
Tool call arguments are accumulated across chunks and yielded as
a single ToolCallEvent with fully parsed JSON when complete.
Empty responses (e.g. Gemini stealth rate-limits that return 200
with no content) are retried with exponential backoff, mirroring
the retry behaviour of ``_completion_with_rate_limit_retry``.
"""
from framework.llm.stream_events import (
FinishEvent,
StreamErrorEvent,
TextDeltaEvent,
TextEndEvent,
ToolCallEvent,
)
full_messages: list[dict[str, Any]] = []
if system:
full_messages.append({"role": "system", "content": system})
full_messages.extend(messages)
kwargs: dict[str, Any] = {
"model": self.model,
"messages": full_messages,
"max_tokens": max_tokens,
"stream": True,
"stream_options": {"include_usage": True},
**self.extra_kwargs,
}
if self.api_key:
kwargs["api_key"] = self.api_key
if self.api_base:
kwargs["api_base"] = self.api_base
if tools:
kwargs["tools"] = [self._tool_to_openai_format(t) for t in tools]
for attempt in range(RATE_LIMIT_MAX_RETRIES + 1):
# Post-stream events (ToolCall, TextEnd, Finish) are buffered
# because they depend on the full stream. TextDeltaEvents are
# yielded immediately so callers see tokens in real time.
tail_events: list[StreamEvent] = []
accumulated_text = ""
tool_calls_acc: dict[int, dict[str, str]] = {}
input_tokens = 0
output_tokens = 0
try:
response = await litellm.acompletion(**kwargs) # type: ignore[union-attr]
async for chunk in response:
choice = chunk.choices[0] if chunk.choices else None
if not choice:
continue
delta = choice.delta
# --- Text content — yield immediately for real-time streaming ---
if delta and delta.content:
accumulated_text += delta.content
yield TextDeltaEvent(
content=delta.content,
snapshot=accumulated_text,
)
# --- Tool calls (accumulate across chunks) ---
if delta and delta.tool_calls:
for tc in delta.tool_calls:
idx = tc.index if hasattr(tc, "index") and tc.index is not None else 0
if idx not in tool_calls_acc:
tool_calls_acc[idx] = {"id": "", "name": "", "arguments": ""}
if tc.id:
tool_calls_acc[idx]["id"] = tc.id
if tc.function:
if tc.function.name:
tool_calls_acc[idx]["name"] = tc.function.name
if tc.function.arguments:
tool_calls_acc[idx]["arguments"] += tc.function.arguments
# --- Finish ---
if choice.finish_reason:
for _idx, tc_data in sorted(tool_calls_acc.items()):
try:
parsed_args = json.loads(tc_data["arguments"])
except (json.JSONDecodeError, KeyError):
parsed_args = {"_raw": tc_data.get("arguments", "")}
tail_events.append(
ToolCallEvent(
tool_use_id=tc_data["id"],
tool_name=tc_data["name"],
tool_input=parsed_args,
)
)
if accumulated_text:
tail_events.append(TextEndEvent(full_text=accumulated_text))
usage = getattr(chunk, "usage", None)
if usage:
input_tokens = getattr(usage, "prompt_tokens", 0) or 0
output_tokens = getattr(usage, "completion_tokens", 0) or 0
tail_events.append(
FinishEvent(
stop_reason=choice.finish_reason,
input_tokens=input_tokens,
output_tokens=output_tokens,
model=self.model,
)
)
# Check whether the stream produced any real content.
# (If text deltas were yielded above, has_content is True
# and we skip the retry path — nothing was yielded in vain.)
has_content = accumulated_text or tool_calls_acc
if not has_content and attempt < RATE_LIMIT_MAX_RETRIES:
# If the conversation ends with an assistant or tool
# message, an empty stream is expected — the LLM has
# nothing new to say. Don't burn retries on this;
# let the caller (EventLoopNode) decide what to do.
# Typical case: client_facing node where the LLM set
# all outputs via set_output tool calls, and the tool
# results are the last messages.
last_role = next(
(m["role"] for m in reversed(full_messages) if m.get("role") != "system"),
None,
)
if last_role in ("assistant", "tool"):
logger.debug(
"[stream] Empty response after %s message — expected, not retrying.",
last_role,
)
for event in tail_events:
yield event
return
wait = RATE_LIMIT_BACKOFF_BASE * (2**attempt)
token_count, token_method = _estimate_tokens(
self.model,
full_messages,
)
dump_path = _dump_failed_request(
model=self.model,
kwargs=kwargs,
error_type="empty_stream",
attempt=attempt,
)
logger.warning(
f"[stream-retry] {self.model} returned empty stream — "
f"~{token_count} tokens ({token_method}). "
f"Request dumped to: {dump_path}. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
)
await asyncio.sleep(wait)
continue
# Success (or final attempt) — flush remaining events.
for event in tail_events:
yield event
return
except RateLimitError as e:
if attempt < RATE_LIMIT_MAX_RETRIES:
wait = RATE_LIMIT_BACKOFF_BASE * (2**attempt)
logger.warning(
f"[stream-retry] {self.model} rate limited (429): {e!s}. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
)
await asyncio.sleep(wait)
continue
yield StreamErrorEvent(error=str(e), recoverable=False)
return
except Exception as e:
yield StreamErrorEvent(error=str(e), recoverable=False)
return
+32 -1
View File
@@ -2,10 +2,16 @@
import json
import re
from collections.abc import Callable
from collections.abc import AsyncIterator, Callable
from typing import Any
from framework.llm.provider import LLMProvider, LLMResponse, Tool, ToolResult, ToolUse
from framework.llm.stream_events import (
FinishEvent,
StreamEvent,
TextDeltaEvent,
TextEndEvent,
)
class MockLLMProvider(LLMProvider):
@@ -175,3 +181,28 @@ class MockLLMProvider(LLMProvider):
output_tokens=0,
stop_reason="mock_complete",
)
async def stream(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 4096,
) -> AsyncIterator[StreamEvent]:
"""Stream a mock completion as word-level TextDeltaEvents.
Splits the mock response into words and yields each as a separate
TextDeltaEvent with an accumulating snapshot, exercising the full
streaming pipeline without any API calls.
"""
content = self._generate_mock_response(system=system, json_mode=False)
words = content.split(" ")
accumulated = ""
for i, word in enumerate(words):
chunk = word if i == 0 else " " + word
accumulated += chunk
yield TextDeltaEvent(content=chunk, snapshot=accumulated)
yield TextEndEvent(full_text=accumulated)
yield FinishEvent(stop_reason="mock_complete", model=self.model)
+43 -1
View File
@@ -1,7 +1,7 @@
"""LLM Provider abstraction for pluggable LLM backends."""
from abc import ABC, abstractmethod
from collections.abc import Callable
from collections.abc import AsyncIterator, Callable
from dataclasses import dataclass, field
from typing import Any
@@ -108,3 +108,45 @@ class LLMProvider(ABC):
Final LLMResponse after tool use completes
"""
pass
async def stream(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 4096,
) -> AsyncIterator["StreamEvent"]:
"""
Stream a completion as an async iterator of StreamEvents.
Default implementation wraps complete() with synthetic events.
Subclasses SHOULD override for true streaming.
Tool orchestration is the CALLER's responsibility:
- Caller detects ToolCallEvent, executes tool, adds result
to messages, calls stream() again.
"""
from framework.llm.stream_events import (
FinishEvent,
TextDeltaEvent,
TextEndEvent,
)
response = self.complete(
messages=messages,
system=system,
tools=tools,
max_tokens=max_tokens,
)
yield TextDeltaEvent(content=response.content, snapshot=response.content)
yield TextEndEvent(full_text=response.content)
yield FinishEvent(
stop_reason=response.stop_reason,
input_tokens=response.input_tokens,
output_tokens=response.output_tokens,
model=response.model,
)
# Deferred import target for type annotation
from framework.llm.stream_events import StreamEvent as StreamEvent # noqa: E402, F401
+96
View File
@@ -0,0 +1,96 @@
"""Stream event types for LLM streaming responses.
Defines a discriminated union of frozen dataclasses representing every event
a streaming LLM call can produce. These types form the contract between the
LLM provider layer, EventLoopNode, event bus, persistence, and monitoring.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any, Literal
@dataclass(frozen=True)
class TextDeltaEvent:
"""A chunk of text produced by the LLM."""
type: Literal["text_delta"] = "text_delta"
content: str = "" # this chunk's text
snapshot: str = "" # accumulated text so far
@dataclass(frozen=True)
class TextEndEvent:
"""Signals that text generation is complete."""
type: Literal["text_end"] = "text_end"
full_text: str = ""
@dataclass(frozen=True)
class ToolCallEvent:
"""The LLM has requested a tool call."""
type: Literal["tool_call"] = "tool_call"
tool_use_id: str = ""
tool_name: str = ""
tool_input: dict[str, Any] = field(default_factory=dict)
@dataclass(frozen=True)
class ToolResultEvent:
"""Result of executing a tool call."""
type: Literal["tool_result"] = "tool_result"
tool_use_id: str = ""
content: str = ""
is_error: bool = False
@dataclass(frozen=True)
class ReasoningStartEvent:
"""The LLM has started a reasoning/thinking block."""
type: Literal["reasoning_start"] = "reasoning_start"
@dataclass(frozen=True)
class ReasoningDeltaEvent:
"""A chunk of reasoning/thinking content."""
type: Literal["reasoning_delta"] = "reasoning_delta"
content: str = ""
@dataclass(frozen=True)
class FinishEvent:
"""The LLM has finished generating."""
type: Literal["finish"] = "finish"
stop_reason: str = ""
input_tokens: int = 0
output_tokens: int = 0
model: str = ""
@dataclass(frozen=True)
class StreamErrorEvent:
"""An error occurred during streaming."""
type: Literal["error"] = "error"
error: str = ""
recoverable: bool = False
# Discriminated union of all stream event types
StreamEvent = (
TextDeltaEvent
| TextEndEvent
| ToolCallEvent
| ToolResultEvent
| ReasoningStartEvent
| ReasoningDeltaEvent
| FinishEvent
| StreamErrorEvent
)
+454 -57
View File
@@ -4,25 +4,41 @@ MCP Server for Agent Building Tools
Exposes tools for building goal-driven agents via the Model Context Protocol.
Usage:
python -m framework.mcp.agent_builder_server
uv run python -m framework.mcp.agent_builder_server
"""
import json
import os
import sys
from datetime import datetime
from pathlib import Path
from typing import Annotated
from mcp.server import FastMCP
# Ensure exports/ is on sys.path so AgentRunner can import agent modules.
_framework_dir = Path(__file__).resolve().parent.parent # core/framework/ -> core/
_project_root = _framework_dir.parent # core/ -> project root
_exports_dir = _project_root / "exports"
if _exports_dir.is_dir() and str(_exports_dir) not in sys.path:
sys.path.insert(0, str(_exports_dir))
del _framework_dir, _project_root, _exports_dir
from framework.graph import Constraint, EdgeCondition, EdgeSpec, Goal, NodeSpec, SuccessCriterion
from framework.graph.plan import Plan
from mcp.server import FastMCP # noqa: E402
from framework.graph import ( # noqa: E402
Constraint,
EdgeCondition,
EdgeSpec,
Goal,
NodeSpec,
SuccessCriterion,
)
from framework.graph.plan import Plan # noqa: E402
# Testing framework imports
from framework.testing.prompts import (
from framework.testing.prompts import ( # noqa: E402
PYTEST_TEST_FILE_HEADER,
)
from framework.utils.io import atomic_write
from framework.utils.io import atomic_write # noqa: E402
# Initialize MCP server
mcp = FastMCP("agent-builder")
@@ -44,6 +60,7 @@ class BuildSession:
self.nodes: list[NodeSpec] = []
self.edges: list[EdgeSpec] = []
self.mcp_servers: list[dict] = [] # MCP server configurations
self.loop_config: dict = {} # LoopConfig parameters for EventLoopNodes
self.created_at = datetime.now().isoformat()
self.last_modified = datetime.now().isoformat()
@@ -56,6 +73,7 @@ class BuildSession:
"nodes": [n.model_dump() for n in self.nodes],
"edges": [e.model_dump() for e in self.edges],
"mcp_servers": self.mcp_servers,
"loop_config": self.loop_config,
"created_at": self.created_at,
"last_modified": self.last_modified,
}
@@ -102,6 +120,9 @@ class BuildSession:
# Restore MCP servers
session.mcp_servers = data.get("mcp_servers", [])
# Restore loop config
session.loop_config = data.get("loop_config", {})
return session
@@ -516,19 +537,63 @@ def _validate_tool_credentials(tools_list: list[str]) -> dict | None:
return None
def _validate_agent_path(agent_path: str) -> tuple[Path | None, str | None]:
"""
Validate and normalize agent_path.
Returns:
(Path, None) if valid
(None, error_json) if invalid
"""
if not agent_path:
return None, json.dumps(
{
"success": False,
"error": "agent_path is required (e.g., 'exports/my_agent')",
}
)
path = Path(agent_path)
if not path.exists():
return None, json.dumps(
{
"success": False,
"error": f"Agent path not found: {path}",
"hint": "Run export_graph to create an agent in exports/ first",
}
)
return path, None
@mcp.tool()
def add_node(
node_id: Annotated[str, "Unique identifier for the node"],
name: Annotated[str, "Human-readable name"],
description: Annotated[str, "What this node does"],
node_type: Annotated[str, "Type: llm_generate, llm_tool_use, router, or function"],
node_type: Annotated[
str,
"Type: event_loop (recommended), function, router. "
"Deprecated: llm_generate, llm_tool_use (use event_loop instead)",
],
input_keys: Annotated[str, "JSON array of keys this node reads from shared memory"],
output_keys: Annotated[str, "JSON array of keys this node writes to shared memory"],
system_prompt: Annotated[str, "Instructions for LLM nodes"] = "",
tools: Annotated[str, "JSON array of tool names for llm_tool_use nodes"] = "[]",
tools: Annotated[str, "JSON array of tool names for event_loop or llm_tool_use nodes"] = "[]",
routes: Annotated[
str, "JSON object mapping conditions to target node IDs for router nodes"
] = "{}",
client_facing: Annotated[
bool, "If True, node streams output to user and blocks for input between turns"
] = False,
nullable_output_keys: Annotated[
str, "JSON array of output keys that may remain unset (for mutually exclusive outputs)"
] = "[]",
max_node_visits: Annotated[
int,
"Max times this node executes per graph run. Set >1 for feedback loop targets. 0=unlimited",
] = 1,
) -> str:
"""Add a node to the agent graph. Nodes process inputs and produce outputs."""
session = get_session()
@@ -539,6 +604,7 @@ def add_node(
output_keys_list = json.loads(output_keys)
tools_list = json.loads(tools)
routes_dict = json.loads(routes)
nullable_output_keys_list = json.loads(nullable_output_keys)
except json.JSONDecodeError as e:
return json.dumps(
{
@@ -567,6 +633,9 @@ def add_node(
system_prompt=system_prompt or None,
tools=tools_list,
routes=routes_dict,
client_facing=client_facing,
nullable_output_keys=nullable_output_keys_list,
max_node_visits=max_node_visits,
)
session.nodes.append(node)
@@ -586,6 +655,26 @@ def add_node(
if node_type in ("llm_generate", "llm_tool_use") and not system_prompt:
warnings.append(f"LLM node '{node_id}' should have a system_prompt")
# EventLoopNode validation
if node_type == "event_loop" and not system_prompt:
warnings.append(f"Event loop node '{node_id}' should have a system_prompt")
# Deprecated type warnings
if node_type in ("llm_generate", "llm_tool_use"):
warnings.append(
f"Node type '{node_type}' is deprecated. Use 'event_loop' instead. "
"EventLoopNode supports tool use, streaming, and judge-based evaluation."
)
# nullable_output_keys must be a subset of output_keys
if nullable_output_keys_list:
invalid_nullable = [k for k in nullable_output_keys_list if k not in output_keys_list]
if invalid_nullable:
errors.append(
f"nullable_output_keys {invalid_nullable} must be a subset of "
f"output_keys {output_keys_list}"
)
_save_session(session) # Auto-save
return json.dumps(
@@ -662,6 +751,7 @@ def add_edge(
# Validate
errors = []
warnings = []
if not any(n.id == source for n in session.nodes):
errors.append(f"Source node '{source}' not found")
@@ -670,12 +760,24 @@ def add_edge(
if edge_condition == EdgeCondition.CONDITIONAL and not condition_expr:
errors.append(f"Conditional edge '{edge_id}' needs condition_expr")
# Feedback edge validation
if priority < 0:
target_node = next((n for n in session.nodes if n.id == target), None)
if target_node and target_node.max_node_visits <= 1:
warnings.append(
f"Edge '{edge_id}' has negative priority (feedback edge) "
f"targeting '{target}', but node '{target}' has "
f"max_node_visits={target_node.max_node_visits}. "
"Consider increasing max_node_visits on the target node."
)
_save_session(session) # Auto-save
return json.dumps(
{
"valid": len(errors) == 0,
"errors": errors,
"warnings": warnings,
"edge": edge.model_dump(),
"total_edges": len(session.edges),
"approval_required": True,
@@ -709,12 +811,23 @@ def update_node(
node_id: Annotated[str, "ID of the node to update"],
name: Annotated[str, "Updated human-readable name"] = "",
description: Annotated[str, "Updated description"] = "",
node_type: Annotated[str, "Updated type: llm_generate, llm_tool_use, router, or function"] = "",
node_type: Annotated[
str,
"Updated type: event_loop (recommended), function, router. "
"Deprecated: llm_generate, llm_tool_use",
] = "",
input_keys: Annotated[str, "Updated JSON array of input keys"] = "",
output_keys: Annotated[str, "Updated JSON array of output keys"] = "",
system_prompt: Annotated[str, "Updated instructions for LLM nodes"] = "",
tools: Annotated[str, "Updated JSON array of tool names"] = "",
routes: Annotated[str, "Updated JSON object mapping conditions to target node IDs"] = "",
client_facing: Annotated[
str, "Updated client-facing flag ('true'/'false', empty=no change)"
] = "",
nullable_output_keys: Annotated[
str, "Updated JSON array of nullable output keys (empty=no change)"
] = "",
max_node_visits: Annotated[int, "Updated max node visits per graph run. 0=no change"] = 0,
) -> str:
"""Update an existing node in the agent graph. Only provided fields will be updated."""
session = get_session()
@@ -735,6 +848,9 @@ def update_node(
output_keys_list = json.loads(output_keys) if output_keys else None
tools_list = json.loads(tools) if tools else None
routes_dict = json.loads(routes) if routes else None
nullable_output_keys_list = (
json.loads(nullable_output_keys) if nullable_output_keys else None
)
except json.JSONDecodeError as e:
return json.dumps(
{
@@ -767,6 +883,12 @@ def update_node(
node.tools = tools_list
if routes_dict is not None:
node.routes = routes_dict
if client_facing:
node.client_facing = client_facing.lower() == "true"
if nullable_output_keys_list is not None:
node.nullable_output_keys = nullable_output_keys_list
if max_node_visits > 0:
node.max_node_visits = max_node_visits
# Validate
errors = []
@@ -779,6 +901,26 @@ def update_node(
if node.node_type in ("llm_generate", "llm_tool_use") and not node.system_prompt:
warnings.append(f"LLM node '{node_id}' should have a system_prompt")
# EventLoopNode validation
if node.node_type == "event_loop" and not node.system_prompt:
warnings.append(f"Event loop node '{node_id}' should have a system_prompt")
# Deprecated type warnings
if node.node_type in ("llm_generate", "llm_tool_use"):
warnings.append(
f"Node type '{node.node_type}' is deprecated. Use 'event_loop' instead. "
"EventLoopNode supports tool use, streaming, and judge-based evaluation."
)
# nullable_output_keys must be a subset of output_keys
if node.nullable_output_keys:
invalid_nullable = [k for k in node.nullable_output_keys if k not in node.output_keys]
if invalid_nullable:
errors.append(
f"nullable_output_keys {invalid_nullable} must be a subset of "
f"output_keys {node.output_keys}"
)
_save_session(session) # Auto-save
return json.dumps(
@@ -979,17 +1121,30 @@ def validate_graph() -> str:
errors.append(f"Unreachable nodes: {unreachable}")
# === CONTEXT FLOW VALIDATION ===
# Build dependency map (node_id -> list of nodes it depends on)
# Build dependency maps — separate forward edges from feedback edges.
# Feedback edges (priority < 0) create cycles; they must not block the
# topological sort. Context they carry arrives on *revisits*, not on
# the first execution of a node.
feedback_edge_ids = {e.id for e in session.edges if e.priority < 0}
forward_dependencies: dict[str, list[str]] = {node.id: [] for node in session.nodes}
feedback_sources: dict[str, list[str]] = {node.id: [] for node in session.nodes}
# Combined map kept for error-message generation (all deps)
dependencies: dict[str, list[str]] = {node.id: [] for node in session.nodes}
for edge in session.edges:
if edge.target in dependencies:
dependencies[edge.target].append(edge.source)
if edge.target not in forward_dependencies:
continue
dependencies[edge.target].append(edge.source)
if edge.id in feedback_edge_ids:
feedback_sources[edge.target].append(edge.source)
else:
forward_dependencies[edge.target].append(edge.source)
# Build output map (node_id -> keys it produces)
node_outputs: dict[str, set[str]] = {node.id: set(node.output_keys) for node in session.nodes}
# Compute available context for each node (what keys it can read)
# Using topological order
# Using topological order on the forward-edge DAG
available_context: dict[str, set[str]] = {}
computed = set()
nodes_by_id = {n.id: n for n in session.nodes}
@@ -999,7 +1154,8 @@ def validate_graph() -> str:
# Entry nodes can only read from initial context
initial_context_keys: set[str] = set()
# Compute in topological order
# Compute in topological order (forward edges only — feedback edges
# don't block, since their context arrives on revisits)
remaining = {n.id for n in session.nodes}
max_iterations = len(session.nodes) * 2
@@ -1008,18 +1164,23 @@ def validate_graph() -> str:
break
for node_id in list(remaining):
deps = dependencies.get(node_id, [])
fwd_deps = forward_dependencies.get(node_id, [])
# Can compute if all dependencies are computed (or no dependencies)
if all(d in computed for d in deps):
# Collect outputs from all dependencies
# Can compute if all FORWARD dependencies are computed
if all(d in computed for d in fwd_deps):
# Collect outputs from all forward dependencies
available = set(initial_context_keys)
for dep_id in deps:
# Add outputs from dependency
for dep_id in fwd_deps:
available.update(node_outputs.get(dep_id, set()))
# Also add what was available to the dependency (transitive)
available.update(available_context.get(dep_id, set()))
# Also include context from already-computed feedback
# sources (bonus, not blocking)
for fb_src in feedback_sources.get(node_id, []):
if fb_src in computed:
available.update(node_outputs.get(fb_src, set()))
available.update(available_context.get(fb_src, set()))
available_context[node_id] = available
computed.add(node_id)
remaining.remove(node_id)
@@ -1029,15 +1190,37 @@ def validate_graph() -> str:
context_errors = []
context_warnings = []
missing_inputs: dict[str, list[str]] = {}
feedback_only_inputs: dict[str, list[str]] = {}
for node in session.nodes:
available = available_context.get(node.id, set())
for input_key in node.input_keys:
if input_key not in available:
if node.id not in missing_inputs:
missing_inputs[node.id] = []
missing_inputs[node.id].append(input_key)
# Check if this input is provided by a feedback source
fb_provides = set()
for fb_src in feedback_sources.get(node.id, []):
fb_provides.update(node_outputs.get(fb_src, set()))
fb_provides.update(available_context.get(fb_src, set()))
if input_key in fb_provides:
# Input arrives via feedback edge — warn, don't error
if node.id not in feedback_only_inputs:
feedback_only_inputs[node.id] = []
feedback_only_inputs[node.id].append(input_key)
else:
if node.id not in missing_inputs:
missing_inputs[node.id] = []
missing_inputs[node.id].append(input_key)
# Warn about feedback-only inputs (available on revisits, not first run)
for node_id, fb_keys in feedback_only_inputs.items():
fb_srcs = feedback_sources.get(node_id, [])
context_warnings.append(
f"Node '{node_id}' input(s) {fb_keys} are only provided via "
f"feedback edge(s) from {fb_srcs}. These will be available on "
f"revisits but not on the first execution."
)
# Generate helpful error messages
for node_id, missing in missing_inputs.items():
@@ -1117,6 +1300,87 @@ def validate_graph() -> str:
errors.extend(context_errors)
warnings.extend(context_warnings)
# === EventLoopNode-specific validation ===
from collections import defaultdict
# Detect fan-out: multiple ON_SUCCESS edges from same source
outgoing_success: dict[str, list[str]] = defaultdict(list)
for edge in session.edges:
cond = edge.condition.value if hasattr(edge.condition, "value") else edge.condition
if cond == "on_success":
outgoing_success[edge.source].append(edge.target)
for source_id, targets in outgoing_success.items():
if len(targets) > 1:
# Client-facing fan-out: cannot target multiple client_facing nodes
cf_targets = [
t for t in targets if any(n.id == t and n.client_facing for n in session.nodes)
]
if len(cf_targets) > 1:
errors.append(
f"Fan-out from '{source_id}' targets multiple client_facing "
f"nodes: {cf_targets}. Only one branch may be client-facing."
)
# Output key overlap on parallel event_loop nodes
el_targets = [
t
for t in targets
if any(n.id == t and n.node_type == "event_loop" for n in session.nodes)
]
if len(el_targets) > 1:
seen_keys: dict[str, str] = {}
for nid in el_targets:
node_obj = next((n for n in session.nodes if n.id == nid), None)
if node_obj:
for key in node_obj.output_keys:
if key in seen_keys:
errors.append(
f"Fan-out from '{source_id}': event_loop "
f"nodes '{seen_keys[key]}' and '{nid}' both "
f"write to output_key '{key}'. Parallel "
"nodes must have disjoint output_keys."
)
else:
seen_keys[key] = nid
# Feedback loop validation: targets should allow re-visits
for edge in session.edges:
if edge.priority < 0:
target_node = next((n for n in session.nodes if n.id == edge.target), None)
if target_node and target_node.max_node_visits <= 1:
warnings.append(
f"Feedback edge '{edge.id}' targets '{edge.target}' "
f"which has max_node_visits={target_node.max_node_visits}. "
"Consider setting max_node_visits > 1."
)
# nullable_output_keys must be subset of output_keys
for node in session.nodes:
if node.nullable_output_keys:
invalid = [k for k in node.nullable_output_keys if k not in node.output_keys]
if invalid:
errors.append(
f"Node '{node.id}': nullable_output_keys {invalid} "
f"must be a subset of output_keys {node.output_keys}"
)
# Deprecated node type warnings
deprecated_nodes = [
{"node_id": n.id, "type": n.node_type, "replacement": "event_loop"}
for n in session.nodes
if n.node_type in ("llm_generate", "llm_tool_use")
]
for dn in deprecated_nodes:
warnings.append(
f"Node '{dn['node_id']}' uses deprecated type '{dn['type']}'. Use 'event_loop' instead."
)
# Collect summary info
event_loop_nodes = [n.id for n in session.nodes if n.node_type == "event_loop"]
client_facing_nodes = [n.id for n in session.nodes if n.client_facing]
feedback_edges = [e.id for e in session.edges if e.priority < 0]
return json.dumps(
{
"valid": len(errors) == 0,
@@ -1133,6 +1397,10 @@ def validate_graph() -> str:
"context_flow": {node_id: list(keys) for node_id, keys in available_context.items()}
if available_context
else None,
"event_loop_nodes": event_loop_nodes,
"client_facing_nodes": client_facing_nodes,
"feedback_edges": feedback_edges,
"deprecated_node_types": deprecated_nodes,
}
)
@@ -1183,6 +1451,12 @@ def _generate_readme(session: BuildSession, export_data: dict, all_tools: set) -
if node.routes:
routes_str = ", ".join([f"{k}{v}" for k, v in node.routes.items()])
node_info.append(f" - Routes: {routes_str}")
if node.client_facing:
node_info.append(" - Client-facing: Yes (blocks for user input)")
if node.nullable_output_keys:
node_info.append(f" - Nullable outputs: `{', '.join(node.nullable_output_keys)}`")
if node.max_node_visits > 1:
node_info.append(f" - Max visits: {node.max_node_visits}")
nodes_section.append("\n".join(node_info))
# Build success criteria section
@@ -1236,7 +1510,12 @@ def _generate_readme(session: BuildSession, export_data: dict, all_tools: set) -
for edge in edges:
cond = edge.condition.value if hasattr(edge.condition, "value") else edge.condition
readme += f"- `{edge.source}` → `{edge.target}` (condition: {cond})\n"
priority_note = f", priority={edge.priority}" if edge.priority != 0 else ""
feedback_note = " **[FEEDBACK]**" if edge.priority < 0 else ""
readme += (
f"- `{edge.source}` → `{edge.target}` "
f"(condition: {cond}{priority_note}){feedback_note}\n"
)
readme += f"""
@@ -1451,6 +1730,10 @@ def export_graph() -> str:
"created_at": datetime.now().isoformat(),
}
# Include loop config if configured
if session.loop_config:
graph_spec["loop_config"] = session.loop_config
# Collect all tools referenced by nodes
all_tools = set()
for node in session.nodes:
@@ -1566,6 +1849,58 @@ def get_session_status() -> str:
"nodes": [n.id for n in session.nodes],
"edges": [(e.source, e.target) for e in session.edges],
"mcp_servers": [s["name"] for s in session.mcp_servers],
"event_loop_nodes": [n.id for n in session.nodes if n.node_type == "event_loop"],
"client_facing_nodes": [n.id for n in session.nodes if n.client_facing],
"deprecated_nodes": [
n.id for n in session.nodes if n.node_type in ("llm_generate", "llm_tool_use")
],
"feedback_edges": [e.id for e in session.edges if e.priority < 0],
}
)
@mcp.tool()
def configure_loop(
max_iterations: Annotated[int, "Maximum loop iterations per node execution (default 50)"] = 50,
max_tool_calls_per_turn: Annotated[int, "Maximum tool calls per LLM turn (default 10)"] = 10,
stall_detection_threshold: Annotated[
int, "Consecutive identical responses before stall detection triggers (default 3)"
] = 3,
max_history_tokens: Annotated[
int, "Maximum conversation history tokens before compaction (default 32000)"
] = 32000,
tool_call_overflow_margin: Annotated[
float,
"Overflow margin for max_tool_calls_per_turn. "
"Tool calls are only discarded when count exceeds "
"max_tool_calls_per_turn * (1 + margin). Default 0.5 (50% wiggle room)",
] = 0.5,
) -> str:
"""Configure event loop parameters for EventLoopNode execution.
These settings control how EventLoopNodes behave at runtime:
- max_iterations: prevents infinite loops
- max_tool_calls_per_turn: limits tool calls per LLM response
- tool_call_overflow_margin: wiggle room before tool calls are discarded (default 50%)
- stall_detection_threshold: detects when LLM repeats itself
- max_history_tokens: triggers conversation compaction
"""
session = get_session()
session.loop_config = {
"max_iterations": max_iterations,
"max_tool_calls_per_turn": max_tool_calls_per_turn,
"tool_call_overflow_margin": tool_call_overflow_margin,
"stall_detection_threshold": stall_detection_threshold,
"max_history_tokens": max_history_tokens,
}
_save_session(session)
return json.dumps(
{
"success": True,
"loop_config": session.loop_config,
}
)
@@ -1861,10 +2196,41 @@ def test_node(
result["routing_options"] = node_spec.routes
result["simulation"] = "Router would evaluate routes based on input and select target node"
elif node_spec.node_type in ("llm_generate", "llm_tool_use"):
# Show what prompt would be sent
elif node_spec.node_type == "event_loop":
# EventLoopNode simulation
result["system_prompt"] = node_spec.system_prompt
result["available_tools"] = node_spec.tools
result["client_facing"] = node_spec.client_facing
result["nullable_output_keys"] = node_spec.nullable_output_keys
result["max_node_visits"] = node_spec.max_node_visits
if mock_llm_response:
result["mock_response"] = mock_llm_response
result["simulation"] = (
"EventLoopNode would run a multi-turn streaming loop. "
"Each iteration: LLM call -> tool execution -> judge evaluation. "
"Loop continues until judge ACCEPTs or max_iterations reached."
)
else:
cf_note = (
"Node is client-facing: will block for user input between turns. "
if node_spec.client_facing
else ""
)
result["simulation"] = (
"EventLoopNode would stream LLM responses, execute tool calls, "
"and use judge evaluation to decide when to stop. "
+ cf_note
+ f"Max visits per graph run: {node_spec.max_node_visits}."
)
elif node_spec.node_type in ("llm_generate", "llm_tool_use"):
# Legacy LLM node types
result["system_prompt"] = node_spec.system_prompt
result["available_tools"] = node_spec.tools
result["deprecation_warning"] = (
f"Node type '{node_spec.node_type}' is deprecated. Use 'event_loop' instead."
)
if mock_llm_response:
result["mock_response"] = mock_llm_response
@@ -1879,6 +2245,7 @@ def test_node(
result["expected_memory_state"] = {
"inputs_available": {k: input_data.get(k, "<not provided>") for k in node_spec.input_keys},
"outputs_to_write": node_spec.output_keys,
"nullable_outputs": node_spec.nullable_output_keys or [],
}
return json.dumps(
@@ -1967,13 +2334,19 @@ def test_graph(
"writes": current_node.output_keys,
}
if current_node.node_type in ("llm_generate", "llm_tool_use"):
if current_node.node_type in ("llm_generate", "llm_tool_use", "event_loop"):
step_info["prompt_preview"] = (
current_node.system_prompt[:200] + "..."
if current_node.system_prompt and len(current_node.system_prompt) > 200
else current_node.system_prompt
)
step_info["tools_available"] = current_node.tools
if current_node.node_type == "event_loop":
step_info["event_loop_config"] = {
"client_facing": current_node.client_facing,
"max_node_visits": current_node.max_node_visits,
"nullable_output_keys": current_node.nullable_output_keys,
}
execution_trace.append(step_info)
@@ -1982,16 +2355,32 @@ def test_graph(
step_info["is_terminal"] = True
break
# Find next node via edges
# Find next node via edges (sorted by priority, highest first)
outgoing = sorted(
[e for e in session.edges if e.source == current_node_id],
key=lambda e: -e.priority,
)
next_node = None
for edge in session.edges:
if edge.source == current_node_id:
# In dry run, assume success path
if edge.condition.value in ("always", "on_success"):
next_node = edge.target
step_info["next_node"] = next_node
step_info["edge_condition"] = edge.condition.value
break
for edge in outgoing:
# In dry run, follow success/always edges (highest priority first)
if edge.condition.value in ("always", "on_success"):
next_node = edge.target
step_info["next_node"] = next_node
step_info["edge_condition"] = edge.condition.value
step_info["edge_priority"] = edge.priority
break
# Note any feedback edges from this node
feedback = [e for e in outgoing if e.priority < 0]
if feedback:
step_info["feedback_edges"] = [
{
"target": e.target,
"condition_expr": e.condition_expr,
"priority": e.priority,
}
for e in feedback
]
if next_node is None:
step_info["note"] = "No outgoing edge found (end of path)"
@@ -2597,10 +2986,11 @@ def generate_constraint_tests(
if not agent_path and _session:
agent_path = f"exports/{_session.name}"
if not agent_path:
return json.dumps({"error": "agent_path required (e.g., 'exports/my_agent')"})
path, err = _validate_agent_path(agent_path)
if err:
return err
agent_module = _get_agent_module_from_path(agent_path)
agent_module = _get_agent_module_from_path(path)
# Format constraints for display
constraints_formatted = (
@@ -2619,9 +3009,9 @@ def generate_constraint_tests(
return json.dumps(
{
"goal_id": goal_id,
"agent_path": agent_path,
"agent_path": str(path),
"agent_module": agent_module,
"output_file": f"{agent_path}/tests/test_constraints.py",
"output_file": f"{str(path)}/tests/test_constraints.py",
"constraints": [c.model_dump() for c in goal.constraints] if goal.constraints else [],
"constraints_formatted": constraints_formatted,
"test_guidelines": {
@@ -2677,10 +3067,11 @@ def generate_success_tests(
if not agent_path and _session:
agent_path = f"exports/{_session.name}"
if not agent_path:
return json.dumps({"error": "agent_path required (e.g., 'exports/my_agent')"})
path, err = _validate_agent_path(agent_path)
if err:
return err
agent_module = _get_agent_module_from_path(agent_path)
agent_module = _get_agent_module_from_path(path)
# Parse node/tool names for context
nodes = [n.strip() for n in node_names.split(",") if n.strip()]
@@ -2705,9 +3096,9 @@ def generate_success_tests(
return json.dumps(
{
"goal_id": goal_id,
"agent_path": agent_path,
"agent_path": str(path),
"agent_module": agent_module,
"output_file": f"{agent_path}/tests/test_success_criteria.py",
"output_file": f"{str(path)}/tests/test_success_criteria.py",
"success_criteria": [c.model_dump() for c in goal.success_criteria]
if goal.success_criteria
else [],
@@ -2766,7 +3157,11 @@ def run_tests(
import re
import subprocess
tests_dir = Path(agent_path) / "tests"
path, err = _validate_agent_path(agent_path)
if err:
return err
tests_dir = path / "tests"
if not tests_dir.exists():
return json.dumps(
@@ -2957,10 +3352,11 @@ def debug_test(
if not agent_path and _session:
agent_path = f"exports/{_session.name}"
if not agent_path:
return json.dumps({"error": "agent_path required (e.g., 'exports/my_agent')"})
path, err = _validate_agent_path(agent_path)
if err:
return err
tests_dir = Path(agent_path) / "tests"
tests_dir = path / "tests"
if not tests_dir.exists():
return json.dumps(
@@ -3101,10 +3497,11 @@ def list_tests(
if not agent_path and _session:
agent_path = f"exports/{_session.name}"
if not agent_path:
return json.dumps({"error": "agent_path required (e.g., 'exports/my_agent')"})
path, err = _validate_agent_path(agent_path)
if err:
return err
tests_dir = Path(agent_path) / "tests"
tests_dir = path / "tests"
if not tests_dir.exists():
return json.dumps(
@@ -3379,7 +3776,7 @@ def store_credential(
display_name: Annotated[str, "Human-readable name (e.g., 'HubSpot Access Token')"] = "",
) -> str:
"""
Store a credential securely in the encrypted credential store at ~/.hive/credentials.
Store a credential securely in the local encrypted store at ~/.hive/credentials.
Uses Fernet encryption (AES-128-CBC + HMAC). Requires HIVE_CREDENTIAL_KEY env var.
"""
@@ -3421,7 +3818,7 @@ def store_credential(
@mcp.tool()
def list_stored_credentials() -> str:
"""
List all credentials currently stored in the encrypted credential store.
List all credentials currently stored in the local encrypted store.
Returns credential IDs and metadata (never returns secret values).
"""
@@ -3461,7 +3858,7 @@ def delete_stored_credential(
credential_name: Annotated[str, "Logical credential name to delete (e.g., 'hubspot')"],
) -> str:
"""
Delete a credential from the encrypted credential store.
Delete a credential from the local encrypted store.
"""
try:
store = _get_credential_store()
+85 -28
View File
@@ -56,6 +56,18 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
action="store_true",
help="Show detailed execution logs (steps, LLM calls, etc.)",
)
run_parser.add_argument(
"--tui",
action="store_true",
help="Launch interactive terminal dashboard",
)
run_parser.add_argument(
"--model",
"-m",
type=str,
default=None,
help="LLM model to use (any LiteLLM-compatible name)",
)
run_parser.set_defaults(func=cmd_run)
# info command
@@ -205,38 +217,83 @@ def cmd_run(args: argparse.Namespace) -> int:
print(f"Error reading input file: {e}", file=sys.stderr)
return 1
# Load and run agent
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=getattr(args, "model", "claude-haiku-4-5-20251001"),
)
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
# Run the agent (with TUI or standard)
if getattr(args, "tui", False):
from framework.tui.app import AdenTUI
# Auto-inject user_id if the agent expects it but it's not provided
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
if "user_id" in entry_input_keys and context.get("user_id") is None:
import os
async def run_with_tui():
try:
# Load runner inside the async loop to ensure strict loop affinity
# (only one load — avoids spawning duplicate MCP subprocesses)
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=True,
)
except Exception as e:
print(f"Error loading agent: {e}")
return
context["user_id"] = os.environ.get("USER", "default_user")
# Force setup inside the loop
if runner._agent_runtime is None:
runner._setup()
if not args.quiet:
info = runner.info()
print(f"Agent: {info.name}")
print(f"Goal: {info.goal_name}")
print(f"Steps: {info.node_count}")
print(f"Input: {json.dumps(context)}")
print()
print("=" * 60)
print("Executing agent...")
print("=" * 60)
print()
# Start runtime before TUI so it's ready for user input
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
# Run the agent
result = asyncio.run(runner.run(context))
app = AdenTUI(runner._agent_runtime)
# TUI handles execution via ChatRepl — user submits input,
# ChatRepl calls runtime.trigger_and_wait(). No auto-launch.
await app.run_async()
except Exception as e:
import traceback
traceback.print_exc()
print(f"TUI error: {e}")
await runner.cleanup_async()
return None
asyncio.run(run_with_tui())
print("TUI session ended.")
return 0
else:
# Standard execution — load runner here (not shared with TUI path)
try:
runner = AgentRunner.load(
args.agent_path,
mock_mode=args.mock,
model=args.model,
enable_tui=False,
)
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
# Auto-inject user_id if the agent expects it but it's not provided
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
if "user_id" in entry_input_keys and context.get("user_id") is None:
import os
context["user_id"] = os.environ.get("USER", "default_user")
if not args.quiet:
info = runner.info()
print(f"Agent: {info.name}")
print(f"Goal: {info.goal_name}")
print(f"Steps: {info.node_count}")
print(f"Input: {json.dumps(context)}")
print()
print("=" * 60)
print("Executing agent...")
print("=" * 60)
print()
result = asyncio.run(runner.run(context))
# Format output
output = {
+9
View File
@@ -362,6 +362,15 @@ class MCPClient:
# Call tool using persistent session
result = await self._session.call_tool(tool_name, arguments=arguments)
# Check for server-side errors (validation failures, tool exceptions, etc.)
if getattr(result, "isError", False):
error_text = ""
if result.content:
content_item = result.content[0]
if hasattr(content_item, "text"):
error_text = content_item.text
raise RuntimeError(f"MCP tool '{tool_name}' failed: {error_text}")
# Extract content
if result.content:
# MCP returns content as a list of content items
+215 -40
View File
@@ -28,6 +28,33 @@ logger = logging.getLogger(__name__)
# Configuration paths
HIVE_CONFIG_FILE = Path.home() / ".hive" / "configuration.json"
def _ensure_credential_key_env() -> None:
"""Load HIVE_CREDENTIAL_KEY from shell config if not already in environment.
The setup-credentials skill writes the encryption key to ~/.zshrc or ~/.bashrc.
If the user hasn't sourced their config in the current shell, this reads it
directly so the runner (and any MCP subprocesses it spawns) can unlock the
encrypted credential store.
Only HIVE_CREDENTIAL_KEY is loaded this way all other secrets (API keys, etc.)
come from the credential store itself.
"""
if os.environ.get("HIVE_CREDENTIAL_KEY"):
return
try:
from aden_tools.credentials.shell_config import check_env_var_in_shell_config
found, value = check_env_var_in_shell_config("HIVE_CREDENTIAL_KEY")
if found and value:
os.environ["HIVE_CREDENTIAL_KEY"] = value
logger.debug("Loaded HIVE_CREDENTIAL_KEY from shell config")
except ImportError:
pass
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
@@ -236,6 +263,15 @@ class AgentRunner:
result = await runner.run({"lead_id": "123"})
"""
@staticmethod
def _resolve_default_model() -> str:
"""Resolve the default model from ~/.hive/configuration.json."""
config = get_hive_config()
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
return "anthropic/claude-sonnet-4-20250514"
def __init__(
self,
agent_path: Path,
@@ -243,7 +279,8 @@ class AgentRunner:
goal: Goal,
mock_mode: bool = False,
storage_path: Path | None = None,
model: str = "cerebras/zai-glm-4.7",
model: str | None = None,
enable_tui: bool = False,
):
"""
Initialize the runner (use AgentRunner.load() instead).
@@ -254,14 +291,15 @@ class AgentRunner:
goal: Loaded Goal object
mock_mode: If True, use mock LLM responses
storage_path: Path for runtime storage (defaults to temp)
model: Model to use - any LiteLLM-compatible model name
(e.g., "claude-sonnet-4-20250514", "gpt-4o-mini", "gemini/gemini-pro")
model: Model to use (reads from agent config or ~/.hive/configuration.json if None)
enable_tui: If True, forces use of AgentRuntime with EventBus
"""
self.agent_path = agent_path
self.graph = graph
self.goal = goal
self.mock_mode = mock_mode
self.model = model
self.model = model or self._resolve_default_model()
self.enable_tui = enable_tui
# Set up storage
if storage_path:
@@ -275,6 +313,10 @@ class AgentRunner:
self._storage_path = default_storage
self._temp_dir = None
# Load HIVE_CREDENTIAL_KEY from shell config if not in env.
# Must happen before MCP subprocesses are spawned so they inherit it.
_ensure_credential_key_env()
# Initialize components
self._tool_registry = ToolRegistry()
self._runtime: Runtime | None = None
@@ -296,32 +338,121 @@ class AgentRunner:
if mcp_config_path.exists():
self._load_mcp_servers_from_config(mcp_config_path)
@staticmethod
def _import_agent_module(agent_path: Path):
"""Import an agent package from its directory path.
Tries package import first (works when exports/ is on sys.path,
which cli.py:_configure_paths() ensures). Falls back to direct
file import of agent.py via importlib.util.
"""
import importlib
package_name = agent_path.name
# Try importing as a package (works when exports/ is on sys.path)
try:
return importlib.import_module(package_name)
except ImportError:
pass
# Fallback: import agent.py directly via file path
import importlib.util
agent_py = agent_path / "agent.py"
if not agent_py.exists():
raise FileNotFoundError(
f"No importable agent found at {agent_path}. "
f"Expected a Python package with agent.py."
)
spec = importlib.util.spec_from_file_location(
f"{package_name}.agent",
agent_py,
submodule_search_locations=[str(agent_path)],
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
@classmethod
def load(
cls,
agent_path: str | Path,
mock_mode: bool = False,
storage_path: Path | None = None,
model: str = "cerebras/zai-glm-4.7",
model: str | None = None,
enable_tui: bool = False,
) -> "AgentRunner":
"""
Load an agent from an export folder.
Imports the agent's Python package and reads module-level variables
(goal, nodes, edges, etc.) to build a GraphSpec. Falls back to
agent.json if no Python module is found.
Args:
agent_path: Path to agent folder (containing agent.json)
agent_path: Path to agent folder
mock_mode: If True, use mock LLM responses
storage_path: Path for runtime storage (defaults to temp)
model: LLM model to use (any LiteLLM-compatible model name)
storage_path: Path for runtime storage (defaults to ~/.hive/storage/{name})
model: LLM model to use (reads from agent's default_config if None)
enable_tui: If True, forces use of AgentRuntime with EventBus
Returns:
AgentRunner instance ready to run
"""
agent_path = Path(agent_path)
# Load agent.json
# Try loading from Python module first (code-based agents)
agent_py = agent_path / "agent.py"
if agent_py.exists():
agent_module = cls._import_agent_module(agent_path)
goal = getattr(agent_module, "goal", None)
nodes = getattr(agent_module, "nodes", None)
edges = getattr(agent_module, "edges", None)
if goal is None or nodes is None or edges is None:
raise ValueError(
f"Agent at {agent_path} must define 'goal', 'nodes', and 'edges' "
f"in agent.py (or __init__.py)"
)
# Read model and max_tokens from agent's config if not explicitly provided
agent_config = getattr(agent_module, "default_config", None)
if model is None:
if agent_config and hasattr(agent_config, "model"):
model = agent_config.model
max_tokens = getattr(agent_config, "max_tokens", 1024) if agent_config else 1024
# Build GraphSpec from module-level variables
graph = GraphSpec(
id=f"{agent_path.name}-graph",
goal_id=goal.id,
version="1.0.0",
entry_node=getattr(agent_module, "entry_node", nodes[0].id),
entry_points=getattr(agent_module, "entry_points", {}),
terminal_nodes=getattr(agent_module, "terminal_nodes", []),
pause_nodes=getattr(agent_module, "pause_nodes", []),
nodes=nodes,
edges=edges,
max_tokens=max_tokens,
)
return cls(
agent_path=agent_path,
graph=graph,
goal=goal,
mock_mode=mock_mode,
storage_path=storage_path,
model=model,
enable_tui=enable_tui,
)
# Fallback: load from agent.json (legacy JSON-based agents)
agent_json_path = agent_path / "agent.json"
if not agent_json_path.exists():
raise FileNotFoundError(f"agent.json not found in {agent_path}")
raise FileNotFoundError(f"No agent.py or agent.json found in {agent_path}")
with open(agent_json_path) as f:
graph, goal = load_agent_export(f.read())
@@ -333,6 +464,7 @@ class AgentRunner:
mock_mode=mock_mode,
storage_path=storage_path,
model=model,
enable_tui=enable_tui,
)
def register_tool(
@@ -411,25 +543,8 @@ class AgentRunner:
return self._tool_registry.register_mcp_server(server_config)
def _load_mcp_servers_from_config(self, config_path: Path) -> None:
"""
Load and register MCP servers from a configuration file.
Args:
config_path: Path to mcp_servers.json file
"""
try:
with open(config_path) as f:
config = json.load(f)
servers = config.get("servers", [])
for server_config in servers:
try:
self._tool_registry.register_mcp_server(server_config)
except Exception as e:
server_name = server_config.get("name", "unknown")
logger.warning(f"Failed to register MCP server '{server_name}': {e}")
except Exception as e:
logger.warning(f"Failed to load MCP servers config from {config_path}: {e}")
"""Load and register MCP servers from a configuration file."""
self._tool_registry.load_mcp_config(config_path)
def set_approval_callback(self, callback: Callable) -> None:
"""
@@ -488,16 +603,25 @@ class AgentRunner:
api_key_env = self._get_api_key_env_var(self.model)
if api_key_env and os.environ.get(api_key_env):
self._llm = LiteLLMProvider(model=self.model)
elif api_key_env:
print(f"Warning: {api_key_env} not set. LLM calls will fail.")
print(f"Set it with: export {api_key_env}=your-api-key")
else:
# Fall back to credential store
api_key = self._get_api_key_from_credential_store()
if api_key:
self._llm = LiteLLMProvider(model=self.model, api_key=api_key)
# Set env var so downstream code (e.g. cleanup LLM in
# node._extract_json) can also find it
if api_key_env:
os.environ[api_key_env] = api_key
elif api_key_env:
print(f"Warning: {api_key_env} not set. LLM calls will fail.")
print(f"Set it with: export {api_key_env}=your-api-key")
# Get tools for executor/runtime
tools = list(self._tool_registry.get_tools().values())
tool_executor = self._tool_registry.get_executor()
if self._uses_async_entry_points:
# Multi-entry-point mode: use AgentRuntime
if self._uses_async_entry_points or self.enable_tui:
# Multi-entry-point mode or TUI mode: use AgentRuntime
self._setup_agent_runtime(tools, tool_executor)
else:
# Single-entry-point mode: use legacy GraphExecutor
@@ -535,6 +659,33 @@ class AgentRunner:
# Default: assume OpenAI-compatible
return "OPENAI_API_KEY"
def _get_api_key_from_credential_store(self) -> str | None:
"""Get the LLM API key from the encrypted credential store.
Maps model name to credential store ID (e.g. "anthropic/..." -> "anthropic")
and retrieves the key via CredentialStore.get().
"""
if not os.environ.get("HIVE_CREDENTIAL_KEY"):
return None
# Map model prefix to credential store ID
model_lower = self.model.lower()
cred_id = None
if model_lower.startswith("anthropic/") or model_lower.startswith("claude"):
cred_id = "anthropic"
# Add more mappings as providers are added to LLM_CREDENTIALS
if cred_id is None:
return None
try:
from framework.credentials import CredentialStore
store = CredentialStore.with_encrypted_storage()
return store.get(cred_id)
except Exception:
return None
def _setup_legacy_executor(self, tools: list, tool_executor: Callable | None) -> None:
"""Set up legacy single-entry-point execution using GraphExecutor."""
# Create runtime
@@ -547,6 +698,7 @@ class AgentRunner:
tools=tools,
tool_executor=tool_executor,
approval_callback=self._approval_callback,
loop_config=self.graph.loop_config,
)
def _setup_agent_runtime(self, tools: list, tool_executor: Callable | None) -> None:
@@ -566,6 +718,19 @@ class AgentRunner:
)
entry_points.append(ep)
# If TUI enabled but no entry points (single-entry agent), create default
if not entry_points and self.enable_tui and self.graph.entry_node:
logger.info("Creating default entry point for TUI")
entry_points.append(
EntryPointSpec(
id="default",
name="Default",
entry_node=self.graph.entry_node,
trigger_type="manual",
isolation_level="shared",
)
)
# Create AgentRuntime with all entry points
self._agent_runtime = create_agent_runtime(
graph=self.graph,
@@ -616,7 +781,7 @@ class AgentRunner:
error=error_msg,
)
if self._uses_async_entry_points:
if self._uses_async_entry_points or self.enable_tui:
# Multi-entry-point mode: use AgentRuntime
return await self._run_with_agent_runtime(
input_data=input_data or {},
@@ -908,15 +1073,25 @@ class AgentRunner:
EnvVarStorage,
)
# Build env mapping for fallback
# Build env mapping for credential lookup
env_mapping = {
(spec.credential_id or name): spec.env_var
for name, spec in CREDENTIAL_SPECS.items()
}
storage = CompositeStorage(
primary=EncryptedFileStorage(),
fallbacks=[EnvVarStorage(env_mapping=env_mapping)],
)
# Only use EncryptedFileStorage if the encryption key is configured;
# otherwise just check env vars (avoids generating a throwaway key)
storages: list = [EnvVarStorage(env_mapping=env_mapping)]
if os.environ.get("HIVE_CREDENTIAL_KEY"):
storages.insert(0, EncryptedFileStorage())
if len(storages) == 1:
storage = storages[0]
else:
storage = CompositeStorage(
primary=storages[0],
fallbacks=storages[1:],
)
store = CredentialStore(storage=storage)
# Build reverse mappings
+93 -5
View File
@@ -1,5 +1,6 @@
"""Tool discovery and registration for agent runner."""
import contextvars
import importlib.util
import inspect
import json
@@ -13,6 +14,13 @@ from framework.llm.provider import Tool, ToolResult, ToolUse
logger = logging.getLogger(__name__)
# Per-execution context overrides. Each asyncio task (and thus each
# concurrent graph execution) gets its own copy, so there are no races
# when multiple ExecutionStreams run in parallel.
_execution_context: contextvars.ContextVar[dict[str, Any] | None] = contextvars.ContextVar(
"_execution_context", default=None
)
@dataclass
class RegisteredTool:
@@ -33,6 +41,11 @@ class ToolRegistry:
4. Manually registered tools
"""
# Framework-internal context keys injected into tool calls.
# Stripped from LLM-facing schemas (the LLM doesn't know these values)
# and auto-injected at call time for tools that accept them.
CONTEXT_PARAMS = frozenset({"workspace_id", "agent_id", "session_id", "data_dir"})
def __init__(self):
self._tools: dict[str, RegisteredTool] = {}
self._mcp_clients: list[Any] = [] # List of MCPClient instances
@@ -257,6 +270,61 @@ class ToolRegistry:
"""
self._session_context.update(context)
@staticmethod
def set_execution_context(**context) -> contextvars.Token:
"""Set per-execution context overrides (concurrency-safe via contextvars).
Values set here take precedence over session context. Each asyncio
task gets its own copy, so concurrent executions don't interfere.
Returns a token that must be passed to :meth:`reset_execution_context`
to restore the previous state.
"""
current = _execution_context.get() or {}
return _execution_context.set({**current, **context})
@staticmethod
def reset_execution_context(token: contextvars.Token) -> None:
"""Restore execution context to its previous state."""
_execution_context.reset(token)
def load_mcp_config(self, config_path: Path) -> None:
"""
Load and register MCP servers from a config file.
Resolves relative ``cwd`` paths against the config file's parent
directory so callers never need to handle path resolution themselves.
Args:
config_path: Path to an ``mcp_servers.json`` file.
"""
try:
with open(config_path) as f:
config = json.load(f)
except Exception as e:
logger.warning(f"Failed to load MCP config from {config_path}: {e}")
return
base_dir = config_path.parent
# Support both formats:
# {"servers": [{"name": "x", ...}]} (list format)
# {"server-name": {"transport": ...}, ...} (dict format)
server_list = config.get("servers", [])
if not server_list and "servers" not in config:
# Treat top-level keys as server names
server_list = [{"name": name, **cfg} for name, cfg in config.items()]
for server_config in server_list:
cwd = server_config.get("cwd")
if cwd and not Path(cwd).is_absolute():
server_config["cwd"] = str((base_dir / cwd).resolve())
try:
self.register_mcp_server(server_config)
except Exception as e:
name = server_config.get("name", "unknown")
logger.warning(f"Failed to register MCP server '{name}': {e}")
def register_mcp_server(
self,
server_config: dict[str, Any],
@@ -305,15 +373,29 @@ class ToolRegistry:
# Register each tool
count = 0
for mcp_tool in client.list_tools():
# Convert MCP tool to framework Tool
# Convert MCP tool to framework Tool (strips context params from LLM schema)
tool = self._convert_mcp_tool_to_framework_tool(mcp_tool)
# Create executor that calls the MCP server
def make_mcp_executor(client_ref: MCPClient, tool_name: str, registry_ref):
def make_mcp_executor(
client_ref: MCPClient,
tool_name: str,
registry_ref,
tool_params: set[str],
):
def executor(inputs: dict) -> Any:
try:
# Inject session context for tools that need it
merged_inputs = {**registry_ref._session_context, **inputs}
# Build base context: session < execution (execution wins)
base_context = dict(registry_ref._session_context)
exec_ctx = _execution_context.get()
if exec_ctx:
base_context.update(exec_ctx)
# Only inject context params the tool accepts
filtered_context = {
k: v for k, v in base_context.items() if k in tool_params
}
merged_inputs = {**filtered_context, **inputs}
result = client_ref.call_tool(tool_name, merged_inputs)
# MCP tools return content array, extract the result
if isinstance(result, list) and len(result) > 0:
@@ -327,10 +409,11 @@ class ToolRegistry:
return executor
tool_params = set(mcp_tool.input_schema.get("properties", {}).keys())
self.register(
mcp_tool.name,
tool,
make_mcp_executor(client, mcp_tool.name, self),
make_mcp_executor(client, mcp_tool.name, self, tool_params),
)
count += 1
@@ -356,6 +439,11 @@ class ToolRegistry:
properties = input_schema.get("properties", {})
required = input_schema.get("required", [])
# Strip framework-internal context params from LLM-facing schema.
# The LLM can't know these values; they're auto-injected at call time.
properties = {k: v for k, v in properties.items() if k not in self.CONTEXT_PARAMS}
required = [r for r in required if r not in self.CONTEXT_PARAMS]
# Convert to framework Tool format
tool = Tool(
name=mcp_tool.name,
+19
View File
@@ -296,6 +296,25 @@ class AgentRuntime:
raise ValueError(f"Entry point '{entry_point_id}' not found")
return await stream.wait_for_completion(exec_id, timeout)
async def inject_input(self, node_id: str, content: str) -> bool:
"""Inject user input into a running client-facing node.
Routes input to the EventLoopNode identified by ``node_id``
across all active streams. Used by the TUI ChatRepl to deliver
user responses during client-facing node execution.
Args:
node_id: The node currently waiting for input
content: The user's input text
Returns:
True if input was delivered, False if no matching node found
"""
for stream in self._streams.values():
if await stream.inject_input(node_id, content):
return True
return False
async def get_goal_progress(self) -> dict[str, Any]:
"""
Evaluate goal progress across all streams.
+279 -2
View File
@@ -12,13 +12,13 @@ import logging
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from enum import StrEnum
from typing import Any
logger = logging.getLogger(__name__)
class EventType(str, Enum):
class EventType(StrEnum):
"""Types of events that can be published."""
# Execution lifecycle
@@ -41,6 +41,28 @@ class EventType(str, Enum):
STREAM_STARTED = "stream_started"
STREAM_STOPPED = "stream_stopped"
# Node event-loop lifecycle
NODE_LOOP_STARTED = "node_loop_started"
NODE_LOOP_ITERATION = "node_loop_iteration"
NODE_LOOP_COMPLETED = "node_loop_completed"
# LLM streaming observability
LLM_TEXT_DELTA = "llm_text_delta"
LLM_REASONING_DELTA = "llm_reasoning_delta"
# Tool lifecycle
TOOL_CALL_STARTED = "tool_call_started"
TOOL_CALL_COMPLETED = "tool_call_completed"
# Client I/O (client_facing=True nodes only)
CLIENT_OUTPUT_DELTA = "client_output_delta"
CLIENT_INPUT_REQUESTED = "client_input_requested"
# Internal node observability (client_facing=False nodes)
NODE_INTERNAL_OUTPUT = "node_internal_output"
NODE_INPUT_BLOCKED = "node_input_blocked"
NODE_STALLED = "node_stalled"
# Custom events
CUSTOM = "custom"
@@ -51,6 +73,7 @@ class AgentEvent:
type: EventType
stream_id: str
node_id: str | None = None # Which node emitted this event
execution_id: str | None = None
data: dict[str, Any] = field(default_factory=dict)
timestamp: datetime = field(default_factory=datetime.now)
@@ -61,6 +84,7 @@ class AgentEvent:
return {
"type": self.type.value,
"stream_id": self.stream_id,
"node_id": self.node_id,
"execution_id": self.execution_id,
"data": self.data,
"timestamp": self.timestamp.isoformat(),
@@ -80,6 +104,7 @@ class Subscription:
event_types: set[EventType]
handler: EventHandler
filter_stream: str | None = None # Only receive events from this stream
filter_node: str | None = None # Only receive events from this node
filter_execution: str | None = None # Only receive events from this execution
@@ -138,6 +163,7 @@ class EventBus:
event_types: list[EventType],
handler: EventHandler,
filter_stream: str | None = None,
filter_node: str | None = None,
filter_execution: str | None = None,
) -> str:
"""
@@ -147,6 +173,7 @@ class EventBus:
event_types: Types of events to receive
handler: Async function to call when event occurs
filter_stream: Only receive events from this stream
filter_node: Only receive events from this node
filter_execution: Only receive events from this execution
Returns:
@@ -160,6 +187,7 @@ class EventBus:
event_types=set(event_types),
handler=handler,
filter_stream=filter_stream,
filter_node=filter_node,
filter_execution=filter_execution,
)
@@ -218,6 +246,10 @@ class EventBus:
if subscription.filter_stream and subscription.filter_stream != event.stream_id:
return False
# Check node filter
if subscription.filter_node and subscription.filter_node != event.node_id:
return False
# Check execution filter
if subscription.filter_execution and subscription.filter_execution != event.execution_id:
return False
@@ -359,6 +391,248 @@ class EventBus:
)
)
# === NODE EVENT-LOOP PUBLISHERS ===
async def emit_node_loop_started(
self,
stream_id: str,
node_id: str,
execution_id: str | None = None,
max_iterations: int | None = None,
) -> None:
"""Emit node loop started event."""
await self.publish(
AgentEvent(
type=EventType.NODE_LOOP_STARTED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"max_iterations": max_iterations},
)
)
async def emit_node_loop_iteration(
self,
stream_id: str,
node_id: str,
iteration: int,
execution_id: str | None = None,
) -> None:
"""Emit node loop iteration event."""
await self.publish(
AgentEvent(
type=EventType.NODE_LOOP_ITERATION,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"iteration": iteration},
)
)
async def emit_node_loop_completed(
self,
stream_id: str,
node_id: str,
iterations: int,
execution_id: str | None = None,
) -> None:
"""Emit node loop completed event."""
await self.publish(
AgentEvent(
type=EventType.NODE_LOOP_COMPLETED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"iterations": iterations},
)
)
# === LLM STREAMING PUBLISHERS ===
async def emit_llm_text_delta(
self,
stream_id: str,
node_id: str,
content: str,
snapshot: str,
execution_id: str | None = None,
) -> None:
"""Emit LLM text delta event."""
await self.publish(
AgentEvent(
type=EventType.LLM_TEXT_DELTA,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"content": content, "snapshot": snapshot},
)
)
async def emit_llm_reasoning_delta(
self,
stream_id: str,
node_id: str,
content: str,
execution_id: str | None = None,
) -> None:
"""Emit LLM reasoning delta event."""
await self.publish(
AgentEvent(
type=EventType.LLM_REASONING_DELTA,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"content": content},
)
)
# === TOOL LIFECYCLE PUBLISHERS ===
async def emit_tool_call_started(
self,
stream_id: str,
node_id: str,
tool_use_id: str,
tool_name: str,
tool_input: dict[str, Any] | None = None,
execution_id: str | None = None,
) -> None:
"""Emit tool call started event."""
await self.publish(
AgentEvent(
type=EventType.TOOL_CALL_STARTED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={
"tool_use_id": tool_use_id,
"tool_name": tool_name,
"tool_input": tool_input or {},
},
)
)
async def emit_tool_call_completed(
self,
stream_id: str,
node_id: str,
tool_use_id: str,
tool_name: str,
result: str = "",
is_error: bool = False,
execution_id: str | None = None,
) -> None:
"""Emit tool call completed event."""
await self.publish(
AgentEvent(
type=EventType.TOOL_CALL_COMPLETED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={
"tool_use_id": tool_use_id,
"tool_name": tool_name,
"result": result,
"is_error": is_error,
},
)
)
# === CLIENT I/O PUBLISHERS ===
async def emit_client_output_delta(
self,
stream_id: str,
node_id: str,
content: str,
snapshot: str,
execution_id: str | None = None,
) -> None:
"""Emit client output delta event (client_facing=True nodes)."""
await self.publish(
AgentEvent(
type=EventType.CLIENT_OUTPUT_DELTA,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"content": content, "snapshot": snapshot},
)
)
async def emit_client_input_requested(
self,
stream_id: str,
node_id: str,
prompt: str = "",
execution_id: str | None = None,
) -> None:
"""Emit client input requested event (client_facing=True nodes)."""
await self.publish(
AgentEvent(
type=EventType.CLIENT_INPUT_REQUESTED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"prompt": prompt},
)
)
# === INTERNAL NODE PUBLISHERS ===
async def emit_node_internal_output(
self,
stream_id: str,
node_id: str,
content: str,
execution_id: str | None = None,
) -> None:
"""Emit node internal output event (client_facing=False nodes)."""
await self.publish(
AgentEvent(
type=EventType.NODE_INTERNAL_OUTPUT,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"content": content},
)
)
async def emit_node_stalled(
self,
stream_id: str,
node_id: str,
reason: str = "",
execution_id: str | None = None,
) -> None:
"""Emit node stalled event."""
await self.publish(
AgentEvent(
type=EventType.NODE_STALLED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"reason": reason},
)
)
async def emit_node_input_blocked(
self,
stream_id: str,
node_id: str,
prompt: str = "",
execution_id: str | None = None,
) -> None:
"""Emit node input blocked event."""
await self.publish(
AgentEvent(
type=EventType.NODE_INPUT_BLOCKED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"prompt": prompt},
)
)
# === QUERY OPERATIONS ===
def get_history(
@@ -410,6 +684,7 @@ class EventBus:
self,
event_type: EventType,
stream_id: str | None = None,
node_id: str | None = None,
execution_id: str | None = None,
timeout: float | None = None,
) -> AgentEvent | None:
@@ -419,6 +694,7 @@ class EventBus:
Args:
event_type: Type of event to wait for
stream_id: Filter by stream
node_id: Filter by node
execution_id: Filter by execution
timeout: Maximum time to wait (seconds)
@@ -438,6 +714,7 @@ class EventBus:
event_types=[event_type],
handler=handler,
filter_stream=stream_id,
filter_node=node_id,
filter_execution=execution_id,
)
+32 -1
View File
@@ -153,6 +153,7 @@ class ExecutionStream:
# Execution tracking
self._active_executions: dict[str, ExecutionContext] = {}
self._execution_tasks: dict[str, asyncio.Task] = {}
self._active_executors: dict[str, GraphExecutor] = {}
self._execution_results: OrderedDict[str, ExecutionResult] = OrderedDict()
self._execution_result_times: dict[str, float] = {}
self._completion_events: dict[str, asyncio.Event] = {}
@@ -237,6 +238,21 @@ class ExecutionStream:
)
)
async def inject_input(self, node_id: str, content: str) -> bool:
"""Inject user input into a running client-facing EventLoopNode.
Searches active executors for a node matching ``node_id`` and calls
its ``inject_event()`` method to unblock ``_await_user_input()``.
Returns True if input was delivered, False otherwise.
"""
for executor in self._active_executors.values():
node = executor.node_registry.get(node_id)
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(content)
return True
return False
async def execute(
self,
input_data: dict[str, Any],
@@ -314,13 +330,25 @@ class ExecutionStream:
# Create runtime adapter for this execution
runtime_adapter = StreamRuntimeAdapter(self._runtime, execution_id)
# Create executor for this execution
# Create executor for this execution.
# Each execution gets its own storage under sessions/{exec_id}/
# so conversations, spillover, and data files are all scoped
# to this execution. The executor sets data_dir via execution
# context (contextvars) so data tools and spillover share the
# same session-scoped directory.
exec_storage = self._storage.base_path / "sessions" / execution_id
executor = GraphExecutor(
runtime=runtime_adapter,
llm=self._llm,
tools=self._tools,
tool_executor=self._tool_executor,
event_bus=self._event_bus,
stream_id=self.stream_id,
storage_path=exec_storage,
loop_config=self.graph.loop_config,
)
# Track executor so inject_input() can reach EventLoopNode instances
self._active_executors[execution_id] = executor
# Create modified graph with entry point
# We need to override the entry_node to use our entry point
@@ -334,6 +362,9 @@ class ExecutionStream:
session_state=ctx.session_state,
)
# Clean up executor reference
self._active_executors.pop(execution_id, None)
# Store result with retention
self._record_execution_result(execution_id, result)
+3 -3
View File
@@ -11,13 +11,13 @@ import asyncio
import logging
import time
from dataclasses import dataclass, field
from enum import Enum
from enum import StrEnum
from typing import Any
logger = logging.getLogger(__name__)
class IsolationLevel(str, Enum):
class IsolationLevel(StrEnum):
"""State isolation level for concurrent executions."""
ISOLATED = "isolated" # Private state per execution
@@ -25,7 +25,7 @@ class IsolationLevel(str, Enum):
SYNCHRONIZED = "synchronized" # Shared with write locks (strong consistency)
class StateScope(str, Enum):
class StateScope(StrEnum):
"""Scope for state operations."""
EXECUTION = "execution" # Local to a single execution
+2 -2
View File
@@ -10,13 +10,13 @@ This is MORE important than actions because:
"""
from datetime import datetime
from enum import Enum
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field, computed_field
class DecisionType(str, Enum):
class DecisionType(StrEnum):
"""Types of decisions an agent can make."""
TOOL_SELECTION = "tool_selection" # Which tool to use
+2 -2
View File
@@ -6,7 +6,7 @@ summaries and metrics that Builder needs to understand what happened.
"""
from datetime import datetime
from enum import Enum
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field, computed_field
@@ -14,7 +14,7 @@ from pydantic import BaseModel, Field, computed_field
from framework.schemas.decision import Decision, Outcome
class RunStatus(str, Enum):
class RunStatus(StrEnum):
"""Status of a run."""
RUNNING = "running"
+11 -3
View File
@@ -167,14 +167,18 @@ class ConcurrentStorage:
run: Run to save
immediate: If True, save immediately (bypasses batching)
"""
# Invalidate summary cache since the run data is changing
# This ensures load_summary() fetches fresh data after the save
self._cache.pop(f"summary:{run.id}", None)
if immediate or not self._running:
await self._save_run_locked(run)
# Update cache only after successful immediate write
self._cache[f"run:{run.id}"] = CacheEntry(run, time.time())
else:
# For batched writes, cache will be updated in _flush_batch after successful write
await self._write_queue.put(("run", run))
# Update cache
self._cache[f"run:{run.id}"] = CacheEntry(run, time.time())
async def _save_run_locked(self, run: Run) -> None:
"""Save a run with file locking, including index locks."""
lock_key = f"run:{run.id}"
@@ -363,8 +367,12 @@ class ConcurrentStorage:
try:
if item_type == "run":
await self._save_run_locked(item)
# Update cache only after successful batched write
# This fixes the race condition where cache was updated before write completed
self._cache[f"run:{item.id}"] = CacheEntry(item, time.time())
except Exception as e:
logger.error(f"Failed to save {item_type}: {e}")
# Cache is NOT updated on failure - prevents stale/inconsistent cache state
async def _flush_pending(self) -> None:
"""Flush all pending writes."""
+3 -3
View File
@@ -26,9 +26,9 @@ Testing tools are integrated into the main agent_builder_server.py:
## CLI Commands
```bash
python -m framework test-run <agent_path> --goal <goal_id>
python -m framework test-debug <goal_id> <test_id>
python -m framework test-list <agent_path> --goal <goal_id>
uv run python -m framework test-run <agent_path> --goal <goal_id>
uv run python -m framework test-debug <goal_id> <test_id>
uv run python -m framework test-list <agent_path> --goal <goal_id>
```
"""
+2 -2
View File
@@ -6,13 +6,13 @@ programmatic/MCP-based approval.
"""
from datetime import datetime
from enum import Enum
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
class ApprovalAction(str, Enum):
class ApprovalAction(StrEnum):
"""Actions a user can take on a generated test."""
APPROVE = "approve" # Accept as-is
+2 -2
View File
@@ -24,7 +24,7 @@ def _get_api_key():
# 1. Try CredentialStoreAdapter for Anthropic
try:
from aden_tools.credentials import CredentialStoreAdapter
creds = CredentialStoreAdapter.with_env_storage()
creds = CredentialStoreAdapter.default()
if creds.is_available("anthropic"):
return creds.get("anthropic")
except (ImportError, KeyError):
@@ -57,7 +57,7 @@ def _get_api_key():
"""Get API key from CredentialStoreAdapter or environment."""
try:
from aden_tools.credentials import CredentialStoreAdapter
creds = CredentialStoreAdapter.with_env_storage()
creds = CredentialStoreAdapter.default()
if creds.is_available("anthropic"):
return creds.get("anthropic")
except (ImportError, KeyError):
+3 -3
View File
@@ -6,13 +6,13 @@ but require mandatory user approval before being stored.
"""
from datetime import datetime
from enum import Enum
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
class ApprovalStatus(str, Enum):
class ApprovalStatus(StrEnum):
"""Status of user approval for a generated test."""
PENDING = "pending" # Awaiting user review
@@ -21,7 +21,7 @@ class ApprovalStatus(str, Enum):
REJECTED = "rejected" # User declined (with reason)
class TestType(str, Enum):
class TestType(StrEnum):
"""Type of test based on what it validates."""
__test__ = False # Not a pytest test class
+2 -2
View File
@@ -6,13 +6,13 @@ categorization for guiding iteration strategy.
"""
from datetime import datetime
from enum import Enum
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
class ErrorCategory(str, Enum):
class ErrorCategory(StrEnum):
"""
Category of test failure for guiding iteration.
+518
View File
@@ -0,0 +1,518 @@
import logging
import time
from textual.app import App, ComposeResult
from textual.binding import Binding
from textual.containers import Container, Horizontal, Vertical
from textual.widgets import Footer, Label
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import AgentEvent, EventType
from framework.tui.widgets.chat_repl import ChatRepl
from framework.tui.widgets.graph_view import GraphOverview
from framework.tui.widgets.log_pane import LogPane
class StatusBar(Container):
"""Live status bar showing agent execution state."""
DEFAULT_CSS = """
StatusBar {
dock: top;
height: 1;
background: $panel;
color: $text;
padding: 0 1;
}
StatusBar > Label {
width: 100%;
}
"""
def __init__(self, graph_id: str = ""):
super().__init__()
self._graph_id = graph_id
self._state = "idle"
self._active_node: str | None = None
self._node_detail: str = ""
self._start_time: float | None = None
self._final_elapsed: float | None = None
def compose(self) -> ComposeResult:
yield Label(id="status-content")
def on_mount(self) -> None:
self._refresh()
self.set_interval(1.0, self._refresh)
def _format_elapsed(self, seconds: float) -> str:
total = int(seconds)
hours, remainder = divmod(total, 3600)
mins, secs = divmod(remainder, 60)
if hours:
return f"{hours}:{mins:02d}:{secs:02d}"
return f"{mins}:{secs:02d}"
def _refresh(self) -> None:
parts: list[str] = []
if self._graph_id:
parts.append(f"[bold]{self._graph_id}[/bold]")
if self._state == "idle":
parts.append("[dim]○ idle[/dim]")
elif self._state == "running":
parts.append("[bold green]● running[/bold green]")
elif self._state == "completed":
parts.append("[green]✓ done[/green]")
elif self._state == "failed":
parts.append("[bold red]✗ failed[/bold red]")
if self._active_node:
node_str = f"[cyan]{self._active_node}[/cyan]"
if self._node_detail:
node_str += f" [dim]({self._node_detail})[/dim]"
parts.append(node_str)
if self._state == "running" and self._start_time:
parts.append(f"[dim]{self._format_elapsed(time.time() - self._start_time)}[/dim]")
elif self._final_elapsed is not None:
parts.append(f"[dim]{self._format_elapsed(self._final_elapsed)}[/dim]")
try:
label = self.query_one("#status-content", Label)
label.update("".join(parts))
except Exception:
pass
def set_graph_id(self, graph_id: str) -> None:
self._graph_id = graph_id
self._refresh()
def set_running(self, entry_node: str = "") -> None:
self._state = "running"
self._active_node = entry_node or None
self._node_detail = ""
self._start_time = time.time()
self._final_elapsed = None
self._refresh()
def set_completed(self) -> None:
self._state = "completed"
if self._start_time:
self._final_elapsed = time.time() - self._start_time
self._active_node = None
self._node_detail = ""
self._start_time = None
self._refresh()
def set_failed(self, error: str = "") -> None:
self._state = "failed"
if self._start_time:
self._final_elapsed = time.time() - self._start_time
self._node_detail = error[:40] if error else ""
self._start_time = None
self._refresh()
def set_active_node(self, node_id: str, detail: str = "") -> None:
self._active_node = node_id
self._node_detail = detail
self._refresh()
def set_node_detail(self, detail: str) -> None:
self._node_detail = detail
self._refresh()
class AdenTUI(App):
TITLE = "Aden TUI Dashboard"
COMMAND_PALETTE_BINDING = "ctrl+o"
CSS = """
Screen {
layout: vertical;
background: $surface;
}
#left-pane {
width: 60%;
height: 100%;
layout: vertical;
background: $surface;
}
GraphOverview {
height: 40%;
background: $panel;
padding: 0;
}
LogPane {
height: 60%;
background: $surface;
padding: 0;
margin-bottom: 1;
}
ChatRepl {
width: 40%;
height: 100%;
background: $panel;
border-left: tall $primary;
padding: 0;
}
#chat-history {
height: 1fr;
width: 100%;
background: $surface;
border: none;
scrollbar-background: $panel;
scrollbar-color: $primary;
}
RichLog {
background: $surface;
border: none;
scrollbar-background: $panel;
scrollbar-color: $primary;
}
Input {
background: $surface;
border: tall $primary;
margin-top: 1;
}
Input:focus {
border: tall $accent;
}
StatusBar {
background: $panel;
color: $text;
height: 1;
padding: 0 1;
}
Footer {
background: $panel;
color: $text-muted;
}
"""
BINDINGS = [
Binding("q", "quit", "Quit"),
Binding("ctrl+s", "screenshot", "Screenshot (SVG)", show=True, priority=True),
Binding("tab", "focus_next", "Next Panel", show=True),
Binding("shift+tab", "focus_previous", "Previous Panel", show=False),
]
def __init__(self, runtime: AgentRuntime):
super().__init__()
self.runtime = runtime
self.log_pane = LogPane()
self.graph_view = GraphOverview(runtime)
self.chat_repl = ChatRepl(runtime)
self.status_bar = StatusBar(graph_id=runtime.graph.id)
self.is_ready = False
def compose(self) -> ComposeResult:
yield self.status_bar
yield Horizontal(
Vertical(
self.log_pane,
self.graph_view,
id="left-pane",
),
self.chat_repl,
)
yield Footer()
async def on_mount(self) -> None:
"""Called when app starts."""
self.title = "Aden TUI Dashboard"
# Add logging setup
self._setup_logging_queue()
# Set ready immediately so _poll_logs can process messages
self.is_ready = True
# Add event subscription with delay to ensure TUI is fully initialized
self.call_later(self._init_runtime_connection)
# Delay initial log messages until layout is fully rendered
def write_initial_logs():
logging.info("TUI Dashboard initialized successfully")
logging.info("Waiting for agent execution to start...")
# Wait for layout to be fully rendered before writing logs
self.set_timer(0.2, write_initial_logs)
def _setup_logging_queue(self) -> None:
"""Setup a thread-safe queue for logs."""
try:
import queue
from logging.handlers import QueueHandler
self.log_queue = queue.Queue()
self.queue_handler = QueueHandler(self.log_queue)
self.queue_handler.setLevel(logging.INFO)
# Get root logger
root_logger = logging.getLogger()
# Remove ALL existing handlers to prevent stdout output
# This is critical - StreamHandlers cause text to appear in header
for handler in root_logger.handlers[:]:
root_logger.removeHandler(handler)
# Add ONLY our queue handler
root_logger.addHandler(self.queue_handler)
root_logger.setLevel(logging.INFO)
# Suppress LiteLLM logging completely
litellm_logger = logging.getLogger("LiteLLM")
litellm_logger.setLevel(logging.CRITICAL) # Only show critical errors
litellm_logger.propagate = False # Don't propagate to root logger
# Start polling
self.set_interval(0.1, self._poll_logs)
except Exception:
pass
def _poll_logs(self) -> None:
"""Poll the log queue and update UI."""
if not self.is_ready:
return
try:
while not self.log_queue.empty():
record = self.log_queue.get_nowait()
# Filter out framework/library logs
if record.name.startswith(("textual", "LiteLLM", "litellm")):
continue
self.log_pane.write_python_log(record)
except Exception:
pass
_EVENT_TYPES = [
EventType.LLM_TEXT_DELTA,
EventType.CLIENT_OUTPUT_DELTA,
EventType.TOOL_CALL_STARTED,
EventType.TOOL_CALL_COMPLETED,
EventType.EXECUTION_STARTED,
EventType.EXECUTION_COMPLETED,
EventType.EXECUTION_FAILED,
EventType.NODE_LOOP_STARTED,
EventType.NODE_LOOP_ITERATION,
EventType.NODE_LOOP_COMPLETED,
EventType.CLIENT_INPUT_REQUESTED,
EventType.NODE_STALLED,
EventType.GOAL_PROGRESS,
EventType.GOAL_ACHIEVED,
EventType.CONSTRAINT_VIOLATION,
EventType.STATE_CHANGED,
EventType.NODE_INPUT_BLOCKED,
]
_LOG_PANE_EVENTS = frozenset(_EVENT_TYPES) - {
EventType.LLM_TEXT_DELTA,
EventType.CLIENT_OUTPUT_DELTA,
}
async def _init_runtime_connection(self) -> None:
"""Subscribe to runtime events with an async handler."""
try:
self._subscription_id = self.runtime.subscribe_to_events(
event_types=self._EVENT_TYPES,
handler=self._handle_event,
)
except Exception:
pass
async def _handle_event(self, event: AgentEvent) -> None:
"""Called from the agent thread — bridge to Textual's main thread."""
try:
self.call_from_thread(self._route_event, event)
except Exception:
pass
def _route_event(self, event: AgentEvent) -> None:
"""Route incoming events to widgets. Runs on Textual's main thread."""
if not self.is_ready:
return
try:
et = event.type
# --- Chat REPL events ---
if et in (EventType.LLM_TEXT_DELTA, EventType.CLIENT_OUTPUT_DELTA):
self.chat_repl.handle_text_delta(
event.data.get("content", ""),
event.data.get("snapshot", ""),
)
elif et == EventType.TOOL_CALL_STARTED:
self.chat_repl.handle_tool_started(
event.data.get("tool_name", "unknown"),
event.data.get("tool_input", {}),
)
elif et == EventType.TOOL_CALL_COMPLETED:
self.chat_repl.handle_tool_completed(
event.data.get("tool_name", "unknown"),
event.data.get("result", ""),
event.data.get("is_error", False),
)
elif et == EventType.EXECUTION_COMPLETED:
self.chat_repl.handle_execution_completed(event.data.get("output", {}))
elif et == EventType.EXECUTION_FAILED:
self.chat_repl.handle_execution_failed(event.data.get("error", "Unknown error"))
elif et == EventType.CLIENT_INPUT_REQUESTED:
self.chat_repl.handle_input_requested(
event.node_id or event.data.get("node_id", ""),
)
# --- Graph view events ---
if et in (
EventType.EXECUTION_STARTED,
EventType.EXECUTION_COMPLETED,
EventType.EXECUTION_FAILED,
):
self.graph_view.update_execution(event)
if et == EventType.NODE_LOOP_STARTED:
self.graph_view.handle_node_loop_started(event.node_id or "")
elif et == EventType.NODE_LOOP_ITERATION:
self.graph_view.handle_node_loop_iteration(
event.node_id or "",
event.data.get("iteration", 0),
)
elif et == EventType.NODE_LOOP_COMPLETED:
self.graph_view.handle_node_loop_completed(event.node_id or "")
elif et == EventType.NODE_STALLED:
self.graph_view.handle_stalled(
event.node_id or "",
event.data.get("reason", ""),
)
if et == EventType.TOOL_CALL_STARTED:
self.graph_view.handle_tool_call(
event.node_id or "",
event.data.get("tool_name", "unknown"),
started=True,
)
elif et == EventType.TOOL_CALL_COMPLETED:
self.graph_view.handle_tool_call(
event.node_id or "",
event.data.get("tool_name", "unknown"),
started=False,
)
# --- Status bar events ---
if et == EventType.EXECUTION_STARTED:
entry_node = event.data.get("entry_node") or (
self.runtime.graph.entry_node if self.runtime else ""
)
self.status_bar.set_running(entry_node)
elif et == EventType.EXECUTION_COMPLETED:
self.status_bar.set_completed()
elif et == EventType.EXECUTION_FAILED:
self.status_bar.set_failed(event.data.get("error", ""))
elif et == EventType.NODE_LOOP_STARTED:
self.status_bar.set_active_node(event.node_id or "", "thinking...")
elif et == EventType.NODE_LOOP_ITERATION:
self.status_bar.set_node_detail(f"step {event.data.get('iteration', '?')}")
elif et == EventType.TOOL_CALL_STARTED:
self.status_bar.set_node_detail(f"{event.data.get('tool_name', '')}...")
elif et == EventType.TOOL_CALL_COMPLETED:
self.status_bar.set_node_detail("thinking...")
elif et == EventType.NODE_STALLED:
self.status_bar.set_node_detail(f"stalled: {event.data.get('reason', '')}")
# --- Log pane events ---
if et in self._LOG_PANE_EVENTS:
self.log_pane.write_event(event)
except Exception:
pass
def save_screenshot(self, filename: str | None = None) -> str:
"""Save a screenshot of the current screen as SVG (viewable in browsers).
Args:
filename: Optional filename for the screenshot. If None, generates timestamp-based name.
Returns:
Path to the saved SVG file.
"""
from datetime import datetime
from pathlib import Path
# Create screenshots directory
screenshots_dir = Path("screenshots")
screenshots_dir.mkdir(exist_ok=True)
# Generate filename if not provided
if filename is None:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"tui_screenshot_{timestamp}.svg"
# Ensure .svg extension
if not filename.endswith(".svg"):
filename += ".svg"
# Full path
filepath = screenshots_dir / filename
# Temporarily hide borders for cleaner screenshot
chat_widget = self.query_one(ChatRepl)
original_chat_border = chat_widget.styles.border_left
chat_widget.styles.border_left = ("none", "transparent")
# Hide all Input widget borders
input_widgets = self.query("Input")
original_input_borders = []
for input_widget in input_widgets:
original_input_borders.append(input_widget.styles.border)
input_widget.styles.border = ("none", "transparent")
try:
# Get SVG data from Textual and save it
svg_data = self.export_screenshot()
filepath.write_text(svg_data, encoding="utf-8")
finally:
# Restore the original borders
chat_widget.styles.border_left = original_chat_border
for i, input_widget in enumerate(input_widgets):
input_widget.styles.border = original_input_borders[i]
return str(filepath)
def action_screenshot(self) -> None:
"""Take a screenshot (bound to Ctrl+S)."""
try:
filepath = self.save_screenshot()
self.notify(
f"Screenshot saved: {filepath} (SVG - open in browser)",
severity="information",
timeout=5,
)
except Exception as e:
self.notify(f"Screenshot failed: {e}", severity="error", timeout=5)
async def on_unmount(self) -> None:
"""Cleanup on app shutdown."""
self.is_ready = False
try:
if hasattr(self, "_subscription_id"):
self.runtime.unsubscribe_from_events(self._subscription_id)
except Exception:
pass
try:
if hasattr(self, "queue_handler"):
logging.getLogger().removeHandler(self.queue_handler)
except Exception:
pass
+311
View File
@@ -0,0 +1,311 @@
"""
Chat / REPL Widget - Uses RichLog for append-only, selection-safe display.
Streaming display approach:
- The processing-indicator Label is used as a live status bar during streaming
(Label.update() replaces text in-place, unlike RichLog which is append-only).
- On EXECUTION_COMPLETED, the final output is written to RichLog as permanent history.
- Tool events are written directly to RichLog as discrete status lines.
Client-facing input:
- When a client_facing=True EventLoopNode emits CLIENT_INPUT_REQUESTED, the
ChatRepl transitions to "waiting for input" state: input is re-enabled and
subsequent submissions are routed to runtime.inject_input() instead of
starting a new execution.
"""
import asyncio
import re
import threading
from typing import Any
from textual.app import ComposeResult
from textual.containers import Vertical
from textual.widgets import Input, Label, RichLog
from framework.runtime.agent_runtime import AgentRuntime
class ChatRepl(Vertical):
"""Widget for interactive chat/REPL."""
DEFAULT_CSS = """
ChatRepl {
width: 100%;
height: 100%;
layout: vertical;
}
ChatRepl > RichLog {
width: 100%;
height: 1fr;
background: $surface;
border: none;
scrollbar-background: $panel;
scrollbar-color: $primary;
}
ChatRepl > #processing-indicator {
width: 100%;
height: 1;
background: $primary 20%;
color: $text;
text-style: bold;
display: none;
}
ChatRepl > Input {
width: 100%;
height: auto;
dock: bottom;
background: $surface;
border: tall $primary;
margin-top: 1;
}
ChatRepl > Input:focus {
border: tall $accent;
}
"""
def __init__(self, runtime: AgentRuntime):
super().__init__()
self.runtime = runtime
self._current_exec_id: str | None = None
self._streaming_snapshot: str = ""
self._waiting_for_input: bool = False
self._input_node_id: str | None = None
# Dedicated event loop for agent execution.
# Keeps blocking runtime code (LLM calls, MCP tools) off
# the Textual event loop so the UI stays responsive.
self._agent_loop = asyncio.new_event_loop()
self._agent_thread = threading.Thread(
target=self._agent_loop.run_forever,
daemon=True,
name="agent-execution",
)
self._agent_thread.start()
def compose(self) -> ComposeResult:
yield RichLog(id="chat-history", highlight=True, markup=True, auto_scroll=False, wrap=True)
yield Label("Agent is processing...", id="processing-indicator")
yield Input(placeholder="Enter input for agent...", id="chat-input")
# Regex for file:// URIs that are NOT already inside Rich [link=...] markup
_FILE_URI_RE = re.compile(r"(?<!\[link=)(file://\S+)")
def _linkify(self, text: str) -> str:
"""Convert bare file:// URIs to clickable Rich [link=...] markup."""
return self._FILE_URI_RE.sub(r"[link=\1]\1[/link]", text)
def _write_history(self, content: str) -> None:
"""Write to chat history, only auto-scrolling if user is at the bottom."""
history = self.query_one("#chat-history", RichLog)
was_at_bottom = history.is_vertical_scroll_end
history.write(self._linkify(content))
if was_at_bottom:
history.scroll_end(animate=False)
def on_mount(self) -> None:
"""Add welcome message when widget mounts."""
history = self.query_one("#chat-history", RichLog)
history.write("[bold cyan]Chat REPL Ready[/bold cyan] — Type your input below\n")
async def on_input_submitted(self, message: Input.Submitted) -> None:
"""Handle input submission — either start new execution or inject input."""
user_input = message.value.strip()
if not user_input:
return
# Client-facing input: route to the waiting node
if self._waiting_for_input and self._input_node_id:
self._write_history(f"[bold green]You:[/bold green] {user_input}")
message.input.value = ""
# Disable input while agent processes the response
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = True
chat_input.placeholder = "Enter input for agent..."
self._waiting_for_input = False
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Thinking...")
node_id = self._input_node_id
self._input_node_id = None
try:
future = asyncio.run_coroutine_threadsafe(
self.runtime.inject_input(node_id, user_input),
self._agent_loop,
)
await asyncio.wrap_future(future)
except Exception as e:
self._write_history(f"[bold red]Error delivering input:[/bold red] {e}")
return
# Double-submit guard: reject input while an execution is in-flight
if self._current_exec_id is not None:
self._write_history("[dim]Agent is still running — please wait.[/dim]")
return
indicator = self.query_one("#processing-indicator", Label)
# Append user message and clear input
self._write_history(f"[bold green]You:[/bold green] {user_input}")
message.input.value = ""
try:
# Get entry point
entry_points = self.runtime.get_entry_points()
if not entry_points:
self._write_history("[bold red]Error:[/bold red] No entry points")
return
# Determine the input key from the entry node
entry_point = entry_points[0]
entry_node = self.runtime.graph.get_node(entry_point.entry_node)
if entry_node and entry_node.input_keys:
input_key = entry_node.input_keys[0]
else:
input_key = "input"
# Reset streaming state
self._streaming_snapshot = ""
# Show processing indicator
indicator.update("Thinking...")
indicator.display = True
# Disable input while the agent is working
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = True
# Submit execution to the dedicated agent loop so blocking
# runtime code (LLM, MCP tools) never touches Textual's loop.
# trigger() returns immediately with an exec_id; the heavy
# execution task runs entirely on the agent thread.
future = asyncio.run_coroutine_threadsafe(
self.runtime.trigger(
entry_point_id=entry_point.id,
input_data={input_key: user_input},
),
self._agent_loop,
)
# wrap_future lets us await without blocking Textual's loop
self._current_exec_id = await asyncio.wrap_future(future)
except Exception as e:
indicator.display = False
self._current_exec_id = None
# Re-enable input on error
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = False
self._write_history(f"[bold red]Error:[/bold red] {e}")
# -- Event handlers called by app.py _handle_event --
def handle_text_delta(self, content: str, snapshot: str) -> None:
"""Handle a streaming text token from the LLM."""
self._streaming_snapshot = snapshot
# Show a truncated live preview in the indicator label
indicator = self.query_one("#processing-indicator", Label)
preview = snapshot[-80:] if len(snapshot) > 80 else snapshot
# Replace newlines for single-line display
preview = preview.replace("\n", " ")
indicator.update(
f"Thinking: ...{preview}" if len(snapshot) > 80 else f"Thinking: {preview}"
)
def handle_tool_started(self, tool_name: str, tool_input: dict[str, Any]) -> None:
"""Handle a tool call starting."""
# Update indicator to show tool activity
indicator = self.query_one("#processing-indicator", Label)
indicator.update(f"Using tool: {tool_name}...")
# Write a discrete status line to history
self._write_history(f"[dim]Tool: {tool_name}[/dim]")
def handle_tool_completed(self, tool_name: str, result: str, is_error: bool) -> None:
"""Handle a tool call completing."""
result_str = str(result)
preview = result_str[:200] + "..." if len(result_str) > 200 else result_str
preview = preview.replace("\n", " ")
if is_error:
self._write_history(f"[dim red]Tool {tool_name} error: {preview}[/dim red]")
else:
self._write_history(f"[dim]Tool {tool_name} result: {preview}[/dim]")
# Restore thinking indicator
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Thinking...")
def handle_execution_completed(self, output: dict[str, Any]) -> None:
"""Handle execution finishing successfully."""
indicator = self.query_one("#processing-indicator", Label)
indicator.display = False
# Write the final streaming snapshot to permanent history (if any)
if self._streaming_snapshot:
self._write_history(f"[bold blue]Agent:[/bold blue] {self._streaming_snapshot}")
else:
output_str = str(output.get("output_string", output))
self._write_history(f"[bold blue]Agent:[/bold blue] {output_str}")
self._write_history("") # separator
self._current_exec_id = None
self._streaming_snapshot = ""
self._waiting_for_input = False
self._input_node_id = None
# Re-enable input
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = False
chat_input.placeholder = "Enter input for agent..."
chat_input.focus()
def handle_execution_failed(self, error: str) -> None:
"""Handle execution failing."""
indicator = self.query_one("#processing-indicator", Label)
indicator.display = False
self._write_history(f"[bold red]Error:[/bold red] {error}")
self._write_history("") # separator
self._current_exec_id = None
self._streaming_snapshot = ""
self._waiting_for_input = False
self._input_node_id = None
# Re-enable input
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = False
chat_input.placeholder = "Enter input for agent..."
chat_input.focus()
def handle_input_requested(self, node_id: str) -> None:
"""Handle a client-facing node requesting user input.
Transitions to 'waiting for input' state: flushes the current
streaming snapshot to history, re-enables the input widget,
and sets a flag so the next submission routes to inject_input().
"""
# Flush accumulated streaming text as agent output
if self._streaming_snapshot:
self._write_history(f"[bold blue]Agent:[/bold blue] {self._streaming_snapshot}")
self._streaming_snapshot = ""
self._waiting_for_input = True
self._input_node_id = node_id or None
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Waiting for your input...")
chat_input = self.query_one("#chat-input", Input)
chat_input.disabled = False
chat_input.placeholder = "Type your response..."
chat_input.focus()
+194
View File
@@ -0,0 +1,194 @@
"""
Graph/Tree Overview Widget - Displays real agent graph structure.
"""
from textual.app import ComposeResult
from textual.containers import Vertical
from textual.widgets import RichLog
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import EventType
class GraphOverview(Vertical):
"""Widget to display Agent execution graph/tree with real data."""
DEFAULT_CSS = """
GraphOverview {
width: 100%;
height: 100%;
background: $panel;
}
GraphOverview > RichLog {
width: 100%;
height: 100%;
background: $panel;
border: none;
scrollbar-background: $surface;
scrollbar-color: $primary;
}
"""
def __init__(self, runtime: AgentRuntime):
super().__init__()
self.runtime = runtime
self.active_node: str | None = None
self.execution_path: list[str] = []
# Per-node status strings shown next to the node in the graph display.
# e.g. {"planner": "thinking...", "searcher": "web_search..."}
self._node_status: dict[str, str] = {}
def compose(self) -> ComposeResult:
# Use RichLog for formatted output
yield RichLog(id="graph-display", highlight=True, markup=True)
def on_mount(self) -> None:
"""Display initial graph structure."""
self._display_graph()
def _topo_order(self) -> list[str]:
"""BFS from entry_node following edges."""
graph = self.runtime.graph
visited: list[str] = []
seen: set[str] = set()
queue = [graph.entry_node]
while queue:
nid = queue.pop(0)
if nid in seen:
continue
seen.add(nid)
visited.append(nid)
for edge in graph.get_outgoing_edges(nid):
if edge.target not in seen:
queue.append(edge.target)
# Append orphan nodes not reachable from entry
for node in graph.nodes:
if node.id not in seen:
visited.append(node.id)
return visited
def _render_node_line(self, node_id: str) -> str:
"""Render a single node with status symbol and optional status text."""
graph = self.runtime.graph
is_terminal = node_id in (graph.terminal_nodes or [])
is_active = node_id == self.active_node
is_done = node_id in self.execution_path and not is_active
status = self._node_status.get(node_id, "")
if is_active:
sym = "[bold green]●[/bold green]"
elif is_done:
sym = "[dim]✓[/dim]"
elif is_terminal:
sym = "[yellow]■[/yellow]"
else:
sym = ""
if is_active:
name = f"[bold green]{node_id}[/bold green]"
elif is_done:
name = f"[dim]{node_id}[/dim]"
else:
name = node_id
suffix = f" [italic]{status}[/italic]" if status else ""
return f" {sym} {name}{suffix}"
def _render_edges(self, node_id: str) -> list[str]:
"""Render edge connectors from this node to its targets."""
edges = self.runtime.graph.get_outgoing_edges(node_id)
if not edges:
return []
if len(edges) == 1:
return ["", ""]
# Fan-out: show branches
lines: list[str] = []
for i, edge in enumerate(edges):
connector = "" if i == len(edges) - 1 else ""
cond = ""
if edge.condition.value not in ("always", "on_success"):
cond = f" [dim]({edge.condition.value})[/dim]"
lines.append(f" {connector}──▶ {edge.target}{cond}")
return lines
def _display_graph(self) -> None:
"""Display the graph as an ASCII DAG with edge connectors."""
display = self.query_one("#graph-display", RichLog)
display.clear()
graph = self.runtime.graph
display.write(f"[bold cyan]Agent Graph:[/bold cyan] {graph.id}\n")
# Render each node in topological order with edges
ordered = self._topo_order()
for node_id in ordered:
display.write(self._render_node_line(node_id))
for edge_line in self._render_edges(node_id):
display.write(edge_line)
# Execution path footer
if self.execution_path:
display.write("")
display.write(f"[dim]Path:[/dim] {''.join(self.execution_path[-5:])}")
def update_active_node(self, node_id: str) -> None:
"""Update the currently active node."""
self.active_node = node_id
if node_id not in self.execution_path:
self.execution_path.append(node_id)
self._display_graph()
def update_execution(self, event) -> None:
"""Update the displayed node status based on execution lifecycle events."""
if event.type == EventType.EXECUTION_STARTED:
self._node_status.clear()
self.execution_path.clear()
entry_node = event.data.get("entry_node") or (
self.runtime.graph.entry_node if self.runtime else None
)
if entry_node:
self.update_active_node(entry_node)
elif event.type == EventType.EXECUTION_COMPLETED:
self.active_node = None
self._node_status.clear()
self._display_graph()
elif event.type == EventType.EXECUTION_FAILED:
error = event.data.get("error", "Unknown error")
if self.active_node:
self._node_status[self.active_node] = f"[red]FAILED: {error}[/red]"
self.active_node = None
self._display_graph()
# -- Event handlers called by app.py _handle_event --
def handle_node_loop_started(self, node_id: str) -> None:
"""A node's event loop has started."""
self._node_status[node_id] = "thinking..."
self.update_active_node(node_id)
def handle_node_loop_iteration(self, node_id: str, iteration: int) -> None:
"""A node advanced to a new loop iteration."""
self._node_status[node_id] = f"step {iteration}"
self._display_graph()
def handle_node_loop_completed(self, node_id: str) -> None:
"""A node's event loop completed."""
self._node_status.pop(node_id, None)
self._display_graph()
def handle_tool_call(self, node_id: str, tool_name: str, *, started: bool) -> None:
"""Show tool activity next to the active node."""
if started:
self._node_status[node_id] = f"{tool_name}..."
else:
# Restore to generic thinking status after tool completes
self._node_status[node_id] = "thinking..."
self._display_graph()
def handle_stalled(self, node_id: str, reason: str) -> None:
"""Highlight a stalled node."""
self._node_status[node_id] = f"[red]stalled: {reason}[/red]"
self._display_graph()

Some files were not shown because too many files have changed in this diff Show More