Compare commits

...

283 Commits

Author SHA1 Message Date
Timothy 17df56fd66 feat: brevo tools 2026-02-22 20:26:16 -08:00
Timothy c409bd7b0e feat: consistent tool testing 2026-02-22 20:09:00 -08:00
Bryan @ Aden f12ab10725 Merge pull request #4930 from Ttian18/fix/tina/shift-enter-newline-4565
fix(tui): add Ctrl+J as newline fallback in chat input
2026-02-23 02:31:47 +00:00
Bryan @ Aden 0882fa6ce5 Merge pull request #5165 from ishaannk/main
feat: add stop/cancel execution control for agents
2026-02-23 02:24:46 +00:00
RichardTang-Aden 0b87e4c45d Merge pull request #5245 from TimothyZhang7/main
Release / Create Release (push) Waiting to run
doc(architecture): update documents
2026-02-22 18:04:51 -08:00
Timothy 9c7e846828 chore: put event loop node zoom inside worker bee graph 2026-02-22 18:03:31 -08:00
Timothy 13cc93c334 chore: architecture 2026-02-22 17:54:12 -08:00
Timothy 564b1bb752 chore: roadmap diagram 2026-02-22 17:45:56 -08:00
ishaannk fd89c7f56f Fix: Add trailing newlines for ruff format compliance 2026-02-23 04:41:38 +05:30
Timothy @aden 9c781ed78e Merge pull request #5224 from TimothyZhang7/feature/credential-v2
fix(micro-fix): tui select account
2026-02-21 23:13:18 -08:00
Timothy 460a24e34a fix: tui select account 2026-02-21 23:01:41 -08:00
ishank 2f11f0c911 Merge branch 'aden-hive:main' into main 2026-02-21 17:29:51 +05:30
ishaannk c3ae67fb1d address review comments: rename Stop to Pause and UI toggle change 2026-02-21 17:08:42 +05:30
Timothy @aden 8c750c7edd Merge pull request #5194 from Antiarin/fix/escalate-to-coder-execution-id
fix[bug](graph): add execution_id to base Runtime and restore ctx.execution_id in escalation handler
2026-02-21 03:05:14 -08:00
Antiarin 571838a289 fix(graph): add execution_id to NodeContext for escalate_to_coder 2026-02-21 16:20:04 +05:30
RichardTang-Aden dafaaae792 Merge pull request #5182 from TimothyZhang7/feature/credential-v2
Feature/credential v2
2026-02-20 19:52:00 -08:00
Timothy b45e14efb4 fix: zai api key setup 2026-02-20 19:40:07 -08:00
RichardTang-Aden e70cbf26e2 Merge pull request #5095 from NSkogstad-AUS/docs/update-tools-readme
Docs/update tools readme
2026-02-20 19:08:23 -08:00
RichardTang-Aden daafdc3704 Merge pull request #5103 from alhousseynou-ndiaye/feature/document-agent
docs: add document processing recipe
2026-02-20 19:06:59 -08:00
Bryan @ Aden bece21d217 Merge pull request #5169 from Schlaflied/docs/sync-zh-CN-readme
docs(i18n): sync zh-CN.md with latest README and fix broken links
2026-02-21 02:06:48 +00:00
Timothy f4594ecf37 fix: gmail batch tool schema coercion 2026-02-20 17:53:35 -08:00
Bryan @ Aden 8f1462cb79 Merge pull request #5113 from vakrahul/features/stripe-tools
feat(tools): add Stripe payment processing integration
2026-02-21 01:44:23 +00:00
Timothy 76d4d0de69 feat: credential v2 with provider loading and test agent 2026-02-20 17:43:00 -08:00
vakrahul 6ab4e1d641 fix: address maintainer PR reviews feedback for Stripe 2026-02-21 07:06:20 +05:30
vakrahul c5d87c99fd fix: address maintainer PR review feedback for Stripe 2026-02-21 06:30:31 +05:30
Schlaflied f53f403022 docs(i18n): sync zh-CN.md with latest README and fix broken links 2026-02-20 19:29:45 -05:00
Timothy b887b2951e wip: credential v2 2026-02-20 13:55:06 -08:00
ishaannk 842b69b155 feat: add stop/cancel execution control for agents 2026-02-21 02:03:29 +05:30
Nicolas Suescun d6c34106fc docs: fix CLI arguments mismatch for test-debug and test-list (#4113)
* docs: fix CLI usage args for test-debug/test-list to match implementation

* docs: restore 'uv run' prefix to test commands

Reverts unintentional removal of 'uv run' in usage examples as requested in code review.

* chore: changes to .gitignore
2026-02-20 17:58:17 +08:00
Nihal 67cbd31280 fix(graph): harden JSON parsing for async safety and large LLM outputs (#4869)
* perf(json): add json.loads fast path + asyncio.to_thread for extract_json

Addresses maintainer feedback:
- json.loads candidate fast path in find_json_object (300x speedup source)
- asyncio.to_thread wrappers for both _extract_json call sites (unblocks event loop)
- Remove ~480 lines of over-engineered incremental parsing logic

Total: ~16 lines, zero duplication, zero API surface change

* fix: simplify async JSON handling per maintainer feedback and align tests

* fix(test): replace tautology assertion in test_mismatched_then_valid

The original assertion `assert result is not None or result is None`
is always true. Replace with a meaningful type check.

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-20 17:23:25 +08:00
Timothy @aden cf877f2b49 Merge pull request #5121 from TimothyZhang7/fix/credential-error-types
Fix(micro-fix)/credential error types
2026-02-19 16:23:31 -08:00
Timothy 6f34cb2c8a fix: credential error types 2026-02-19 14:52:29 -08:00
Timothy b88aa2b53c Merge branch 'feature/tui-credential-setup' 2026-02-19 11:12:28 -08:00
Timothy 356cab19eb Merge branch 'fix/google-tool-healthcheck' into feature/tui-credential-setup 2026-02-19 11:12:15 -08:00
vakrahul 7c6d5fa446 test_credentials changess 2026-02-19 19:46:43 +05:30
vakrahul 2dae3e47fd test_credentials changes 2026-02-19 19:43:45 +05:30
vakrahul 6fce789607 feat: add Stripe tool integration and testss 2026-02-19 17:14:57 +05:30
vakrahul 9bbb5b38e6 feat: add Stripe tool integration and tests 2026-02-19 16:55:22 +05:30
vakrahul ac73aa93bf feat: add Stripe tool integration and tests 2026-02-19 16:51:08 +05:30
Timothy 52a56e4a10 fix: google tools need healthcheck 2026-02-18 23:07:12 -08:00
alhousseynou-ndiaye a1cede510d docs: add document processing recipe 2026-02-19 07:47:13 +01:00
Timothy @aden 682c10e873 Merge pull request #5099 from TimothyZhang7/main
release(docs): v0.5.1
2026-02-18 22:11:45 -08:00
Timothy 5605e24a0d fix: streaming output leakage 2026-02-18 22:10:02 -08:00
Timothy f7268a44d9 fix: worker credential setup 2026-02-18 21:50:18 -08:00
Timothy af7a4ff4e8 release: v0.5.1
- Bump framework version 0.5.0 → 0.5.1
- Add CHANGELOG.md with full release notes

Highlights: Hive Coder meta-agent, multi-graph runtime, TUI revamp,
subscription model support, 5 new tool integrations, deprecated node
type removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:15:20 -08:00
Timothy 60b9c0d763 release: v0.5.1
- Bump framework version 0.5.0 → 0.5.1
- Add CHANGELOG.md with full release notes

Highlights: Hive Coder meta-agent, multi-graph runtime, TUI revamp,
subscription model support, 5 new tool integrations, deprecated node
type removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:00:41 -08:00
Timothy @aden 5c550270c6 Merge pull request #5071 from TimothyZhang7/feature/queen-bee
Release / Create Release (push) Waiting to run
Feature/queen bee
2026-02-18 20:59:04 -08:00
Timothy e03fd48e48 fix: lint 2026-02-18 20:51:40 -08:00
Timothy 6420c74c24 fix: ci tests 2026-02-18 20:47:07 -08:00
Timothy ad74351530 fix: agent switch return 2026-02-18 20:39:25 -08:00
Timothy @aden 1b5f656429 Merge branch 'main' into feature/queen-bee 2026-02-18 20:34:19 -08:00
Timothy @aden 132d84c529 Merge pull request #5058 from adenhq/fix/deprecation
fix(arch): remove all deprecated concepts and deadcodes
2026-02-18 20:32:24 -08:00
Timothy @aden a03b378e9b Merge branch 'main' into fix/deprecation 2026-02-18 20:29:39 -08:00
Timothy 74635e1d7d feat: subscription model support, tui revamp 2026-02-18 20:28:11 -08:00
Bryan @ Aden 893053ede7 Merge pull request #5098 from adenhq/update/inbox-agent-fixes
(micro-fix): Update/inbox agent fixes
2026-02-19 03:35:25 +00:00
bryan 596ec6fec5 fixed credentials 2026-02-18 19:26:59 -08:00
bryan 5863b83172 Merge branch 'main' into update/inbox-agent-fixes 2026-02-18 19:12:01 -08:00
bryan 20c92b197a fixes to inbox agent 2026-02-18 19:08:55 -08:00
RichardTang-Aden ec9c6b4666 Merge pull request #5097 from RichardTang-Aden/feat/credential-setup-cli
Feat/credential setup cli
2026-02-18 17:22:53 -08:00
Richard Tang 8a73e5c119 chore: ruff lint 2026-02-18 17:21:45 -08:00
Richard Tang 717f0eee9a Merge branch 'main' into feat/credential-setup-cli 2026-02-18 17:20:40 -08:00
Richard Tang 09fb47f089 chore: ruff format 2026-02-18 17:14:26 -08:00
Richard Tang b46d943e71 chore: lint issues 2026-02-18 17:13:01 -08:00
NSkogstad-AUS b980d6f6ab docs(tools): fixed small inaccuracy with gmail description 2026-02-19 11:33:50 +11:00
NSkogstad-AUS 61f27369ef docs(tools): update Available Tools table with additional search functionalities 2026-02-19 11:28:18 +11:00
NSkogstad-AUS 204b0b4744 docs(tools): expand Available Tools table with all tools by category
Previously the table listed ~20 of ~50 available tools. This expands
it to cover all tools, grouped into categories: File System, Data Files,
Web & Search, Communication, Productivity & CRM, Cloud & APIs,
Security, and Utilities.

All tool names verified against registered @mcp.tool() functions in source.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-19 11:20:49 +11:00
Timothy 1b6ebb1e42 fix: put guardian back to hive coder 2026-02-18 15:06:25 -08:00
Timothy 7dfc75b3e6 feat: muti graph agent session 2026-02-18 12:46:59 -08:00
Richard Tang 2920b5ab01 chore: lint issues 2026-02-17 20:05:05 -08:00
Richard Tang 81ad0467b0 Merge branch 'main' into fix/deprecation 2026-02-17 20:02:47 -08:00
Richard Tang 115ca55ea0 fix: broken ci tests 2026-02-17 20:00:47 -08:00
Richard Tang f2814a26e6 chore: lint issue 2026-02-17 19:57:31 -08:00
Richard Tang 4d309950b0 fix: unused code and ci 2026-02-17 19:55:54 -08:00
RichardTang-Aden 39216a4c12 Merge pull request #5016 from adenhq/feat/pdf-ingestion
Feat/pdf ingestion
2026-02-17 19:29:35 -08:00
Aden HQ c7fa621aeb Merge branch 'main' into feat/pdf-ingestion 2026-02-17 19:28:54 -08:00
Timothy 5914d28cbe feat(queen): hive queen bee implementation v1 2026-02-17 19:19:09 -08:00
Richard Tang 8c3ad3d70a fix: email agent version 2026-02-17 19:17:07 -08:00
Richard Tang 9eb3fc6285 fix: fix email agent version 2026-02-17 19:09:17 -08:00
Richard Tang e95f7e7339 fix: make the email inbox management agent identical to main 2026-02-17 19:02:08 -08:00
RichardTang-Aden d949551399 Merge pull request #3332 from haliaeetusvocifer/feat/google-docs-integration
added feature google docs integration
2026-02-17 18:25:03 -08:00
Richard Tang a7dbd85ed4 fix: google docs credentials 2026-02-17 18:24:31 -08:00
Richard Tang 1f288dab1c fix: tools registration problems 2026-02-17 18:15:41 -08:00
Richard Tang 021754d941 Merge branch 'main' into feat/google-docs-integration 2026-02-17 18:05:07 -08:00
bryan 7412904fbf update to job hunter and website vulnerability 2026-02-17 16:58:12 -08:00
Timothy cd1976e2b9 feat: support openai compatible endpoints 2026-02-17 16:27:36 -08:00
haliaeetusvocifer 5f3e9379a3 completed task 2026-02-17 23:59:47 +00:00
Richard Tang 0e565d6cea feat: add the agent start confirmation and credential update option 2026-02-17 13:17:03 -08:00
Richard Tang 67b249dcd5 feat: add the credential setup step after credential validation 2026-02-17 12:59:20 -08:00
Timothy bbf1c8c790 fix(arch): remove all deprecated concepts and deadcodes 2026-02-17 10:59:15 -08:00
bryan 44a8b453b5 Merge branch 'main' into feat/pdf-ingestion 2026-02-16 18:40:46 -08:00
bryan 26511fe962 added pdf select and updated job hunter 2026-02-16 18:38:13 -08:00
RichardTang-Aden ce5893216a Merge pull request #4871 from paarths-collab/docs/root-install-warning
docs(readme): clarify uv workspace setup and prevent root pip install misuse
2026-02-16 18:36:57 -08:00
RichardTang-Aden 4e821e4dbf Merge pull request #5011 from RichardTang-Aden/main
micro-fix: chore: update the intro message of the agent
2026-02-16 18:05:18 -08:00
Richard Tang d11e97de59 chore: update the intro message of the agent 2026-02-16 18:03:58 -08:00
RichardTang-Aden 4b10d3e360 Merge pull request #5010 from RichardTang-Aden/main
feat: merge the sample agent so we have one email inbox management agent
2026-02-16 18:00:31 -08:00
Richard Tang e04479930f chore: update descriptions 2026-02-16 17:56:17 -08:00
Richard Tang 8a8c4cc3f5 chore: rename the email 2026-02-16 17:53:05 -08:00
Richard Tang 1e06ff611e refactor: merge the sample agent so we have one email inbox management agent 2026-02-16 17:43:18 -08:00
Pravin Mishra 1edc7bb9c7 feat(tools): add Discord integration (#2913) (#4247)
* feat(tools): add Discord integration (#2913)

- discord_list_guilds: list servers the bot is in
- discord_list_channels: list channels for a guild
- discord_send_message: send message to channel
- discord_get_messages: get recent messages

Auth: DISCORD_BOT_TOKEN, credential spec, health checker.
Uses Discord API v10 (Bot token).

Co-authored-by: Cursor <cursoragent@cursor.com>

* style: apply ruff format to discord tool files

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(discord): add rate limit handling, message validation, channel filter

- Rate limit (429): return clear error with retry_after from API
- Message length: validate before send, max 2000 chars per Discord limit
- Channel filter: text_only param (default True) for list_channels
- Add 6 new tests for rate limit, validation, filtering

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(discord): add retry on 429 rate limit

- Retry up to 2 times using Discord's retry_after
- Cap wait at 60s, fallback to exponential backoff if no retry_after
- Add _request_with_retry helper for all API calls
- Add 3 tests: retry then success, retry exhausted, tool-level retry

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(discord): remove unused DISCORD_API_BASE import

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: mishrapravin114 <mishrapravin114@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-17 09:29:56 +08:00
Siddharth Varshney 7b1e0af155 feat(utils): add proper __init__.py exports for utils module (#3979) 2026-02-16 20:20:09 +08:00
Jeet Karia 7b15616e29 feat(tools): add Exa Search API integration with 4 MCP tools (#4941)
Implements AI-powered web search, content extraction, and research tools
via the Exa API for agent workflows.

Tools: exa_search, exa_find_similar, exa_get_contents, exa_answer

Follows existing tool pattern (web_search_tool, hubspot_tool, slack_tool):
- register_tools(mcp, credentials) with @mcp.tool() decorators
- Credential fallback: CredentialStoreAdapter -> EXA_API_KEY env var
- Error handling: always returns dicts, never raises
- Retry with exponential backoff on HTTP 429

Includes:
- Neural/keyword search with domain, date, and category filters
- Similar page discovery via neural embeddings
- Content extraction from up to 10 URLs per request
- Citation-backed answer generation
- CredentialSpec in credentials/search.py
- Comprehensive unit tests (21 tests)
- 500/500 integration CI tests passing

Fixes #4177
2026-02-16 19:28:32 +08:00
Zhang bd7d2277d8 fix(tui): add Ctrl+J as newline fallback in chat input
Terminals without extended key reporting (VS Code, Cursor) send
identical events for Enter and Shift+Enter, making it impossible
to insert newlines. Ctrl+J produces a distinct key event in all
terminals.
2026-02-15 20:37:52 -08:00
Shivam Shahi– oss/acc 99ed00fd02 feat(tools): add Razorpay payment processing integration (#4467)
* feat(tools): add Razorpay payment processing integration

Add Razorpay MCP tool integration for payment processing, invoicing,
and refund management. Implements 6 MCP tools:

- razorpay_list_payments: List recent payments with filters (pagination, date range)
- razorpay_get_payment: Fetch detailed payment information by ID
- razorpay_create_payment_link: Create one-time payment links with shareable URLs
- razorpay_list_invoices: List invoices with status and type filtering
- razorpay_get_invoice: Fetch invoice details including line items
- razorpay_create_refund: Create full or partial refunds for payments

Features:
- Authentication via HTTP Basic Auth (RAZORPAY_API_KEY + RAZORPAY_API_SECRET)
- Credential spec in dedicated razorpay.py (follows repo pattern)
- Comprehensive error handling (401, 403, 404, 400, 429, 500, timeouts)
- Input validation (payment IDs, invoice IDs, amounts, currencies)
- Full test coverage (42 unit tests, 26 integration tests)

Closes #4404

* style: fix ruff I001 import order and W291 in tools

* fix: improve Razorpay credential tracking and validation

- Add razorpay_secret CredentialSpec with credential_group
- Fix amount=0 bug by using 'is not None' checks
- Add regex validation for payment/invoice IDs

* fix: use graceful credential handling instead of raising TypeError

Match codebase convention (calcom, lusha) - return None for non-string
credentials instead of raising TypeError, so the tool returns an error
dict instead of crashing.

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-16 12:02:16 +08:00
Timothy @aden f7af5f9ee8 Merge pull request #4926 from TimothyZhang7/example/gmail-inbox-guardian-agent
Release / Create Release (push) Waiting to run
chore(micro-fix): change the timer to 5 minutes
2026-02-15 19:30:27 -08:00
RichardTang-Aden e5bcc8005f Merge pull request #4922 from adenhq/feat/vulnerability_agent
Feat/vulnerability agent
2026-02-15 19:21:00 -08:00
Timothy 352d285212 chore: change the timer to 5 minutes 2026-02-15 18:47:22 -08:00
Timothy @aden 3ef60f9d14 Merge pull request #4925 from TimothyZhang7/example/gmail-inbox-guardian-agent
Example(micro-fix)/gmail inbox guardian agent
2026-02-15 18:44:19 -08:00
Timothy a103312127 feature: display timer status interactively 2026-02-15 18:41:45 -08:00
Timothy 3d0bba4167 example(agents): ready-to-use gmail automation agent 2026-02-15 18:34:14 -08:00
Timothy @aden 3df718cc14 Merge pull request #4920 from TimothyZhang7/fix/issue-4905
Fix/issue 4905
2026-02-15 18:08:28 -08:00
RichardTang-Aden c7497a180e Merge pull request #4918 from TimothyZhang7/feature/multi-entry-event-driven-agents
Fix/multi entry event driven agents
2026-02-15 18:05:43 -08:00
Bryan @ Aden 3f39039a21 Merge pull request #4800 from LukeM94/doc/typo-fixes-in-roadmap.md
doc: Fix typos in docs/roadmap.md
2026-02-16 01:57:35 +00:00
Bryan @ Aden 88fbd90fcc Merge pull request #4799 from zhanglinqian/fix-recipes-readme-links
docs: fix incorrect directory names in recipes README
2026-02-16 01:55:51 +00:00
bryan e0bf09dd78 lint fixes 2026-02-15 17:45:56 -08:00
bryan 3e158b07af Merge branch 'main' into feat/vulnerability_agent 2026-02-15 17:35:59 -08:00
Timothy 5319ed7ee1 chore: remove unsolicited docs 2026-02-15 17:32:36 -08:00
Timothy 978904d2a4 fix(executor): async operations on non-streaming llm complete for healthy event loop 2026-02-15 17:31:18 -08:00
bryan 4d876ecc54 vulnerability check to sample agents 2026-02-15 17:27:09 -08:00
RichardTang-Aden ba327d0b9e Merge pull request #4919 from adenhq/feat/sample-agent/job_hunter
Sample agent, micro-fix: remove dependency of brave search
2026-02-15 16:58:06 -08:00
Richard Tang b69cf3523c feat: remove dependency of brave search 2026-02-15 16:47:53 -08:00
Timothy 4d8c8e9308 feat(arch): architecture patches to support multi-entry agents consuming external events 2026-02-15 16:19:58 -08:00
Amit Kumar b70885934c [Integration]: Cal.com - Open Source Scheduling Infrastructure #3188 (#3255)
* feat(tools): add Cal.com scheduling integration with 8 MCP tools

Adds Cal.com API integration for booking and scheduling management:
- calcom_list_bookings, calcom_get_booking, calcom_create_booking, calcom_cancel_booking
- calcom_get_availability, calcom_update_schedule
- calcom_list_event_types, calcom_get_event_type

Includes dedicated credential spec, 20 unit tests, and full integration conformance.
Resolves #3188

* fix(calcom): address PR review + add missing list_schedules MCP tool

- Add isinstance(api_key, str) guard in _get_api_key, return None on
  non-string values for consistent error-dict handling
- Remove duplicate metadata from responses object in create_booking;
  metadata stays at top-level per Cal.com v1 API spec
- Expose availability parameter in calcom_update_schedule MCP tool
- Add calcom_list_schedules tool wrapping existing client method —
  needed to discover schedule IDs before calling update_schedule
- Register calcom_list_schedules in credential spec for CI conformance
- Add tests for credential handling, schedule availability, and
  list_schedules (24 tests total, 9 tools)

* docs: update README to include calcom_list_schedules (9 tools)

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-15 20:14:00 +08:00
paarths-collab 722b087fc0 docs(readme): clarify installation and prevent root pip install misuse 2026-02-15 17:39:41 +05:30
Aaryann Chandola 0c7ea272db [integration] feat(tools): add Google Calendar integration (#3171)
* feat(calendar): add Google Calendar integration with event management tools and health checks

* fix(calendar): align google_calendar_oauth credential spec with codebase pattern
2026-02-15 08:25:53 +08:00
Aaryann Chandola 5e4f322fc0 add new time tool for current date/time retrieval (#3425) 2026-02-14 21:57:07 +08:00
zhanglinqian c02e45f1aa docs: fix incorrect directory names in recipes README 2026-02-14 21:19:47 +08:00
LukeM94 a7217f138c Fix typos in docs/roadmap.md
Correct hyphenation and spelling in the product roadmap: change 'outcome oriented' to 'outcome-oriented' and fix 'Workder' to 'Worker' in the Deployment section.
2026-02-14 13:18:03 +00:00
Emmanuel Nwanguma 3502f25048 [Integration] feat(tools): add BigQuery MCP tool for SQL querying and data analysis (#3350)
* feat(tools): add BigQuery MCP tool for SQL querying and data analysis

- Add run_bigquery_query tool for executing read-only SQL queries
- Add describe_dataset tool for exploring dataset schemas
- Implement safety features: read-only enforcement, row limits (max 10k)
- Add comprehensive unit tests (27 tests passing)
- Follow CredentialStoreAdapter pattern from email tool
- Support ADC and service account authentication

Fixes #3067

* fix(bigquery): address PR review feedback

- Add credential_id and api_key_instructions to CredentialSpec
- Fix credential key name from 'bigquery_credentials' to 'bigquery'
- Pass credential path to BigQuery client via environment variable
- Fix ADC error message detection for both error variants
- Move google-cloud-bigquery to optional dependencies
- Update tests to use correct credential key names

All 27 tests passing

* fix(bigquery): return {error, help} when dependency missing

* fix(bigquery): return full ImportError message for missing dependency

* fix(bigquery): include 'help' key when dependency missing
2026-02-14 19:48:03 +08:00
RichardTang-Aden 93c026fe31 Merge pull request #4759 from adenhq/feat/sample-agent/job_hunter
[Feature][Sample Agent]: Job Hunting Agent
2026-02-13 20:24:35 -08:00
bryan e515977b96 Merge branch 'main' into feat/vulnerability_agent 2026-02-13 20:16:48 -08:00
bryan 045490a097 testing agent run 1-5 2026-02-13 20:16:12 -08:00
Richard Tang b25903fb7f feat: init job hunter 2026-02-13 20:08:55 -08:00
bryan acf4bd5152 tools for sample agent 2026-02-13 19:14:59 -08:00
Timothy 1f5711e1a1 Merge branch 'fix/transient-error-handlings' into feat/inbox-management 2026-02-13 18:53:33 -08:00
RichardTang-Aden ca2dd90313 Merge pull request #4610 from adenhq/feat/inbox-management
Feat/inbox management
2026-02-13 18:51:26 -08:00
Timothy 21e07f3b65 Merge branch 'feature/load-by-bytes' into feat/inbox-management 2026-02-13 18:43:45 -08:00
Timothy e8a06ddd34 feat: load by bytes instead of rows 2026-02-13 18:41:16 -08:00
bryan 34cc09904f fix pytest 2026-02-13 18:27:22 -08:00
bryan f6bba8b62f Merge branch 'main' into feat/inbox-management 2026-02-13 18:17:57 -08:00
bryan d241ad60f8 updated tui to two panel 2026-02-13 18:13:12 -08:00
Timothy 5a3fcf9a8a Merge branch 'main' into feat/inbox-management 2026-02-13 16:39:07 -08:00
Timothy 1f8a47203f fix: common transient errors and loop detection 2026-02-13 16:14:43 -08:00
RichardTang-Aden 7240090274 Merge pull request #4525 from e-cesar9/fix/4428-windows-defender-exclusions
perf(windows): add Windows Defender exclusions for 40% faster uv sync
2026-02-13 15:49:58 -08:00
Richard Tang 2e6a47c2df fix(windows): replace unicode bullets with ASCII dashes
The bullet character (•) cannot be displayed properly in PowerShell
on some Windows systems. Use ASCII dash (-) instead for compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 15:48:48 -08:00
Richard Tang 7f5ecd7913 fix: uv local storage 2026-02-13 15:47:57 -08:00
RichardTang-Aden 105b98b113 Merge pull request #4646 from jaathavan18/docs/issue-4645-fix-exports-link
docs: fix broken exports/ link in environment-setup.md
2026-02-13 15:20:07 -08:00
RichardTang-Aden 114e65ab41 Merge pull request #4649 from jaathavan18/docs/issue-4648-fix-setup-python-ref
docs: fix references to nonexistent setup-python.sh
2026-02-13 15:19:50 -08:00
RichardTang-Aden 0fc13a5cc3 Merge pull request #4660 from Hima-de/fix/antigravity-setup-docs
docs: fix nonexistent setup-python.sh reference in antigravity-setup.md
2026-02-13 15:19:28 -08:00
bryan e651799e9e Merge branch 'main' into feat/inbox-management 2026-02-13 10:34:33 -08:00
Timothy @aden fcd3e514de Merge pull request #4722 from TimothyZhang7/main
chore(micro-fix): fix removed message
2026-02-13 09:43:05 -08:00
Timothy 7ab41de3a2 chore: fix removed message 2026-02-13 09:40:37 -08:00
Timothy @aden 58e023f277 Merge pull request #4720 from TimothyZhang7/feature/event-source
Feature/event source
2026-02-13 09:37:13 -08:00
Timothy @aden a98f2d5b86 Merge pull request #4647 from TimothyZhang7/feature/memory-inheritance
feat:phased compaction and event bus integration
2026-02-13 09:25:44 -08:00
Amit Kumar eca43231c0 [Feature]: Add Excel Tool for reading/writing .xlsx/.xlsm files #2675 (#3004)
* feat(tools): add Excel tool for reading/writing .xlsx/.xlsm files

Add excel_tool module with 7 MCP tools:
- excel_read: Read data from Excel files with pagination
- excel_write: Create new Excel files
- excel_append: Append rows to existing Excel files
- excel_info: Get file metadata (sheets, columns, rows)
- excel_sheet_list: List sheet names in a file
- excel_sql: Query Excel with SQL (DuckDB), multi-sheet support
- excel_search: Search values across sheets with match options

Includes 56 tests, openpyxl optional dependency, and documentation.

Fixes #2675

* fix(tools): address excel review feedback and stabilize tests
2026-02-13 19:32:07 +08:00
Amit Kumar 6763077887 feat(tools): add Google Maps Platform integration with 6 MCP tools (#3796)
Implements geocoding, routing, and location intelligence via Google Maps
Platform Web Services APIs for logistics, delivery, and location-based
agent workflows.

Tools: maps_geocode, maps_reverse_geocode, maps_directions,
maps_distance_matrix, maps_place_details, maps_place_search

Includes:
- page_token support for paginated place search results
- GoogleMapsHealthChecker for credential validation
- Comprehensive unit tests (42 tool tests, 30 health check tests)

Closes #3179
2026-02-13 19:01:28 +08:00
Siddharth Varshney f85ff8a2f8 feat(tools): add Telegram Bot integration (#3550)
* feat(tools): add Telegram Bot integration

- Add telegram_send_message and telegram_send_document tools
- Add credential spec for TELEGRAM_BOT_TOKEN
- Add comprehensive tests (18 test cases)
- Add README documentation with setup instructions

* fix(telegram_tool): catch network errors instead of letting them raise

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-13 17:10:40 +08:00
Amit Kumar 1a5c3480e6 feat(tools): add NewsData + Finlight MCP tools for market intelligence (#4473)
* feat(tools): add news MCP tools

* fix(tools): adjust news providers and fallback

* fix(tools): add rate limiting + sentiment normalization to news tools

Address PR feedback: exponential backoff (3 retries, 2^attempt delays)
on HTTP 429 for both NewsData and Finlight with seamless provider
fallback, and normalize sentiment scores to [-1.0, +1.0] range.

* fix(news_tool): make provider fallback lazy and handle network errors

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-13 16:25:56 +08:00
Hima-de 69a7fe7b92 docs: fix nonexistent setup-python.sh reference in antigravity-setup.md
Replaces references to ./scripts/setup-python.sh with ./quickstart.sh,
which is the actual setup script in the repository.

Fixes #4648
2026-02-13 06:30:36 +00:00
Emmanuel Nwanguma a5418d760f fix(runtime): validate and create storage path on init (#4466)
- Add path validation in Runtime.__init__()
- Log warning when creating nonexistent storage directory
- Auto-create directory with parents to prevent silent failures later
- Implements Option 3 from issue discussion (explicit, safe, informative)

Fixes #1870
2026-02-13 14:24:14 +08:00
T.Trinath Reddy 0deeb87c63 feat(vision): add GCP Vision API integration (#4231)
* feat(vision): add GCP Vision API integration

* refactor(vision): move GCP Vision credentials to dedicated folder

* fix: clean up credentials imports and updated gitignore

* followed ruff alphabetic order for credentials
2026-02-13 14:00:15 +08:00
Timothy d1d5f49c5a fix: add more events to event bus 2026-02-12 20:42:20 -08:00
bryan 917e23ccc8 Merge branch 'main' into feat/inbox-management 2026-02-12 20:30:34 -08:00
RichardTang-Aden 988922304f Merge pull request #4451 from Ttian18/fix/tina-tui-copy-newline-4423
Release / Create Release (push) Waiting to run
fix(tui): add multiline input and cross-platform clipboard support
2026-02-12 20:27:32 -08:00
bryan ab2bd726c3 Merge branch 'main' into feat/inbox-management 2026-02-12 20:24:46 -08:00
bryan 713fefb163 inbox-manager is now continuous 2026-02-12 20:22:40 -08:00
Timothy 83140a1398 feat: event source in runtime 2026-02-12 19:52:15 -08:00
bryan cafa6dd930 update to inbox management agent 2026-02-12 19:17:54 -08:00
jaathavan18 82e1af1a7a docs: fix references to nonexistent setup-python.sh (#4648) 2026-02-12 22:08:35 -05:00
jaathavan18 30c3dc9205 docs: fix broken exports/ link in environment-setup.md (#4645) 2026-02-12 22:04:25 -05:00
Timothy 9a3c6703e1 feat:phased compaction and event bus integration 2026-02-12 18:41:32 -08:00
RichardTang-Aden e26468aa19 Merge pull request #4587 from Antiarin/feat/codex-integration
feat(codex): add project setup, quickstart bootstrap, and docs for Codex agent support
2026-02-12 17:55:47 -08:00
Richard Tang fe14992696 docs: update documentation 2026-02-12 17:54:08 -08:00
Richard Tang d0775b95c6 fix: remove unused agents.md and quickstart clause 2026-02-12 17:48:31 -08:00
Richard Tang 96121b5757 fix: correct codex setups 2026-02-12 17:35:15 -08:00
Timothy @aden 11c003c48d Merge pull request #4636 from TimothyZhang7/feature/memory-inheritance
Feature: Conversation Memory & Continuous Agent Session
2026-02-12 13:52:19 -08:00
Timothy fbe72c58ae chore: fix ci tests 2026-02-12 13:49:34 -08:00
bryan 816156e87f merge: PR #4636 feature/memory-inheritance into feat/inbox-management
Brings in append_data tool, continuous conversation mode, conversation
judge, phase compaction, and prompt composer from the memory-inheritance
feature branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 13:23:25 -08:00
bryan 7bceab3cea wip: inbox management agent setup and gmail tool updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 13:22:19 -08:00
Timothy 83d7f56728 chore: update agent skills for new design philosophy 2026-02-12 12:10:37 -08:00
Timothy 76deba2a6a feat: consistent memory system 2026-02-12 11:41:22 -08:00
Antiarin d9d048b9e3 docs: update quickstart script for symlink handling and add Codex CLI documentation 2026-02-12 23:34:17 +05:30
bryan 930f417729 Merge remote-tracking branch 'origin/main' into feat/inbox-management 2026-02-12 08:44:13 -08:00
bryan 8e214d06c1 moved inbox management to sample agents 2026-02-12 08:33:43 -08:00
e-cesar9 63e0348963 fix(ux): improve failure messaging with success rate
Shows clear success rate when some exclusions fail (e.g., "Only 2/3
exclusions added (67%)"). Helps users understand that performance
benefit may be reduced when not all paths are excluded.

- Calculates success rate as percentage
- Shows "X/Y added (Z%)" format for clarity
- Warns that performance benefit may be reduced
- Better visibility of partial failures

Improves user awareness of partial installation issues.
2026-02-12 17:30:17 +01:00
e-cesar9 b46a5f0247 fix(robustness): recheck Defender status before adding exclusions
Adds Test-IsDefenderEnabled helper and rechecks Defender status and
paths before adding exclusions. Prevents adding ineffective exclusions
if Defender was disabled during user prompt (race condition).

- New helper: Test-IsDefenderEnabled() for quick boolean check
- Recheck Defender status immediately before Add operation
- Recheck paths to detect if added by another process
- Clear messages if status changed or already added

Fixes race condition where user could disable Defender or another
process could add exclusions during the prompt delay.
2026-02-12 17:29:44 +01:00
e-cesar9 79dfd90068 fix(security): add path validation for Defender exclusions
Validates that paths are within safe boundaries (project directory or
user AppData) before excluding them from Defender. Prevents accidental
or malicious exclusion of system directories.

- Adds safePrefixes validation (project dir + user AppData only)
- Checks each path against allowed prefixes
- Normalizes paths for consistent comparison
- Warns but processes non-existent paths (they may be created later)

Fixes potential security issue where modified script could exclude
system directories like C:\Windows or C:\Program Files.
2026-02-12 17:29:00 +01:00
Antiarin f9d5c7c751 fix(codex): patch prompt injection, checkout version, and style in codex-issue-triage workflow 2026-02-12 20:25:19 +05:30
Antiarin 8958fb2d88 eat(codex): add project setup, quickstart bootstrap, and docs for Codex agent support 2026-02-12 18:32:58 +05:30
Zhang 3c51f2ac36 fix(tui): address code review feedback on TextArea migration
- Subclass TextArea as ChatTextArea to intercept Enter key before
  the base class swallows it (fixes submission not triggering)
- Remove event.shift access that raises AttributeError on Key events
- Make action_show_sessions directly call _submit_input instead of
  just placing text in the widget
2026-02-11 23:45:30 -08:00
RichardTang-Aden 170a0918f7 Merge pull request #4538 from jaathavan18/docs/issue-4493-remove-changelog-references-v2
docs: remove references to nonexistent CHANGELOG.md
2026-02-11 19:15:00 -08:00
jaathavan18 e3da3b619c docs: remove references to nonexistent CHANGELOG.md (#4493) 2026-02-11 21:48:28 -05:00
RichardTang-Aden 6e32513b79 Merge pull request #4535 from RichardTang-Aden/main
docs: add the automonous agent guide
2026-02-11 17:11:02 -08:00
Richard Tang 520e1963ee docs: add the automonous agent guide 2026-02-11 17:10:21 -08:00
RichardTang-Aden 843b9b55e2 Merge pull request #2862 from himanshu748/feat/antigravity-ide-2571
feat: Antigravity IDE support for MCP servers and skills
2026-02-11 16:59:15 -08:00
Richard Tang ccd305ff96 fix: remove incorrect rules folder 2026-02-11 16:58:02 -08:00
Richard Tang 3bd0d1e48c feat: add Antigravity workflows for all hive skills 2026-02-11 16:56:15 -08:00
Richard Tang d9bfa8e675 docs: rename .antigravity to .agent for Antigravity IDE compatibility 2026-02-11 16:42:26 -08:00
Richard Tang 27746147e2 fix: fix the antigravity config folder 2026-02-11 16:41:21 -08:00
Richard Tang 3a0b642980 fix: update the antigravity config 2026-02-11 16:39:28 -08:00
Richard Tang 8c0241f087 fix: use --directory instead of cwd for Antigravity MCP compatibility 2026-02-11 16:36:12 -08:00
Richard Tang 958d016174 fix: use uv run for MCP servers, script reads from template 2026-02-11 16:03:32 -08:00
Richard Tang 913d318ada docs: emphasize restarting Antigravity after MCP setup, fix config path and skill names 2026-02-11 15:57:11 -08:00
Richard Tang 8212920cb7 fix: correct .antigravity/skills symlinks to point to hive-* directories 2026-02-11 15:52:27 -08:00
Richard Tang 6414be7bd4 fix: change the wrong antigravity mcp path 2026-02-11 15:47:54 -08:00
himanshu748 ac62a82d08 docs: clarify Antigravity setup for everyone
- Define repo root at top; lead with quick start (3 steps)
- Add 'What you get' and prerequisites in one place
- Full setup step-by-step; troubleshooting: problem → fix
- Manual MCP config as single section; verification optional
- Plain language, scannable structure, no duplicate sections
2026-02-11 15:45:56 -08:00
himanshu748 a670548a57 Antigravity: one-command setup script and clearer docs
- Add scripts/setup-antigravity-mcp.sh to auto-detect repo root and write
  ~/.gemini/mcp.json with absolute paths (no manual path editing)
- Lead docs with Quick start (3 steps) and note ./ vs / for the script
- README: point to one-command setup; clarify script runs from repo folder
2026-02-11 15:45:56 -08:00
himanshu748 c4a7463f9d docs(antigravity): global config, absolute cwd, cwd warning note
- Add step to run core/setup_mcp.sh first
- Document that IDE often loads ~/.claude/mcp.json, not project config
- Add Option A: copy to ~/.claude/mcp.json with absolute cwd paths
- Note that cwd schema warning in IDE is a false positive
- Renumber setup steps (1–5)
2026-02-11 15:45:30 -08:00
himanshu748 edf0ac5270 docs: add how-to-verify section for Antigravity setup 2026-02-11 15:45:30 -08:00
himanshu748 8ff6b76f37 feat: Antigravity IDE support for MCP servers and skills (#2571)
- Add .antigravity/mcp_config.json with agent-builder and tools MCP servers
- Add .antigravity/skills/ with symlinks to .claude/skills/ (5 skills)
- Add docs/antigravity-setup.md with setup and troubleshooting
- Update README.md with Antigravity IDE support section
- Update DEVELOPER.md and docs/contributing-lint-setup.md with Antigravity refs

Mirrors Cursor integration for consistent multi-IDE support.
2026-02-11 15:45:30 -08:00
Bryan @ Aden c9f9eb365c Merge pull request #4441 from saurabh007007/feature/docs/spelling
docs:fix typo in docs for the directory in environment steup
2026-02-11 23:37:44 +00:00
e-cesar9 7a17c115d3 perf(windows): add Windows Defender exclusions for 40% faster uv sync
Fixes #4428

- Add Step 2.5 to quickstart.ps1 for optional Defender exclusions
- Requires admin + explicit user consent (default=no)
- Handles non-admin gracefully (shows manual commands)
- Improves uv sync by ~40% and cold-start by ~30% on Windows 11
- Never fails installation (graceful degradation)

Implementation:
- 3 helper functions: Test-IsAdmin, Test-DefenderExclusions, Add-DefenderExclusions
- Excludes: project dir, .venv, %APPDATA%\uv
- Security: Default=no, admin-only, clear trade-offs explained
- Robustness: Handles Defender disabled, existing exclusions, partial failures
- UX: Follows existing Write-Step/Ok/Warn patterns, clipboard fallback
- Testing: Idempotent, can run multiple times safely
2026-02-11 23:57:55 +01:00
Bryan @ Aden 9a2a11055f Merge pull request #4520 from jaathavan18/docs/issue-4495-remove-config-yaml-reference
docs: remove reference to nonexistent config.yaml
2026-02-11 22:36:16 +00:00
bryan f21aecd91c Add Gmail inbox management tools (list, get, trash, modify labels) 2026-02-11 14:33:01 -08:00
jaathavan18 4aef73c1d7 docs: remove reference to nonexistent config.yaml (#4495) 2026-02-11 15:07:04 -05:00
bryan 9df147b450 removed send_budget_alert_email, require provider param 2026-02-11 11:26:23 -08:00
RichardTang-Aden b71b4b0fc2 Merge pull request #4500 from RichardTang-Aden/main
fix: remove unused tools from root .mcp.json
2026-02-11 10:23:45 -08:00
Richard Tang 1bd2510c52 fix: remove unused tools from root .mcp.json 2026-02-11 10:20:20 -08:00
RichardTang-Aden 28b81092f9 Merge pull request #4498 from jaathavan18/docs/issue-4494-fix-env-example-reference
docs: fix .env.example references in tools README
2026-02-11 10:11:53 -08:00
RichardTang-Aden 4b9a3abba6 micro-fix: python lint error (#4499)
* micro-fix: python lint error

* micro-fix: python lint format
2026-02-11 10:09:17 -08:00
Richard Tang 0c76b6dcb1 micro-fix: python lint format 2026-02-11 10:08:40 -08:00
Richard Tang 090a85b41b micro-fix: python lint error 2026-02-11 10:04:25 -08:00
jaathavan18 992d573573 docs: fix .env.example references in tools README (#4494) 2026-02-11 12:58:51 -05:00
e-cesar9 9e768e660b micro-fix: move inline imports to module level in edge.py (#4480)
Fixes #4445

Moved repeated inline imports (logging, json, re) to module-level:
- Eliminates import overhead on every method call
- Follows PEP 8 conventions
- Added module-level logger instance
- re is used at line 259 (re.search)

Changes:
- 4 lines added (imports + logger)
- 13 lines removed (inline imports)
- No functional changes
2026-02-11 09:55:14 -08:00
RichardTang-Aden 26b9ed362e Merge pull request #4479 from e-cesar9/microfix/4444-typo-stirct
micro-fix: fix typo STIRCT → STRICT in safe_eval.py
2026-02-11 09:48:55 -08:00
Timothy @aden 976ae75fde Merge pull request #4487 from adenhq/feature/windows-quickstart
chore(micro-fix): windows quickstart
2026-02-11 08:18:53 -08:00
Timothy @aden 9da91b5319 Merge branch 'main' into feature/windows-quickstart 2026-02-11 08:18:07 -08:00
Timothy @aden 2493beaf5a Merge pull request #4437 from GovindhKishore/fix/powershell-syntax-error
micro-fix: resolve NativeCommandError in uv sync and ParserError in dynamic env var assignment
2026-02-11 07:51:01 -08:00
e-cesar9 d63dd021ab micro-fix: fix typo STIRCT → STRICT in safe_eval.py
Fixes #4444
2026-02-11 14:40:25 +01:00
Zhang 697ba89314 fix(tui): add multiline input and cross-platform clipboard support (#4423)
Replace single-line Input widget with TextArea in chat REPL so
Shift+Enter inserts newlines and multiline paste works correctly.
Add Windows clipboard support (clip.exe) and xsel fallback for Linux.
2026-02-11 00:01:09 -08:00
Arshad Uzzama Shaik b6c65ab5d5 fix(security): handle unix-style absolute paths correctly on Windows (#1204)
* fix(security): normalize unix-style paths on windows to prevent sandbox escape (fixes #499)

* fix(security): handle unix-style absolute paths correctly on Windows

* fix: address review comments on path sanitization

- Strip leading whitespace to prevent startswith bypass
- Replace lstrip with single-char strip to preserve UNC paths
- Expand ValueError comment for clarity

---------

Co-authored-by: Arshad Shaik <arshad.shaik@violetis.ai>
Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-11 15:40:38 +08:00
Arshad Uzzama Shaik 162f9a55ad fix(mcp): log errors in _load_active_session instead of silencing them (fixes #682) (#684)
Co-authored-by: Arshad Shaik <arshad.shaik@violetis.ai>
2026-02-11 15:09:10 +08:00
Govindh Kishore e484fdfa51 fix: Replaced multi-argument Join-Path with [System.IO.Path]::Combine for PS 5.1 compatibility. 2026-02-11 12:30:25 +05:30
Amit Kumar 77d9ccf2e4 feat(tools): add SerpAPI tools for Google Scholar & Patents search (#3986)
Implements 5 tools as proposed in #3224:
- scholar_search: Search Google Scholar for academic papers
- scholar_get_citations: Get citation formats (MLA, APA, Chicago, etc.)
- scholar_get_author: Author profiles with h-index, i10-index, metrics
- patents_search: Search Google Patents with filters
- patents_get_details: Detailed patent information by publication number

Follows existing tool pattern (web_search_tool, hubspot_tool, slack_tool):
- register_tools(mcp, credentials) with @mcp.tool() decorators
- _SerpAPIClient internal class for HTTP calls via httpx
- Credential fallback: CredentialStoreAdapter -> SERPAPI_API_KEY env var
- Error handling: always returns dicts, never raises

25 unit tests + live integration tests verified.
697/697 full test suite passing.

Fixes #3224
2026-02-11 14:46:59 +08:00
Govindh Kishore 94e39ee09e fix: resolve NativeCommandError by stabilizing uv sync output capture 2026-02-11 11:59:15 +05:30
saurabh007007 373ad77008 docs:fix typo in docs 2026-02-11 11:46:26 +05:30
Govindh Kishore 661b0c0038 fix: resolve PowerShell ParserError in quickstart.ps1 2026-02-11 11:02:37 +05:30
Govindh Kishore 8ed38bf0e2 fix: resolve PowerShell ParserError in quickstart.ps1 2026-02-11 11:01:33 +05:30
RichardTang-Aden 4d675dfff7 Merge pull request #4431 from adenhq/developer-success-documentation
docs: Developer success documentation
2026-02-10 20:12:11 -08:00
Timothy @aden 87e9bf853d Merge pull request #4425 from TimothyZhang7/feature/windows-quickstart
feat(quickstart): windows script and cli
2026-02-10 19:26:23 -08:00
Timothy @aden c56f78422a Merge pull request #4408 from adenhq/fix/first-success
(micro-fix): Fix/first success
2026-02-10 18:42:12 -08:00
Timothy ac311e10ba Merge remote-tracking branch 'origin/main' into fix/first-success 2026-02-10 18:40:27 -08:00
Timothy @aden 0297520263 Merge pull request #4309 from TimothyZhang7/feature/automated-testing-skill
Feature/automated testing skill
2026-02-10 18:36:57 -08:00
Bryan @ Aden 4803552a7a Merge pull request #4429 from adenhq/feat/opencode-skills
(micro-fix): added skills to opencode, linking to claude
2026-02-11 02:23:37 +00:00
Bryan @ Aden b8d85ff723 Merge pull request #4021 from NamanRajput-git/main
docs: fix incorrect submission email in quizzes documentation
2026-02-11 02:20:33 +00:00
bryan 7d571dfaec added skills to opencode, linking to claude 2026-02-10 18:20:21 -08:00
Bryan @ Aden 153e6142ff Merge pull request #4243 from vakrahul/feat/opencode-integration
feat: Add Opencode integration as a Coding Agent
2026-02-11 02:12:10 +00:00
Bryan @ Aden 228449c9d8 Merge pull request #3894 from AkhileshBabuT/docs/fix-environment-setup-3837-3841
docs: fix environment-setup.md for uv workspace setup
2026-02-11 02:09:33 +00:00
vakrahul c65eed8802 docs: apply PR review changes for quickstart and markdown formatting and updating 2026-02-11 07:36:23 +05:30
vakrahul c83aac5e12 docs: apply PR review changes for quickstart and markdown formatting 2026-02-11 07:29:48 +05:30
vakrahul 48b9241247 chore: address PR review feedback (formatting and outdated references) 2026-02-11 07:18:17 +05:30
Bryan @ Aden beec549f74 Merge pull request #4417 from e-cesar9/micro-fix/debug-logging-level
micro-fix: change debug logging to appropriate level in executor.py
2026-02-11 01:30:20 +00:00
Bryan @ Aden 310698ecc0 Merge pull request #4287 from nhockcuncon77/docs/contributing-numbering-and-tools-tests
docs(contributing): fix duplicate step numbers and add tools test command
2026-02-11 01:26:00 +00:00
Bryan @ Aden 4f719c4778 Merge pull request #4299 from YashovardhanB28/fix/llm-token-counting
docs(fix): correct LLMNode token counting bug
2026-02-11 01:25:51 +00:00
bryan 4cc00f3bdc removed dead tests, updated some drifting behavior 2026-02-10 16:56:36 -08:00
bryan 1f9c47fef1 removed twitter agent, create new terminal, resume command, update quickstart 2026-02-10 16:41:08 -08:00
e-cesar9 80a4980640 micro-fix: change debug logging to appropriate level in executor.py
Changes logger.info() with debug prefix to logger.debug() for
session state resume information. This prevents debug-level
information from appearing in production logs at INFO level.

- Removes redundant '🔍 Debug:' prefix
- Uses appropriate debug logging level
- Follows Python logging best practices
- Improves production log clarity

Addresses #4377
2026-02-11 00:15:17 +01:00
Timothy Zhang 8dbe424f5a fix: environment loader compatibility tweak for windows 2026-02-10 15:10:32 -08:00
Timothy Zhang ec9bf033e6 feat: windows quickstart and hive cli 2026-02-10 15:09:38 -08:00
Timothy @aden 5e537d9d55 Merge pull request #4410 from TimothyZhang7/feature/windows-quickstart
feat(quickstart): windows script
2026-02-10 12:52:13 -08:00
Timothy d6b95067a1 feat(quickstart): windows script 2026-02-10 12:49:25 -08:00
bryan 32cae75ef5 intro msg 2026-02-10 11:28:34 -08:00
bryan 21e7554cdb cred update in quickstart, sample agent check before agent run, agent has welcome msg 2026-02-10 11:27:17 -08:00
Timothy 374442e900 Merge branch 'main' into feature/automated-testing-skill 2026-02-09 20:16:11 -08:00
vakrahul a1a0ec5ddb Remove skills folder (reviewer will handle symlinks) and cleanup doc refs 2026-02-10 09:45:44 +05:30
Timothy 1fd56b079c fix: test cases 2026-02-09 20:12:37 -08:00
vakrahul 87b0037fcd Add Opencode skills (mirrored from .cursor/skills) 2026-02-10 09:25:43 +05:30
Timothy 767d32d420 fix: stucking test cases 2026-02-09 19:51:50 -08:00
vakrahul 929dc24e93 Fix config: use uv for mcp and remove pinned model 2026-02-10 09:12:42 +05:30
vakrahul 8cfb533fef Fix docs: remove recommended tag, fix rendering, remove duplicate setup 2026-02-10 09:10:47 +05:30
Timothy 6fd7efece6 feat: hive test, quickstart local settings 2026-02-09 18:53:42 -08:00
Timothy 776583b3ad fix: use more sensible default models 2026-02-09 17:07:34 -08:00
Timothy 9c28dae583 fix: gemini should use GEMINI_API_KEY 2026-02-09 17:05:11 -08:00
YashovardhanB 59a315b90b fix(graph): correct LLMNode token counting to include retries 2026-02-10 06:21:54 +05:30
Timothy 866518f188 fix: headless runtime consolidation 2026-02-09 16:29:06 -08:00
amazonproai e5428bec5c docs(contributing): fix duplicate step numbers and add tools test command
- Fix Getting Started steps 6/7 duplicated; renumber to 8 and 9
- Add command to run tools package tests (cd tools && uv run pytest)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-09 14:27:53 -08:00
Timothy faf534511b feat: automated test agent skill 2026-02-09 12:39:20 -08:00
Timothy 9d11f834b8 feat: automated testing skill 2026-02-09 11:05:59 -08:00
vakrahul 131b72cd0c feat: Add native Opencode support with Windows compatibility 2026-02-09 23:27:54 +05:30
Naman Rajput 1da9bb0c0f Merge pull request #1 from NamanRajput-git/fix-submission-email
Fix submission email in quizzes documentation
2026-02-08 00:53:27 +05:30
Naman Rajput 760ed51ad3 Fix submission email in quizzes documentation
Updates incorrect careers@aden.com to contact@adenhq.com
2026-02-08 00:45:53 +05:30
AkhileshBabuT a08f3a8925 docs: fix environment-setup.md for uv workspace setup
- Fix #3837: Replace pip install -e with uv sync
- Fix #3841: Update venv location to reflect single root .venv
2026-02-06 23:42:07 -05:00
oluwasegun.haziz.omd 7fae57f311 more fixes 2026-02-03 11:53:45 +00:00
haliaeetusvocifer 1f653969a9 added feature google docs integration 2026-02-03 03:51:39 +00:00
358 changed files with 59204 additions and 11671 deletions
+9
View File
@@ -0,0 +1,9 @@
{
"mcpServers": {
"agent-builder": {
"command": "uv",
"args": ["run", "--directory", "core", "-m", "framework.mcp.agent_builder_server"],
"disabled": false
}
}
}
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-concepts
---
use hive-concepts skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-create
---
use hive-create skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-credentials
---
use hive-credentials skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-patterns
---
use hive-patterns skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive-test
---
use hive-test skill
+5
View File
@@ -0,0 +1,5 @@
---
description: hive
---
use hive skill
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+34
View File
@@ -0,0 +1,34 @@
{
"permissions": {
"allow": [
"mcp__agent-builder__create_session",
"mcp__agent-builder__set_goal",
"mcp__agent-builder__add_node",
"mcp__agent-builder__add_edge",
"mcp__agent-builder__configure_loop",
"mcp__agent-builder__add_mcp_server",
"mcp__agent-builder__validate_graph",
"mcp__agent-builder__export_graph",
"mcp__agent-builder__load_session_by_id",
"Bash(git status:*)",
"Bash(gh run view:*)",
"Bash(uv run:*)",
"Bash(env:*)",
"mcp__agent-builder__test_node",
"mcp__agent-builder__list_mcp_tools",
"Bash(python -m py_compile:*)",
"Bash(python -m pytest:*)",
"Bash(source:*)",
"mcp__agent-builder__update_node",
"mcp__agent-builder__check_missing_credentials",
"mcp__agent-builder__list_stored_credentials",
"Bash(find:*)",
"mcp__agent-builder__run_tests",
"Bash(PYTHONPATH=core:exports:tools/src uv run pytest:*)",
"mcp__agent-builder__list_agent_sessions",
"mcp__agent-builder__generate_constraint_tests",
"mcp__agent-builder__generate_success_tests"
]
},
"enabledMcpjsonServers": ["agent-builder", "tools"]
}
+69 -6
View File
@@ -492,7 +492,7 @@ AskUserQuestion(questions=[{
- node_id (kebab-case)
- name
- description
- node_type: `"event_loop"` (recommended for all LLM work) or `"function"` (deterministic, no LLM)
- node_type: `"event_loop"` (the only valid type; use `client_facing: True` for HITL)
- input_keys (what data this node receives)
- output_keys (what data this node produces)
- tools (ONLY tools that exist from Step 1 — empty list if no tools needed)
@@ -553,6 +553,26 @@ AskUserQuestion(questions=[{
- condition_expr (Python expression, only if conditional)
- priority (positive = forward, negative = feedback/loop-back)
**DETERMINE the graph lifecycle.** Not every agent needs a terminal node:
| Pattern | `terminal_nodes` | When to Use |
|---------|-------------------|-------------|
| **Linear (finish)** | `["last-node"]` | Agent completes a task and exits (batch processing, one-shot generation) |
| **Forever-alive (loop)** | `[]` (empty) | Agent stays alive for continuous interaction (research assistant, personal assistant, monitoring) |
**Forever-alive pattern:** The deep_research_agent example uses `terminal_nodes=[]`. Every leaf node has edges that loop back to earlier nodes, creating a perpetual session. The agent only stops when the user explicitly exits. This is the preferred pattern for interactive, multi-turn agents.
**Key design rules for forever-alive graphs:**
- Every node must have at least one outgoing edge (no dead ends)
- Client-facing nodes block for user input — these are the natural "pause points"
- The user controls when to stop, not the graph
- Sessions accumulate memory across loops — plan for conversation compaction
- Use `conversation_mode="continuous"` to preserve conversation history across node transitions
- `max_iterations` should be set high (e.g., 100) since the agent is designed to run indefinitely
- The agent will NOT enter a "completed" execution state — this is intentional, not a bug
**Ask the user** which lifecycle pattern fits their agent. Default to forever-alive for interactive agents, linear for batch/one-shot tasks.
**RENDER the complete graph as ASCII art.** Make it large and clear — the user needs to see and understand the full workflow at a glance.
**IMPORTANT: Make the ASCII art BIG and READABLE.** Use a box-and-arrow style with generous spacing. Do NOT make it tiny or compressed. Example format:
@@ -669,6 +689,7 @@ AskUserQuestion(questions=[{
|------|---------------|
| `config.py` | `AgentMetadata.name` — the display name shown in TUI agent selection |
| `config.py` | `AgentMetadata.description` — agent description |
| `config.py` | `AgentMetadata.intro_message` — greeting shown to user when TUI loads |
| `agent.py` | Module docstring (line 1) |
| `agent.py` | `class OldNameAgent:``class NewNameAgent:` |
| `agent.py` | `GraphSpec(id="old-name-graph")``GraphSpec(id="new-name-graph")` — shown in TUI status bar |
@@ -735,7 +756,7 @@ mcp__agent-builder__export_graph()
**THEN write the Python package files** using the exported data. Create these files in `exports/AGENT_NAME/`:
1. `config.py` - Runtime configuration with model settings
1. `config.py` - Runtime configuration with model settings and `AgentMetadata` (including `intro_message` — the greeting shown when TUI loads)
2. `nodes/__init__.py` - All NodeSpec definitions
3. `agent.py` - Goal, edges, graph config, and agent class
4. `__init__.py` - Package exports
@@ -831,8 +852,7 @@ cd /home/timothy/oss/hive && PYTHONPATH=exports uv run python -m AGENT_NAME vali
| Type | tools param | Use when |
| ------------ | ----------------------- | --------------------------------------- |
| `event_loop` | `'["tool1"]'` or `'[]'` | LLM-powered work with or without tools |
| `function` | N/A | Deterministic Python operations, no LLM |
| `event_loop` | `'["tool1"]'` or `'[]'` | All agent work (with or without tools, HITL via client_facing) |
---
@@ -911,6 +931,46 @@ result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
---
## REFERENCE: Graph Lifecycle & Conversation Memory
### Terminal vs Forever-Alive Graphs
Agents have two lifecycle patterns:
**Linear (terminal) graphs** have `terminal_nodes=["last-node"]`. Execution ends when the terminal node completes. The session enters a "completed" state. Use for batch processing, one-shot generation, and fire-and-forget tasks.
**Forever-alive graphs** have `terminal_nodes=[]` (empty). Every node has at least one outgoing edge — the graph loops indefinitely. The session **never enters a "completed" state** — this is intentional. The agent stays alive until the user explicitly exits. Use for interactive assistants, research tools, and any agent where the user drives the conversation.
The deep_research_agent example demonstrates this: `report` loops back to either `research` (dig deeper) or `intake` (new topic). The agent is a persistent, interactive assistant.
### Continuous Conversation Mode
When `conversation_mode="continuous"` is set on the GraphSpec, the framework preserves a **single conversation thread** across all node transitions:
**What the framework does automatically:**
- **Inherits conversation**: Same message history carries forward to the next node
- **Composes layered system prompts**: Identity (agent-level) + Narrative (auto-generated state summary) + Focus (per-node instructions)
- **Inserts transition markers**: At each node boundary, a "State of the World" message showing completed phases, current memory, and available data files
- **Accumulates tools**: Once a tool becomes available, it stays available in subsequent nodes
- **Compacts opportunistically**: At phase transitions, old tool results are pruned to stay within token budget
**What this means for agent builders:**
- Nodes don't need to re-explain context — the conversation carries it forward
- Output keys from earlier nodes are available in memory for edge conditions and later nodes
- For forever-alive agents, conversation memory persists across the entire session lifetime
- Plan for compaction: very long sessions will have older tool results summarized automatically
**When to use continuous mode:**
- Interactive agents with client-facing nodes (always)
- Multi-phase workflows where context matters across phases
- Forever-alive agents that loop indefinitely
**When NOT to use continuous mode:**
- Embarrassingly parallel fan-out nodes (each branch should be independent)
- Stateless utility agents that process items independently
---
## REFERENCE: Framework Capabilities for Qualification
Use this reference during STEP 2 to give accurate, honest assessments.
@@ -943,11 +1003,11 @@ Use this reference during STEP 2 to give accurate, honest assessments.
| Use Case | Why It's Problematic | Alternative |
|----------|---------------------|-------------|
| Long-running daemons | Framework is request-response, not persistent | External scheduler + agent |
| Persistent background daemons (no user) | Forever-alive graphs need a user at client-facing nodes; no autonomous background polling without user | External scheduler triggering agent runs |
| Sub-second responses | LLM latency is inherent | Traditional code, no LLM |
| Processing millions of items | Context windows and rate limits | Batch processing + sampling |
| Real-time streaming data | No built-in pub/sub or streaming input | Custom MCP server + agent |
| Guaranteed determinism | LLM outputs vary | Function nodes for deterministic parts |
| Guaranteed determinism | LLM outputs vary | Traditional code for deterministic parts |
| Offline/air-gapped | Requires LLM API access | Local models (not currently supported) |
| Multi-user concurrency | Single-user session model | Separate agent instances per user |
@@ -978,3 +1038,6 @@ Use this reference during STEP 2 to give accurate, honest assessments.
11. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
12. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
13. **Wrong mcp_servers.json format** - Use flat format (no `"mcpServers"` wrapper), `cwd` must be `"../../tools"`, and `command` must be `"uv"` with args `["run", "python", ...]`
14. **Assuming all agents need terminal nodes** - Interactive agents often work best with `terminal_nodes=[]` (forever-alive pattern). The agent never enters "completed" state — this is intentional. Only batch/one-shot agents need terminal nodes
15. **Creating dead-end nodes in forever-alive graphs** - Every node must have at least one outgoing edge. A node with no outgoing edges will cause execution to end unexpectedly, breaking the forever-alive loop
16. **Not using continuous conversation mode for interactive agents** - Multi-phase interactive agents should use `conversation_mode="continuous"` to preserve context across node transitions. Without it, each node starts with a blank conversation and loses all prior context
@@ -1,12 +1,15 @@
"""Agent graph construction for Deep Research Agent."""
from pathlib import Path
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult, GraphExecutor
from framework.runtime.event_bus import EventBus
from framework.runtime.core import Runtime
from framework.graph.executor import ExecutionResult
from framework.graph.checkpoint_config import CheckpointConfig
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
from .config import default_config, metadata
from .nodes import (
@@ -120,13 +123,31 @@ edges = [
condition_expr="needs_more_research == False",
priority=2,
),
# report -> research (user wants deeper research on current topic)
EdgeSpec(
id="report-to-research",
source="report",
target="research",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() == 'more_research'",
priority=2,
),
# report -> intake (user wants a new topic — default when not more_research)
EdgeSpec(
id="report-to-intake",
source="report",
target="intake",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() != 'more_research'",
priority=1,
),
]
# Graph configuration
entry_node = "intake"
entry_points = {"start": "intake"}
pause_nodes = []
terminal_nodes = ["report"]
terminal_nodes = []
class DeepResearchAgent:
@@ -136,6 +157,12 @@ class DeepResearchAgent:
Flow: intake -> research -> review -> report
^ |
+-- feedback loop (if user wants more)
Uses AgentRuntime for proper session management:
- Session-scoped storage (sessions/{session_id}/)
- Checkpointing for resume capability
- Runtime logging
- Data folder for save_data/load_data
"""
def __init__(self, config=None):
@@ -147,10 +174,10 @@ class DeepResearchAgent:
self.entry_points = entry_points
self.pause_nodes = pause_nodes
self.terminal_nodes = terminal_nodes
self._executor: GraphExecutor | None = None
self._graph: GraphSpec | None = None
self._event_bus: EventBus | None = None
self._agent_runtime: AgentRuntime | None = None
self._tool_registry: ToolRegistry | None = None
self._storage_path: Path | None = None
def _build_graph(self) -> GraphSpec:
"""Build the GraphSpec."""
@@ -171,16 +198,20 @@ class DeepResearchAgent:
"max_tool_calls_per_turn": 20,
"max_history_tokens": 32000,
},
conversation_mode="continuous",
identity_prompt=(
"You are a rigorous research agent. You search for information "
"from diverse, authoritative sources, analyze findings critically, "
"and produce well-cited reports. You never fabricate information — "
"every claim must trace back to a source you actually retrieved."
),
)
def _setup(self, mock_mode=False) -> GraphExecutor:
"""Set up the executor with all components."""
from pathlib import Path
def _setup(self, mock_mode=False) -> None:
"""Set up the agent runtime with sessions, checkpoints, and logging."""
self._storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
self._storage_path.mkdir(parents=True, exist_ok=True)
storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
storage_path.mkdir(parents=True, exist_ok=True)
self._event_bus = EventBus()
self._tool_registry = ToolRegistry()
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
@@ -199,47 +230,63 @@ class DeepResearchAgent:
tools = list(self._tool_registry.get_tools().values())
self._graph = self._build_graph()
runtime = Runtime(storage_path)
self._executor = GraphExecutor(
runtime=runtime,
checkpoint_config = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
async_checkpoint=True,
)
entry_point_specs = [
EntryPointSpec(
id="default",
name="Default",
entry_node=self.entry_node,
trigger_type="manual",
isolation_level="shared",
)
]
self._agent_runtime = create_agent_runtime(
graph=self._graph,
goal=self.goal,
storage_path=self._storage_path,
entry_points=entry_point_specs,
llm=llm,
tools=tools,
tool_executor=tool_executor,
event_bus=self._event_bus,
storage_path=storage_path,
loop_config=self._graph.loop_config,
checkpoint_config=checkpoint_config,
)
return self._executor
async def start(self, mock_mode=False) -> None:
"""Set up the agent (initialize executor and tools)."""
if self._executor is None:
"""Set up and start the agent runtime."""
if self._agent_runtime is None:
self._setup(mock_mode=mock_mode)
if not self._agent_runtime.is_running:
await self._agent_runtime.start()
async def stop(self) -> None:
"""Clean up resources."""
self._executor = None
self._event_bus = None
"""Stop the agent runtime and clean up."""
if self._agent_runtime and self._agent_runtime.is_running:
await self._agent_runtime.stop()
self._agent_runtime = None
async def trigger_and_wait(
self,
entry_point: str,
input_data: dict,
entry_point: str = "default",
input_data: dict | None = None,
timeout: float | None = None,
session_state: dict | None = None,
) -> ExecutionResult | None:
"""Execute the graph and wait for completion."""
if self._executor is None:
if self._agent_runtime is None:
raise RuntimeError("Agent not started. Call start() first.")
if self._graph is None:
raise RuntimeError("Graph not built. Call start() first.")
return await self._executor.execute(
graph=self._graph,
goal=self.goal,
input_data=input_data,
return await self._agent_runtime.trigger_and_wait(
entry_point_id=entry_point,
input_data=input_data or {},
session_state=session_state,
)
@@ -250,7 +297,7 @@ class DeepResearchAgent:
await self.start(mock_mode=mock_mode)
try:
result = await self.trigger_and_wait(
"start", context, session_state=session_state
"default", context, session_state=session_state
)
return result or ExecutionResult(success=False, error="Execution timeout")
finally:
@@ -16,6 +16,11 @@ class AgentMetadata:
"multi-source search, quality evaluation, and synthesis - with TUI conversation "
"at key checkpoints for user guidance and feedback."
)
intro_message: str = (
"Hi! I'm your deep research assistant. Tell me a topic and I'll investigate it "
"thoroughly — searching multiple sources, evaluating quality, and synthesizing "
"a comprehensive report. What would you like me to research?"
)
metadata = AgentMetadata()
@@ -10,8 +10,13 @@ intake_node = NodeSpec(
description="Discuss the research topic with the user, clarify scope, and confirm direction",
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["topic"],
output_keys=["research_brief"],
success_criteria=(
"The research brief is specific and actionable: it states the topic, "
"the key questions to answer, the desired scope, and depth."
),
system_prompt="""\
You are a research intake specialist. The user wants to research a topic.
Have a brief conversation to clarify what they need.
@@ -38,10 +43,14 @@ research_node = NodeSpec(
name="Research",
description="Search the web, fetch source content, and compile findings",
node_type="event_loop",
max_node_visits=3,
max_node_visits=0,
input_keys=["research_brief", "feedback"],
output_keys=["findings", "sources", "gaps"],
nullable_output_keys=["feedback"],
success_criteria=(
"Findings reference at least 3 distinct sources with URLs. "
"Key claims are substantiated by fetched content, not generated."
),
system_prompt="""\
You are a research agent. Given a research brief, find and analyze sources.
@@ -56,18 +65,26 @@ Work in phases:
and any contradictions between sources.
Important:
- Work in batches of 3-4 tool calls at a time to manage context
- Work in batches of 3-4 tool calls at a time never more than 10 per turn
- After each batch, assess whether you have enough material
- Prefer quality over quantity 5 good sources beat 15 thin ones
- Track which URL each finding comes from (you'll need citations later)
- Call set_output for each key in a SEPARATE turn (not in the same turn as other tool calls)
When done, use set_output:
When done, use set_output (one key at a time, separate turns):
- set_output("findings", "Structured summary: key findings with source URLs for each claim. \
Include themes, contradictions, and confidence levels.")
- set_output("sources", [{"url": "...", "title": "...", "summary": "..."}])
- set_output("gaps", "What aspects of the research brief are NOT well-covered yet, if any.")
""",
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
tools=[
"web_search",
"web_scrape",
"load_data",
"save_data",
"append_data",
"list_data_files",
],
)
# Node 3: Review (client-facing)
@@ -78,9 +95,13 @@ review_node = NodeSpec(
description="Present findings to user and decide whether to research more or write the report",
node_type="event_loop",
client_facing=True,
max_node_visits=3,
max_node_visits=0,
input_keys=["findings", "sources", "gaps", "research_brief"],
output_keys=["needs_more_research", "feedback"],
success_criteria=(
"The user has been presented with findings and has explicitly indicated "
"whether they want more research or are ready for the report."
),
system_prompt="""\
Present the research findings to the user clearly and concisely.
@@ -109,49 +130,70 @@ report_node = NodeSpec(
description="Write a cited HTML report from the findings and present it to the user",
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["findings", "sources", "research_brief"],
output_keys=["delivery_status"],
output_keys=["delivery_status", "next_action"],
success_criteria=(
"An HTML report has been saved, the file link has been presented to the user, "
"and the user has indicated what they want to do next."
),
system_prompt="""\
Write a comprehensive research report as an HTML file and present it to the user.
Write a research report as an HTML file and present it to the user.
**STEP 1 Write the HTML report (tool calls, NO text to user yet):**
IMPORTANT: save_data requires TWO separate arguments: filename and data.
Call it like: save_data(filename="report.html", data="<html>...</html>")
Do NOT use _raw, do NOT nest arguments inside a JSON string.
1. Compose a complete, self-contained HTML document with embedded CSS styling.
Use a clean, readable design: max-width container, pleasant typography,
numbered citation links, a table of contents, and a references section.
**STEP 1 Write and save the HTML report (tool calls, NO text to user yet):**
Report structure inside the HTML:
- Title & date
- Executive Summary (2-3 paragraphs)
- Table of Contents
- Findings (organized by theme, with [n] citation links)
- Analysis (synthesis, implications, areas of debate)
- Conclusion (key takeaways, confidence assessment)
- References (numbered list with clickable URLs)
Build a clean HTML document. Keep the HTML concise aim for clarity over length.
Use minimal embedded CSS (a few lines of style, not a full framework).
Requirements:
- Every factual claim must cite its source with [n] notation
- Be objective present multiple viewpoints where sources disagree
- Distinguish well-supported conclusions from speculation
- Answer the original research questions from the brief
Report structure:
- Title & date
- Executive Summary (2-3 paragraphs)
- Key Findings (organized by theme, with [n] citation links)
- Analysis (synthesis, implications)
- Conclusion (key takeaways)
- References (numbered list with clickable URLs)
2. Save the HTML file:
save_data(filename="report.html", data=<your_html>)
Requirements:
- Every factual claim must cite its source with [n] notation
- Be objective present multiple viewpoints where sources disagree
- Answer the original research questions from the brief
3. Get the clickable link:
serve_file_to_user(filename="report.html", label="Research Report")
Save the HTML:
save_data(filename="report.html", data="<html>...</html>")
Then get the clickable link:
serve_file_to_user(filename="report.html", label="Research Report")
If save_data fails, simplify and shorten the HTML, then retry.
**STEP 2 Present the link to the user (text only, NO tool calls):**
Tell the user the report is ready and include the file:// URI from
serve_file_to_user so they can click it to open. Give a brief summary
of what the report covers. Ask if they have questions.
of what the report covers. Ask if they have questions or want to continue.
**STEP 3 After the user responds:**
- Answer follow-up questions from the research material
- When the user is satisfied: set_output("delivery_status", "completed")
- Answer any follow-up questions from the research material
- When the user is ready to move on, ask what they'd like to do next:
- Research a new topic?
- Dig deeper into the current topic?
- Then call set_output:
- set_output("delivery_status", "completed")
- set_output("next_action", "new_topic") if they want a new topic
- set_output("next_action", "more_research") if they want deeper research
""",
tools=["save_data", "serve_file_to_user", "load_data", "list_data_files"],
tools=[
"save_data",
"append_data",
"edit_data",
"serve_file_to_user",
"load_data",
"list_data_files",
],
)
__all__ = [
+27 -5
View File
@@ -141,6 +141,12 @@ for f in ~/.zshrc ~/.bashrc ~/.profile; do [ -f "$f" ] && grep -q 'HIVE_CREDENTI
- **In shell config but NOT in current session** — run `source ~/.zshrc` (or `~/.bashrc`) first, then proceed
- **Not set anywhere** — `EncryptedFileStorage` will auto-generate one. After storing, tell the user to persist it: `export HIVE_CREDENTIAL_KEY="{generated_key}"` in their shell profile
> **⚠️ IMPORTANT: After adding `HIVE_CREDENTIAL_KEY` to the user's shell config, always display:**
> ```
> ⚠️ Environment variables were added to your shell config.
> Open a NEW TERMINAL for them to take effect outside this session.
> ```
#### Option 1: Aden Platform (OAuth)
This is the recommended flow for supported integrations (HubSpot, etc.).
@@ -202,6 +208,12 @@ if success:
print(f"Run: {source_cmd}")
```
> **⚠️ IMPORTANT: After adding `ADEN_API_KEY` to the user's shell config, always display:**
> ```
> ⚠️ Environment variables were added to your shell config.
> Open a NEW TERMINAL for them to take effect outside this session.
> ```
Also save to `~/.hive/configuration.json` for the framework:
```python
@@ -460,9 +472,14 @@ result: HealthCheckResult = check_credential_health("hubspot", token_value)
The local encrypted store requires `HIVE_CREDENTIAL_KEY` to encrypt/decrypt credentials.
- If the user doesn't have one, `EncryptedFileStorage` will auto-generate one and log it
- The user MUST persist this key (e.g., in `~/.bashrc` or a secrets manager)
- The user MUST persist this key (e.g., in `~/.bashrc`/`~/.zshrc` or a secrets manager)
- Without this key, stored credentials cannot be decrypted
- This is the ONLY secret that should live in `~/.bashrc` or environment config
**Shell config rule:** Only TWO keys belong in shell config (`~/.zshrc`/`~/.bashrc`):
- `HIVE_CREDENTIAL_KEY` — encryption key for the credential store
- `ADEN_API_KEY` — Aden platform auth key (needed before the store can sync)
All other API keys (Brave, Google, HubSpot, etc.) must go in the encrypted store only. **Never offer to add them to shell config.**
If `HIVE_CREDENTIAL_KEY` is not set:
@@ -475,6 +492,7 @@ If `HIVE_CREDENTIAL_KEY` is not set:
- **NEVER** log, print, or echo credential values in tool output
- **NEVER** store credentials in plaintext files, git-tracked files, or agent configs
- **NEVER** hardcode credentials in source code
- **NEVER** offer to save API keys to shell config (`~/.zshrc`/`~/.bashrc`) — the **only** keys that belong in shell config are `HIVE_CREDENTIAL_KEY` and `ADEN_API_KEY`. All other credentials (Brave, Google, HubSpot, GitHub, Resend, etc.) go in the encrypted store only.
- **ALWAYS** use `SecretStr` from Pydantic when handling credential values in Python
- **ALWAYS** use the local encrypted store (`~/.hive/credentials`) for persistence
- **ALWAYS** run health checks before storing credentials (when possible)
@@ -490,7 +508,7 @@ All credential specs are defined in `tools/src/aden_tools/credentials/`:
| `llm.py` | LLM Providers | `anthropic` | No |
| `search.py` | Search Tools | `brave_search`, `google_search`, `google_cse` | No |
| `email.py` | Email | `resend` | No |
| `integrations.py` | Integrations | `github`, `hubspot` | No / Yes |
| `integrations.py` | Integrations | `github`, `hubspot`, `google_calendar_oauth` | No / Yes |
**Note:** Additional LLM providers (Cerebras, Groq, OpenAI) are handled by LiteLLM via environment
variables (`CEREBRAS_API_KEY`, `GROQ_API_KEY`, `OPENAI_API_KEY`) but are not yet in CREDENTIAL_SPECS.
@@ -601,18 +619,22 @@ All credentials are now configured:
│ ✅ CREDENTIALS CONFIGURED │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ OPEN A NEW TERMINAL before running commands below. │
│ Environment variables were saved to your shell config but │
│ only take effect in new terminal sessions. │
│ │
│ NEXT STEPS: │
│ │
│ 1. RUN YOUR AGENT: │
│ │
PYTHONPATH=core:exports python -m research-agent tui
hive tui
│ │
│ 2. IF YOU ENCOUNTER ISSUES, USE THE DEBUGGER: │
│ │
│ /hive-debugger │
│ │
│ The debugger analyzes runtime logs, identifies retry loops, tool │
│ failures, stalled execution, and provides actionable fix suggestions. │
│ failures, stalled execution, and provides actionable fix suggestions.
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
+190 -66
View File
@@ -26,6 +26,17 @@ Use `/hive-debugger` when:
This skill works alongside agents running in TUI mode and provides supervisor-level insights into execution behavior.
### Forever-Alive Agent Awareness
Some agents use `terminal_nodes=[]` (the "forever-alive" pattern), meaning they loop indefinitely and never enter a "completed" execution state. For these agents:
- Sessions with status "in_progress" or "paused" are **normal**, not failures
- High step counts, long durations, and many node visits are expected behavior
- The agent stops only when the user explicitly exits — there is no graph-driven completion
- Debug focus should be on **quality of individual node visits and iterations**, not whether the session reached a terminal state
- Conversation memory accumulates across loops — watch for context overflow and stale data issues
**How to identify forever-alive agents:** Check `agent.py` or `agent.json` for `terminal_nodes=[]` (empty list). If empty, the agent is forever-alive.
---
## Prerequisites
@@ -47,7 +58,7 @@ Before using this skill, ensure:
**What to do:**
1. **Ask the developer which agent needs debugging:**
- Get agent name (e.g., "twitter_outreach", "deep_research_agent")
- Get agent name (e.g., "deep_research_agent", "deep_research_agent")
- Confirm the agent exists in `exports/{agent_name}/`
2. **Determine agent working directory:**
@@ -66,7 +77,7 @@ Before using this skill, ensure:
4. **Store context for the debugging session:**
- agent_name
- agent_work_dir (e.g., `/home/user/.hive/twitter_outreach`)
- agent_work_dir (e.g., `/home/user/.hive/deep_research_agent`)
- goal_id
- success_criteria
- constraints
@@ -74,19 +85,19 @@ Before using this skill, ensure:
**Example:**
```
Developer: "My twitter_outreach agent keeps failing"
Developer: "My deep_research_agent agent keeps failing"
You: "I'll help debug the twitter_outreach agent. Let me gather context..."
You: "I'll help debug the deep_research_agent agent. Let me gather context..."
[Read exports/twitter_outreach/agent.json]
[Read exports/deep_research_agent/agent.json]
Context gathered:
- Agent: twitter_outreach
- Goal: twitter-outreach-multi-loop
- Working Directory: /home/user/.hive/twitter_outreach
- Success Criteria: ["Successfully send 5 personalized outreach messages"]
- Constraints: ["Must verify handle exists", "Must personalize message"]
- Nodes: ["intake-collector", "profile-analyzer", "message-composer", "outreach-sender"]
- Agent: deep_research_agent
- Goal: deep-research
- Working Directory: /home/user/.hive/deep_research_agent
- Success Criteria: ["Produce a comprehensive research report with cited sources"]
- Constraints: ["Must cite all sources", "Must cover multiple perspectives"]
- Nodes: ["intake", "research", "analysis", "report-writer"]
```
---
@@ -142,6 +153,7 @@ Store the selected mode for the session.
- Check `attention_summary.categories` for issue types
- Note the `run_id` of problematic sessions
- Check `status` field: "degraded", "failure", "in_progress"
- **For forever-alive agents:** Sessions with status "in_progress" or "paused" are normal — these agents never reach "completed". Only flag sessions with `needs_attention: true` or actual error indicators (tool failures, retry loops, missing outputs). High step counts alone do not indicate a problem.
3. **Attention flag triggers to understand:**
From runtime_logger.py, runs are flagged when:
@@ -199,13 +211,20 @@ Which run would you like to investigate?
| **Tool Errors** | `tool_error_count > 0`, `attention_reasons` contains "tool_failures" | Tool calls failed (API errors, timeouts, auth issues) |
| **Retry Loops** | `retry_count > 3`, `verdict_counts.RETRY > 5` | Judge repeatedly rejecting outputs |
| **Guard Failures** | `guard_reject_count > 0` | Output validation failed (wrong types, missing keys) |
| **Stalled Execution** | `total_steps > 20`, `verdict_counts.CONTINUE > 10` | EventLoopNode not making progress |
| **Stalled Execution** | `total_steps > 20`, `verdict_counts.CONTINUE > 10` | EventLoopNode not making progress. **Caveat:** Forever-alive agents may legitimately have high step counts — check if agent is blocked at a client-facing node (normal) vs genuinely stuck in a loop |
| **High Latency** | `latency_ms > 60000`, `avg_step_latency > 5000` | Slow tool calls or LLM responses |
| **Client-Facing Issues** | `client_input_requested` but no `user_input_received` | Premature set_output before user input |
| **Edge Routing Errors** | `exit_status == "no_valid_edge"`, `attention_reasons` contains "routing_issue" | No edges match current state |
| **Memory/Context Issues** | `tokens_used > 100000`, `context_overflow_count > 0` | Conversation history too long |
| **Constraint Violations** | Compare output against goal constraints | Agent violated goal-level rules |
**Forever-Alive Agent Caveat:** If the agent uses `terminal_nodes=[]`, sessions will never reach "completed" status. This is by design. When debugging these agents, focus on:
- Whether individual node visits succeed (not whether the graph "finishes")
- Quality of each loop iteration — are outputs improving or degrading across loops?
- Whether client-facing nodes are correctly blocking for user input
- Memory accumulation issues: stale data from previous loops, context overflow across many iterations
- Conversation compaction behavior: is the conversation growing unbounded?
3. **Analyze each flagged node:**
- Node ID and name
- Exit status
@@ -224,7 +243,7 @@ Which run would you like to investigate?
```
Diagnosis for session_20260206_115718_e22339c5:
Problem Node: intake-collector
Problem Node: research
├─ Exit Status: escalate
├─ Retry Count: 5 (HIGH)
├─ Verdict Counts: {RETRY: 5, ESCALATE: 1}
@@ -232,7 +251,7 @@ Problem Node: intake-collector
├─ Total Steps: 8
└─ Categories: Missing Outputs + Retry Loops
Root Issue: The intake-collector node is stuck in a retry loop because it's not setting required outputs.
Root Issue: The research node is stuck in a retry loop because it's not setting required outputs.
```
---
@@ -293,25 +312,25 @@ Root Issue: The intake-collector node is stuck in a retry loop because it's not
**Example Output:**
```
Root Cause Analysis for intake-collector:
Root Cause Analysis for research:
Step-by-step breakdown:
Step 3:
- Tool Call: web_search(query="@RomuloNevesOf")
- Result: Found Twitter profile information
- Tool Call: web_search(query="latest AI regulations 2026")
- Result: Found relevant articles and sources
- Verdict: RETRY
- Feedback: "Missing required output 'twitter_handles'. You found the handle but didn't call set_output."
- Feedback: "Missing required output 'research_findings'. You found sources but didn't call set_output."
Step 4:
- Tool Call: web_search(query="@RomuloNevesOf twitter")
- Result: Found additional Twitter information
- Tool Call: web_search(query="AI regulation policy 2026")
- Result: Found additional policy information
- Verdict: RETRY
- Feedback: "Still missing 'twitter_handles'. Use set_output to save your findings."
- Feedback: "Still missing 'research_findings'. Use set_output to save your findings."
Steps 5-7: Similar pattern continues...
ROOT CAUSE: The node is successfully finding Twitter handles via web_search, but the LLM is not calling set_output to save the results. It keeps searching for more information instead of completing the task.
ROOT CAUSE: The node is successfully finding research sources via web_search, but the LLM is not calling set_output to save the results. It keeps searching for more information instead of completing the task.
```
---
@@ -562,15 +581,33 @@ PYTHONPATH=core:exports python -m {agent_name} --tui
### Find Available Checkpoints:
```bash
# In TUI:
/sessions {session_id}
Use MCP tools to programmatically find and inspect checkpoints:
# This shows all checkpoints with timestamps:
Available Checkpoints: (3)
1. cp_node_complete_intake_143030
2. cp_node_complete_research_143115
3. cp_pause_research_143130
```
# List all sessions to find the failed one
list_agent_sessions(agent_work_dir="~/.hive/agents/{agent_name}", status="failed")
# Inspect session state
get_agent_session_state(agent_work_dir="~/.hive/agents/{agent_name}", session_id="{session_id}")
# Find clean checkpoints to resume from
list_agent_checkpoints(agent_work_dir="~/.hive/agents/{agent_name}", session_id="{session_id}", is_clean="true")
# Compare checkpoints to understand what changed
compare_agent_checkpoints(
agent_work_dir="~/.hive/agents/{agent_name}",
session_id="{session_id}",
checkpoint_id_before="cp_node_complete_intake_143030",
checkpoint_id_after="cp_node_complete_research_143115"
)
# Inspect memory at a specific checkpoint
get_agent_checkpoint(agent_work_dir="~/.hive/agents/{agent_name}", session_id="{session_id}", checkpoint_id="cp_node_complete_intake_143030")
```
Or in TUI:
```bash
/sessions {session_id}
```
**Verification:**
@@ -653,7 +690,7 @@ Available Checkpoints: (3)
**Example interaction:**
```
Developer: "I applied the fix to intake-collector. How do I verify it works?"
Developer: "I applied the fix to research. How do I verify it works?"
You: "Great! Let's verify the fix with these steps:
@@ -665,11 +702,11 @@ You: "Great! Let's verify the fix with these steps:
[Use query_runtime_logs to check for attention flags]
3. Verify the specific node:
[Use query_runtime_log_details for intake-collector]
[Use query_runtime_log_details for research]
Expected results:
- No 'needs_attention' flags
- intake-collector shows exit_status='success'
- research shows exit_status='success'
- retry_count should be 0
Let me know when you've run it and I'll help check the logs!"
@@ -687,7 +724,7 @@ Let me know when you've run it and I'll help check the logs!"
- **Example:**
```
query_runtime_logs(
agent_work_dir="/home/user/.hive/twitter_outreach",
agent_work_dir="/home/user/.hive/deep_research_agent",
status="needs_attention",
limit=20
)
@@ -699,7 +736,7 @@ Let me know when you've run it and I'll help check the logs!"
- **Example:**
```
query_runtime_log_details(
agent_work_dir="/home/user/.hive/twitter_outreach",
agent_work_dir="/home/user/.hive/deep_research_agent",
run_id="session_20260206_115718_e22339c5",
needs_attention_only=True
)
@@ -711,9 +748,83 @@ Let me know when you've run it and I'll help check the logs!"
- **Example:**
```
query_runtime_log_raw(
agent_work_dir="/home/user/.hive/twitter_outreach",
agent_work_dir="/home/user/.hive/deep_research_agent",
run_id="session_20260206_115718_e22339c5",
node_id="intake-collector"
node_id="research"
)
```
### Session & Checkpoint Tools
**list_agent_sessions** - Browse sessions with filtering
- **When to use:** Finding resumable sessions, identifying failed sessions, Stage 3 triage
- **Returns:** Session list with status, timestamps, is_resumable, current_node, quality
- **Example:**
```
list_agent_sessions(
agent_work_dir="/home/user/.hive/agents/twitter_outreach",
status="failed",
limit=10
)
```
**get_agent_session_state** - Load full session state (excludes memory values)
- **When to use:** Inspecting session progress, checking is_resumable, examining path
- **Returns:** Full state with memory_keys/memory_size instead of memory values
- **Example:**
```
get_agent_session_state(
agent_work_dir="/home/user/.hive/agents/twitter_outreach",
session_id="session_20260208_143022_abc12345"
)
```
**get_agent_session_memory** - Get memory contents from a session
- **When to use:** Stage 5 root cause analysis, inspecting produced data
- **Returns:** All memory keys+values, or a single key's value
- **Example:**
```
get_agent_session_memory(
agent_work_dir="/home/user/.hive/agents/twitter_outreach",
session_id="session_20260208_143022_abc12345",
key="twitter_handles"
)
```
**list_agent_checkpoints** - List checkpoints for a session
- **When to use:** Stage 6 recovery, finding clean checkpoints to resume from
- **Returns:** Checkpoint summaries with type, node, clean status
- **Example:**
```
list_agent_checkpoints(
agent_work_dir="/home/user/.hive/agents/twitter_outreach",
session_id="session_20260208_143022_abc12345",
is_clean="true"
)
```
**get_agent_checkpoint** - Load a specific checkpoint with full state
- **When to use:** Inspecting exact state at a checkpoint, comparing to current state
- **Returns:** Full checkpoint: memory snapshot, execution path, metrics
- **Example:**
```
get_agent_checkpoint(
agent_work_dir="/home/user/.hive/agents/twitter_outreach",
session_id="session_20260208_143022_abc12345",
checkpoint_id="cp_node_complete_intake_143030"
)
```
**compare_agent_checkpoints** - Diff memory between two checkpoints
- **When to use:** Understanding data flow, finding where state diverged
- **Returns:** Memory diff (added/removed/changed keys) + execution path diff
- **Example:**
```
compare_agent_checkpoints(
agent_work_dir="/home/user/.hive/agents/twitter_outreach",
session_id="session_20260208_143022_abc12345",
checkpoint_id_before="cp_node_complete_intake_143030",
checkpoint_id_after="cp_node_complete_research_143115"
)
```
@@ -739,27 +850,37 @@ Loop every 10 seconds:
2. If found: Alert and drill into L2
```
**Pattern 4: Session State + Checkpoint Recovery**
```
1. list_agent_sessions: Find failed/paused sessions
2. get_agent_session_state: Check is_resumable, see execution path
3. get_agent_session_memory: Inspect what data was produced
4. list_agent_checkpoints: Find clean checkpoints before failure
5. compare_agent_checkpoints: Understand what changed between checkpoints
6. Recommend resume command with specific checkpoint
```
---
## Complete Example Walkthrough
### Scenario: twitter_outreach Agent Failing
### Scenario: deep_research_agent Agent Failing
**Developer says:** "My twitter_outreach agent keeps failing. It's not finding Twitter handles correctly."
**Developer says:** "My deep_research_agent keeps failing. It's not saving research findings correctly."
#### Stage 1: Setup
```
You: "I'll help debug the twitter_outreach agent. Let me gather context..."
You: "I'll help debug the deep_research_agent agent. Let me gather context..."
[Read exports/twitter_outreach/agent.json]
[Read exports/deep_research_agent/agent.json]
Context:
- Agent: twitter_outreach
- Goal: twitter-outreach-multi-loop
- Working Dir: ~/.hive/agents/twitter_outreach
- Agent: deep_research_agent
- Goal: deep-research
- Working Dir: ~/.hive/agents/deep_research_agent
- Success Criteria: ["Successfully send 5 personalized outreach messages"]
- Constraints: ["Must verify handle exists", "Must personalize message"]
- Nodes: intake-collector, profile-analyzer, message-composer, outreach-sender
- Nodes: research, profile-analyzer, message-composer, outreach-sender
```
#### Stage 2: Mode Selection
@@ -791,31 +912,31 @@ You: "Analyzing node-level details..."
[Call query_runtime_log_details for the session]
Results:
- Problem Node: intake-collector
- Problem Node: research
- Exit Status: escalate
- Retry Count: 5
- Verdict Counts: {RETRY: 5, ESCALATE: 1}
- Attention Reasons: ["high_retry_count", "missing_outputs"]
- Total Steps: 8
Diagnosis: intake-collector has retry loop + missing outputs issue
Diagnosis: research has retry loop + missing outputs issue
Category: Missing Outputs + Retry Loops
```
#### Stage 5: Root Cause (L3)
```
You: "Drilling into detailed logs for intake-collector..."
You: "Drilling into detailed logs for research..."
[Call query_runtime_log_raw for node intake-collector]
[Call query_runtime_log_raw for node research]
Evidence:
Step 3: web_search("@RomuloNevesOf") → Found profile
Step 3: web_search("latest AI regulations 2026") → Found articles
Verdict: RETRY
Feedback: "Missing required output 'twitter_handles'. You found the handle but didn't call set_output."
Feedback: "Missing required output 'research_findings'. You found sources but didn't call set_output."
Step 4: web_search("@RomuloNevesOf twitter") → Found more info
Step 4: web_search("AI regulation policy 2026") → Found more info
Verdict: RETRY
Feedback: "Still missing 'twitter_handles'. Use set_output to save your findings."
Feedback: "Still missing 'research_findings'. Use set_output to save your findings."
Steps 5-7: Same pattern repeats...
@@ -826,37 +947,37 @@ ROOT CAUSE: Node successfully finds handles but LLM doesn't call set_output. It
```
You: "Here's the fix for this issue:
## Issue: intake-collector Not Setting Outputs
## Issue: research Node Not Setting Outputs
**Root Cause:** Node successfully finds Twitter handles but doesn't call set_output
**Root Cause:** Node successfully finds research sources but doesn't call set_output
**Category:** Missing Outputs
**File to edit:** exports/twitter_outreach/nodes/intake_collector.py
**File to edit:** exports/deep_research_agent/nodes/research.py
**Changes:**
Update the system_prompt to emphasize the output requirement:
```python
system_prompt = """
You are the Intake Collector for the Twitter outreach agent.
You are the Research node for the deep research agent.
Your task:
1. Extract Twitter handle from user input (URL or @mention)
2. Verify it's a valid Twitter handle format
3. **CRITICAL: Call set_output(twitter_handles=[handle]) immediately after finding valid handle**
1. Search for relevant sources on the research topic
2. Extract key findings from each source
3. **CRITICAL: Call set_output(research_findings=[...]) immediately after gathering sufficient findings**
IMPORTANT: You MUST call set_output even if you found the handle on the first try.
Do not continue searching once you have a valid handle - set output and finish.
IMPORTANT: You MUST call set_output even if you found findings on the first search.
Do not continue searching indefinitely - set output and finish once you have enough data.
Output format:
{
"twitter_handles": ["@handle1", "@handle2", ...]
"research_findings": [{"source": "...", "summary": "...", "url": "..."}, ...]
}
"""
```
**Why this works:** The explicit instruction to call set_output immediately prevents the LLM from continuing to search after finding the handle.
**Why this works:** The explicit instruction to call set_output immediately prevents the LLM from continuing to search after finding sufficient sources.
```
#### Stage 7: Verification
@@ -880,13 +1001,13 @@ You: "Checking now..."
[Call query_runtime_log_details for the new run]
Results:
- Node: intake-collector
- Node: research
- Exit Status: success ✓
- Retry Count: 0 ✓
- Total Steps: 2 ✓
- No attention flags ✓
"Perfect! The fix worked. The intake-collector now:
"Perfect! The fix worked. The research now:
- Completes successfully on first try
- No retries needed
- Calls set_output properly
@@ -913,6 +1034,9 @@ Your agent should now work correctly!"
3. **Don't ignore edge conditions** - Missing edges cause routing failures
4. **Don't overlook judge configuration** - Mismatched expectations cause retry loops
5. **Don't forget nullable_output_keys** - Optional inputs need explicit marking
6. **Don't diagnose "in_progress" as a failure for forever-alive agents** - Agents with `terminal_nodes=[]` are designed to never enter "completed" state. This is intentional. Focus on quality of individual node visits, not session completion status
7. **Don't ignore conversation memory issues in long-running sessions** - In continuous conversation mode, history grows across node transitions and loop iterations. Watch for context overflow (tokens_used > 100K), stale data from previous loops affecting edge conditions, and compaction failures that cause the LLM to lose important context
8. **Don't confuse "waiting for user" with "stalled"** - Client-facing nodes in forever-alive agents block for user input by design. A session paused at a client-facing node is working correctly, not stalled
---
File diff suppressed because it is too large Load Diff
@@ -1,351 +1,333 @@
# Example: Testing a YouTube Research Agent
# Example: Iterative Testing of a Research Agent
This example walks through testing a YouTube research agent that finds relevant videos based on a topic.
This example walks through the full iterative test loop for a research agent that searches the web, reviews findings, and produces a cited report.
## Prerequisites
## Agent Structure
- Agent built with hive-create skill at `exports/youtube-research/`
- Goal defined with success criteria and constraints
## Step 1: Load the Goal
First, load the goal that was defined during the Goal stage:
```json
{
"id": "youtube-research",
"name": "YouTube Research Agent",
"description": "Find relevant YouTube videos on a given topic",
"success_criteria": [
{
"id": "find_videos",
"description": "Find 3-5 relevant videos",
"metric": "video_count",
"target": "3-5",
"weight": 1.0
},
{
"id": "relevance",
"description": "Videos must be relevant to the topic",
"metric": "relevance_score",
"target": ">0.8",
"weight": 0.8
}
],
"constraints": [
{
"id": "api_limits",
"description": "Must not exceed YouTube API rate limits",
"constraint_type": "hard",
"category": "technical"
},
{
"id": "content_safety",
"description": "Must filter out inappropriate content",
"constraint_type": "hard",
"category": "safety"
}
]
}
```
exports/deep_research_agent/
├── agent.py # Goal + graph: intake → research → review → report
├── nodes/__init__.py # Node definitions (system_prompt, input/output keys)
├── config.py # Model config
├── mcp_servers.json # Tools: web_search, web_scrape
└── tests/ # Test files (we'll create these)
```
## Step 2: Get Constraint Test Guidelines
**Goal:** "Rigorous Interactive Research" — find 5+ diverse sources, cite every claim, produce a complete report.
During the Goal stage (or early Eval), get test guidelines for constraints:
---
## Phase 1: Generate Tests
### Read the goal
```python
result = generate_constraint_tests(
goal_id="youtube-research",
goal_json='<goal JSON above>',
agent_path="exports/youtube-research"
)
Read(file_path="exports/deep_research_agent/agent.py")
# Extract: goal_id="rigorous-interactive-research"
# success_criteria: source-diversity (>=5), citation-coverage (100%), report-completeness (90%)
# constraints: no-hallucination, source-attribution
```
**The result contains guidelines (not generated tests):**
- `output_file`: Where to write tests
- `file_header`: Imports and fixtures to use
- `test_template`: Format for test functions
- `constraints_formatted`: The constraints to test
- `test_guidelines`: Rules for writing tests
## Step 3: Write Constraint Tests
Using the guidelines, write tests directly with the Write tool:
```python
# Write constraint tests using the provided file_header and guidelines
Write(
file_path="exports/youtube-research/tests/test_constraints.py",
content='''
"""Constraint tests for youtube-research agent."""
import os
import pytest
from exports.youtube_research import default_agent
pytestmark = pytest.mark.skipif(
not os.environ.get("ANTHROPIC_API_KEY") and not os.environ.get("MOCK_MODE"),
reason="API key required for real testing."
)
@pytest.mark.asyncio
async def test_constraint_api_limits_respected():
"""Verify API rate limits are not exceeded."""
import time
mock_mode = bool(os.environ.get("MOCK_MODE"))
for i in range(10):
result = await default_agent.run({"topic": f"test_{i}"}, mock_mode=mock_mode)
time.sleep(0.1)
# Should complete without rate limit errors
assert "rate limit" not in str(result).lower()
@pytest.mark.asyncio
async def test_constraint_content_safety_filter():
"""Verify inappropriate content is filtered."""
mock_mode = bool(os.environ.get("MOCK_MODE"))
result = await default_agent.run({"topic": "general topic"}, mock_mode=mock_mode)
for video in result.videos:
assert video.safe_for_work is True
assert video.age_restricted is False
'''
)
```
## Step 4: Get Success Criteria Test Guidelines
After the agent is built, get success criteria test guidelines:
### Get test guidelines
```python
result = generate_success_tests(
goal_id="youtube-research",
goal_json='<goal JSON>',
node_names="search_node,filter_node,rank_node,format_node",
tool_names="youtube_search,video_details,channel_info",
agent_path="exports/youtube-research"
goal_id="rigorous-interactive-research",
goal_json='{"id": "rigorous-interactive-research", "success_criteria": [{"id": "source-diversity", "description": "Use multiple diverse sources", "target": ">=5"}, {"id": "citation-coverage", "description": "Every claim cites its source", "target": "100%"}, {"id": "report-completeness", "description": "Report answers the research questions", "target": "90%"}]}',
node_names="intake,research,review,report",
tool_names="web_search,web_scrape",
agent_path="exports/deep_research_agent"
)
```
## Step 5: Write Success Criteria Tests
Using the guidelines, write success criteria tests:
### Write tests
```python
Write(
file_path="exports/youtube-research/tests/test_success_criteria.py",
content='''
"""Success criteria tests for youtube-research agent."""
import os
import pytest
from exports.youtube_research import default_agent
pytestmark = pytest.mark.skipif(
not os.environ.get("ANTHROPIC_API_KEY") and not os.environ.get("MOCK_MODE"),
reason="API key required for real testing."
)
file_path="exports/deep_research_agent/tests/test_success_criteria.py",
content=result["file_header"] + '''
@pytest.mark.asyncio
async def test_find_videos_happy_path():
"""Test finding videos for a common topic."""
mock_mode = bool(os.environ.get("MOCK_MODE"))
result = await default_agent.run({"topic": "machine learning"}, mock_mode=mock_mode)
assert result.success
assert 3 <= len(result.videos) <= 5
assert all(v.title for v in result.videos)
assert all(v.video_id for v in result.videos)
async def test_success_source_diversity(runner, auto_responder, mock_mode):
"""At least 5 diverse sources are found."""
await auto_responder.start()
try:
result = await runner.run({"query": "impact of remote work on productivity"})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
sources = output.get("sources", [])
if isinstance(sources, list):
assert len(sources) >= 5, f"Expected >= 5 sources, got {len(sources)}"
@pytest.mark.asyncio
async def test_find_videos_minimum_boundary():
"""Test at minimum threshold (3 videos)."""
mock_mode = bool(os.environ.get("MOCK_MODE"))
result = await default_agent.run({"topic": "niche topic xyz"}, mock_mode=mock_mode)
assert len(result.videos) >= 3
async def test_success_citation_coverage(runner, auto_responder, mock_mode):
"""Every factual claim in the report cites its source."""
await auto_responder.start()
try:
result = await runner.run({"query": "climate change effects on agriculture"})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
report = output.get("report", "")
# Check that report contains numbered references
assert "[1]" in str(report) or "[source" in str(report).lower(), "Report lacks citations"
@pytest.mark.asyncio
async def test_relevance_score_threshold():
"""Test relevance scoring meets threshold."""
mock_mode = bool(os.environ.get("MOCK_MODE"))
result = await default_agent.run({"topic": "python programming"}, mock_mode=mock_mode)
for video in result.videos:
assert video.relevance_score > 0.8
async def test_success_report_completeness(runner, auto_responder, mock_mode):
"""Report addresses the original research question."""
query = "pros and cons of nuclear energy"
await auto_responder.start()
try:
result = await runner.run({"query": query})
finally:
await auto_responder.stop()
assert result.success, f"Agent failed: {result.error}"
output = result.output or {}
report = output.get("report", "")
assert len(str(report)) > 200, f"Report too short: {len(str(report))} chars"
@pytest.mark.asyncio
async def test_find_videos_no_results_graceful():
"""Test graceful handling of no results."""
mock_mode = bool(os.environ.get("MOCK_MODE"))
result = await default_agent.run({"topic": "xyznonexistent123"}, mock_mode=mock_mode)
async def test_empty_query_handling(runner, auto_responder, mock_mode):
"""Agent handles empty input gracefully."""
await auto_responder.start()
try:
result = await runner.run({"query": ""})
finally:
await auto_responder.stop()
output = result.output or {}
assert not result.success or output.get("error"), "Should handle empty query"
# Should not crash, return empty or message
assert result.videos == [] or result.message
@pytest.mark.asyncio
async def test_feedback_loop_terminates(runner, auto_responder, mock_mode):
"""Feedback loop between review and research terminates."""
await auto_responder.start()
try:
result = await runner.run({"query": "quantum computing basics"})
finally:
await auto_responder.stop()
visits = result.node_visit_counts or {}
for node_id, count in visits.items():
assert count <= 5, f"Node {node_id} visited {count} times"
'''
)
```
## Step 6: Run All Tests
---
Execute all tests:
## Phase 2: First Execution
```python
result = run_tests(
goal_id="youtube-research",
agent_path="exports/youtube-research",
test_types='["all"]',
parallel=4
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent",
fail_fast=True
)
```
**Results:**
**Result:**
```json
{
"goal_id": "youtube-research",
"overall_passed": false,
"summary": {
"total": 6,
"passed": 5,
"failed": 1,
"pass_rate": "83.3%"
},
"duration_ms": 4521,
"results": [
{"test_id": "test_constraint_api_001", "passed": true, "duration_ms": 1234},
{"test_id": "test_constraint_content_001", "passed": true, "duration_ms": 456},
{"test_id": "test_success_001", "passed": true, "duration_ms": 789},
{"test_id": "test_success_002", "passed": true, "duration_ms": 654},
{"test_id": "test_success_003", "passed": true, "duration_ms": 543},
{"test_id": "test_success_004", "passed": false, "duration_ms": 845,
"error_category": "IMPLEMENTATION_ERROR",
"error_message": "TypeError: 'NoneType' object has no attribute 'videos'"}
]
"overall_passed": false,
"summary": {"total": 5, "passed": 3, "failed": 2, "pass_rate": "60.0%"},
"failures": [
{"test_name": "test_success_source_diversity", "details": "AssertionError: Expected >= 5 sources, got 2"},
{"test_name": "test_success_citation_coverage", "details": "AssertionError: Report lacks citations"}
]
}
```
## Step 7: Debug the Failed Test
---
## Phase 3: Analyze (Iteration 1)
### Debug the first failure
```python
result = debug_test(
goal_id="youtube-research",
test_name="test_find_videos_no_results_graceful",
agent_path="exports/youtube-research"
debug_test(
goal_id="rigorous-interactive-research",
test_name="test_success_source_diversity",
agent_path="exports/deep_research_agent"
)
# Category: ASSERTION_FAILURE — Expected >= 5 sources, got 2
```
### Find the session and inspect memory
```python
list_agent_sessions(
agent_work_dir="~/.hive/agents/deep_research_agent",
status="completed",
limit=1
)
# → session_20260209_150000_abc12345
get_agent_session_memory(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_150000_abc12345",
key="research_results"
)
# → Only 2 sources found. LLM stopped searching after 2 queries.
```
### Check LLM behavior in the research node
```python
query_runtime_log_raw(
agent_work_dir="~/.hive/agents/deep_research_agent",
run_id="session_20260209_150000_abc12345",
node_id="research"
)
# → LLM called web_search twice, got results, immediately called set_output.
# → Prompt doesn't instruct it to find at least 5 sources.
```
**Root cause:** The research node's system_prompt doesn't specify minimum source requirements.
---
## Phase 4: Fix (Iteration 1)
```python
Read(file_path="exports/deep_research_agent/nodes/__init__.py")
# Fix the research node prompt
Edit(
file_path="exports/deep_research_agent/nodes/__init__.py",
old_string='system_prompt="Search for information on the user\'s topic using web search."',
new_string='system_prompt="Search for information on the user\'s topic using web search. You MUST find at least 5 diverse, authoritative sources. Use multiple different search queries with varied keywords. Do NOT call set_output until you have gathered at least 5 distinct sources from different domains."'
)
```
**Debug Output:**
---
## Phase 5: Recover & Resume (Iteration 1)
The fix is to the `research` node. Since this was a `run_tests` execution (no checkpoints), we re-run from scratch:
```python
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent",
fail_fast=True
)
```
**Result:**
```json
{
"test_id": "test_success_004",
"test_name": "test_find_videos_no_results_graceful",
"input": {"topic": "xyznonexistent123"},
"expected": "Empty list or message",
"actual": {"error": "TypeError: 'NoneType' object has no attribute 'videos'"},
"passed": false,
"error_message": "TypeError: 'NoneType' object has no attribute 'videos'",
"error_category": "IMPLEMENTATION_ERROR",
"stack_trace": "Traceback (most recent call last):\n File \"filter_node.py\", line 42\n for video in result.videos:\nTypeError: 'NoneType' object has no attribute 'videos'",
"logs": [
{"timestamp": "2026-01-20T10:00:01", "node": "search_node", "level": "INFO", "msg": "Searching for: xyznonexistent123"},
{"timestamp": "2026-01-20T10:00:02", "node": "search_node", "level": "WARNING", "msg": "No results found"},
{"timestamp": "2026-01-20T10:00:02", "node": "filter_node", "level": "ERROR", "msg": "NoneType error"}
],
"runtime_data": {
"execution_path": ["start", "search_node", "filter_node"],
"node_outputs": {
"search_node": null
}
},
"suggested_fix": "Add null check in filter_node before accessing .videos attribute",
"iteration_guidance": {
"stage": "Agent",
"action": "Fix the code in nodes/edges",
"restart_required": false,
"description": "The goal is correct, but filter_node doesn't handle null results from search_node."
}
"overall_passed": false,
"summary": {"total": 5, "passed": 4, "failed": 1, "pass_rate": "80.0%"},
"failures": [
{"test_name": "test_success_citation_coverage", "details": "AssertionError: Report lacks citations"}
]
}
```
## Step 8: Iterate Based on Category
Source diversity now passes. Citation coverage still fails.
Since this is an **IMPLEMENTATION_ERROR**, we:
---
1. **Don't restart** the Goal → Agent → Eval flow
2. **Fix the agent** using hive-create skill:
- Modify `filter_node` to handle null results
3. **Re-run Eval** (tests only)
### Fix in hive-create:
## Phase 3: Analyze (Iteration 2)
```python
# Update the filter_node to handle null
add_node(
node_id="filter_node",
name="Filter Node",
description="Filter and rank videos",
node_type="function",
input_keys=["search_results"],
output_keys=["filtered_videos"],
system_prompt="""
Filter videos by relevance.
IMPORTANT: Handle case where search_results is None or empty.
Return empty list if no results.
"""
debug_test(
goal_id="rigorous-interactive-research",
test_name="test_success_citation_coverage",
agent_path="exports/deep_research_agent"
)
# Category: ASSERTION_FAILURE — Report lacks citations
# Check what the report node produced
list_agent_sessions(
agent_work_dir="~/.hive/agents/deep_research_agent",
status="completed",
limit=1
)
# → session_20260209_151500_def67890
get_agent_session_memory(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_151500_def67890",
key="report"
)
# → Report text exists but uses no numbered references.
# → Sources are in memory but report node doesn't cite them.
```
**Root cause:** The report node's prompt doesn't instruct the LLM to include numbered citations.
---
## Phase 4: Fix (Iteration 2)
```python
Edit(
file_path="exports/deep_research_agent/nodes/__init__.py",
old_string='system_prompt="Write a comprehensive report based on the research findings."',
new_string='system_prompt="Write a comprehensive report based on the research findings. You MUST include numbered citations [1], [2], etc. for every factual claim. At the end, include a References section listing all sources with their URLs. Every claim must be traceable to a specific source."'
)
```
### Re-export and re-test:
---
## Phase 5: Resume (Iteration 2)
The fix is to the `report` node (the last node). To demonstrate checkpoint recovery, run via CLI:
```bash
# Run via CLI to get checkpoints
uv run hive run exports/deep_research_agent --input '{"topic": "climate change effects"}'
# After it runs, find the clean checkpoint before report
list_agent_checkpoints(
agent_work_dir="~/.hive/agents/deep_research_agent",
session_id="session_20260209_152000_ghi34567",
is_clean="true"
)
# → cp_node_complete_review_152100 (after review, before report)
# Resume — skips intake, research, review entirely
uv run hive run exports/deep_research_agent \
--resume-session session_20260209_152000_ghi34567 \
--checkpoint cp_node_complete_review_152100
```
Only the `report` node re-runs with the fixed prompt, using research data from the checkpoint.
---
## Phase 6: Final Verification
```python
# Re-export the fixed agent
export_graph(path="exports/youtube-research")
# Re-run tests
result = run_tests(
goal_id="youtube-research",
agent_path="exports/youtube-research",
test_types='["all"]'
run_tests(
goal_id="rigorous-interactive-research",
agent_path="exports/deep_research_agent"
)
```
**Updated Results:**
**Result:**
```json
{
"goal_id": "youtube-research",
"overall_passed": true,
"summary": {
"total": 6,
"passed": 6,
"failed": 0,
"pass_rate": "100.0%"
}
"overall_passed": true,
"summary": {"total": 5, "passed": 5, "failed": 0, "pass_rate": "100.0%"}
}
```
All tests pass.
---
## Summary
1. **Got guidelines** for constraint tests during Goal stage
2. **Wrote** constraint tests using Write tool
3. **Got guidelines** for success criteria tests during Eval stage
4. **Wrote** success criteria tests using Write tool
5. **Ran** tests in parallel
6. **Debugged** the one failure
7. **Categorized** as IMPLEMENTATION_ERROR
8. **Fixed** the agent (not the goal)
9. **Re-ran** Eval only (didn't restart full flow)
10. **Passed** all tests
| Iteration | Failure | Root Cause | Fix | Recovery |
|-----------|---------|------------|-----|----------|
| 1 | Source diversity (2 < 5) | Research prompt too vague | Added "at least 5 sources" to prompt | Re-run (no checkpoints) |
| 2 | No citations in report | Report prompt lacks citation instructions | Added citation requirements | Checkpoint resume (skipped 3 nodes) |
The agent is now validated and ready for production use.
**Key takeaways:**
- Phase 3 analysis (session memory + L3 logs) identified root causes without guessing
- Checkpoint recovery in iteration 2 saved time by skipping 3 expensive nodes
- Final `run_tests` confirms all scenarios pass end-to-end
+7
View File
@@ -0,0 +1,7 @@
# Project-level Codex config for Hive.
# Keep this file minimal: MCP connectivity + skill discovery.
[mcp_servers.agent-builder]
command = "uv"
args = ["run", "--directory", "core", "-m", "framework.mcp.agent_builder_server"]
cwd = "."
+3 -1
View File
@@ -74,4 +74,6 @@ exports/*
docs/github-issues/*
core/tests/*dumps/*
screenshots/*
screenshots/*
-5
View File
@@ -4,11 +4,6 @@
"command": "uv",
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
"cwd": "core"
},
"tools": {
"command": "uv",
"args": ["run", "mcp_server.py", "--stdio"],
"cwd": "tools"
}
}
}
+30
View File
@@ -0,0 +1,30 @@
{
"mcpServers": {
"agent-builder": {
"command": "uv",
"args": [
"run",
"python",
"-m",
"framework.mcp.agent_builder_server"
],
"cwd": "core",
"env": {
"PYTHONPATH": "../tools/src"
}
},
"tools": {
"command": "uv",
"args": [
"run",
"python",
"mcp_server.py",
"--stdio"
],
"cwd": "tools",
"env": {
"PYTHONPATH": "src"
}
}
}
}
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-debugger
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+1
View File
@@ -0,0 +1 @@
../../.claude/skills/triage-issue
+207
View File
@@ -0,0 +1,207 @@
# Release Notes
**Release Date:** February 18, 2026
**Tag:** v0.5.1
## The Hive Gets a Brain
v0.5.1 is our most ambitious release yet. Hive agents can now **build other agents** -- the new Hive Coder meta-agent writes, tests, and fixes agent packages from natural language. The runtime grows multi-graph support so one session can orchestrate multiple agents simultaneously. The TUI gets a complete overhaul with an in-app agent picker, live streaming, and seamless escalation to the Coder. And we're now provider-agnostic: Claude Code subscriptions, OpenAI-compatible endpoints, and any LiteLLM-supported model work out of the box.
---
## Highlights
### Hive Coder -- The Agent That Builds Agents
A native meta-agent that lives inside the framework at `core/framework/agents/hive_coder/`. Give it a natural-language specification and it produces a complete agent package -- goal definition, node prompts, edge routing, MCP tool wiring, tests, and all boilerplate files.
```bash
# Launch the Coder directly
hive code
# Or escalate from any running agent (TUI)
Ctrl+E # or /coder in chat
```
The Coder ships with:
- **Reference documentation** -- anti-patterns, construction guide, and design patterns baked into its system prompt
- **Guardian watchdog** -- an event-driven monitor that catches agent failures and triggers automatic remediation
- **Coder Tools MCP server** -- file I/O, fuzzy-match editing, git snapshots, and sandboxed shell execution (`tools/coder_tools_server.py`)
- **Test generation** -- structural tests for forever-alive agents that don't hang on `runner.run()`
### Multi-Graph Agent Runtime
`AgentRuntime` now supports loading, managing, and switching between multiple agent graphs within a single session. Six new lifecycle tools give agents (and the TUI) full control:
```python
# Load a second agent into the runtime
await runtime.add_graph("exports/deep_research_agent")
# Tools available to agents:
# load_agent, unload_agent, start_agent, restart_agent, list_agents, get_user_presence
```
The Hive Coder uses multi-graph internally -- when you escalate from a worker agent, the Coder loads as a separate graph while the worker stays alive in the background.
### TUI Revamp
The Terminal UI gets a ground-up rebuild with five major additions:
- **Agent Picker** (Ctrl+A) -- tabbed modal screen for browsing Your Agents, Framework agents, and Examples with metadata badges (node count, tool count, session count, tags)
- **Runtime-optional startup** -- TUI launches without a pre-loaded agent, showing the picker on first open
- **Live streaming pane** -- dedicated RichLog widget shows LLM tokens as they arrive, replacing the old one-token-per-line display
- **PDF attachments** -- `/attach` and `/detach` commands with native OS file dialog (macOS, Linux, Windows)
- **Multi-graph commands** -- `/graphs`, `/graph <id>`, `/load <path>`, `/unload <id>` for managing agent graphs in-session
### Provider-Agnostic LLM Support
Hive is no longer Anthropic-only. v0.5.1 adds first-class support for:
- **Claude Code subscriptions** -- `use_claude_code_subscription: true` in `~/.hive/configuration.json` reads OAuth tokens from `~/.claude/.credentials.json` with automatic refresh
- **OpenAI-compatible endpoints** -- `api_base` config routes traffic through any compatible API (Azure OpenAI, vLLM, Ollama, etc.)
- **Any LiteLLM model** -- `RuntimeConfig` now passes `api_key`, `api_base`, and `extra_kwargs` through to LiteLLM
The quickstart script auto-detects Claude Code subscriptions and ZAI Code installations.
---
## What's New
### Architecture & Runtime
- **Hive Coder meta-agent** -- Natural-language agent builder with reference docs, guardian watchdog, and `hive code` CLI command. (@TimothyZhang7)
- **Multi-graph agent sessions** -- `add_graph`/`remove_graph` on AgentRuntime with 6 lifecycle tools (`load_agent`, `unload_agent`, `start_agent`, `restart_agent`, `list_agents`, `get_user_presence`). (@TimothyZhang7)
- **Claude Code subscription support** -- OAuth token refresh via `use_claude_code_subscription` config, auto-detection in quickstart, LiteLLM header patching. (@TimothyZhang7)
- **OpenAI-compatible endpoint support** -- `api_base` and `extra_kwargs` in `RuntimeConfig` for any OpenAI-compatible API. (@TimothyZhang7)
- **Remove deprecated node types** -- Delete `FlexibleGraphExecutor`, `WorkerNode`, `HybridJudge`, `CodeSandbox`, `Plan`, `FunctionNode`, `LLMNode`, `RouterNode`. Deprecated types (`llm_tool_use`, `llm_generate`, `function`, `router`, `human_input`) now raise `RuntimeError` with migration guidance. (@TimothyZhang7)
- **Interactive credential setup** -- Guided `CredentialSetupSession` with health checks and encrypted storage, accessible via `hive setup-credentials` or automatic prompting on credential errors. (@RichardTang-Aden)
- **Pre-start confirmation prompt** -- Interactive prompt before agent execution allowing credential updates or abort. (@RichardTang-Aden)
- **Event bus multi-graph support** -- `graph_id` on events, `filter_graph` on subscriptions, `ESCALATION_REQUESTED` event type, `exclude_own_graph` filter. (@TimothyZhang7)
### TUI Improvements
- **In-app agent picker** (Ctrl+A) -- Tabbed modal for browsing agents with metadata badges (nodes, tools, sessions, tags). (@TimothyZhang7)
- **Runtime-optional TUI startup** -- Launches without a pre-loaded agent, shows agent picker on startup. (@TimothyZhang7)
- **Hive Coder escalation** (Ctrl+E) -- Escalate to Hive Coder and return; also available via `/coder` and `/back` chat commands. (@TimothyZhang7)
- **PDF attachment support** -- `/attach` and `/detach` commands with native OS file dialog. (@TimothyZhang7)
- **Streaming output pane** -- Dedicated RichLog widget for live LLM token streaming. (@TimothyZhang7)
- **Multi-graph TUI commands** -- `/graphs`, `/graph <id>`, `/load <path>`, `/unload <id>`. (@TimothyZhang7)
- **Agent Guardian watchdog** -- Event-driven monitor that catches secondary agent failures and triggers automatic remediation, with `--no-guardian` CLI flag. (@TimothyZhang7)
### New Tool Integrations
| Tool | Description | Contributor |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ |
| **Discord** | 4 MCP tools (`discord_list_guilds`, `discord_list_channels`, `discord_send_message`, `discord_get_messages`) with rate-limit retry and channel filtering | @mishrapravin114 |
| **Exa Search API** | 4 AI-powered search tools (`exa_search`, `exa_find_similar`, `exa_get_contents`, `exa_answer`) with neural/keyword search, domain filters, and citation-backed answers | @JeetKaria06 |
| **Razorpay** | 6 payment processing tools for payments, invoices, payment links, and refunds with HTTP Basic Auth | @shivamshahi07 |
| **Google Docs** | Document creation, reading, and editing with OAuth credential support | @haliaeetusvocifer |
| **Gmail enhancements** | Expanded mail operations for inbox management | @bryanadenhq |
### Infrastructure
- **Default node type → `event_loop`** -- `NodeSpec.node_type` defaults to `"event_loop"` instead of `"llm_tool_use"`. (@TimothyZhang7)
- **Default `max_node_visits` → 0 (unlimited)** -- Nodes default to unlimited visits, reducing friction for feedback loops and forever-alive agents. (@TimothyZhang7)
- **Remove `function` field from NodeSpec** -- Follows deprecation of `FunctionNode`. (@TimothyZhang7)
- **LiteLLM OAuth patch** -- Correct header construction for OAuth tokens (remove `x-api-key` when Bearer token is present). (@TimothyZhang7)
- **Orchestrator config centralization** -- Reads `api_key`, `api_base`, `extra_kwargs` from centralized `~/.hive/configuration.json`. (@TimothyZhang7)
- **System prompt datetime injection** -- All system prompts now include current date/time for time-aware agent behavior. (@TimothyZhang7)
- **Utils module exports** -- Proper `__init__.py` exports for the utils module. (@Siddharth2624)
- **Increased default max_tokens** -- Opus 4.6 defaults to 32768, Sonnet 4.5 to 16384 (up from 8192). (@TimothyZhang7)
---
## Bug Fixes
- Flush WIP accumulator outputs on cancel/failure so edge conditions see correct values on resume
- Stall detection state preserved across resume (no more resets on checkpoint restore)
- Skip client-facing blocking for event-triggered executions (timer/webhook)
- Executor retry override scoped to actual EventLoopNode instances only
- Add `_awaiting_input` flag to EventLoopNode to prevent input injection race conditions
- Fix TUI streaming display (tokens no longer appear one-per-line)
- Fix `_return_from_escalation` crash when ChatRepl widgets not yet mounted
- Fix tools registration problems for Google Docs credentials (@RichardTang-Aden)
- Fix email agent version conflicts (@RichardTang-Aden)
- Fix coder tool timeouts (120s for tests, 300s cap for commands)
## Documentation
- Clarify installation and prevent root pip install misuse (@paarths-collab)
---
## Agent Updates
- **Email Inbox Management** -- Consolidate `gmail_inbox_guardian` and `inbox_management` into a single unified agent with updated prompts and config. (@RichardTang-Aden, @bryanadenhq)
- **Job Hunter** -- Updated node prompts, config, and agent metadata; added PDF resume selection. (@bryanadenhq)
- **Deep Research Agent** -- Revised node implementations with updated prompts and output handling.
- **Tech News Reporter** -- Revised node prompts for improved output quality.
- **Vulnerability Assessment** -- Expanded prompts with more detailed assessment instructions. (@bryanadenhq)
---
## Breaking Changes
- **Deprecated node types raise `RuntimeError`** -- `llm_tool_use`, `llm_generate`, `function`, `router`, `human_input` now fail instead of warning. Migrate to `event_loop`.
- **`NodeSpec.node_type` defaults to `"event_loop"`** (was `"llm_tool_use"`)
- **`NodeSpec.max_node_visits` defaults to `0` / unlimited** (was `1`)
- **`NodeSpec.function` field removed** -- `FunctionNode` is deleted; use event_loop nodes with tools instead.
---
## Community Contributors
A huge thank you to everyone who contributed to this release:
- **Richard Tang** (@RichardTang-Aden) -- Interactive credential setup, pre-start confirmation, email agent consolidation, tool registration fixes, lint and formatting
- **Pravin Mishra** (@mishrapravin114) -- Discord integration with 4 MCP tools
- **Jeet Karia** (@JeetKaria06) -- Exa Search API integration with 4 AI-powered search tools
- **Shivam Shahi** (@shivamshahi07) -- Razorpay payment processing integration
- **Siddharth Varshney** (@Siddharth2624) -- Utils module exports
- **@haliaeetusvocifer** -- Google Docs integration with OAuth support
- **Bryan** (@bryanadenhq) -- PDF selection, inbox agent fixes, Job Hunter and Vulnerability Assessment updates
- **@paarths-collab** -- Documentation improvements
---
## Upgrading
```bash
git pull origin main
uv sync
```
### Migration Guide
If your agents use deprecated node types, update them:
```python
# Before (v0.5.0) -- these now raise RuntimeError
NodeSpec(node_type="llm_tool_use", ...)
NodeSpec(node_type="function", function=my_func, ...)
# After (v0.5.1) -- use event_loop for everything
NodeSpec(node_type="event_loop", ...) # or just omit node_type (it's the default now)
```
If your agents set `max_node_visits=1` explicitly, they'll still work. The only change is the _default_ -- new agents without an explicit value now get unlimited visits.
To try the new Hive Coder:
```bash
# Launch Coder directly
hive code
# Or from TUI -- press Ctrl+E to escalate
hive tui
```
---
## What's Next
- **Agent-to-agent communication** -- one agent's output triggers another agent's entry point
- **Cost visibility** -- detailed runtime log of LLM costs per node and per session
- **Persistent webhook subscriptions** -- survive agent restarts without re-registering
- **Remote agent deployment** -- run agents as long-lived services with HTTP APIs
+6 -4
View File
@@ -49,8 +49,8 @@ You may submit PRs without prior assignment for:
make check # Lint and format checks (ruff check + ruff format --check on core/ and tools/)
make test # Core tests (cd core && pytest tests/ -v)
```
6. Commit your changes following our commit conventions
7. Push to your fork and submit a Pull Request
8. Commit your changes following our commit conventions
9. Push to your fork and submit a Pull Request
## Development Setup
@@ -99,8 +99,7 @@ docs(readme): update installation instructions
2. Update documentation if needed
3. Add tests for new functionality
4. Ensure `make check` and `make test` pass
5. Update the CHANGELOG.md if applicable
6. Request review from maintainers
5. Request review from maintainers
### PR Title Format
@@ -145,6 +144,9 @@ make test
# Or run tests directly
cd core && pytest tests/ -v
# Run tools package tests (when contributing to tools/)
cd tools && uv run pytest tests/ -v
# Run tests for a specific agent
PYTHONPATH=exports uv run python -m agent_name test
```
+158 -86
View File
@@ -22,7 +22,6 @@
<img src="https://img.shields.io/badge/MCP-102_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
<p align="center">
<img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
<img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
@@ -82,12 +81,17 @@ Use Hive when you need:
### Prerequisites
- Python 3.11+ for agent development
- Claude Code or Cursor for utilizing agent skills
- Claude Code, Codex CLI, or Cursor for utilizing agent skills
> **Note for Windows Users:** It is strongly recommended to use **WSL (Windows Subsystem for Linux)** or **Git Bash** to run this framework. Some core automation scripts may not execute correctly in standard Command Prompt or PowerShell.
### Installation
> **Note**
> Hive uses a `uv` workspace layout and is not installed with `pip install`.
> Running `pip install -e .` from the repository root will create a placeholder package and Hive will not function correctly.
> Please use the quickstart script below to set up the environment.
```bash
# Clone the repository
git clone https://github.com/adenhq/hive.git
@@ -121,8 +125,45 @@ hive tui
hive run exports/your_agent_name --input '{"key": "value"}'
```
## Coding Agent Support
### Codex CLI
Hive includes native support for [OpenAI Codex CLI](https://github.com/openai/codex) (v0.101.0+).
1. **Config:** `.codex/config.toml` with `agent-builder` MCP server (tracked in git)
2. **Skills:** `.agents/skills/` symlinks to Hive skills (tracked in git)
3. **Launch:** Run `codex` in the repo root, then type `use hive`
Example:
```
codex> use hive
```
### Opencode
Hive includes native support for [Opencode](https://github.com/opencode-ai/opencode).
1. **Setup:** Run the quickstart script
2. **Launch:** Open Opencode in the project root.
3. **Activate:** Type `/hive` in the chat to switch to the Hive Agent.
4. **Verify:** Ask the agent _"List your tools"_ to confirm the connection.
The agent has access to all Hive skills and can scaffold agents, add tools, and debug workflows directly from the chat.
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
### Antigravity IDE Support
Skills and MCP servers are also available in [Antigravity IDE](https://antigravity.google/) (Google's AI-powered IDE). **Easiest:** open a terminal in the hive repo folder and run (use `./` — the script is inside the repo):
```bash
./scripts/setup-antigravity-mcp.sh
```
**Important:** Always restart/refresh Antigravity IDE after running the setup script—MCP servers only load on startup. After restart, **agent-builder** and **tools** MCP servers should connect. Skills are under `.agent/skills/` (symlinks to `.claude/skills/`). See [docs/antigravity-setup.md](docs/antigravity-setup.md) for manual setup and troubleshooting.
## Features
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
@@ -144,7 +185,6 @@ Hive is built to be model-agnostic and system-agnostic.
- **LLM flexibility** - Hive Framework is designed to support various types of LLMs, including hosted and local models through LiteLLM-compatible providers.
- **Business system connectivity** - Hive Framework is designed to connect to all kinds of business systems as tools, such as CRM, support, messaging, data, file, and internal APIs via MCP.
## Why Aden
Hive focuses on generating agents that run real business processes rather than generic agents. Instead of requiring you to manually design workflows, define agent interactions, and handle failures reactively, Hive flips the paradigm: **you describe outcomes, and the system builds itself**—delivering an outcome-driven, adaptive experience with an easy-to-use set of tools and integrations.
@@ -237,92 +277,124 @@ See [environment-setup.md](docs/environment-setup.md) for complete setup instruc
Aden Hive Agent Framework aims to help developers build outcome-oriented, self-adaptive agents. See [roadmap.md](docs/roadmap.md) for details.
```mermaid
flowchart TD
subgraph Foundation
direction LR
subgraph arch["Architecture"]
a1["Node-Based Architecture"]:::done
a2["Python SDK"]:::done
a3["LLM Integration"]:::done
a4["Communication Protocol"]:::done
end
subgraph ca["Coding Agent"]
b1["Goal Creation Session"]:::done
b2["Worker Agent Creation"]
b3["MCP Tools"]:::done
end
subgraph wa["Worker Agent"]
c1["Human-in-the-Loop"]:::done
c2["Callback Handlers"]:::done
c3["Intervention Points"]:::done
c4["Streaming Interface"]
end
subgraph cred["Credentials"]
d1["Setup Process"]:::done
d2["Pluggable Sources"]:::done
d3["Enterprise Secrets"]
d4["Integration Tools"]:::done
end
subgraph tools["Tools"]
e1["File Use"]:::done
e2["Memory STM/LTM"]:::done
e3["Web Search/Scraper"]:::done
e4["CSV/PDF"]:::done
e5["Excel/Email"]
end
subgraph core["Core"]
f1["Eval System"]
f2["Pydantic Validation"]:::done
f3["Documentation"]:::done
f4["Adaptiveness"]
f5["Sample Agents"]
end
end
flowchart TB
%% Main Entity
User([User])
subgraph Expansion
direction LR
subgraph intel["Intelligence"]
g1["Guardrails"]
g2["Streaming Mode"]
g3["Image Generation"]
g4["Semantic Search"]
%% =========================================
%% EXTERNAL EVENT SOURCES
%% =========================================
subgraph ExtEventSource [External Event Source]
E_Sch["Schedulers"]
E_WH["Webhook"]
E_SSE["SSE"]
end
subgraph mem["Memory Iteration"]
h1["Message Model & Sessions"]
h2["Storage Migration"]
h3["Context Building"]
h4["Proactive Compaction"]
h5["Token Tracking"]
end
subgraph evt["Event System"]
i1["Event Bus for Nodes"]
end
subgraph cas["Coding Agent Support"]
j1["Claude Code"]
j2["Cursor"]
j3["Opencode"]
j4["Antigravity"]
end
subgraph plat["Platform"]
k1["JavaScript/TypeScript SDK"]
k2["Custom Tool Integrator"]
k3["Windows Support"]
end
subgraph dep["Deployment"]
l1["Self-Hosted"]
l2["Cloud Services"]
l3["CI/CD Pipeline"]
end
subgraph tmpl["Templates"]
m1["Sales Agent"]
m2["Marketing Agent"]
m3["Analytics Agent"]
m4["Training Agent"]
m5["Smart Form Agent"]
end
end
classDef done fill:#9e9e9e,color:#fff,stroke:#757575
%% =========================================
%% SYSTEM NODES
%% =========================================
subgraph WorkerBees [Worker Bees]
WB_C["Conversation"]
WB_SP["System prompt"]
subgraph Graph [Graph]
direction TB
N1["Node"] --> N2["Node"] --> N3["Node"]
N1 -.-> AN["Active Node"]
N2 -.-> AN
N3 -.-> AN
%% Nested Event Loop Node
subgraph EventLoopNode [Event Loop Node]
ELN_L["listener"]
ELN_SP["System Prompt<br/>(Task)"]
ELN_EL["Event loop"]
ELN_C["Conversation"]
end
end
end
subgraph JudgeNode [Judge]
J_C["Criteria"]
J_P["Principles"]
J_EL["Event loop"] <--> J_S["Scheduler"]
end
subgraph QueenBee [Queen Bee]
QB_SP["System prompt"]
QB_EL["Event loop"]
QB_C["Conversation"]
end
subgraph Infra [Infra]
SA["Sub Agent"]
TR["Tool Registry"]
WTM["Write through Conversation Memory<br/>(Logs/RAM/Harddrive)"]
SM["Shared Memory<br/>(State/Harddrive)"]
EB["Event Bus<br/>(RAM)"]
CS["Credential Store<br/>(Harddrive/Cloud)"]
end
subgraph PC [PC]
B["Browser"]
CB["Codebase<br/>v 0.0.x ... v n.n.n"]
end
%% =========================================
%% CONNECTIONS & DATA FLOW
%% =========================================
%% External Event Routing
E_Sch --> ELN_L
E_WH --> ELN_L
E_SSE --> ELN_L
ELN_L -->|"triggers"| ELN_EL
%% User Interactions
User -->|"Talk"| WB_C
User -->|"Talk"| QB_C
User -->|"Read/Write Access"| CS
%% Inter-System Logic
ELN_C <-->|"Mirror"| WB_C
WB_C -->|"Focus"| AN
WorkerBees -->|"Inquire"| JudgeNode
JudgeNode -->|"Approve"| WorkerBees
%% Judge Alignments
J_C <-.->|"aligns"| WB_SP
J_P <-.->|"aligns"| QB_SP
%% Escalate path
J_EL -->|"Report (Escalate)"| QB_EL
%% Pub/Sub Logic
AN -->|"publish"| EB
EB -->|"subscribe"| QB_C
%% Infra and Process Spawning
ELN_EL -->|"Spawn"| SA
SA -->|"Inform"| ELN_EL
SA -->|"Starts"| B
B -->|"Report"| ELN_EL
TR -->|"Assigned"| EventLoopNode
CB -->|"Modify Worker Bee"| WorkerBees
%% =========================================
%% SHARED MEMORY & LOGS ACCESS
%% =========================================
%% Worker Bees Access
Graph <-->|"Read/Write"| WTM
Graph <-->|"Read/Write"| SM
%% Queen Bee Access
QB_C <-->|"Read/Write"| WTM
QB_EL <-->|"Read/Write"| SM
%% Credentials Access
CS -->|"Read Access"| QB_C
```
## Contributing
+3 -3
View File
@@ -82,7 +82,7 @@ Register an MCP server as a tool source for your agent.
"example_tool"
],
"total_mcp_servers": 1,
"note": "MCP server 'tools' registered with 6 tools. These tools can now be used in llm_tool_use nodes."
"note": "MCP server 'tools' registered with 6 tools. These tools can now be used in event_loop nodes."
}
```
@@ -149,7 +149,7 @@ List tools available from registered MCP servers.
]
},
"total_tools": 6,
"note": "Use these tool names in the 'tools' parameter when adding llm_tool_use nodes"
"note": "Use these tool names in the 'tools' parameter when adding event_loop nodes"
}
```
@@ -246,7 +246,7 @@ Here's a complete workflow for building an agent with MCP tools:
"node_id": "web-searcher",
"name": "Web Search",
"description": "Search the web for information",
"node_type": "llm_tool_use",
"node_type": "event_loop",
"input_keys": "[\"query\"]",
"output_keys": "[\"search_results\"]",
"system_prompt": "Search for {query} using the web_search tool",
+2 -2
View File
@@ -119,7 +119,7 @@ builder = WorkflowBuilder()
builder.add_node(
node_id="researcher",
name="Web Researcher",
node_type="llm_tool_use",
node_type="event_loop",
system_prompt="Research the topic using web_search",
tools=["web_search"], # Tool from tools MCP server
input_keys=["topic"],
@@ -137,7 +137,7 @@ Tools from MCP servers can be referenced in your agent.json just like built-in t
{
"id": "searcher",
"name": "Web Searcher",
"node_type": "llm_tool_use",
"node_type": "event_loop",
"system_prompt": "Search for information about {topic}",
"tools": ["web_search", "web_scrape"],
"input_keys": ["topic"],
+17 -70
View File
@@ -103,31 +103,20 @@ Add a processing node to the agent graph.
- `node_id` (string, required): Unique node identifier
- `name` (string, required): Human-readable name
- `description` (string, required): What this node does
- `node_type` (string, required): One of: `llm_generate`, `llm_tool_use`, `router`, `function`
- `node_type` (string, required): Must be `event_loop` (the only valid type)
- `input_keys` (string, required): JSON array of input variable names
- `output_keys` (string, required): JSON array of output variable names
- `system_prompt` (string, optional): System prompt for LLM nodes
- `tools` (string, optional): JSON array of tool names for tool_use nodes
- `routes` (string, optional): JSON object of route mappings for router nodes
- `system_prompt` (string, optional): System prompt for the LLM
- `tools` (string, optional): JSON array of tool names
- `client_facing` (boolean, optional): Set to true for human-in-the-loop interaction
**Node Types:**
**Node Type:**
1. **llm_generate**: Uses LLM to generate output from inputs
- Requires: `system_prompt`
- Tools: Not used
2. **llm_tool_use**: Uses LLM with tools to accomplish tasks
- Requires: `system_prompt`, `tools`
- Tools: Array of tool names (e.g., `["web_search", "web_fetch"]`)
3. **router**: LLM-powered routing to different paths
- Requires: `system_prompt`, `routes`
- Routes: Object mapping route names to target node IDs
- Example: `{"pass": "success_node", "fail": "retry_node"}`
4. **function**: Executes a pre-defined function
- System prompt describes the function behavior
- No LLM calls, pure computation
**event_loop**: LLM-powered node with self-correction loop
- Requires: `system_prompt`
- Optional: `tools` (array of tool names, e.g., `["web_search", "web_fetch"]`)
- Optional: `client_facing` (set to true for HITL / user interaction)
- Supports: iterative refinement, judge-based evaluation, tool use, streaming
**Example:**
```json
@@ -135,7 +124,7 @@ Add a processing node to the agent graph.
"node_id": "search_sources",
"name": "Search Sources",
"description": "Searches for relevant sources on the topic",
"node_type": "llm_tool_use",
"node_type": "event_loop",
"input_keys": "[\"topic\", \"search_queries\"]",
"output_keys": "[\"sources\", \"source_count\"]",
"system_prompt": "Search for sources using the provided queries...",
@@ -198,7 +187,7 @@ Export the validated graph as an agent specification.
**What it does:**
1. Validates the graph
2. Auto-generates missing edges from router routes
2. Validates edge connectivity
3. Writes files to disk:
- `exports/{agent-name}/agent.json` - Full agent specification
- `exports/{agent-name}/README.md` - Auto-generated documentation
@@ -252,47 +241,6 @@ Test the complete agent graph with sample inputs.
---
### Evaluation Rules
#### `add_evaluation_rule`
Add a rule for the HybridJudge to evaluate node outputs.
**Parameters:**
- `rule_id` (string, required): Unique rule identifier
- `description` (string, required): What this rule checks
- `condition` (string, required): Python expression to evaluate
- `action` (string, required): Action to take: `accept`, `retry`, `escalate`
- `priority` (integer, optional): Rule priority (default: 0)
- `feedback_template` (string, optional): Feedback message template
**Condition Examples:**
- `'result.get("success") == True'` - Check for success flag
- `'result.get("error_type") == "timeout"'` - Check error type
- `'len(result.get("data", [])) > 0'` - Check for non-empty data
**Example:**
```json
{
"rule_id": "timeout_retry",
"description": "Retry on timeout errors",
"condition": "result.get('error_type') == 'timeout'",
"action": "retry",
"priority": 10,
"feedback_template": "Timeout occurred, retrying..."
}
```
#### `list_evaluation_rules`
List all configured evaluation rules.
#### `remove_evaluation_rule`
Remove an evaluation rule.
**Parameters:**
- `rule_id` (string, required): Rule to remove
---
## Example Workflow
Here's a complete workflow for building a research agent:
@@ -320,7 +268,7 @@ add_node(
node_id="planner",
name="Research Planner",
description="Creates research strategy",
node_type="llm_generate",
node_type="event_loop",
input_keys='["topic"]',
output_keys='["strategy", "queries"]',
system_prompt="Analyze topic and create research plan..."
@@ -330,7 +278,7 @@ add_node(
node_id="searcher",
name="Search Sources",
description="Find relevant sources",
node_type="llm_tool_use",
node_type="event_loop",
input_keys='["queries"]',
output_keys='["sources"]',
system_prompt="Search for sources...",
@@ -359,10 +307,9 @@ The exported agent will be saved to `exports/research-agent/`.
1. **Start with the goal**: Define clear success criteria before building nodes
2. **Test nodes individually**: Use `test_node` to verify each node works
3. **Use router nodes for branching**: Don't create edges manually for routers - define routes and they'll be auto-generated
4. **Add evaluation rules**: Help the judge evaluate outputs deterministically
5. **Validate early, validate often**: Run `validate_graph` after adding nodes/edges
6. **Check exports**: Review the generated README.md to verify your agent structure
3. **Use conditional edges for branching**: Define condition_expr on edges for decision points
4. **Validate early, validate often**: Run `validate_graph` after adding nodes/edges
5. **Check exports**: Review the generated README.md to verify your agent structure
---
+1 -1
View File
@@ -73,7 +73,7 @@ To use the agent builder with Claude Desktop or other MCP clients, add this to y
The MCP server provides tools for:
- Creating agent building sessions
- Defining goals with success criteria
- Adding nodes (llm_generate, llm_tool_use, router, function)
- Adding nodes (event_loop only)
- Connecting nodes with edges
- Validating and exporting agent graphs
- Testing nodes and full agent graphs
+14 -5
View File
@@ -68,7 +68,7 @@ from framework.graph.event_loop_node import ( # noqa: E402
)
from framework.graph.executor import GraphExecutor # noqa: E402
from framework.graph.goal import Goal # noqa: E402
from framework.graph.node import NodeSpec # noqa: E402
from framework.graph.node import NodeContext, NodeProtocol, NodeResult, NodeSpec # noqa: E402
from framework.llm.litellm import LiteLLMProvider # noqa: E402
from framework.runner.tool_registry import ToolRegistry # noqa: E402
from framework.runtime.core import Runtime # noqa: E402
@@ -654,7 +654,7 @@ NODE_SPECS = {
id="sender",
name="Sender",
description="Send approved campaign emails",
node_type="function",
node_type="event_loop",
input_keys=["approved_emails"],
output_keys=["send_results"],
),
@@ -823,11 +823,20 @@ def _send_email_via_resend(
return {"error": f"Network error: {e}"}
class SenderNode(NodeProtocol):
"""Node wrapper for send_emails function."""
async def execute(self, ctx: NodeContext) -> NodeResult:
approved = ctx.input_data.get("approved_emails", "")
result_str = send_emails(approved_emails=approved)
ctx.memory.write("send_results", result_str)
return NodeResult(success=True, output={"send_results": result_str})
def send_emails(approved_emails: str = "") -> str:
"""Send approved campaign emails via Resend, or log if unconfigured.
Called by FunctionNode which unpacks input_keys as kwargs.
Returns a JSON string (FunctionNode wraps it in NodeResult).
Returns a JSON string.
"""
approved = approved_emails
if not approved:
@@ -1780,7 +1789,7 @@ async def _run_pipeline(websocket, initial_message: str):
)
for nid, impl in nodes.items():
executor.register_node(nid, impl)
executor.register_function("sender", send_emails)
executor.register_node("sender", SenderNode())
# --- Event forwarding: bus → WebSocket ---
+28 -19
View File
@@ -4,8 +4,8 @@ Minimal Manual Agent Example
This example demonstrates how to build and run an agent programmatically
without using the Claude Code CLI or external LLM APIs.
It uses 'function' nodes to define logic in pure Python, making it perfect
for understanding the core runtime loop:
It uses custom NodeProtocol implementations to define logic in pure Python,
making it perfect for understanding the core runtime loop:
Setup -> Graph definition -> Execution -> Result
Run with:
@@ -16,22 +16,33 @@ import asyncio
from framework.graph import EdgeCondition, EdgeSpec, Goal, GraphSpec, NodeSpec
from framework.graph.executor import GraphExecutor
from framework.graph.node import NodeContext, NodeProtocol, NodeResult
from framework.runtime.core import Runtime
# 1. Define Node Logic (Pure Python Functions)
def greet(name: str) -> str:
# 1. Define Node Logic (Custom NodeProtocol implementations)
class GreeterNode(NodeProtocol):
"""Generate a simple greeting."""
return f"Hello, {name}!"
async def execute(self, ctx: NodeContext) -> NodeResult:
name = ctx.input_data.get("name", "World")
greeting = f"Hello, {name}!"
ctx.memory.write("greeting", greeting)
return NodeResult(success=True, output={"greeting": greeting})
def uppercase(greeting: str) -> str:
class UppercaserNode(NodeProtocol):
"""Convert text to uppercase."""
return greeting.upper()
async def execute(self, ctx: NodeContext) -> NodeResult:
greeting = ctx.input_data.get("greeting") or ctx.memory.read("greeting") or ""
result = greeting.upper()
ctx.memory.write("final_greeting", result)
return NodeResult(success=True, output={"final_greeting": result})
async def main():
print("🚀 Setting up Manual Agent...")
print("Setting up Manual Agent...")
# 2. Define the Goal
# Every agent needs a goal with success criteria
@@ -55,8 +66,7 @@ async def main():
id="greeter",
name="Greeter",
description="Generates a simple greeting",
node_type="function",
function="greet", # Matches the registered function name
node_type="event_loop",
input_keys=["name"],
output_keys=["greeting"],
)
@@ -65,8 +75,7 @@ async def main():
id="uppercaser",
name="Uppercaser",
description="Converts greeting to uppercase",
node_type="function",
function="uppercase",
node_type="event_loop",
input_keys=["greeting"],
output_keys=["final_greeting"],
)
@@ -98,23 +107,23 @@ async def main():
runtime = Runtime(storage_path=Path("./agent_logs"))
executor = GraphExecutor(runtime=runtime)
# 7. Register Function Implementations
# Connect string names in NodeSpecs to actual Python functions
executor.register_function("greeter", greet)
executor.register_function("uppercaser", uppercase)
# 7. Register Node Implementations
# Connect node IDs in the graph to actual Python implementations
executor.register_node("greeter", GreeterNode())
executor.register_node("uppercaser", UppercaserNode())
# 8. Execute Agent
print("Executing agent with input: name='Alice'...")
print("Executing agent with input: name='Alice'...")
result = await executor.execute(graph=graph, goal=goal, input_data={"name": "Alice"})
# 9. Verify Results
if result.success:
print("\nSuccess!")
print("\nSuccess!")
print(f"Path taken: {' -> '.join(result.path)}")
print(f"Final output: {result.output.get('final_greeting')}")
else:
print(f"\nFailed: {result.error}")
print(f"\nFailed: {result.error}")
if __name__ == "__main__":
+2 -2
View File
@@ -122,7 +122,7 @@ async def example_4_custom_agent_with_mcp_tools():
node_id="web-searcher",
name="Web Search",
description="Search the web for information",
node_type="llm_tool_use",
node_type="event_loop",
system_prompt="Search for {query} and return the top results. Use the web_search tool.",
tools=["web_search"], # This tool comes from tools MCP server
input_keys=["query"],
@@ -133,7 +133,7 @@ async def example_4_custom_agent_with_mcp_tools():
node_id="summarizer",
name="Summarize Results",
description="Summarize the search results",
node_type="llm_generate",
node_type="event_loop",
system_prompt="Summarize the following search results in 2-3 sentences: {search_results}",
input_keys=["search_results"],
output_keys=["summary"],
+13
View File
@@ -0,0 +1,13 @@
"""Framework-provided agents."""
from pathlib import Path
FRAMEWORK_AGENTS_DIR = Path(__file__).parent
def list_framework_agents() -> list[Path]:
"""List all framework agent directories."""
return sorted(
[p for p in FRAMEWORK_AGENTS_DIR.iterdir() if p.is_dir() and (p / "agent.py").exists()],
key=lambda p: p.name,
)
@@ -0,0 +1,48 @@
"""
Credential Tester verify synced credentials via live API calls.
Interactive agent that lists connected accounts, lets the user pick one,
loads the provider's tools, and runs a chat session to test the credential.
"""
from .agent import (
CredentialTesterAgent,
configure_for_account,
conversation_mode,
edges,
entry_node,
entry_points,
goal,
identity_prompt,
list_connected_accounts,
loop_config,
nodes,
pause_nodes,
requires_account_selection,
skip_credential_validation,
skip_guardian,
terminal_nodes,
)
from .config import default_config
__version__ = "1.0.0"
__all__ = [
"CredentialTesterAgent",
"configure_for_account",
"conversation_mode",
"default_config",
"edges",
"entry_node",
"entry_points",
"goal",
"identity_prompt",
"list_connected_accounts",
"loop_config",
"nodes",
"pause_nodes",
"requires_account_selection",
"skip_credential_validation",
"skip_guardian",
"terminal_nodes",
]
@@ -0,0 +1,148 @@
"""CLI entry point for Credential Tester agent."""
import asyncio
import logging
import sys
import click
from .agent import CredentialTesterAgent
def setup_logging(verbose=False, debug=False):
if debug:
level, fmt = logging.DEBUG, "%(asctime)s %(name)s: %(message)s"
elif verbose:
level, fmt = logging.INFO, "%(message)s"
else:
level, fmt = logging.WARNING, "%(levelname)s: %(message)s"
logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
def pick_account(agent: CredentialTesterAgent) -> dict | None:
"""Interactive account picker. Returns selected account dict or None."""
accounts = agent.list_accounts()
if not accounts:
click.echo("No connected accounts found.")
click.echo("Set ADEN_API_KEY and connect accounts at https://app.adenhq.com")
return None
click.echo("\nConnected accounts:\n")
for i, acct in enumerate(accounts, 1):
provider = acct.get("provider", "?")
alias = acct.get("alias", "?")
identity = acct.get("identity", {})
detail_parts = [f"{k}: {v}" for k, v in identity.items() if v]
detail = f" ({', '.join(detail_parts)})" if detail_parts else ""
click.echo(f" {i}. {provider}/{alias}{detail}")
click.echo()
while True:
choice = click.prompt("Pick an account to test", type=int, default=1)
if 1 <= choice <= len(accounts):
return accounts[choice - 1]
click.echo(f"Invalid choice. Enter 1-{len(accounts)}.")
@click.group()
@click.version_option(version="1.0.0")
def cli():
"""Credential Tester — verify synced credentials via live API calls."""
pass
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
@click.option("--debug", is_flag=True)
def tui(verbose, debug):
"""Launch TUI to test a credential interactively."""
setup_logging(verbose=verbose, debug=debug)
try:
from framework.tui.app import AdenTUI
except ImportError:
click.echo("TUI requires 'textual'. Install with: pip install textual")
sys.exit(1)
agent = CredentialTesterAgent()
account = pick_account(agent)
if account is None:
sys.exit(1)
agent.select_account(account)
provider = account.get("provider", "?")
alias = account.get("alias", "?")
click.echo(f"\nTesting {provider}/{alias}...\n")
async def run_tui():
agent._setup()
runtime = agent._agent_runtime
await runtime.start()
try:
app = AdenTUI(runtime)
await app.run_async()
finally:
await runtime.stop()
asyncio.run(run_tui())
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
@click.option("--debug", is_flag=True)
def shell(verbose, debug):
"""Interactive CLI session to test a credential."""
setup_logging(verbose=verbose, debug=debug)
asyncio.run(_interactive_shell(verbose))
async def _interactive_shell(verbose=False):
agent = CredentialTesterAgent()
account = pick_account(agent)
if account is None:
return
agent.select_account(account)
provider = account.get("provider", "?")
alias = account.get("alias", "?")
click.echo(f"\nTesting {provider}/{alias}")
click.echo("Type your requests or 'quit' to exit.\n")
await agent.start()
try:
result = await agent._agent_runtime.trigger_and_wait(
entry_point_id="start",
input_data={},
)
if result:
click.echo(f"\nSession ended: {'success' if result.success else result.error}")
except KeyboardInterrupt:
click.echo("\nGoodbye!")
finally:
await agent.stop()
@cli.command(name="list")
def list_accounts():
"""List all connected accounts."""
agent = CredentialTesterAgent()
accounts = agent.list_accounts()
if not accounts:
click.echo("No connected accounts found.")
return
click.echo("\nConnected accounts:\n")
for acct in accounts:
provider = acct.get("provider", "?")
alias = acct.get("alias", "?")
identity = acct.get("identity", {})
detail_parts = [f"{k}: {v}" for k, v in identity.items() if v]
detail = f" ({', '.join(detail_parts)})" if detail_parts else ""
click.echo(f" {provider}/{alias}{detail}")
if __name__ == "__main__":
cli()
@@ -0,0 +1,400 @@
"""Credential Tester agent — verify synced credentials via live API calls.
A framework agent that lets the user pick a connected account and test it
by making real API calls via the provider's tools.
When loaded via AgentRunner.load() (TUI picker, ``hive run``), the module-level
``nodes`` / ``edges`` variables provide a static graph. The TUI detects
``requires_account_selection`` and shows an account picker *before* starting
the agent. ``configure_for_account()`` then scopes the node's tools to the
selected provider.
When used directly (``CredentialTesterAgent``), the graph is built dynamically
after the user picks an account programmatically.
"""
from __future__ import annotations
from pathlib import Path
from typing import TYPE_CHECKING
from framework.graph import Goal, NodeSpec, SuccessCriterion
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
from .config import default_config
from .nodes import build_tester_node
if TYPE_CHECKING:
from framework.runner import AgentRunner
# ---------------------------------------------------------------------------
# Goal
# ---------------------------------------------------------------------------
goal = Goal(
id="credential-tester",
name="Credential Tester",
description="Verify that a synced credential can make real API calls.",
success_criteria=[
SuccessCriterion(
id="api-call-success",
description="At least one API call succeeds using the credential",
metric="api_call_success",
target="true",
weight=1.0,
),
],
constraints=[],
)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def get_tools_for_provider(provider_name: str) -> list[str]:
"""Collect tool names for a specific Aden credential by credential_id.
Matches on ``credential_id`` (e.g. "google" Gmail tools only),
NOT ``aden_provider_name`` which can be shared across products
(e.g. both google and google_docs have aden_provider_name="google").
"""
from aden_tools.credentials import CREDENTIAL_SPECS
tools: list[str] = []
for spec in CREDENTIAL_SPECS.values():
if spec.credential_id == provider_name:
tools.extend(spec.tools)
return sorted(set(tools))
def list_connected_accounts() -> list[dict]:
"""List connected accounts from GET /v1/credentials."""
import os
from framework.credentials.aden.client import AdenClientConfig, AdenCredentialClient
api_key = os.environ.get("ADEN_API_KEY")
if not api_key:
return []
client = AdenCredentialClient(
AdenClientConfig(
base_url=os.environ.get("ADEN_API_URL", "https://api.adenhq.com"),
)
)
try:
integrations = client.list_integrations()
finally:
client.close()
return [
{
"provider": c.provider,
"alias": c.alias,
"identity": {"email": c.email} if c.email else {},
"integration_id": c.integration_id,
}
for c in integrations
if c.status == "active"
]
# ---------------------------------------------------------------------------
# Module-level hooks (read by AgentRunner.load / TUI)
# ---------------------------------------------------------------------------
skip_credential_validation = True
"""Don't validate credentials at load time — we don't know which provider yet."""
skip_guardian = True
"""Don't attach the Hive Coder guardian — this is a standalone utility agent."""
requires_account_selection = True
"""Signal TUI to show account picker before starting the agent."""
def configure_for_account(runner: AgentRunner, account: dict) -> None:
"""Scope the tester node's tools to the selected provider.
Called by the TUI after the user picks an account from the picker.
After scoping, re-enables credential validation so the selected
provider's credentials are checked before the agent starts.
"""
provider = account["provider"]
tools = get_tools_for_provider(provider)
tools.append("get_account_info")
alias = account.get("alias", "unknown")
email = account.get("identity", {}).get("email", "")
detail = f" (email: {email})" if email else ""
for node in runner.graph.nodes:
if node.id == "tester":
node.tools = sorted(set(tools))
# Update system prompt to be provider-specific
node.system_prompt = f"""\
You are a credential tester for the account: {provider}/{alias}{detail}
# Instructions
1. Suggest a simple read-only API call to verify the credential works \
(e.g. list messages, list channels, list contacts).
2. Execute the call when the user agrees.
3. Report the result: success (with sample data) or failure (with error).
4. Let the user request additional API calls to further test the credential.
# Account routing
IMPORTANT: Always pass `account="{alias}"` when calling any tool. \
This routes the API call to the correct credential. Never use the email \
or any other identifier always use the alias exactly as shown.
# Rules
- Start with read-only operations (list, get) before write operations.
- Always confirm with the user before performing write operations.
- If a call fails, report the exact error this helps diagnose credential issues.
- Be concise. No emojis.
"""
break
# Set intro message for TUI display
runner.intro_message = (
f"Testing {provider}/{alias}{detail}"
f"{len(tools)} tools loaded. "
f"I'll suggest a read-only API call to verify the credential works."
)
# ---------------------------------------------------------------------------
# Module-level graph variables (read by AgentRunner.load)
# ---------------------------------------------------------------------------
# The static node starts with minimal tools. configure_for_account() scopes
# it to the selected provider's tools before execution.
nodes = [
NodeSpec(
id="tester",
name="Credential Tester",
description=(
"Interactive credential testing — lets the user pick an account "
"and verify it via API calls."
),
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=[],
output_keys=[],
tools=["get_account_info"],
system_prompt="""\
You are a credential tester. Your job is to help the user verify that their \
connected accounts can make real API calls.
# Startup
1. Call ``get_account_info`` to list the user's connected accounts.
2. Present the list and ask the user which account to test.
3. Once they pick one, note the account's **alias** (e.g. "Timothy", "work-slack").
4. Suggest a simple read-only API call to verify the credential works \
(e.g. list messages, list channels, list contacts).
5. Execute the call when the user agrees.
6. Report the result: success (with sample data) or failure (with error).
7. Let the user request additional API calls to further test the credential.
# Account routing
IMPORTANT: Always pass the account's **alias** as the ``account`` parameter \
when calling any tool. The alias is the routing key never use the email or \
any other identifier. For example, if the alias is "Timothy", call \
``gmail_list_messages(account="Timothy", ...)``.
# Rules
- Start with read-only operations (list, get) before write operations.
- Always confirm with the user before performing write operations.
- If a call fails, report the exact error this helps diagnose credential issues.
- Be concise. No emojis.
""",
),
]
edges = []
entry_node = "tester"
entry_points = {"start": "tester"}
pause_nodes = []
terminal_nodes = [] # Forever-alive: loops until user exits
conversation_mode = "continuous"
identity_prompt = (
"You are a credential tester that verifies connected accounts can make real API calls."
)
loop_config = {
"max_iterations": 50,
"max_tool_calls_per_turn": 10,
"max_history_tokens": 32000,
}
# ---------------------------------------------------------------------------
# Programmatic agent class (used by __main__.py CLI)
# ---------------------------------------------------------------------------
class CredentialTesterAgent:
"""Interactive agent that tests a specific credential via API calls.
Usage:
agent = CredentialTesterAgent()
accounts = agent.list_accounts()
agent.select_account(accounts[0])
await agent.start()
# ... user chats via TUI or CLI ...
await agent.stop()
"""
def __init__(self, config=None):
self.config = config or default_config
self._selected_account: dict | None = None
self._agent_runtime: AgentRuntime | None = None
self._tool_registry: ToolRegistry | None = None
self._storage_path: Path | None = None
def list_accounts(self) -> list[dict]:
"""List connected accounts from the Aden credential store."""
return list_connected_accounts()
def select_account(self, account: dict) -> None:
"""Select an account to test.
Args:
account: Account dict from list_accounts() with
provider, alias, identity keys.
"""
self._selected_account = account
@property
def selected_provider(self) -> str:
if self._selected_account is None:
raise RuntimeError("No account selected. Call select_account() first.")
return self._selected_account["provider"]
@property
def selected_alias(self) -> str:
if self._selected_account is None:
raise RuntimeError("No account selected. Call select_account() first.")
return self._selected_account.get("alias", "unknown")
def _build_graph(self) -> GraphSpec:
provider = self.selected_provider
alias = self.selected_alias
identity = self._selected_account.get("identity", {})
tools = get_tools_for_provider(provider)
tester_node = build_tester_node(
provider=provider,
alias=alias,
tools=tools,
identity=identity,
)
return GraphSpec(
id="credential-tester-graph",
goal_id=goal.id,
version="1.0.0",
entry_node="tester",
entry_points={"start": "tester"},
terminal_nodes=[],
pause_nodes=[],
nodes=[tester_node],
edges=[],
default_model=self.config.model,
max_tokens=self.config.max_tokens,
loop_config={
"max_iterations": 50,
"max_tool_calls_per_turn": 10,
"max_history_tokens": 32000,
},
conversation_mode="continuous",
identity_prompt=(
f"You are testing the {provider}/{alias} credential. "
"Help the user verify it works by making real API calls."
),
)
def _setup(self) -> None:
if self._selected_account is None:
raise RuntimeError("No account selected. Call select_account() first.")
self._storage_path = Path.home() / ".hive" / "agents" / "credential_tester"
self._storage_path.mkdir(parents=True, exist_ok=True)
self._tool_registry = ToolRegistry()
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
extra_kwargs = getattr(self.config, "extra_kwargs", {}) or {}
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
**extra_kwargs,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
graph = self._build_graph()
self._agent_runtime = create_agent_runtime(
graph=graph,
goal=goal,
storage_path=self._storage_path,
entry_points=[
EntryPointSpec(
id="start",
name="Test Credential",
entry_node="tester",
trigger_type="manual",
isolation_level="isolated",
),
],
llm=llm,
tools=tools,
tool_executor=tool_executor,
checkpoint_config=CheckpointConfig(enabled=False),
graph_id="credential_tester",
)
async def start(self) -> None:
"""Set up and start the agent runtime."""
if self._agent_runtime is None:
self._setup()
if not self._agent_runtime.is_running:
await self._agent_runtime.start()
async def stop(self) -> None:
"""Stop the agent runtime."""
if self._agent_runtime and self._agent_runtime.is_running:
await self._agent_runtime.stop()
self._agent_runtime = None
async def run(self) -> ExecutionResult:
"""Run the agent (convenience for single execution)."""
await self.start()
try:
result = await self._agent_runtime.trigger_and_wait(
entry_point_id="start",
input_data={},
)
return result or ExecutionResult(success=False, error="Execution timeout")
finally:
await self.stop()
@@ -0,0 +1,19 @@
"""Runtime configuration for Credential Tester agent."""
from dataclasses import dataclass
from framework.config import RuntimeConfig
@dataclass
class AgentMetadata:
name: str = "Credential Tester"
version: str = "1.0.0"
description: str = (
"Test connected accounts by making real API calls. "
"Pick an account, verify credentials work, and explore available tools."
)
metadata = AgentMetadata()
default_config = RuntimeConfig(temperature=0.3)
@@ -0,0 +1,9 @@
{
"hive-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../../../tools",
"description": "Hive tools MCP server with provider-specific tools"
}
}
@@ -0,0 +1,69 @@
"""Node definitions for Credential Tester agent."""
from framework.graph import NodeSpec
def build_tester_node(
provider: str,
alias: str,
tools: list[str],
identity: dict[str, str],
) -> NodeSpec:
"""Build the tester node dynamically for the selected account.
Args:
provider: Aden provider name (e.g. "google", "slack").
alias: User-set alias (e.g. "Timothy").
tools: Tool names available for this provider.
identity: Identity dict (email, workspace, etc.) for context.
"""
detail_parts = [f"{k}: {v}" for k, v in identity.items() if v]
detail = f" ({', '.join(detail_parts)})" if detail_parts else ""
return NodeSpec(
id="tester",
name="Credential Tester",
description=(
f"Interactive testing node for {provider}/{alias}. "
f"Has access to all {provider} tools to verify the credential works."
),
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=[],
output_keys=[],
tools=tools,
system_prompt=f"""\
You are a credential tester for the account: {provider}/{alias}{detail}
Your job is to help the user verify that this credential works by making \
real API calls using the available tools.
# Account routing
IMPORTANT: Always pass `account="{alias}"` when calling any tool. \
This routes the API call to the correct credential. Never use the email \
or any other identifier always use the alias exactly as shown.
# Instructions
1. Start by greeting the user and confirming which account you're testing.
2. Suggest a simple, safe, read-only API call to verify the credential works \
(e.g. list messages, list channels, list contacts).
3. Execute the call when the user agrees.
4. Report the result clearly: success (with sample data) or failure (with error).
5. Let the user request additional API calls to further test the credential.
# Available tools
You have access to {len(tools)} tools for {provider}:
{chr(10).join(f"- {t}" for t in tools)}
# Rules
- Start with read-only operations (list, get) before write operations (create, update, delete).
- Always confirm with the user before performing write operations.
- If a call fails, report the exact error this helps diagnose credential issues.
- Be concise. No emojis.
""",
)
@@ -0,0 +1,44 @@
"""
Hive Coder Native coding agent that builds Hive agent packages.
Deeply understands the agent framework and produces complete Python packages
with goals, nodes, edges, system prompts, MCP configuration, and tests
from natural language specifications.
"""
from .agent import (
HiveCoderAgent,
conversation_mode,
default_agent,
edges,
entry_node,
entry_points,
goal,
identity_prompt,
loop_config,
nodes,
pause_nodes,
terminal_nodes,
)
from .config import AgentMetadata, RuntimeConfig, default_config, metadata
__version__ = "1.0.0"
__all__ = [
"HiveCoderAgent",
"default_agent",
"goal",
"nodes",
"edges",
"entry_node",
"entry_points",
"pause_nodes",
"terminal_nodes",
"conversation_mode",
"identity_prompt",
"loop_config",
"RuntimeConfig",
"AgentMetadata",
"default_config",
"metadata",
]
@@ -0,0 +1,223 @@
"""CLI entry point for Hive Coder agent."""
import asyncio
import json
import logging
import sys
import click
from .agent import HiveCoderAgent, default_agent
def setup_logging(verbose=False, debug=False):
"""Configure logging for execution visibility."""
if debug:
level, fmt = logging.DEBUG, "%(asctime)s %(name)s: %(message)s"
elif verbose:
level, fmt = logging.INFO, "%(message)s"
else:
level, fmt = logging.WARNING, "%(levelname)s: %(message)s"
logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
logging.getLogger("framework").setLevel(level)
@click.group()
@click.version_option(version="1.0.0")
def cli():
"""Hive Coder — Build Hive agent packages from natural language."""
pass
@cli.command()
@click.option("--request", "-r", type=str, required=True, help="What agent to build")
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--quiet", "-q", is_flag=True, help="Only output result JSON")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def run(request, mock, quiet, verbose, debug):
"""Execute agent building from a request."""
if not quiet:
setup_logging(verbose=verbose, debug=debug)
context = {"user_request": request}
result = asyncio.run(default_agent.run(context, mock_mode=mock))
output_data = {
"success": result.success,
"steps_executed": result.steps_executed,
"output": result.output,
}
if result.error:
output_data["error"] = result.error
click.echo(json.dumps(output_data, indent=2, default=str))
sys.exit(0 if result.success else 1)
@cli.command()
@click.option("--mock", is_flag=True, help="Run in mock mode")
@click.option("--verbose", "-v", is_flag=True, help="Show execution details")
@click.option("--debug", is_flag=True, help="Show debug logging")
def tui(mock, verbose, debug):
"""Launch the TUI dashboard for interactive agent building."""
setup_logging(verbose=verbose, debug=debug)
try:
from framework.tui.app import AdenTUI
except ImportError:
click.echo("TUI requires the 'textual' package. Install with: pip install textual")
sys.exit(1)
from pathlib import Path
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
async def run_with_tui():
agent = HiveCoderAgent()
agent._tool_registry = ToolRegistry()
storage_path = Path.home() / ".hive" / "agents" / "hive_coder"
storage_path.mkdir(parents=True, exist_ok=True)
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
agent._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock:
llm = LiteLLMProvider(
model=agent.config.model,
api_key=agent.config.api_key,
api_base=agent.config.api_base,
)
tools = list(agent._tool_registry.get_tools().values())
tool_executor = agent._tool_registry.get_executor()
graph = agent._build_graph()
runtime = create_agent_runtime(
graph=graph,
goal=agent.goal,
storage_path=storage_path,
entry_points=[
EntryPointSpec(
id="start",
name="Build Agent",
entry_node="coder",
trigger_type="manual",
isolation_level="isolated",
),
],
llm=llm,
tools=tools,
tool_executor=tool_executor,
)
await runtime.start()
try:
app = AdenTUI(runtime)
await app.run_async()
finally:
await runtime.stop()
asyncio.run(run_with_tui())
@cli.command()
@click.option("--json", "output_json", is_flag=True)
def info(output_json):
"""Show agent information."""
info_data = default_agent.info()
if output_json:
click.echo(json.dumps(info_data, indent=2))
else:
click.echo(f"Agent: {info_data['name']}")
click.echo(f"Version: {info_data['version']}")
click.echo(f"Description: {info_data['description']}")
click.echo(f"\nNodes: {', '.join(info_data['nodes'])}")
click.echo(f"Client-facing: {', '.join(info_data['client_facing_nodes'])}")
click.echo(f"Entry: {info_data['entry_node']}")
click.echo(f"Terminal: {', '.join(info_data['terminal_nodes']) or '(forever-alive)'}")
@cli.command()
def validate():
"""Validate agent structure."""
validation = default_agent.validate()
if validation["valid"]:
click.echo("Agent is valid")
if validation["warnings"]:
for warning in validation["warnings"]:
click.echo(f" WARNING: {warning}")
else:
click.echo("Agent has errors:")
for error in validation["errors"]:
click.echo(f" ERROR: {error}")
sys.exit(0 if validation["valid"] else 1)
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
def shell(verbose):
"""Interactive agent building session (CLI, no TUI)."""
asyncio.run(_interactive_shell(verbose))
async def _interactive_shell(verbose=False):
"""Async interactive shell."""
setup_logging(verbose=verbose)
click.echo("=== Hive Coder ===")
click.echo("Describe the agent you want to build (or 'quit' to exit):\n")
agent = HiveCoderAgent()
await agent.start()
try:
while True:
try:
request = await asyncio.get_event_loop().run_in_executor(None, input, "Build> ")
if request.lower() in ["quit", "exit", "q"]:
click.echo("Goodbye!")
break
if not request.strip():
continue
click.echo("\nBuilding agent...\n")
result = await agent.trigger_and_wait("default", {"user_request": request})
if result is None:
click.echo("\n[Execution timed out]\n")
continue
if result.success:
output = result.output or {}
agent_name = output.get("agent_name", "unknown")
validation = output.get("validation_result", "unknown")
click.echo(f"\nAgent '{agent_name}' built. Validation: {validation}\n")
else:
click.echo(f"\nBuild failed: {result.error}\n")
except KeyboardInterrupt:
click.echo("\nGoodbye!")
break
except Exception as e:
click.echo(f"Error: {e}", err=True)
import traceback
traceback.print_exc()
finally:
await agent.stop()
if __name__ == "__main__":
cli()
+314
View File
@@ -0,0 +1,314 @@
"""Agent graph construction for Hive Coder."""
from pathlib import Path
from framework.graph import Constraint, Goal, SuccessCriterion
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
from .config import default_config, metadata
from .nodes import coder_node
# Goal definition
goal = Goal(
id="agent-builder",
name="Hive Agent Builder",
description=(
"Build complete, validated Hive agent packages from natural language "
"specifications. Produces production-ready Python packages with goals, "
"nodes, edges, system prompts, MCP configuration, and tests."
),
success_criteria=[
SuccessCriterion(
id="valid-package",
description="Generated agent package passes structural validation",
metric="validation_pass",
target="true",
weight=0.30,
),
SuccessCriterion(
id="complete-files",
description=(
"All required files generated: agent.py, config.py, "
"nodes/__init__.py, __init__.py, __main__.py, mcp_servers.json"
),
metric="file_count",
target=">=6",
weight=0.25,
),
SuccessCriterion(
id="user-satisfaction",
description="User reviews and approves the generated agent",
metric="user_approval",
target="true",
weight=0.25,
),
SuccessCriterion(
id="framework-compliance",
description=(
"Generated code follows framework patterns: STEP 1/STEP 2 "
"for client-facing, correct imports, entry_points format"
),
metric="pattern_compliance",
target="100%",
weight=0.20,
),
],
constraints=[
Constraint(
id="dynamic-tool-discovery",
description=(
"Always discover available tools dynamically via "
"discover_mcp_tools before referencing tools in agent designs"
),
constraint_type="hard",
category="correctness",
),
Constraint(
id="no-fabricated-tools",
description="Only reference tools that exist in hive-tools MCP",
constraint_type="hard",
category="correctness",
),
Constraint(
id="valid-python",
description="All generated Python files must be syntactically correct",
constraint_type="hard",
category="correctness",
),
Constraint(
id="self-verification",
description="Run validation after writing code; fix errors before presenting",
constraint_type="hard",
category="quality",
),
],
)
# Nodes — single coder node (guardian is now auto-attached by the framework)
nodes = [coder_node]
# No edges needed — single forever-alive event_loop node
edges = []
# Graph configuration
entry_node = "coder"
entry_points = {"start": "coder"}
pause_nodes = []
terminal_nodes = [] # Forever-alive: loops until user exits
# No async entry points — guardian is now auto-attached via attach_guardian()
async_entry_points = []
# Module-level variables read by AgentRunner.load()
conversation_mode = "continuous"
identity_prompt = (
"You are Hive Coder, the best agent-building coding agent on the planet. "
"You deeply understand the Hive agent framework at the source code level "
"and produce production-ready agent packages from natural language. "
"You can dynamically discover available framework tools, inspect runtime "
"sessions and checkpoints from agents you build, and run their test suites. "
"You follow coding agent discipline: read before writing, verify "
"assumptions by reading actual code, adhere to project conventions, "
"self-verify with validation, and fix your own errors. You are concise, "
"direct, and technically rigorous. No emojis. No fluff."
)
loop_config = {
"max_iterations": 100,
"max_tool_calls_per_turn": 20,
"max_history_tokens": 32000,
}
class HiveCoderAgent:
"""
Hive Coder builds Hive agent packages from natural language.
Single-node architecture: the coder runs in a continuous while(true) loop.
The guardian watchdog is auto-attached by the framework in TUI mode.
"""
def __init__(self, config=None):
self.config = config or default_config
self.goal = goal
self.nodes = nodes
self.edges = edges
self.entry_node = entry_node
self.entry_points = entry_points
self.pause_nodes = pause_nodes
self.terminal_nodes = terminal_nodes
self.async_entry_points = async_entry_points
self._graph: GraphSpec | None = None
self._agent_runtime: AgentRuntime | None = None
self._tool_registry: ToolRegistry | None = None
self._storage_path: Path | None = None
def _build_graph(self) -> GraphSpec:
"""Build the GraphSpec."""
return GraphSpec(
id="hive-coder-graph",
goal_id=self.goal.id,
version="1.0.0",
entry_node=self.entry_node,
entry_points=self.entry_points,
terminal_nodes=self.terminal_nodes,
pause_nodes=self.pause_nodes,
nodes=self.nodes,
edges=self.edges,
default_model=self.config.model,
max_tokens=self.config.max_tokens,
loop_config=loop_config,
conversation_mode=conversation_mode,
identity_prompt=identity_prompt,
async_entry_points=self.async_entry_points,
)
def _setup(self, mock_mode=False) -> None:
"""Set up the agent runtime."""
self._storage_path = Path.home() / ".hive" / "agents" / "hive_coder"
self._storage_path.mkdir(parents=True, exist_ok=True)
self._tool_registry = ToolRegistry()
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
if mcp_config_path.exists():
self._tool_registry.load_mcp_config(mcp_config_path)
llm = None
if not mock_mode:
llm = LiteLLMProvider(
model=self.config.model,
api_key=self.config.api_key,
api_base=self.config.api_base,
)
tool_executor = self._tool_registry.get_executor()
tools = list(self._tool_registry.get_tools().values())
self._graph = self._build_graph()
checkpoint_config = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
async_checkpoint=True,
)
entry_point_specs = [
EntryPointSpec(
id="default",
name="Default",
entry_node=self.entry_node,
trigger_type="manual",
isolation_level="shared",
),
]
self._agent_runtime = create_agent_runtime(
graph=self._graph,
goal=self.goal,
storage_path=self._storage_path,
entry_points=entry_point_specs,
llm=llm,
tools=tools,
tool_executor=tool_executor,
checkpoint_config=checkpoint_config,
graph_id="hive_coder",
)
async def start(self, mock_mode=False) -> None:
"""Set up and start the agent runtime."""
if self._agent_runtime is None:
self._setup(mock_mode=mock_mode)
if not self._agent_runtime.is_running:
await self._agent_runtime.start()
async def stop(self) -> None:
"""Stop the agent runtime and clean up."""
if self._agent_runtime and self._agent_runtime.is_running:
await self._agent_runtime.stop()
self._agent_runtime = None
async def trigger_and_wait(
self,
entry_point: str = "default",
input_data: dict | None = None,
timeout: float | None = None,
session_state: dict | None = None,
) -> ExecutionResult | None:
"""Execute the graph and wait for completion."""
if self._agent_runtime is None:
raise RuntimeError("Agent not started. Call start() first.")
return await self._agent_runtime.trigger_and_wait(
entry_point_id=entry_point,
input_data=input_data or {},
session_state=session_state,
)
async def run(self, context: dict, mock_mode=False, session_state=None) -> ExecutionResult:
"""Run the agent (convenience method for single execution)."""
await self.start(mock_mode=mock_mode)
try:
result = await self.trigger_and_wait("default", context, session_state=session_state)
return result or ExecutionResult(success=False, error="Execution timeout")
finally:
await self.stop()
def info(self):
"""Get agent information."""
return {
"name": metadata.name,
"version": metadata.version,
"description": metadata.description,
"goal": {
"name": self.goal.name,
"description": self.goal.description,
},
"nodes": [n.id for n in self.nodes],
"edges": [e.id for e in self.edges],
"entry_node": self.entry_node,
"entry_points": self.entry_points,
"pause_nodes": self.pause_nodes,
"terminal_nodes": self.terminal_nodes,
"client_facing_nodes": [n.id for n in self.nodes if n.client_facing],
}
def validate(self):
"""Validate agent structure."""
errors = []
warnings = []
node_ids = {node.id for node in self.nodes}
for edge in self.edges:
if edge.source not in node_ids:
errors.append(f"Edge {edge.id}: source '{edge.source}' not found")
if edge.target not in node_ids:
errors.append(f"Edge {edge.id}: target '{edge.target}' not found")
if self.entry_node not in node_ids:
errors.append(f"Entry node '{self.entry_node}' not found")
for terminal in self.terminal_nodes:
if terminal not in node_ids:
errors.append(f"Terminal node '{terminal}' not found")
for ep_id, node_id in self.entry_points.items():
if node_id not in node_ids:
errors.append(f"Entry point '{ep_id}' references unknown node '{node_id}'")
return {
"valid": len(errors) == 0,
"errors": errors,
"warnings": warnings,
}
# Create default instance
default_agent = HiveCoderAgent()
@@ -0,0 +1,51 @@
"""Runtime configuration for Hive Coder agent."""
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
default_config = RuntimeConfig()
@dataclass
class AgentMetadata:
name: str = "Hive Coder"
version: str = "1.0.0"
description: str = (
"Native coding agent that builds production-ready Hive agent packages "
"from natural language specifications. Deeply understands the agent framework "
"and produces complete Python packages with goals, nodes, edges, system prompts, "
"MCP configuration, and tests."
)
intro_message: str = (
"I'm Hive Coder — I build Hive agents. Describe what kind of agent "
"you want to create and I'll design, implement, and validate it for you."
)
metadata = AgentMetadata()
@@ -0,0 +1,96 @@
"""Attach the Hive Coder's guardian node to any agent runtime.
Usage::
from framework.agents.hive_coder.guardian import attach_guardian
runner._setup()
attach_guardian(runner._agent_runtime, runner._tool_registry)
await runner._agent_runtime.start()
Must be called **before** ``runtime.start()`` it injects the
guardian node into the graph and registers an event-driven entry point.
"""
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.execution_stream import EntryPointSpec
from .nodes import ALL_GUARDIAN_TOOLS, guardian_node
logger = logging.getLogger(__name__)
GUARDIAN_ENTRY_POINT = EntryPointSpec(
id="guardian",
name="Agent Guardian",
entry_node="guardian",
trigger_type="event",
trigger_config={
"event_types": [
"execution_failed",
"node_stalled",
"node_tool_doom_loop",
"constraint_violation",
],
"exclude_own_graph": False,
},
isolation_level="shared",
)
def attach_guardian(
runtime: AgentRuntime,
tool_registry: ToolRegistry,
) -> None:
"""Inject hive_coder's guardian node into *runtime*'s graph.
1. Registers graph lifecycle tools if not already present.
2. Refreshes the runtime's tool list and executor.
3. Adds the guardian node (with dynamically filtered tools) to the graph.
4. Registers an event-driven entry point that fires on execution failures,
stalls, tool doom loops, and constraint violations.
Must be called **before** ``runtime.start()``.
Raises:
RuntimeError: If the runtime is already running.
"""
from framework.tools.session_graph_tools import register_graph_tools
# 1. Register graph lifecycle tools if not already present
if not tool_registry.has_tool("load_agent"):
register_graph_tools(tool_registry, runtime)
# 2. Refresh tool schemas and executor on the runtime
runtime._tools = list(tool_registry.get_tools().values())
runtime._tool_executor = tool_registry.get_executor()
# 3. Filter guardian tools to only those available in the registry
available = set(tool_registry.get_tools().keys())
filtered_tools = [t for t in ALL_GUARDIAN_TOOLS if t in available]
# Build guardian node with filtered tool list
node = guardian_node.model_copy(update={"tools": filtered_tools})
# Add to the runtime's graph (so register_entry_point validation passes)
runtime.graph.nodes.append(node)
# Mark guardian as reachable in graph-level entry_points so
# GraphSpec.validate() doesn't flag it as unreachable.
runtime.graph.entry_points["guardian"] = "guardian"
# 4. Register event-driven entry point
runtime.register_entry_point(GUARDIAN_ENTRY_POINT)
logger.info(
"Guardian attached with %d tools: %s",
len(filtered_tools),
filtered_tools,
)
@@ -0,0 +1,9 @@
{
"coder-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "coder_tools_server.py", "--stdio"],
"cwd": "../../../../tools",
"description": "Unsandboxed file system tools for code generation and validation"
}
}
@@ -0,0 +1,556 @@
"""Node definitions for Hive Coder agent."""
from framework.graph import NodeSpec
# Single node — like opencode's while(true) loop.
# One continuous context handles the entire workflow:
# discover → design → implement → verify → present → iterate.
coder_node = NodeSpec(
id="coder",
name="Hive Coder",
description=(
"Autonomous coding agent that builds Hive agent packages. "
"Handles the full lifecycle: understanding user intent, "
"designing architecture, writing code, validating, and "
"iterating on feedback — all in one continuous conversation."
),
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["user_request"],
output_keys=["agent_name", "validation_result"],
success_criteria=(
"A complete, validated Hive agent package exists at "
"exports/{agent_name}/ and passes structural validation."
),
system_prompt="""\
You are Hive Coder, the best agent-building coding agent. You build \
production-ready Hive agent packages from natural language.
# Core Mandates
- **Read before writing.** NEVER write code from assumptions. Read \
reference agents and templates first. Read every file before editing.
- **Conventions first.** Follow existing project patterns exactly. \
Analyze imports, structure, and style in reference agents.
- **Verify assumptions.** Never assume a class, import, or pattern \
exists. Read actual source to confirm. Search if unsure.
- **Discover tools dynamically.** NEVER reference tools from static \
docs. Always run discover_mcp_tools() to see what actually exists.
- **Professional objectivity.** If a use case is a poor fit for the \
framework, say so. Technical accuracy over validation.
- **Concise.** No emojis. No preambles. No postambles. Substance only.
- **Self-verify.** After writing code, run validation and tests. Fix \
errors yourself. Don't declare success until validation passes.
# Tools
## File I/O
- read_file(path, offset?, limit?) read with line numbers
- write_file(path, content) create/overwrite, auto-mkdir
- edit_file(path, old_text, new_text, replace_all?) fuzzy-match edit
- list_directory(path, recursive?) list contents
- search_files(pattern, path?, include?) regex search
- run_command(command, cwd?, timeout?) shell execution
- undo_changes(path?) restore from git snapshot
## Meta-Agent
- discover_mcp_tools(server_config_path?) connect to MCP servers \
and list all available tools with full schemas. Default: hive-tools.
- list_agents() list all agent packages in exports/ with session counts
- list_agent_sessions(agent_name, status?, limit?) list sessions
- get_agent_session_state(agent_name, session_id) full session state
- get_agent_session_memory(agent_name, session_id, key?) memory data
- list_agent_checkpoints(agent_name, session_id) list checkpoints
- get_agent_checkpoint(agent_name, session_id, checkpoint_id?) load checkpoint
- run_agent_tests(agent_name, test_types?, fail_fast?) run pytest with parsing
# Meta-Agent Capabilities
You are not just a file writer. You have deep integration with the \
Hive framework:
## Tool Discovery (MANDATORY before designing)
Before designing any agent, run discover_mcp_tools() to see what \
tools are actually available from the hive-tools MCP server. This \
returns full schemas with parameter names, types, and descriptions. \
NEVER guess tool names or parameters from memory. The tool catalog \
is the ground truth.
To check a specific agent's tools:
discover_mcp_tools("exports/{agent_name}/mcp_servers.json")
## Agent Awareness
Run list_agents() to see what agents already exist. Read their code \
for patterns:
read_file("exports/{name}/agent.py")
read_file("exports/{name}/nodes/__init__.py")
## Post-Build Testing
After writing agent code, validate structurally AND run tests:
run_command("python -c 'from {name} import default_agent; \\
print(default_agent.validate())'")
run_agent_tests("{name}")
## Debugging Built Agents
When a user says "my agent is failing" or "debug this agent":
1. list_agent_sessions("{agent_name}") find the session
2. get_agent_session_state("{agent_name}", "{session_id}") see status
3. get_agent_session_memory("{agent_name}", "{session_id}") inspect data
4. list_agent_checkpoints / get_agent_checkpoint trace execution
# Workflow
You operate in a continuous loop. The user describes what they want, \
you build it. No rigid phases use judgment. But the general flow is:
## 1. Understand
When the user describes what they want to build, hear the structure:
- The actors, the trigger, the core loop, the output, the pain.
Play back a model: "Here's what I'm picturing: [concrete picture]. \
Before I start [1-2 questions you can't infer]."
Ask only what you CANNOT infer. Fill blanks with domain knowledge.
## 2. Qualify
Assess framework fit honestly. Run discover_mcp_tools() to check \
what tools exist. Read the framework guide:
read_file("core/framework/agents/hive_coder/reference/framework_guide.md")
Consider:
- What works well (multi-turn, HITL, tool orchestration)
- Limitations (LLM latency, context limits, cost)
- Deal-breakers (missing tools, wrong paradigm)
Give a clear recommendation: proceed, adjust scope, or reconsider.
## 3. Design
Design the agent architecture:
- Goal: id, name, description, 3-5 success criteria, 2-4 constraints
- Nodes: **2-4 nodes MAXIMUM** (see rules below)
- Edges: on_success for linear, conditional for routing
- Lifecycle: ALWAYS forever-alive (`terminal_nodes=[]`) unless the user \
explicitly requests a one-shot/batch agent. Forever-alive agents loop \
continuously the user exits by closing the TUI. This is the standard \
pattern for all interactive agents.
### Node Count Rules (HARD LIMITS)
**2-4 nodes** for all agents. Never exceed 4 unless the user explicitly \
requests more. Each node boundary serializes outputs to shared memory \
and DESTROYS all in-context information (tool results, reasoning, history).
**MERGE nodes when:**
- Node has NO tools (pure LLM reasoning) merge into predecessor/successor
- Node sets only 1 trivial output collapse into predecessor
- Multiple consecutive autonomous nodes combine into one rich node
- A "report" or "summary" node merge into the client-facing node
- A "confirm" or "schedule" node that calls no external service remove
**SEPARATE nodes only when:**
- Client-facing vs autonomous (different interaction models)
- Fundamentally different tool sets
- Fan-out parallelism (parallel branches MUST be separate)
**Typical patterns:**
- 2 nodes: `interact (client-facing) process (autonomous) interact`
- 3 nodes: `intake (CF) process (auto) review (CF) intake`
- WRONG: 7 nodes where half have no tools and just do LLM reasoning
Read reference agents before designing:
list_agents()
read_file("exports/deep_research_agent/agent.py")
read_file("exports/deep_research_agent/nodes/__init__.py")
Present the design with ASCII art graph. Get user approval.
## 4. Implement
Read templates before writing code:
read_file("core/framework/agents/hive_coder/reference/file_templates.md")
read_file("core/framework/agents/hive_coder/reference/anti_patterns.md")
Write files in order:
1. mkdir -p exports/{name}/nodes exports/{name}/tests
2. config.py RuntimeConfig + AgentMetadata
3. nodes/__init__.py NodeSpec definitions with system prompts
4. agent.py Goal, edges, graph, agent class
5. __init__.py package exports
6. __main__.py CLI with click
7. mcp_servers.json tool server config
8. tests/ fixtures
### Critical Rules
**Imports** (must match exactly only import what you use):
```python
from framework.graph import (
NodeSpec, EdgeSpec, EdgeCondition,
Goal, SuccessCriterion, Constraint,
)
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult
from framework.graph.checkpoint_config import CheckpointConfig
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import (
AgentRuntime, create_agent_runtime,
)
from framework.runtime.execution_stream import EntryPointSpec
```
For agents with async entry points (timers, webhooks, events), also add:
```python
from framework.graph.edge import GraphSpec, AsyncEntryPointSpec
from framework.runtime.agent_runtime import (
AgentRuntime, AgentRuntimeConfig, create_agent_runtime,
)
```
NEVER `from core.framework...` PYTHONPATH includes core/.
**__init__.py MUST re-export ALL module-level variables** \
(THIS IS THE #1 SOURCE OF AGENT LOAD FAILURES):
The runner imports the package (__init__.py), NOT agent.py. It reads \
goal, nodes, edges, entry_node, entry_points, pause_nodes, \
terminal_nodes, conversation_mode, identity_prompt, loop_config via \
getattr(). If ANY are missing from __init__.py, they silently default \
to None or {} causing "must define goal, nodes, edges" or "node X \
is unreachable" errors. The __init__.py MUST import and re-export \
ALL of these from .agent:
```python
from .agent import (
MyAgent, default_agent, goal, nodes, edges,
entry_node, entry_points, pause_nodes, terminal_nodes,
conversation_mode, identity_prompt, loop_config,
)
```
**entry_points**: `{"start": "first-node-id"}`
For agents with multiple entry points (e.g. a reminder trigger), \
add them: `{"start": "intake", "reminder": "reminder"}`
**conversation_mode** ONLY two valid values:
- `"continuous"` recommended for interactive agents (context carries \
across node transitions)
- Omit entirely for isolated per-node conversations
NEVER use: "client_facing", "interactive", "adaptive", or any other \
value. These DO NOT EXIST.
**loop_config** ONLY three valid keys:
```python
loop_config = {
"max_iterations": 100,
"max_tool_calls_per_turn": 20,
"max_history_tokens": 32000,
}
```
NEVER add: "strategy", "mode", "timeout", or other keys.
**mcp_servers.json**:
```json
{
"hive-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../tools"
}
}
```
NO "mcpServers" wrapper. cwd "../../tools". command "uv".
**Storage**: `Path.home() / ".hive" / "agents" / "{name}"`
**Client-facing system prompts** STEP 1/STEP 2 pattern:
```
STEP 1 Present to user (text only, NO tool calls):
[instructions]
STEP 2 After user responds, call set_output:
[set_output calls]
```
**Autonomous system prompts** set_output in SEPARATE turn.
**Tools** NEVER fabricate tool names. Common hallucinations: \
csv_read, csv_write, csv_append, file_upload, database_query. \
If discover_mcp_tools() shows these don't exist, use alternatives \
(e.g. save_data/load_data for data persistence).
**Node rules**:
- **2-4 nodes MAX.** Never exceed 4. Merge thin nodes aggressively.
- A node with 0 tools is NOT a real node merge it.
- node_type always "event_loop"
- max_node_visits default is 0 (unbounded) correct for forever-alive. \
Only set >0 in one-shot agents with bounded feedback loops.
- Feedback inputs: nullable_output_keys
- terminal_nodes=[] for forever-alive (the default)
- Every node MUST have at least one outgoing edge (no dead ends)
- Agents are forever-alive unless user explicitly asks for one-shot
**Agent class**: CamelCase name, default_agent at module level. \
Constructor takes `config=None`. Follow the exact pattern in \
file_templates.md do NOT invent constructor params like \
`llm_provider` or `tool_registry`.
**Module-level variables** (read by AgentRunner.load()):
goal, nodes, edges, entry_node, entry_points, pause_nodes,
terminal_nodes, conversation_mode, identity_prompt, loop_config
For agents with async triggers, also export:
async_entry_points, runtime_config
**Async entry points** (timers, webhooks, events):
When an agent needs scheduled tasks, webhook reactions, or event-driven \
triggers, use `AsyncEntryPointSpec` (from framework.graph.edge) and \
`AgentRuntimeConfig` (from framework.runtime.agent_runtime):
- Timer (cron): `trigger_type="timer"`, \
`trigger_config={"cron": "0 9 * * *"}` standard 5-field cron expression \
(e.g. `"0 9 * * MON-FRI"` weekdays 9am, `"*/30 * * * *"` every 30 min)
- Timer (interval): `trigger_type="timer"`, \
`trigger_config={"interval_minutes": 20, "run_immediately": False}`
- Event (for webhooks): `trigger_type="event"`, \
`trigger_config={"event_types": ["webhook_received"]}`
- `isolation_level="shared"` so async runs can read primary session memory
- `runtime_config = AgentRuntimeConfig(webhook_routes=[...])` for HTTP webhooks
- Reference: `exports/gmail_inbox_guardian/agent.py`
- Full docs: `core/framework/agents/hive_coder/reference/framework_guide.md` \
(Async Entry Points section)
## 5. Verify
Run THREE validation steps after writing. All must pass:
**Step A Class validation** (checks graph structure):
```
run_command("python -c 'from {name} import default_agent; \\
print(default_agent.validate())'")
```
**Step B Runner load test** (checks package export contract \
THIS IS THE SAME PATH THE TUI USES):
```
run_command("python -c 'from framework.runner.runner import \\
AgentRunner; r = AgentRunner.load(\"exports/{name}\"); \\
print(\"AgentRunner.load: OK\")'")
```
This catches missing __init__.py exports, bad conversation_mode, \
invalid loop_config, and unreachable nodes. If Step A passes but \
Step B fails, the problem is in __init__.py exports.
**Step C Run tests:**
```
run_agent_tests("{name}")
```
If anything fails: read error, fix with edit_file, re-validate. Up to 3x.
**CRITICAL: Testing forever-alive agents**
Most agents use `terminal_nodes=[]` (forever-alive). This means \
`runner.run()` NEVER returns it hangs forever waiting for a \
terminal node that doesn't exist. Agent tests MUST be structural:
- Validate graph, node specs, edges, tools, prompts
- Check goal/constraints/success criteria definitions
- Test `AgentRunner.load()` + `_setup()` (skip if no API key)
- NEVER call `runner.run()` or `trigger_and_wait()` in tests for \
forever-alive agents they will hang and time out.
When you restructure an agent (change nodes/edges), always update \
the tests to match. Stale tests referencing old node names will fail.
## 6. Present
Show the user what you built: agent name, goal summary, graph ASCII \
art, files created, validation status. Offer to revise or build another.
After user confirms satisfaction:
set_output("agent_name", "the_agent_name")
set_output("validation_result", "valid")
If building another agent, just start the loop again no need to \
set_output until the user is done.
## 7. Live Test (optional)
After the user approves, offer to load and run the agent in-session. \
This runs it alongside you, with the Agent Guardian watching for \
failures automatically.
```
load_agent("exports/{name}") # registers as secondary graph
start_agent("{name}") # triggers default entry point
```
If the agent fails, the guardian fires and triages. You can also:
- `list_agents()` see all loaded graphs and status
- `restart_agent("{name}")` then `load_agent` pick up code changes
- `unload_agent("{name}")` remove it from the session
- `get_user_presence()` check if user is around
The agent runs in a shared session: it can read memory you've set and \
its outputs are visible to you. If the guardian escalates a failure, \
you'll see the error and can fix the code, then reload.
""",
tools=[
"read_file",
"write_file",
"edit_file",
"list_directory",
"search_files",
"run_command",
"undo_changes",
# Meta-agent tools
"discover_mcp_tools",
"list_agents",
"list_agent_sessions",
"get_agent_session_state",
"get_agent_session_memory",
"list_agent_checkpoints",
"get_agent_checkpoint",
"run_agent_tests",
# Graph lifecycle tools (multi-graph sessions)
"load_agent",
"unload_agent",
"start_agent",
"restart_agent",
"get_user_presence",
],
)
ALL_GUARDIAN_TOOLS = [
# File I/O — available when the agent has hive-tools MCP
"read_file",
"write_file",
"edit_file",
"search_files",
"run_command",
# Graph lifecycle — registered by attach_guardian()
"load_agent",
"unload_agent",
"start_agent",
"restart_agent",
"get_user_presence",
"list_agents",
]
guardian_node = NodeSpec(
id="guardian",
name="Agent Guardian",
description=(
"Event-driven guardian that monitors supervised agent graphs. "
"Triggers on failures, stalls, tool doom loops, and constraint "
"violations. Assesses severity, checks user presence, and decides: "
"ask the user (if present), attempt autonomous fix (if away), or "
"escalate for post-mortem."
),
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["event"],
output_keys=["resolution"],
nullable_output_keys=["resolution"],
success_criteria=(
"Failure is resolved — either by user guidance, autonomous fix, or documented escalation."
),
system_prompt="""\
You are the Agent Guardian a watchdog that monitors supervised agent \
graphs. You fire on failures, stalls, doom loops, and constraint \
violations. Your job: triage, fix, or escalate.
# Event Types
You trigger on these events:
## execution_failed
The agent graph crashed unhandled exception, LLM error, or tool failure.
- Read the error message and stack trace from the event data.
- Transient errors (rate limit, timeout, network): auto-retry via restart.
- Config errors (bad API key, missing tool): needs user input.
- Logic bugs (bad output, crash in code): read source, fix, reload.
- Catastrophic (data corruption): escalate, unload the agent.
## node_stalled
A node has been running too long without producing output. The LLM may \
be stuck in a reasoning loop, waiting for input that won't come, or \
the tool call is hanging.
- Check what node is stalled and how long it's been running.
- If the node is autonomous: restart the agent to break the stall.
- If the node is client-facing: check user presence the user may \
have left. Alert them or restart after a timeout.
- If a tool call is hanging: the MCP server may be down. Restart.
## node_tool_doom_loop
The LLM is calling the same tools repeatedly without making progress. \
This usually means the prompt is inadequate, the tool is returning \
unhelpful errors, or the LLM is stuck in a retry loop.
- Identify which tool is looping and what errors it's returning.
- If it's a transient tool error: restart to reset context.
- If it's a prompt/logic issue: read the node's source, fix the \
system prompt or tool configuration, then reload and restart.
- If the tool itself is broken: unload and escalate.
## constraint_violation
The agent violated a defined constraint (e.g., token budget exceeded, \
forbidden action attempted, output format invalid).
- Read which constraint was violated from the event data.
- Soft constraints (budget warning): log and notify user.
- Hard constraints (forbidden action): halt the agent immediately, \
escalate to user.
- Format violations: may be fixable by restarting with better context.
# Decision Protocol
1. **Identify the event type** and read the event data carefully.
2. **Assess severity:**
- Transient / auto-recoverable -> auto-retry
- Configuration / environment -> needs user input
- Logic bug / prompt issue -> needs code fix
- Catastrophic / safety -> escalate immediately
3. **Check user presence.** Call get_user_presence().
- **present** (idle < 2 min): Ask the user for guidance. Present the \
issue clearly and suggest options.
- **idle** (2-10 min): Attempt autonomous fix first. If it fails, \
queue a notification for when user returns.
- **away** (> 10 min) or **never_seen**: Attempt autonomous fix. \
Save escalation log via write_file if fix fails.
4. **Act.**
- Auto-retry: restart_agent(graph_id), then start_agent.
- Config issues: if user present, ask. If away, log and wait.
- Code fixes: read source, fix with edit_file, restart_agent.
- Escalation: save detailed log, unload the agent.
# Tools
- get_user_presence() -- check if user is active
- list_agents() -- see loaded graphs and status
- load_agent(path) -- load an agent graph
- unload_agent(graph_id) -- remove a graph
- start_agent(graph_id, entry_point, input_data) -- trigger execution
- restart_agent(graph_id) -- unload for reload
- read_file, write_file, edit_file -- inspect/fix agent source code \
(available when the agent's MCP server provides them)
- run_command -- run shell commands (available when provided by MCP)
# Rules
- Be concise. State the event type, your assessment, and your action.
- If asking the user, present the issue and 2-3 concrete options.
- After a fix attempt, verify it works before declaring success.
- For doom loops and stalls, prefer restart first it's the cheapest fix.
- set_output("resolution", "...") only after the issue is resolved or \
escalated. Use a brief description: "auto-fixed: retry after timeout", \
"escalated: missing API key", "user-resolved: updated config", \
"auto-fixed: restarted stalled node", "escalated: doom loop in tool X".
""",
# Placeholder — attach_guardian() replaces with filtered list at runtime
tools=ALL_GUARDIAN_TOOLS,
)
__all__ = ["coder_node", "guardian_node", "ALL_GUARDIAN_TOOLS"]
@@ -0,0 +1,107 @@
# Common Mistakes When Building Hive Agents
## Critical Errors
1. **Using tools that don't exist** — Always verify tools are available in the hive-tools MCP server before assigning them to nodes. Never guess tool names.
2. **Wrong entry_points format** — MUST be `{"start": "first-node-id"}`. NOT a set, NOT `{node_id: [keys]}`.
3. **Wrong mcp_servers.json format** — Flat dict (no `"mcpServers"` wrapper). `cwd` must be `"../../tools"`. `command` must be `"uv"` with args `["run", "python", ...]`.
4. **Missing STEP 1/STEP 2 in client-facing prompts** — Without explicit phases, the LLM calls set_output before the user responds. Always use the pattern.
5. **Forgetting nullable_output_keys** — When a node receives inputs from multiple edges and some inputs only arrive on certain edges (e.g., feedback), mark those as nullable. Without this, the executor blocks waiting for a value that will never arrive.
6. **Creating dead-end nodes in forever-alive graphs** — Every node must have at least one outgoing edge. A node with no outgoing edges ends the execution, breaking the loop.
7. **Setting max_node_visits to a non-zero value in forever-alive agents** — The framework default is `max_node_visits=0` (unbounded). Setting it to any positive value (e.g., 1) means the node stops executing after that many visits, silently breaking the forever-alive loop. Only set `max_node_visits > 0` in one-shot agents with feedback loops that need bounded retries.
7. **Missing module-level exports in `__init__.py`** — The runner loads agents via `importlib.import_module(package_name)`, which imports `__init__.py`. It then reads `goal`, `nodes`, `edges`, `entry_node`, `entry_points`, `pause_nodes`, `terminal_nodes`, `conversation_mode`, `identity_prompt`, `loop_config` via `getattr()`. If ANY of these are missing from `__init__.py`, they default to `None` or `{}` — causing "must define goal, nodes, edges" errors or "node X is unreachable" validation failures. **ALL module-level variables from agent.py must be re-exported in `__init__.py`.**
## Value Errors
8. **Invalid `conversation_mode` value** — Only two valid values: `"continuous"` (recommended for interactive agents) or omit entirely (for isolated per-node conversations). Values like `"client_facing"`, `"interactive"`, `"adaptive"` do NOT exist and will cause runtime errors.
9. **Invalid `loop_config` keys** — Only three valid keys: `max_iterations` (int), `max_tool_calls_per_turn` (int), `max_history_tokens` (int). Keys like `"strategy"`, `"mode"`, `"timeout"` are NOT valid and are silently ignored or cause errors.
10. **Fabricating tools that don't exist** — Never guess tool names. Always verify via `discover_mcp_tools()`. Common hallucinations: `csv_read`, `csv_write`, `csv_append`, `file_upload`, `database_query`. If a required tool doesn't exist, redesign the agent to use tools that DO exist (e.g., `save_data`/`load_data` for data persistence).
## Design Errors
11. **Too many thin nodes** — Hard limit: **2-4 nodes** for most agents. Each node boundary serializes outputs to shared memory and loses all in-context information (tool results, intermediate reasoning, conversation history). A node with 0 tools that just does LLM reasoning is NOT a real node — merge it into its predecessor or successor.
**Merge when:**
- Node has NO tools — pure LLM reasoning belongs in the node that produces or consumes its data
- Node sets only 1 trivial output (e.g., `set_output("done", "true")`) — collapse into predecessor
- Multiple consecutive autonomous nodes with same/similar tools — combine into one
- A "report" or "summary" node that just presents analysis — merge into the client-facing node
- A "schedule" or "confirm" node that doesn't actually schedule anything — remove entirely
**Keep separate when:**
- Client-facing vs autonomous — different interaction models require separate nodes
- Fundamentally different tool sets (e.g., web search vs file I/O)
- Fan-out parallelism — parallel branches MUST be separate nodes
**Bad example** (7 nodes — WAY too many):
```
profile_setup → daily_intake → update_tracker → analyze_progress → generate_plan → schedule_reminders → report
```
`analyze_progress` has no tools. `schedule_reminders` just sets one boolean. `report` just presents analysis. `update_tracker` and `generate_plan` are sequential autonomous work.
**Good example** (3 nodes):
```
intake (client-facing) → process (autonomous: track + analyze + plan) → intake (loop back)
```
One client-facing node handles ALL user interaction (setup, logging, reports). One autonomous node handles ALL backend work (CSV update, analysis, plan generation) with tools and context preserved.
12. **Adding framework gating for LLM behavior** — Don't add output rollback, premature rejection, or interaction protocol injection. Fix with better prompts or custom judges.
13. **Not using continuous conversation mode** — Interactive agents should use `conversation_mode="continuous"`. Without it, each node starts with blank context.
14. **Adding terminal nodes by default** — ALL agents should use `terminal_nodes=[]` (forever-alive) unless the user explicitly requests a one-shot/batch agent. Forever-alive is the standard pattern. Every node must have at least one outgoing edge. Dead-end nodes break the loop.
15. **Calling set_output in same turn as tool calls** — Instruct the LLM to call set_output in a SEPARATE turn from real tool calls.
## File Template Errors
16. **Wrong import paths** — Use `from framework.graph import ...`, NOT `from core.framework.graph import ...`. The PYTHONPATH includes `core/`.
17. **Missing storage path** — Agent class must set `self._storage_path = Path.home() / ".hive" / "agents" / "agent_name"`.
18. **Missing mcp_servers.json** — Without this, the agent has no tools at runtime.
19. **Bare `python` command in mcp_servers.json** — Use `"command": "uv"` with args `["run", "python", ...]`.
## Testing Errors
20. **Using `runner.run()` on forever-alive agents**`runner.run()` calls `trigger_and_wait()` which blocks until the graph reaches a terminal node. Forever-alive agents have `terminal_nodes=[]`, so **`runner.run()` hangs forever**. This is the #1 cause of stuck test suites.
**For forever-alive agents, write structural tests instead:**
- Validate graph structure (nodes, edges, entry points)
- Verify node specs (tools, prompts, client-facing flag)
- Check goal/constraints/success criteria definitions
- Test that `AgentRunner.load()` + `_setup()` succeeds (skip if no API key)
**What NOT to do:**
```python
# WRONG — hangs forever on forever-alive agents
result = await runner.run({"topic": "quantum computing"})
```
**Correct pattern for structure tests:**
```python
def test_research_has_web_tools(self):
assert "web_search" in research_node.tools
def test_research_routes_back_to_interact(self):
edges_to_interact = [e for e in edges if e.source == "research" and e.target == "interact"]
assert edges_to_interact
```
21. **Stale tests after agent restructuring** — When you change an agent's node count or names (e.g., 4 nodes → 2 nodes), the tests MUST be updated too. Tests referencing old node names (e.g., `"review"`, `"report"`) will fail or hang. Always check that test assertions match the current `nodes/__init__.py`.
22. **Running full integration tests without API keys** — Structural tests (validate, import) work without keys. Full integration tests need `ANTHROPIC_API_KEY`. Use `pytest.skip()` in the runner fixture when `_setup()` fails due to missing credentials.
23. **Forgetting sys.path setup in conftest.py** — Tests need `exports/` and `core/` on sys.path.
24. **Not using auto_responder for client-facing nodes** — Tests with client-facing nodes hang without an auto-responder that injects input. But note: even WITH auto_responder, forever-alive agents still hang because the graph never terminates. Auto-responder only helps for agents with terminal nodes.
@@ -0,0 +1,597 @@
# Agent File Templates
Complete code templates for each file in a Hive agent package.
## config.py
```python
"""Runtime configuration."""
import json
from dataclasses import dataclass, field
from pathlib import Path
def _load_preferred_model() -> str:
"""Load preferred model from ~/.hive/configuration.json."""
config_path = Path.home() / ".hive" / "configuration.json"
if config_path.exists():
try:
with open(config_path) as f:
config = json.load(f)
llm = config.get("llm", {})
if llm.get("provider") and llm.get("model"):
return f"{llm['provider']}/{llm['model']}"
except Exception:
pass
return "anthropic/claude-sonnet-4-20250514"
@dataclass
class RuntimeConfig:
model: str = field(default_factory=_load_preferred_model)
temperature: float = 0.7
max_tokens: int = 40000
api_key: str | None = None
api_base: str | None = None
default_config = RuntimeConfig()
@dataclass
class AgentMetadata:
name: str = "My Agent Name"
version: str = "1.0.0"
description: str = "What this agent does."
intro_message: str = "Welcome! What would you like me to do?"
metadata = AgentMetadata()
```
## nodes/__init__.py
```python
"""Node definitions for My Agent."""
from framework.graph import NodeSpec
# Node 1: Intake (client-facing)
intake_node = NodeSpec(
id="intake",
name="Intake",
description="Gather requirements from the user",
node_type="event_loop",
client_facing=True,
max_node_visits=0, # Unlimited for forever-alive
input_keys=["topic"],
output_keys=["brief"],
success_criteria="The brief is specific and actionable.",
system_prompt="""\
You are an intake specialist.
**STEP 1 — Read and respond (text only, NO tool calls):**
1. Read the topic provided
2. If vague, ask 1-2 clarifying questions
3. If clear, confirm your understanding
**STEP 2 — After the user confirms, call set_output:**
- set_output("brief", "Clear description of what to do")
""",
tools=[],
)
# Node 2: Worker (autonomous)
worker_node = NodeSpec(
id="worker",
name="Worker",
description="Do the main work",
node_type="event_loop",
max_node_visits=0,
input_keys=["brief", "feedback"],
output_keys=["results"],
nullable_output_keys=["feedback"], # Only on feedback edge
success_criteria="Results are complete and accurate.",
system_prompt="""\
You are a worker agent. Given a brief, do the work.
If feedback is provided, this is a follow-up — address the feedback.
Work in phases:
1. Use tools to gather/process data
2. Analyze results
3. Call set_output for each key in a SEPARATE turn:
- set_output("results", "structured results")
""",
tools=["web_search", "web_scrape", "save_data", "load_data", "list_data_files"],
)
# Node 3: Review (client-facing)
review_node = NodeSpec(
id="review",
name="Review",
description="Present results for user approval",
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["results", "brief"],
output_keys=["next_action", "feedback"],
nullable_output_keys=["feedback"],
success_criteria="User has reviewed and decided next steps.",
system_prompt="""\
Present the results to the user.
**STEP 1 — Present (text only, NO tool calls):**
1. Summary of work done
2. Key results
3. Ask: satisfied, or want changes?
**STEP 2 — After user responds, call set_output:**
- set_output("next_action", "new_topic") — if starting fresh
- set_output("next_action", "revise") — if changes needed
- set_output("feedback", "what to change") — only if revising
""",
tools=[],
)
__all__ = ["intake_node", "worker_node", "review_node"]
```
## agent.py
```python
"""Agent graph construction for My Agent."""
from pathlib import Path
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult
from framework.graph.checkpoint_config import CheckpointConfig
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
from .config import default_config, metadata
from .nodes import intake_node, worker_node, review_node
# Goal definition
goal = Goal(
id="my-agent-goal",
name="My Agent Goal",
description="What this agent achieves.",
success_criteria=[
SuccessCriterion(id="sc-1", description="...", metric="...", target="...", weight=0.5),
SuccessCriterion(id="sc-2", description="...", metric="...", target="...", weight=0.5),
],
constraints=[
Constraint(id="c-1", description="...", constraint_type="hard", category="quality"),
],
)
# Node list
nodes = [intake_node, worker_node, review_node]
# Edge definitions
edges = [
EdgeSpec(id="intake-to-worker", source="intake", target="worker",
condition=EdgeCondition.ON_SUCCESS, priority=1),
EdgeSpec(id="worker-to-review", source="worker", target="review",
condition=EdgeCondition.ON_SUCCESS, priority=1),
# Feedback loop
EdgeSpec(id="review-to-worker", source="review", target="worker",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() == 'revise'", priority=2),
# Loop back for new topic
EdgeSpec(id="review-to-intake", source="review", target="intake",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() == 'new_topic'", priority=1),
]
# Graph configuration
entry_node = "intake"
entry_points = {"start": "intake"}
pause_nodes = []
terminal_nodes = [] # Forever-alive
# Module-level vars read by AgentRunner.load()
conversation_mode = "continuous"
identity_prompt = "You are a helpful agent."
loop_config = {"max_iterations": 100, "max_tool_calls_per_turn": 20, "max_history_tokens": 32000}
class MyAgent:
def __init__(self, config=None):
self.config = config or default_config
self.goal = goal
self.nodes = nodes
self.edges = edges
self.entry_node = entry_node
self.entry_points = entry_points
self.pause_nodes = pause_nodes
self.terminal_nodes = terminal_nodes
self._graph = None
self._agent_runtime = None
self._tool_registry = None
self._storage_path = None
def _build_graph(self):
return GraphSpec(
id="my-agent-graph",
goal_id=self.goal.id,
version="1.0.0",
entry_node=self.entry_node,
entry_points=self.entry_points,
terminal_nodes=self.terminal_nodes,
pause_nodes=self.pause_nodes,
nodes=self.nodes,
edges=self.edges,
default_model=self.config.model,
max_tokens=self.config.max_tokens,
loop_config=loop_config,
conversation_mode=conversation_mode,
identity_prompt=identity_prompt,
)
def _setup(self, mock_mode=False):
self._storage_path = Path.home() / ".hive" / "agents" / "my_agent"
self._storage_path.mkdir(parents=True, exist_ok=True)
self._tool_registry = ToolRegistry()
mcp_config = Path(__file__).parent / "mcp_servers.json"
if mcp_config.exists():
self._tool_registry.load_mcp_config(mcp_config)
llm = None
if not mock_mode:
llm = LiteLLMProvider(model=self.config.model, api_key=self.config.api_key, api_base=self.config.api_base)
tools = list(self._tool_registry.get_tools().values())
tool_executor = self._tool_registry.get_executor()
self._graph = self._build_graph()
self._agent_runtime = create_agent_runtime(
graph=self._graph, goal=self.goal, storage_path=self._storage_path,
entry_points=[EntryPointSpec(id="default", name="Default", entry_node=self.entry_node,
trigger_type="manual", isolation_level="shared")],
llm=llm, tools=tools, tool_executor=tool_executor,
checkpoint_config=CheckpointConfig(enabled=True, checkpoint_on_node_complete=True,
checkpoint_max_age_days=7, async_checkpoint=True),
)
async def start(self, mock_mode=False):
if self._agent_runtime is None:
self._setup(mock_mode=mock_mode)
if not self._agent_runtime.is_running:
await self._agent_runtime.start()
async def stop(self):
if self._agent_runtime and self._agent_runtime.is_running:
await self._agent_runtime.stop()
self._agent_runtime = None
async def trigger_and_wait(self, entry_point="default", input_data=None, timeout=None, session_state=None):
if self._agent_runtime is None:
raise RuntimeError("Agent not started. Call start() first.")
return await self._agent_runtime.trigger_and_wait(
entry_point_id=entry_point, input_data=input_data or {}, session_state=session_state)
async def run(self, context, mock_mode=False, session_state=None):
await self.start(mock_mode=mock_mode)
try:
result = await self.trigger_and_wait("default", context, session_state=session_state)
return result or ExecutionResult(success=False, error="Execution timeout")
finally:
await self.stop()
def info(self):
return {
"name": metadata.name, "version": metadata.version, "description": metadata.description,
"goal": {"name": self.goal.name, "description": self.goal.description},
"nodes": [n.id for n in self.nodes], "edges": [e.id for e in self.edges],
"entry_node": self.entry_node, "entry_points": self.entry_points,
"terminal_nodes": self.terminal_nodes,
"client_facing_nodes": [n.id for n in self.nodes if n.client_facing],
}
def validate(self):
errors, warnings = [], []
node_ids = {n.id for n in self.nodes}
for e in self.edges:
if e.source not in node_ids: errors.append(f"Edge {e.id}: source '{e.source}' not found")
if e.target not in node_ids: errors.append(f"Edge {e.id}: target '{e.target}' not found")
if self.entry_node not in node_ids: errors.append(f"Entry node '{self.entry_node}' not found")
for t in self.terminal_nodes:
if t not in node_ids: errors.append(f"Terminal node '{t}' not found")
for ep_id, nid in self.entry_points.items():
if nid not in node_ids: errors.append(f"Entry point '{ep_id}' references unknown node '{nid}'")
return {"valid": len(errors) == 0, "errors": errors, "warnings": warnings}
default_agent = MyAgent()
```
## agent.py — Async Entry Points Variant
When an agent needs timers, webhooks, or event-driven triggers, add
`async_entry_points` and optionally `runtime_config` as module-level variables.
These are IN ADDITION to the standard variables above.
```python
# Additional imports for async entry points
from framework.graph.edge import GraphSpec, AsyncEntryPointSpec
from framework.runtime.agent_runtime import (
AgentRuntime, AgentRuntimeConfig, create_agent_runtime,
)
# ... (goal, nodes, edges, entry_node, entry_points, etc. as above) ...
# Async entry points — event-driven triggers
async_entry_points = [
# Timer with cron: daily at 9am
AsyncEntryPointSpec(
id="daily-check",
name="Daily Check",
entry_node="process-node",
trigger_type="timer",
trigger_config={"cron": "0 9 * * *"},
isolation_level="shared",
max_concurrent=1,
),
# Timer with fixed interval: every 20 minutes
AsyncEntryPointSpec(
id="scheduled-check",
name="Scheduled Check",
entry_node="process-node",
trigger_type="timer",
trigger_config={"interval_minutes": 20, "run_immediately": False},
isolation_level="shared",
max_concurrent=1,
),
# Event: reacts to webhook events
AsyncEntryPointSpec(
id="webhook-event",
name="Webhook Event Handler",
entry_node="process-node",
trigger_type="event",
trigger_config={"event_types": ["webhook_received"]},
isolation_level="shared",
max_concurrent=10,
),
]
# Webhook server config (only needed if using webhooks)
runtime_config = AgentRuntimeConfig(
webhook_host="127.0.0.1",
webhook_port=8080,
webhook_routes=[
{
"source_id": "my-source",
"path": "/webhooks/my-source",
"methods": ["POST"],
},
],
)
```
**Key rules for async entry points:**
- `async_entry_points` is a list of `AsyncEntryPointSpec` (NOT `EntryPointSpec`)
- `runtime_config` is `AgentRuntimeConfig` (NOT `RuntimeConfig` from config.py)
- Valid trigger_types: `timer`, `event`, `webhook`, `manual`, `api`
- Valid isolation_levels: `isolated`, `shared`, `synchronized`
- Timer trigger_config (cron): `{"cron": "0 9 * * *"}` — standard 5-field cron expression
- Timer trigger_config (interval): `{"interval_minutes": float, "run_immediately": bool}`
- Event trigger_config: `{"event_types": ["webhook_received"], "filter_stream": "...", "filter_node": "..."}`
- Use `isolation_level="shared"` for async entry points that need to read
the primary session's memory (e.g., user-configured rules)
- The `_build_graph()` method passes `async_entry_points` to GraphSpec
- Reference: `exports/gmail_inbox_guardian/agent.py`
## __init__.py
**CRITICAL:** The runner imports the package (`__init__.py`) and reads ALL module-level
variables via `getattr()`. Every variable defined in `agent.py` that the runner needs
MUST be re-exported here. Missing exports cause silent failures (variables default to
`None` or `{}`), leading to "must define goal, nodes, edges" errors or graph validation
failures like "node X is unreachable".
```python
"""My Agent — description."""
from .agent import (
MyAgent,
default_agent,
goal,
nodes,
edges,
entry_node,
entry_points,
pause_nodes,
terminal_nodes,
conversation_mode,
identity_prompt,
loop_config,
)
from .config import default_config, metadata
__all__ = [
"MyAgent",
"default_agent",
"goal",
"nodes",
"edges",
"entry_node",
"entry_points",
"pause_nodes",
"terminal_nodes",
"conversation_mode",
"identity_prompt",
"loop_config",
"default_config",
"metadata",
]
```
**If the agent uses async entry points**, also import and export:
```python
from .agent import (
...,
async_entry_points,
runtime_config, # Only if using webhooks
)
__all__ = [
...,
"async_entry_points",
"runtime_config",
]
```
## __main__.py
```python
"""CLI entry point for My Agent."""
import asyncio, json, logging, sys
import click
from .agent import default_agent, MyAgent
def setup_logging(verbose=False, debug=False):
if debug: level, fmt = logging.DEBUG, "%(asctime)s %(name)s: %(message)s"
elif verbose: level, fmt = logging.INFO, "%(message)s"
else: level, fmt = logging.WARNING, "%(levelname)s: %(message)s"
logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
@click.group()
@click.version_option(version="1.0.0")
def cli():
"""My Agent — description."""
pass
@cli.command()
@click.option("--topic", "-t", required=True)
@click.option("--mock", is_flag=True)
@click.option("--verbose", "-v", is_flag=True)
def run(topic, mock, verbose):
"""Execute the agent."""
setup_logging(verbose=verbose)
result = asyncio.run(default_agent.run({"topic": topic}, mock_mode=mock))
click.echo(json.dumps({"success": result.success, "output": result.output}, indent=2, default=str))
sys.exit(0 if result.success else 1)
@cli.command()
@click.option("--mock", is_flag=True)
def tui(mock):
"""Launch TUI dashboard."""
from pathlib import Path
from framework.tui.app import AdenTUI
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
async def run_tui():
agent = MyAgent()
agent._tool_registry = ToolRegistry()
storage = Path.home() / ".hive" / "agents" / "my_agent"
storage.mkdir(parents=True, exist_ok=True)
mcp_cfg = Path(__file__).parent / "mcp_servers.json"
if mcp_cfg.exists(): agent._tool_registry.load_mcp_config(mcp_cfg)
llm = None if mock else LiteLLMProvider(model=agent.config.model, api_key=agent.config.api_key, api_base=agent.config.api_base)
runtime = create_agent_runtime(
graph=agent._build_graph(), goal=agent.goal, storage_path=storage,
entry_points=[EntryPointSpec(id="start", name="Start", entry_node="intake", trigger_type="manual", isolation_level="isolated")],
llm=llm, tools=list(agent._tool_registry.get_tools().values()), tool_executor=agent._tool_registry.get_executor())
await runtime.start()
try:
app = AdenTUI(runtime)
await app.run_async()
finally:
await runtime.stop()
asyncio.run(run_tui())
@cli.command()
def info():
"""Show agent info."""
data = default_agent.info()
click.echo(f"Agent: {data['name']}\nVersion: {data['version']}\nDescription: {data['description']}")
click.echo(f"Nodes: {', '.join(data['nodes'])}\nClient-facing: {', '.join(data['client_facing_nodes'])}")
@cli.command()
def validate():
"""Validate agent structure."""
v = default_agent.validate()
if v["valid"]: click.echo("Agent is valid")
else:
click.echo("Errors:")
for e in v["errors"]: click.echo(f" {e}")
sys.exit(0 if v["valid"] else 1)
if __name__ == "__main__":
cli()
```
## mcp_servers.json
```json
{
"hive-tools": {
"transport": "stdio",
"command": "uv",
"args": ["run", "python", "mcp_server.py", "--stdio"],
"cwd": "../../tools",
"description": "Hive tools MCP server"
}
}
```
**CRITICAL FORMAT RULES:**
- NO `"mcpServers"` wrapper (flat dict, not nested)
- `cwd` MUST be `"../../tools"` (relative from `exports/AGENT_NAME/` to `tools/`)
- `command` MUST be `"uv"` with `"args": ["run", "python", ...]` (NOT bare `"python"`)
## tests/conftest.py
```python
"""Test fixtures."""
import sys
from pathlib import Path
import pytest
import pytest_asyncio
_repo_root = Path(__file__).resolve().parents[3]
for _p in ["exports", "core"]:
_path = str(_repo_root / _p)
if _path not in sys.path:
sys.path.insert(0, _path)
AGENT_PATH = str(Path(__file__).resolve().parents[1])
@pytest.fixture(scope="session")
def mock_mode():
return True
@pytest_asyncio.fixture(scope="session")
async def runner(tmp_path_factory, mock_mode):
from framework.runner.runner import AgentRunner
storage = tmp_path_factory.mktemp("agent_storage")
r = AgentRunner.load(AGENT_PATH, mock_mode=mock_mode, storage_path=storage)
r._setup()
yield r
await r.cleanup_async()
```
## entry_points Format
MUST be: `{"start": "first-node-id"}`
NOT: `{"first-node-id": ["input_keys"]}` (WRONG)
NOT: `{"first-node-id"}` (WRONG — this is a set)
@@ -0,0 +1,433 @@
# Hive Agent Framework — Condensed Reference
## Architecture
Agents are Python packages in `exports/`:
```
exports/my_agent/
├── __init__.py # MUST re-export ALL module-level vars from agent.py
├── __main__.py # CLI (run, tui, info, validate, shell)
├── agent.py # Graph construction (goal, edges, agent class)
├── config.py # Runtime config
├── nodes/__init__.py # Node definitions (NodeSpec)
├── mcp_servers.json # MCP tool server config
└── tests/ # pytest tests
```
## Agent Loading Contract
`AgentRunner.load()` imports the package (`__init__.py`) and reads these
module-level variables via `getattr()`:
| Variable | Required | Default if missing | Consequence |
|----------|----------|--------------------|-------------|
| `goal` | YES | `None` | **FATAL** — "must define goal, nodes, edges" |
| `nodes` | YES | `None` | **FATAL** — same error |
| `edges` | YES | `None` | **FATAL** — same error |
| `entry_node` | no | `nodes[0].id` | Probably wrong node |
| `entry_points` | no | `{}` | **Nodes unreachable** — validation fails |
| `terminal_nodes` | no | `[]` | OK for forever-alive |
| `pause_nodes` | no | `[]` | OK |
| `conversation_mode` | no | not passed | Isolated mode (no context carryover) |
| `identity_prompt` | no | not passed | No agent-level identity |
| `loop_config` | no | `{}` | No iteration limits |
| `async_entry_points` | no | `[]` | No async triggers (timers, webhooks, events) |
| `runtime_config` | no | `None` | No webhook server |
**CRITICAL:** `__init__.py` MUST import and re-export ALL of these from
`agent.py`. Missing exports silently fall back to defaults, causing
hard-to-debug failures.
**Why `default_agent.validate()` is NOT sufficient:**
`validate()` checks the agent CLASS's internal graph (self.nodes, self.edges).
These are always correct because the constructor references agent.py's module
vars directly. But `AgentRunner.load()` reads from the PACKAGE (`__init__.py`),
not the class. So `validate()` passes while `AgentRunner.load()` fails.
Always test with `AgentRunner.load("exports/{name}")` — this is the same
code path the TUI and `hive run` use.
## Goal
Defines success criteria and constraints:
```python
goal = Goal(
id="kebab-case-id",
name="Display Name",
description="What the agent does",
success_criteria=[
SuccessCriterion(id="sc-id", description="...", metric="...", target="...", weight=0.25),
],
constraints=[
Constraint(id="c-id", description="...", constraint_type="hard", category="quality"),
],
)
```
- 3-5 success criteria, weights sum to 1.0
- 1-5 constraints (hard/soft, categories: quality, accuracy, interaction, functional)
## NodeSpec Fields
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| id | str | required | kebab-case identifier |
| name | str | required | Display name |
| description | str | required | What the node does |
| node_type | str | required | Always `"event_loop"` |
| input_keys | list[str] | required | Memory keys this node reads |
| output_keys | list[str] | required | Memory keys this node writes via set_output |
| system_prompt | str | "" | LLM instructions |
| tools | list[str] | [] | Tool names from MCP servers |
| client_facing | bool | False | If True, streams to user and blocks for input |
| nullable_output_keys | list[str] | [] | Keys that may remain unset |
| max_node_visits | int | 0 | 0=unlimited (default); >1 for one-shot feedback loops |
| max_retries | int | 3 | Retries on failure |
| success_criteria | str | "" | Natural language for judge evaluation |
## EdgeSpec Fields
| Field | Type | Description |
|-------|------|-------------|
| id | str | kebab-case identifier |
| source | str | Source node ID |
| target | str | Target node ID |
| condition | EdgeCondition | ON_SUCCESS, ON_FAILURE, ALWAYS, CONDITIONAL |
| condition_expr | str | Python expression evaluated against memory (for CONDITIONAL) |
| priority | int | Positive=forward (evaluated first), negative=feedback (loop-back) |
## Key Patterns
### STEP 1/STEP 2 (Client-Facing Nodes)
```
**STEP 1 — Respond to the user (text only, NO tool calls):**
[Present information, ask questions]
**STEP 2 — After the user responds, call set_output:**
- set_output("key", "value based on user response")
```
This prevents premature set_output before user interaction.
### Fewer, Richer Nodes (CRITICAL)
**Hard limit: 2-4 nodes for most agents.** Never exceed 5 unless the user
explicitly requests a complex multi-phase pipeline.
Each node boundary serializes outputs to shared memory and **destroys** all
in-context information: tool call results, intermediate reasoning, conversation
history. A research node that searches, fetches, and analyzes in ONE node keeps
all source material in its conversation context. Split across 3 nodes, each
downstream node only sees the serialized summary string.
**Decision framework — merge unless ANY of these apply:**
1. **Client-facing boundary** — Autonomous and client-facing work MUST be
separate nodes (different interaction models)
2. **Disjoint tool sets** — If tools are fundamentally different (e.g., web
search vs database), separate nodes make sense
3. **Parallel execution** — Fan-out branches must be separate nodes
**Red flags that you have too many nodes:**
- A node with 0 tools (pure LLM reasoning) → merge into predecessor/successor
- A node that sets only 1 trivial output → collapse into predecessor
- Multiple consecutive autonomous nodes → combine into one rich node
- A "report" node that presents analysis → merge into the client-facing node
- A "confirm" or "schedule" node that doesn't call any external service → remove
**Typical agent structure (3 nodes):**
```
intake (client-facing) ←→ process (autonomous) ←→ review (client-facing)
```
Or for simpler agents, just 2 nodes:
```
interact (client-facing) → process (autonomous) → interact (loop)
```
### nullable_output_keys
For inputs that only arrive on certain edges:
```python
research_node = NodeSpec(
input_keys=["brief", "feedback"],
nullable_output_keys=["feedback"], # Only present on feedback edge
max_node_visits=3,
)
```
### Mutually Exclusive Outputs
For routing decisions:
```python
review_node = NodeSpec(
output_keys=["approved", "feedback"],
nullable_output_keys=["approved", "feedback"], # Node sets one or the other
)
```
### Forever-Alive Pattern
`terminal_nodes=[]` — every node has outgoing edges, graph loops until user exits.
Use `conversation_mode="continuous"` to preserve context across transitions.
### set_output
- Synthetic tool injected by framework
- Call separately from real tool calls (separate turn)
- `set_output("key", "value")` stores to shared memory
## Edge Conditions
| Condition | When |
|-----------|------|
| ON_SUCCESS | Node completed successfully |
| ON_FAILURE | Node failed |
| ALWAYS | Unconditional |
| CONDITIONAL | condition_expr evaluates to True against memory |
condition_expr examples:
- `"needs_more_research == True"`
- `"str(next_action).lower() == 'new_agent'"`
- `"feedback is not None"`
## Graph Lifecycle
| Pattern | terminal_nodes | When |
|---------|---------------|------|
| **Forever-alive** | `[]` | **DEFAULT for all agents** |
| Linear | `["last-node"]` | Only if user explicitly requests one-shot/batch |
**Forever-alive is the default.** Always use `terminal_nodes=[]`.
The framework default for `max_node_visits` is 0 (unbounded), so
nodes work correctly in forever-alive loops without explicit override.
Only set `max_node_visits > 0` in one-shot agents with feedback loops.
Every node must have at least one outgoing edge — no dead ends. The
user exits by closing the TUI. Only use terminal nodes if the user
explicitly asks for a batch/one-shot agent that runs once and exits.
## Continuous Conversation Mode
`conversation_mode` has ONLY two valid states:
- `"continuous"` — recommended for interactive agents
- Omit entirely — isolated per-node conversations (each node starts fresh)
**INVALID values** (do NOT use): `"client_facing"`, `"interactive"`,
`"adaptive"`, `"shared"`. These do not exist in the framework.
When `conversation_mode="continuous"`:
- Same conversation thread carries across node transitions
- Layered system prompts: identity (agent-level) + narrative + focus (per-node)
- Transition markers inserted at boundaries
- Compaction happens opportunistically at phase transitions
## loop_config
Only three valid keys:
```python
loop_config = {
"max_iterations": 100, # Max LLM turns per node visit
"max_tool_calls_per_turn": 20, # Max tool calls per LLM response
"max_history_tokens": 32000, # Triggers conversation compaction
}
```
**INVALID keys** (do NOT use): `"strategy"`, `"mode"`, `"timeout"`,
`"temperature"`. These are silently ignored or cause errors.
## Data Tools (Spillover)
For large data that exceeds context:
- `save_data(filename, data)` — Write to session data dir
- `load_data(filename, offset, limit)` — Read with pagination
- `list_data_files()` — List files
- `serve_file_to_user(filename, label)` — Clickable file:// URI
`data_dir` is auto-injected by framework — LLM never sees it.
## Fan-Out / Fan-In
Multiple ON_SUCCESS edges from same source → parallel execution via asyncio.gather().
- Parallel nodes must have disjoint output_keys
- Only one branch may have client_facing nodes
- Fan-in node gets all outputs in shared memory
## Judge System
- **Implicit** (default): ACCEPTs when LLM finishes with no tool calls and all required outputs set
- **SchemaJudge**: Validates against Pydantic model
- **Custom**: Implement `evaluate(context) -> JudgeVerdict`
Judge is the SOLE acceptance mechanism — no ad-hoc framework gating.
## Async Entry Points (Webhooks, Timers, Events)
For agents that need to react to external events (incoming emails, scheduled
tasks, API calls), use `AsyncEntryPointSpec` and optionally `AgentRuntimeConfig`.
### Imports
```python
from framework.graph.edge import GraphSpec, AsyncEntryPointSpec
from framework.runtime.agent_runtime import AgentRuntime, AgentRuntimeConfig, create_agent_runtime
```
Note: `AsyncEntryPointSpec` is in `framework.graph.edge` (the graph/declarative layer).
`AgentRuntimeConfig` is in `framework.runtime.agent_runtime` (the runtime layer).
### AsyncEntryPointSpec Fields
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| id | str | required | Unique identifier |
| name | str | required | Human-readable name |
| entry_node | str | required | Node ID to start execution from |
| trigger_type | str | `"manual"` | `webhook`, `api`, `timer`, `event`, `manual` |
| trigger_config | dict | `{}` | Trigger-specific config (see below) |
| isolation_level | str | `"shared"` | `isolated`, `shared`, `synchronized` |
| priority | int | `0` | Execution priority (higher = more priority) |
| max_concurrent | int | `10` | Max concurrent executions |
### Trigger Types
**timer** — Fires on a schedule. Two modes: cron expressions or fixed interval.
Cron (preferred for precise scheduling):
```python
AsyncEntryPointSpec(
id="daily-digest",
name="Daily Digest",
entry_node="check-node",
trigger_type="timer",
trigger_config={"cron": "0 9 * * *"}, # daily at 9am
isolation_level="shared",
max_concurrent=1,
)
```
- `cron` (str) — standard cron expression (5 fields: min hour dom month dow)
- Examples: `"0 9 * * *"` (daily 9am), `"0 9 * * MON-FRI"` (weekdays 9am), `"*/30 * * * *"` (every 30 min)
Fixed interval (simpler, for polling-style tasks):
```python
AsyncEntryPointSpec(
id="scheduled-check",
name="Scheduled Check",
entry_node="check-node",
trigger_type="timer",
trigger_config={"interval_minutes": 20, "run_immediately": False},
isolation_level="shared",
max_concurrent=1,
)
```
- `interval_minutes` (float) — how often to fire
- `run_immediately` (bool, default False) — fire once on startup
**event** — Subscribes to EventBus (e.g., webhook events):
```python
AsyncEntryPointSpec(
id="email-event",
name="Email Event Handler",
entry_node="process-emails",
trigger_type="event",
trigger_config={"event_types": ["webhook_received"]},
isolation_level="shared",
max_concurrent=10,
)
```
- `event_types` (list[str]) — EventType values to subscribe to
- `filter_stream` (str, optional) — only receive from this stream
- `filter_node` (str, optional) — only receive from this node
**webhook** — HTTP endpoint (requires AgentRuntimeConfig):
The webhook server publishes `WEBHOOK_RECEIVED` events on the EventBus.
An `event` trigger type with `event_types: ["webhook_received"]` subscribes
to those events. The flow is:
```
HTTP POST /webhooks/gmail → WebhookServer → EventBus (WEBHOOK_RECEIVED)
→ event entry point → triggers graph execution from entry_node
```
**manual** — Triggered programmatically via `runtime.trigger()`.
### Isolation Levels
| Level | Meaning |
|-------|---------|
| `isolated` | Private state per execution |
| `shared` | Eventual consistency — async executions can read primary session memory |
| `synchronized` | Shared with write locks (use when ordering matters) |
For most async patterns, use `shared` — the async execution reads the primary
session's memory (e.g., user-configured rules) and runs its own workflow.
### AgentRuntimeConfig (for webhook servers)
```python
from framework.runtime.agent_runtime import AgentRuntimeConfig
runtime_config = AgentRuntimeConfig(
webhook_host="127.0.0.1",
webhook_port=8080,
webhook_routes=[
{
"source_id": "gmail",
"path": "/webhooks/gmail",
"methods": ["POST"],
"secret": None, # Optional HMAC-SHA256 secret
},
],
)
```
`runtime_config` is a module-level variable read by `AgentRunner.load()`.
The runner passes it to `create_agent_runtime()`. On `runtime.start()`,
if webhook_routes is non-empty, an embedded HTTP server starts.
### Session Sharing
Timer and event triggers automatically call `_get_primary_session_state()`
before execution. This finds the active user-facing session and provides
its memory to the async execution, filtered to only the async entry node's
`input_keys`. This means the async flow can read user-configured values
(like rules, preferences) without needing separate configuration.
### Module-Level Variables
Agents with async entry points must export two additional variables:
```python
# In agent.py:
async_entry_points = [AsyncEntryPointSpec(...), ...]
runtime_config = AgentRuntimeConfig(...) # Only if using webhooks
```
Both must be re-exported from `__init__.py`:
```python
from .agent import (
..., async_entry_points, runtime_config,
)
```
### Reference Agent
See `exports/gmail_inbox_guardian/agent.py` for a complete example with:
- Primary client-facing intake node (user configures rules)
- Timer-based scheduled inbox checks (every 20 min)
- Webhook-triggered email event handling
- Shared isolation for memory access across streams
## Framework Capabilities
**Works well:** Multi-turn conversations, HITL review, tool orchestration, structured outputs, parallel execution, context management, error recovery, session persistence.
**Limitations:** LLM latency (2-10s/turn), context window limits (~128K), cost per run, rate limits, node boundaries lose context.
**Not designed for:** Sub-second responses, millions of items, real-time streaming, guaranteed determinism, offline/air-gapped.
## Tool Discovery
Do NOT rely on a static tool list — it will be outdated. Always use
`discover_mcp_tools()` to get the current tool catalog from the
hive-tools MCP server. This returns full schemas including parameter
names, types, and descriptions.
```
discover_mcp_tools() # default: hive-tools
discover_mcp_tools("exports/my_agent/mcp_servers.json") # specific agent
```
Common tool categories (verify via discover_mcp_tools):
- **Web**: search, scrape, PDF
- **Data**: save/load/append/list data files, serve to user
- **File**: view, write, replace, diff, list, grep
- **Communication**: email, gmail, slack, telegram
- **CRM**: hubspot, apollo, calcom
- **GitHub**: stargazers, user profiles, repos
- **Vision**: image analysis
- **Time**: current time
@@ -0,0 +1,31 @@
"""Test fixtures for Hive Coder agent."""
import sys
from pathlib import Path
import pytest
import pytest_asyncio
_repo_root = Path(__file__).resolve().parents[3]
for _p in ["exports", "core"]:
_path = str(_repo_root / _p)
if _path not in sys.path:
sys.path.insert(0, _path)
AGENT_PATH = str(Path(__file__).resolve().parents[1])
@pytest.fixture(scope="session")
def mock_mode():
return True
@pytest_asyncio.fixture(scope="session")
async def runner(tmp_path_factory, mock_mode):
from framework.runner.runner import AgentRunner
storage = tmp_path_factory.mktemp("agent_storage")
r = AgentRunner.load(AGENT_PATH, mock_mode=mock_mode, storage_path=storage)
r._setup()
yield r
await r.cleanup_async()
+9 -11
View File
@@ -245,20 +245,14 @@ class GraphBuilder:
warnings.append(f"Node '{node.id}' should have a description")
# Type-specific validation
if node.node_type == "llm_tool_use":
if not node.tools:
errors.append(f"LLM tool node '{node.id}' must specify tools")
if not node.system_prompt:
warnings.append(f"LLM node '{node.id}' should have a system_prompt")
if node.node_type == "event_loop":
if node.tools and not node.system_prompt:
warnings.append(f"Event loop node '{node.id}' should have a system_prompt")
if node.node_type == "router":
if not node.routes:
errors.append(f"Router node '{node.id}' must specify routes")
if node.node_type == "function":
if not node.function:
errors.append(f"Function node '{node.id}' must specify function name")
# Check input/output keys
if not node.input_keys:
suggestions.append(f"Consider specifying input_keys for '{node.id}'")
@@ -400,9 +394,13 @@ class GraphBuilder:
if not terminal_candidates and self.session.nodes:
warnings.append("No terminal nodes found (all nodes have outgoing edges)")
# Check reachability
# Check reachability from ALL entry candidates (not just the first one).
# Agents with async entry points have multiple nodes with no incoming
# edges (e.g., a primary entry node and an event-driven entry node).
if entry_candidates and self.session.nodes:
reachable = self._compute_reachable(entry_candidates[0])
reachable = set()
for candidate in entry_candidates:
reachable |= self._compute_reachable(candidate)
unreachable = [n.id for n in self.session.nodes if n.id not in reachable]
if unreachable:
errors.append(f"Unreachable nodes: {unreachable}")
+10 -3
View File
@@ -11,9 +11,9 @@ Usage:
Testing commands:
hive test-run <agent_path> --goal <goal_id>
hive test-debug <goal_id> <test_id>
hive test-list <goal_id>
hive test-stats <goal_id>
hive test-debug <agent_path> <test_name>
hive test-list <agent_path>
hive test-stats <agent_path>
"""
import argparse
@@ -56,6 +56,13 @@ def _configure_paths():
if (project_root / "core").is_dir() and core_str not in sys.path:
sys.path.insert(0, core_str)
# Add core/framework/agents/ so framework agents are importable as top-level packages
framework_agents_dir = project_root / "core" / "framework" / "agents"
if framework_agents_dir.is_dir():
fa_str = str(framework_agents_dir)
if fa_str not in sys.path:
sys.path.insert(0, fa_str)
def main():
_configure_paths()
+55 -3
View File
@@ -6,6 +6,7 @@ helper functions.
"""
import json
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
@@ -24,7 +25,7 @@ def get_hive_config() -> dict[str, Any]:
if not HIVE_CONFIG_FILE.exists():
return {}
try:
with open(HIVE_CONFIG_FILE) as f:
with open(HIVE_CONFIG_FILE, encoding="utf-8-sig") as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return {}
@@ -48,6 +49,56 @@ def get_max_tokens() -> int:
return get_hive_config().get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
def get_api_key() -> str | None:
"""Return the API key, supporting env var, Claude Code subscription, and ZAI Code.
Priority:
1. Claude Code subscription (``use_claude_code_subscription: true``)
reads the OAuth token from ``~/.claude/.credentials.json``.
2. Environment variable named in ``api_key_env_var``.
"""
llm = get_hive_config().get("llm", {})
# Claude Code subscription: read OAuth token directly
if llm.get("use_claude_code_subscription"):
try:
from framework.runner.runner import get_claude_code_token
token = get_claude_code_token()
if token:
return token
except ImportError:
pass
# Standard env-var path (covers ZAI Code and all API-key providers)
api_key_env_var = llm.get("api_key_env_var")
if api_key_env_var:
return os.environ.get(api_key_env_var)
return None
def get_api_base() -> str | None:
"""Return the api_base URL for OpenAI-compatible endpoints, if configured."""
return get_hive_config().get("llm", {}).get("api_base")
def get_llm_extra_kwargs() -> dict[str, Any]:
"""Return extra kwargs for LiteLLMProvider (e.g. OAuth headers).
When ``use_claude_code_subscription`` is enabled, returns
``extra_headers`` with the OAuth Bearer token so that litellm's
built-in Anthropic OAuth handler adds the required beta headers.
"""
llm = get_hive_config().get("llm", {})
if llm.get("use_claude_code_subscription"):
api_key = get_api_key()
if api_key:
return {
"extra_headers": {"authorization": f"Bearer {api_key}"},
}
return {}
# ---------------------------------------------------------------------------
# RuntimeConfig shared across agent templates
# ---------------------------------------------------------------------------
@@ -60,5 +111,6 @@ class RuntimeConfig:
model: str = field(default_factory=get_preferred_model)
temperature: float = 0.7
max_tokens: int = field(default_factory=get_max_tokens)
api_key: str | None = None
api_base: str | None = None
api_key: str | None = field(default_factory=get_api_key)
api_base: str | None = field(default_factory=get_api_base)
extra_kwargs: dict[str, Any] = field(default_factory=get_llm_extra_kwargs)
+17
View File
@@ -59,6 +59,13 @@ from .provider import (
CredentialProvider,
StaticProvider,
)
from .setup import (
CredentialSetupSession,
MissingCredential,
SetupResult,
detect_missing_credentials_from_nodes,
run_credential_setup_cli,
)
from .storage import (
CompositeStorage,
CredentialStorage,
@@ -68,6 +75,7 @@ from .storage import (
)
from .store import CredentialStore
from .template import TemplateResolver
from .validation import ensure_credential_key_env, validate_agent_credentials
# Aden sync components (lazy import to avoid httpx dependency when not needed)
# Usage: from core.framework.credentials.aden import AdenSyncProvider
@@ -111,6 +119,15 @@ __all__ = [
"CredentialRefreshError",
"CredentialValidationError",
"CredentialDecryptionError",
# Validation
"ensure_credential_key_env",
"validate_agent_credentials",
# Interactive setup
"CredentialSetupSession",
"MissingCredential",
"SetupResult",
"detect_missing_credentials_from_nodes",
"run_credential_setup_cli",
# Aden sync (optional - requires httpx)
"AdenSyncProvider",
"AdenCredentialClient",
+155 -202
View File
@@ -1,29 +1,31 @@
"""
Aden Credential Client.
HTTP client for communicating with the Aden authentication server.
The Aden server handles OAuth2 authorization flows and token management.
This client fetches tokens and delegates refresh operations to Aden.
HTTP client for the Aden authentication server.
Aden holds all OAuth secrets; agents receive only short-lived access tokens.
API (all endpoints authenticated with Bearer {api_key}):
GET /v1/credentials list integrations
GET /v1/credentials/{integration_id} get access token (auto-refreshes)
POST /v1/credentials/{integration_id}/refresh force refresh
GET /v1/credentials/{integration_id}/validate check validity
Integration IDs are base64-encoded hashes assigned by the Aden platform
(e.g. "Z29vZ2xlOlRpbW90aHk6MTYwNjc6MTM2ODQ"), NOT provider names.
Usage:
# API key loaded from ADEN_API_KEY environment variable by default
client = AdenCredentialClient(AdenClientConfig(
base_url="https://api.adenhq.com",
))
# Or explicitly provide the API key
client = AdenCredentialClient(AdenClientConfig(
base_url="https://api.adenhq.com",
api_key="your-api-key",
))
# List what's connected
for info in client.list_integrations():
print(f"{info.provider}/{info.alias}: {info.status}")
# Fetch a credential
response = client.get_credential("hubspot")
if response:
print(f"Token expires at: {response.expires_at}")
# Request a refresh
refreshed = client.request_refresh("hubspot")
# Get an access token
cred = client.get_credential(info.integration_id)
print(cred.access_token)
"""
from __future__ import annotations
@@ -88,8 +90,7 @@ class AdenClientConfig:
"""Base URL of the Aden server (e.g., 'https://api.adenhq.com')."""
api_key: str | None = None
"""Agent's API key for authenticating with Aden.
If not provided, loaded from ADEN_API_KEY environment variable."""
"""Agent API key. Loaded from ADEN_API_KEY env var if not provided."""
tenant_id: str | None = None
"""Optional tenant ID for multi-tenant deployments."""
@@ -104,7 +105,6 @@ class AdenClientConfig:
"""Base delay between retries in seconds (exponential backoff)."""
def __post_init__(self) -> None:
"""Load API key from environment if not provided."""
if self.api_key is None:
self.api_key = os.environ.get("ADEN_API_KEY")
if not self.api_key:
@@ -115,86 +115,124 @@ class AdenClientConfig:
@dataclass
class AdenCredentialResponse:
"""Response from Aden server containing credential data."""
class AdenIntegrationInfo:
"""An integration from GET /v1/credentials.
Example response item::
{
"integration_id": "Z29vZ2xlOlRpbW90aHk6MTYwNjc6MTM2ODQ",
"provider": "google",
"alias": "Timothy",
"status": "active",
"email": "timothy@acho.io",
"expires_at": "2026-02-20T21:46:04.863Z"
}
"""
integration_id: str
"""Unique identifier for the integration (e.g., 'hubspot')."""
"""Base64-encoded hash ID assigned by Aden."""
integration_type: str
"""Type of integration (e.g., 'hubspot', 'github', 'slack')."""
provider: str
"""Provider type (e.g. "google", "slack", "hubspot")."""
access_token: str
"""The access token for API calls."""
alias: str
"""User-set alias on the Aden platform."""
token_type: str = "Bearer"
"""Token type (usually 'Bearer')."""
status: str
"""Status: "active", "expired", "requires_reauth"."""
email: str = ""
"""Email associated with this connection."""
expires_at: datetime | None = None
"""When the access token expires (UTC)."""
"""When the current access token expires."""
scopes: list[str] = field(default_factory=list)
"""OAuth2 scopes granted to this token."""
metadata: dict[str, Any] = field(default_factory=dict)
"""Additional integration-specific metadata."""
# Backward compat — old code reads integration_type
@property
def integration_type(self) -> str:
return self.provider
@classmethod
def from_dict(
cls, data: dict[str, Any], integration_id: str | None = None
) -> AdenCredentialResponse:
"""Create from API response dictionary or normalized credential dict."""
def from_dict(cls, data: dict[str, Any]) -> AdenIntegrationInfo:
expires_at = None
if data.get("expires_at"):
expires_at = datetime.fromisoformat(data["expires_at"].replace("Z", "+00:00"))
resolved_integration_id = (
integration_id
or data.get("integration_id")
or data.get("alias")
or data.get("provider", "")
)
resolved_integration_type = data.get("integration_type") or data.get("provider", "")
metadata = data.get("metadata")
if metadata is None and data.get("email"):
metadata = {"email": data.get("email")}
if metadata is None:
metadata = {}
return cls(
integration_id=resolved_integration_id,
integration_type=resolved_integration_type,
access_token=data["access_token"],
token_type=data.get("token_type", "Bearer"),
integration_id=data.get("integration_id", ""),
provider=data.get("provider", ""),
alias=data.get("alias", ""),
status=data.get("status", "unknown"),
email=data.get("email", ""),
expires_at=expires_at,
scopes=data.get("scopes", []),
metadata=metadata,
)
@dataclass
class AdenIntegrationInfo:
"""Information about an available integration."""
class AdenCredentialResponse:
"""Response from GET /v1/credentials/{integration_id}.
Example::
{
"access_token": "ya29.a0AfH6SM...",
"token_type": "Bearer",
"expires_at": "2026-02-20T12:00:00.000Z",
"provider": "google",
"alias": "Timothy",
"email": "timothy@acho.io"
}
"""
integration_id: str
integration_type: str
status: str # "active", "requires_reauth", "expired"
"""The integration_id used in the request."""
access_token: str
"""Short-lived access token for API calls."""
token_type: str = "Bearer"
expires_at: datetime | None = None
provider: str = ""
"""Provider type (e.g. "google")."""
alias: str = ""
"""User-set alias."""
email: str = ""
"""Email associated with this connection."""
scopes: list[str] = field(default_factory=list)
metadata: dict[str, Any] = field(default_factory=dict)
# Backward compat
@property
def integration_type(self) -> str:
return self.provider
@classmethod
def from_dict(cls, data: dict[str, Any]) -> AdenIntegrationInfo:
"""Create from API response dictionary."""
def from_dict(cls, data: dict[str, Any], integration_id: str = "") -> AdenCredentialResponse:
expires_at = None
if data.get("expires_at"):
expires_at = datetime.fromisoformat(data["expires_at"].replace("Z", "+00:00"))
# Build metadata from email if present
metadata = data.get("metadata") or {}
if not metadata and data.get("email"):
metadata = {"email": data["email"]}
return cls(
integration_id=data["integration_id"],
integration_type=data.get("provider", data["integration_id"]),
status=data.get("status", "unknown"),
integration_id=integration_id or data.get("integration_id", ""),
access_token=data["access_token"],
token_type=data.get("token_type", "Bearer"),
expires_at=expires_at,
provider=data.get("provider", ""),
alias=data.get("alias", ""),
email=data.get("email", ""),
scopes=data.get("scopes", []),
metadata=metadata,
)
@@ -202,56 +240,33 @@ class AdenCredentialClient:
"""
HTTP client for Aden credential server.
Handles communication with the Aden authentication server,
including fetching credentials, requesting refreshes, and
reporting usage statistics.
The client automatically handles:
- Retries with exponential backoff for transient failures
- Proper error classification (auth, not found, rate limit, etc.)
- Request headers for authentication and tenant isolation
Usage:
# API key loaded from ADEN_API_KEY environment variable
config = AdenClientConfig(
client = AdenCredentialClient(AdenClientConfig(
base_url="https://api.adenhq.com",
)
))
client = AdenCredentialClient(config)
# List integrations
for info in client.list_integrations():
print(f"{info.provider}/{info.alias}: {info.status}")
# Fetch a credential
cred = client.get_credential("hubspot")
if cred:
headers = {"Authorization": f"Bearer {cred.access_token}"}
# Get access token (uses base64 integration_id, NOT provider name)
cred = client.get_credential(info.integration_id)
headers = {"Authorization": f"Bearer {cred.access_token}"}
# List all integrations
integrations = client.list_integrations()
for info in integrations:
print(f"{info.integration_id}: {info.status}")
# Clean up
client.close()
"""
def __init__(self, config: AdenClientConfig):
"""
Initialize the Aden client.
Args:
config: Client configuration including base URL and API key.
"""
self.config = config
self._client: httpx.Client | None = None
def _get_client(self) -> httpx.Client:
"""Get or create the HTTP client."""
if self._client is None:
headers = {
"Authorization": f"Bearer {self.config.api_key}",
"Content-Type": "application/json",
"User-Agent": "hive-credential-store/1.0",
}
if self.config.tenant_id:
headers["X-Tenant-ID"] = self.config.tenant_id
@@ -260,7 +275,6 @@ class AdenCredentialClient:
timeout=self.config.timeout,
headers=headers,
)
return self._client
def _request_with_retry(
@@ -277,10 +291,13 @@ class AdenCredentialClient:
try:
response = client.request(method, path, **kwargs)
# Handle specific error codes
if response.status_code == 401:
raise AdenAuthenticationError("Agent API key is invalid or revoked")
if response.status_code == 403:
data = response.json()
raise AdenClientError(data.get("message", "Forbidden"))
if response.status_code == 404:
raise AdenNotFoundError(f"Integration not found: {path}")
@@ -293,14 +310,15 @@ class AdenCredentialClient:
if response.status_code == 400:
data = response.json()
if data.get("error") == "refresh_failed":
msg = data.get("message", "Bad request")
if data.get("error") == "refresh_failed" or "refresh" in msg.lower():
raise AdenRefreshError(
data.get("message", "Token refresh failed"),
msg,
requires_reauthorization=data.get("requires_reauthorization", False),
reauthorization_url=data.get("reauthorization_url"),
)
raise AdenClientError(f"Bad request: {msg}")
# Success or other error
response.raise_for_status()
return response
@@ -321,30 +339,40 @@ class AdenCredentialClient:
AdenRefreshError,
AdenRateLimitError,
):
# Don't retry these errors
raise
# Should not reach here, but just in case
raise AdenClientError(
f"Request failed after {self.config.retry_attempts} attempts"
) from last_error
def get_credential(self, integration_id: str) -> AdenCredentialResponse | None:
def list_integrations(self) -> list[AdenIntegrationInfo]:
"""
Fetch the current credential for an integration.
List all integrations for this agent's team.
The Aden server may refresh the token internally if it's expired
before returning it.
Args:
integration_id: The integration identifier (e.g., 'hubspot').
GET /v1/credentials {"integrations": [...]}
Returns:
Credential response with access token, or None if not found.
List of AdenIntegrationInfo with integration_id, provider,
alias, status, email, expires_at.
"""
response = self._request_with_retry("GET", "/v1/credentials")
data = response.json()
return [AdenIntegrationInfo.from_dict(item) for item in data.get("integrations", [])]
Raises:
AdenAuthenticationError: If API key is invalid.
AdenClientError: For connection failures.
# Alias
list_connections = list_integrations
def get_credential(self, integration_id: str) -> AdenCredentialResponse | None:
"""
Get access token for an integration. Auto-refreshes if near expiry.
GET /v1/credentials/{integration_id}
Args:
integration_id: Base64 hash ID from list_integrations().
Returns:
AdenCredentialResponse with access_token, or None if not found.
"""
try:
response = self._request_with_retry("GET", f"/v1/credentials/{integration_id}")
@@ -355,100 +383,34 @@ class AdenCredentialClient:
def request_refresh(self, integration_id: str) -> AdenCredentialResponse:
"""
Request the Aden server to refresh the token.
Force refresh the access token.
Use this when the local store detects an expired or near-expiry token.
The Aden server handles the actual OAuth2 refresh token flow.
POST /v1/credentials/{integration_id}/refresh
Args:
integration_id: The integration identifier.
integration_id: Base64 hash ID.
Returns:
Credential response with new access token.
Raises:
AdenRefreshError: If refresh fails (may require re-authorization).
AdenNotFoundError: If integration not found.
AdenAuthenticationError: If API key is invalid.
AdenRateLimitError: If rate limited.
AdenCredentialResponse with new access_token.
"""
response = self._request_with_retry("POST", f"/v1/credentials/{integration_id}/refresh")
data = response.json()
return AdenCredentialResponse.from_dict(data, integration_id=integration_id)
def list_integrations(self) -> list[AdenIntegrationInfo]:
"""
List all integrations available for this agent/tenant.
Returns:
List of integration info objects.
Raises:
AdenAuthenticationError: If API key is invalid.
AdenClientError: For connection failures.
"""
response = self._request_with_retry("GET", "/v1/credentials")
data = response.json()
return [AdenIntegrationInfo.from_dict(item) for item in data.get("integrations", [])]
def validate_token(self, integration_id: str) -> dict[str, Any]:
"""
Check if a token is still valid without fetching it.
Check if an integration's OAuth connection is valid.
Args:
integration_id: The integration identifier.
GET /v1/credentials/{integration_id}/validate
Returns:
Dict with 'valid' bool and optional 'expires_at', 'reason',
'requires_reauthorization', 'reauthorization_url'.
Raises:
AdenNotFoundError: If integration not found.
AdenAuthenticationError: If API key is invalid.
{"valid": bool, "status": str, "expires_at": str, "error": str|null}
"""
response = self._request_with_retry("GET", f"/v1/credentials/{integration_id}/validate")
return response.json()
def report_usage(
self,
integration_id: str,
operation: str,
status: str = "success",
metadata: dict[str, Any] | None = None,
) -> None:
"""
Report credential usage statistics to Aden.
This is optional and used for analytics/billing.
Args:
integration_id: The integration identifier.
operation: Operation name (e.g., 'api_call').
status: Operation status ('success', 'error').
metadata: Additional operation metadata.
"""
try:
self._request_with_retry(
"POST",
f"/v1/credentials/{integration_id}/usage",
json={
"operation": operation,
"status": status,
"timestamp": datetime.utcnow().isoformat() + "Z",
"metadata": metadata or {},
},
)
except Exception as e:
# Usage reporting is best-effort, don't fail on errors
logger.warning(f"Failed to report usage for '{integration_id}': {e}")
def health_check(self) -> dict[str, Any]:
"""
Check Aden server health and connectivity.
Returns:
Dict with 'status', 'version', 'timestamp', and optionally 'error'.
"""
"""Check Aden server health."""
try:
client = self._get_client()
response = client.get("/health")
@@ -456,26 +418,17 @@ class AdenCredentialClient:
data = response.json()
data["latency_ms"] = response.elapsed.total_seconds() * 1000
return data
return {
"status": "degraded",
"error": f"Unexpected status code: {response.status_code}",
}
return {"status": "degraded", "error": f"HTTP {response.status_code}"}
except Exception as e:
return {
"status": "unhealthy",
"error": str(e),
}
return {"status": "unhealthy", "error": str(e)}
def close(self) -> None:
"""Close the HTTP client and release resources."""
if self._client:
self._client.close()
self._client = None
def __enter__(self) -> AdenCredentialClient:
"""Context manager entry."""
return self
def __exit__(self, *args: Any) -> None:
"""Context manager exit."""
self.close()
+35 -7
View File
@@ -282,8 +282,8 @@ class AdenSyncProvider(CredentialProvider):
"""
Sync all credentials from Aden server to local store.
Fetches the list of available integrations from Aden and
populates the local credential store with current tokens.
Calls GET /v1/credentials to list integrations, then fetches
access tokens for each active one.
Args:
store: The credential store to populate.
@@ -298,9 +298,7 @@ class AdenSyncProvider(CredentialProvider):
for info in integrations:
if info.status != "active":
logger.warning(
f"Skipping integration '{info.integration_id}': status={info.status}"
)
logger.warning(f"Skipping connection '{info.alias}': status={info.status}")
continue
try:
@@ -308,9 +306,9 @@ class AdenSyncProvider(CredentialProvider):
if cred:
store.save_credential(cred)
synced += 1
logger.info(f"Synced credential '{info.integration_id}' from Aden")
logger.info(f"Synced credential '{info.alias}' from Aden")
except Exception as e:
logger.warning(f"Failed to sync '{info.integration_id}': {e}")
logger.warning(f"Failed to sync '{info.alias}': {e}")
except AdenClientError as e:
logger.error(f"Failed to list integrations from Aden: {e}")
@@ -373,6 +371,21 @@ class AdenSyncProvider(CredentialProvider):
value=SecretStr(aden_response.integration_type),
)
# Store alias (user-set name from Aden platform)
if aden_response.alias:
credential.keys["_alias"] = CredentialKey(
name="_alias",
value=SecretStr(aden_response.alias),
)
# Persist Aden metadata as identity keys
for meta_key, meta_value in (aden_response.metadata or {}).items():
if meta_value and isinstance(meta_value, str):
credential.keys[f"_identity_{meta_key}"] = CredentialKey(
name=f"_identity_{meta_key}",
value=SecretStr(meta_value),
)
# Update timestamps
credential.last_refreshed = datetime.now(UTC)
credential.provider_id = self.provider_id
@@ -400,12 +413,27 @@ class AdenSyncProvider(CredentialProvider):
),
}
# Store alias (user-set name from Aden platform)
if aden_response.alias:
keys["_alias"] = CredentialKey(
name="_alias",
value=SecretStr(aden_response.alias),
)
if aden_response.scopes:
keys["scope"] = CredentialKey(
name="scope",
value=SecretStr(" ".join(aden_response.scopes)),
)
# Persist Aden metadata as identity keys
for meta_key, meta_value in (aden_response.metadata or {}).items():
if meta_value and isinstance(meta_value, str):
keys[f"_identity_{meta_key}"] = CredentialKey(
name=f"_identity_{meta_key}",
value=SecretStr(meta_value),
)
return CredentialObject(
id=aden_response.integration_id,
credential_type=CredentialType.OAUTH2,
+72 -24
View File
@@ -114,8 +114,10 @@ class AdenCachedStorage(CredentialStorage):
self._cache_ttl = timedelta(seconds=cache_ttl_seconds)
self._prefer_local = prefer_local
self._cache_timestamps: dict[str, datetime] = {}
# Index: provider name (e.g., "hubspot") -> credential hash ID
self._provider_index: dict[str, str] = {}
# Index: provider name (e.g., "hubspot") -> list of credential hash IDs
self._provider_index: dict[str, list[str]] = {}
# Index: "provider:alias" -> credential hash ID (for alias-based routing)
self._alias_index: dict[str, str] = {}
def save(self, credential: CredentialObject) -> None:
"""
@@ -160,14 +162,16 @@ class AdenCachedStorage(CredentialStorage):
CredentialObject if found, None otherwise.
"""
# Check provider index first — Aden-synced credentials take priority
resolved_id = self._provider_index.get(credential_id)
if resolved_id and resolved_id != credential_id:
result = self._load_by_id(resolved_id)
if result is not None:
logger.info(
f"Loaded credential '{credential_id}' via provider index (id='{resolved_id}')"
)
return result
resolved_ids = self._provider_index.get(credential_id)
if resolved_ids:
for rid in resolved_ids:
if rid != credential_id:
result = self._load_by_id(rid)
if result is not None:
logger.info(
f"Loaded credential '{credential_id}' via provider index (id='{rid}')"
)
return result
# Direct lookup (exact credential_id match)
return self._load_by_id(credential_id)
@@ -208,6 +212,22 @@ class AdenCachedStorage(CredentialStorage):
# Return local credential if it exists (may be None)
return local_cred
def load_all_for_provider(self, provider_name: str) -> list[CredentialObject]:
"""Load all credentials for a given provider type.
Args:
provider_name: Provider name (e.g. "google", "slack").
Returns:
List of CredentialObjects for all accounts of this provider.
"""
results: list[CredentialObject] = []
for cid in self._provider_index.get(provider_name, []):
cred = self._load_by_id(cid)
if cred:
results.append(cred)
return results
def delete(self, credential_id: str) -> bool:
"""
Delete credential from local cache.
@@ -246,9 +266,11 @@ class AdenCachedStorage(CredentialStorage):
if self._local.exists(credential_id):
return True
# Check provider index
resolved_id = self._provider_index.get(credential_id)
if resolved_id and resolved_id != credential_id:
return self._local.exists(resolved_id)
resolved_ids = self._provider_index.get(credential_id)
if resolved_ids:
for rid in resolved_ids:
if rid != credential_id and self._local.exists(rid):
return True
return False
def _is_cache_fresh(self, credential_id: str) -> bool:
@@ -285,13 +307,15 @@ class AdenCachedStorage(CredentialStorage):
def _index_provider(self, credential: CredentialObject) -> None:
"""
Index a credential by its provider/integration type.
Index a credential by its provider/integration type and alias.
Aden credentials carry an ``_integration_type`` key whose value is
the provider name (e.g., ``hubspot``). This method maps that
provider name to the credential's hash ID so that subsequent
``load("hubspot")`` calls resolve to the correct credential.
Also indexes by ``_alias`` for alias-based multi-account routing.
Args:
credential: The credential to index.
"""
@@ -300,19 +324,45 @@ class AdenCachedStorage(CredentialStorage):
return
provider_name = integration_type_key.value.get_secret_value()
if provider_name:
self._provider_index[provider_name] = credential.id
if provider_name not in self._provider_index:
self._provider_index[provider_name] = []
if credential.id not in self._provider_index[provider_name]:
self._provider_index[provider_name].append(credential.id)
logger.debug(f"Indexed provider '{provider_name}' -> '{credential.id}'")
# Index by alias for multi-account routing
alias_key = credential.keys.get("_alias")
if alias_key:
alias = alias_key.value.get_secret_value()
if alias:
self._alias_index[f"{provider_name}:{alias}"] = credential.id
def load_by_alias(self, provider_name: str, alias: str) -> CredentialObject | None:
"""Load a credential by provider name and alias.
Args:
provider_name: Provider type (e.g. "google", "slack").
alias: User-set alias from the Aden platform.
Returns:
CredentialObject if found, None otherwise.
"""
cred_id = self._alias_index.get(f"{provider_name}:{alias}")
if cred_id:
return self._load_by_id(cred_id)
return None
def rebuild_provider_index(self) -> int:
"""
Rebuild the provider index from all locally cached credentials.
Rebuild the provider and alias indexes from all locally cached credentials.
Useful after loading from disk when the in-memory index is empty.
Useful after loading from disk when the in-memory indexes are empty.
Returns:
Number of provider mappings indexed.
"""
self._provider_index.clear()
self._alias_index.clear()
indexed = 0
for cred_id in self._local.list_all():
cred = self._local.load(cred_id)
@@ -328,8 +378,8 @@ class AdenCachedStorage(CredentialStorage):
"""
Sync all credentials from Aden server to local cache.
Fetches the list of available integrations from Aden and
updates the local cache with current tokens.
Calls GET /v1/credentials to list active integrations,
then fetches tokens for each.
Returns:
Number of credentials synced.
@@ -341,9 +391,7 @@ class AdenCachedStorage(CredentialStorage):
for info in integrations:
if info.status != "active":
logger.warning(
f"Skipping integration '{info.integration_id}': status={info.status}"
)
logger.warning(f"Skipping integration '{info.alias}': status={info.status}")
continue
try:
@@ -351,9 +399,9 @@ class AdenCachedStorage(CredentialStorage):
if cred:
self.save(cred)
synced += 1
logger.info(f"Synced credential '{info.integration_id}' from Aden")
logger.info(f"Synced credential '{info.alias}' from Aden")
except Exception as e:
logger.warning(f"Failed to sync '{info.integration_id}': {e}")
logger.warning(f"Failed to sync '{info.alias}': {e}")
except Exception as e:
logger.error(f"Failed to list integrations from Aden: {e}")
@@ -61,11 +61,13 @@ def mock_client(aden_config):
def aden_response():
"""Create a sample Aden credential response."""
return AdenCredentialResponse(
integration_id="hubspot",
integration_type="hubspot",
integration_id="aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1",
access_token="test-access-token",
token_type="Bearer",
expires_at=datetime.now(UTC) + timedelta(hours=1),
provider="hubspot",
alias="My HubSpot",
email="test@example.com",
scopes=["crm.objects.contacts.read", "crm.objects.contacts.write"],
metadata={"portal_id": "12345"},
)
@@ -108,18 +110,20 @@ class TestAdenCredentialResponse:
"""Tests for AdenCredentialResponse dataclass."""
def test_from_dict_basic(self):
"""Test creating response from dict."""
"""Test creating response from dict (real get-token format)."""
data = {
"integration_id": "github",
"integration_type": "github",
"access_token": "ghp_xxxxx",
"token_type": "Bearer",
"provider": "github",
"alias": "Work",
}
response = AdenCredentialResponse.from_dict(data)
response = AdenCredentialResponse.from_dict(data, integration_id="Z2l0aHViOldvcms6MTIzNDU")
assert response.integration_id == "github"
assert response.integration_type == "github"
assert response.integration_id == "Z2l0aHViOldvcms6MTIzNDU"
assert response.access_token == "ghp_xxxxx"
assert response.provider == "github"
assert response.integration_type == "github" # backward compat property
assert response.token_type == "Bearer"
assert response.expires_at is None
assert response.scopes == []
@@ -127,19 +131,23 @@ class TestAdenCredentialResponse:
def test_from_dict_full(self):
"""Test creating response with all fields."""
data = {
"integration_id": "hubspot",
"integration_type": "hubspot",
"access_token": "token123",
"token_type": "Bearer",
"expires_at": "2026-01-28T15:30:00Z",
"provider": "hubspot",
"alias": "My HubSpot",
"email": "test@example.com",
"scopes": ["read", "write"],
"metadata": {"key": "value"},
}
response = AdenCredentialResponse.from_dict(data)
response = AdenCredentialResponse.from_dict(data, integration_id="aHVic3BvdDp0ZXN0")
assert response.integration_id == "hubspot"
assert response.integration_id == "aHVic3BvdDp0ZXN0"
assert response.access_token == "token123"
assert response.provider == "hubspot"
assert response.alias == "My HubSpot"
assert response.email == "test@example.com"
assert response.expires_at is not None
assert response.scopes == ["read", "write"]
assert response.metadata == {"key": "value"}
@@ -149,21 +157,44 @@ class TestAdenIntegrationInfo:
"""Tests for AdenIntegrationInfo dataclass."""
def test_from_dict(self):
"""Test creating integration info from dict."""
"""Test creating integration info from real API format."""
data = {
"integration_id": "slack",
"integration_type": "slack",
"integration_id": "c2xhY2s6V29yayBTbGFjazoxMjM0NQ",
"provider": "slack",
"alias": "Work Slack",
"status": "active",
"expires_at": "2026-02-01T00:00:00Z",
"email": "user@example.com",
"expires_at": "2026-02-20T21:46:04.863Z",
}
info = AdenIntegrationInfo.from_dict(data)
assert info.integration_id == "slack"
assert info.integration_type == "slack"
assert info.integration_id == "c2xhY2s6V29yayBTbGFjazoxMjM0NQ"
assert info.provider == "slack"
assert info.integration_type == "slack" # backward compat property
assert info.alias == "Work Slack"
assert info.email == "user@example.com"
assert info.status == "active"
assert info.expires_at is not None
def test_from_dict_minimal(self):
"""Test creating integration info with minimal fields."""
data = {
"integration_id": "Z29vZ2xlOlRpbW90aHk6MTYwNjc",
"provider": "google",
"alias": "Timothy",
"status": "requires_reauth",
}
info = AdenIntegrationInfo.from_dict(data)
assert info.integration_id == "Z29vZ2xlOlRpbW90aHk6MTYwNjc"
assert info.provider == "google"
assert info.alias == "Timothy"
assert info.status == "requires_reauth"
assert info.email == ""
assert info.expires_at is None
# =============================================================================
# AdenSyncProvider Tests
@@ -220,10 +251,11 @@ class TestAdenSyncProvider:
def test_refresh_success(self, provider, mock_client, aden_response):
"""Test successful credential refresh."""
hash_id = "aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1"
mock_client.request_refresh.return_value = aden_response
cred = CredentialObject(
id="hubspot",
id=hash_id,
credential_type=CredentialType.OAUTH2,
keys={
"access_token": CredentialKey(
@@ -239,7 +271,7 @@ class TestAdenSyncProvider:
assert refreshed.keys["access_token"].value.get_secret_value() == "test-access-token"
assert refreshed.keys["_aden_managed"].value.get_secret_value() == "true"
assert refreshed.last_refreshed is not None
mock_client.request_refresh.assert_called_once_with("hubspot")
mock_client.request_refresh.assert_called_once_with(hash_id)
def test_refresh_requires_reauth(self, provider, mock_client):
"""Test refresh that requires re-authorization."""
@@ -339,12 +371,13 @@ class TestAdenSyncProvider:
def test_fetch_from_aden(self, provider, mock_client, aden_response):
"""Test fetching credential from Aden."""
hash_id = "aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1"
mock_client.get_credential.return_value = aden_response
cred = provider.fetch_from_aden("hubspot")
cred = provider.fetch_from_aden(hash_id)
assert cred is not None
assert cred.id == "hubspot"
assert cred.id == hash_id
assert cred.keys["access_token"].value.get_secret_value() == "test-access-token"
assert cred.auto_refresh is True
@@ -360,13 +393,15 @@ class TestAdenSyncProvider:
"""Test syncing all credentials."""
mock_client.list_integrations.return_value = [
AdenIntegrationInfo(
integration_id="hubspot",
integration_type="hubspot",
integration_id="aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1",
provider="hubspot",
alias="My HubSpot",
status="active",
),
AdenIntegrationInfo(
integration_id="github",
integration_type="github",
integration_id="Z2l0aHViOnRlc3Q6OTk5",
provider="github",
alias="Work GitHub",
status="requires_reauth", # Should be skipped
),
]
@@ -376,7 +411,7 @@ class TestAdenSyncProvider:
synced = provider.sync_all(store)
assert synced == 1 # Only active one was synced
assert store.get_credential("hubspot") is not None
assert store.get_credential("aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1") is not None
def test_validate_via_aden(self, provider, mock_client):
"""Test validation via Aden introspection."""
@@ -608,7 +643,7 @@ class TestAdenCachedStorage:
cached_storage.save(cred)
assert cached_storage._provider_index["hubspot"] == "aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1"
assert cached_storage._provider_index["hubspot"] == ["aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1"]
def test_load_by_provider_name(self, cached_storage):
"""Test load resolves provider name to hash-based credential ID."""
@@ -711,8 +746,8 @@ class TestAdenCachedStorage:
indexed = cached_storage.rebuild_provider_index()
assert indexed == 2
assert cached_storage._provider_index["hubspot"] == "hash_hub"
assert cached_storage._provider_index["slack"] == "hash_slack"
assert cached_storage._provider_index["hubspot"] == ["hash_hub"]
assert cached_storage._provider_index["slack"] == ["hash_slack"]
def test_save_without_integration_type_no_index(self, cached_storage):
"""Test save does not index credentials without _integration_type key."""
@@ -743,19 +778,23 @@ class TestAdenIntegration:
def test_full_workflow(self, mock_client, aden_response):
"""Test full workflow: sync, get, refresh."""
hash_id = "aHVic3BvdDp0ZXN0OjEzNjExOjExNTI1"
# Setup
mock_client.list_integrations.return_value = [
AdenIntegrationInfo(
integration_id="hubspot",
integration_type="hubspot",
integration_id=hash_id,
provider="hubspot",
alias="My HubSpot",
status="active",
),
]
mock_client.get_credential.return_value = aden_response
mock_client.request_refresh.return_value = AdenCredentialResponse(
integration_id="hubspot",
integration_type="hubspot",
integration_id=hash_id,
access_token="refreshed-token",
provider="hubspot",
alias="My HubSpot",
expires_at=datetime.now(UTC) + timedelta(hours=2),
scopes=[],
)
@@ -772,8 +811,8 @@ class TestAdenIntegration:
synced = provider.sync_all(store)
assert synced == 1
# Get credential
cred = store.get_credential("hubspot")
# Get credential by hash ID
cred = store.get_credential(hash_id)
assert cred is not None
assert cred.keys["access_token"].value.get_secret_value() == "test-access-token"
+52
View File
@@ -70,6 +70,29 @@ class CredentialKey(BaseModel):
return self.value.get_secret_value()
class CredentialIdentity(BaseModel):
"""Identity information for a credential (whose account is this?)."""
email: str | None = None
username: str | None = None
workspace: str | None = None
account_id: str | None = None
@property
def label(self) -> str:
"""Best human-readable identifier for display."""
return self.email or self.username or self.workspace or self.account_id or "unknown"
@property
def is_known(self) -> bool:
"""Whether any identity field is populated."""
return bool(self.email or self.username or self.workspace or self.account_id)
def to_dict(self) -> dict[str, str]:
"""Return only non-None identity fields."""
return {k: v for k, v in self.model_dump().items() if v is not None}
class CredentialObject(BaseModel):
"""
A credential object containing one or more keys.
@@ -202,6 +225,35 @@ class CredentialObject(BaseModel):
return None
@property
def identity(self) -> CredentialIdentity:
"""Extract identity from ``_identity_*`` keys in the vault."""
fields = {}
for key_name, key_obj in self.keys.items():
if key_name.startswith("_identity_"):
field_name = key_name[len("_identity_") :]
if field_name in CredentialIdentity.model_fields:
fields[field_name] = key_obj.value.get_secret_value()
return CredentialIdentity(**fields)
@property
def provider_type(self) -> str | None:
"""Return the integration/provider type (e.g. 'google', 'slack')."""
key = self.keys.get("_integration_type")
return key.value.get_secret_value() if key else None
@property
def alias(self) -> str | None:
"""Return the user-set alias from the Aden platform."""
key = self.keys.get("_alias")
return key.value.get_secret_value() if key else None
def set_identity(self, **fields: str) -> None:
"""Persist identity fields as ``_identity_*`` keys."""
for field_name, value in fields.items():
if value:
self.set_key(f"_identity_{field_name}", value)
class CredentialUsageSpec(BaseModel):
"""
+744
View File
@@ -0,0 +1,744 @@
"""
Interactive credential setup for CLI applications.
Provides a modular, reusable credential setup flow that can be triggered
when validate_agent_credentials() fails. Works with both TUI and headless CLIs.
Usage:
from framework.credentials.setup import CredentialSetupSession
# From agent path
session = CredentialSetupSession.from_agent_path("exports/my-agent")
result = session.run_interactive()
# From nodes directly
session = CredentialSetupSession.from_nodes(nodes)
result = session.run_interactive()
# With custom I/O (for integration with other UIs)
session = CredentialSetupSession(
missing=missing_creds,
input_fn=my_input,
print_fn=my_print,
)
"""
from __future__ import annotations
import getpass
import json
import os
import sys
from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from framework.graph import NodeSpec
# ANSI colors for terminal output
class Colors:
RED = "\033[0;31m"
GREEN = "\033[0;32m"
YELLOW = "\033[1;33m"
BLUE = "\033[0;34m"
CYAN = "\033[0;36m"
BOLD = "\033[1m"
DIM = "\033[2m"
NC = "\033[0m" # No Color
@classmethod
def disable(cls):
"""Disable colors (for non-TTY output)."""
cls.RED = cls.GREEN = cls.YELLOW = cls.BLUE = ""
cls.CYAN = cls.BOLD = cls.DIM = cls.NC = ""
@dataclass
class MissingCredential:
"""A credential that needs to be configured."""
credential_name: str
"""Internal credential name (e.g., 'brave_search')"""
env_var: str
"""Environment variable name (e.g., 'BRAVE_SEARCH_API_KEY')"""
description: str
"""Human-readable description"""
help_url: str
"""URL where user can obtain credential"""
api_key_instructions: str
"""Step-by-step instructions for getting API key"""
tools: list[str] = field(default_factory=list)
"""Tools that require this credential"""
node_types: list[str] = field(default_factory=list)
"""Node types that require this credential"""
aden_supported: bool = False
"""Whether Aden OAuth flow is supported"""
direct_api_key_supported: bool = True
"""Whether direct API key entry is supported"""
credential_id: str = ""
"""Credential store ID"""
credential_key: str = "api_key"
"""Key name within the credential"""
@dataclass
class SetupResult:
"""Result of credential setup session."""
success: bool
"""Whether all required credentials were configured"""
configured: list[str] = field(default_factory=list)
"""Credentials that were successfully set up"""
skipped: list[str] = field(default_factory=list)
"""Credentials user chose to skip"""
errors: list[str] = field(default_factory=list)
"""Any errors encountered"""
class CredentialSetupSession:
"""
Interactive credential setup session.
Can be used by any CLI (runner, coding agent, etc.) to guide users
through credential configuration when validation fails.
Example:
from framework.credentials.setup import CredentialSetupSession
from framework.credentials.models import CredentialError
try:
validate_agent_credentials(nodes)
except CredentialError:
session = CredentialSetupSession.from_nodes(nodes)
result = session.run_interactive()
if result.success:
# Retry - credentials are now configured
validate_agent_credentials(nodes)
"""
def __init__(
self,
missing: list[MissingCredential],
input_fn: Callable[[str], str] | None = None,
print_fn: Callable[[str], None] | None = None,
password_fn: Callable[[str], str] | None = None,
):
"""
Initialize the setup session.
Args:
missing: List of credentials that need setup
input_fn: Custom input function (default: built-in input)
print_fn: Custom print function (default: built-in print)
password_fn: Custom password input function (default: getpass.getpass)
"""
self.missing = missing
self.input_fn = input_fn or input
self.print_fn = print_fn or print
self.password_fn = password_fn or getpass.getpass
# Disable colors if not a TTY
if not sys.stdout.isatty():
Colors.disable()
@classmethod
def from_nodes(cls, nodes: list[NodeSpec]) -> CredentialSetupSession:
"""Create a setup session by detecting missing credentials from nodes."""
missing = detect_missing_credentials_from_nodes(nodes)
return cls(missing)
@classmethod
def from_agent_path(cls, agent_path: str | Path) -> CredentialSetupSession:
"""Create a setup session for an agent by path."""
agent_path = Path(agent_path)
# Load agent to get nodes
agent_json = agent_path / "agent.json"
agent_py = agent_path / "agent.py"
nodes = []
if agent_py.exists():
# Python-based agent
nodes = _load_nodes_from_python_agent(agent_path)
elif agent_json.exists():
# JSON-based agent
nodes = _load_nodes_from_json_agent(agent_json)
missing = detect_missing_credentials_from_nodes(nodes)
return cls(missing)
def run_interactive(self) -> SetupResult:
"""Run the interactive setup flow."""
configured: list[str] = []
skipped: list[str] = []
errors: list[str] = []
if not self.missing:
self._print(f"\n{Colors.GREEN}✓ All credentials are already configured!{Colors.NC}\n")
return SetupResult(success=True)
self._print_header()
# Ensure HIVE_CREDENTIAL_KEY is set before storing anything
if not self._ensure_credential_key():
return SetupResult(
success=False,
errors=["Failed to initialize credential store encryption key"],
)
for cred in self.missing:
try:
result = self._setup_single_credential(cred)
if result:
configured.append(cred.credential_name)
else:
skipped.append(cred.credential_name)
except KeyboardInterrupt:
self._print(f"\n{Colors.YELLOW}Setup interrupted.{Colors.NC}")
skipped.append(cred.credential_name)
break
except Exception as e:
errors.append(f"{cred.credential_name}: {e}")
self._print_summary(configured, skipped, errors)
return SetupResult(
success=len(errors) == 0 and len(skipped) == 0,
configured=configured,
skipped=skipped,
errors=errors,
)
def _print(self, msg: str) -> None:
"""Print a message."""
self.print_fn(msg)
def _input(self, prompt: str) -> str:
"""Get input from user."""
return self.input_fn(prompt)
def _print_header(self) -> None:
"""Print the setup header."""
self._print("")
self._print(f"{Colors.YELLOW}{'=' * 60}{Colors.NC}")
self._print(f"{Colors.BOLD} CREDENTIAL SETUP{Colors.NC}")
self._print(f"{Colors.YELLOW}{'=' * 60}{Colors.NC}")
self._print("")
self._print(f" {len(self.missing)} credential(s) need to be configured:")
for cred in self.missing:
affected = cred.tools or cred.node_types
self._print(f"{cred.env_var} ({', '.join(affected)})")
self._print("")
def _ensure_credential_key(self) -> bool:
"""Ensure HIVE_CREDENTIAL_KEY is available for encrypted storage."""
if os.environ.get("HIVE_CREDENTIAL_KEY"):
return True
# Try to load from shell config
try:
from aden_tools.credentials.shell_config import check_env_var_in_shell_config
found, value = check_env_var_in_shell_config("HIVE_CREDENTIAL_KEY")
if found and value:
os.environ["HIVE_CREDENTIAL_KEY"] = value
return True
except ImportError:
pass
# Generate a new key
self._print(f"{Colors.YELLOW}Initializing credential store...{Colors.NC}")
try:
from cryptography.fernet import Fernet
generated_key = Fernet.generate_key().decode()
os.environ["HIVE_CREDENTIAL_KEY"] = generated_key
# Save to shell config
self._save_key_to_shell_config(generated_key)
return True
except Exception as e:
self._print(f"{Colors.RED}Failed to initialize credential store: {e}{Colors.NC}")
return False
def _save_key_to_shell_config(self, key: str) -> None:
"""Save HIVE_CREDENTIAL_KEY to shell config."""
try:
from aden_tools.credentials.shell_config import (
add_env_var_to_shell_config,
)
success, config_path = add_env_var_to_shell_config(
"HIVE_CREDENTIAL_KEY",
key,
comment="Encryption key for Hive credential store",
)
if success:
self._print(f"{Colors.GREEN}✓ Encryption key saved to {config_path}{Colors.NC}")
except Exception:
# Fallback: just tell the user
self._print("\n")
self._print(
f"{Colors.YELLOW}Add this to your shell config (~/.zshrc or ~/.bashrc):{Colors.NC}"
)
self._print(f' export HIVE_CREDENTIAL_KEY="{key}"')
def _setup_single_credential(self, cred: MissingCredential) -> bool:
"""Set up a single credential. Returns True if configured."""
self._print(f"\n{Colors.CYAN}{'' * 60}{Colors.NC}")
self._print(f"{Colors.BOLD}Setting up: {cred.credential_name}{Colors.NC}")
affected = cred.tools or cred.node_types
self._print(f"{Colors.DIM}Required for: {', '.join(affected)}{Colors.NC}")
if cred.description:
self._print(f"{Colors.DIM}{cred.description}{Colors.NC}")
self._print(f"{Colors.CYAN}{'' * 60}{Colors.NC}")
# Show auth options
options = self._get_auth_options(cred)
choice = self._prompt_choice(options)
if choice == "skip":
return False
elif choice == "aden":
return self._setup_via_aden(cred)
elif choice == "direct":
return self._setup_direct_api_key(cred)
return False
def _get_auth_options(self, cred: MissingCredential) -> list[tuple[str, str, str]]:
"""Get available auth options as (key, label, description) tuples."""
options = []
if cred.direct_api_key_supported:
options.append(
(
"direct",
"Enter API key directly",
"Paste your API key from the provider's dashboard",
)
)
if cred.aden_supported:
options.append(
(
"aden",
"Use Aden Platform (OAuth)",
"Secure OAuth2 flow via hive.adenhq.com",
)
)
options.append(
(
"skip",
"Skip for now",
"Configure this credential later",
)
)
return options
def _prompt_choice(self, options: list[tuple[str, str, str]]) -> str:
"""Prompt user to choose from options."""
self._print("")
for i, (key, label, desc) in enumerate(options, 1):
if key == "skip":
self._print(f" {Colors.DIM}{i}) {label}{Colors.NC}")
else:
self._print(f" {Colors.CYAN}{i}){Colors.NC} {label}")
self._print(f" {Colors.DIM}{desc}{Colors.NC}")
self._print("")
while True:
try:
choice_str = self._input(f"Select option (1-{len(options)}): ").strip()
if not choice_str:
continue
choice_num = int(choice_str)
if 1 <= choice_num <= len(options):
return options[choice_num - 1][0]
except ValueError:
pass
self._print(f"{Colors.RED}Invalid choice. Enter 1-{len(options)}{Colors.NC}")
def _setup_direct_api_key(self, cred: MissingCredential) -> bool:
"""Guide user through direct API key setup."""
# Show instructions
if cred.api_key_instructions:
self._print(f"\n{Colors.BOLD}Setup Instructions:{Colors.NC}")
self._print(cred.api_key_instructions)
if cred.help_url:
self._print(f"\n{Colors.CYAN}Get your API key at:{Colors.NC} {cred.help_url}")
# Collect key (use password input to hide the value)
self._print("")
try:
api_key = self.password_fn(f"Paste your {cred.env_var}: ").strip()
except Exception:
# Fallback to regular input if password input fails
api_key = self._input(f"Paste your {cred.env_var}: ").strip()
if not api_key:
self._print(f"{Colors.YELLOW}No value entered. Skipping.{Colors.NC}")
return False
# Health check
health_result = self._run_health_check(cred, api_key)
if health_result is not None:
if health_result["valid"]:
self._print(f"{Colors.GREEN}{health_result['message']}{Colors.NC}")
else:
self._print(f"{Colors.YELLOW}{health_result['message']}{Colors.NC}")
confirm = self._input("Continue anyway? [y/N]: ").strip().lower()
if confirm != "y":
return False
# Store credential
self._store_credential(cred, api_key)
return True
def _setup_via_aden(self, cred: MissingCredential) -> bool:
"""Guide user through Aden OAuth flow."""
self._print(f"\n{Colors.BOLD}Aden Platform Setup{Colors.NC}")
self._print("This will sync credentials from your Aden account.")
self._print("")
# Check for ADEN_API_KEY
aden_key = os.environ.get("ADEN_API_KEY")
if not aden_key:
self._print("You need an Aden API key to use this method.")
self._print(f"{Colors.CYAN}Get one at:{Colors.NC} https://hive.adenhq.com")
self._print("")
try:
aden_key = self.password_fn("Paste your ADEN_API_KEY: ").strip()
except Exception:
aden_key = self._input("Paste your ADEN_API_KEY: ").strip()
if not aden_key:
self._print(f"{Colors.YELLOW}No key entered. Skipping.{Colors.NC}")
return False
os.environ["ADEN_API_KEY"] = aden_key
# Save to shell config
try:
from aden_tools.credentials.shell_config import add_env_var_to_shell_config
add_env_var_to_shell_config(
"ADEN_API_KEY",
aden_key,
comment="Aden Platform API key",
)
except Exception:
pass
# Sync from Aden
try:
from framework.credentials import CredentialStore
store = CredentialStore.with_aden_sync(
base_url="https://api.adenhq.com",
auto_sync=True,
)
# Check if the credential was synced
cred_id = cred.credential_id or cred.credential_name
if store.is_available(cred_id):
self._print(f"{Colors.GREEN}{cred.credential_name} synced from Aden{Colors.NC}")
# Export to current session
try:
value = store.get_key(cred_id, cred.credential_key)
if value:
os.environ[cred.env_var] = value
except Exception:
pass
return True
else:
self._print(
f"{Colors.YELLOW}{cred.credential_name} not found in Aden account.{Colors.NC}"
)
self._print("Please connect this integration on https://hive.adenhq.com first.")
return False
except Exception as e:
self._print(f"{Colors.RED}Failed to sync from Aden: {e}{Colors.NC}")
return False
def _run_health_check(self, cred: MissingCredential, value: str) -> dict[str, Any] | None:
"""Run health check on credential value."""
try:
from aden_tools.credentials import check_credential_health
result = check_credential_health(cred.credential_name, value)
return {
"valid": result.valid,
"message": result.message,
"details": result.details,
}
except Exception:
# No health checker available
return None
def _store_credential(self, cred: MissingCredential, value: str) -> None:
"""Store credential in encrypted store and export to env."""
from pydantic import SecretStr
from framework.credentials import CredentialKey, CredentialObject, CredentialStore
try:
store = CredentialStore.with_encrypted_storage()
cred_id = cred.credential_id or cred.credential_name
key_name = cred.credential_key or "api_key"
cred_obj = CredentialObject(
id=cred_id,
name=cred.description or cred.credential_name,
keys={key_name: CredentialKey(name=key_name, value=SecretStr(value))},
)
store.save_credential(cred_obj)
self._print(f"{Colors.GREEN}✓ Stored in ~/.hive/credentials/{Colors.NC}")
except Exception as e:
self._print(f"{Colors.YELLOW}⚠ Could not store in credential store: {e}{Colors.NC}")
# Export to current session
os.environ[cred.env_var] = value
self._print(f"{Colors.GREEN}✓ Exported to current session{Colors.NC}")
def _print_summary(self, configured: list[str], skipped: list[str], errors: list[str]) -> None:
"""Print final summary."""
self._print("")
self._print(f"{Colors.YELLOW}{'=' * 60}{Colors.NC}")
self._print(f"{Colors.BOLD} SETUP COMPLETE{Colors.NC}")
self._print(f"{Colors.YELLOW}{'=' * 60}{Colors.NC}")
if configured:
self._print(f"\n{Colors.GREEN}✓ Configured:{Colors.NC}")
for name in configured:
self._print(f"{name}")
if skipped:
self._print(f"\n{Colors.YELLOW}⏭ Skipped:{Colors.NC}")
for name in skipped:
self._print(f"{name}")
if errors:
self._print(f"\n{Colors.RED}✗ Errors:{Colors.NC}")
for err in errors:
self._print(f"{err}")
if not skipped and not errors:
self._print(f"\n{Colors.GREEN}All credentials configured successfully!{Colors.NC}")
elif skipped:
self._print(f"\n{Colors.YELLOW}Note: Skipped credentials must be configured ")
self._print(f"before running the agent.{Colors.NC}")
self._print("")
def detect_missing_credentials_from_nodes(nodes: list) -> list[MissingCredential]:
"""
Detect missing credentials for a list of nodes.
Args:
nodes: List of NodeSpec objects
Returns:
List of MissingCredential objects for credentials that need setup
"""
try:
from aden_tools.credentials import CREDENTIAL_SPECS
from framework.credentials import CredentialStore
from framework.credentials.storage import (
CompositeStorage,
EncryptedFileStorage,
EnvVarStorage,
)
except ImportError:
return []
# Collect required tools and node types
required_tools: set[str] = set()
node_types: set[str] = set()
for node in nodes:
if hasattr(node, "tools") and node.tools:
required_tools.update(node.tools)
if hasattr(node, "node_type"):
node_types.add(node.node_type)
# Build credential store to check availability.
# Env vars take priority over encrypted store (fresh key wins over stale).
env_mapping = {
(spec.credential_id or name): spec.env_var for name, spec in CREDENTIAL_SPECS.items()
}
env_storage = EnvVarStorage(env_mapping=env_mapping)
if os.environ.get("HIVE_CREDENTIAL_KEY"):
storage = CompositeStorage(primary=env_storage, fallbacks=[EncryptedFileStorage()])
else:
storage = env_storage
store = CredentialStore(storage=storage)
# Build reverse mappings
tool_to_cred: dict[str, str] = {}
node_type_to_cred: dict[str, str] = {}
for cred_name, spec in CREDENTIAL_SPECS.items():
for tool_name in spec.tools:
tool_to_cred[tool_name] = cred_name
for nt in spec.node_types:
node_type_to_cred[nt] = cred_name
missing: list[MissingCredential] = []
checked: set[str] = set()
# Check tool credentials
for tool_name in sorted(required_tools):
cred_name = tool_to_cred.get(tool_name)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
if spec.required and not store.is_available(cred_id):
affected_tools = sorted(t for t in required_tools if t in spec.tools)
missing.append(
MissingCredential(
credential_name=cred_name,
env_var=spec.env_var,
description=spec.description,
help_url=spec.help_url,
api_key_instructions=spec.api_key_instructions,
tools=affected_tools,
aden_supported=spec.aden_supported,
direct_api_key_supported=spec.direct_api_key_supported,
credential_id=spec.credential_id,
credential_key=spec.credential_key,
)
)
# Check node type credentials
for nt in sorted(node_types):
cred_name = node_type_to_cred.get(nt)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
if spec.required and not store.is_available(cred_id):
affected_types = sorted(t for t in node_types if t in spec.node_types)
missing.append(
MissingCredential(
credential_name=cred_name,
env_var=spec.env_var,
description=spec.description,
help_url=spec.help_url,
api_key_instructions=spec.api_key_instructions,
node_types=affected_types,
aden_supported=spec.aden_supported,
direct_api_key_supported=spec.direct_api_key_supported,
credential_id=spec.credential_id,
credential_key=spec.credential_key,
)
)
return missing
def _load_nodes_from_python_agent(agent_path: Path) -> list:
"""Load nodes from a Python-based agent."""
import importlib.util
agent_py = agent_path / "agent.py"
if not agent_py.exists():
return []
try:
# Add agent path and its parent to sys.path so imports work
paths_to_add = [str(agent_path), str(agent_path.parent)]
for p in paths_to_add:
if p not in sys.path:
sys.path.insert(0, p)
spec = importlib.util.spec_from_file_location(
f"{agent_path.name}.agent",
agent_py,
submodule_search_locations=[str(agent_path)],
)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)
return getattr(module, "nodes", [])
except Exception:
return []
def _load_nodes_from_json_agent(agent_json: Path) -> list:
"""Load nodes from a JSON-based agent."""
try:
with open(agent_json) as f:
data = json.load(f)
from framework.graph import NodeSpec
nodes_data = data.get("graph", {}).get("nodes", [])
nodes = []
for node_data in nodes_data:
nodes.append(
NodeSpec(
id=node_data.get("id", ""),
name=node_data.get("name", ""),
description=node_data.get("description", ""),
node_type=node_data.get("node_type", ""),
tools=node_data.get("tools", []),
input_keys=node_data.get("input_keys", []),
output_keys=node_data.get("output_keys", []),
)
)
return nodes
except Exception:
return []
def run_credential_setup_cli(agent_path: str | Path | None = None) -> int:
"""
Standalone CLI entry point for credential setup.
Can be called from:
- `hive setup-credentials <agent>`
- After CredentialError in runner CLI
- From coding agent CLI
Args:
agent_path: Optional path to agent directory
Returns:
Exit code (0 = success, 1 = failure/skipped)
"""
if agent_path:
session = CredentialSetupSession.from_agent_path(agent_path)
else:
# No agent specified - detect from current context or show error
print("Usage: hive setup-credentials <agent_path>")
return 1
result = session.run_interactive()
return 0 if result.success else 1
+48
View File
@@ -362,6 +362,54 @@ class CredentialStore:
"""
return self._storage.list_all()
def list_accounts(self, provider_name: str) -> list[dict[str, Any]]:
"""List all accounts for a provider type with their identities.
Args:
provider_name: Provider type name (e.g. "google", "slack").
Returns:
List of dicts with credential_id, provider, alias, identity, label.
"""
if hasattr(self._storage, "load_all_for_provider"):
creds = self._storage.load_all_for_provider(provider_name)
else:
cred = self.get_credential(provider_name)
creds = [cred] if cred else []
return [
{
"credential_id": c.id,
"provider": provider_name,
"alias": c.alias,
"identity": c.identity.to_dict(),
}
for c in creds
]
def get_credential_by_alias(self, provider_name: str, alias: str) -> CredentialObject | None:
"""Find a credential by provider name and alias.
Args:
provider_name: Provider type name (e.g. "google").
alias: User-set alias from the Aden platform.
Returns:
CredentialObject if found, None otherwise.
"""
if hasattr(self._storage, "load_by_alias"):
return self._storage.load_by_alias(provider_name, alias)
# Scan fallback for storage backends without alias index
if hasattr(self._storage, "load_all_for_provider"):
for cred in self._storage.load_all_for_provider(provider_name):
if cred.alias == alias:
return cred
return None
def get_credential_by_identity(self, provider_name: str, label: str) -> CredentialObject | None:
"""Alias for get_credential_by_alias (backward compat)."""
return self.get_credential_by_alias(provider_name, label)
def is_available(self, credential_id: str) -> bool:
"""
Check if a credential is available.
+335
View File
@@ -0,0 +1,335 @@
"""Credential validation utilities.
Provides reusable credential validation for agents, whether run through
the AgentRunner or directly via GraphExecutor.
"""
from __future__ import annotations
import logging
import os
from dataclasses import dataclass
logger = logging.getLogger(__name__)
def ensure_credential_key_env() -> None:
"""Load HIVE_CREDENTIAL_KEY and ADEN_API_KEY from shell config if not in environment.
The setup-credentials skill writes these to ~/.zshrc or ~/.bashrc.
If the user hasn't sourced their config in the current shell, this reads
them directly so the runner (and any MCP subprocesses it spawns) can:
- Unlock the encrypted credential store (HIVE_CREDENTIAL_KEY)
- Enable Aden OAuth sync for Google/HubSpot/etc. (ADEN_API_KEY)
"""
try:
from aden_tools.credentials.shell_config import check_env_var_in_shell_config
except ImportError:
return
for var_name in ("HIVE_CREDENTIAL_KEY", "ADEN_API_KEY"):
if os.environ.get(var_name):
continue
found, value = check_env_var_in_shell_config(var_name)
if found and value:
os.environ[var_name] = value
logger.debug("Loaded %s from shell config", var_name)
@dataclass
class _CredentialCheck:
"""Result of checking a single credential."""
env_var: str
source: str
used_by: str
available: bool
help_url: str = ""
def _presync_aden_tokens(credential_specs: dict) -> None:
"""Sync Aden-backed OAuth tokens into env vars for validation.
When ADEN_API_KEY is available, fetches fresh OAuth tokens from the Aden
server and exports them to env vars. This ensures validation sees real
tokens instead of stale or mis-stored values in the encrypted store.
Only touches credentials that are ``aden_supported`` AND whose env var
is not already set (so explicit user exports always win).
"""
from framework.credentials.store import CredentialStore
try:
aden_store = CredentialStore.with_aden_sync(auto_sync=True)
except Exception as e:
logger.warning("Aden pre-sync unavailable: %s", e)
return
for name, spec in credential_specs.items():
if not spec.aden_supported:
continue
if os.environ.get(spec.env_var):
continue # Already set — don't overwrite
cred_id = spec.credential_id or name
try:
value = aden_store.get_key(cred_id, spec.credential_key)
if value:
os.environ[spec.env_var] = value
logger.debug("Pre-synced %s from Aden", spec.env_var)
else:
logger.warning(
"Pre-sync: %s (id=%s) available but key '%s' returned None",
spec.env_var,
cred_id,
spec.credential_key,
)
except Exception as e:
logger.warning(
"Pre-sync failed for %s (id=%s): %s",
spec.env_var,
cred_id,
e,
)
def validate_agent_credentials(nodes: list, quiet: bool = False, verify: bool = True) -> None:
"""Check that required credentials are available and valid before running an agent.
Two-phase validation:
1. **Presence** is the credential set (env var, encrypted store, or Aden sync)?
2. **Health check** does the credential actually work? Uses each tool's
registered ``check_credential_health`` endpoint (lightweight HTTP call).
Args:
nodes: List of NodeSpec objects from the agent graph.
quiet: If True, suppress the credential summary output.
verify: If True (default), run health checks on present credentials.
"""
# Collect required tools and node types
required_tools = {tool for node in nodes if node.tools for tool in node.tools}
node_types = {node.node_type for node in nodes}
try:
from aden_tools.credentials import CREDENTIAL_SPECS
except ImportError:
return # aden_tools not installed, skip check
from framework.credentials.storage import CompositeStorage, EncryptedFileStorage, EnvVarStorage
from framework.credentials.store import CredentialStore
# Build credential store.
# Env vars take priority — if a user explicitly exports a fresh key it
# must win over a potentially stale value in the encrypted store.
#
# Pre-sync: when ADEN_API_KEY is available, sync OAuth tokens from Aden
# into env vars so validation sees fresh tokens instead of stale values
# in the encrypted store (e.g., a previously mis-stored google.enc).
if os.environ.get("ADEN_API_KEY"):
_presync_aden_tokens(CREDENTIAL_SPECS)
env_mapping = {
(spec.credential_id or name): spec.env_var for name, spec in CREDENTIAL_SPECS.items()
}
env_storage = EnvVarStorage(env_mapping=env_mapping)
if os.environ.get("HIVE_CREDENTIAL_KEY"):
storage = CompositeStorage(primary=env_storage, fallbacks=[EncryptedFileStorage()])
else:
storage = env_storage
store = CredentialStore(storage=storage)
# Build reverse mappings
tool_to_cred: dict[str, str] = {}
node_type_to_cred: dict[str, str] = {}
for cred_name, spec in CREDENTIAL_SPECS.items():
for tool_name in spec.tools:
tool_to_cred[tool_name] = cred_name
for nt in spec.node_types:
node_type_to_cred[nt] = cred_name
missing: list[str] = []
invalid: list[str] = []
# Aden-backed creds where ADEN_API_KEY is set but integration not connected
aden_not_connected: list[str] = []
failed_cred_names: list[str] = [] # all cred names that need (re-)collection
has_aden_key = bool(os.environ.get("ADEN_API_KEY"))
checked: set[str] = set()
# Credentials that are present and should be health-checked
to_verify: list[tuple[str, str]] = [] # (cred_name, used_by_label)
def _check_credential(spec, cred_name: str, label: str) -> None:
cred_id = spec.credential_id or cred_name
if not store.is_available(cred_id):
# If ADEN_API_KEY is set and this is an Aden-only credential,
# the issue is that the integration isn't connected on hive.adenhq.com,
# NOT that the user needs to re-enter ADEN_API_KEY.
if has_aden_key and spec.aden_supported and not spec.direct_api_key_supported:
aden_not_connected.append(
f" {spec.env_var} for {label}"
f"\n Connect this integration at hive.adenhq.com first."
)
else:
entry = f" {spec.env_var} for {label}"
if spec.help_url:
entry += f"\n Get it at: {spec.help_url}"
missing.append(entry)
failed_cred_names.append(cred_name)
elif verify and spec.health_check_endpoint:
to_verify.append((cred_name, label))
# Check tool credentials
for tool_name in sorted(required_tools):
cred_name = tool_to_cred.get(tool_name)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
if not spec.required:
continue
affected = sorted(t for t in required_tools if t in spec.tools)
label = ", ".join(affected)
_check_credential(spec, cred_name, label)
# Check node type credentials (e.g., ANTHROPIC_API_KEY for LLM nodes)
for nt in sorted(node_types):
cred_name = node_type_to_cred.get(nt)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
if not spec.required:
continue
affected_types = sorted(t for t in node_types if t in spec.node_types)
label = ", ".join(affected_types) + " nodes"
_check_credential(spec, cred_name, label)
# Phase 2: health-check present credentials
if to_verify:
try:
from aden_tools.credentials import check_credential_health
except ImportError:
check_credential_health = None # type: ignore[assignment]
if check_credential_health is not None:
for cred_name, label in to_verify:
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
value = store.get(cred_id)
if not value:
continue
try:
result = check_credential_health(
cred_name,
value,
health_check_endpoint=spec.health_check_endpoint,
health_check_method=spec.health_check_method,
)
if not result.valid:
entry = f" {spec.env_var} for {label}{result.message}"
if spec.help_url:
entry += f"\n Get a new key at: {spec.help_url}"
invalid.append(entry)
failed_cred_names.append(cred_name)
elif result.valid:
# Persist identity from health check (best-effort)
identity_data = result.details.get("identity")
if identity_data and isinstance(identity_data, dict):
try:
cred_obj = store.get_credential(cred_id, refresh_if_needed=False)
if cred_obj:
cred_obj.set_identity(**identity_data)
store.save_credential(cred_obj)
except Exception:
pass # Identity persistence is best-effort
except Exception as exc:
logger.debug("Health check for %s failed: %s", cred_name, exc)
errors = missing + invalid + aden_not_connected
if errors:
from framework.credentials.models import CredentialError
lines: list[str] = []
if missing:
lines.append("Missing credentials:\n")
lines.extend(missing)
if invalid:
if missing:
lines.append("")
lines.append("Invalid or expired credentials:\n")
lines.extend(invalid)
if aden_not_connected:
if missing or invalid:
lines.append("")
lines.append(
"Aden integrations not connected "
"(ADEN_API_KEY is set but OAuth tokens unavailable):\n"
)
lines.extend(aden_not_connected)
lines.append(
"\nTo fix: run /hive-credentials in Claude Code."
"\nIf you've already set up credentials, "
"restart your terminal to load them."
)
exc = CredentialError("\n".join(lines))
exc.failed_cred_names = failed_cred_names # type: ignore[attr-defined]
raise exc
def build_setup_session_from_error(
credential_error: Exception,
nodes: list | None = None,
agent_path: str | None = None,
):
"""Build a ``CredentialSetupSession`` that covers all failed credentials.
``validate_agent_credentials`` attaches ``failed_cred_names`` (both missing
and invalid) to the ``CredentialError``. This helper converts those names
into ``MissingCredential`` entries so the setup screen can re-collect them.
Falls back to the normal ``from_nodes`` / ``from_agent_path`` detection
when the attribute is absent.
Args:
credential_error: The ``CredentialError`` raised by validation.
nodes: Graph nodes (preferred avoids re-loading from disk).
agent_path: Agent directory path (used when nodes aren't available).
"""
from framework.credentials.setup import CredentialSetupSession, MissingCredential
# Start with normal detection (picks up truly missing creds)
if nodes is not None:
session = CredentialSetupSession.from_nodes(nodes)
elif agent_path is not None:
session = CredentialSetupSession.from_agent_path(agent_path)
else:
session = CredentialSetupSession(missing=[])
# Add credentials that are present but failed health checks
already = {m.credential_name for m in session.missing}
failed_names: list[str] = getattr(credential_error, "failed_cred_names", [])
if failed_names:
try:
from aden_tools.credentials import CREDENTIAL_SPECS
for name in failed_names:
if name in already:
continue
spec = CREDENTIAL_SPECS.get(name)
if spec is None:
continue
session.missing.append(
MissingCredential(
credential_name=name,
env_var=spec.env_var,
description=spec.description,
help_url=spec.help_url,
api_key_instructions=spec.api_key_instructions,
tools=list(spec.tools),
aden_supported=spec.aden_supported,
direct_api_key_supported=spec.direct_api_key_supported,
credential_id=spec.credential_id,
credential_key=spec.credential_key,
)
)
except ImportError:
pass
return session
+2 -52
View File
@@ -1,4 +1,4 @@
"""Graph structures: Goals, Nodes, Edges, and Flexible Execution."""
"""Graph structures: Goals, Nodes, Edges, and Execution."""
from framework.graph.client_io import (
ActiveNodeClientIO,
@@ -6,7 +6,6 @@ from framework.graph.client_io import (
InertNodeClientIO,
NodeClientIO,
)
from framework.graph.code_sandbox import CodeSandbox, safe_eval, safe_exec
from framework.graph.context_handoff import ContextHandoff, HandoffContext
from framework.graph.conversation import ConversationStore, Message, NodeConversation
from framework.graph.edge import DEFAULT_MAX_TOKENS, EdgeCondition, EdgeSpec, GraphSpec
@@ -18,31 +17,9 @@ from framework.graph.event_loop_node import (
OutputAccumulator,
)
from framework.graph.executor import GraphExecutor
from framework.graph.flexible_executor import ExecutorConfig, FlexibleGraphExecutor
from framework.graph.goal import Constraint, Goal, GoalStatus, SuccessCriterion
from framework.graph.judge import HybridJudge, create_default_judge
from framework.graph.node import NodeContext, NodeProtocol, NodeResult, NodeSpec
# Flexible execution (Worker-Judge pattern)
from framework.graph.plan import (
ActionSpec,
ActionType,
# HITL (Human-in-the-loop)
ApprovalDecision,
ApprovalRequest,
ApprovalResult,
EvaluationRule,
ExecutionStatus,
Judgment,
JudgmentAction,
Plan,
PlanExecutionResult,
PlanStep,
StepStatus,
load_export,
)
from framework.graph.worker_node import StepExecutionResult, WorkerNode
__all__ = [
# Goal
"Goal",
@@ -59,35 +36,8 @@ __all__ = [
"EdgeCondition",
"GraphSpec",
"DEFAULT_MAX_TOKENS",
# Executor (fixed graph)
# Executor
"GraphExecutor",
# Plan (flexible execution)
"Plan",
"PlanStep",
"ActionSpec",
"ActionType",
"StepStatus",
"Judgment",
"JudgmentAction",
"EvaluationRule",
"PlanExecutionResult",
"ExecutionStatus",
"load_export",
# HITL (Human-in-the-loop)
"ApprovalDecision",
"ApprovalRequest",
"ApprovalResult",
# Worker-Judge
"HybridJudge",
"create_default_judge",
"WorkerNode",
"StepExecutionResult",
"FlexibleGraphExecutor",
"ExecutorConfig",
# Code Sandbox
"CodeSandbox",
"safe_exec",
"safe_eval",
# Conversation
"NodeConversation",
"ConversationStore",
-413
View File
@@ -1,413 +0,0 @@
"""
Code Sandbox for Safe Execution of Dynamic Code.
Provides a restricted execution environment for code generated by
the external planner. This is critical for open-ended planning where
the planner can create arbitrary code actions.
Security measures:
1. Restricted builtins (no file I/O, no imports of dangerous modules)
2. Timeout enforcement
3. Memory limits (via resource module on Unix)
4. Namespace isolation
"""
import ast
import signal
import sys
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any
# Safe builtins whitelist
SAFE_BUILTINS = {
# Basic types
"True": True,
"False": False,
"None": None,
# Type constructors
"bool": bool,
"int": int,
"float": float,
"str": str,
"list": list,
"dict": dict,
"set": set,
"tuple": tuple,
"frozenset": frozenset,
# Basic functions
"abs": abs,
"all": all,
"any": any,
"bin": bin,
"chr": chr,
"divmod": divmod,
"enumerate": enumerate,
"filter": filter,
"format": format,
"hex": hex,
"isinstance": isinstance,
"issubclass": issubclass,
"iter": iter,
"len": len,
"map": map,
"max": max,
"min": min,
"next": next,
"oct": oct,
"ord": ord,
"pow": pow,
"range": range,
"repr": repr,
"reversed": reversed,
"round": round,
"slice": slice,
"sorted": sorted,
"sum": sum,
"zip": zip,
}
# Modules that can be imported
ALLOWED_MODULES = {
"math",
"json",
"re",
"datetime",
"collections",
"itertools",
"functools",
"operator",
"string",
"random",
"statistics",
"decimal",
"fractions",
}
# Dangerous AST nodes to block
BLOCKED_AST_NODES = {
ast.Import,
ast.ImportFrom,
ast.Global,
ast.Nonlocal,
}
class CodeSandboxError(Exception):
"""Error during sandboxed code execution."""
pass
class TimeoutError(CodeSandboxError):
"""Code execution timed out."""
pass
class SecurityError(CodeSandboxError):
"""Code contains potentially dangerous operations."""
pass
@dataclass
class SandboxResult:
"""Result of sandboxed code execution."""
success: bool
result: Any = None
error: str | None = None
stdout: str = ""
variables: dict[str, Any] = field(default_factory=dict)
execution_time_ms: int = 0
class RestrictedImporter:
"""Custom importer that only allows whitelisted modules."""
def __init__(self, allowed_modules: set[str]):
self.allowed_modules = allowed_modules
self._cache: dict[str, Any] = {}
def __call__(self, name: str, *args, **kwargs):
if name not in self.allowed_modules:
raise SecurityError(f"Import of module '{name}' is not allowed")
if name not in self._cache:
import importlib
self._cache[name] = importlib.import_module(name)
return self._cache[name]
class CodeValidator:
"""Validates code for safety before execution."""
def __init__(self, blocked_nodes: set[type] | None = None):
self.blocked_nodes = blocked_nodes or BLOCKED_AST_NODES
def validate(self, code: str) -> list[str]:
"""
Validate code and return list of issues.
Returns empty list if code is safe.
"""
issues = []
try:
tree = ast.parse(code)
except SyntaxError as e:
return [f"Syntax error: {e}"]
for node in ast.walk(tree):
# Check for blocked node types
if type(node) in self.blocked_nodes:
lineno = getattr(node, "lineno", "?")
issues.append(f"Blocked operation: {type(node).__name__} at line {lineno}")
# Check for dangerous attribute access
if isinstance(node, ast.Attribute):
if node.attr.startswith("_"):
issues.append(
f"Access to private attribute '{node.attr}' at line {node.lineno}"
)
# Check for exec/eval calls
if isinstance(node, ast.Call):
if isinstance(node.func, ast.Name):
if node.func.id in ("exec", "eval", "compile", "__import__"):
issues.append(
f"Blocked function call: {node.func.id} at line {node.lineno}"
)
return issues
class CodeSandbox:
"""
Sandboxed environment for executing dynamic code.
Usage:
sandbox = CodeSandbox(timeout_seconds=5)
result = sandbox.execute(
code="x = 1 + 2\\nresult = x * 3",
inputs={"multiplier": 2},
)
if result.success:
print(result.variables["result"]) # 6
"""
def __init__(
self,
timeout_seconds: int = 10,
allowed_modules: set[str] | None = None,
safe_builtins: dict[str, Any] | None = None,
):
self.timeout_seconds = timeout_seconds
self.allowed_modules = allowed_modules or ALLOWED_MODULES
self.safe_builtins = safe_builtins or SAFE_BUILTINS
self.validator = CodeValidator()
self.importer = RestrictedImporter(self.allowed_modules)
@contextmanager
def _timeout_context(self, seconds: int):
"""Context manager for timeout enforcement."""
def handler(signum, frame):
raise TimeoutError(f"Code execution timed out after {seconds} seconds")
# Only works on Unix-like systems
if hasattr(signal, "SIGALRM"):
old_handler = signal.signal(signal.SIGALRM, handler)
signal.alarm(seconds)
try:
yield
finally:
signal.alarm(0)
signal.signal(signal.SIGALRM, old_handler)
else:
# Windows: no timeout support, just execute
yield
def _create_namespace(self, inputs: dict[str, Any]) -> dict[str, Any]:
"""Create isolated namespace for code execution."""
namespace = {
"__builtins__": dict(self.safe_builtins),
"__import__": self.importer,
}
# Add input variables
namespace.update(inputs)
return namespace
def execute(
self,
code: str,
inputs: dict[str, Any] | None = None,
extract_vars: list[str] | None = None,
) -> SandboxResult:
"""
Execute code in sandbox.
Args:
code: Python code to execute
inputs: Variables to inject into namespace
extract_vars: Variable names to extract from namespace after execution
Returns:
SandboxResult with execution outcome
"""
import time
inputs = inputs or {}
extract_vars = extract_vars or []
# Validate code first
issues = self.validator.validate(code)
if issues:
return SandboxResult(
success=False,
error=f"Code validation failed: {'; '.join(issues)}",
)
# Create isolated namespace
namespace = self._create_namespace(inputs)
# Capture stdout
import io
old_stdout = sys.stdout
sys.stdout = captured_stdout = io.StringIO()
start_time = time.time()
try:
with self._timeout_context(self.timeout_seconds):
# Compile and execute
compiled = compile(code, "<sandbox>", "exec")
exec(compiled, namespace)
execution_time_ms = int((time.time() - start_time) * 1000)
# Extract requested variables
extracted = {}
for var in extract_vars:
if var in namespace:
extracted[var] = namespace[var]
# Also extract any new variables (not in inputs or builtins)
for key, value in namespace.items():
if key not in inputs and key not in self.safe_builtins and not key.startswith("_"):
extracted[key] = value
return SandboxResult(
success=True,
result=namespace.get("result"), # Convention: 'result' is the return value
stdout=captured_stdout.getvalue(),
variables=extracted,
execution_time_ms=execution_time_ms,
)
except TimeoutError as e:
return SandboxResult(
success=False,
error=str(e),
execution_time_ms=self.timeout_seconds * 1000,
)
except SecurityError as e:
return SandboxResult(
success=False,
error=f"Security violation: {e}",
execution_time_ms=int((time.time() - start_time) * 1000),
)
except Exception as e:
return SandboxResult(
success=False,
error=f"{type(e).__name__}: {e}",
stdout=captured_stdout.getvalue(),
execution_time_ms=int((time.time() - start_time) * 1000),
)
finally:
sys.stdout = old_stdout
def execute_expression(
self,
expression: str,
inputs: dict[str, Any] | None = None,
) -> SandboxResult:
"""
Execute a single expression and return its value.
Simpler than execute() - just evaluates one expression.
"""
inputs = inputs or {}
# Validate
try:
ast.parse(expression, mode="eval")
except SyntaxError as e:
return SandboxResult(success=False, error=f"Syntax error: {e}")
namespace = self._create_namespace(inputs)
try:
with self._timeout_context(self.timeout_seconds):
result = eval(expression, namespace)
return SandboxResult(success=True, result=result)
except Exception as e:
return SandboxResult(
success=False,
error=f"{type(e).__name__}: {e}",
)
# Singleton instance with default settings
default_sandbox = CodeSandbox()
def safe_exec(
code: str,
inputs: dict[str, Any] | None = None,
timeout_seconds: int = 10,
) -> SandboxResult:
"""
Convenience function for safe code execution.
Args:
code: Python code to execute
inputs: Variables to inject
timeout_seconds: Max execution time
Returns:
SandboxResult
"""
sandbox = CodeSandbox(timeout_seconds=timeout_seconds)
return sandbox.execute(code, inputs)
def safe_eval(
expression: str,
inputs: dict[str, Any] | None = None,
timeout_seconds: int = 5,
) -> SandboxResult:
"""
Convenience function for safe expression evaluation.
Args:
expression: Python expression to evaluate
inputs: Variables to inject
timeout_seconds: Max execution time
Returns:
SandboxResult
"""
sandbox = CodeSandbox(timeout_seconds=timeout_seconds)
return sandbox.execute_expression(expression, inputs)
+123 -10
View File
@@ -27,6 +27,9 @@ class Message:
tool_use_id: str | None = None
tool_calls: list[dict[str, Any]] | None = None
is_error: bool = False
# Phase-aware compaction metadata (continuous mode)
phase_id: str | None = None
is_transition_marker: bool = False
def to_llm_dict(self) -> dict[str, Any]:
"""Convert to OpenAI-format message dict."""
@@ -60,6 +63,10 @@ class Message:
d["tool_calls"] = self.tool_calls
if self.is_error:
d["is_error"] = self.is_error
if self.phase_id is not None:
d["phase_id"] = self.phase_id
if self.is_transition_marker:
d["is_transition_marker"] = self.is_transition_marker
return d
@classmethod
@@ -72,6 +79,8 @@ class Message:
tool_use_id=data.get("tool_use_id"),
tool_calls=data.get("tool_calls"),
is_error=data.get("is_error", False),
phase_id=data.get("phase_id"),
is_transition_marker=data.get("is_transition_marker", False),
)
@@ -188,6 +197,7 @@ class NodeConversation:
self._next_seq: int = 0
self._meta_persisted: bool = False
self._last_api_input_tokens: int | None = None
self._current_phase: str | None = None
# --- Properties --------------------------------------------------------
@@ -195,6 +205,33 @@ class NodeConversation:
def system_prompt(self) -> str:
return self._system_prompt
def update_system_prompt(self, new_prompt: str) -> None:
"""Update the system prompt.
Used in continuous conversation mode at phase transitions to swap
Layer 3 (focus) while preserving the conversation history.
"""
self._system_prompt = new_prompt
def set_current_phase(self, phase_id: str) -> None:
"""Set the current phase ID. Subsequent messages will be stamped with it."""
self._current_phase = phase_id
async def switch_store(self, new_store: ConversationStore) -> None:
"""Switch to a new persistence store at a phase transition.
Subsequent messages are written to *new_store*. Meta (system
prompt, config) is re-persisted on the next write so the new
store's ``meta.json`` reflects the updated prompt.
"""
self._store = new_store
self._meta_persisted = False
await new_store.write_cursor({"next_seq": self._next_seq})
@property
def current_phase(self) -> str | None:
return self._current_phase
@property
def messages(self) -> list[Message]:
"""Return a defensive copy of the message list."""
@@ -216,8 +253,19 @@ class NodeConversation:
# --- Add messages ------------------------------------------------------
async def add_user_message(self, content: str) -> Message:
msg = Message(seq=self._next_seq, role="user", content=content)
async def add_user_message(
self,
content: str,
*,
is_transition_marker: bool = False,
) -> Message:
msg = Message(
seq=self._next_seq,
role="user",
content=content,
phase_id=self._current_phase,
is_transition_marker=is_transition_marker,
)
self._messages.append(msg)
self._next_seq += 1
await self._persist(msg)
@@ -233,6 +281,7 @@ class NodeConversation:
role="assistant",
content=content,
tool_calls=tool_calls,
phase_id=self._current_phase,
)
self._messages.append(msg)
self._next_seq += 1
@@ -251,6 +300,7 @@ class NodeConversation:
content=content,
tool_use_id=tool_use_id,
is_error=is_error,
phase_id=self._current_phase,
)
self._messages.append(msg)
self._next_seq += 1
@@ -380,6 +430,11 @@ class NodeConversation:
spillover filename reference (if any). Message structure (role,
seq, tool_use_id) stays valid for the LLM API.
Phase-aware behavior (continuous mode): when messages have ``phase_id``
metadata, all messages in the current phase are protected regardless of
token budget. Transition markers are never pruned. Older phases' tool
results are pruned more aggressively.
Error tool results are never pruned they prevent re-calling
failing tools.
@@ -388,13 +443,18 @@ class NodeConversation:
if not self._messages:
return 0
# Phase 1: Walk backward, classify tool results as protected vs pruneable
# Walk backward, classify tool results as protected vs pruneable
protected_tokens = 0
pruneable: list[int] = [] # indices into self._messages
pruneable_tokens = 0
for i in range(len(self._messages) - 1, -1, -1):
msg = self._messages[i]
# Transition markers are never pruned (any role)
if msg.is_transition_marker:
continue
if msg.role != "tool":
continue
if msg.is_error:
@@ -402,6 +462,10 @@ class NodeConversation:
if msg.content.startswith("[Pruned tool result"):
continue # already pruned
# Phase-aware: protect current phase messages
if self._current_phase and msg.phase_id == self._current_phase:
continue
est = len(msg.content) // 4
if protected_tokens < protect_tokens:
protected_tokens += est
@@ -409,11 +473,11 @@ class NodeConversation:
pruneable.append(i)
pruneable_tokens += est
# Phase 2: Only prune if enough to be worthwhile
# Only prune if enough to be worthwhile
if pruneable_tokens < min_prune_tokens:
return 0
# Phase 3: Replace content with compact placeholder
# Replace content with compact placeholder
count = 0
for i in pruneable:
msg = self._messages[i]
@@ -436,6 +500,8 @@ class NodeConversation:
tool_use_id=msg.tool_use_id,
tool_calls=msg.tool_calls,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
)
count += 1
@@ -446,22 +512,38 @@ class NodeConversation:
self._last_api_input_tokens = None
return count
async def compact(self, summary: str, keep_recent: int = 2) -> None:
async def compact(
self,
summary: str,
keep_recent: int = 2,
phase_graduated: bool = False,
) -> None:
"""Replace old messages with a summary, optionally keeping recent ones.
Args:
summary: Caller-provided summary text.
keep_recent: Number of recent messages to preserve (default 2).
Clamped to [0, len(messages) - 1].
phase_graduated: When True and messages have phase_id metadata,
split at phase boundaries instead of using keep_recent.
Keeps current + previous phase intact; compacts older phases.
"""
if not self._messages:
return
# Clamp: must discard at least 1 message
keep_recent = max(0, min(keep_recent, len(self._messages) - 1))
total = len(self._messages)
split = total - keep_recent if keep_recent > 0 else total
# Phase-graduated: find the split point based on phase boundaries.
# Keeps current phase + previous phase intact, compacts older phases.
if phase_graduated and self._current_phase:
split = self._find_phase_graduated_split()
else:
split = None
if split is None:
# Fallback: use keep_recent (non-phase or single-phase conversation)
keep_recent = max(0, min(keep_recent, total - 1))
split = total - keep_recent if keep_recent > 0 else total
# Advance split past orphaned tool results at the boundary.
# Tool-role messages reference a tool_use from the preceding
@@ -470,6 +552,10 @@ class NodeConversation:
while split < total and self._messages[split].role == "tool":
split += 1
# Nothing to compact
if split == 0:
return
old_messages = list(self._messages[:split])
recent_messages = list(self._messages[split:])
@@ -504,6 +590,33 @@ class NodeConversation:
self._messages = [summary_msg] + recent_messages
self._last_api_input_tokens = None # reset; next LLM call will recalibrate
def _find_phase_graduated_split(self) -> int | None:
"""Find split point that preserves current + previous phase.
Returns the index of the first message in the protected set,
or None if phase graduation doesn't apply (< 3 phases).
"""
# Collect distinct phases in order of first appearance
phases_seen: list[str] = []
for msg in self._messages:
if msg.phase_id and msg.phase_id not in phases_seen:
phases_seen.append(msg.phase_id)
# Need at least 3 phases for graduation to be meaningful
# (current + previous are protected, older get compacted)
if len(phases_seen) < 3:
return None
# Protect: current phase + previous phase
protected_phases = {phases_seen[-1], phases_seen[-2]}
# Find split: first message belonging to a protected phase
for i, msg in enumerate(self._messages):
if msg.phase_id in protected_phases:
return i
return None
async def clear(self) -> None:
"""Remove all messages, keep system prompt, preserve ``_next_seq``."""
if self._store:
+177
View File
@@ -0,0 +1,177 @@
"""Level 2 Conversation-Aware Judge.
When a node has `success_criteria` set, the implicit judge upgrades:
after Level 0 passes (all output keys set), a fast LLM call evaluates
whether the conversation actually meets the criteria.
This prevents nodes from "checking boxes" (setting output keys) without
doing quality work. The LLM reads the recent conversation and assesses
whether the phase's goal was genuinely accomplished.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass
from typing import Any
from framework.graph.conversation import NodeConversation
from framework.llm.provider import LLMProvider
logger = logging.getLogger(__name__)
@dataclass
class PhaseVerdict:
"""Result of Level 2 conversation-aware evaluation."""
action: str # "ACCEPT" or "RETRY"
confidence: float = 0.8
feedback: str = ""
async def evaluate_phase_completion(
llm: LLMProvider,
conversation: NodeConversation,
phase_name: str,
phase_description: str,
success_criteria: str,
accumulator_state: dict[str, Any],
max_history_tokens: int = 8_196,
) -> PhaseVerdict:
"""Level 2 judge: read the conversation and evaluate quality.
Only called after Level 0 passes (all output keys set).
Args:
llm: LLM provider for evaluation
conversation: The current conversation to evaluate
phase_name: Name of the current phase/node
phase_description: Description of the phase
success_criteria: Natural-language criteria for phase completion
accumulator_state: Current output key values
max_history_tokens: Main conversation token budget (judge gets 20%)
Returns:
PhaseVerdict with action and optional feedback
"""
# Build a compact view of the recent conversation
recent_messages = _extract_recent_context(conversation, max_messages=10)
outputs_summary = _format_outputs(accumulator_state)
system_prompt = (
"You are a quality judge evaluating whether a phase of work is complete. "
"Be concise. Evaluate based on the success criteria, not on style."
)
user_prompt = f"""Evaluate this phase:
PHASE: {phase_name}
DESCRIPTION: {phase_description}
SUCCESS CRITERIA:
{success_criteria}
OUTPUTS SET:
{outputs_summary}
RECENT CONVERSATION:
{recent_messages}
Has this phase accomplished its goal based on the success criteria?
Respond in exactly this format:
ACTION: ACCEPT or RETRY
CONFIDENCE: 0.X
FEEDBACK: (reason if RETRY, empty if ACCEPT)"""
try:
response = await llm.acomplete(
messages=[{"role": "user", "content": user_prompt}],
system=system_prompt,
max_tokens=max(1024, max_history_tokens // 5),
max_retries=1,
)
if not response.content or not response.content.strip():
logger.debug("Level 2 judge: empty response, accepting by default")
return PhaseVerdict(action="ACCEPT", confidence=0.5, feedback="")
return _parse_verdict(response.content)
except Exception as e:
logger.warning(f"Level 2 judge failed, accepting by default: {e}")
# On failure, don't block — Level 0 already passed
return PhaseVerdict(action="ACCEPT", confidence=0.5, feedback="")
def _extract_recent_context(conversation: NodeConversation, max_messages: int = 10) -> str:
"""Extract recent conversation messages for evaluation."""
messages = conversation.messages
recent = messages[-max_messages:] if len(messages) > max_messages else messages
parts = []
for msg in recent:
role = msg.role.upper()
content = msg.content or ""
# Truncate long tool results
if msg.role == "tool" and len(content) > 200:
content = content[:200] + "..."
if content.strip():
parts.append(f"[{role}]: {content.strip()}")
return "\n".join(parts) if parts else "(no messages)"
def _format_outputs(accumulator_state: dict[str, Any]) -> str:
"""Format output key values for evaluation.
Lists and dicts get structural formatting so the judge can assess
quantity and structure, not just a truncated stringification.
"""
if not accumulator_state:
return "(none)"
parts = []
for key, value in accumulator_state.items():
if isinstance(value, list):
# Show count + brief per-item preview so the judge can
# verify quantity without the full serialization.
items_preview = []
for i, item in enumerate(value[:8]):
item_str = str(item)
if len(item_str) > 150:
item_str = item_str[:150] + "..."
items_preview.append(f" [{i}]: {item_str}")
val_str = f"list ({len(value)} items):\n" + "\n".join(items_preview)
if len(value) > 8:
val_str += f"\n ... and {len(value) - 8} more"
elif isinstance(value, dict):
val_str = str(value)
if len(val_str) > 400:
val_str = val_str[:400] + "..."
else:
val_str = str(value)
if len(val_str) > 300:
val_str = val_str[:300] + "..."
parts.append(f" {key}: {val_str}")
return "\n".join(parts)
def _parse_verdict(response: str) -> PhaseVerdict:
"""Parse LLM response into PhaseVerdict."""
action = "ACCEPT"
confidence = 0.8
feedback = ""
for line in response.strip().split("\n"):
line = line.strip()
if line.startswith("ACTION:"):
action_str = line.split(":", 1)[1].strip().upper()
if action_str in ("ACCEPT", "RETRY"):
action = action_str
elif line.startswith("CONFIDENCE:"):
try:
confidence = float(line.split(":", 1)[1].strip())
except ValueError:
pass
elif line.startswith("FEEDBACK:"):
feedback = line.split(":", 1)[1].strip()
return PhaseVerdict(action=action, confidence=confidence, feedback=feedback)
+28 -17
View File
@@ -21,6 +21,9 @@ allowing the LLM to evaluate whether proceeding along an edge makes sense
given the current goal, context, and execution state.
"""
import json
import logging
import re
from enum import StrEnum
from typing import Any
@@ -28,6 +31,8 @@ from pydantic import BaseModel, Field, model_validator
from framework.graph.safe_eval import safe_eval
logger = logging.getLogger(__name__)
DEFAULT_MAX_TOKENS = 8192
@@ -99,7 +104,7 @@ class EdgeSpec(BaseModel):
model_config = {"extra": "allow"}
def should_traverse(
async def should_traverse(
self,
source_success: bool,
source_output: dict[str, Any],
@@ -140,7 +145,7 @@ class EdgeSpec(BaseModel):
if llm is None or goal is None:
# Fallback to ON_SUCCESS if LLM not available
return source_success
return self._llm_decide(
return await self._llm_decide(
llm=llm,
goal=goal,
source_success=source_success,
@@ -158,9 +163,6 @@ class EdgeSpec(BaseModel):
memory: dict[str, Any],
) -> bool:
"""Evaluate a conditional expression."""
import logging
logger = logging.getLogger(__name__)
if not self.condition_expr:
return True
@@ -201,7 +203,7 @@ class EdgeSpec(BaseModel):
logger.warning(f" Available context keys: {list(context.keys())}")
return False
def _llm_decide(
async def _llm_decide(
self,
llm: Any,
goal: Any,
@@ -217,8 +219,6 @@ class EdgeSpec(BaseModel):
The LLM evaluates whether proceeding to the target node
is the best next step toward achieving the goal.
"""
import json
# Build context for LLM
prompt = f"""You are evaluating whether to proceed along an edge in an agent workflow.
@@ -247,15 +247,13 @@ Respond with ONLY a JSON object:
{{"proceed": true/false, "reasoning": "brief explanation"}}"""
try:
response = llm.complete(
response = await llm.acomplete(
messages=[{"role": "user", "content": prompt}],
system="You are a routing agent. Respond with JSON only.",
max_tokens=150,
)
# Parse response
import re
json_match = re.search(r"\{[^{}]*\}", response.content, re.DOTALL)
if json_match:
data = json.loads(json_match.group())
@@ -263,9 +261,6 @@ Respond with ONLY a JSON object:
reasoning = data.get("reasoning", "")
# Log the decision (using basic print for now)
import logging
logger = logging.getLogger(__name__)
logger.info(f" 🤔 LLM routing decision: {'PROCEED' if proceed else 'SKIP'}")
logger.info(f" Reason: {reasoning}")
@@ -273,9 +268,6 @@ Respond with ONLY a JSON object:
except Exception as e:
# Fallback: proceed on success
import logging
logger = logging.getLogger(__name__)
logger.warning(f" ⚠ LLM routing failed, defaulting to on_success: {e}")
return source_success
@@ -443,6 +435,25 @@ class GraphSpec(BaseModel):
description="EventLoopNode configuration (max_iterations, max_tool_calls_per_turn, etc.)",
)
# Conversation mode
conversation_mode: str = Field(
default="continuous",
description=(
"How conversations flow between event_loop nodes. "
"'continuous' (default): one conversation threads through all "
"event_loop nodes with cumulative tools and layered prompt composition. "
"'isolated': each node gets a fresh conversation."
),
)
identity_prompt: str | None = Field(
default=None,
description=(
"Agent-level identity prompt (Layer 1 of the onion model). "
"In continuous mode, this is the static identity that persists "
"unchanged across all node transitions. In isolated mode, ignored."
),
)
# Metadata
description: str = ""
created_by: str = "" # "human" or "builder_agent"
File diff suppressed because it is too large Load Diff
+454 -94
View File
@@ -11,7 +11,6 @@ The executor:
import asyncio
import logging
import warnings
from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path
@@ -21,13 +20,10 @@ from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.goal import Goal
from framework.graph.node import (
FunctionNode,
LLMNode,
NodeContext,
NodeProtocol,
NodeResult,
NodeSpec,
RouterNode,
SharedMemory,
)
from framework.graph.output_cleaner import CleansingConfig, OutputCleaner
@@ -138,6 +134,7 @@ class GraphExecutor:
runtime_logger: Any = None,
storage_path: str | Path | None = None,
loop_config: dict[str, Any] | None = None,
accounts_prompt: str = "",
):
"""
Initialize the executor.
@@ -157,6 +154,7 @@ class GraphExecutor:
runtime_logger: Optional RuntimeLogger for per-graph-run logging
storage_path: Optional base path for conversation persistence
loop_config: Optional EventLoopNode configuration (max_iterations, etc.)
accounts_prompt: Connected accounts block for system prompt injection
"""
self.runtime = runtime
self.llm = llm
@@ -171,6 +169,7 @@ class GraphExecutor:
self.runtime_logger = runtime_logger
self._storage_path = Path(storage_path) if storage_path else None
self._loop_config = loop_config or {}
self.accounts_prompt = accounts_prompt
# Initialize output cleaner
self.cleansing_config = cleansing_config or CleansingConfig()
@@ -186,6 +185,55 @@ class GraphExecutor:
# Pause/resume control
self._pause_requested = asyncio.Event()
def _write_progress(
self,
current_node: str,
path: list[str],
memory: Any,
node_visit_counts: dict[str, int],
) -> None:
"""Update state.json with live progress at node transitions.
Reads the existing state.json (written by ExecutionStream at session
start) and patches the progress fields in-place. This keeps
state.json as the single source of truth readers always see
current progress, not stale initial values.
The write is synchronous and best-effort: never blocks execution.
"""
if not self._storage_path:
return
try:
import json as _json
from datetime import datetime
state_path = self._storage_path / "state.json"
if state_path.exists():
state_data = _json.loads(state_path.read_text(encoding="utf-8"))
else:
state_data = {}
# Patch progress fields
progress = state_data.setdefault("progress", {})
progress["current_node"] = current_node
progress["path"] = list(path)
progress["node_visit_counts"] = dict(node_visit_counts)
progress["steps_executed"] = len(path)
# Update timestamp
timestamps = state_data.setdefault("timestamps", {})
timestamps["updated_at"] = datetime.now().isoformat()
# Persist full memory so state.json is sufficient for resume
# even if the process dies before the final write.
memory_snapshot = memory.read_all()
state_data["memory"] = memory_snapshot
state_data["memory_keys"] = list(memory_snapshot.keys())
state_path.write_text(_json.dumps(state_data, indent=2), encoding="utf-8")
except Exception:
pass # Best-effort — never block execution
def _validate_tools(self, graph: GraphSpec) -> list[str]:
"""
Validate that all tools declared by nodes are available.
@@ -257,6 +305,13 @@ class GraphExecutor:
# Initialize execution state
memory = SharedMemory()
# Continuous conversation mode state
is_continuous = getattr(graph, "conversation_mode", "isolated") == "continuous"
continuous_conversation = None # NodeConversation threaded across nodes
cumulative_tools: list = [] # Tools accumulate, never removed
cumulative_tool_names: set[str] = set()
cumulative_output_keys: list[str] = [] # Output keys from all visited nodes
# Initialize checkpoint store if checkpointing is enabled
checkpoint_store: CheckpointStore | None = None
if checkpoint_config and checkpoint_config.enabled and self._storage_path:
@@ -273,16 +328,26 @@ class GraphExecutor:
f"{type(memory_data).__name__}, expected dict"
)
else:
# Restore memory from previous session
# Restore memory from previous session.
# Skip validation — this data was already validated when
# originally written, and research text triggers false
# positives on the code-indicator heuristic.
for key, value in memory_data.items():
memory.write(key, value)
memory.write(key, value, validate=False)
self.logger.info(f"📥 Restored session state with {len(memory_data)} memory keys")
# Write new input data to memory (each key individually)
if input_data:
# Write new input data to memory (each key individually).
# Skip when resuming from a paused session — restored memory already
# contains all state including the original input, and re-writing
# input_data would overwrite intermediate results with stale values.
_is_resuming = bool(session_state and session_state.get("paused_at"))
if input_data and not _is_resuming:
for key, value in input_data.items():
memory.write(key, value)
# Detect event-triggered execution (timer/webhook) — no interactive user.
_event_triggered = bool(input_data and isinstance(input_data.get("event"), dict))
path: list[str] = []
total_tokens = 0
total_latency = 0
@@ -368,7 +433,7 @@ class GraphExecutor:
# Check if resuming from paused_at (session state resume)
paused_at = session_state.get("paused_at") if session_state else None
node_ids = [n.id for n in graph.nodes]
self.logger.info(f"🔍 Debug: paused_at={paused_at}, available node IDs={node_ids}")
self.logger.debug(f"paused_at={paused_at}, available node IDs={node_ids}")
if paused_at and graph.get_node(paused_at) is not None:
# Resume from paused_at node directly (works for any node, not just pause_nodes)
@@ -396,9 +461,76 @@ class GraphExecutor:
steps = 0
# Fresh shared-session execution: clear stale cursor so the entry
# node doesn't restore a filled OutputAccumulator from the previous
# webhook run (which would cause the judge to accept immediately).
# The conversation history is preserved (continuous memory).
_is_fresh_shared = bool(
session_state
and session_state.get("resume_session_id")
and not session_state.get("paused_at")
and not session_state.get("resume_from_checkpoint")
)
if _is_fresh_shared and is_continuous and self._storage_path:
try:
from framework.storage.conversation_store import FileConversationStore
entry_conv_path = self._storage_path / "conversations" / current_node_id
if entry_conv_path.exists():
_store = FileConversationStore(base_path=entry_conv_path)
# Read cursor to find next seq for the transition marker.
_cursor = await _store.read_cursor() or {}
_next_seq = _cursor.get("next_seq", 0)
if _next_seq == 0:
# Fallback: scan part files for max seq
_parts = await _store.read_parts()
if _parts:
_next_seq = max(p.get("seq", 0) for p in _parts) + 1
# Reset cursor — clears stale accumulator outputs and
# iteration counter so the node starts fresh work while
# the conversation thread carries forward.
await _store.write_cursor({})
# Append a transition marker so the LLM knows a new
# event arrived and previous results are outdated.
await _store.write_part(
_next_seq,
{
"role": "user",
"content": (
"--- NEW EVENT TRIGGER ---\n"
"A new event has been received. "
"Process this as a fresh request — "
"previous outputs are no longer valid."
),
"seq": _next_seq,
"is_transition_marker": True,
},
)
self.logger.info(
"🔄 Cleared stale cursor and added transition marker "
"for shared-session entry node '%s'",
current_node_id,
)
except Exception:
self.logger.debug(
"Could not prepare conversation store for shared-session entry node '%s'",
current_node_id,
exc_info=True,
)
if session_state and current_node_id != graph.entry_node:
self.logger.info(f"🔄 Resuming from: {current_node_id}")
# Emit resume event
if self._event_bus:
await self._event_bus.emit_execution_resumed(
stream_id=self._stream_id,
node_id=current_node_id,
)
# Start run
_run_id = self.runtime.start_run(
goal_id=goal.id,
@@ -435,6 +567,14 @@ class GraphExecutor:
if self._pause_requested.is_set():
self.logger.info("⏸ Pause detected - stopping at node boundary")
# Emit pause event
if self._event_bus:
await self._event_bus.emit_execution_paused(
stream_id=self._stream_id,
node_id=current_node_id,
reason="User requested pause (Ctrl+Z)",
)
# Create session state for pause
saved_memory = memory.read_all()
pause_session_state: dict[str, Any] = {
@@ -481,7 +621,7 @@ class GraphExecutor:
cnt = node_visit_counts.get(current_node_id, 0) + 1
node_visit_counts[current_node_id] = cnt
_is_retry = False
max_visits = getattr(node_spec, "max_node_visits", 1)
max_visits = getattr(node_spec, "max_node_visits", 0)
if max_visits > 0 and node_visit_counts[current_node_id] > max_visits:
self.logger.warning(
f" ⊘ Node '{node_spec.name}' visit limit reached "
@@ -489,7 +629,7 @@ class GraphExecutor:
)
# Skip execution — follow outgoing edges using current memory
skip_result = NodeResult(success=True, output=memory.read_all())
next_node = self._follow_edges(
next_node = await self._follow_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
@@ -505,6 +645,21 @@ class GraphExecutor:
path.append(current_node_id)
# Clear stale nullable outputs from previous visits.
# When a node is re-visited (e.g. review → process-batch → review),
# nullable outputs from the PREVIOUS visit linger in shared memory.
# This causes stale edge conditions to fire (e.g. "feedback is not None"
# from visit 1 triggers even when visit 2 sets "final_summary" instead).
# Clearing them ensures only the CURRENT visit's outputs affect routing.
if node_visit_counts.get(current_node_id, 0) > 1:
nullable_keys = getattr(node_spec, "nullable_output_keys", None) or []
for key in nullable_keys:
if memory.read(key) is not None:
memory.write(key, None, validate=False)
self.logger.info(
f" 🧹 Cleared stale nullable output '{key}' from previous visit"
)
# Check if pause (HITL) before execution
if current_node_id in graph.pause_nodes:
self.logger.info(f"⏸ Paused at HITL node: {node_spec.name}")
@@ -515,6 +670,17 @@ class GraphExecutor:
self.logger.info(f" Inputs: {node_spec.input_keys}")
self.logger.info(f" Outputs: {node_spec.output_keys}")
# Continuous mode: accumulate tools and output keys from this node
if is_continuous and node_spec.tools:
for t in self.tools:
if t.name in node_spec.tools and t.name not in cumulative_tool_names:
cumulative_tools.append(t)
cumulative_tool_names.add(t.name)
if is_continuous and node_spec.output_keys:
for k in node_spec.output_keys:
if k not in cumulative_output_keys:
cumulative_output_keys.append(k)
# Build context for node
ctx = self._build_context(
node_spec=node_spec,
@@ -522,6 +688,11 @@ class GraphExecutor:
goal=goal,
input_data=input_data or {},
max_tokens=graph.max_tokens,
continuous_mode=is_continuous,
inherited_conversation=continuous_conversation if is_continuous else None,
override_tools=cumulative_tools if is_continuous else None,
cumulative_output_keys=cumulative_output_keys if is_continuous else None,
event_triggered=_event_triggered,
)
# Log actual input data being read
@@ -665,9 +836,13 @@ class GraphExecutor:
# [CORRECTED] Use node_spec.max_retries instead of hardcoded 3
max_retries = getattr(node_spec, "max_retries", 3)
# Event loop nodes handle retry internally via judge —
# executor retry is catastrophic (retry multiplication)
if node_spec.node_type == "event_loop" and max_retries > 0:
# EventLoopNode instances handle retry internally via judge —
# executor retry would cause catastrophic retry multiplication.
# Only override for actual EventLoopNode instances, not custom
# NodeProtocol implementations that happen to use node_type="event_loop"
from framework.graph.event_loop_node import EventLoopNode
if isinstance(node_impl, EventLoopNode) and max_retries > 0:
self.logger.warning(
f"EventLoopNode '{node_spec.id}' has max_retries={max_retries}. "
"Overriding to 0 — event loop nodes handle retry internally via judge."
@@ -689,6 +864,17 @@ class GraphExecutor:
self.logger.info(
f" ↻ Retrying ({node_retry_counts[current_node_id]}/{max_retries})..."
)
# Emit retry event
if self._event_bus:
await self._event_bus.emit_node_retry(
stream_id=self._stream_id,
node_id=current_node_id,
retry_count=retry_count,
max_retries=max_retries,
error=result.error or "",
)
_is_retry = True
continue
else:
@@ -698,7 +884,7 @@ class GraphExecutor:
)
# Check if there's an ON_FAILURE edge to follow
next_node = self._follow_edges(
next_node = await self._follow_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
@@ -748,6 +934,7 @@ class GraphExecutor:
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
"resume_from": current_node_id,
}
return ExecutionResult(
@@ -774,11 +961,22 @@ class GraphExecutor:
# This must happen BEFORE determining next node, since pause nodes may have no edges
if node_spec.id in graph.pause_nodes:
self.logger.info("💾 Saving session state after pause node")
# Emit pause event
if self._event_bus:
await self._event_bus.emit_execution_paused(
stream_id=self._stream_id,
node_id=node_spec.id,
reason="HITL pause node",
)
saved_memory = memory.read_all()
session_state_out = {
"paused_at": node_spec.id,
"resume_from": f"{node_spec.id}_resume", # Resume key
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
"next_node": None, # Will resume from entry point
}
@@ -827,10 +1025,21 @@ class GraphExecutor:
if result.next_node:
# Router explicitly set next node
self.logger.info(f" → Router directing to: {result.next_node}")
# Emit edge traversed event for router-directed edge
if self._event_bus:
await self._event_bus.emit_edge_traversed(
stream_id=self._stream_id,
source_node=current_node_id,
target_node=result.next_node,
edge_condition="router",
)
current_node_id = result.next_node
self._write_progress(current_node_id, path, memory, node_visit_counts)
else:
# Get all traversable edges for fan-out detection
traversable_edges = self._get_all_traversable_edges(
traversable_edges = await self._get_all_traversable_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
@@ -849,6 +1058,18 @@ class GraphExecutor:
targets = [e.target for e in traversable_edges]
fan_in_node = self._find_convergence_node(graph, targets)
# Emit edge traversed events for fan-out branches
if self._event_bus:
for edge in traversable_edges:
await self._event_bus.emit_edge_traversed(
stream_id=self._stream_id,
source_node=current_node_id,
target_node=edge.target,
edge_condition=edge.condition.value
if hasattr(edge.condition, "value")
else str(edge.condition),
)
# Execute branches in parallel
(
_branch_results,
@@ -871,13 +1092,14 @@ class GraphExecutor:
if fan_in_node:
self.logger.info(f" ⑃ Fan-in: converging at {fan_in_node}")
current_node_id = fan_in_node
self._write_progress(current_node_id, path, memory, node_visit_counts)
else:
# No convergence point - branches are terminal
self.logger.info(" → Parallel branches completed (no convergence)")
break
else:
# Sequential: follow single edge (existing logic via _follow_edges)
next_node = self._follow_edges(
next_node = await self._follow_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
@@ -891,6 +1113,14 @@ class GraphExecutor:
next_spec = graph.get_node(next_node)
self.logger.info(f" → Next: {next_spec.name if next_spec else next_node}")
# Emit edge traversed event for sequential edge
if self._event_bus:
await self._event_bus.emit_edge_traversed(
stream_id=self._stream_id,
source_node=current_node_id,
target_node=next_node,
)
# CHECKPOINT: node_complete (after determining next node)
if (
checkpoint_store
@@ -925,6 +1155,103 @@ class GraphExecutor:
current_node_id = next_node
# Write progress snapshot at node transition
self._write_progress(current_node_id, path, memory, node_visit_counts)
# Continuous mode: thread conversation forward with transition marker
if is_continuous and result.conversation is not None:
continuous_conversation = result.conversation
# Look up the next node spec for the transition marker
next_spec = graph.get_node(current_node_id)
if next_spec and next_spec.node_type == "event_loop":
from framework.graph.prompt_composer import (
build_narrative,
build_transition_marker,
compose_system_prompt,
)
# Build Layer 2 (narrative) from current state
narrative = build_narrative(memory, path, graph)
# Read agent working memory (adapt.md) once for both
# system prompt and transition marker.
_adapt_text: str | None = None
if self._storage_path:
_adapt_path = self._storage_path / "data" / "adapt.md"
if _adapt_path.exists():
_raw = _adapt_path.read_text(encoding="utf-8").strip()
_adapt_text = _raw or None
# Merge adapt.md into narrative for system prompt
if _adapt_text:
narrative = (
f"{narrative}\n\n--- Agent Memory ---\n{_adapt_text}"
if narrative
else _adapt_text
)
# Compose new system prompt (Layer 1 + 2 + 3 + accounts)
new_system = compose_system_prompt(
identity_prompt=getattr(graph, "identity_prompt", None),
focus_prompt=next_spec.system_prompt,
narrative=narrative,
accounts_prompt=self.accounts_prompt or None,
)
continuous_conversation.update_system_prompt(new_system)
# Switch conversation store to the next node's directory
# so the transition marker and all subsequent messages are
# persisted there instead of the first node's directory.
if self._storage_path:
from framework.storage.conversation_store import (
FileConversationStore,
)
next_store_path = self._storage_path / "conversations" / next_spec.id
next_store = FileConversationStore(base_path=next_store_path)
await continuous_conversation.switch_store(next_store)
# Insert transition marker into conversation
data_dir = str(self._storage_path / "data") if self._storage_path else None
marker = build_transition_marker(
previous_node=node_spec,
next_node=next_spec,
memory=memory,
cumulative_tool_names=sorted(cumulative_tool_names),
data_dir=data_dir,
adapt_content=_adapt_text,
)
await continuous_conversation.add_user_message(
marker,
is_transition_marker=True,
)
# Set current phase for phase-aware compaction
continuous_conversation.set_current_phase(next_spec.id)
# Opportunistic compaction at transition:
# 1. Prune old tool results (free, no LLM call)
# 2. If still over 80%, do a phase-graduated compact
if continuous_conversation.usage_ratio() > 0.5:
await continuous_conversation.prune_old_tool_results(
protect_tokens=2000,
)
if continuous_conversation.needs_compaction():
self.logger.info(
" Phase-boundary compaction (%.0f%% usage)",
continuous_conversation.usage_ratio() * 100,
)
summary = (
f"Summary of earlier phases (before {next_spec.name}). "
"See transition markers for phase details."
)
await continuous_conversation.compact(
summary,
keep_recent=4,
phase_graduated=True,
)
# Update input_data for next node
input_data = result.output
@@ -978,12 +1305,47 @@ class GraphExecutor:
had_partial_failures=len(nodes_failed) > 0,
execution_quality=exec_quality,
node_visit_counts=dict(node_visit_counts),
session_state={
"memory": output, # output IS memory.read_all()
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
},
)
except asyncio.CancelledError:
# Handle cancellation (e.g., TUI quit) - save as paused instead of failed
self.logger.info("⏸ Execution cancelled - saving state for resume")
# Flush WIP accumulator outputs from the interrupted node's
# cursor.json into SharedMemory so they survive resume. The
# accumulator writes to cursor.json on every set() call, but
# only writes to SharedMemory when the judge ACCEPTs. Without
# this, edge conditions checking these keys see None on resume.
if current_node_id and self._storage_path:
try:
import json as _json
cursor_path = (
self._storage_path / "conversations" / current_node_id / "cursor.json"
)
if cursor_path.exists():
cursor_data = _json.loads(cursor_path.read_text(encoding="utf-8"))
wip_outputs = cursor_data.get("outputs", {})
for key, value in wip_outputs.items():
if value is not None:
memory.write(key, value, validate=False)
if wip_outputs:
self.logger.info(
"Flushed %d WIP accumulator outputs to memory: %s",
len(wip_outputs),
list(wip_outputs.keys()),
)
except Exception:
self.logger.debug(
"Could not flush accumulator outputs from cursor",
exc_info=True,
)
# Save memory and state for resume
saved_memory = memory.read_all()
session_state_out: dict[str, Any] = {
@@ -1061,12 +1423,32 @@ class GraphExecutor:
execution_quality="failed",
)
# Flush WIP accumulator outputs (same as CancelledError path)
if current_node_id and self._storage_path:
try:
import json as _json
cursor_path = (
self._storage_path / "conversations" / current_node_id / "cursor.json"
)
if cursor_path.exists():
cursor_data = _json.loads(cursor_path.read_text(encoding="utf-8"))
for key, value in cursor_data.get("outputs", {}).items():
if value is not None:
memory.write(key, value, validate=False)
except Exception:
self.logger.debug(
"Could not flush accumulator outputs from cursor",
exc_info=True,
)
# Save memory and state for potential resume
saved_memory = memory.read_all()
session_state_out: dict[str, Any] = {
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
"resume_from": current_node_id,
}
# Mark latest checkpoint for resume on failure
@@ -1119,12 +1501,21 @@ class GraphExecutor:
goal: Goal,
input_data: dict[str, Any],
max_tokens: int = 4096,
continuous_mode: bool = False,
inherited_conversation: Any = None,
override_tools: list | None = None,
cumulative_output_keys: list[str] | None = None,
event_triggered: bool = False,
) -> NodeContext:
"""Build execution context for a node."""
# Filter tools to those available to this node
available_tools = []
if node_spec.tools:
available_tools = [t for t in self.tools if t.name in node_spec.tools]
if override_tools is not None:
# Continuous mode: use cumulative tool set
available_tools = list(override_tools)
else:
available_tools = []
if node_spec.tools:
available_tools = [t for t in self.tools if t.name in node_spec.tools]
# Create scoped memory view
scoped_memory = memory.with_permissions(
@@ -1145,18 +1536,25 @@ class GraphExecutor:
max_tokens=max_tokens,
runtime_logger=self.runtime_logger,
pause_event=self._pause_requested, # Pass pause event for granular control
continuous_mode=continuous_mode,
inherited_conversation=inherited_conversation,
cumulative_output_keys=cumulative_output_keys or [],
event_triggered=event_triggered,
accounts_prompt=self.accounts_prompt,
execution_id=self.runtime.execution_id,
)
# Valid node types - no ambiguous "llm" type allowed
VALID_NODE_TYPES = {
"llm_tool_use",
"llm_generate",
"router",
"function",
"human_input",
"event_loop",
}
DEPRECATED_NODE_TYPES = {"llm_tool_use": "event_loop", "llm_generate": "event_loop"}
# Node types removed in v0.5 — provide migration guidance
REMOVED_NODE_TYPES = {
"function": "event_loop",
"llm_tool_use": "event_loop",
"llm_generate": "event_loop",
"router": "event_loop", # Unused theoretical infrastructure
"human_input": "event_loop", # Use client_facing=True instead
}
def _get_node_implementation(
self, node_spec: NodeSpec, cleanup_llm_model: str | None = None
@@ -1166,62 +1564,23 @@ class GraphExecutor:
if node_spec.id in self.node_registry:
return self.node_registry[node_spec.id]
# Reject removed node types with migration guidance
if node_spec.node_type in self.REMOVED_NODE_TYPES:
replacement = self.REMOVED_NODE_TYPES[node_spec.node_type]
raise RuntimeError(
f"Node type '{node_spec.node_type}' was removed in v0.5. "
f"Migrate node '{node_spec.id}' to '{replacement}'. "
f"See https://github.com/adenhq/hive/issues/4753 for migration guide."
)
# Validate node type
if node_spec.node_type not in self.VALID_NODE_TYPES:
raise RuntimeError(
f"Invalid node type '{node_spec.node_type}' for node '{node_spec.id}'. "
f"Must be one of: {sorted(self.VALID_NODE_TYPES)}. "
f"Use 'llm_tool_use' for nodes that call tools, 'llm_generate' for text generation."
)
# Warn on deprecated node types
if node_spec.node_type in self.DEPRECATED_NODE_TYPES:
replacement = self.DEPRECATED_NODE_TYPES[node_spec.node_type]
warnings.warn(
f"Node type '{node_spec.node_type}' is deprecated. "
f"Use '{replacement}' instead. "
f"Node: '{node_spec.id}'",
DeprecationWarning,
stacklevel=2,
)
# Create based on type
if node_spec.node_type == "llm_tool_use":
if not node_spec.tools:
raise RuntimeError(
f"Node '{node_spec.id}' is type 'llm_tool_use' but declares no tools. "
"Either add tools to the node or change type to 'llm_generate'."
)
return LLMNode(
tool_executor=self.tool_executor,
require_tools=True,
cleanup_llm_model=cleanup_llm_model,
)
if node_spec.node_type == "llm_generate":
return LLMNode(
tool_executor=None,
require_tools=False,
cleanup_llm_model=cleanup_llm_model,
)
if node_spec.node_type == "router":
return RouterNode()
if node_spec.node_type == "function":
# Function nodes need explicit registration
raise RuntimeError(
f"Function node '{node_spec.id}' not registered. Register with node_registry."
)
if node_spec.node_type == "human_input":
# Human input nodes are handled specially by HITL mechanism
return LLMNode(
tool_executor=None,
require_tools=False,
cleanup_llm_model=cleanup_llm_model,
f"Must be one of: {sorted(self.VALID_NODE_TYPES)}."
)
# Create based on type (only event_loop is valid)
if node_spec.node_type == "event_loop":
# Auto-create EventLoopNode with sensible defaults.
# Custom configs can still be pre-registered via node_registry.
@@ -1269,7 +1628,7 @@ class GraphExecutor:
# Should never reach here due to validation above
raise RuntimeError(f"Unhandled node type: {node_spec.node_type}")
def _follow_edges(
async def _follow_edges(
self,
graph: GraphSpec,
goal: Goal,
@@ -1284,7 +1643,7 @@ class GraphExecutor:
for edge in edges:
target_node_spec = graph.get_node(edge.target)
if edge.should_traverse(
if await edge.should_traverse(
source_success=result.success,
source_output=result.output,
memory=memory.read_all(),
@@ -1310,7 +1669,7 @@ class GraphExecutor:
self.logger.warning(f"⚠ Output validation failed: {validation.errors}")
# Clean the output
cleaned_output = self.output_cleaner.clean_output(
cleaned_output = await self.output_cleaner.clean_output(
output=output_to_validate,
source_node_id=current_node_id,
target_node_spec=target_node_spec,
@@ -1348,7 +1707,7 @@ class GraphExecutor:
return None
def _get_all_traversable_edges(
async def _get_all_traversable_edges(
self,
graph: GraphSpec,
goal: Goal,
@@ -1368,7 +1727,7 @@ class GraphExecutor:
for edge in edges:
target_node_spec = graph.get_node(edge.target)
if edge.should_traverse(
if await edge.should_traverse(
source_success=result.success,
source_output=result.output,
memory=memory.read_all(),
@@ -1481,14 +1840,19 @@ class GraphExecutor:
branch.error = f"Node {branch.node_id} not found in graph"
return branch, RuntimeError(branch.error)
# Get node implementation to check its type
branch_impl = self._get_node_implementation(node_spec, graph.cleanup_llm_model)
effective_max_retries = node_spec.max_retries
if node_spec.node_type == "event_loop":
if effective_max_retries > 1:
self.logger.warning(
f"EventLoopNode '{node_spec.id}' has "
f"max_retries={effective_max_retries}. Overriding "
"to 1 — event loop nodes handle retry internally."
)
# Only override for actual EventLoopNode instances, not custom NodeProtocol impls
from framework.graph.event_loop_node import EventLoopNode
if isinstance(branch_impl, EventLoopNode) and effective_max_retries > 1:
self.logger.warning(
f"EventLoopNode '{node_spec.id}' has "
f"max_retries={effective_max_retries}. Overriding "
"to 1 — event loop nodes handle retry internally."
)
effective_max_retries = 1
branch.status = "running"
@@ -1510,7 +1874,7 @@ class GraphExecutor:
f"⚠ Output validation failed for branch "
f"{branch.node_id}: {validation.errors}"
)
cleaned_output = self.output_cleaner.clean_output(
cleaned_output = await self.output_cleaner.clean_output(
output=mem_snapshot,
source_node_id=source_node_spec.id if source_node_spec else "unknown",
target_node_spec=node_spec,
@@ -1654,10 +2018,6 @@ class GraphExecutor:
"""Register a custom node implementation."""
self.node_registry[node_id] = implementation
def register_function(self, node_id: str, func: Callable) -> None:
"""Register a function as a node."""
self.node_registry[node_id] = FunctionNode(func)
def request_pause(self) -> None:
"""
Request graceful pause of the current execution.
-552
View File
@@ -1,552 +0,0 @@
"""
Flexible Graph Executor with Worker-Judge Loop.
Executes plans created by external planner (Claude Code, etc.)
using a Worker-Judge loop:
1. External planner creates Plan
2. FlexibleGraphExecutor receives Plan
3. Worker executes each step
4. Judge evaluates each result
5. If Judge says "replan" return to external planner with feedback
6. If Judge says "escalate" request human intervention
7. If all steps complete return success
This keeps planning external while execution/evaluation is internal.
"""
from collections.abc import Callable
from dataclasses import dataclass
from datetime import datetime
from typing import Any
from framework.graph.code_sandbox import CodeSandbox
from framework.graph.goal import Goal
from framework.graph.judge import HybridJudge, create_default_judge
from framework.graph.plan import (
ApprovalDecision,
ApprovalRequest,
ApprovalResult,
ExecutionStatus,
Judgment,
JudgmentAction,
Plan,
PlanExecutionResult,
PlanStep,
StepStatus,
)
from framework.graph.worker_node import StepExecutionResult, WorkerNode
from framework.llm.provider import LLMProvider, Tool
from framework.runtime.core import Runtime
# Type alias for approval callback
ApprovalCallback = Callable[[ApprovalRequest], ApprovalResult]
@dataclass
class ExecutorConfig:
"""Configuration for FlexibleGraphExecutor."""
max_retries_per_step: int = 3
max_total_steps: int = 100
timeout_seconds: int = 300
enable_parallel_execution: bool = False # Future: parallel step execution
class FlexibleGraphExecutor:
"""
Executes plans with Worker-Judge loop.
Plans come from external source (Claude Code, etc.).
Returns feedback for replanning if needed.
Usage:
executor = FlexibleGraphExecutor(
runtime=runtime,
llm=llm_provider,
tools=tools,
)
result = await executor.execute_plan(plan, goal, context)
if result.status == ExecutionStatus.NEEDS_REPLAN:
# External planner should create new plan using result.feedback
new_plan = external_planner.replan(result.feedback_context)
result = await executor.execute_plan(new_plan, goal, result.feedback_context)
"""
def __init__(
self,
runtime: Runtime,
llm: LLMProvider | None = None,
tools: dict[str, Tool] | None = None,
tool_executor: Callable | None = None,
functions: dict[str, Callable] | None = None,
judge: HybridJudge | None = None,
config: ExecutorConfig | None = None,
approval_callback: ApprovalCallback | None = None,
):
"""
Initialize the FlexibleGraphExecutor.
Args:
runtime: Runtime for decision logging
llm: LLM provider for Worker and Judge
tools: Available tools
tool_executor: Function to execute tools
functions: Registered functions
judge: Custom judge (defaults to HybridJudge with default rules)
config: Executor configuration
approval_callback: Callback for human-in-the-loop approval.
If None, steps requiring approval will pause execution.
"""
self.runtime = runtime
self.llm = llm
self.tools = tools or {}
self.tool_executor = tool_executor
self.functions = functions or {}
self.config = config or ExecutorConfig()
self.approval_callback = approval_callback
# Create judge
self.judge = judge or create_default_judge(llm)
# Create worker
self.worker = WorkerNode(
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
functions=functions,
sandbox=CodeSandbox(),
)
async def execute_plan(
self,
plan: Plan,
goal: Goal,
context: dict[str, Any] | None = None,
) -> PlanExecutionResult:
"""
Execute a plan created by external planner.
Args:
plan: The plan to execute
goal: The goal context
context: Initial context (e.g., from previous execution)
Returns:
PlanExecutionResult with status and feedback
"""
context = context or {}
context.update(plan.context) # Merge plan's accumulated context
# Start run
_run_id = self.runtime.start_run(
goal_id=goal.id,
goal_description=goal.description,
input_data={"plan_id": plan.id, "revision": plan.revision},
)
steps_executed = 0
total_tokens = 0
total_latency = 0
try:
while steps_executed < self.config.max_total_steps:
# Get next ready steps
ready_steps = plan.get_ready_steps()
if not ready_steps:
# Check if we're done or stuck
if plan.is_complete():
break
else:
# No ready steps but not complete - something's wrong
return self._create_result(
status=ExecutionStatus.NEEDS_REPLAN,
plan=plan,
context=context,
feedback=(
"No executable steps available but plan not complete. "
"Check dependencies."
),
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
# Execute next step (for now, sequential; could be parallel)
step = ready_steps[0]
# Debug: show ready steps
# ready_ids = [s.id for s in ready_steps]
# print(f" [DEBUG] Ready steps: {ready_ids}, executing: {step.id}")
# APPROVAL CHECK - before execution
if step.requires_approval:
approval_result = await self._request_approval(step, context)
if approval_result is None:
# No callback, pause execution
step.status = StepStatus.AWAITING_APPROVAL
return self._create_result(
status=ExecutionStatus.AWAITING_APPROVAL,
plan=plan,
context=context,
feedback=f"Step '{step.id}' requires approval: {step.description}",
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
if approval_result.decision == ApprovalDecision.REJECT:
step.status = StepStatus.REJECTED
step.error = approval_result.reason or "Rejected by human"
# Skip this step and continue with dependents marked as skipped
self._skip_dependent_steps(plan, step.id)
continue
if approval_result.decision == ApprovalDecision.ABORT:
return self._create_result(
status=ExecutionStatus.ABORTED,
plan=plan,
context=context,
feedback=approval_result.reason or "Aborted by human",
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
if approval_result.decision == ApprovalDecision.MODIFY:
# Apply modifications to step
if approval_result.modifications:
self._apply_modifications(step, approval_result.modifications)
# APPROVE - continue to execution
step.status = StepStatus.IN_PROGRESS
step.started_at = datetime.now()
step.attempts += 1
# WORK
work_result = await self.worker.execute(step, context)
steps_executed += 1
total_tokens += work_result.tokens_used
total_latency += work_result.latency_ms
# JUDGE
judgment = await self.judge.evaluate(
step=step,
result=work_result.__dict__,
goal=goal,
context=context,
)
# Handle judgment
result = await self._handle_judgment(
step=step,
work_result=work_result,
judgment=judgment,
plan=plan,
goal=goal,
context=context,
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
if result is not None:
# Judgment resulted in early return (replan/escalate)
self.runtime.end_run(
success=False,
narrative=f"Execution stopped: {result.status.value}",
)
return result
# All steps completed successfully
self.runtime.end_run(
success=True,
output_data=context,
narrative=f"Plan completed: {steps_executed} steps executed",
)
return self._create_result(
status=ExecutionStatus.COMPLETED,
plan=plan,
context=context,
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
except Exception as e:
self.runtime.report_problem(
severity="critical",
description=str(e),
)
self.runtime.end_run(
success=False,
narrative=f"Execution failed: {e}",
)
return PlanExecutionResult(
status=ExecutionStatus.FAILED,
error=str(e),
feedback=f"Execution error: {e}",
feedback_context=plan.to_feedback_context(),
completed_steps=[s.id for s in plan.get_completed_steps()],
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency_ms=total_latency,
)
async def _handle_judgment(
self,
step: PlanStep,
work_result: StepExecutionResult,
judgment: Judgment,
plan: Plan,
goal: Goal,
context: dict[str, Any],
steps_executed: int,
total_tokens: int,
total_latency: int,
) -> PlanExecutionResult | None:
"""
Handle judgment and return result if execution should stop.
Returns None to continue execution, or PlanExecutionResult to stop.
"""
if judgment.action == JudgmentAction.ACCEPT:
# Step succeeded - update state and continue
step.status = StepStatus.COMPLETED
step.completed_at = datetime.now()
step.result = work_result.outputs
# Map outputs to expected output keys
# If output has generic "result" key but step expects specific keys, map it
outputs_to_store = work_result.outputs.copy()
if step.expected_outputs and "result" in outputs_to_store:
result_value = outputs_to_store["result"]
# For each expected output key that's not in outputs, map from "result"
for expected_key in step.expected_outputs:
if expected_key not in outputs_to_store:
outputs_to_store[expected_key] = result_value
# Update context with mapped outputs
context.update(outputs_to_store)
# Store in plan context for replanning feedback
plan.context[step.id] = outputs_to_store
return None # Continue execution
elif judgment.action == JudgmentAction.RETRY:
# Retry step if under limit
if step.attempts < step.max_retries:
step.status = StepStatus.PENDING
step.error = judgment.feedback
# Record retry decision
self.runtime.decide(
intent=f"Retry step {step.id}",
options=[{"id": "retry", "description": "Retry with feedback"}],
chosen="retry",
reasoning=judgment.reasoning,
context={"attempt": step.attempts, "feedback": judgment.feedback},
)
return None # Continue (step will be retried)
else:
# Max retries exceeded - escalate to replan
step.status = StepStatus.FAILED
step.error = f"Max retries ({step.max_retries}) exceeded: {judgment.feedback}"
return self._create_result(
status=ExecutionStatus.NEEDS_REPLAN,
plan=plan,
context=context,
feedback=(
f"Step '{step.id}' failed after {step.attempts} attempts: "
f"{judgment.feedback}"
),
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
elif judgment.action == JudgmentAction.REPLAN:
# Return to external planner
step.status = StepStatus.FAILED
step.error = judgment.feedback
return self._create_result(
status=ExecutionStatus.NEEDS_REPLAN,
plan=plan,
context=context,
feedback=judgment.feedback or f"Step '{step.id}' requires replanning",
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
elif judgment.action == JudgmentAction.ESCALATE:
# Request human intervention
return self._create_result(
status=ExecutionStatus.NEEDS_ESCALATION,
plan=plan,
context=context,
feedback=judgment.feedback or f"Step '{step.id}' requires human intervention",
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
return None # Unknown action - continue
def _create_result(
self,
status: ExecutionStatus,
plan: Plan,
context: dict[str, Any],
feedback: str | None = None,
steps_executed: int = 0,
total_tokens: int = 0,
total_latency: int = 0,
) -> PlanExecutionResult:
"""Create a PlanExecutionResult."""
return PlanExecutionResult(
status=status,
results=context,
feedback=feedback,
feedback_context=plan.to_feedback_context(),
completed_steps=[s.id for s in plan.get_completed_steps()],
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency_ms=total_latency,
)
def register_function(self, name: str, func: Callable) -> None:
"""Register a function for FUNCTION actions."""
self.functions[name] = func
self.worker.register_function(name, func)
def register_tool(self, tool: Tool) -> None:
"""Register a tool for TOOL_USE actions."""
self.tools[tool.name] = tool
self.worker.register_tool(tool)
def add_evaluation_rule(self, rule) -> None:
"""Add an evaluation rule to the judge."""
self.judge.add_rule(rule)
async def _request_approval(
self,
step: PlanStep,
context: dict[str, Any],
) -> ApprovalResult | None:
"""
Request human approval for a step.
Returns None if no callback is set (execution should pause).
"""
if self.approval_callback is None:
return None
# Build preview of what will happen
preview_parts = []
if step.action.tool_name:
preview_parts.append(f"Tool: {step.action.tool_name}")
if step.action.tool_args:
import json
args_preview = json.dumps(step.action.tool_args, indent=2, default=str)
if len(args_preview) > 500:
args_preview = args_preview[:500] + "..."
preview_parts.append(f"Args: {args_preview}")
elif step.action.prompt:
prompt_preview = (
step.action.prompt[:300] + "..."
if len(step.action.prompt) > 300
else step.action.prompt
)
preview_parts.append(f"Prompt: {prompt_preview}")
# Include step inputs resolved from context (what will be sent/used)
relevant_context = {}
for input_key, input_value in step.inputs.items():
# Resolve variable references like "$email_sequence"
if isinstance(input_value, str) and input_value.startswith("$"):
context_key = input_value[1:] # Remove $ prefix
if context_key in context:
relevant_context[input_key] = context[context_key]
else:
relevant_context[input_key] = input_value
request = ApprovalRequest(
step_id=step.id,
step_description=step.description,
action_type=step.action.action_type.value,
action_details={
"tool_name": step.action.tool_name,
"tool_args": step.action.tool_args,
"prompt": step.action.prompt,
},
context=relevant_context,
approval_message=step.approval_message,
preview="\n".join(preview_parts) if preview_parts else None,
)
return self.approval_callback(request)
def _skip_dependent_steps(self, plan: Plan, rejected_step_id: str) -> None:
"""Mark steps that depend on a rejected step as skipped."""
for step in plan.steps:
if rejected_step_id in step.dependencies:
if step.status == StepStatus.PENDING:
step.status = StepStatus.SKIPPED
step.error = f"Skipped because dependency '{rejected_step_id}' was rejected"
# Recursively skip dependents
self._skip_dependent_steps(plan, step.id)
def _apply_modifications(self, step: PlanStep, modifications: dict[str, Any]) -> None:
"""Apply human modifications to a step before execution."""
# Allow modifying tool args
if "tool_args" in modifications and step.action.tool_args:
step.action.tool_args.update(modifications["tool_args"])
# Allow modifying prompt
if "prompt" in modifications:
step.action.prompt = modifications["prompt"]
# Allow modifying inputs
if "inputs" in modifications:
step.inputs.update(modifications["inputs"])
def set_approval_callback(self, callback: ApprovalCallback) -> None:
"""Set the approval callback for HITL steps."""
self.approval_callback = callback
# Convenience function for simple execution
async def execute_plan(
plan: Plan,
goal: Goal,
runtime: Runtime,
llm: LLMProvider | None = None,
tools: dict[str, Tool] | None = None,
tool_executor: Callable | None = None,
context: dict[str, Any] | None = None,
) -> PlanExecutionResult:
"""
Execute a plan with default configuration.
Convenience function for simple use cases.
"""
executor = FlexibleGraphExecutor(
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
)
return await executor.execute_plan(plan, goal, context)
-406
View File
@@ -1,406 +0,0 @@
"""
Hybrid Judge for Evaluating Plan Step Results.
The HybridJudge evaluates step execution results using:
1. Rule-based evaluation (fast, deterministic)
2. LLM-based evaluation (fallback for ambiguous cases)
Escalation path: rules LLM human
"""
from dataclasses import dataclass, field
from typing import Any
from framework.graph.code_sandbox import safe_eval
from framework.graph.goal import Goal
from framework.graph.plan import (
EvaluationRule,
Judgment,
JudgmentAction,
PlanStep,
)
from framework.llm.provider import LLMProvider
@dataclass
class RuleEvaluationResult:
"""Result of rule-based evaluation."""
is_definitive: bool # True if a rule matched definitively
judgment: Judgment | None = None
context: dict[str, Any] = field(default_factory=dict)
rules_checked: int = 0
rule_matched: str | None = None
class HybridJudge:
"""
Evaluates plan step results using rules first, then LLM fallback.
Usage:
judge = HybridJudge(llm=llm_provider)
judge.add_rule(EvaluationRule(
id="success_check",
condition="result.get('success') == True",
action=JudgmentAction.ACCEPT,
))
judgment = await judge.evaluate(step, result, goal)
"""
def __init__(
self,
llm: LLMProvider | None = None,
rules: list[EvaluationRule] | None = None,
llm_confidence_threshold: float = 0.7,
):
"""
Initialize the HybridJudge.
Args:
llm: LLM provider for ambiguous cases
rules: Initial evaluation rules
llm_confidence_threshold: Confidence below this triggers escalation
"""
self.llm = llm
self.rules: list[EvaluationRule] = rules or []
self.llm_confidence_threshold = llm_confidence_threshold
# Sort rules by priority (higher first)
self._sort_rules()
def _sort_rules(self):
"""Sort rules by priority."""
self.rules.sort(key=lambda r: -r.priority)
def add_rule(self, rule: EvaluationRule) -> None:
"""Add an evaluation rule."""
self.rules.append(rule)
self._sort_rules()
def remove_rule(self, rule_id: str) -> bool:
"""Remove a rule by ID. Returns True if found and removed."""
for i, rule in enumerate(self.rules):
if rule.id == rule_id:
self.rules.pop(i)
return True
return False
async def evaluate(
self,
step: PlanStep,
result: Any,
goal: Goal,
context: dict[str, Any] | None = None,
) -> Judgment:
"""
Evaluate a step result.
Args:
step: The executed plan step
result: The result of executing the step
goal: The goal context for evaluation
context: Additional context from previous steps
Returns:
Judgment with action and feedback
"""
context = context or {}
# Try rule-based evaluation first
rule_result = self._evaluate_rules(step, result, goal, context)
if rule_result.is_definitive:
return rule_result.judgment
# Fall back to LLM evaluation
if self.llm:
return await self._evaluate_llm(step, result, goal, context, rule_result)
# No LLM available - default to accept with low confidence
return Judgment(
action=JudgmentAction.ACCEPT,
reasoning="No definitive rule matched and no LLM available for evaluation",
confidence=0.5,
llm_used=False,
)
def _evaluate_rules(
self,
step: PlanStep,
result: Any,
goal: Goal,
context: dict[str, Any],
) -> RuleEvaluationResult:
"""Evaluate step using rules."""
rules_checked = 0
# Build evaluation context
eval_context = {
"step": step.model_dump() if hasattr(step, "model_dump") else step,
"result": result,
"goal": goal.model_dump() if hasattr(goal, "model_dump") else goal,
"context": context,
"success": isinstance(result, dict) and result.get("success", False),
"error": isinstance(result, dict) and result.get("error"),
}
for rule in self.rules:
rules_checked += 1
# Evaluate rule condition
eval_result = safe_eval(rule.condition, eval_context)
if eval_result.success and eval_result.result:
# Rule matched!
feedback = self._format_feedback(rule.feedback_template, eval_context)
return RuleEvaluationResult(
is_definitive=True,
judgment=Judgment(
action=rule.action,
reasoning=rule.description,
feedback=feedback if feedback else None,
rule_matched=rule.id,
confidence=1.0,
llm_used=False,
),
rules_checked=rules_checked,
rule_matched=rule.id,
)
# No rule matched definitively
return RuleEvaluationResult(
is_definitive=False,
context=eval_context,
rules_checked=rules_checked,
)
def _format_feedback(
self,
template: str,
context: dict[str, Any],
) -> str:
"""Format feedback template with context values."""
if not template:
return ""
try:
return template.format(**context)
except (KeyError, ValueError):
return template
async def _evaluate_llm(
self,
step: PlanStep,
result: Any,
goal: Goal,
context: dict[str, Any],
rule_result: RuleEvaluationResult,
) -> Judgment:
"""Evaluate step using LLM."""
system_prompt = self._build_llm_system_prompt(goal)
user_prompt = self._build_llm_user_prompt(step, result, context, rule_result)
try:
response = self.llm.complete(
messages=[{"role": "user", "content": user_prompt}],
system=system_prompt,
)
# Parse LLM response
judgment = self._parse_llm_response(response.content)
judgment.llm_used = True
# Check confidence threshold
if judgment.confidence < self.llm_confidence_threshold:
# Low confidence - escalate
return Judgment(
action=JudgmentAction.ESCALATE,
reasoning=(
f"LLM confidence ({judgment.confidence:.2f}) "
f"below threshold ({self.llm_confidence_threshold})"
),
feedback=judgment.feedback,
confidence=judgment.confidence,
llm_used=True,
context={"original_judgment": judgment.model_dump()},
)
return judgment
except Exception as e:
# LLM failed - escalate
return Judgment(
action=JudgmentAction.ESCALATE,
reasoning=f"LLM evaluation failed: {e}",
feedback="Human review needed due to LLM error",
llm_used=True,
)
def _build_llm_system_prompt(self, goal: Goal) -> str:
"""Build system prompt for LLM judge."""
return f"""You are a judge evaluating the execution of a plan step.
GOAL: {goal.description}
SUCCESS CRITERIA:
{chr(10).join(f"- {sc.description}" for sc in goal.success_criteria)}
CONSTRAINTS:
{chr(10).join(f"- {c.description}" for c in goal.constraints)}
Your task is to evaluate whether the step was executed successfully and decide the next action.
Respond in this exact format:
ACTION: [ACCEPT|RETRY|REPLAN|ESCALATE]
CONFIDENCE: [0.0-1.0]
REASONING: [Your reasoning]
FEEDBACK: [Feedback for retry/replan, or empty if accepting]
Actions:
- ACCEPT: Step completed successfully, continue to next step
- RETRY: Step failed but can be retried with feedback
- REPLAN: Step failed in a way that requires replanning
- ESCALATE: Requires human intervention
"""
def _build_llm_user_prompt(
self,
step: PlanStep,
result: Any,
context: dict[str, Any],
rule_result: RuleEvaluationResult,
) -> str:
"""Build user prompt for LLM judge."""
return f"""Evaluate this step execution:
STEP: {step.description}
STEP ID: {step.id}
ACTION TYPE: {step.action.action_type}
EXPECTED OUTPUTS: {step.expected_outputs}
RESULT:
{result}
CONTEXT FROM PREVIOUS STEPS:
{context}
RULES CHECKED: {rule_result.rules_checked} (none matched definitively)
Please evaluate and provide your judgment."""
def _parse_llm_response(self, response: str) -> Judgment:
"""Parse LLM response into Judgment."""
lines = response.strip().split("\n")
action = JudgmentAction.ACCEPT
confidence = 0.8
reasoning = ""
feedback = ""
for line in lines:
line = line.strip()
if line.startswith("ACTION:"):
action_str = line.split(":", 1)[1].strip().upper()
try:
action = JudgmentAction(action_str.lower())
except ValueError:
action = JudgmentAction.ESCALATE
elif line.startswith("CONFIDENCE:"):
try:
confidence = float(line.split(":", 1)[1].strip())
except ValueError:
confidence = 0.5
elif line.startswith("REASONING:"):
reasoning = line.split(":", 1)[1].strip()
elif line.startswith("FEEDBACK:"):
feedback = line.split(":", 1)[1].strip()
return Judgment(
action=action,
reasoning=reasoning or "LLM evaluation",
feedback=feedback if feedback else None,
confidence=confidence,
)
# Factory function for creating judge with common rules
def create_default_judge(llm: LLMProvider | None = None) -> HybridJudge:
"""
Create a HybridJudge with commonly useful default rules.
Args:
llm: LLM provider for fallback evaluation
Returns:
Configured HybridJudge instance
"""
judge = HybridJudge(llm=llm)
# Rule: Accept on explicit success flag
judge.add_rule(
EvaluationRule(
id="explicit_success",
description="Step explicitly marked as successful",
condition="isinstance(result, dict) and result.get('success') == True",
action=JudgmentAction.ACCEPT,
priority=100,
)
)
# Rule: Retry on transient errors
judge.add_rule(
EvaluationRule(
id="transient_error_retry",
description="Transient error that can be retried",
condition=(
"isinstance(result, dict) and "
"result.get('error_type') in ['timeout', 'rate_limit', 'connection_error']"
),
action=JudgmentAction.RETRY,
feedback_template="Transient error: {result[error]}. Please retry.",
priority=90,
)
)
# Rule: Replan on missing data
judge.add_rule(
EvaluationRule(
id="missing_data_replan",
description="Required data not available",
condition="isinstance(result, dict) and result.get('error_type') == 'missing_data'",
action=JudgmentAction.REPLAN,
feedback_template="Missing required data: {result[error]}. Plan needs adjustment.",
priority=80,
)
)
# Rule: Escalate on security issues
judge.add_rule(
EvaluationRule(
id="security_escalate",
description="Security issue detected",
condition="isinstance(result, dict) and result.get('error_type') == 'security'",
action=JudgmentAction.ESCALATE,
feedback_template="Security issue detected: {result[error]}",
priority=200,
)
)
# Rule: Fail on max retries exceeded
judge.add_rule(
EvaluationRule(
id="max_retries_fail",
description="Maximum retries exceeded",
condition="step.get('attempts', 0) >= step.get('max_retries', 3)",
action=JudgmentAction.REPLAN,
feedback_template="Step '{step[id]}' failed after {step[attempts]} attempts",
priority=150,
)
)
return judge
File diff suppressed because it is too large Load Diff
+2 -2
View File
@@ -206,7 +206,7 @@ class OutputCleaner:
warnings=warnings,
)
def clean_output(
async def clean_output(
self,
output: dict[str, Any],
source_node_id: str,
@@ -288,7 +288,7 @@ Return ONLY valid JSON matching the expected schema. No explanations, no markdow
f"🧹 Cleaning output from '{source_node_id}' using {self.config.fast_model}"
)
response = self.llm.complete(
response = await self.llm.acomplete(
messages=[{"role": "user", "content": prompt}],
system=(
"You clean malformed agent outputs. Return only valid JSON matching the schema."
-513
View File
@@ -1,513 +0,0 @@
"""
Plan Data Structures for Flexible Execution.
Plans are created externally (by Claude Code or another LLM agent) and
executed internally by the FlexibleGraphExecutor with Worker-Judge loop.
The Plan is the contract between the external planner and the executor:
- Planner creates a Plan with PlanSteps
- Executor runs steps and judges results
- If replanning needed, returns feedback to external planner
"""
from datetime import datetime
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
class ActionType(StrEnum):
"""Types of actions a PlanStep can perform."""
LLM_CALL = "llm_call" # Call LLM for generation
TOOL_USE = "tool_use" # Use a registered tool
SUB_GRAPH = "sub_graph" # Execute a sub-graph
FUNCTION = "function" # Call a Python function
CODE_EXECUTION = "code_execution" # Execute dynamic code (sandboxed)
class StepStatus(StrEnum):
"""Status of a plan step."""
PENDING = "pending"
AWAITING_APPROVAL = "awaiting_approval" # Waiting for human approval
IN_PROGRESS = "in_progress"
COMPLETED = "completed"
FAILED = "failed"
SKIPPED = "skipped"
REJECTED = "rejected" # Human rejected execution
def is_terminal(self) -> bool:
"""Check if this status represents a terminal (finished) state.
Terminal states are states where the step will not execute further,
either because it completed successfully or failed/was skipped.
"""
return self in (
StepStatus.COMPLETED,
StepStatus.FAILED,
StepStatus.SKIPPED,
StepStatus.REJECTED,
)
def is_successful(self) -> bool:
"""Check if this status represents successful completion."""
return self == StepStatus.COMPLETED
class ApprovalDecision(StrEnum):
"""Human decision on a step requiring approval."""
APPROVE = "approve" # Execute as planned
REJECT = "reject" # Skip this step
MODIFY = "modify" # Execute with modifications
ABORT = "abort" # Stop entire execution
class ApprovalRequest(BaseModel):
"""Request for human approval before executing a step."""
step_id: str
step_description: str
action_type: str
action_details: dict[str, Any] = Field(default_factory=dict)
context: dict[str, Any] = Field(default_factory=dict)
approval_message: str | None = None
# Preview of what will happen
preview: str | None = None
model_config = {"extra": "allow"}
class ApprovalResult(BaseModel):
"""Result of human approval decision."""
decision: ApprovalDecision
reason: str | None = None
modifications: dict[str, Any] = Field(default_factory=dict)
model_config = {"extra": "allow"}
class JudgmentAction(StrEnum):
"""Actions the judge can take after evaluating a step."""
ACCEPT = "accept" # Step completed successfully, continue
RETRY = "retry" # Retry the step with feedback
REPLAN = "replan" # Return to external planner for new plan
ESCALATE = "escalate" # Request human intervention
class ActionSpec(BaseModel):
"""
Specification for an action to be executed.
This is the "what to do" part of a PlanStep.
"""
action_type: ActionType
# For LLM_CALL
prompt: str | None = None
system_prompt: str | None = None
model: str | None = None
# For TOOL_USE
tool_name: str | None = None
tool_args: dict[str, Any] = Field(default_factory=dict)
# For SUB_GRAPH
graph_id: str | None = None
# For FUNCTION
function_name: str | None = None
function_args: dict[str, Any] = Field(default_factory=dict)
# For CODE_EXECUTION
code: str | None = None
language: str = "python"
model_config = {"extra": "allow"}
class PlanStep(BaseModel):
"""
A single step in a plan.
Created by external planner, executed by Worker, evaluated by Judge.
"""
id: str
description: str
action: ActionSpec
# Data flow
inputs: dict[str, Any] = Field(
default_factory=dict,
description="Input data for this step (can reference previous step outputs)",
)
expected_outputs: list[str] = Field(
default_factory=list, description="Keys this step should produce"
)
# Dependencies
dependencies: list[str] = Field(
default_factory=list, description="IDs of steps that must complete before this one"
)
# Human-in-the-loop (HITL)
requires_approval: bool = Field(
default=False, description="If True, requires human approval before execution"
)
approval_message: str | None = Field(
default=None, description="Message to show human when requesting approval"
)
# Execution state
status: StepStatus = StepStatus.PENDING
result: Any | None = None
error: str | None = None
attempts: int = 0
max_retries: int = 3
# Metadata
started_at: datetime | None = None
completed_at: datetime | None = None
model_config = {"extra": "allow"}
def is_ready(self, terminal_step_ids: set[str]) -> bool:
"""Check if this step is ready to execute (all dependencies finished).
A step is ready when:
1. Its status is PENDING (not yet started)
2. All its dependencies are in a terminal state (completed, failed, skipped, or rejected)
Note: This allows dependent steps to become "ready" even if their dependencies
failed. The executor should check if any dependencies failed and handle
accordingly (e.g., skip the step or mark it as blocked).
Args:
terminal_step_ids: Set of step IDs that are in a terminal state
"""
if self.status != StepStatus.PENDING:
return False
return all(dep in terminal_step_ids for dep in self.dependencies)
class Judgment(BaseModel):
"""
Result of judging a step execution.
The Judge evaluates step results and decides what to do next.
"""
action: JudgmentAction
reasoning: str
feedback: str | None = None # For retry/replan - what went wrong
# For rule-based judgments
rule_matched: str | None = None
# For LLM-based judgments
confidence: float = 1.0
llm_used: bool = False
# Context for replanning
context: dict[str, Any] = Field(default_factory=dict)
model_config = {"extra": "allow"}
class EvaluationRule(BaseModel):
"""
A rule for the HybridJudge to evaluate step results.
Rules are checked before falling back to LLM evaluation.
"""
id: str
description: str
# Condition (Python expression evaluated with result, step, goal context)
condition: str
# What to do if condition matches
action: JudgmentAction
feedback_template: str = "" # Can use {result}, {step}, etc.
# Priority (higher = checked first)
priority: int = 0
model_config = {"extra": "allow"}
class Plan(BaseModel):
"""
A complete execution plan.
Created by external planner (Claude Code, etc).
Executed by FlexibleGraphExecutor.
"""
id: str
goal_id: str
description: str
# Steps to execute
steps: list[PlanStep] = Field(default_factory=list)
# Execution state
revision: int = 1 # Incremented on replan
current_step_idx: int = 0
# Accumulated context from execution
context: dict[str, Any] = Field(default_factory=dict)
# Metadata
created_at: datetime = Field(default_factory=datetime.now)
created_by: str = "external" # Who created this plan
# Previous attempt info (for replanning)
previous_feedback: str | None = None
model_config = {"extra": "allow"}
@classmethod
def from_json(cls, data: str | dict) -> "Plan":
"""
Load a Plan from exported JSON.
This handles the output from export_graph() and properly converts
action_type strings to ActionType enums.
Args:
data: JSON string or dict from export_graph()
Returns:
Plan object ready for FlexibleGraphExecutor
Example:
# Load from export_graph() output
exported = export_graph()
plan = Plan.from_json(exported)
# Load from file
with open("plan.json") as f:
plan = Plan.from_json(json.load(f))
"""
import json as json_module
if isinstance(data, str):
data = json_module.loads(data)
# Handle nested "plan" key from export_graph output
if "plan" in data:
data = data["plan"]
# Convert steps
steps = []
for step_data in data.get("steps", []):
action_data = step_data.get("action", {})
# Convert action_type string to enum
action_type_str = action_data.get("action_type", "function")
action_type = ActionType(action_type_str)
action = ActionSpec(
action_type=action_type,
prompt=action_data.get("prompt"),
system_prompt=action_data.get("system_prompt"),
tool_name=action_data.get("tool_name"),
tool_args=action_data.get("tool_args", {}),
function_name=action_data.get("function_name"),
function_args=action_data.get("function_args", {}),
code=action_data.get("code"),
)
step = PlanStep(
id=step_data["id"],
description=step_data.get("description", ""),
action=action,
inputs=step_data.get("inputs", {}),
expected_outputs=step_data.get("expected_outputs", []),
dependencies=step_data.get("dependencies", []),
requires_approval=step_data.get("requires_approval", False),
approval_message=step_data.get("approval_message"),
)
steps.append(step)
return cls(
id=data.get("id", "plan"),
goal_id=data.get("goal_id", ""),
description=data.get("description", ""),
steps=steps,
context=data.get("context", {}),
revision=data.get("revision", 1),
)
def get_step(self, step_id: str) -> PlanStep | None:
"""Get a step by ID."""
for step in self.steps:
if step.id == step_id:
return step
return None
def get_ready_steps(self) -> list[PlanStep]:
"""Get all steps that are ready to execute.
A step is ready when all its dependencies are in terminal states
(completed, failed, skipped, or rejected).
"""
terminal_ids = {s.id for s in self.steps if s.status.is_terminal()}
return [s for s in self.steps if s.is_ready(terminal_ids)]
def get_completed_steps(self) -> list[PlanStep]:
"""Get all completed steps."""
return [s for s in self.steps if s.status == StepStatus.COMPLETED]
def is_complete(self) -> bool:
"""Check if all steps are in terminal states (finished executing).
Returns True when all steps have reached a terminal state, regardless
of whether they succeeded or failed. Use has_failed_steps() to check
if any steps failed.
"""
return all(s.status.is_terminal() for s in self.steps)
def is_successful(self) -> bool:
"""Check if all steps completed successfully."""
return all(s.status == StepStatus.COMPLETED for s in self.steps)
def has_failed_steps(self) -> bool:
"""Check if any steps failed, were skipped, or were rejected."""
return any(
s.status in (StepStatus.FAILED, StepStatus.SKIPPED, StepStatus.REJECTED)
for s in self.steps
)
def get_failed_steps(self) -> list[PlanStep]:
"""Get all steps that failed, were skipped, or were rejected."""
return [
s
for s in self.steps
if s.status in (StepStatus.FAILED, StepStatus.SKIPPED, StepStatus.REJECTED)
]
def to_feedback_context(self) -> dict[str, Any]:
"""Create context for replanning."""
return {
"plan_id": self.id,
"revision": self.revision,
"completed_steps": [
{
"id": s.id,
"description": s.description,
"result": s.result,
}
for s in self.get_completed_steps()
],
"failed_steps": [
{
"id": s.id,
"description": s.description,
"error": s.error,
"attempts": s.attempts,
}
for s in self.steps
if s.status == StepStatus.FAILED
],
"context": self.context,
}
class ExecutionStatus(StrEnum):
"""Status of plan execution."""
COMPLETED = "completed"
AWAITING_APPROVAL = "awaiting_approval" # Paused for human approval
NEEDS_REPLAN = "needs_replan"
NEEDS_ESCALATION = "needs_escalation"
REJECTED = "rejected" # Human rejected a step
ABORTED = "aborted" # Human aborted execution
FAILED = "failed"
class PlanExecutionResult(BaseModel):
"""
Result of executing a plan.
Returned to external planner with status and feedback.
"""
status: ExecutionStatus
# Results from completed steps
results: dict[str, Any] = Field(default_factory=dict)
# For needs_replan - what to tell the planner
feedback: str | None = None
feedback_context: dict[str, Any] = Field(default_factory=dict)
# Steps that completed before stopping
completed_steps: list[str] = Field(default_factory=list)
# Metrics
steps_executed: int = 0
total_tokens: int = 0
total_latency_ms: int = 0
# Error info (for failed status)
error: str | None = None
model_config = {"extra": "allow"}
def load_export(data: str | dict) -> tuple["Plan", Any]:
"""
Load both Plan and Goal from export_graph() output.
The export_graph() MCP tool returns both the plan and the goal that was
defined and approved during the agent building process. This function
loads both so you can use them with FlexibleGraphExecutor.
Args:
data: JSON string or dict from export_graph()
Returns:
Tuple of (Plan, Goal) ready for FlexibleGraphExecutor
Example:
# Load from export_graph() output
exported = export_graph()
plan, goal = load_export(exported)
result = await executor.execute_plan(plan, goal, context)
"""
import json as json_module
from framework.graph.goal import Goal
if isinstance(data, str):
data = json_module.loads(data)
# Load plan
plan = Plan.from_json(data)
# Load goal
goal_data = data.get("goal", {})
if goal_data:
goal = Goal.model_validate(goal_data)
else:
# Fallback: create minimal goal from plan metadata
goal = Goal(
id=plan.goal_id,
name=plan.goal_id,
description=plan.description,
success_criteria=[],
constraints=[],
)
return plan, goal
+230
View File
@@ -0,0 +1,230 @@
"""Prompt composition for continuous agent mode.
Composes the three-layer system prompt (onion model) and generates
transition markers inserted into the conversation at phase boundaries.
Layer 1 Identity (static, defined at agent level, never changes):
"You are a thorough research agent. You prefer clarity over jargon..."
Layer 2 Narrative (auto-generated from conversation/memory state):
"We've finished scoping the project. The user wants to focus on..."
Layer 3 Focus (per-node system_prompt, reframed as focus directive):
"Your current attention: synthesize findings into a report..."
"""
from __future__ import annotations
import logging
from datetime import datetime
from pathlib import Path
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from framework.graph.edge import GraphSpec
from framework.graph.node import NodeSpec, SharedMemory
logger = logging.getLogger(__name__)
def _with_datetime(prompt: str) -> str:
"""Append current datetime with local timezone to a system prompt."""
local = datetime.now().astimezone()
stamp = f"Current date and time: {local.strftime('%Y-%m-%d %H:%M %Z (UTC%z)')}"
return f"{prompt}\n\n{stamp}" if prompt else stamp
def build_accounts_prompt(accounts: list[dict[str, Any]]) -> str:
"""Build a prompt section describing connected accounts.
Args:
accounts: List of account info dicts from CredentialStoreAdapter.get_all_account_info().
Returns:
Formatted accounts block, or empty string if no accounts.
"""
if not accounts:
return ""
lines = [
"Connected accounts (use the alias as the `account` parameter "
"when calling tools to target a specific account):"
]
for acct in accounts:
provider = acct.get("provider", "unknown")
alias = acct.get("alias", "unknown")
identity = acct.get("identity", {})
detail_parts = [f"{k}: {v}" for k, v in identity.items() if v]
detail = f" ({', '.join(detail_parts)})" if detail_parts else ""
lines.append(f"- {provider}/{alias}{detail}")
return "\n".join(lines)
def compose_system_prompt(
identity_prompt: str | None,
focus_prompt: str | None,
narrative: str | None = None,
accounts_prompt: str | None = None,
) -> str:
"""Compose the three-layer system prompt.
Args:
identity_prompt: Layer 1 static agent identity (from GraphSpec).
focus_prompt: Layer 3 per-node focus directive (from NodeSpec.system_prompt).
narrative: Layer 2 auto-generated from conversation state.
accounts_prompt: Connected accounts block (sits between identity and narrative).
Returns:
Composed system prompt with all layers present, plus current datetime.
"""
parts: list[str] = []
# Layer 1: Identity (always first, anchors the personality)
if identity_prompt:
parts.append(identity_prompt)
# Accounts (semi-static, deployment-specific)
if accounts_prompt:
parts.append(f"\n{accounts_prompt}")
# Layer 2: Narrative (what's happened so far)
if narrative:
parts.append(f"\n--- Context (what has happened so far) ---\n{narrative}")
# Layer 3: Focus (current phase directive)
if focus_prompt:
parts.append(f"\n--- Current Focus ---\n{focus_prompt}")
return _with_datetime("\n".join(parts) if parts else "")
def build_narrative(
memory: SharedMemory,
execution_path: list[str],
graph: GraphSpec,
) -> str:
"""Build Layer 2 (narrative) from structured state.
Deterministic no LLM call. Reads SharedMemory and execution path
to describe what has happened so far. Cheap and fast.
Args:
memory: Current shared memory state.
execution_path: List of node IDs visited so far.
graph: Graph spec (for node names/descriptions).
Returns:
Narrative string describing the session state.
"""
parts: list[str] = []
# Describe execution path
if execution_path:
phase_descriptions: list[str] = []
for node_id in execution_path:
node_spec = graph.get_node(node_id)
if node_spec:
phase_descriptions.append(f"- {node_spec.name}: {node_spec.description}")
else:
phase_descriptions.append(f"- {node_id}")
parts.append("Phases completed:\n" + "\n".join(phase_descriptions))
# Describe key memory values (skip very long values)
all_memory = memory.read_all()
if all_memory:
memory_lines: list[str] = []
for key, value in all_memory.items():
if value is None:
continue
val_str = str(value)
if len(val_str) > 200:
val_str = val_str[:200] + "..."
memory_lines.append(f"- {key}: {val_str}")
if memory_lines:
parts.append("Current state:\n" + "\n".join(memory_lines))
return "\n\n".join(parts) if parts else ""
def build_transition_marker(
previous_node: NodeSpec,
next_node: NodeSpec,
memory: SharedMemory,
cumulative_tool_names: list[str],
data_dir: Path | str | None = None,
adapt_content: str | None = None,
) -> str:
"""Build a 'State of the World' transition marker.
Inserted into the conversation as a user message at phase boundaries.
Gives the LLM full situational awareness: what happened, what's stored,
what tools are available, and what to focus on next.
Args:
previous_node: NodeSpec of the phase just completed.
next_node: NodeSpec of the phase about to start.
memory: Current shared memory state.
cumulative_tool_names: All tools available (cumulative set).
data_dir: Path to spillover data directory.
adapt_content: Agent working memory (adapt.md) content.
Returns:
Transition marker message text.
"""
sections: list[str] = []
# Header
sections.append(f"--- PHASE TRANSITION: {previous_node.name}{next_node.name} ---")
# What just completed
sections.append(f"\nCompleted: {previous_node.name}")
sections.append(f" {previous_node.description}")
# Outputs in memory
all_memory = memory.read_all()
if all_memory:
memory_lines: list[str] = []
for key, value in all_memory.items():
if value is None:
continue
val_str = str(value)
if len(val_str) > 300:
val_str = val_str[:300] + "..."
memory_lines.append(f" {key}: {val_str}")
if memory_lines:
sections.append("\nOutputs available:\n" + "\n".join(memory_lines))
# Files in data directory
if data_dir:
data_path = Path(data_dir)
if data_path.exists():
files = sorted(data_path.iterdir())
if files:
file_lines = [
f" {f.name} ({f.stat().st_size:,} bytes)" for f in files if f.is_file()
]
if file_lines:
sections.append(
"\nData files (use load_data to access):\n" + "\n".join(file_lines)
)
# Agent working memory
if adapt_content:
sections.append(f"\n--- Agent Memory ---\n{adapt_content}")
# Available tools
if cumulative_tool_names:
sections.append("\nAvailable tools: " + ", ".join(sorted(cumulative_tool_names)))
# Next phase
sections.append(f"\nNow entering: {next_node.name}")
sections.append(f" {next_node.description}")
# Reflection prompt (engineered metacognition)
sections.append(
"\nBefore proceeding, briefly reflect: what went well in the "
"previous phase? Are there any gaps or surprises worth noting?"
)
sections.append("\n--- END TRANSITION ---")
return "\n".join(sections)
+1 -1
View File
@@ -145,7 +145,7 @@ class SafeEvalVisitor(ast.NodeVisitor):
def visit_Attribute(self, node: ast.Attribute) -> Any:
# value.attr
# STIRCT CHECK: No access to private attributes (starting with _)
# STRICT CHECK: No access to private attributes (starting with _)
if node.attr.startswith("_"):
raise ValueError(f"Access to private attribute '{node.attr}' is not allowed")
-620
View File
@@ -1,620 +0,0 @@
"""
Worker Node for Executing Plan Steps.
The Worker executes individual plan steps by dispatching to the
appropriate executor based on action type:
- LLM calls
- Tool usage
- Sub-graph execution
- Function calls
- Code execution (sandboxed)
"""
import json
import logging
import re
import time
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any
from framework.graph.code_sandbox import CodeSandbox
from framework.graph.plan import (
ActionSpec,
ActionType,
PlanStep,
)
from framework.llm.provider import LLMProvider, Tool
from framework.runtime.core import Runtime
logger = logging.getLogger(__name__)
def parse_llm_json_response(text: str) -> tuple[Any | None, str]:
"""
Parse JSON from LLM response, handling markdown code blocks.
LLMs often return JSON wrapped in markdown code blocks like:
```json
{"key": "value"}
```
This function extracts and parses the JSON.
Args:
text: Raw LLM response text
Returns:
Tuple of (parsed_json_or_None, cleaned_text)
"""
if not isinstance(text, str):
return None, str(text)
cleaned = text.strip()
# Try to extract JSON from markdown code blocks
# Pattern: ```json ... ``` or ``` ... ```
code_block_pattern = r"```(?:json)?\s*([\s\S]*?)\s*```"
matches = re.findall(code_block_pattern, cleaned)
if matches:
# Try to parse each match
for match in matches:
try:
parsed = json.loads(match.strip())
return parsed, match.strip()
except json.JSONDecodeError as e:
logger.debug(
f"Failed to parse JSON from code block: {e}. "
f"Content preview: {match.strip()[:100]}..."
)
continue
# No code blocks or parsing failed - try parsing the whole response
try:
parsed = json.loads(cleaned)
return parsed, cleaned
except json.JSONDecodeError as e:
logger.debug(
f"Failed to parse entire response as JSON: {e}. Content preview: {cleaned[:100]}..."
)
# Try to find JSON-like content (starts with { or [)
json_start_pattern = r"(\{[\s\S]*\}|\[[\s\S]*\])"
json_matches = re.findall(json_start_pattern, cleaned)
for match in json_matches:
try:
parsed = json.loads(match)
return parsed, match
except json.JSONDecodeError as e:
logger.debug(f"Failed to parse JSON pattern: {e}. Content preview: {match[:100]}...")
continue
# Could not parse as JSON - log warning
logger.warning(
f"Could not parse LLM response as JSON after trying all strategies. "
f"Response preview: {cleaned[:200]}..."
)
return None, cleaned
@dataclass
class StepExecutionResult:
"""Result of executing a plan step."""
success: bool
outputs: dict[str, Any] = field(default_factory=dict)
error: str | None = None
error_type: str | None = None # For judge rules: timeout, rate_limit, etc.
# Metadata
tokens_used: int = 0
latency_ms: int = 0
executor_type: str = ""
class WorkerNode:
"""
Executes plan steps by dispatching to appropriate executors.
Usage:
worker = WorkerNode(
runtime=runtime,
llm=llm_provider,
tools=tool_registry,
)
result = await worker.execute(step, context)
"""
def __init__(
self,
runtime: Runtime,
llm: LLMProvider | None = None,
tools: dict[str, Tool] | None = None,
tool_executor: Callable | None = None,
functions: dict[str, Callable] | None = None,
sub_graph_executor: Callable | None = None,
sandbox: CodeSandbox | None = None,
):
"""
Initialize the Worker.
Args:
runtime: Runtime for decision logging
llm: LLM provider for LLM_CALL actions
tools: Available tools for TOOL_USE actions
tool_executor: Function to execute tools
functions: Registered functions for FUNCTION actions
sub_graph_executor: Function to execute sub-graphs
sandbox: Code sandbox for CODE_EXECUTION actions
"""
self.runtime = runtime
self.llm = llm
self.tools = tools or {}
self.tool_executor = tool_executor
self.functions = functions or {}
self.sub_graph_executor = sub_graph_executor
self.sandbox = sandbox or CodeSandbox()
async def execute(
self,
step: PlanStep,
context: dict[str, Any],
) -> StepExecutionResult:
"""
Execute a plan step.
Args:
step: The step to execute
context: Current execution context
Returns:
StepExecutionResult with outputs and status
"""
# Record decision
decision_id = self.runtime.decide(
intent=f"Execute plan step: {step.description}",
options=[
{
"id": step.action.action_type.value,
"description": f"Execute {step.action.action_type.value} action",
"action_type": step.action.action_type.value,
}
],
chosen=step.action.action_type.value,
reasoning=f"Step requires {step.action.action_type.value}",
context={"step_id": step.id, "inputs": step.inputs},
)
start_time = time.time()
try:
# Resolve inputs from context
resolved_inputs = self._resolve_inputs(step.inputs, context)
# Dispatch to appropriate executor
result = await self._dispatch(step.action, resolved_inputs, context)
latency_ms = int((time.time() - start_time) * 1000)
result.latency_ms = latency_ms
# Record outcome
self.runtime.record_outcome(
decision_id=decision_id,
success=result.success,
result=result.outputs if result.success else result.error,
tokens_used=result.tokens_used,
latency_ms=latency_ms,
)
return result
except Exception as e:
latency_ms = int((time.time() - start_time) * 1000)
self.runtime.record_outcome(
decision_id=decision_id,
success=False,
error=str(e),
latency_ms=latency_ms,
)
return StepExecutionResult(
success=False,
error=str(e),
error_type="exception",
latency_ms=latency_ms,
)
def _resolve_inputs(
self,
inputs: dict[str, Any],
context: dict[str, Any],
) -> dict[str, Any]:
"""Resolve input references from context."""
resolved = {}
for key, value in inputs.items():
if isinstance(value, str) and value.startswith("$"):
# Reference to context variable
ref_key = value[1:] # Remove $
resolved[key] = context.get(ref_key, value)
else:
resolved[key] = value
return resolved
async def _dispatch(
self,
action: ActionSpec,
inputs: dict[str, Any],
context: dict[str, Any],
) -> StepExecutionResult:
"""Dispatch to appropriate executor based on action type."""
if action.action_type == ActionType.LLM_CALL:
return await self._execute_llm_call(action, inputs, context)
elif action.action_type == ActionType.TOOL_USE:
return await self._execute_tool_use(action, inputs)
elif action.action_type == ActionType.SUB_GRAPH:
return await self._execute_sub_graph(action, inputs, context)
elif action.action_type == ActionType.FUNCTION:
return await self._execute_function(action, inputs)
elif action.action_type == ActionType.CODE_EXECUTION:
return self._execute_code(action, inputs, context)
else:
return StepExecutionResult(
success=False,
error=f"Unknown action type: {action.action_type}",
error_type="invalid_action",
)
async def _execute_llm_call(
self,
action: ActionSpec,
inputs: dict[str, Any],
context: dict[str, Any],
) -> StepExecutionResult:
"""Execute an LLM call action."""
if self.llm is None:
return StepExecutionResult(
success=False,
error="No LLM provider configured",
error_type="configuration",
executor_type="llm_call",
)
try:
# Build prompt with context data
prompt = action.prompt or ""
# First try format placeholders (for prompts like "Hello {name}")
if inputs:
try:
prompt = prompt.format(**inputs)
except (KeyError, ValueError):
pass # Keep original prompt if formatting fails
# Always append context data so LLM can personalize
# This ensures the LLM has access to lead info, company context, etc.
if inputs:
context_section = "\n\n--- Context Data ---\n"
for key, value in inputs.items():
if isinstance(value, dict | list):
context_section += f"{key}: {json.dumps(value, indent=2)}\n"
else:
context_section += f"{key}: {value}\n"
prompt = prompt + context_section
messages = [{"role": "user", "content": prompt}]
response = self.llm.complete(
messages=messages,
system=action.system_prompt,
)
# Try to parse JSON from LLM response
# LLMs often return JSON wrapped in markdown code blocks
parsed_json, _ = parse_llm_json_response(response.content)
# If JSON was parsed successfully, use it as the result
# Otherwise, use the raw text
result_value = parsed_json if parsed_json is not None else response.content
return StepExecutionResult(
success=True,
outputs={
"result": result_value,
"response": response.content, # Always keep raw response
"parsed_json": parsed_json, # Explicit parsed JSON (or None)
},
tokens_used=response.input_tokens + response.output_tokens,
executor_type="llm_call",
)
except Exception as e:
error_type = "rate_limit" if "rate" in str(e).lower() else "llm_error"
return StepExecutionResult(
success=False,
error=str(e),
error_type=error_type,
executor_type="llm_call",
)
async def _execute_tool_use(
self,
action: ActionSpec,
inputs: dict[str, Any],
) -> StepExecutionResult:
"""Execute a tool use action."""
tool_name = action.tool_name
if not tool_name:
return StepExecutionResult(
success=False,
error="No tool name specified",
error_type="invalid_action",
executor_type="tool_use",
)
# Merge action args with inputs
args = {**action.tool_args, **inputs}
# Resolve any $variable references in the merged args
# (tool_args may contain $refs that should be resolved from inputs)
resolved_args = {}
for key, value in args.items():
if isinstance(value, str) and value.startswith("$"):
ref_key = value[1:] # Remove $
resolved_args[key] = args.get(ref_key, value)
else:
resolved_args[key] = value
args = resolved_args
# First, check if we have a registered function with this name
# This allows simpler tool registration without full Tool/ToolExecutor setup
if tool_name in self.functions:
try:
func = self.functions[tool_name]
result = func(**args)
# Handle async functions
if hasattr(result, "__await__"):
result = await result
# If result is already a dict with success/outputs, use it directly
if isinstance(result, dict) and "success" in result:
return StepExecutionResult(
success=result.get("success", False),
outputs=result.get("outputs", {}),
error=result.get("error"),
error_type=result.get("error_type"),
executor_type="tool_use",
)
# Otherwise wrap the result
return StepExecutionResult(
success=True,
outputs={"result": result},
executor_type="tool_use",
)
except Exception as e:
return StepExecutionResult(
success=False,
error=str(e),
error_type="tool_exception",
executor_type="tool_use",
)
# Fall back to formal Tool registry
if tool_name not in self.tools:
return StepExecutionResult(
success=False,
error=f"Tool '{tool_name}' not found",
error_type="missing_tool",
executor_type="tool_use",
)
if self.tool_executor is None:
return StepExecutionResult(
success=False,
error="No tool executor configured",
error_type="configuration",
executor_type="tool_use",
)
try:
# Execute tool via formal executor
from framework.llm.provider import ToolUse
tool_use = ToolUse(
id=f"step_{tool_name}",
name=tool_name,
input=args,
)
result = self.tool_executor(tool_use)
if result.is_error:
return StepExecutionResult(
success=False,
outputs={},
error=result.content,
error_type="tool_error",
executor_type="tool_use",
)
# Parse JSON result and unpack fields into outputs
# Tools return JSON like {"lead_email": "...", "company_name": "..."}
# We want each field as a separate output key
outputs = {"result": result.content}
try:
parsed = json.loads(result.content)
if isinstance(parsed, dict):
# Unpack all fields from the JSON response
outputs.update(parsed)
except (json.JSONDecodeError, TypeError):
pass # Keep result as-is if not valid JSON
return StepExecutionResult(
success=True,
outputs=outputs,
executor_type="tool_use",
)
except Exception as e:
return StepExecutionResult(
success=False,
error=str(e),
error_type="tool_exception",
executor_type="tool_use",
)
async def _execute_sub_graph(
self,
action: ActionSpec,
inputs: dict[str, Any],
context: dict[str, Any],
) -> StepExecutionResult:
"""Execute a sub-graph action."""
if self.sub_graph_executor is None:
return StepExecutionResult(
success=False,
error="No sub-graph executor configured",
error_type="configuration",
executor_type="sub_graph",
)
graph_id = action.graph_id
if not graph_id:
return StepExecutionResult(
success=False,
error="No graph ID specified",
error_type="invalid_action",
executor_type="sub_graph",
)
try:
result = await self.sub_graph_executor(graph_id, inputs, context)
return StepExecutionResult(
success=result.success,
outputs=result.output if result.success else {},
error=result.error if not result.success else None,
tokens_used=result.total_tokens,
executor_type="sub_graph",
)
except Exception as e:
return StepExecutionResult(
success=False,
error=str(e),
error_type="sub_graph_exception",
executor_type="sub_graph",
)
async def _execute_function(
self,
action: ActionSpec,
inputs: dict[str, Any],
) -> StepExecutionResult:
"""Execute a function action."""
func_name = action.function_name
if not func_name:
return StepExecutionResult(
success=False,
error="No function name specified",
error_type="invalid_action",
executor_type="function",
)
if func_name not in self.functions:
return StepExecutionResult(
success=False,
error=f"Function '{func_name}' not registered",
error_type="missing_function",
executor_type="function",
)
try:
func = self.functions[func_name]
# Merge action args with inputs
args = {**action.function_args, **inputs}
# Execute function
result = func(**args)
# Handle async functions
if hasattr(result, "__await__"):
result = await result
return StepExecutionResult(
success=True,
outputs={"result": result},
executor_type="function",
)
except Exception as e:
return StepExecutionResult(
success=False,
error=str(e),
error_type="function_exception",
executor_type="function",
)
def _execute_code(
self,
action: ActionSpec,
inputs: dict[str, Any],
context: dict[str, Any],
) -> StepExecutionResult:
"""Execute a code action in sandbox."""
code = action.code
if not code:
return StepExecutionResult(
success=False,
error="No code specified",
error_type="invalid_action",
executor_type="code_execution",
)
# Merge inputs with context for code
code_inputs = {**context, **inputs}
# Execute in sandbox
sandbox_result = self.sandbox.execute(code, code_inputs)
if sandbox_result.success:
return StepExecutionResult(
success=True,
outputs={
"result": sandbox_result.result,
**sandbox_result.variables,
},
executor_type="code_execution",
latency_ms=sandbox_result.execution_time_ms,
)
else:
error_type = "security" if "Security" in (sandbox_result.error or "") else "code_error"
return StepExecutionResult(
success=False,
error=sandbox_result.error,
error_type=error_type,
executor_type="code_execution",
latency_ms=sandbox_result.execution_time_ms,
)
def register_function(self, name: str, func: Callable) -> None:
"""Register a function for FUNCTION actions."""
self.functions[name] = func
def register_tool(self, tool: Tool) -> None:
"""Register a tool for TOOL_USE actions."""
self.tools[tool.name] = tool
+40
View File
@@ -70,6 +70,7 @@ class AnthropicProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Generate a completion from Claude (via LiteLLM)."""
return self._provider.complete(
@@ -79,6 +80,7 @@ class AnthropicProvider(LLMProvider):
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
max_retries=max_retries,
)
def complete_with_tools(
@@ -97,3 +99,41 @@ class AnthropicProvider(LLMProvider):
tool_executor=tool_executor,
max_iterations=max_iterations,
)
async def acomplete(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Async completion via LiteLLM."""
return await self._provider.acomplete(
messages=messages,
system=system,
tools=tools,
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
max_retries=max_retries,
)
async def acomplete_with_tools(
self,
messages: list[dict[str, Any]],
system: str,
tools: list[Tool],
tool_executor: Callable[[ToolUse], ToolResult],
max_iterations: int = 10,
) -> LLMResponse:
"""Async tool-use loop via LiteLLM."""
return await self._provider.acomplete_with_tools(
messages=messages,
system=system,
tools=tools,
tool_executor=tool_executor,
max_iterations=max_iterations,
)
+428 -16
View File
@@ -28,8 +28,54 @@ from framework.llm.stream_events import StreamEvent
logger = logging.getLogger(__name__)
def _patch_litellm_anthropic_oauth() -> None:
"""Patch litellm's Anthropic header construction to fix OAuth token handling.
litellm bug: validate_environment() puts the OAuth token into x-api-key,
but Anthropic's API rejects OAuth tokens in x-api-key. They must be sent
via Authorization: Bearer only, with x-api-key omitted entirely.
This patch wraps validate_environment to remove x-api-key when the
Authorization header carries an OAuth token (sk-ant-oat prefix).
See: https://github.com/BerriAI/litellm/issues/19618
"""
try:
from litellm.llms.anthropic.common_utils import AnthropicModelInfo
from litellm.types.llms.anthropic import ANTHROPIC_OAUTH_TOKEN_PREFIX
except ImportError:
return
original = AnthropicModelInfo.validate_environment
def _patched_validate_environment(
self, headers, model, messages, optional_params, litellm_params, api_key=None, api_base=None
):
result = original(
self,
headers,
model,
messages,
optional_params,
litellm_params,
api_key=api_key,
api_base=api_base,
)
auth = result.get("authorization", "")
if auth.startswith(f"Bearer {ANTHROPIC_OAUTH_TOKEN_PREFIX}"):
result.pop("x-api-key", None)
return result
AnthropicModelInfo.validate_environment = _patched_validate_environment
if litellm is not None:
_patch_litellm_anthropic_oauth()
RATE_LIMIT_MAX_RETRIES = 10
RATE_LIMIT_BACKOFF_BASE = 2 # seconds
RATE_LIMIT_MAX_DELAY = 120 # seconds - cap to prevent absurd waits
# Directory for dumping failed requests
FAILED_REQUESTS_DIR = Path.home() / ".hive" / "failed_requests"
@@ -84,6 +130,91 @@ def _dump_failed_request(
return str(filepath)
def _compute_retry_delay(
attempt: int,
exception: BaseException | None = None,
backoff_base: int = RATE_LIMIT_BACKOFF_BASE,
max_delay: int = RATE_LIMIT_MAX_DELAY,
) -> float:
"""Compute retry delay, preferring server-provided Retry-After headers.
Priority:
1. retry-after-ms header (milliseconds, float)
2. retry-after header as seconds (float)
3. retry-after header as HTTP-date (RFC 7231)
4. Exponential backoff: backoff_base * 2^attempt
All values are capped at max_delay seconds.
"""
if exception is not None:
response = getattr(exception, "response", None)
if response is not None:
headers = getattr(response, "headers", None)
if headers is not None:
# Priority 1: retry-after-ms (milliseconds)
retry_after_ms = headers.get("retry-after-ms")
if retry_after_ms is not None:
try:
delay = float(retry_after_ms) / 1000.0
return min(max(delay, 0), max_delay)
except (ValueError, TypeError):
pass
# Priority 2: retry-after (seconds or HTTP-date)
retry_after = headers.get("retry-after")
if retry_after is not None:
# Try as seconds (float)
try:
delay = float(retry_after)
return min(max(delay, 0), max_delay)
except (ValueError, TypeError):
pass
# Try as HTTP-date (e.g., "Fri, 31 Dec 2025 23:59:59 GMT")
try:
from email.utils import parsedate_to_datetime
retry_date = parsedate_to_datetime(retry_after)
now = datetime.now(retry_date.tzinfo)
delay = (retry_date - now).total_seconds()
return min(max(delay, 0), max_delay)
except (ValueError, TypeError, OverflowError):
pass
# Fallback: exponential backoff
delay = backoff_base * (2**attempt)
return min(delay, max_delay)
def _is_stream_transient_error(exc: BaseException) -> bool:
"""Classify whether a streaming exception is transient (recoverable).
Transient errors (recoverable=True): network issues, server errors, timeouts.
Permanent errors (recoverable=False): auth, bad request, context window, etc.
"""
try:
from litellm.exceptions import (
APIConnectionError,
BadGatewayError,
InternalServerError,
ServiceUnavailableError,
)
transient_types: tuple[type[BaseException], ...] = (
APIConnectionError,
InternalServerError,
BadGatewayError,
ServiceUnavailableError,
TimeoutError,
ConnectionError,
OSError,
)
except ImportError:
transient_types = (TimeoutError, ConnectionError, OSError)
return isinstance(exc, transient_types)
class LiteLLMProvider(LLMProvider):
"""
LiteLLM-based LLM provider for multi-provider support.
@@ -150,10 +281,13 @@ class LiteLLMProvider(LLMProvider):
"LiteLLM is not installed. Please install it with: uv pip install litellm"
)
def _completion_with_rate_limit_retry(self, **kwargs: Any) -> Any:
def _completion_with_rate_limit_retry(
self, max_retries: int | None = None, **kwargs: Any
) -> Any:
"""Call litellm.completion with retry on 429 rate limit errors and empty responses."""
model = kwargs.get("model", self.model)
for attempt in range(RATE_LIMIT_MAX_RETRIES + 1):
retries = max_retries if max_retries is not None else RATE_LIMIT_MAX_RETRIES
for attempt in range(retries + 1):
try:
response = litellm.completion(**kwargs) # type: ignore[union-attr]
@@ -194,22 +328,22 @@ class LiteLLMProvider(LLMProvider):
f"Full request dumped to: {dump_path}"
)
if attempt == RATE_LIMIT_MAX_RETRIES:
if attempt == retries:
logger.error(
f"[retry] GAVE UP on {model} after {RATE_LIMIT_MAX_RETRIES + 1} "
f"[retry] GAVE UP on {model} after {retries + 1} "
f"attempts — empty response "
f"(finish_reason={finish_reason}, "
f"choices={len(response.choices) if response.choices else 0})"
)
return response
wait = RATE_LIMIT_BACKOFF_BASE * (2**attempt)
wait = _compute_retry_delay(attempt)
logger.warning(
f"[retry] {model} returned empty response "
f"(finish_reason={finish_reason}, "
f"choices={len(response.choices) if response.choices else 0}) — "
f"likely rate limited or quota exceeded. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
f"(attempt {attempt + 1}/{retries})"
)
time.sleep(wait)
continue
@@ -225,21 +359,21 @@ class LiteLLMProvider(LLMProvider):
error_type="rate_limit",
attempt=attempt,
)
if attempt == RATE_LIMIT_MAX_RETRIES:
if attempt == retries:
logger.error(
f"[retry] GAVE UP on {model} after {RATE_LIMIT_MAX_RETRIES + 1} "
f"[retry] GAVE UP on {model} after {retries + 1} "
f"attempts — rate limit error: {e!s}. "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}"
)
raise
wait = RATE_LIMIT_BACKOFF_BASE * (2**attempt)
wait = _compute_retry_delay(attempt, exception=e)
logger.warning(
f"[retry] {model} rate limited (429): {e!s}. "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
f"(attempt {attempt + 1}/{retries})"
)
time.sleep(wait)
# unreachable, but satisfies type checker
@@ -253,6 +387,7 @@ class LiteLLMProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Generate a completion using LiteLLM."""
# Prepare messages with system prompt
@@ -293,12 +428,17 @@ class LiteLLMProvider(LLMProvider):
kwargs["response_format"] = response_format
# Make the call
response = self._completion_with_rate_limit_retry(**kwargs)
response = self._completion_with_rate_limit_retry(max_retries=max_retries, **kwargs)
# Extract content
content = response.choices[0].message.content or ""
# Get usage info
# Get usage info.
# NOTE: completion_tokens includes reasoning/thinking tokens for models
# that use them (o1, gpt-5-mini, etc.). LiteLLM does not reliably expose
# usage.completion_tokens_details.reasoning_tokens across all providers.
# This means output_tokens may be inflated for reasoning models.
# Compaction is unaffected — it uses prompt_tokens (input-side only).
usage = response.usage
input_tokens = usage.prompt_tokens if usage else 0
output_tokens = usage.completion_tokens if usage else 0
@@ -433,6 +573,267 @@ class LiteLLMProvider(LLMProvider):
raw_response=None,
)
# ------------------------------------------------------------------
# Async variants — non-blocking on the event loop
# ------------------------------------------------------------------
async def _acompletion_with_rate_limit_retry(
self, max_retries: int | None = None, **kwargs: Any
) -> Any:
"""Async version of _completion_with_rate_limit_retry.
Uses litellm.acompletion and asyncio.sleep instead of blocking calls.
"""
model = kwargs.get("model", self.model)
retries = max_retries if max_retries is not None else RATE_LIMIT_MAX_RETRIES
for attempt in range(retries + 1):
try:
response = await litellm.acompletion(**kwargs) # type: ignore[union-attr]
content = response.choices[0].message.content if response.choices else None
has_tool_calls = bool(response.choices and response.choices[0].message.tool_calls)
if not content and not has_tool_calls:
messages = kwargs.get("messages", [])
last_role = next(
(m["role"] for m in reversed(messages) if m.get("role") != "system"),
None,
)
if last_role == "assistant":
logger.debug(
"[async-retry] Empty response after assistant message — "
"expected, not retrying."
)
return response
finish_reason = (
response.choices[0].finish_reason if response.choices else "unknown"
)
token_count, token_method = _estimate_tokens(model, messages)
dump_path = _dump_failed_request(
model=model,
kwargs=kwargs,
error_type="empty_response",
attempt=attempt,
)
logger.warning(
f"[async-retry] Empty response - {len(messages)} messages, "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}"
)
if attempt == retries:
logger.error(
f"[async-retry] GAVE UP on {model} after {retries + 1} "
f"attempts — empty response "
f"(finish_reason={finish_reason}, "
f"choices={len(response.choices) if response.choices else 0})"
)
return response
wait = _compute_retry_delay(attempt)
logger.warning(
f"[async-retry] {model} returned empty response "
f"(finish_reason={finish_reason}, "
f"choices={len(response.choices) if response.choices else 0}) — "
f"likely rate limited or quota exceeded. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{retries})"
)
await asyncio.sleep(wait)
continue
return response
except RateLimitError as e:
messages = kwargs.get("messages", [])
token_count, token_method = _estimate_tokens(model, messages)
dump_path = _dump_failed_request(
model=model,
kwargs=kwargs,
error_type="rate_limit",
attempt=attempt,
)
if attempt == retries:
logger.error(
f"[async-retry] GAVE UP on {model} after {retries + 1} "
f"attempts — rate limit error: {e!s}. "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}"
)
raise
wait = _compute_retry_delay(attempt, exception=e)
logger.warning(
f"[async-retry] {model} rate limited (429): {e!s}. "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{retries})"
)
await asyncio.sleep(wait)
raise RuntimeError("Exhausted rate limit retries")
async def acomplete(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Async version of complete(). Uses litellm.acompletion — non-blocking."""
full_messages: list[dict[str, Any]] = []
if system:
full_messages.append({"role": "system", "content": system})
full_messages.extend(messages)
if json_mode:
json_instruction = "\n\nPlease respond with a valid JSON object."
if full_messages and full_messages[0]["role"] == "system":
full_messages[0]["content"] += json_instruction
else:
full_messages.insert(0, {"role": "system", "content": json_instruction.strip()})
kwargs: dict[str, Any] = {
"model": self.model,
"messages": full_messages,
"max_tokens": max_tokens,
**self.extra_kwargs,
}
if self.api_key:
kwargs["api_key"] = self.api_key
if self.api_base:
kwargs["api_base"] = self.api_base
if tools:
kwargs["tools"] = [self._tool_to_openai_format(t) for t in tools]
if response_format:
kwargs["response_format"] = response_format
response = await self._acompletion_with_rate_limit_retry(max_retries=max_retries, **kwargs)
content = response.choices[0].message.content or ""
usage = response.usage
input_tokens = usage.prompt_tokens if usage else 0
output_tokens = usage.completion_tokens if usage else 0
return LLMResponse(
content=content,
model=response.model or self.model,
input_tokens=input_tokens,
output_tokens=output_tokens,
stop_reason=response.choices[0].finish_reason or "",
raw_response=response,
)
async def acomplete_with_tools(
self,
messages: list[dict[str, Any]],
system: str,
tools: list[Tool],
tool_executor: Callable[[ToolUse], ToolResult],
max_iterations: int = 10,
max_tokens: int = 4096,
) -> LLMResponse:
"""Async version of complete_with_tools(). Uses litellm.acompletion — non-blocking."""
current_messages: list[dict[str, Any]] = []
if system:
current_messages.append({"role": "system", "content": system})
current_messages.extend(messages)
total_input_tokens = 0
total_output_tokens = 0
openai_tools = [self._tool_to_openai_format(t) for t in tools]
for _ in range(max_iterations):
kwargs: dict[str, Any] = {
"model": self.model,
"messages": current_messages,
"max_tokens": max_tokens,
"tools": openai_tools,
**self.extra_kwargs,
}
if self.api_key:
kwargs["api_key"] = self.api_key
if self.api_base:
kwargs["api_base"] = self.api_base
response = await self._acompletion_with_rate_limit_retry(**kwargs)
usage = response.usage
if usage:
total_input_tokens += usage.prompt_tokens
total_output_tokens += usage.completion_tokens
choice = response.choices[0]
message = choice.message
if choice.finish_reason == "stop" or not message.tool_calls:
return LLMResponse(
content=message.content or "",
model=response.model or self.model,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
stop_reason=choice.finish_reason or "stop",
raw_response=response,
)
current_messages.append(
{
"role": "assistant",
"content": message.content,
"tool_calls": [
{
"id": tc.id,
"type": "function",
"function": {
"name": tc.function.name,
"arguments": tc.function.arguments,
},
}
for tc in message.tool_calls
],
}
)
for tool_call in message.tool_calls:
try:
args = json.loads(tool_call.function.arguments)
except json.JSONDecodeError:
current_messages.append(
{
"role": "tool",
"tool_call_id": tool_call.id,
"content": "Invalid JSON arguments provided to tool.",
}
)
continue
tool_use = ToolUse(
id=tool_call.id,
name=tool_call.function.name,
input=args,
)
result = tool_executor(tool_use)
current_messages.append(
{
"role": "tool",
"tool_call_id": result.tool_use_id,
"content": result.content,
}
)
return LLMResponse(
content="Max tool iterations reached",
model=self.model,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
stop_reason="max_iterations",
raw_response=None,
)
def _tool_to_openai_format(self, tool: Tool) -> dict[str, Any]:
"""Convert Tool to OpenAI function calling format."""
return {
@@ -591,7 +992,7 @@ class LiteLLMProvider(LLMProvider):
for event in tail_events:
yield event
return
wait = RATE_LIMIT_BACKOFF_BASE * (2**attempt)
wait = _compute_retry_delay(attempt)
token_count, token_method = _estimate_tokens(
self.model,
full_messages,
@@ -619,10 +1020,10 @@ class LiteLLMProvider(LLMProvider):
except RateLimitError as e:
if attempt < RATE_LIMIT_MAX_RETRIES:
wait = RATE_LIMIT_BACKOFF_BASE * (2**attempt)
wait = _compute_retry_delay(attempt, exception=e)
logger.warning(
f"[stream-retry] {self.model} rate limited (429): {e!s}. "
f"Retrying in {wait}s "
f"Retrying in {wait:.1f}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
)
await asyncio.sleep(wait)
@@ -631,5 +1032,16 @@ class LiteLLMProvider(LLMProvider):
return
except Exception as e:
yield StreamErrorEvent(error=str(e), recoverable=False)
if _is_stream_transient_error(e) and attempt < RATE_LIMIT_MAX_RETRIES:
wait = _compute_retry_delay(attempt, exception=e)
logger.warning(
f"[stream-retry] {self.model} transient error "
f"({type(e).__name__}): {e!s}. "
f"Retrying in {wait:.1f}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
)
await asyncio.sleep(wait)
continue
recoverable = _is_stream_transient_error(e)
yield StreamErrorEvent(error=str(e), recoverable=recoverable)
return
+39
View File
@@ -120,6 +120,7 @@ class MockLLMProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""
Generate a mock completion without calling a real LLM.
@@ -182,6 +183,44 @@ class MockLLMProvider(LLMProvider):
stop_reason="mock_complete",
)
async def acomplete(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Async mock completion (no I/O, returns immediately)."""
return self.complete(
messages=messages,
system=system,
tools=tools,
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
max_retries=max_retries,
)
async def acomplete_with_tools(
self,
messages: list[dict[str, Any]],
system: str,
tools: list[Tool],
tool_executor: Callable[[ToolUse], ToolResult],
max_iterations: int = 10,
) -> LLMResponse:
"""Async mock tool-use completion (no I/O, returns immediately)."""
return self.complete_with_tools(
messages=messages,
system=system,
tools=tools,
tool_executor=tool_executor,
max_iterations=max_iterations,
)
async def stream(
self,
messages: list[dict[str, Any]],
+62 -1
View File
@@ -1,8 +1,10 @@
"""LLM Provider abstraction for pluggable LLM backends."""
import asyncio
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator, Callable
from dataclasses import dataclass, field
from functools import partial
from typing import Any
@@ -65,6 +67,7 @@ class LLMProvider(ABC):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""
Generate a completion from the LLM.
@@ -79,6 +82,8 @@ class LLMProvider(ABC):
- {"type": "json_schema", "json_schema": {"name": "...", "schema": {...}}}
for strict JSON schema enforcement
json_mode: If True, request structured JSON output from the LLM
max_retries: Override retry count for rate-limit/empty-response retries.
None uses the provider default.
Returns:
LLMResponse with content and metadata
@@ -109,6 +114,62 @@ class LLMProvider(ABC):
"""
pass
async def acomplete(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list["Tool"] | None = None,
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> "LLMResponse":
"""Async version of complete(). Non-blocking on the event loop.
Default implementation offloads the sync complete() to a thread pool.
Subclasses SHOULD override for native async I/O.
"""
loop = asyncio.get_running_loop()
return await loop.run_in_executor(
None,
partial(
self.complete,
messages=messages,
system=system,
tools=tools,
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
max_retries=max_retries,
),
)
async def acomplete_with_tools(
self,
messages: list[dict[str, Any]],
system: str,
tools: list["Tool"],
tool_executor: Callable[["ToolUse"], "ToolResult"],
max_iterations: int = 10,
) -> "LLMResponse":
"""Async version of complete_with_tools(). Non-blocking on the event loop.
Default implementation offloads the sync complete_with_tools() to a thread pool.
Subclasses SHOULD override for native async I/O.
"""
loop = asyncio.get_running_loop()
return await loop.run_in_executor(
None,
partial(
self.complete_with_tools,
messages=messages,
system=system,
tools=tools,
tool_executor=tool_executor,
max_iterations=max_iterations,
),
)
async def stream(
self,
messages: list[dict[str, Any]],
@@ -132,7 +193,7 @@ class LLMProvider(ABC):
TextEndEvent,
)
response = self.complete(
response = await self.acomplete(
messages=messages,
system=system,
tools=tools,

Some files were not shown because too many files have changed in this diff Show More