Compare commits

...

169 Commits

Author SHA1 Message Date
Timothy @aden a03b378e9b Merge branch 'main' into fix/deprecation 2026-02-18 20:29:39 -08:00
Bryan @ Aden 893053ede7 Merge pull request #5098 from adenhq/update/inbox-agent-fixes
(micro-fix): Update/inbox agent fixes
2026-02-19 03:35:25 +00:00
bryan 596ec6fec5 fixed credentials 2026-02-18 19:26:59 -08:00
bryan 5863b83172 Merge branch 'main' into update/inbox-agent-fixes 2026-02-18 19:12:01 -08:00
bryan 20c92b197a fixes to inbox agent 2026-02-18 19:08:55 -08:00
RichardTang-Aden ec9c6b4666 Merge pull request #5097 from RichardTang-Aden/feat/credential-setup-cli
Feat/credential setup cli
2026-02-18 17:22:53 -08:00
Richard Tang 8a73e5c119 chore: ruff lint 2026-02-18 17:21:45 -08:00
Richard Tang 717f0eee9a Merge branch 'main' into feat/credential-setup-cli 2026-02-18 17:20:40 -08:00
Richard Tang 09fb47f089 chore: ruff format 2026-02-18 17:14:26 -08:00
Richard Tang b46d943e71 chore: lint issues 2026-02-18 17:13:01 -08:00
Richard Tang 2920b5ab01 chore: lint issues 2026-02-17 20:05:05 -08:00
Richard Tang 81ad0467b0 Merge branch 'main' into fix/deprecation 2026-02-17 20:02:47 -08:00
Richard Tang 115ca55ea0 fix: broken ci tests 2026-02-17 20:00:47 -08:00
Richard Tang f2814a26e6 chore: lint issue 2026-02-17 19:57:31 -08:00
Richard Tang 4d309950b0 fix: unused code and ci 2026-02-17 19:55:54 -08:00
RichardTang-Aden 39216a4c12 Merge pull request #5016 from adenhq/feat/pdf-ingestion
Feat/pdf ingestion
2026-02-17 19:29:35 -08:00
Aden HQ c7fa621aeb Merge branch 'main' into feat/pdf-ingestion 2026-02-17 19:28:54 -08:00
Richard Tang 8c3ad3d70a fix: email agent version 2026-02-17 19:17:07 -08:00
Richard Tang 9eb3fc6285 fix: fix email agent version 2026-02-17 19:09:17 -08:00
Richard Tang e95f7e7339 fix: make the email inbox management agent identical to main 2026-02-17 19:02:08 -08:00
RichardTang-Aden d949551399 Merge pull request #3332 from haliaeetusvocifer/feat/google-docs-integration
added feature google docs integration
2026-02-17 18:25:03 -08:00
Richard Tang a7dbd85ed4 fix: google docs credentials 2026-02-17 18:24:31 -08:00
Richard Tang 1f288dab1c fix: tools registration problems 2026-02-17 18:15:41 -08:00
Richard Tang 021754d941 Merge branch 'main' into feat/google-docs-integration 2026-02-17 18:05:07 -08:00
bryan 7412904fbf update to job hunter and website vulnerability 2026-02-17 16:58:12 -08:00
haliaeetusvocifer 5f3e9379a3 completed task 2026-02-17 23:59:47 +00:00
Richard Tang 0e565d6cea feat: add the agent start confirmation and credential update option 2026-02-17 13:17:03 -08:00
Richard Tang 67b249dcd5 feat: add the credential setup step after credential validation 2026-02-17 12:59:20 -08:00
Timothy bbf1c8c790 fix(arch): remove all deprecated concepts and dead code 2026-02-17 10:59:15 -08:00
bryan 44a8b453b5 Merge branch 'main' into feat/pdf-ingestion 2026-02-16 18:40:46 -08:00
bryan 26511fe962 added pdf select and updated job hunter 2026-02-16 18:38:13 -08:00
RichardTang-Aden ce5893216a Merge pull request #4871 from paarths-collab/docs/root-install-warning
docs(readme): clarify uv workspace setup and prevent root pip install misuse
2026-02-16 18:36:57 -08:00
RichardTang-Aden 4e821e4dbf Merge pull request #5011 from RichardTang-Aden/main
micro-fix: chore: update the intro message of the agent
2026-02-16 18:05:18 -08:00
Richard Tang d11e97de59 chore: update the intro message of the agent 2026-02-16 18:03:58 -08:00
RichardTang-Aden 4b10d3e360 Merge pull request #5010 from RichardTang-Aden/main
feat: merge the sample agent so we have one email inbox management agent
2026-02-16 18:00:31 -08:00
Richard Tang e04479930f chore: update descriptions 2026-02-16 17:56:17 -08:00
Richard Tang 8a8c4cc3f5 chore: rename the email 2026-02-16 17:53:05 -08:00
Richard Tang 1e06ff611e refactor: merge the sample agent so we have one email inbox management agent 2026-02-16 17:43:18 -08:00
Pravin Mishra 1edc7bb9c7 feat(tools): add Discord integration (#2913) (#4247)
* feat(tools): add Discord integration (#2913)

- discord_list_guilds: list servers the bot is in
- discord_list_channels: list channels for a guild
- discord_send_message: send message to channel
- discord_get_messages: get recent messages

Auth: DISCORD_BOT_TOKEN, credential spec, health checker.
Uses Discord API v10 (Bot token).

Co-authored-by: Cursor <cursoragent@cursor.com>

* style: apply ruff format to discord tool files

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(discord): add rate limit handling, message validation, channel filter

- Rate limit (429): return clear error with retry_after from API
- Message length: validate before send, max 2000 chars per Discord limit
- Channel filter: text_only param (default True) for list_channels
- Add 6 new tests for rate limit, validation, filtering

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(discord): add retry on 429 rate limit

- Retry up to 2 times using Discord's retry_after
- Cap wait at 60s, fallback to exponential backoff if no retry_after
- Add _request_with_retry helper for all API calls
- Add 3 tests: retry then success, retry exhausted, tool-level retry

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(discord): remove unused DISCORD_API_BASE import

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: mishrapravin114 <mishrapravin114@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-17 09:29:56 +08:00
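The retry scheme this commit describes (honor Discord's `retry_after`, cap the wait at 60 s, fall back to exponential backoff, give up after 2 retries) looks roughly like the sketch below. This is not the repository's code; only the helper name `_request_with_retry` comes from the commit message, and the use of `requests` is an assumption.

```python
import time
import requests  # assumption: the real tool may use a different HTTP client

DISCORD_API_BASE = "https://discord.com/api/v10"  # Discord API v10, per the commit

def _request_with_retry(method: str, path: str, token: str,
                        max_retries: int = 2, **kwargs) -> dict:
    """Call the Discord API, retrying on HTTP 429 (sketch, not the actual implementation)."""
    resp = None
    for attempt in range(max_retries + 1):
        resp = requests.request(method, f"{DISCORD_API_BASE}{path}",
                                headers={"Authorization": f"Bot {token}"},
                                timeout=30, **kwargs)
        if resp.status_code != 429:
            return {"status": resp.status_code, "data": resp.json()}
        if attempt == max_retries:
            break
        # Prefer Discord's retry_after hint, capped at 60 s; otherwise back off exponentially
        retry_after = resp.json().get("retry_after")
        time.sleep(min(float(retry_after), 60.0) if retry_after is not None else 2 ** attempt)
    return {"error": "rate limited", "status": 429}  # error dict rather than an exception
```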
Siddharth Varshney 7b1e0af155 feat(utils): add proper __init__.py exports for utils module (#3979) 2026-02-16 20:20:09 +08:00
Jeet Karia 7b15616e29 feat(tools): add Exa Search API integration with 4 MCP tools (#4941)
Implements AI-powered web search, content extraction, and research tools
via the Exa API for agent workflows.

Tools: exa_search, exa_find_similar, exa_get_contents, exa_answer

Follows existing tool pattern (web_search_tool, hubspot_tool, slack_tool):
- register_tools(mcp, credentials) with @mcp.tool() decorators
- Credential fallback: CredentialStoreAdapter -> EXA_API_KEY env var
- Error handling: always returns dicts, never raises
- Retry with exponential backoff on HTTP 429

Includes:
- Neural/keyword search with domain, date, and category filters
- Similar page discovery via neural embeddings
- Content extraction from up to 10 URLs per request
- Citation-backed answer generation
- CredentialSpec in credentials/search.py
- Comprehensive unit tests (21 tests)
- 500/500 integration CI tests passing

Fixes #4177
2026-02-16 19:28:32 +08:00
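The tool pattern this commit names (a `register_tools(mcp, credentials)` entry point, `@mcp.tool()` decorators, credential fallback to an env var, and error dicts instead of exceptions) can be sketched as follows. The search body is omitted, and the dict-like credential-store lookup is an assumption about `CredentialStoreAdapter`'s interface.

```python
import os

def register_tools(mcp, credentials) -> None:
    """Illustrative shape of the tool-registration pattern described above."""

    def _get_api_key() -> str | None:
        # Credential store first, then the EXA_API_KEY environment variable
        # (assumption: the adapter exposes a dict-like get())
        key = credentials.get("exa_api_key") if credentials else None
        return key or os.environ.get("EXA_API_KEY")

    @mcp.tool()
    def exa_search(query: str, num_results: int = 10) -> dict:
        """AI-powered web search via the Exa API (HTTP call omitted in this sketch)."""
        api_key = _get_api_key()
        if not api_key:
            # Convention from the commit: always return dicts, never raise
            return {"error": "No Exa API key in credential store or EXA_API_KEY"}
        return {"results": [], "query": query, "requested": num_results}
```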
Shivam Shahi– oss/acc 99ed00fd02 feat(tools): add Razorpay payment processing integration (#4467)
* feat(tools): add Razorpay payment processing integration

Add Razorpay MCP tool integration for payment processing, invoicing,
and refund management. Implements 6 MCP tools:

- razorpay_list_payments: List recent payments with filters (pagination, date range)
- razorpay_get_payment: Fetch detailed payment information by ID
- razorpay_create_payment_link: Create one-time payment links with shareable URLs
- razorpay_list_invoices: List invoices with status and type filtering
- razorpay_get_invoice: Fetch invoice details including line items
- razorpay_create_refund: Create full or partial refunds for payments

Features:
- Authentication via HTTP Basic Auth (RAZORPAY_API_KEY + RAZORPAY_API_SECRET)
- Credential spec in dedicated razorpay.py (follows repo pattern)
- Comprehensive error handling (401, 403, 404, 400, 429, 500, timeouts)
- Input validation (payment IDs, invoice IDs, amounts, currencies)
- Full test coverage (42 unit tests, 26 integration tests)

Closes #4404

* style: fix ruff I001 import order and W291 in tools

* fix: improve Razorpay credential tracking and validation

- Add razorpay_secret CredentialSpec with credential_group
- Fix amount=0 bug by using 'is not None' checks
- Add regex validation for payment/invoice IDs

* fix: use graceful credential handling instead of raising TypeError

Match codebase convention (calcom, lusha) - return None for non-string
credentials instead of raising TypeError, so the tool returns an error
dict instead of crashing.

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-16 12:02:16 +08:00
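Two of the fixes above are easy to picture in isolation: returning `None` for non-string credentials so the caller emits an error dict rather than crashing, and using `is not None` so a legitimate `amount=0` is not dropped. A minimal sketch, with hypothetical helper names:

```python
def _get_auth(api_key: object, api_secret: object) -> tuple[str, str] | None:
    """Graceful credential handling: non-string values yield None (calcom/lusha convention)."""
    if not isinstance(api_key, str) or not isinstance(api_secret, str):
        return None  # caller turns this into {"error": ...} instead of a TypeError
    return api_key, api_secret

def _refund_payload(amount: int | None) -> dict:
    payload: dict = {}
    # 'is not None' keeps amount=0 distinct from "amount omitted";
    # a truthiness check would silently drop zero-amount refunds
    if amount is not None:
        payload["amount"] = amount
    return payload
```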
Timothy @aden f7af5f9ee8 Merge pull request #4926 from TimothyZhang7/example/gmail-inbox-guardian-agent
chore(micro-fix): change the timer to 5 minutes
2026-02-15 19:30:27 -08:00
RichardTang-Aden e5bcc8005f Merge pull request #4922 from adenhq/feat/vulnerability_agent
Feat/vulnerability agent
2026-02-15 19:21:00 -08:00
Timothy 352d285212 chore: change the timer to 5 minutes 2026-02-15 18:47:22 -08:00
Timothy @aden 3ef60f9d14 Merge pull request #4925 from TimothyZhang7/example/gmail-inbox-guardian-agent
Example(micro-fix)/gmail inbox guardian agent
2026-02-15 18:44:19 -08:00
Timothy a103312127 feature: display timer status interactively 2026-02-15 18:41:45 -08:00
Timothy 3d0bba4167 example(agents): ready-to-use gmail automation agent 2026-02-15 18:34:14 -08:00
Timothy @aden 3df718cc14 Merge pull request #4920 from TimothyZhang7/fix/issue-4905
Fix/issue 4905
2026-02-15 18:08:28 -08:00
RichardTang-Aden c7497a180e Merge pull request #4918 from TimothyZhang7/feature/multi-entry-event-driven-agents
Fix/multi entry event driven agents
2026-02-15 18:05:43 -08:00
Bryan @ Aden 3f39039a21 Merge pull request #4800 from LukeM94/doc/typo-fixes-in-roadmap.md
doc: Fix typos in docs/roadmap.md
2026-02-16 01:57:35 +00:00
Bryan @ Aden 88fbd90fcc Merge pull request #4799 from zhanglinqian/fix-recipes-readme-links
docs: fix incorrect directory names in recipes README
2026-02-16 01:55:51 +00:00
bryan e0bf09dd78 lint fixes 2026-02-15 17:45:56 -08:00
bryan 3e158b07af Merge branch 'main' into feat/vulnerability_agent 2026-02-15 17:35:59 -08:00
Timothy 5319ed7ee1 chore: remove unsolicited docs 2026-02-15 17:32:36 -08:00
Timothy 978904d2a4 fix(executor): async operations on non-streaming llm complete for healthy event loop 2026-02-15 17:31:18 -08:00
bryan 4d876ecc54 vulnerability check to sample agents 2026-02-15 17:27:09 -08:00
RichardTang-Aden ba327d0b9e Merge pull request #4919 from adenhq/feat/sample-agent/job_hunter
Sample agent, micro-fix: remove dependency of brave search
2026-02-15 16:58:06 -08:00
Richard Tang b69cf3523c feat: remove dependency of brave search 2026-02-15 16:47:53 -08:00
Timothy 4d8c8e9308 feat(arch): architecture patches to support multi-entry agents consuming external events 2026-02-15 16:19:58 -08:00
Amit Kumar b70885934c [Integration]: Cal.com - Open Source Scheduling Infrastructure #3188 (#3255)
* feat(tools): add Cal.com scheduling integration with 8 MCP tools

Adds Cal.com API integration for booking and scheduling management:
- calcom_list_bookings, calcom_get_booking, calcom_create_booking, calcom_cancel_booking
- calcom_get_availability, calcom_update_schedule
- calcom_list_event_types, calcom_get_event_type

Includes dedicated credential spec, 20 unit tests, and full integration conformance.
Resolves #3188

* fix(calcom): address PR review + add missing list_schedules MCP tool

- Add isinstance(api_key, str) guard in _get_api_key, return None on
  non-string values for consistent error-dict handling
- Remove duplicate metadata from responses object in create_booking;
  metadata stays at top-level per Cal.com v1 API spec
- Expose availability parameter in calcom_update_schedule MCP tool
- Add calcom_list_schedules tool wrapping existing client method —
  needed to discover schedule IDs before calling update_schedule
- Register calcom_list_schedules in credential spec for CI conformance
- Add tests for credential handling, schedule availability, and
  list_schedules (24 tests total, 9 tools)

* docs: update README to include calcom_list_schedules (9 tools)

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-15 20:14:00 +08:00
paarths-collab 722b087fc0 docs(readme): clarify installation and prevent root pip install misuse 2026-02-15 17:39:41 +05:30
Aaryann Chandola 0c7ea272db [integration] feat(tools): add Google Calendar integration (#3171)
* feat(calendar): add Google Calendar integration with event management tools and health checks

* fix(calendar): align google_calendar_oauth credential spec with codebase pattern
2026-02-15 08:25:53 +08:00
Aaryann Chandola 5e4f322fc0 add new time tool for current date/time retrieval (#3425) 2026-02-14 21:57:07 +08:00
zhanglinqian c02e45f1aa docs: fix incorrect directory names in recipes README 2026-02-14 21:19:47 +08:00
LukeM94 a7217f138c Fix typos in docs/roadmap.md
Correct hyphenation and spelling in the product roadmap: change 'outcome oriented' to 'outcome-oriented' and fix 'Workder' to 'Worker' in the Deployment section.
2026-02-14 13:18:03 +00:00
Emmanuel Nwanguma 3502f25048 [Integration] feat(tools): add BigQuery MCP tool for SQL querying and data analysis (#3350)
* feat(tools): add BigQuery MCP tool for SQL querying and data analysis

- Add run_bigquery_query tool for executing read-only SQL queries
- Add describe_dataset tool for exploring dataset schemas
- Implement safety features: read-only enforcement, row limits (max 10k)
- Add comprehensive unit tests (27 tests passing)
- Follow CredentialStoreAdapter pattern from email tool
- Support ADC and service account authentication

Fixes #3067

* fix(bigquery): address PR review feedback

- Add credential_id and api_key_instructions to CredentialSpec
- Fix credential key name from 'bigquery_credentials' to 'bigquery'
- Pass credential path to BigQuery client via environment variable
- Fix ADC error message detection for both error variants
- Move google-cloud-bigquery to optional dependencies
- Update tests to use correct credential key names

All 27 tests passing

* fix(bigquery): return {error, help} when dependency missing

* fix(bigquery): return full ImportError message for missing dependency

* fix(bigquery): include 'help' key when dependency missing
2026-02-14 19:48:03 +08:00
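The missing-dependency behavior the follow-up fixes converge on (return a dict carrying the full `ImportError` message plus a `help` key, never raise) might look like this sketch; everything past the import guard is illustrative only.

```python
def run_bigquery_query(sql: str, max_rows: int = 10_000) -> dict:
    """Hypothetical shape of the read-only query tool described above."""
    try:
        from google.cloud import bigquery  # optional dependency
    except ImportError as exc:
        # Full ImportError message plus a 'help' key, per the final fixes
        return {"error": str(exc), "help": "pip install google-cloud-bigquery"}
    # Naive read-only enforcement for the sketch; the real tool may check more carefully
    if not sql.lstrip().lower().startswith("select"):
        return {"error": "Only read-only SELECT queries are allowed"}
    client = bigquery.Client()  # uses ADC when no explicit credentials are configured
    rows = client.query(sql).result(max_results=max_rows)  # enforce the row limit
    return {"rows": [dict(row) for row in rows]}
```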
RichardTang-Aden 93c026fe31 Merge pull request #4759 from adenhq/feat/sample-agent/job_hunter
[Feature][Sample Agent]: Job Hunting Agent
2026-02-13 20:24:35 -08:00
bryan e515977b96 Merge branch 'main' into feat/vulnerability_agent 2026-02-13 20:16:48 -08:00
bryan 045490a097 testing agent run 1-5 2026-02-13 20:16:12 -08:00
Richard Tang b25903fb7f feat: init job hunter 2026-02-13 20:08:55 -08:00
bryan acf4bd5152 tools for sample agent 2026-02-13 19:14:59 -08:00
Timothy 1f5711e1a1 Merge branch 'fix/transient-error-handlings' into feat/inbox-management 2026-02-13 18:53:33 -08:00
RichardTang-Aden ca2dd90313 Merge pull request #4610 from adenhq/feat/inbox-management
Feat/inbox management
2026-02-13 18:51:26 -08:00
Timothy 21e07f3b65 Merge branch 'feature/load-by-bytes' into feat/inbox-management 2026-02-13 18:43:45 -08:00
Timothy e8a06ddd34 feat: load by bytes instead of rows 2026-02-13 18:41:16 -08:00
bryan 34cc09904f fix pytest 2026-02-13 18:27:22 -08:00
bryan f6bba8b62f Merge branch 'main' into feat/inbox-management 2026-02-13 18:17:57 -08:00
bryan d241ad60f8 updated tui to two panel 2026-02-13 18:13:12 -08:00
Timothy 5a3fcf9a8a Merge branch 'main' into feat/inbox-management 2026-02-13 16:39:07 -08:00
Timothy 1f8a47203f fix: common transient errors and loop detection 2026-02-13 16:14:43 -08:00
RichardTang-Aden 7240090274 Merge pull request #4525 from e-cesar9/fix/4428-windows-defender-exclusions
perf(windows): add Windows Defender exclusions for 40% faster uv sync
2026-02-13 15:49:58 -08:00
Richard Tang 2e6a47c2df fix(windows): replace unicode bullets with ASCII dashes
The bullet character (•) cannot be displayed properly in PowerShell
on some Windows systems. Use ASCII dash (-) instead for compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 15:48:48 -08:00
Richard Tang 7f5ecd7913 fix: uv local storage 2026-02-13 15:47:57 -08:00
RichardTang-Aden 105b98b113 Merge pull request #4646 from jaathavan18/docs/issue-4645-fix-exports-link
docs: fix broken exports/ link in environment-setup.md
2026-02-13 15:20:07 -08:00
RichardTang-Aden 114e65ab41 Merge pull request #4649 from jaathavan18/docs/issue-4648-fix-setup-python-ref
docs: fix references to nonexistent setup-python.sh
2026-02-13 15:19:50 -08:00
RichardTang-Aden 0fc13a5cc3 Merge pull request #4660 from Hima-de/fix/antigravity-setup-docs
docs: fix nonexistent setup-python.sh reference in antigravity-setup.md
2026-02-13 15:19:28 -08:00
bryan e651799e9e Merge branch 'main' into feat/inbox-management 2026-02-13 10:34:33 -08:00
Timothy @aden fcd3e514de Merge pull request #4722 from TimothyZhang7/main
chore(micro-fix): fix removed message
2026-02-13 09:43:05 -08:00
Timothy 7ab41de3a2 chore: fix removed message 2026-02-13 09:40:37 -08:00
Timothy @aden 58e023f277 Merge pull request #4720 from TimothyZhang7/feature/event-source
Feature/event source
2026-02-13 09:37:13 -08:00
Timothy @aden a98f2d5b86 Merge pull request #4647 from TimothyZhang7/feature/memory-inheritance
feat:phased compaction and event bus integration
2026-02-13 09:25:44 -08:00
Amit Kumar eca43231c0 [Feature]: Add Excel Tool for reading/writing .xlsx/.xlsm files #2675 (#3004)
* feat(tools): add Excel tool for reading/writing .xlsx/.xlsm files

Add excel_tool module with 7 MCP tools:
- excel_read: Read data from Excel files with pagination
- excel_write: Create new Excel files
- excel_append: Append rows to existing Excel files
- excel_info: Get file metadata (sheets, columns, rows)
- excel_sheet_list: List sheet names in a file
- excel_sql: Query Excel with SQL (DuckDB), multi-sheet support
- excel_search: Search values across sheets with match options

Includes 56 tests, openpyxl optional dependency, and documentation.

Fixes #2675

* fix(tools): address excel review feedback and stabilize tests
2026-02-13 19:32:07 +08:00
Amit Kumar 6763077887 feat(tools): add Google Maps Platform integration with 6 MCP tools (#3796)
Implements geocoding, routing, and location intelligence via Google Maps
Platform Web Services APIs for logistics, delivery, and location-based
agent workflows.

Tools: maps_geocode, maps_reverse_geocode, maps_directions,
maps_distance_matrix, maps_place_details, maps_place_search

Includes:
- page_token support for paginated place search results
- GoogleMapsHealthChecker for credential validation
- Comprehensive unit tests (42 tool tests, 30 health check tests)

Closes #3179
2026-02-13 19:01:28 +08:00
Siddharth Varshney f85ff8a2f8 feat(tools): add Telegram Bot integration (#3550)
* feat(tools): add Telegram Bot integration

- Add telegram_send_message and telegram_send_document tools
- Add credential spec for TELEGRAM_BOT_TOKEN
- Add comprehensive tests (18 test cases)
- Add README documentation with setup instructions

* fix(telegram_tool): catch network errors instead of letting them raise

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-13 17:10:40 +08:00
Amit Kumar 1a5c3480e6 feat(tools): add NewsData + Finlight MCP tools for market intelligence (#4473)
* feat(tools): add news MCP tools

* fix(tools): adjust news providers and fallback

* fix(tools): add rate limiting + sentiment normalization to news tools

Address PR feedback: exponential backoff (3 retries, 2^attempt delays)
on HTTP 429 for both NewsData and Finlight with seamless provider
fallback, and normalize sentiment scores to [-1.0, +1.0] range.

* fix(news_tool): make provider fallback lazy and handle network errors

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-02-13 16:25:56 +08:00
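Of the fixes listed, sentiment normalization is the most self-contained: provider scores on arbitrary scales are mapped linearly onto [-1.0, +1.0]. A sketch (the function name and per-provider ranges are assumptions):

```python
def normalize_sentiment(score: float, low: float, high: float) -> float:
    """Linearly rescale a provider-specific sentiment range onto [-1.0, +1.0]."""
    if high <= low:
        raise ValueError("provider range must satisfy low < high")
    clamped = max(low, min(high, score))
    return 2.0 * (clamped - low) / (high - low) - 1.0

# e.g. a provider scoring on [0, 1]: 0.0 -> -1.0, 0.5 -> 0.0, 1.0 -> +1.0
assert normalize_sentiment(0.5, 0.0, 1.0) == 0.0
```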
Hima-de 69a7fe7b92 docs: fix nonexistent setup-python.sh reference in antigravity-setup.md
Replaces references to ./scripts/setup-python.sh with ./quickstart.sh,
which is the actual setup script in the repository.

Fixes #4648
2026-02-13 06:30:36 +00:00
Emmanuel Nwanguma a5418d760f fix(runtime): validate and create storage path on init (#4466)
- Add path validation in Runtime.__init__()
- Log warning when creating nonexistent storage directory
- Auto-create directory with parents to prevent silent failures later
- Implements Option 3 from issue discussion (explicit, safe, informative)

Fixes #1870
2026-02-13 14:24:14 +08:00
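The init-time validation described here is simple enough to sketch; the class name and behavior follow the commit text, the rest is assumed:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

class Runtime:
    def __init__(self, storage_path: str | Path) -> None:
        self.storage_path = Path(storage_path)
        if not self.storage_path.exists():
            # Warn, then auto-create (with parents) so later writes don't fail silently
            logger.warning("Storage path %s does not exist; creating it", self.storage_path)
        self.storage_path.mkdir(parents=True, exist_ok=True)
```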
T.Trinath Reddy 0deeb87c63 feat(vision): add GCP Vision API integration (#4231)
* feat(vision): add GCP Vision API integration

* refactor(vision): move GCP Vision credentials to dedicated folder

* fix: clean up credentials imports and updated gitignore

* followed ruff alphabetic order for credentials
2026-02-13 14:00:15 +08:00
Timothy d1d5f49c5a fix: add more events to event bus 2026-02-12 20:42:20 -08:00
bryan 917e23ccc8 Merge branch 'main' into feat/inbox-management 2026-02-12 20:30:34 -08:00
RichardTang-Aden 988922304f Merge pull request #4451 from Ttian18/fix/tina-tui-copy-newline-4423
fix(tui): add multiline input and cross-platform clipboard support
2026-02-12 20:27:32 -08:00
bryan ab2bd726c3 Merge branch 'main' into feat/inbox-management 2026-02-12 20:24:46 -08:00
bryan 713fefb163 inbox-manager is now continuous 2026-02-12 20:22:40 -08:00
Timothy 83140a1398 feat: event source in runtime 2026-02-12 19:52:15 -08:00
bryan cafa6dd930 update to inbox management agent 2026-02-12 19:17:54 -08:00
jaathavan18 82e1af1a7a docs: fix references to nonexistent setup-python.sh (#4648) 2026-02-12 22:08:35 -05:00
jaathavan18 30c3dc9205 docs: fix broken exports/ link in environment-setup.md (#4645) 2026-02-12 22:04:25 -05:00
Timothy 9a3c6703e1 feat:phased compaction and event bus integration 2026-02-12 18:41:32 -08:00
RichardTang-Aden e26468aa19 Merge pull request #4587 from Antiarin/feat/codex-integration
feat(codex): add project setup, quickstart bootstrap, and docs for Codex agent support
2026-02-12 17:55:47 -08:00
Richard Tang fe14992696 docs: update documentation 2026-02-12 17:54:08 -08:00
Richard Tang d0775b95c6 fix: remove unused agents.md and quickstart clause 2026-02-12 17:48:31 -08:00
Richard Tang 96121b5757 fix: correct codex setups 2026-02-12 17:35:15 -08:00
Timothy @aden 11c003c48d Merge pull request #4636 from TimothyZhang7/feature/memory-inheritance
Feature: Conversation Memory & Continuous Agent Session
2026-02-12 13:52:19 -08:00
Timothy fbe72c58ae chore: fix ci tests 2026-02-12 13:49:34 -08:00
bryan 816156e87f merge: PR #4636 feature/memory-inheritance into feat/inbox-management
Brings in append_data tool, continuous conversation mode, conversation
judge, phase compaction, and prompt composer from the memory-inheritance
feature branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 13:23:25 -08:00
bryan 7bceab3cea wip: inbox management agent setup and gmail tool updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 13:22:19 -08:00
Timothy 83d7f56728 chore: update agent skills for new design philosophy 2026-02-12 12:10:37 -08:00
Timothy 76deba2a6a feat: consistent memory system 2026-02-12 11:41:22 -08:00
Antiarin d9d048b9e3 docs: update quickstart script for symlink handling and add Codex CLI documentation 2026-02-12 23:34:17 +05:30
bryan 930f417729 Merge remote-tracking branch 'origin/main' into feat/inbox-management 2026-02-12 08:44:13 -08:00
bryan 8e214d06c1 moved inbox management to sample agents 2026-02-12 08:33:43 -08:00
e-cesar9 63e0348963 fix(ux): improve failure messaging with success rate
Shows clear success rate when some exclusions fail (e.g., "Only 2/3
exclusions added (67%)"). Helps users understand that performance
benefit may be reduced when not all paths are excluded.

- Calculates success rate as percentage
- Shows "X/Y added (Z%)" format for clarity
- Warns that performance benefit may be reduced
- Better visibility of partial failures

Improves user awareness of partial installation issues.
2026-02-12 17:30:17 +01:00
e-cesar9 b46a5f0247 fix(robustness): recheck Defender status before adding exclusions
Adds Test-IsDefenderEnabled helper and rechecks Defender status and
paths before adding exclusions. Prevents adding ineffective exclusions
if Defender was disabled during user prompt (race condition).

- New helper: Test-IsDefenderEnabled() for quick boolean check
- Recheck Defender status immediately before Add operation
- Recheck paths to detect if added by another process
- Clear messages if status changed or already added

Fixes race condition where user could disable Defender or another
process could add exclusions during the prompt delay.
2026-02-12 17:29:44 +01:00
e-cesar9 79dfd90068 fix(security): add path validation for Defender exclusions
Validates that paths are within safe boundaries (project directory or
user AppData) before excluding them from Defender. Prevents accidental
or malicious exclusion of system directories.

- Adds safePrefixes validation (project dir + user AppData only)
- Checks each path against allowed prefixes
- Normalizes paths for consistent comparison
- Warns but processes non-existent paths (they may be created later)

Fixes potential security issue where modified script could exclude
system directories like C:\Windows or C:\Program Files.
2026-02-12 17:29:00 +01:00
Antiarin f9d5c7c751 fix(codex): patch prompt injection, checkout version, and style in codex-issue-triage workflow 2026-02-12 20:25:19 +05:30
Antiarin 8958fb2d88 feat(codex): add project setup, quickstart bootstrap, and docs for Codex agent support 2026-02-12 18:32:58 +05:30
Zhang 3c51f2ac36 fix(tui): address code review feedback on TextArea migration
- Subclass TextArea as ChatTextArea to intercept Enter key before
  the base class swallows it (fixes submission not triggering)
- Remove event.shift access that raises AttributeError on Key events
- Make action_show_sessions directly call _submit_input instead of
  just placing text in the widget
2026-02-11 23:45:30 -08:00
RichardTang-Aden 170a0918f7 Merge pull request #4538 from jaathavan18/docs/issue-4493-remove-changelog-references-v2
docs: remove references to nonexistent CHANGELOG.md
2026-02-11 19:15:00 -08:00
jaathavan18 e3da3b619c docs: remove references to nonexistent CHANGELOG.md (#4493) 2026-02-11 21:48:28 -05:00
RichardTang-Aden 6e32513b79 Merge pull request #4535 from RichardTang-Aden/main
docs: add the autonomous agent guide
2026-02-11 17:11:02 -08:00
Richard Tang 520e1963ee docs: add the autonomous agent guide 2026-02-11 17:10:21 -08:00
RichardTang-Aden 843b9b55e2 Merge pull request #2862 from himanshu748/feat/antigravity-ide-2571
feat: Antigravity IDE support for MCP servers and skills
2026-02-11 16:59:15 -08:00
Richard Tang ccd305ff96 fix: remove incorrect rules folder 2026-02-11 16:58:02 -08:00
Richard Tang 3bd0d1e48c feat: add Antigravity workflows for all hive skills 2026-02-11 16:56:15 -08:00
Richard Tang d9bfa8e675 docs: rename .antigravity to .agent for Antigravity IDE compatibility 2026-02-11 16:42:26 -08:00
Richard Tang 27746147e2 fix: fix the antigravity config folder 2026-02-11 16:41:21 -08:00
Richard Tang 3a0b642980 fix: update the antigravity config 2026-02-11 16:39:28 -08:00
Richard Tang 8c0241f087 fix: use --directory instead of cwd for Antigravity MCP compatibility 2026-02-11 16:36:12 -08:00
Richard Tang 958d016174 fix: use uv run for MCP servers, script reads from template 2026-02-11 16:03:32 -08:00
Richard Tang 913d318ada docs: emphasize restarting Antigravity after MCP setup, fix config path and skill names 2026-02-11 15:57:11 -08:00
Richard Tang 8212920cb7 fix: correct .antigravity/skills symlinks to point to hive-* directories 2026-02-11 15:52:27 -08:00
Richard Tang 6414be7bd4 fix: change the wrong antigravity mcp path 2026-02-11 15:47:54 -08:00
himanshu748 ac62a82d08 docs: clarify Antigravity setup for everyone
- Define repo root at top; lead with quick start (3 steps)
- Add 'What you get' and prerequisites in one place
- Full setup step-by-step; troubleshooting: problem → fix
- Manual MCP config as single section; verification optional
- Plain language, scannable structure, no duplicate sections
2026-02-11 15:45:56 -08:00
himanshu748 a670548a57 Antigravity: one-command setup script and clearer docs
- Add scripts/setup-antigravity-mcp.sh to auto-detect repo root and write
  ~/.gemini/mcp.json with absolute paths (no manual path editing)
- Lead docs with Quick start (3 steps) and note ./ vs / for the script
- README: point to one-command setup; clarify script runs from repo folder
2026-02-11 15:45:56 -08:00
himanshu748 c4a7463f9d docs(antigravity): global config, absolute cwd, cwd warning note
- Add step to run core/setup_mcp.sh first
- Document that IDE often loads ~/.claude/mcp.json, not project config
- Add Option A: copy to ~/.claude/mcp.json with absolute cwd paths
- Note that cwd schema warning in IDE is a false positive
- Renumber setup steps (1–5)
2026-02-11 15:45:30 -08:00
himanshu748 edf0ac5270 docs: add how-to-verify section for Antigravity setup 2026-02-11 15:45:30 -08:00
himanshu748 8ff6b76f37 feat: Antigravity IDE support for MCP servers and skills (#2571)
- Add .antigravity/mcp_config.json with agent-builder and tools MCP servers
- Add .antigravity/skills/ with symlinks to .claude/skills/ (5 skills)
- Add docs/antigravity-setup.md with setup and troubleshooting
- Update README.md with Antigravity IDE support section
- Update DEVELOPER.md and docs/contributing-lint-setup.md with Antigravity refs

Mirrors Cursor integration for consistent multi-IDE support.
2026-02-11 15:45:30 -08:00
Bryan @ Aden c9f9eb365c Merge pull request #4441 from saurabh007007/feature/docs/spelling
docs: fix typo in docs for the directory in environment setup
2026-02-11 23:37:44 +00:00
e-cesar9 7a17c115d3 perf(windows): add Windows Defender exclusions for 40% faster uv sync
Fixes #4428

- Add Step 2.5 to quickstart.ps1 for optional Defender exclusions
- Requires admin + explicit user consent (default=no)
- Handles non-admin gracefully (shows manual commands)
- Improves uv sync by ~40% and cold-start by ~30% on Windows 11
- Never fails installation (graceful degradation)

Implementation:
- 3 helper functions: Test-IsAdmin, Test-DefenderExclusions, Add-DefenderExclusions
- Excludes: project dir, .venv, %APPDATA%\uv
- Security: Default=no, admin-only, clear trade-offs explained
- Robustness: Handles Defender disabled, existing exclusions, partial failures
- UX: Follows existing Write-Step/Ok/Warn patterns, clipboard fallback
- Testing: Idempotent, can run multiple times safely
2026-02-11 23:57:55 +01:00
Bryan @ Aden 9a2a11055f Merge pull request #4520 from jaathavan18/docs/issue-4495-remove-config-yaml-reference
docs: remove reference to nonexistent config.yaml
2026-02-11 22:36:16 +00:00
bryan f21aecd91c Add Gmail inbox management tools (list, get, trash, modify labels) 2026-02-11 14:33:01 -08:00
jaathavan18 4aef73c1d7 docs: remove reference to nonexistent config.yaml (#4495) 2026-02-11 15:07:04 -05:00
bryan 9df147b450 removed send_budget_alert_email, require provider param 2026-02-11 11:26:23 -08:00
RichardTang-Aden b71b4b0fc2 Merge pull request #4500 from RichardTang-Aden/main
fix: remove unused tools from root .mcp.json
2026-02-11 10:23:45 -08:00
Richard Tang 1bd2510c52 fix: remove unused tools from root .mcp.json 2026-02-11 10:20:20 -08:00
RichardTang-Aden 28b81092f9 Merge pull request #4498 from jaathavan18/docs/issue-4494-fix-env-example-reference
docs: fix .env.example references in tools README
2026-02-11 10:11:53 -08:00
RichardTang-Aden 4b9a3abba6 micro-fix: python lint error (#4499)
* micro-fix: python lint error

* micro-fix: python lint format
2026-02-11 10:09:17 -08:00
Richard Tang 0c76b6dcb1 micro-fix: python lint format 2026-02-11 10:08:40 -08:00
Richard Tang 090a85b41b micro-fix: python lint error 2026-02-11 10:04:25 -08:00
jaathavan18 992d573573 docs: fix .env.example references in tools README (#4494) 2026-02-11 12:58:51 -05:00
e-cesar9 9e768e660b micro-fix: move inline imports to module level in edge.py (#4480)
Fixes #4445

Moved repeated inline imports (logging, json, re) to module-level:
- Eliminates import overhead on every method call
- Follows PEP 8 conventions
- Added module-level logger instance
- re is used at line 259 (re.search)

Changes:
- 4 lines added (imports + logger)
- 13 lines removed (inline imports)
- No functional changes
2026-02-11 09:55:14 -08:00
RichardTang-Aden 26b9ed362e Merge pull request #4479 from e-cesar9/microfix/4444-typo-stirct
micro-fix: fix typo STIRCT → STRICT in safe_eval.py
2026-02-11 09:48:55 -08:00
Timothy @aden 976ae75fde Merge pull request #4487 from adenhq/feature/windows-quickstart
chore(micro-fix): windows quickstart
2026-02-11 08:18:53 -08:00
e-cesar9 d63dd021ab micro-fix: fix typo STIRCT → STRICT in safe_eval.py
Fixes #4444
2026-02-11 14:40:25 +01:00
Zhang 697ba89314 fix(tui): add multiline input and cross-platform clipboard support (#4423)
Replace single-line Input widget with TextArea in chat REPL so
Shift+Enter inserts newlines and multiline paste works correctly.
Add Windows clipboard support (clip.exe) and xsel fallback for Linux.
2026-02-11 00:01:09 -08:00
saurabh007007 373ad77008 docs:fix typo in docs 2026-02-11 11:46:26 +05:30
oluwasegun.haziz.omd 7fae57f311 more fixes 2026-02-03 11:53:45 +00:00
haliaeetusvocifer 1f653969a9 added feature google docs integration 2026-02-03 03:51:39 +00:00
255 changed files with 35975 additions and 7886 deletions
+9
@@ -0,0 +1,9 @@
{
"mcpServers": {
"agent-builder": {
"command": "uv",
"args": ["run", "--directory", "core", "-m", "framework.mcp.agent_builder_server"],
"disabled": false
}
}
}
+1
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+5
@@ -0,0 +1,5 @@
---
description: hive-concepts
---
use hive-concepts skill
+5
@@ -0,0 +1,5 @@
---
description: hive-create
---
use hive-create skill
+5
@@ -0,0 +1,5 @@
---
description: hive-credentials
---
use hive-credentials skill
+5
@@ -0,0 +1,5 @@
---
description: hive-patterns
---
use hive-patterns skill
+5
@@ -0,0 +1,5 @@
---
description: hive-test
---
use hive-test skill
+5
@@ -0,0 +1,5 @@
---
description: hive
---
use hive skill
+1
@@ -0,0 +1 @@
../../.claude/skills/hive
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-concepts
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-create
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-credentials
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-patterns
+1
@@ -0,0 +1 @@
../../.claude/skills/hive-test
+67 -5
@@ -492,7 +492,7 @@ AskUserQuestion(questions=[{
- node_id (kebab-case)
- name
- description
- - node_type: `"event_loop"` (recommended for all LLM work) or `"function"` (deterministic, no LLM)
+ - node_type: `"event_loop"` (the only valid type; use `client_facing: True` for HITL)
- input_keys (what data this node receives)
- output_keys (what data this node produces)
- tools (ONLY tools that exist from Step 1 — empty list if no tools needed)
@@ -553,6 +553,26 @@ AskUserQuestion(questions=[{
- condition_expr (Python expression, only if conditional)
- priority (positive = forward, negative = feedback/loop-back)
**DETERMINE the graph lifecycle.** Not every agent needs a terminal node:
| Pattern | `terminal_nodes` | When to Use |
|---------|-------------------|-------------|
| **Linear (finish)** | `["last-node"]` | Agent completes a task and exits (batch processing, one-shot generation) |
| **Forever-alive (loop)** | `[]` (empty) | Agent stays alive for continuous interaction (research assistant, personal assistant, monitoring) |
**Forever-alive pattern:** The deep_research_agent example uses `terminal_nodes=[]`. Every leaf node has edges that loop back to earlier nodes, creating a perpetual session. The agent only stops when the user explicitly exits. This is the preferred pattern for interactive, multi-turn agents.
**Key design rules for forever-alive graphs:**
- Every node must have at least one outgoing edge (no dead ends)
- Client-facing nodes block for user input — these are the natural "pause points"
- The user controls when to stop, not the graph
- Sessions accumulate memory across loops — plan for conversation compaction
- Use `conversation_mode="continuous"` to preserve conversation history across node transitions
- `max_iterations` should be set high (e.g., 100) since the agent is designed to run indefinitely
- The agent will NOT enter a "completed" execution state — this is intentional, not a bug
**Ask the user** which lifecycle pattern fits their agent. Default to forever-alive for interactive agents, linear for batch/one-shot tasks.
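As a concrete anchor for the rules above, the loop-back edge and empty terminal list from the deep_research_agent diff further down this page boil down to the following sketch (field values illustrative):

```python
from framework.graph import EdgeSpec, EdgeCondition

# A leaf node loops back instead of terminating, so the session never "completes"
report_to_intake = EdgeSpec(
    id="report-to-intake",
    source="report",
    target="intake",
    condition=EdgeCondition.CONDITIONAL,
    condition_expr="str(next_action).lower() != 'more_research'",
    priority=1,
)

terminal_nodes: list[str] = []  # empty list = forever-alive pattern
```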
**RENDER the complete graph as ASCII art.** Make it large and clear — the user needs to see and understand the full workflow at a glance.
**IMPORTANT: Make the ASCII art BIG and READABLE.** Use a box-and-arrow style with generous spacing. Do NOT make it tiny or compressed. Example format:
@@ -832,8 +852,7 @@ cd /home/timothy/oss/hive && PYTHONPATH=exports uv run python -m AGENT_NAME vali
| Type | tools param | Use when |
| ------------ | ----------------------- | --------------------------------------- |
- | `event_loop` | `'["tool1"]'` or `'[]'` | LLM-powered work with or without tools |
- | `function` | N/A | Deterministic Python operations, no LLM |
+ | `event_loop` | `'["tool1"]'` or `'[]'` | All agent work (with or without tools, HITL via client_facing) |
---
@@ -912,6 +931,46 @@ result = await executor.execute(graph=graph, goal=goal, input_data=input_data)
---
## REFERENCE: Graph Lifecycle & Conversation Memory
### Terminal vs Forever-Alive Graphs
Agents have two lifecycle patterns:
**Linear (terminal) graphs** have `terminal_nodes=["last-node"]`. Execution ends when the terminal node completes. The session enters a "completed" state. Use for batch processing, one-shot generation, and fire-and-forget tasks.
**Forever-alive graphs** have `terminal_nodes=[]` (empty). Every node has at least one outgoing edge — the graph loops indefinitely. The session **never enters a "completed" state** — this is intentional. The agent stays alive until the user explicitly exits. Use for interactive assistants, research tools, and any agent where the user drives the conversation.
The deep_research_agent example demonstrates this: `report` loops back to either `research` (dig deeper) or `intake` (new topic). The agent is a persistent, interactive assistant.
### Continuous Conversation Mode
When `conversation_mode="continuous"` is set on the GraphSpec, the framework preserves a **single conversation thread** across all node transitions:
**What the framework does automatically:**
- **Inherits conversation**: Same message history carries forward to the next node
- **Composes layered system prompts**: Identity (agent-level) + Narrative (auto-generated state summary) + Focus (per-node instructions)
- **Inserts transition markers**: At each node boundary, a "State of the World" message showing completed phases, current memory, and available data files
- **Accumulates tools**: Once a tool becomes available, it stays available in subsequent nodes
- **Compacts opportunistically**: At phase transitions, old tool results are pruned to stay within token budget
**What this means for agent builders:**
- Nodes don't need to re-explain context — the conversation carries it forward
- Output keys from earlier nodes are available in memory for edge conditions and later nodes
- For forever-alive agents, conversation memory persists across the entire session lifetime
- Plan for compaction: very long sessions will have older tool results summarized automatically
**When to use continuous mode:**
- Interactive agents with client-facing nodes (always)
- Multi-phase workflows where context matters across phases
- Forever-alive agents that loop indefinitely
**When NOT to use continuous mode:**
- Embarrassingly parallel fan-out nodes (each branch should be independent)
- Stateless utility agents that process items independently
---
## REFERENCE: Framework Capabilities for Qualification
Use this reference during STEP 2 to give accurate, honest assessments.
@@ -944,11 +1003,11 @@ Use this reference during STEP 2 to give accurate, honest assessments.
| Use Case | Why It's Problematic | Alternative |
|----------|---------------------|-------------|
- | Long-running daemons | Framework is request-response, not persistent | External scheduler + agent |
+ | Persistent background daemons (no user) | Forever-alive graphs need a user at client-facing nodes; no autonomous background polling without user | External scheduler triggering agent runs |
| Sub-second responses | LLM latency is inherent | Traditional code, no LLM |
| Processing millions of items | Context windows and rate limits | Batch processing + sampling |
| Real-time streaming data | No built-in pub/sub or streaming input | Custom MCP server + agent |
- | Guaranteed determinism | LLM outputs vary | Function nodes for deterministic parts |
+ | Guaranteed determinism | LLM outputs vary | Traditional code for deterministic parts |
| Offline/air-gapped | Requires LLM API access | Local models (not currently supported) |
| Multi-user concurrency | Single-user session model | Separate agent instances per user |
@@ -979,3 +1038,6 @@ Use this reference during STEP 2 to give accurate, honest assessments.
11. **Adding framework gating for LLM behavior** - Fix prompts or use judges, not ad-hoc code
12. **Writing code before user approves the graph** - Always get approval on goal, nodes, and graph BEFORE writing any agent code
13. **Wrong mcp_servers.json format** - Use flat format (no `"mcpServers"` wrapper), `cwd` must be `"../../tools"`, and `command` must be `"uv"` with args `["run", "python", ...]`
14. **Assuming all agents need terminal nodes** - Interactive agents often work best with `terminal_nodes=[]` (forever-alive pattern). The agent never enters "completed" state — this is intentional. Only batch/one-shot agents need terminal nodes
15. **Creating dead-end nodes in forever-alive graphs** - Every node must have at least one outgoing edge. A node with no outgoing edges will cause execution to end unexpectedly, breaking the forever-alive loop
16. **Not using continuous conversation mode for interactive agents** - Multi-phase interactive agents should use `conversation_mode="continuous"` to preserve context across node transitions. Without it, each node starts with a blank conversation and loses all prior context
@@ -1,12 +1,15 @@
"""Agent graph construction for Deep Research Agent."""
from pathlib import Path
from framework.graph import EdgeSpec, EdgeCondition, Goal, SuccessCriterion, Constraint
from framework.graph.edge import GraphSpec
from framework.graph.executor import ExecutionResult, GraphExecutor
from framework.runtime.event_bus import EventBus
from framework.runtime.core import Runtime
from framework.graph.executor import ExecutionResult
from framework.graph.checkpoint_config import CheckpointConfig
from framework.llm import LiteLLMProvider
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
from .config import default_config, metadata
from .nodes import (
@@ -120,13 +123,31 @@ edges = [
condition_expr="needs_more_research == False",
priority=2,
),
# report -> research (user wants deeper research on current topic)
EdgeSpec(
id="report-to-research",
source="report",
target="research",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() == 'more_research'",
priority=2,
),
# report -> intake (user wants a new topic — default when not more_research)
EdgeSpec(
id="report-to-intake",
source="report",
target="intake",
condition=EdgeCondition.CONDITIONAL,
condition_expr="str(next_action).lower() != 'more_research'",
priority=1,
),
]
# Graph configuration
entry_node = "intake"
entry_points = {"start": "intake"}
pause_nodes = []
terminal_nodes = ["report"]
terminal_nodes = []
class DeepResearchAgent:
@@ -136,6 +157,12 @@ class DeepResearchAgent:
Flow: intake -> research -> review -> report
^ |
+-- feedback loop (if user wants more)
Uses AgentRuntime for proper session management:
- Session-scoped storage (sessions/{session_id}/)
- Checkpointing for resume capability
- Runtime logging
- Data folder for save_data/load_data
"""
def __init__(self, config=None):
@@ -147,10 +174,10 @@ class DeepResearchAgent:
self.entry_points = entry_points
self.pause_nodes = pause_nodes
self.terminal_nodes = terminal_nodes
self._executor: GraphExecutor | None = None
self._graph: GraphSpec | None = None
self._event_bus: EventBus | None = None
self._agent_runtime: AgentRuntime | None = None
self._tool_registry: ToolRegistry | None = None
self._storage_path: Path | None = None
def _build_graph(self) -> GraphSpec:
"""Build the GraphSpec."""
@@ -171,16 +198,20 @@ class DeepResearchAgent:
"max_tool_calls_per_turn": 20,
"max_history_tokens": 32000,
},
conversation_mode="continuous",
identity_prompt=(
"You are a rigorous research agent. You search for information "
"from diverse, authoritative sources, analyze findings critically, "
"and produce well-cited reports. You never fabricate information — "
"every claim must trace back to a source you actually retrieved."
),
)
- def _setup(self, mock_mode=False) -> GraphExecutor:
- """Set up the executor with all components."""
- from pathlib import Path
+ def _setup(self, mock_mode=False) -> None:
+ """Set up the agent runtime with sessions, checkpoints, and logging."""
+ self._storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
+ self._storage_path.mkdir(parents=True, exist_ok=True)
- storage_path = Path.home() / ".hive" / "agents" / "deep_research_agent"
- storage_path.mkdir(parents=True, exist_ok=True)
self._event_bus = EventBus()
self._tool_registry = ToolRegistry()
mcp_config_path = Path(__file__).parent / "mcp_servers.json"
@@ -199,47 +230,63 @@ class DeepResearchAgent:
tools = list(self._tool_registry.get_tools().values())
self._graph = self._build_graph()
- runtime = Runtime(storage_path)
- self._executor = GraphExecutor(
- runtime=runtime,
checkpoint_config = CheckpointConfig(
enabled=True,
checkpoint_on_node_start=False,
checkpoint_on_node_complete=True,
checkpoint_max_age_days=7,
async_checkpoint=True,
)
entry_point_specs = [
EntryPointSpec(
id="default",
name="Default",
entry_node=self.entry_node,
trigger_type="manual",
isolation_level="shared",
)
]
self._agent_runtime = create_agent_runtime(
graph=self._graph,
goal=self.goal,
storage_path=self._storage_path,
entry_points=entry_point_specs,
llm=llm,
tools=tools,
tool_executor=tool_executor,
event_bus=self._event_bus,
- storage_path=storage_path,
loop_config=self._graph.loop_config,
checkpoint_config=checkpoint_config,
)
- return self._executor
async def start(self, mock_mode=False) -> None:
"""Set up the agent (initialize executor and tools)."""
if self._executor is None:
"""Set up and start the agent runtime."""
if self._agent_runtime is None:
self._setup(mock_mode=mock_mode)
if not self._agent_runtime.is_running:
await self._agent_runtime.start()
async def stop(self) -> None:
"""Clean up resources."""
self._executor = None
self._event_bus = None
"""Stop the agent runtime and clean up."""
if self._agent_runtime and self._agent_runtime.is_running:
await self._agent_runtime.stop()
self._agent_runtime = None
async def trigger_and_wait(
self,
- entry_point: str,
- input_data: dict,
+ entry_point: str = "default",
+ input_data: dict | None = None,
timeout: float | None = None,
session_state: dict | None = None,
) -> ExecutionResult | None:
"""Execute the graph and wait for completion."""
- if self._executor is None:
+ if self._agent_runtime is None:
raise RuntimeError("Agent not started. Call start() first.")
if self._graph is None:
raise RuntimeError("Graph not built. Call start() first.")
- return await self._executor.execute(
- graph=self._graph,
- goal=self.goal,
- input_data=input_data,
+ return await self._agent_runtime.trigger_and_wait(
+ entry_point_id=entry_point,
+ input_data=input_data or {},
session_state=session_state,
)
@@ -250,7 +297,7 @@ class DeepResearchAgent:
await self.start(mock_mode=mock_mode)
try:
result = await self.trigger_and_wait(
"start", context, session_state=session_state
"default", context, session_state=session_state
)
return result or ExecutionResult(success=False, error="Execution timeout")
finally:
@@ -10,8 +10,13 @@ intake_node = NodeSpec(
description="Discuss the research topic with the user, clarify scope, and confirm direction",
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["topic"],
output_keys=["research_brief"],
success_criteria=(
"The research brief is specific and actionable: it states the topic, "
"the key questions to answer, the desired scope, and depth."
),
system_prompt="""\
You are a research intake specialist. The user wants to research a topic.
Have a brief conversation to clarify what they need.
@@ -38,10 +43,14 @@ research_node = NodeSpec(
name="Research",
description="Search the web, fetch source content, and compile findings",
node_type="event_loop",
- max_node_visits=3,
+ max_node_visits=0,
input_keys=["research_brief", "feedback"],
output_keys=["findings", "sources", "gaps"],
nullable_output_keys=["feedback"],
success_criteria=(
"Findings reference at least 3 distinct sources with URLs. "
"Key claims are substantiated by fetched content, not generated."
),
system_prompt="""\
You are a research agent. Given a research brief, find and analyze sources.
@@ -56,18 +65,26 @@ Work in phases:
and any contradictions between sources.
Important:
- - Work in batches of 3-4 tool calls at a time to manage context
+ - Work in batches of 3-4 tool calls at a time, never more than 10 per turn
- After each batch, assess whether you have enough material
- Prefer quality over quantity: 5 good sources beat 15 thin ones
- Track which URL each finding comes from (you'll need citations later)
- Call set_output for each key in a SEPARATE turn (not in the same turn as other tool calls)
- When done, use set_output:
+ When done, use set_output (one key at a time, separate turns):
- set_output("findings", "Structured summary: key findings with source URLs for each claim. \
Include themes, contradictions, and confidence levels.")
- set_output("sources", [{"url": "...", "title": "...", "summary": "..."}])
- set_output("gaps", "What aspects of the research brief are NOT well-covered yet, if any.")
""",
tools=["web_search", "web_scrape", "load_data", "save_data", "list_data_files"],
tools=[
"web_search",
"web_scrape",
"load_data",
"save_data",
"append_data",
"list_data_files",
],
)
# Node 3: Review (client-facing)
@@ -78,9 +95,13 @@ review_node = NodeSpec(
description="Present findings to user and decide whether to research more or write the report",
node_type="event_loop",
client_facing=True,
- max_node_visits=3,
+ max_node_visits=0,
input_keys=["findings", "sources", "gaps", "research_brief"],
output_keys=["needs_more_research", "feedback"],
success_criteria=(
"The user has been presented with findings and has explicitly indicated "
"whether they want more research or are ready for the report."
),
system_prompt="""\
Present the research findings to the user clearly and concisely.
@@ -109,49 +130,70 @@ report_node = NodeSpec(
description="Write a cited HTML report from the findings and present it to the user",
node_type="event_loop",
client_facing=True,
max_node_visits=0,
input_keys=["findings", "sources", "research_brief"],
output_keys=["delivery_status"],
output_keys=["delivery_status", "next_action"],
success_criteria=(
"An HTML report has been saved, the file link has been presented to the user, "
"and the user has indicated what they want to do next."
),
system_prompt="""\
- Write a comprehensive research report as an HTML file and present it to the user.
+ Write a research report as an HTML file and present it to the user.
**STEP 1 - Write the HTML report (tool calls, NO text to user yet):**
IMPORTANT: save_data requires TWO separate arguments: filename and data.
Call it like: save_data(filename="report.html", data="<html>...</html>")
Do NOT use _raw, do NOT nest arguments inside a JSON string.
1. Compose a complete, self-contained HTML document with embedded CSS styling.
Use a clean, readable design: max-width container, pleasant typography,
numbered citation links, a table of contents, and a references section.
**STEP 1 - Write and save the HTML report (tool calls, NO text to user yet):**
Report structure inside the HTML:
- Title & date
- Executive Summary (2-3 paragraphs)
- Table of Contents
- Findings (organized by theme, with [n] citation links)
- Analysis (synthesis, implications, areas of debate)
- Conclusion (key takeaways, confidence assessment)
- References (numbered list with clickable URLs)
Build a clean HTML document. Keep the HTML concise; aim for clarity over length.
Use minimal embedded CSS (a few lines of style, not a full framework).
Requirements:
- Every factual claim must cite its source with [n] notation
- Be objective; present multiple viewpoints where sources disagree
- Distinguish well-supported conclusions from speculation
- Answer the original research questions from the brief
Report structure:
- Title & date
- Executive Summary (2-3 paragraphs)
- Key Findings (organized by theme, with [n] citation links)
- Analysis (synthesis, implications)
- Conclusion (key takeaways)
- References (numbered list with clickable URLs)
2. Save the HTML file:
save_data(filename="report.html", data=<your_html>)
Requirements:
- Every factual claim must cite its source with [n] notation
- Be objective; present multiple viewpoints where sources disagree
- Answer the original research questions from the brief
3. Get the clickable link:
serve_file_to_user(filename="report.html", label="Research Report")
Save the HTML:
save_data(filename="report.html", data="<html>...</html>")
Then get the clickable link:
serve_file_to_user(filename="report.html", label="Research Report")
If save_data fails, simplify and shorten the HTML, then retry.
**STEP 2 - Present the link to the user (text only, NO tool calls):**
Tell the user the report is ready and include the file:// URI from
serve_file_to_user so they can click it to open. Give a brief summary
of what the report covers. Ask if they have questions.
of what the report covers. Ask if they have questions or want to continue.
**STEP 3 - After the user responds:**
- Answer follow-up questions from the research material
- When the user is satisfied: set_output("delivery_status", "completed")
- Answer any follow-up questions from the research material
- When the user is ready to move on, ask what they'd like to do next:
- Research a new topic?
- Dig deeper into the current topic?
- Then call set_output:
- set_output("delivery_status", "completed")
- set_output("next_action", "new_topic") if they want a new topic
- set_output("next_action", "more_research") if they want deeper research
""",
tools=["save_data", "serve_file_to_user", "load_data", "list_data_files"],
tools=[
"save_data",
"append_data",
"edit_data",
"serve_file_to_user",
"load_data",
"list_data_files",
],
)
__all__ = [
+1 -1
@@ -508,7 +508,7 @@ All credential specs are defined in `tools/src/aden_tools/credentials/`:
| `llm.py` | LLM Providers | `anthropic` | No |
| `search.py` | Search Tools | `brave_search`, `google_search`, `google_cse` | No |
| `email.py` | Email | `resend` | No |
- | `integrations.py` | Integrations | `github`, `hubspot` | No / Yes |
+ | `integrations.py` | Integrations | `github`, `hubspot`, `google_calendar_oauth` | No / Yes |
**Note:** Additional LLM providers (Cerebras, Groq, OpenAI) are handled by LiteLLM via environment
variables (`CEREBRAS_API_KEY`, `GROQ_API_KEY`, `OPENAI_API_KEY`) but are not yet in CREDENTIAL_SPECS.
+23 -1
@@ -26,6 +26,17 @@ Use `/hive-debugger` when:
This skill works alongside agents running in TUI mode and provides supervisor-level insights into execution behavior.
### Forever-Alive Agent Awareness
Some agents use `terminal_nodes=[]` (the "forever-alive" pattern), meaning they loop indefinitely and never enter a "completed" execution state. For these agents:
- Sessions with status "in_progress" or "paused" are **normal**, not failures
- High step counts, long durations, and many node visits are expected behavior
- The agent stops only when the user explicitly exits — there is no graph-driven completion
- Debug focus should be on **quality of individual node visits and iterations**, not whether the session reached a terminal state
- Conversation memory accumulates across loops — watch for context overflow and stale data issues
**How to identify forever-alive agents:** Check `agent.py` or `agent.json` for `terminal_nodes=[]` (empty list). If empty, the agent is forever-alive.
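That identification rule reduces to a one-line check; a sketch, assuming `agent.json` carries a top-level `terminal_nodes` field:

```python
import json
from pathlib import Path

def is_forever_alive(agent_dir: Path) -> bool:
    """True when terminal_nodes is empty, i.e. the agent never self-terminates."""
    spec = json.loads((agent_dir / "agent.json").read_text())
    return spec.get("terminal_nodes") == []
```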
---
## Prerequisites
@@ -142,6 +153,7 @@ Store the selected mode for the session.
- Check `attention_summary.categories` for issue types
- Note the `run_id` of problematic sessions
- Check `status` field: "degraded", "failure", "in_progress"
- **For forever-alive agents:** Sessions with status "in_progress" or "paused" are normal — these agents never reach "completed". Only flag sessions with `needs_attention: true` or actual error indicators (tool failures, retry loops, missing outputs). High step counts alone do not indicate a problem.
3. **Attention flag triggers to understand:**
From runtime_logger.py, runs are flagged when:
@@ -199,13 +211,20 @@ Which run would you like to investigate?
| **Tool Errors** | `tool_error_count > 0`, `attention_reasons` contains "tool_failures" | Tool calls failed (API errors, timeouts, auth issues) |
| **Retry Loops** | `retry_count > 3`, `verdict_counts.RETRY > 5` | Judge repeatedly rejecting outputs |
| **Guard Failures** | `guard_reject_count > 0` | Output validation failed (wrong types, missing keys) |
| **Stalled Execution** | `total_steps > 20`, `verdict_counts.CONTINUE > 10` | EventLoopNode not making progress |
| **Stalled Execution** | `total_steps > 20`, `verdict_counts.CONTINUE > 10` | EventLoopNode not making progress. **Caveat:** Forever-alive agents may legitimately have high step counts — check if agent is blocked at a client-facing node (normal) vs genuinely stuck in a loop |
| **High Latency** | `latency_ms > 60000`, `avg_step_latency > 5000` | Slow tool calls or LLM responses |
| **Client-Facing Issues** | `client_input_requested` but no `user_input_received` | Premature set_output before user input |
| **Edge Routing Errors** | `exit_status == "no_valid_edge"`, `attention_reasons` contains "routing_issue" | No edges match current state |
| **Memory/Context Issues** | `tokens_used > 100000`, `context_overflow_count > 0` | Conversation history too long |
| **Constraint Violations** | Compare output against goal constraints | Agent violated goal-level rules |
**Forever-Alive Agent Caveat:** If the agent uses `terminal_nodes=[]`, sessions will never reach "completed" status. This is by design. When debugging these agents, focus on:
- Whether individual node visits succeed (not whether the graph "finishes")
- Quality of each loop iteration — are outputs improving or degrading across loops?
- Whether client-facing nodes are correctly blocking for user input
- Memory accumulation issues: stale data from previous loops, context overflow across many iterations
- Conversation compaction behavior: is the conversation growing unbounded?
3. **Analyze each flagged node:**
- Node ID and name
- Exit status
@@ -1015,6 +1034,9 @@ Your agent should now work correctly!"
3. **Don't ignore edge conditions** - Missing edges cause routing failures
4. **Don't overlook judge configuration** - Mismatched expectations cause retry loops
5. **Don't forget nullable_output_keys** - Optional inputs need explicit marking
6. **Don't diagnose "in_progress" as a failure for forever-alive agents** - Agents with `terminal_nodes=[]` are designed to never enter "completed" state. This is intentional. Focus on quality of individual node visits, not session completion status
7. **Don't ignore conversation memory issues in long-running sessions** - In continuous conversation mode, history grows across node transitions and loop iterations. Watch for context overflow (tokens_used > 100K), stale data from previous loops affecting edge conditions, and compaction failures that cause the LLM to lose important context
8. **Don't confuse "waiting for user" with "stalled"** - Client-facing nodes in forever-alive agents block for user input by design. A session paused at a client-facing node is working correctly, not stalled
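A rough triage helper expressing the distinction in items 6 through 8, using the run-summary fields named earlier in this skill (thresholds are illustrative, not normative):
```python
def triage_forever_alive_run(run: dict) -> str:
    """Classify a run summary for a terminal_nodes=[] agent.
    Field names follow the debugger doc; tune thresholds per agent."""
    if run.get("needs_attention") or run.get("tool_error_count", 0) > 0:
        return "investigate"  # real error indicators, not just long-running
    if run.get("client_input_requested") and not run.get("user_input_received"):
        return "waiting_for_user"  # blocked at a client-facing node: by design
    continues = run.get("verdict_counts", {}).get("CONTINUE", 0)
    if run.get("total_steps", 0) > 20 and continues > 10:
        return "possibly_stalled"  # mirrors the Stalled Execution heuristic
    return "ok"  # in_progress/paused with high step counts is normal here
```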
---
+7
View File
@@ -0,0 +1,7 @@
# Project-level Codex config for Hive.
# Keep this file minimal: MCP connectivity + skill discovery.
[mcp_servers.agent-builder]
command = "uv"
args = ["run", "--directory", "core", "-m", "framework.mcp.agent_builder_server"]
cwd = "."
+3 -1
View File
@@ -74,4 +74,6 @@ exports/*
docs/github-issues/*
core/tests/*dumps/*
screenshots/*
-5
View File
@@ -4,11 +4,6 @@
"command": "uv",
"args": ["run", "-m", "framework.mcp.agent_builder_server"],
"cwd": "core"
},
"tools": {
"command": "uv",
"args": ["run", "mcp_server.py", "--stdio"],
"cwd": "tools"
}
}
}
-20
View File
@@ -1,20 +0,0 @@
---
name: hive
description: Hive Agent Builder & Manager
mode: primary
tools:
agent-builder: true
tools: true
---
# Hive Agent
You are the Hive Agent Builder. Your goal is to help the user construct, configure, and deploy AI agents using the Hive framework.
## Capabilities
1. **Scaffold Agents:** Create new agent directories/configs.
2. **Manage Tools:** Add/remove tools via MCP.
3. **Debug:** Analyze agent workflows.
## Context
- You are an expert in the Hive framework architecture.
- Always use the `agent-builder` MCP server for filesystem operations.
+1 -2
View File
@@ -99,8 +99,7 @@ docs(readme): update installation instructions
2. Update documentation if needed
3. Add tests for new functionality
4. Ensure `make check` and `make test` pass
5. Update the CHANGELOG.md if applicable
6. Request review from maintainers
5. Request review from maintainers
### PR Title Format
+29 -2
View File
@@ -22,7 +22,6 @@
<img src="https://img.shields.io/badge/MCP-102_Tools-00ADD8?style=flat-square" alt="MCP" />
</p>
<p align="center">
<img src="https://img.shields.io/badge/AI_Agents-Self--Improving-brightgreen?style=flat-square" alt="AI Agents" />
<img src="https://img.shields.io/badge/Multi--Agent-Systems-blue?style=flat-square" alt="Multi-Agent" />
@@ -82,12 +81,17 @@ Use Hive when you need:
### Prerequisites
- Python 3.11+ for agent development
- Claude Code or Cursor for utilizing agent skills
- Claude Code, Codex CLI, or Cursor for utilizing agent skills
> **Note for Windows Users:** It is strongly recommended to use **WSL (Windows Subsystem for Linux)** or **Git Bash** to run this framework. Some core automation scripts may not execute correctly in standard Command Prompt or PowerShell.
### Installation
>**Note**
> Hive uses a `uv` workspace layout and is not installed with `pip install`.
> Running `pip install -e .` from the repository root will create a placeholder package and Hive will not function correctly.
> Please use the quickstart script below to set up the environment.
```bash
# Clone the repository
git clone https://github.com/adenhq/hive.git
@@ -121,6 +125,18 @@ hive tui
hive run exports/your_agent_name --input '{"key": "value"}'
```
## Coding Agent Support
### Codex CLI
Hive includes native support for [OpenAI Codex CLI](https://github.com/openai/codex) (v0.101.0+).
1. **Config:** `.codex/config.toml` with `agent-builder` MCP server (tracked in git)
2. **Skills:** `.agents/skills/` symlinks to Hive skills (tracked in git)
3. **Launch:** Run `codex` in the repo root, then type `use hive`
Example:
```
codex> use hive
```
### Opencode
Hive includes native support for [Opencode](https://github.com/opencode-ai/opencode).
@@ -133,6 +149,16 @@ The agent has access to all Hive skills and can scaffold agents, add tools, and
**[📖 Complete Setup Guide](docs/environment-setup.md)** - Detailed instructions for agent development
### Antigravity IDE Support
Skills and MCP servers are also available in [Antigravity IDE](https://antigravity.google/) (Google's AI-powered IDE). **Easiest:** open a terminal in the hive repo folder and run (use `./` — the script is inside the repo):
```bash
./scripts/setup-antigravity-mcp.sh
```
**Important:** Always restart/refresh Antigravity IDE after running the setup script—MCP servers only load on startup. After restart, **agent-builder** and **tools** MCP servers should connect. Skills are under `.agent/skills/` (symlinks to `.claude/skills/`). See [docs/antigravity-setup.md](docs/antigravity-setup.md) for manual setup and troubleshooting.
## Features
- **[Goal-Driven Development](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
@@ -312,6 +338,7 @@ subgraph Expansion
j2["Cursor"]
j3["Opencode"]
j4["Antigravity"]
j5["Codex CLI"]
end
subgraph plat["Platform"]
k1["JavaScript/TypeScript SDK"]
+3 -3
View File
@@ -82,7 +82,7 @@ Register an MCP server as a tool source for your agent.
"example_tool"
],
"total_mcp_servers": 1,
"note": "MCP server 'tools' registered with 6 tools. These tools can now be used in llm_tool_use nodes."
"note": "MCP server 'tools' registered with 6 tools. These tools can now be used in event_loop nodes."
}
```
@@ -149,7 +149,7 @@ List tools available from registered MCP servers.
]
},
"total_tools": 6,
"note": "Use these tool names in the 'tools' parameter when adding llm_tool_use nodes"
"note": "Use these tool names in the 'tools' parameter when adding event_loop nodes"
}
```
@@ -246,7 +246,7 @@ Here's a complete workflow for building an agent with MCP tools:
"node_id": "web-searcher",
"name": "Web Search",
"description": "Search the web for information",
"node_type": "llm_tool_use",
"node_type": "event_loop",
"input_keys": "[\"query\"]",
"output_keys": "[\"search_results\"]",
"system_prompt": "Search for {query} using the web_search tool",
+2 -2
View File
@@ -119,7 +119,7 @@ builder = WorkflowBuilder()
builder.add_node(
node_id="researcher",
name="Web Researcher",
node_type="llm_tool_use",
node_type="event_loop",
system_prompt="Research the topic using web_search",
tools=["web_search"], # Tool from tools MCP server
input_keys=["topic"],
@@ -137,7 +137,7 @@ Tools from MCP servers can be referenced in your agent.json just like built-in t
{
"id": "searcher",
"name": "Web Searcher",
"node_type": "llm_tool_use",
"node_type": "event_loop",
"system_prompt": "Search for information about {topic}",
"tools": ["web_search", "web_scrape"],
"input_keys": ["topic"],
+17 -70
View File
@@ -103,31 +103,20 @@ Add a processing node to the agent graph.
- `node_id` (string, required): Unique node identifier
- `name` (string, required): Human-readable name
- `description` (string, required): What this node does
- `node_type` (string, required): One of: `llm_generate`, `llm_tool_use`, `router`, `function`
- `node_type` (string, required): Must be `event_loop` (the only valid type)
- `input_keys` (string, required): JSON array of input variable names
- `output_keys` (string, required): JSON array of output variable names
- `system_prompt` (string, optional): System prompt for LLM nodes
- `tools` (string, optional): JSON array of tool names for tool_use nodes
- `routes` (string, optional): JSON object of route mappings for router nodes
- `system_prompt` (string, optional): System prompt for the LLM
- `tools` (string, optional): JSON array of tool names
- `client_facing` (boolean, optional): Set to true for human-in-the-loop interaction
**Node Types:**
**Node Type:**
1. **llm_generate**: Uses LLM to generate output from inputs
- Requires: `system_prompt`
- Tools: Not used
2. **llm_tool_use**: Uses LLM with tools to accomplish tasks
- Requires: `system_prompt`, `tools`
- Tools: Array of tool names (e.g., `["web_search", "web_fetch"]`)
3. **router**: LLM-powered routing to different paths
- Requires: `system_prompt`, `routes`
- Routes: Object mapping route names to target node IDs
- Example: `{"pass": "success_node", "fail": "retry_node"}`
4. **function**: Executes a pre-defined function
- System prompt describes the function behavior
- No LLM calls, pure computation
**event_loop**: LLM-powered node with self-correction loop
- Requires: `system_prompt`
- Optional: `tools` (array of tool names, e.g., `["web_search", "web_fetch"]`)
- Optional: `client_facing` (set to true for HITL / user interaction)
- Supports: iterative refinement, judge-based evaluation, tool use, streaming
**Example:**
```json
@@ -135,7 +124,7 @@ Add a processing node to the agent graph.
"node_id": "search_sources",
"name": "Search Sources",
"description": "Searches for relevant sources on the topic",
"node_type": "llm_tool_use",
"node_type": "event_loop",
"input_keys": "[\"topic\", \"search_queries\"]",
"output_keys": "[\"sources\", \"source_count\"]",
"system_prompt": "Search for sources using the provided queries...",
@@ -198,7 +187,7 @@ Export the validated graph as an agent specification.
**What it does:**
1. Validates the graph
2. Auto-generates missing edges from router routes
2. Validates edge connectivity
3. Writes files to disk:
- `exports/{agent-name}/agent.json` - Full agent specification
- `exports/{agent-name}/README.md` - Auto-generated documentation
@@ -252,47 +241,6 @@ Test the complete agent graph with sample inputs.
---
### Evaluation Rules
#### `add_evaluation_rule`
Add a rule for the HybridJudge to evaluate node outputs.
**Parameters:**
- `rule_id` (string, required): Unique rule identifier
- `description` (string, required): What this rule checks
- `condition` (string, required): Python expression to evaluate
- `action` (string, required): Action to take: `accept`, `retry`, `escalate`
- `priority` (integer, optional): Rule priority (default: 0)
- `feedback_template` (string, optional): Feedback message template
**Condition Examples:**
- `'result.get("success") == True'` - Check for success flag
- `'result.get("error_type") == "timeout"'` - Check error type
- `'len(result.get("data", [])) > 0'` - Check for non-empty data
**Example:**
```json
{
"rule_id": "timeout_retry",
"description": "Retry on timeout errors",
"condition": "result.get('error_type') == 'timeout'",
"action": "retry",
"priority": 10,
"feedback_template": "Timeout occurred, retrying..."
}
```
#### `list_evaluation_rules`
List all configured evaluation rules.
#### `remove_evaluation_rule`
Remove an evaluation rule.
**Parameters:**
- `rule_id` (string, required): Rule to remove
---
## Example Workflow
Here's a complete workflow for building a research agent:
@@ -320,7 +268,7 @@ add_node(
node_id="planner",
name="Research Planner",
description="Creates research strategy",
node_type="llm_generate",
node_type="event_loop",
input_keys='["topic"]',
output_keys='["strategy", "queries"]',
system_prompt="Analyze topic and create research plan..."
@@ -330,7 +278,7 @@ add_node(
node_id="searcher",
name="Search Sources",
description="Find relevant sources",
node_type="llm_tool_use",
node_type="event_loop",
input_keys='["queries"]',
output_keys='["sources"]',
system_prompt="Search for sources...",
@@ -359,10 +307,9 @@ The exported agent will be saved to `exports/research-agent/`.
1. **Start with the goal**: Define clear success criteria before building nodes
2. **Test nodes individually**: Use `test_node` to verify each node works
3. **Use router nodes for branching**: Don't create edges manually for routers - define routes and they'll be auto-generated
4. **Add evaluation rules**: Help the judge evaluate outputs deterministically
5. **Validate early, validate often**: Run `validate_graph` after adding nodes/edges
6. **Check exports**: Review the generated README.md to verify your agent structure
3. **Use conditional edges for branching**: Define condition_expr on edges for decision points (see the sketch after this list)
4. **Validate early, validate often**: Run `validate_graph` after adding nodes/edges
5. **Check exports**: Review the generated README.md to verify your agent structure
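A sketch of practice 3; the keyword names here (`source`, `target`, `condition_expr`) are illustrative assumptions, so check `framework.graph.edge` for the actual `EdgeSpec` fields:
```python
from framework.graph import EdgeSpec

# Illustrative field names; consult framework.graph.edge for the real spec.
edges = [
    EdgeSpec(
        id="review-pass",
        source="reviewer",
        target="publisher",
        condition_expr="verdict == 'approved'",  # decision point, no router node
    ),
    EdgeSpec(
        id="review-fail",
        source="reviewer",
        target="reviser",
        condition_expr="verdict != 'approved'",
    ),
]
```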
---
+1 -1
View File
@@ -73,7 +73,7 @@ To use the agent builder with Claude Desktop or other MCP clients, add this to y
The MCP server provides tools for:
- Creating agent building sessions
- Defining goals with success criteria
- Adding nodes (llm_generate, llm_tool_use, router, function)
- Adding nodes (event_loop only)
- Connecting nodes with edges
- Validating and exporting agent graphs
- Testing nodes and full agent graphs
+14 -5
View File
@@ -68,7 +68,7 @@ from framework.graph.event_loop_node import ( # noqa: E402
)
from framework.graph.executor import GraphExecutor # noqa: E402
from framework.graph.goal import Goal # noqa: E402
from framework.graph.node import NodeSpec # noqa: E402
from framework.graph.node import NodeContext, NodeProtocol, NodeResult, NodeSpec # noqa: E402
from framework.llm.litellm import LiteLLMProvider # noqa: E402
from framework.runner.tool_registry import ToolRegistry # noqa: E402
from framework.runtime.core import Runtime # noqa: E402
@@ -654,7 +654,7 @@ NODE_SPECS = {
id="sender",
name="Sender",
description="Send approved campaign emails",
node_type="function",
node_type="event_loop",
input_keys=["approved_emails"],
output_keys=["send_results"],
),
@@ -823,11 +823,20 @@ def _send_email_via_resend(
return {"error": f"Network error: {e}"}
class SenderNode(NodeProtocol):
"""Node wrapper for send_emails function."""
async def execute(self, ctx: NodeContext) -> NodeResult:
approved = ctx.input_data.get("approved_emails", "")
result_str = send_emails(approved_emails=approved)
ctx.memory.write("send_results", result_str)
return NodeResult(success=True, output={"send_results": result_str})
def send_emails(approved_emails: str = "") -> str:
"""Send approved campaign emails via Resend, or log if unconfigured.
Called by FunctionNode which unpacks input_keys as kwargs.
Returns a JSON string (FunctionNode wraps it in NodeResult).
Returns a JSON string.
"""
approved = approved_emails
if not approved:
@@ -1780,7 +1789,7 @@ async def _run_pipeline(websocket, initial_message: str):
)
for nid, impl in nodes.items():
executor.register_node(nid, impl)
executor.register_function("sender", send_emails)
executor.register_node("sender", SenderNode())
# --- Event forwarding: bus → WebSocket ---
+28 -19
View File
@@ -4,8 +4,8 @@ Minimal Manual Agent Example
This example demonstrates how to build and run an agent programmatically
without using the Claude Code CLI or external LLM APIs.
It uses 'function' nodes to define logic in pure Python, making it perfect
for understanding the core runtime loop:
It uses custom NodeProtocol implementations to define logic in pure Python,
making it perfect for understanding the core runtime loop:
Setup -> Graph definition -> Execution -> Result
Run with:
@@ -16,22 +16,33 @@ import asyncio
from framework.graph import EdgeCondition, EdgeSpec, Goal, GraphSpec, NodeSpec
from framework.graph.executor import GraphExecutor
from framework.graph.node import NodeContext, NodeProtocol, NodeResult
from framework.runtime.core import Runtime
# 1. Define Node Logic (Pure Python Functions)
def greet(name: str) -> str:
# 1. Define Node Logic (Custom NodeProtocol implementations)
class GreeterNode(NodeProtocol):
"""Generate a simple greeting."""
return f"Hello, {name}!"
async def execute(self, ctx: NodeContext) -> NodeResult:
name = ctx.input_data.get("name", "World")
greeting = f"Hello, {name}!"
ctx.memory.write("greeting", greeting)
return NodeResult(success=True, output={"greeting": greeting})
def uppercase(greeting: str) -> str:
class UppercaserNode(NodeProtocol):
"""Convert text to uppercase."""
return greeting.upper()
async def execute(self, ctx: NodeContext) -> NodeResult:
greeting = ctx.input_data.get("greeting") or ctx.memory.read("greeting") or ""
result = greeting.upper()
ctx.memory.write("final_greeting", result)
return NodeResult(success=True, output={"final_greeting": result})
async def main():
print("🚀 Setting up Manual Agent...")
print("Setting up Manual Agent...")
# 2. Define the Goal
# Every agent needs a goal with success criteria
@@ -55,8 +66,7 @@ async def main():
id="greeter",
name="Greeter",
description="Generates a simple greeting",
node_type="function",
function="greet", # Matches the registered function name
node_type="event_loop",
input_keys=["name"],
output_keys=["greeting"],
)
@@ -65,8 +75,7 @@ async def main():
id="uppercaser",
name="Uppercaser",
description="Converts greeting to uppercase",
node_type="function",
function="uppercase",
node_type="event_loop",
input_keys=["greeting"],
output_keys=["final_greeting"],
)
@@ -98,23 +107,23 @@ async def main():
runtime = Runtime(storage_path=Path("./agent_logs"))
executor = GraphExecutor(runtime=runtime)
# 7. Register Function Implementations
# Connect string names in NodeSpecs to actual Python functions
executor.register_function("greeter", greet)
executor.register_function("uppercaser", uppercase)
# 7. Register Node Implementations
# Connect node IDs in the graph to actual Python implementations
executor.register_node("greeter", GreeterNode())
executor.register_node("uppercaser", UppercaserNode())
# 8. Execute Agent
print("Executing agent with input: name='Alice'...")
print("Executing agent with input: name='Alice'...")
result = await executor.execute(graph=graph, goal=goal, input_data={"name": "Alice"})
# 9. Verify Results
if result.success:
print("\nSuccess!")
print("\nSuccess!")
print(f"Path taken: {' -> '.join(result.path)}")
print(f"Final output: {result.output.get('final_greeting')}")
else:
print(f"\nFailed: {result.error}")
print(f"\nFailed: {result.error}")
if __name__ == "__main__":
+2 -2
View File
@@ -122,7 +122,7 @@ async def example_4_custom_agent_with_mcp_tools():
node_id="web-searcher",
name="Web Search",
description="Search the web for information",
node_type="llm_tool_use",
node_type="event_loop",
system_prompt="Search for {query} and return the top results. Use the web_search tool.",
tools=["web_search"], # This tool comes from tools MCP server
input_keys=["query"],
@@ -133,7 +133,7 @@ async def example_4_custom_agent_with_mcp_tools():
node_id="summarizer",
name="Summarize Results",
description="Summarize the search results",
node_type="llm_generate",
node_type="event_loop",
system_prompt="Summarize the following search results in 2-3 sentences: {search_results}",
input_keys=["search_results"],
output_keys=["summary"],
+9 -11
View File
@@ -245,20 +245,14 @@ class GraphBuilder:
warnings.append(f"Node '{node.id}' should have a description")
# Type-specific validation
if node.node_type == "llm_tool_use":
if not node.tools:
errors.append(f"LLM tool node '{node.id}' must specify tools")
if not node.system_prompt:
warnings.append(f"LLM node '{node.id}' should have a system_prompt")
if node.node_type == "event_loop":
if node.tools and not node.system_prompt:
warnings.append(f"Event loop node '{node.id}' should have a system_prompt")
if node.node_type == "router":
if not node.routes:
errors.append(f"Router node '{node.id}' must specify routes")
if node.node_type == "function":
if not node.function:
errors.append(f"Function node '{node.id}' must specify function name")
# Check input/output keys
if not node.input_keys:
suggestions.append(f"Consider specifying input_keys for '{node.id}'")
@@ -400,9 +394,13 @@ class GraphBuilder:
if not terminal_candidates and self.session.nodes:
warnings.append("No terminal nodes found (all nodes have outgoing edges)")
# Check reachability
# Check reachability from ALL entry candidates (not just the first one).
# Agents with async entry points have multiple nodes with no incoming
# edges (e.g., a primary entry node and an event-driven entry node).
if entry_candidates and self.session.nodes:
reachable = self._compute_reachable(entry_candidates[0])
reachable = set()
for candidate in entry_candidates:
reachable |= self._compute_reachable(candidate)
unreachable = [n.id for n in self.session.nodes if n.id not in reachable]
if unreachable:
errors.append(f"Unreachable nodes: {unreachable}")
+17
View File
@@ -59,6 +59,13 @@ from .provider import (
CredentialProvider,
StaticProvider,
)
from .setup import (
CredentialSetupSession,
MissingCredential,
SetupResult,
detect_missing_credentials_from_nodes,
run_credential_setup_cli,
)
from .storage import (
CompositeStorage,
CredentialStorage,
@@ -68,6 +75,7 @@ from .storage import (
)
from .store import CredentialStore
from .template import TemplateResolver
from .validation import ensure_credential_key_env, validate_agent_credentials
# Aden sync components (lazy import to avoid httpx dependency when not needed)
# Usage: from core.framework.credentials.aden import AdenSyncProvider
@@ -111,6 +119,15 @@ __all__ = [
"CredentialRefreshError",
"CredentialValidationError",
"CredentialDecryptionError",
# Validation
"ensure_credential_key_env",
"validate_agent_credentials",
# Interactive setup
"CredentialSetupSession",
"MissingCredential",
"SetupResult",
"detect_missing_credentials_from_nodes",
"run_credential_setup_cli",
# Aden sync (optional - requires httpx)
"AdenSyncProvider",
"AdenCredentialClient",
+745
View File
@@ -0,0 +1,745 @@
"""
Interactive credential setup for CLI applications.
Provides a modular, reusable credential setup flow that can be triggered
when validate_agent_credentials() fails. Works with both TUI and headless CLIs.
Usage:
from framework.credentials.setup import CredentialSetupSession
# From agent path
session = CredentialSetupSession.from_agent_path("exports/my-agent")
result = session.run_interactive()
# From nodes directly
session = CredentialSetupSession.from_nodes(nodes)
result = session.run_interactive()
# With custom I/O (for integration with other UIs)
session = CredentialSetupSession(
missing=missing_creds,
input_fn=my_input,
print_fn=my_print,
)
"""
from __future__ import annotations
import getpass
import json
import os
import sys
from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from framework.graph import NodeSpec
# ANSI colors for terminal output
class Colors:
RED = "\033[0;31m"
GREEN = "\033[0;32m"
YELLOW = "\033[1;33m"
BLUE = "\033[0;34m"
CYAN = "\033[0;36m"
BOLD = "\033[1m"
DIM = "\033[2m"
NC = "\033[0m" # No Color
@classmethod
def disable(cls):
"""Disable colors (for non-TTY output)."""
cls.RED = cls.GREEN = cls.YELLOW = cls.BLUE = ""
cls.CYAN = cls.BOLD = cls.DIM = cls.NC = ""
@dataclass
class MissingCredential:
"""A credential that needs to be configured."""
credential_name: str
"""Internal credential name (e.g., 'brave_search')"""
env_var: str
"""Environment variable name (e.g., 'BRAVE_SEARCH_API_KEY')"""
description: str
"""Human-readable description"""
help_url: str
"""URL where user can obtain credential"""
api_key_instructions: str
"""Step-by-step instructions for getting API key"""
tools: list[str] = field(default_factory=list)
"""Tools that require this credential"""
node_types: list[str] = field(default_factory=list)
"""Node types that require this credential"""
aden_supported: bool = False
"""Whether Aden OAuth flow is supported"""
direct_api_key_supported: bool = True
"""Whether direct API key entry is supported"""
credential_id: str = ""
"""Credential store ID"""
credential_key: str = "api_key"
"""Key name within the credential"""
@dataclass
class SetupResult:
"""Result of credential setup session."""
success: bool
"""Whether all required credentials were configured"""
configured: list[str] = field(default_factory=list)
"""Credentials that were successfully set up"""
skipped: list[str] = field(default_factory=list)
"""Credentials user chose to skip"""
errors: list[str] = field(default_factory=list)
"""Any errors encountered"""
class CredentialSetupSession:
"""
Interactive credential setup session.
Can be used by any CLI (runner, coding agent, etc.) to guide users
through credential configuration when validation fails.
Example:
from framework.credentials.setup import CredentialSetupSession
from framework.credentials.models import CredentialError
try:
validate_agent_credentials(nodes)
except CredentialError:
session = CredentialSetupSession.from_nodes(nodes)
result = session.run_interactive()
if result.success:
# Retry - credentials are now configured
validate_agent_credentials(nodes)
"""
def __init__(
self,
missing: list[MissingCredential],
input_fn: Callable[[str], str] | None = None,
print_fn: Callable[[str], None] | None = None,
password_fn: Callable[[str], str] | None = None,
):
"""
Initialize the setup session.
Args:
missing: List of credentials that need setup
input_fn: Custom input function (default: built-in input)
print_fn: Custom print function (default: built-in print)
password_fn: Custom password input function (default: getpass.getpass)
"""
self.missing = missing
self.input_fn = input_fn or input
self.print_fn = print_fn or print
self.password_fn = password_fn or getpass.getpass
# Disable colors if not a TTY
if not sys.stdout.isatty():
Colors.disable()
@classmethod
def from_nodes(cls, nodes: list[NodeSpec]) -> CredentialSetupSession:
"""Create a setup session by detecting missing credentials from nodes."""
missing = detect_missing_credentials_from_nodes(nodes)
return cls(missing)
@classmethod
def from_agent_path(cls, agent_path: str | Path) -> CredentialSetupSession:
"""Create a setup session for an agent by path."""
agent_path = Path(agent_path)
# Load agent to get nodes
agent_json = agent_path / "agent.json"
agent_py = agent_path / "agent.py"
nodes = []
if agent_py.exists():
# Python-based agent
nodes = _load_nodes_from_python_agent(agent_path)
elif agent_json.exists():
# JSON-based agent
nodes = _load_nodes_from_json_agent(agent_json)
missing = detect_missing_credentials_from_nodes(nodes)
return cls(missing)
def run_interactive(self) -> SetupResult:
"""Run the interactive setup flow."""
configured: list[str] = []
skipped: list[str] = []
errors: list[str] = []
if not self.missing:
self._print(f"\n{Colors.GREEN}✓ All credentials are already configured!{Colors.NC}\n")
return SetupResult(success=True)
self._print_header()
# Ensure HIVE_CREDENTIAL_KEY is set before storing anything
if not self._ensure_credential_key():
return SetupResult(
success=False,
errors=["Failed to initialize credential store encryption key"],
)
for cred in self.missing:
try:
result = self._setup_single_credential(cred)
if result:
configured.append(cred.credential_name)
else:
skipped.append(cred.credential_name)
except KeyboardInterrupt:
self._print(f"\n{Colors.YELLOW}Setup interrupted.{Colors.NC}")
skipped.append(cred.credential_name)
break
except Exception as e:
errors.append(f"{cred.credential_name}: {e}")
self._print_summary(configured, skipped, errors)
return SetupResult(
success=len(errors) == 0 and len(skipped) == 0,
configured=configured,
skipped=skipped,
errors=errors,
)
def _print(self, msg: str) -> None:
"""Print a message."""
self.print_fn(msg)
def _input(self, prompt: str) -> str:
"""Get input from user."""
return self.input_fn(prompt)
def _print_header(self) -> None:
"""Print the setup header."""
self._print("")
self._print(f"{Colors.YELLOW}{'=' * 60}{Colors.NC}")
self._print(f"{Colors.BOLD} CREDENTIAL SETUP{Colors.NC}")
self._print(f"{Colors.YELLOW}{'=' * 60}{Colors.NC}")
self._print("")
self._print(f" {len(self.missing)} credential(s) need to be configured:")
for cred in self.missing:
affected = cred.tools or cred.node_types
self._print(f"{cred.env_var} ({', '.join(affected)})")
self._print("")
def _ensure_credential_key(self) -> bool:
"""Ensure HIVE_CREDENTIAL_KEY is available for encrypted storage."""
if os.environ.get("HIVE_CREDENTIAL_KEY"):
return True
# Try to load from shell config
try:
from aden_tools.credentials.shell_config import check_env_var_in_shell_config
found, value = check_env_var_in_shell_config("HIVE_CREDENTIAL_KEY")
if found and value:
os.environ["HIVE_CREDENTIAL_KEY"] = value
return True
except ImportError:
pass
# Generate a new key
self._print(f"{Colors.YELLOW}Initializing credential store...{Colors.NC}")
try:
from cryptography.fernet import Fernet
generated_key = Fernet.generate_key().decode()
os.environ["HIVE_CREDENTIAL_KEY"] = generated_key
# Save to shell config
self._save_key_to_shell_config(generated_key)
return True
except Exception as e:
self._print(f"{Colors.RED}Failed to initialize credential store: {e}{Colors.NC}")
return False
def _save_key_to_shell_config(self, key: str) -> None:
"""Save HIVE_CREDENTIAL_KEY to shell config."""
try:
from aden_tools.credentials.shell_config import (
add_env_var_to_shell_config,
)
success, config_path = add_env_var_to_shell_config(
"HIVE_CREDENTIAL_KEY",
key,
comment="Encryption key for Hive credential store",
)
if success:
self._print(f"{Colors.GREEN}✓ Encryption key saved to {config_path}{Colors.NC}")
except Exception:
# Fallback: just tell the user
self._print("\n")
self._print(
f"{Colors.YELLOW}Add this to your shell config (~/.zshrc or ~/.bashrc):{Colors.NC}"
)
self._print(f' export HIVE_CREDENTIAL_KEY="{key}"')
def _setup_single_credential(self, cred: MissingCredential) -> bool:
"""Set up a single credential. Returns True if configured."""
self._print(f"\n{Colors.CYAN}{'' * 60}{Colors.NC}")
self._print(f"{Colors.BOLD}Setting up: {cred.credential_name}{Colors.NC}")
affected = cred.tools or cred.node_types
self._print(f"{Colors.DIM}Required for: {', '.join(affected)}{Colors.NC}")
if cred.description:
self._print(f"{Colors.DIM}{cred.description}{Colors.NC}")
self._print(f"{Colors.CYAN}{'' * 60}{Colors.NC}")
# Show auth options
options = self._get_auth_options(cred)
choice = self._prompt_choice(options)
if choice == "skip":
return False
elif choice == "aden":
return self._setup_via_aden(cred)
elif choice == "direct":
return self._setup_direct_api_key(cred)
return False
def _get_auth_options(self, cred: MissingCredential) -> list[tuple[str, str, str]]:
"""Get available auth options as (key, label, description) tuples."""
options = []
if cred.direct_api_key_supported:
options.append(
(
"direct",
"Enter API key directly",
"Paste your API key from the provider's dashboard",
)
)
if cred.aden_supported:
options.append(
(
"aden",
"Use Aden Platform (OAuth)",
"Secure OAuth2 flow via hive.adenhq.com",
)
)
options.append(
(
"skip",
"Skip for now",
"Configure this credential later",
)
)
return options
def _prompt_choice(self, options: list[tuple[str, str, str]]) -> str:
"""Prompt user to choose from options."""
self._print("")
for i, (key, label, desc) in enumerate(options, 1):
if key == "skip":
self._print(f" {Colors.DIM}{i}) {label}{Colors.NC}")
else:
self._print(f" {Colors.CYAN}{i}){Colors.NC} {label}")
self._print(f" {Colors.DIM}{desc}{Colors.NC}")
self._print("")
while True:
try:
choice_str = self._input(f"Select option (1-{len(options)}): ").strip()
if not choice_str:
continue
choice_num = int(choice_str)
if 1 <= choice_num <= len(options):
return options[choice_num - 1][0]
except ValueError:
pass
self._print(f"{Colors.RED}Invalid choice. Enter 1-{len(options)}{Colors.NC}")
def _setup_direct_api_key(self, cred: MissingCredential) -> bool:
"""Guide user through direct API key setup."""
# Show instructions
if cred.api_key_instructions:
self._print(f"\n{Colors.BOLD}Setup Instructions:{Colors.NC}")
self._print(cred.api_key_instructions)
if cred.help_url:
self._print(f"\n{Colors.CYAN}Get your API key at:{Colors.NC} {cred.help_url}")
# Collect key (use password input to hide the value)
self._print("")
try:
api_key = self.password_fn(f"Paste your {cred.env_var}: ").strip()
except Exception:
# Fallback to regular input if password input fails
api_key = self._input(f"Paste your {cred.env_var}: ").strip()
if not api_key:
self._print(f"{Colors.YELLOW}No value entered. Skipping.{Colors.NC}")
return False
# Health check
health_result = self._run_health_check(cred, api_key)
if health_result is not None:
if health_result["valid"]:
self._print(f"{Colors.GREEN}{health_result['message']}{Colors.NC}")
else:
self._print(f"{Colors.YELLOW}{health_result['message']}{Colors.NC}")
confirm = self._input("Continue anyway? [y/N]: ").strip().lower()
if confirm != "y":
return False
# Store credential
self._store_credential(cred, api_key)
return True
def _setup_via_aden(self, cred: MissingCredential) -> bool:
"""Guide user through Aden OAuth flow."""
self._print(f"\n{Colors.BOLD}Aden Platform Setup{Colors.NC}")
self._print("This will sync credentials from your Aden account.")
self._print("")
# Check for ADEN_API_KEY
aden_key = os.environ.get("ADEN_API_KEY")
if not aden_key:
self._print("You need an Aden API key to use this method.")
self._print(f"{Colors.CYAN}Get one at:{Colors.NC} https://hive.adenhq.com")
self._print("")
try:
aden_key = self.password_fn("Paste your ADEN_API_KEY: ").strip()
except Exception:
aden_key = self._input("Paste your ADEN_API_KEY: ").strip()
if not aden_key:
self._print(f"{Colors.YELLOW}No key entered. Skipping.{Colors.NC}")
return False
os.environ["ADEN_API_KEY"] = aden_key
# Save to shell config
try:
from aden_tools.credentials.shell_config import add_env_var_to_shell_config
add_env_var_to_shell_config(
"ADEN_API_KEY",
aden_key,
comment="Aden Platform API key",
)
except Exception:
pass
# Sync from Aden
try:
from framework.credentials import CredentialStore
store = CredentialStore.with_aden_sync(
base_url="https://api.adenhq.com",
auto_sync=True,
)
# Check if the credential was synced
cred_id = cred.credential_id or cred.credential_name
if store.is_available(cred_id):
self._print(f"{Colors.GREEN}{cred.credential_name} synced from Aden{Colors.NC}")
# Export to current session
try:
value = store.get_key(cred_id, cred.credential_key)
if value:
os.environ[cred.env_var] = value
except Exception:
pass
return True
else:
self._print(
f"{Colors.YELLOW}{cred.credential_name} not found in Aden account.{Colors.NC}"
)
self._print("Please connect this integration on https://hive.adenhq.com first.")
return False
except Exception as e:
self._print(f"{Colors.RED}Failed to sync from Aden: {e}{Colors.NC}")
return False
def _run_health_check(self, cred: MissingCredential, value: str) -> dict[str, Any] | None:
"""Run health check on credential value."""
try:
from aden_tools.credentials import check_credential_health
result = check_credential_health(cred.credential_name, value)
return {
"valid": result.valid,
"message": result.message,
"details": result.details,
}
except Exception:
# No health checker available
return None
def _store_credential(self, cred: MissingCredential, value: str) -> None:
"""Store credential in encrypted store and export to env."""
from pydantic import SecretStr
from framework.credentials import CredentialKey, CredentialObject, CredentialStore
try:
store = CredentialStore.with_encrypted_storage()
cred_id = cred.credential_id or cred.credential_name
key_name = cred.credential_key or "api_key"
cred_obj = CredentialObject(
id=cred_id,
name=cred.description or cred.credential_name,
keys={key_name: CredentialKey(name=key_name, value=SecretStr(value))},
)
store.save_credential(cred_obj)
self._print(f"{Colors.GREEN}✓ Stored in ~/.hive/credentials/{Colors.NC}")
except Exception as e:
self._print(f"{Colors.YELLOW}⚠ Could not store in credential store: {e}{Colors.NC}")
# Export to current session
os.environ[cred.env_var] = value
self._print(f"{Colors.GREEN}✓ Exported to current session{Colors.NC}")
def _print_summary(self, configured: list[str], skipped: list[str], errors: list[str]) -> None:
"""Print final summary."""
self._print("")
self._print(f"{Colors.YELLOW}{'=' * 60}{Colors.NC}")
self._print(f"{Colors.BOLD} SETUP COMPLETE{Colors.NC}")
self._print(f"{Colors.YELLOW}{'=' * 60}{Colors.NC}")
if configured:
self._print(f"\n{Colors.GREEN}✓ Configured:{Colors.NC}")
for name in configured:
self._print(f"{name}")
if skipped:
self._print(f"\n{Colors.YELLOW}⏭ Skipped:{Colors.NC}")
for name in skipped:
self._print(f"{name}")
if errors:
self._print(f"\n{Colors.RED}✗ Errors:{Colors.NC}")
for err in errors:
self._print(f"{err}")
if not skipped and not errors:
self._print(f"\n{Colors.GREEN}All credentials configured successfully!{Colors.NC}")
elif skipped:
self._print(f"\n{Colors.YELLOW}Note: Skipped credentials must be configured ")
self._print(f"before running the agent.{Colors.NC}")
self._print("")
def detect_missing_credentials_from_nodes(nodes: list) -> list[MissingCredential]:
"""
Detect missing credentials for a list of nodes.
Args:
nodes: List of NodeSpec objects
Returns:
List of MissingCredential objects for credentials that need setup
"""
try:
from aden_tools.credentials import CREDENTIAL_SPECS
from framework.credentials import CredentialStore
from framework.credentials.storage import (
CompositeStorage,
EncryptedFileStorage,
EnvVarStorage,
)
except ImportError:
return []
# Collect required tools and node types
required_tools: set[str] = set()
node_types: set[str] = set()
for node in nodes:
if hasattr(node, "tools") and node.tools:
required_tools.update(node.tools)
if hasattr(node, "node_type"):
node_types.add(node.node_type)
# Build credential store to check availability
env_mapping = {
(spec.credential_id or name): spec.env_var for name, spec in CREDENTIAL_SPECS.items()
}
storages: list = [EnvVarStorage(env_mapping=env_mapping)]
if os.environ.get("HIVE_CREDENTIAL_KEY"):
storages.insert(0, EncryptedFileStorage())
if len(storages) == 1:
storage = storages[0]
else:
storage = CompositeStorage(primary=storages[0], fallbacks=storages[1:])
store = CredentialStore(storage=storage)
# Build reverse mappings
tool_to_cred: dict[str, str] = {}
node_type_to_cred: dict[str, str] = {}
for cred_name, spec in CREDENTIAL_SPECS.items():
for tool_name in spec.tools:
tool_to_cred[tool_name] = cred_name
for nt in spec.node_types:
node_type_to_cred[nt] = cred_name
missing: list[MissingCredential] = []
checked: set[str] = set()
# Check tool credentials
for tool_name in sorted(required_tools):
cred_name = tool_to_cred.get(tool_name)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
if spec.required and not store.is_available(cred_id):
affected_tools = sorted(t for t in required_tools if t in spec.tools)
missing.append(
MissingCredential(
credential_name=cred_name,
env_var=spec.env_var,
description=spec.description,
help_url=spec.help_url,
api_key_instructions=spec.api_key_instructions,
tools=affected_tools,
aden_supported=spec.aden_supported,
direct_api_key_supported=spec.direct_api_key_supported,
credential_id=spec.credential_id,
credential_key=spec.credential_key,
)
)
# Check node type credentials
for nt in sorted(node_types):
cred_name = node_type_to_cred.get(nt)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
if spec.required and not store.is_available(cred_id):
affected_types = sorted(t for t in node_types if t in spec.node_types)
missing.append(
MissingCredential(
credential_name=cred_name,
env_var=spec.env_var,
description=spec.description,
help_url=spec.help_url,
api_key_instructions=spec.api_key_instructions,
node_types=affected_types,
aden_supported=spec.aden_supported,
direct_api_key_supported=spec.direct_api_key_supported,
credential_id=spec.credential_id,
credential_key=spec.credential_key,
)
)
return missing
def _load_nodes_from_python_agent(agent_path: Path) -> list:
"""Load nodes from a Python-based agent."""
import importlib.util
agent_py = agent_path / "agent.py"
if not agent_py.exists():
return []
try:
# Add agent path and its parent to sys.path so imports work
paths_to_add = [str(agent_path), str(agent_path.parent)]
for p in paths_to_add:
if p not in sys.path:
sys.path.insert(0, p)
spec = importlib.util.spec_from_file_location(
f"{agent_path.name}.agent",
agent_py,
submodule_search_locations=[str(agent_path)],
)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)
return getattr(module, "nodes", [])
except Exception:
return []
def _load_nodes_from_json_agent(agent_json: Path) -> list:
"""Load nodes from a JSON-based agent."""
try:
with open(agent_json) as f:
data = json.load(f)
from framework.graph import NodeSpec
nodes_data = data.get("graph", {}).get("nodes", [])
nodes = []
for node_data in nodes_data:
nodes.append(
NodeSpec(
id=node_data.get("id", ""),
name=node_data.get("name", ""),
description=node_data.get("description", ""),
node_type=node_data.get("node_type", ""),
tools=node_data.get("tools", []),
input_keys=node_data.get("input_keys", []),
output_keys=node_data.get("output_keys", []),
)
)
return nodes
except Exception:
return []
def run_credential_setup_cli(agent_path: str | Path | None = None) -> int:
"""
Standalone CLI entry point for credential setup.
Can be called from:
- `hive setup-credentials <agent>`
- After CredentialError in runner CLI
- From coding agent CLI
Args:
agent_path: Optional path to agent directory
Returns:
Exit code (0 = success, 1 = failure/skipped)
"""
if agent_path:
session = CredentialSetupSession.from_agent_path(agent_path)
else:
# No agent specified - detect from current context or show error
print("Usage: hive setup-credentials <agent_path>")
return 1
result = session.run_interactive()
return 0 if result.success else 1
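Putting the module docstring together with `SetupResult`, a typical caller looks roughly like this (the agent path is a placeholder):
```python
from framework.credentials.setup import CredentialSetupSession

# Placeholder path; point at a real exported agent directory.
session = CredentialSetupSession.from_agent_path("exports/my-agent")
result = session.run_interactive()
if not result.success:
    print(f"Skipped: {result.skipped}; errors: {result.errors}")
```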
+139
View File
@@ -0,0 +1,139 @@
"""Credential validation utilities.
Provides reusable credential validation for agents, whether run through
the AgentRunner or directly via GraphExecutor.
"""
from __future__ import annotations
import logging
import os
from dataclasses import dataclass
logger = logging.getLogger(__name__)
def ensure_credential_key_env() -> None:
"""Load HIVE_CREDENTIAL_KEY and ADEN_API_KEY from shell config if not in environment.
The setup-credentials skill writes these to ~/.zshrc or ~/.bashrc.
If the user hasn't sourced their config in the current shell, this reads
them directly so the runner (and any MCP subprocesses it spawns) can:
- Unlock the encrypted credential store (HIVE_CREDENTIAL_KEY)
- Enable Aden OAuth sync for Google/HubSpot/etc. (ADEN_API_KEY)
"""
try:
from aden_tools.credentials.shell_config import check_env_var_in_shell_config
except ImportError:
return
for var_name in ("HIVE_CREDENTIAL_KEY", "ADEN_API_KEY"):
if os.environ.get(var_name):
continue
found, value = check_env_var_in_shell_config(var_name)
if found and value:
os.environ[var_name] = value
logger.debug("Loaded %s from shell config", var_name)
@dataclass
class _CredentialCheck:
"""Result of checking a single credential."""
env_var: str
source: str
used_by: str
available: bool
help_url: str = ""
def validate_agent_credentials(nodes: list, quiet: bool = False) -> None:
"""Check that required credentials are available before running an agent.
Uses CredentialStoreAdapter.default() which includes Aden sync support,
correctly resolving OAuth credentials stored under hashed IDs.
Prints a summary of all credentials and their sources (encrypted store, env var).
Raises CredentialError with actionable guidance if any are missing.
Args:
nodes: List of NodeSpec objects from the agent graph.
quiet: If True, suppress the credential summary output.
"""
# Collect required tools and node types
required_tools = {tool for node in nodes if node.tools for tool in node.tools}
node_types = {node.node_type for node in nodes}
try:
from aden_tools.credentials import CREDENTIAL_SPECS
except ImportError:
return # aden_tools not installed, skip check
from framework.credentials.storage import CompositeStorage, EncryptedFileStorage, EnvVarStorage
from framework.credentials.store import CredentialStore
# Build credential store
env_mapping = {
(spec.credential_id or name): spec.env_var for name, spec in CREDENTIAL_SPECS.items()
}
storages: list = [EnvVarStorage(env_mapping=env_mapping)]
if os.environ.get("HIVE_CREDENTIAL_KEY"):
storages.insert(0, EncryptedFileStorage())
if len(storages) == 1:
storage = storages[0]
else:
storage = CompositeStorage(primary=storages[0], fallbacks=storages[1:])
store = CredentialStore(storage=storage)
# Build reverse mappings
tool_to_cred: dict[str, str] = {}
node_type_to_cred: dict[str, str] = {}
for cred_name, spec in CREDENTIAL_SPECS.items():
for tool_name in spec.tools:
tool_to_cred[tool_name] = cred_name
for nt in spec.node_types:
node_type_to_cred[nt] = cred_name
missing: list[str] = []
checked: set[str] = set()
# Check tool credentials
for tool_name in sorted(required_tools):
cred_name = tool_to_cred.get(tool_name)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
if spec.required and not store.is_available(cred_id):
affected = sorted(t for t in required_tools if t in spec.tools)
entry = f" {spec.env_var} for {', '.join(affected)}"
if spec.help_url:
entry += f"\n Get it at: {spec.help_url}"
missing.append(entry)
# Check node type credentials (e.g., ANTHROPIC_API_KEY for LLM nodes)
for nt in sorted(node_types):
cred_name = node_type_to_cred.get(nt)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
if spec.required and not store.is_available(cred_id):
affected_types = sorted(t for t in node_types if t in spec.node_types)
entry = f" {spec.env_var} for {', '.join(affected_types)} nodes"
if spec.help_url:
entry += f"\n Get it at: {spec.help_url}"
missing.append(entry)
if missing:
from framework.credentials.models import CredentialError
lines = ["Missing required credentials:\n"]
lines.extend(missing)
lines.append(
"\nTo fix: run /hive-credentials in Claude Code."
"\nIf you've already set up credentials, restart your terminal to load them."
)
raise CredentialError("\n".join(lines))
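Combined with the setup module above, the validate-then-repair flow the docstrings describe is roughly this sketch (`nodes` is the agent's NodeSpec list, assumed in scope):
```python
from framework.credentials import (
    ensure_credential_key_env,
    validate_agent_credentials,
)
from framework.credentials.models import CredentialError
from framework.credentials.setup import CredentialSetupSession

ensure_credential_key_env()  # load keys from shell config if not exported
try:
    validate_agent_credentials(nodes)
except CredentialError:
    session = CredentialSetupSession.from_nodes(nodes)
    if session.run_interactive().success:
        validate_agent_credentials(nodes)  # retry with credentials configured
```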
+2 -52
View File
@@ -1,4 +1,4 @@
"""Graph structures: Goals, Nodes, Edges, and Flexible Execution."""
"""Graph structures: Goals, Nodes, Edges, and Execution."""
from framework.graph.client_io import (
ActiveNodeClientIO,
@@ -6,7 +6,6 @@ from framework.graph.client_io import (
InertNodeClientIO,
NodeClientIO,
)
from framework.graph.code_sandbox import CodeSandbox, safe_eval, safe_exec
from framework.graph.context_handoff import ContextHandoff, HandoffContext
from framework.graph.conversation import ConversationStore, Message, NodeConversation
from framework.graph.edge import DEFAULT_MAX_TOKENS, EdgeCondition, EdgeSpec, GraphSpec
@@ -18,31 +17,9 @@ from framework.graph.event_loop_node import (
OutputAccumulator,
)
from framework.graph.executor import GraphExecutor
from framework.graph.flexible_executor import ExecutorConfig, FlexibleGraphExecutor
from framework.graph.goal import Constraint, Goal, GoalStatus, SuccessCriterion
from framework.graph.judge import HybridJudge, create_default_judge
from framework.graph.node import NodeContext, NodeProtocol, NodeResult, NodeSpec
# Flexible execution (Worker-Judge pattern)
from framework.graph.plan import (
ActionSpec,
ActionType,
# HITL (Human-in-the-loop)
ApprovalDecision,
ApprovalRequest,
ApprovalResult,
EvaluationRule,
ExecutionStatus,
Judgment,
JudgmentAction,
Plan,
PlanExecutionResult,
PlanStep,
StepStatus,
load_export,
)
from framework.graph.worker_node import StepExecutionResult, WorkerNode
__all__ = [
# Goal
"Goal",
@@ -59,35 +36,8 @@ __all__ = [
"EdgeCondition",
"GraphSpec",
"DEFAULT_MAX_TOKENS",
# Executor (fixed graph)
# Executor
"GraphExecutor",
# Plan (flexible execution)
"Plan",
"PlanStep",
"ActionSpec",
"ActionType",
"StepStatus",
"Judgment",
"JudgmentAction",
"EvaluationRule",
"PlanExecutionResult",
"ExecutionStatus",
"load_export",
# HITL (Human-in-the-loop)
"ApprovalDecision",
"ApprovalRequest",
"ApprovalResult",
# Worker-Judge
"HybridJudge",
"create_default_judge",
"WorkerNode",
"StepExecutionResult",
"FlexibleGraphExecutor",
"ExecutorConfig",
# Code Sandbox
"CodeSandbox",
"safe_exec",
"safe_eval",
# Conversation
"NodeConversation",
"ConversationStore",
-413
View File
@@ -1,413 +0,0 @@
"""
Code Sandbox for Safe Execution of Dynamic Code.
Provides a restricted execution environment for code generated by
the external planner. This is critical for open-ended planning where
the planner can create arbitrary code actions.
Security measures:
1. Restricted builtins (no file I/O, no imports of dangerous modules)
2. Timeout enforcement
3. Memory limits (via resource module on Unix)
4. Namespace isolation
"""
import ast
import signal
import sys
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any
# Safe builtins whitelist
SAFE_BUILTINS = {
# Basic types
"True": True,
"False": False,
"None": None,
# Type constructors
"bool": bool,
"int": int,
"float": float,
"str": str,
"list": list,
"dict": dict,
"set": set,
"tuple": tuple,
"frozenset": frozenset,
# Basic functions
"abs": abs,
"all": all,
"any": any,
"bin": bin,
"chr": chr,
"divmod": divmod,
"enumerate": enumerate,
"filter": filter,
"format": format,
"hex": hex,
"isinstance": isinstance,
"issubclass": issubclass,
"iter": iter,
"len": len,
"map": map,
"max": max,
"min": min,
"next": next,
"oct": oct,
"ord": ord,
"pow": pow,
"range": range,
"repr": repr,
"reversed": reversed,
"round": round,
"slice": slice,
"sorted": sorted,
"sum": sum,
"zip": zip,
}
# Modules that can be imported
ALLOWED_MODULES = {
"math",
"json",
"re",
"datetime",
"collections",
"itertools",
"functools",
"operator",
"string",
"random",
"statistics",
"decimal",
"fractions",
}
# Dangerous AST nodes to block
BLOCKED_AST_NODES = {
ast.Import,
ast.ImportFrom,
ast.Global,
ast.Nonlocal,
}
class CodeSandboxError(Exception):
"""Error during sandboxed code execution."""
pass
class TimeoutError(CodeSandboxError):
"""Code execution timed out."""
pass
class SecurityError(CodeSandboxError):
"""Code contains potentially dangerous operations."""
pass
@dataclass
class SandboxResult:
"""Result of sandboxed code execution."""
success: bool
result: Any = None
error: str | None = None
stdout: str = ""
variables: dict[str, Any] = field(default_factory=dict)
execution_time_ms: int = 0
class RestrictedImporter:
"""Custom importer that only allows whitelisted modules."""
def __init__(self, allowed_modules: set[str]):
self.allowed_modules = allowed_modules
self._cache: dict[str, Any] = {}
def __call__(self, name: str, *args, **kwargs):
if name not in self.allowed_modules:
raise SecurityError(f"Import of module '{name}' is not allowed")
if name not in self._cache:
import importlib
self._cache[name] = importlib.import_module(name)
return self._cache[name]
class CodeValidator:
"""Validates code for safety before execution."""
def __init__(self, blocked_nodes: set[type] | None = None):
self.blocked_nodes = blocked_nodes or BLOCKED_AST_NODES
def validate(self, code: str) -> list[str]:
"""
Validate code and return list of issues.
Returns empty list if code is safe.
"""
issues = []
try:
tree = ast.parse(code)
except SyntaxError as e:
return [f"Syntax error: {e}"]
for node in ast.walk(tree):
# Check for blocked node types
if type(node) in self.blocked_nodes:
lineno = getattr(node, "lineno", "?")
issues.append(f"Blocked operation: {type(node).__name__} at line {lineno}")
# Check for dangerous attribute access
if isinstance(node, ast.Attribute):
if node.attr.startswith("_"):
issues.append(
f"Access to private attribute '{node.attr}' at line {node.lineno}"
)
# Check for exec/eval calls
if isinstance(node, ast.Call):
if isinstance(node.func, ast.Name):
if node.func.id in ("exec", "eval", "compile", "__import__"):
issues.append(
f"Blocked function call: {node.func.id} at line {node.lineno}"
)
return issues
class CodeSandbox:
"""
Sandboxed environment for executing dynamic code.
Usage:
sandbox = CodeSandbox(timeout_seconds=5)
result = sandbox.execute(
code="x = 1 + 2\\nresult = x * 3",
inputs={"multiplier": 2},
)
if result.success:
print(result.variables["result"]) # 6
"""
def __init__(
self,
timeout_seconds: int = 10,
allowed_modules: set[str] | None = None,
safe_builtins: dict[str, Any] | None = None,
):
self.timeout_seconds = timeout_seconds
self.allowed_modules = allowed_modules or ALLOWED_MODULES
self.safe_builtins = safe_builtins or SAFE_BUILTINS
self.validator = CodeValidator()
self.importer = RestrictedImporter(self.allowed_modules)
@contextmanager
def _timeout_context(self, seconds: int):
"""Context manager for timeout enforcement."""
def handler(signum, frame):
raise TimeoutError(f"Code execution timed out after {seconds} seconds")
# Only works on Unix-like systems
if hasattr(signal, "SIGALRM"):
old_handler = signal.signal(signal.SIGALRM, handler)
signal.alarm(seconds)
try:
yield
finally:
signal.alarm(0)
signal.signal(signal.SIGALRM, old_handler)
else:
# Windows: no timeout support, just execute
yield
def _create_namespace(self, inputs: dict[str, Any]) -> dict[str, Any]:
"""Create isolated namespace for code execution."""
namespace = {
"__builtins__": dict(self.safe_builtins),
"__import__": self.importer,
}
# Add input variables
namespace.update(inputs)
return namespace
def execute(
self,
code: str,
inputs: dict[str, Any] | None = None,
extract_vars: list[str] | None = None,
) -> SandboxResult:
"""
Execute code in sandbox.
Args:
code: Python code to execute
inputs: Variables to inject into namespace
extract_vars: Variable names to extract from namespace after execution
Returns:
SandboxResult with execution outcome
"""
import time
inputs = inputs or {}
extract_vars = extract_vars or []
# Validate code first
issues = self.validator.validate(code)
if issues:
return SandboxResult(
success=False,
error=f"Code validation failed: {'; '.join(issues)}",
)
# Create isolated namespace
namespace = self._create_namespace(inputs)
# Capture stdout
import io
old_stdout = sys.stdout
sys.stdout = captured_stdout = io.StringIO()
start_time = time.time()
try:
with self._timeout_context(self.timeout_seconds):
# Compile and execute
compiled = compile(code, "<sandbox>", "exec")
exec(compiled, namespace)
execution_time_ms = int((time.time() - start_time) * 1000)
# Extract requested variables
extracted = {}
for var in extract_vars:
if var in namespace:
extracted[var] = namespace[var]
# Also extract any new variables (not in inputs or builtins)
for key, value in namespace.items():
if key not in inputs and key not in self.safe_builtins and not key.startswith("_"):
extracted[key] = value
return SandboxResult(
success=True,
result=namespace.get("result"), # Convention: 'result' is the return value
stdout=captured_stdout.getvalue(),
variables=extracted,
execution_time_ms=execution_time_ms,
)
except TimeoutError as e:
return SandboxResult(
success=False,
error=str(e),
execution_time_ms=self.timeout_seconds * 1000,
)
except SecurityError as e:
return SandboxResult(
success=False,
error=f"Security violation: {e}",
execution_time_ms=int((time.time() - start_time) * 1000),
)
except Exception as e:
return SandboxResult(
success=False,
error=f"{type(e).__name__}: {e}",
stdout=captured_stdout.getvalue(),
execution_time_ms=int((time.time() - start_time) * 1000),
)
finally:
sys.stdout = old_stdout
def execute_expression(
self,
expression: str,
inputs: dict[str, Any] | None = None,
) -> SandboxResult:
"""
Execute a single expression and return its value.
Simpler than execute() - just evaluates one expression.
"""
inputs = inputs or {}
# Validate
try:
ast.parse(expression, mode="eval")
except SyntaxError as e:
return SandboxResult(success=False, error=f"Syntax error: {e}")
namespace = self._create_namespace(inputs)
try:
with self._timeout_context(self.timeout_seconds):
result = eval(expression, namespace)
return SandboxResult(success=True, result=result)
except Exception as e:
return SandboxResult(
success=False,
error=f"{type(e).__name__}: {e}",
)
# Singleton instance with default settings
default_sandbox = CodeSandbox()
def safe_exec(
code: str,
inputs: dict[str, Any] | None = None,
timeout_seconds: int = 10,
) -> SandboxResult:
"""
Convenience function for safe code execution.
Args:
code: Python code to execute
inputs: Variables to inject
timeout_seconds: Max execution time
Returns:
SandboxResult
"""
sandbox = CodeSandbox(timeout_seconds=timeout_seconds)
return sandbox.execute(code, inputs)
def safe_eval(
expression: str,
inputs: dict[str, Any] | None = None,
timeout_seconds: int = 5,
) -> SandboxResult:
"""
Convenience function for safe expression evaluation.
Args:
expression: Python expression to evaluate
inputs: Variables to inject
timeout_seconds: Max execution time
Returns:
SandboxResult
"""
sandbox = CodeSandbox(timeout_seconds=timeout_seconds)
return sandbox.execute_expression(expression, inputs)
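# End-to-end sketch of the convenience helpers (assumes `sum` is present in
# SAFE_BUILTINS and that assignment nodes are not blocked):
demo = safe_exec("total = sum(values)\nresult = total * 2", inputs={"values": [1, 2, 3]})
assert demo.success
assert demo.result == 12             # 'result' is the conventional return value
assert demo.variables["total"] == 6  # new variables are extracted automatically
expr = safe_eval("price * quantity", inputs={"price": 3, "quantity": 4})
assert expr.success and expr.result == 12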
+123 -10
@@ -27,6 +27,9 @@ class Message:
tool_use_id: str | None = None
tool_calls: list[dict[str, Any]] | None = None
is_error: bool = False
# Phase-aware compaction metadata (continuous mode)
phase_id: str | None = None
is_transition_marker: bool = False
def to_llm_dict(self) -> dict[str, Any]:
"""Convert to OpenAI-format message dict."""
@@ -60,6 +63,10 @@ class Message:
d["tool_calls"] = self.tool_calls
if self.is_error:
d["is_error"] = self.is_error
if self.phase_id is not None:
d["phase_id"] = self.phase_id
if self.is_transition_marker:
d["is_transition_marker"] = self.is_transition_marker
return d
@classmethod
@@ -72,6 +79,8 @@ class Message:
tool_use_id=data.get("tool_use_id"),
tool_calls=data.get("tool_calls"),
is_error=data.get("is_error", False),
phase_id=data.get("phase_id"),
is_transition_marker=data.get("is_transition_marker", False),
)
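# Round-trip sketch using only the fields visible in this hunk:
demo_msg = Message(seq=0, role="user", content="begin", phase_id="research")
d = demo_msg.to_llm_dict()
assert d["phase_id"] == "research"
assert "is_transition_marker" not in d  # only serialized when True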
@@ -188,6 +197,7 @@ class NodeConversation:
self._next_seq: int = 0
self._meta_persisted: bool = False
self._last_api_input_tokens: int | None = None
self._current_phase: str | None = None
# --- Properties --------------------------------------------------------
@@ -195,6 +205,33 @@ class NodeConversation:
def system_prompt(self) -> str:
return self._system_prompt
def update_system_prompt(self, new_prompt: str) -> None:
"""Update the system prompt.
Used in continuous conversation mode at phase transitions to swap
Layer 3 (focus) while preserving the conversation history.
"""
self._system_prompt = new_prompt
def set_current_phase(self, phase_id: str) -> None:
"""Set the current phase ID. Subsequent messages will be stamped with it."""
self._current_phase = phase_id
async def switch_store(self, new_store: ConversationStore) -> None:
"""Switch to a new persistence store at a phase transition.
Subsequent messages are written to *new_store*. Meta (system
prompt, config) is re-persisted on the next write so the new
store's ``meta.json`` reflects the updated prompt.
"""
self._store = new_store
self._meta_persisted = False
await new_store.write_cursor({"next_seq": self._next_seq})
@property
def current_phase(self) -> str | None:
return self._current_phase
@property
def messages(self) -> list[Message]:
"""Return a defensive copy of the message list."""
@@ -216,8 +253,19 @@ class NodeConversation:
# --- Add messages ------------------------------------------------------
async def add_user_message(self, content: str) -> Message:
msg = Message(seq=self._next_seq, role="user", content=content)
async def add_user_message(
self,
content: str,
*,
is_transition_marker: bool = False,
) -> Message:
msg = Message(
seq=self._next_seq,
role="user",
content=content,
phase_id=self._current_phase,
is_transition_marker=is_transition_marker,
)
self._messages.append(msg)
self._next_seq += 1
await self._persist(msg)
@@ -233,6 +281,7 @@ class NodeConversation:
role="assistant",
content=content,
tool_calls=tool_calls,
phase_id=self._current_phase,
)
self._messages.append(msg)
self._next_seq += 1
@@ -251,6 +300,7 @@ class NodeConversation:
content=content,
tool_use_id=tool_use_id,
is_error=is_error,
phase_id=self._current_phase,
)
self._messages.append(msg)
self._next_seq += 1
@@ -380,6 +430,11 @@ class NodeConversation:
spillover filename reference (if any). Message structure (role,
seq, tool_use_id) stays valid for the LLM API.
Phase-aware behavior (continuous mode): when messages have ``phase_id``
metadata, all messages in the current phase are protected regardless of
token budget. Transition markers are never pruned. Older phases' tool
results are pruned more aggressively.
Error tool results are never pruned; they prevent re-calling failing tools.
@@ -388,13 +443,18 @@ class NodeConversation:
if not self._messages:
return 0
# Phase 1: Walk backward, classify tool results as protected vs pruneable
# Walk backward, classify tool results as protected vs pruneable
protected_tokens = 0
pruneable: list[int] = [] # indices into self._messages
pruneable_tokens = 0
for i in range(len(self._messages) - 1, -1, -1):
msg = self._messages[i]
# Transition markers are never pruned (any role)
if msg.is_transition_marker:
continue
if msg.role != "tool":
continue
if msg.is_error:
@@ -402,6 +462,10 @@ class NodeConversation:
if msg.content.startswith("[Pruned tool result"):
continue # already pruned
# Phase-aware: protect current phase messages
if self._current_phase and msg.phase_id == self._current_phase:
continue
est = len(msg.content) // 4
if protected_tokens < protect_tokens:
protected_tokens += est
@@ -409,11 +473,11 @@ class NodeConversation:
pruneable.append(i)
pruneable_tokens += est
# Phase 2: Only prune if enough to be worthwhile
# Only prune if enough to be worthwhile
if pruneable_tokens < min_prune_tokens:
return 0
# Phase 3: Replace content with compact placeholder
# Replace content with compact placeholder
count = 0
for i in pruneable:
msg = self._messages[i]
@@ -436,6 +500,8 @@ class NodeConversation:
tool_use_id=msg.tool_use_id,
tool_calls=msg.tool_calls,
is_error=msg.is_error,
phase_id=msg.phase_id,
is_transition_marker=msg.is_transition_marker,
)
count += 1
@@ -446,22 +512,38 @@ class NodeConversation:
self._last_api_input_tokens = None
return count
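# Back-of-envelope for the thresholds above: tokens are estimated as
# len(content) // 4, so a 12,000-character tool result counts as ~3,000
# tokens; a handful of stale results clears a min_prune_tokens in the low
# thousands, while protect_tokens shields the most recent results.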
async def compact(self, summary: str, keep_recent: int = 2) -> None:
async def compact(
self,
summary: str,
keep_recent: int = 2,
phase_graduated: bool = False,
) -> None:
"""Replace old messages with a summary, optionally keeping recent ones.
Args:
summary: Caller-provided summary text.
keep_recent: Number of recent messages to preserve (default 2).
Clamped to [0, len(messages) - 1].
phase_graduated: When True and messages have phase_id metadata,
split at phase boundaries instead of using keep_recent.
Keeps current + previous phase intact; compacts older phases.
"""
if not self._messages:
return
# Clamp: must discard at least 1 message
keep_recent = max(0, min(keep_recent, len(self._messages) - 1))
total = len(self._messages)
split = total - keep_recent if keep_recent > 0 else total
# Phase-graduated: find the split point based on phase boundaries.
# Keeps current phase + previous phase intact, compacts older phases.
if phase_graduated and self._current_phase:
split = self._find_phase_graduated_split()
else:
split = None
if split is None:
# Fallback: use keep_recent (non-phase or single-phase conversation)
keep_recent = max(0, min(keep_recent, total - 1))
split = total - keep_recent if keep_recent > 0 else total
# Advance split past orphaned tool results at the boundary.
# Tool-role messages reference a tool_use from the preceding
@@ -470,6 +552,10 @@ class NodeConversation:
while split < total and self._messages[split].role == "tool":
split += 1
# Nothing to compact
if split == 0:
return
old_messages = list(self._messages[:split])
recent_messages = list(self._messages[split:])
@@ -504,6 +590,33 @@ class NodeConversation:
self._messages = [summary_msg] + recent_messages
self._last_api_input_tokens = None # reset; next LLM call will recalibrate
def _find_phase_graduated_split(self) -> int | None:
"""Find split point that preserves current + previous phase.
Returns the index of the first message in the protected set,
or None if phase graduation doesn't apply (< 3 phases).
"""
# Collect distinct phases in order of first appearance
phases_seen: list[str] = []
for msg in self._messages:
if msg.phase_id and msg.phase_id not in phases_seen:
phases_seen.append(msg.phase_id)
# Need at least 3 phases for graduation to be meaningful
# (current + previous are protected, older get compacted)
if len(phases_seen) < 3:
return None
# Protect: current phase + previous phase
protected_phases = {phases_seen[-1], phases_seen[-2]}
# Find split: first message belonging to a protected phase
for i, msg in enumerate(self._messages):
if msg.phase_id in protected_phases:
return i
return None
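# Standalone worked example of the split logic above: phases appear in the
# order A, B, C, so {B, C} are protected and the split lands on the first
# message stamped with a protected phase.
demo_phases = ["A", "A", "B", "B", "C"]
seen: list[str] = []
for p in demo_phases:
    if p not in seen:
        seen.append(p)
protected = {seen[-1], seen[-2]}  # {"B", "C"}
split = next(i for i, p in enumerate(demo_phases) if p in protected)
assert split == 2  # messages [0:2] (phase A) get compacted into the summary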
async def clear(self) -> None:
"""Remove all messages, keep system prompt, preserve ``_next_seq``."""
if self._store:
+177
@@ -0,0 +1,177 @@
"""Level 2 Conversation-Aware Judge.
When a node has `success_criteria` set, the implicit judge upgrades:
after Level 0 passes (all output keys set), a fast LLM call evaluates
whether the conversation actually meets the criteria.
This prevents nodes from "checking boxes" (setting output keys) without
doing quality work. The LLM reads the recent conversation and assesses
whether the phase's goal was genuinely accomplished.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass
from typing import Any
from framework.graph.conversation import NodeConversation
from framework.llm.provider import LLMProvider
logger = logging.getLogger(__name__)
@dataclass
class PhaseVerdict:
"""Result of Level 2 conversation-aware evaluation."""
action: str # "ACCEPT" or "RETRY"
confidence: float = 0.8
feedback: str = ""
async def evaluate_phase_completion(
llm: LLMProvider,
conversation: NodeConversation,
phase_name: str,
phase_description: str,
success_criteria: str,
accumulator_state: dict[str, Any],
max_history_tokens: int = 8_192,
) -> PhaseVerdict:
"""Level 2 judge: read the conversation and evaluate quality.
Only called after Level 0 passes (all output keys set).
Args:
llm: LLM provider for evaluation
conversation: The current conversation to evaluate
phase_name: Name of the current phase/node
phase_description: Description of the phase
success_criteria: Natural-language criteria for phase completion
accumulator_state: Current output key values
max_history_tokens: Main conversation token budget (judge gets 20%)
Returns:
PhaseVerdict with action and optional feedback
"""
# Build a compact view of the recent conversation
recent_messages = _extract_recent_context(conversation, max_messages=10)
outputs_summary = _format_outputs(accumulator_state)
system_prompt = (
"You are a quality judge evaluating whether a phase of work is complete. "
"Be concise. Evaluate based on the success criteria, not on style."
)
user_prompt = f"""Evaluate this phase:
PHASE: {phase_name}
DESCRIPTION: {phase_description}
SUCCESS CRITERIA:
{success_criteria}
OUTPUTS SET:
{outputs_summary}
RECENT CONVERSATION:
{recent_messages}
Has this phase accomplished its goal based on the success criteria?
Respond in exactly this format:
ACTION: ACCEPT or RETRY
CONFIDENCE: 0.X
FEEDBACK: (reason if RETRY, empty if ACCEPT)"""
try:
response = await llm.acomplete(
messages=[{"role": "user", "content": user_prompt}],
system=system_prompt,
max_tokens=max(1024, max_history_tokens // 5),
max_retries=1,
)
if not response.content or not response.content.strip():
logger.debug("Level 2 judge: empty response, accepting by default")
return PhaseVerdict(action="ACCEPT", confidence=0.5, feedback="")
return _parse_verdict(response.content)
except Exception as e:
logger.warning(f"Level 2 judge failed, accepting by default: {e}")
# On failure, don't block — Level 0 already passed
return PhaseVerdict(action="ACCEPT", confidence=0.5, feedback="")
def _extract_recent_context(conversation: NodeConversation, max_messages: int = 10) -> str:
"""Extract recent conversation messages for evaluation."""
messages = conversation.messages
recent = messages[-max_messages:] if len(messages) > max_messages else messages
parts = []
for msg in recent:
role = msg.role.upper()
content = msg.content or ""
# Truncate long tool results
if msg.role == "tool" and len(content) > 200:
content = content[:200] + "..."
if content.strip():
parts.append(f"[{role}]: {content.strip()}")
return "\n".join(parts) if parts else "(no messages)"
def _format_outputs(accumulator_state: dict[str, Any]) -> str:
"""Format output key values for evaluation.
Lists and dicts get structural formatting so the judge can assess
quantity and structure, not just a truncated stringification.
"""
if not accumulator_state:
return "(none)"
parts = []
for key, value in accumulator_state.items():
if isinstance(value, list):
# Show count + brief per-item preview so the judge can
# verify quantity without the full serialization.
items_preview = []
for i, item in enumerate(value[:8]):
item_str = str(item)
if len(item_str) > 150:
item_str = item_str[:150] + "..."
items_preview.append(f" [{i}]: {item_str}")
val_str = f"list ({len(value)} items):\n" + "\n".join(items_preview)
if len(value) > 8:
val_str += f"\n ... and {len(value) - 8} more"
elif isinstance(value, dict):
val_str = str(value)
if len(val_str) > 400:
val_str = val_str[:400] + "..."
else:
val_str = str(value)
if len(val_str) > 300:
val_str = val_str[:300] + "..."
parts.append(f" {key}: {val_str}")
return "\n".join(parts)
def _parse_verdict(response: str) -> PhaseVerdict:
"""Parse LLM response into PhaseVerdict."""
action = "ACCEPT"
confidence = 0.8
feedback = ""
for line in response.strip().split("\n"):
line = line.strip()
if line.startswith("ACTION:"):
action_str = line.split(":", 1)[1].strip().upper()
if action_str in ("ACCEPT", "RETRY"):
action = action_str
elif line.startswith("CONFIDENCE:"):
try:
confidence = float(line.split(":", 1)[1].strip())
except ValueError:
pass
elif line.startswith("FEEDBACK:"):
feedback = line.split(":", 1)[1].strip()
return PhaseVerdict(action=action, confidence=confidence, feedback=feedback)
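# Round-trip of the response format defined in the prompt above:
sample = "ACTION: RETRY\nCONFIDENCE: 0.6\nFEEDBACK: only 3 of 8 leads gathered"
verdict = _parse_verdict(sample)
assert verdict.action == "RETRY"
assert verdict.confidence == 0.6
assert verdict.feedback == "only 3 of 8 leads gathered"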
+28 -17
@@ -21,6 +21,9 @@ allowing the LLM to evaluate whether proceeding along an edge makes sense
given the current goal, context, and execution state.
"""
import json
import logging
import re
from enum import StrEnum
from typing import Any
@@ -28,6 +31,8 @@ from pydantic import BaseModel, Field, model_validator
from framework.graph.safe_eval import safe_eval
logger = logging.getLogger(__name__)
DEFAULT_MAX_TOKENS = 8192
@@ -99,7 +104,7 @@ class EdgeSpec(BaseModel):
model_config = {"extra": "allow"}
def should_traverse(
async def should_traverse(
self,
source_success: bool,
source_output: dict[str, Any],
@@ -140,7 +145,7 @@ class EdgeSpec(BaseModel):
if llm is None or goal is None:
# Fallback to ON_SUCCESS if LLM not available
return source_success
return self._llm_decide(
return await self._llm_decide(
llm=llm,
goal=goal,
source_success=source_success,
@@ -158,9 +163,6 @@ class EdgeSpec(BaseModel):
memory: dict[str, Any],
) -> bool:
"""Evaluate a conditional expression."""
import logging
logger = logging.getLogger(__name__)
if not self.condition_expr:
return True
@@ -201,7 +203,7 @@ class EdgeSpec(BaseModel):
logger.warning(f" Available context keys: {list(context.keys())}")
return False
def _llm_decide(
async def _llm_decide(
self,
llm: Any,
goal: Any,
@@ -217,8 +219,6 @@ class EdgeSpec(BaseModel):
The LLM evaluates whether proceeding to the target node
is the best next step toward achieving the goal.
"""
import json
# Build context for LLM
prompt = f"""You are evaluating whether to proceed along an edge in an agent workflow.
@@ -247,15 +247,13 @@ Respond with ONLY a JSON object:
{{"proceed": true/false, "reasoning": "brief explanation"}}"""
try:
response = llm.complete(
response = await llm.acomplete(
messages=[{"role": "user", "content": prompt}],
system="You are a routing agent. Respond with JSON only.",
max_tokens=150,
)
# Parse response
import re
json_match = re.search(r"\{[^{}]*\}", response.content, re.DOTALL)
if json_match:
data = json.loads(json_match.group())
@@ -263,9 +261,6 @@ Respond with ONLY a JSON object:
reasoning = data.get("reasoning", "")
# Log the decision (using basic print for now)
import logging
logger = logging.getLogger(__name__)
logger.info(f" 🤔 LLM routing decision: {'PROCEED' if proceed else 'SKIP'}")
logger.info(f" Reason: {reasoning}")
@@ -273,9 +268,6 @@ Respond with ONLY a JSON object:
except Exception as e:
# Fallback: proceed on success
import logging
logger = logging.getLogger(__name__)
logger.warning(f" ⚠ LLM routing failed, defaulting to on_success: {e}")
return source_success
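# Call-site sketch after the sync-to-async conversion (argument names taken
# from the executor hunks later in this diff; edge, result, memory, llm, and
# goal are assumed in scope):
#
#     proceed = await edge.should_traverse(
#         source_success=result.success,
#         source_output=result.output,
#         memory=memory.read_all(),
#         llm=llm,    # optional; LLM_DECIDE falls back to on_success without it
#         goal=goal,
#     )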
@@ -443,6 +435,25 @@ class GraphSpec(BaseModel):
description="EventLoopNode configuration (max_iterations, max_tool_calls_per_turn, etc.)",
)
# Conversation mode
conversation_mode: str = Field(
default="continuous",
description=(
"How conversations flow between event_loop nodes. "
"'continuous' (default): one conversation threads through all "
"event_loop nodes with cumulative tools and layered prompt composition. "
"'isolated': each node gets a fresh conversation."
),
)
identity_prompt: str | None = Field(
default=None,
description=(
"Agent-level identity prompt (Layer 1 of the onion model). "
"In continuous mode, this is the static identity that persists "
"unchanged across all node transitions. In isolated mode, ignored."
),
)
# Metadata
description: str = ""
created_by: str = "" # "human" or "builder_agent"
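# Sketch of the two new fields in a graph spec payload (all other required
# GraphSpec fields omitted; values are hypothetical):
#
#     {
#         "conversation_mode": "continuous",  # or "isolated" for per-node threads
#         "identity_prompt": "You are a research assistant for the ops team.",
#     }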
File diff suppressed because it is too large
+364 -92
@@ -11,7 +11,6 @@ The executor:
import asyncio
import logging
import warnings
from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path
@@ -21,13 +20,10 @@ from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.goal import Goal
from framework.graph.node import (
FunctionNode,
LLMNode,
NodeContext,
NodeProtocol,
NodeResult,
NodeSpec,
RouterNode,
SharedMemory,
)
from framework.graph.output_cleaner import CleansingConfig, OutputCleaner
@@ -186,6 +182,55 @@ class GraphExecutor:
# Pause/resume control
self._pause_requested = asyncio.Event()
def _write_progress(
self,
current_node: str,
path: list[str],
memory: Any,
node_visit_counts: dict[str, int],
) -> None:
"""Update state.json with live progress at node transitions.
Reads the existing state.json (written by ExecutionStream at session
start) and patches the progress fields in-place. This keeps
state.json as the single source of truth: readers always see
current progress, not stale initial values.
The write is synchronous and best-effort: never blocks execution.
"""
if not self._storage_path:
return
try:
import json as _json
from datetime import datetime
state_path = self._storage_path / "state.json"
if state_path.exists():
state_data = _json.loads(state_path.read_text(encoding="utf-8"))
else:
state_data = {}
# Patch progress fields
progress = state_data.setdefault("progress", {})
progress["current_node"] = current_node
progress["path"] = list(path)
progress["node_visit_counts"] = dict(node_visit_counts)
progress["steps_executed"] = len(path)
# Update timestamp
timestamps = state_data.setdefault("timestamps", {})
timestamps["updated_at"] = datetime.now().isoformat()
# Persist full memory so state.json is sufficient for resume
# even if the process dies before the final write.
memory_snapshot = memory.read_all()
state_data["memory"] = memory_snapshot
state_data["memory_keys"] = list(memory_snapshot.keys())
state_path.write_text(_json.dumps(state_data, indent=2), encoding="utf-8")
except Exception:
pass # Best-effort — never block execution
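# Reader-side sketch (storage_dir is hypothetical): state.json can be polled
# between node transitions for live progress.
#
#     import json
#     from pathlib import Path
#
#     state = json.loads((Path(storage_dir) / "state.json").read_text("utf-8"))
#     print(state["progress"]["current_node"], state["progress"]["steps_executed"])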
def _validate_tools(self, graph: GraphSpec) -> list[str]:
"""
Validate that all tools declared by nodes are available.
@@ -257,6 +302,13 @@ class GraphExecutor:
# Initialize execution state
memory = SharedMemory()
# Continuous conversation mode state
is_continuous = getattr(graph, "conversation_mode", "isolated") == "continuous"
continuous_conversation = None # NodeConversation threaded across nodes
cumulative_tools: list = [] # Tools accumulate, never removed
cumulative_tool_names: set[str] = set()
cumulative_output_keys: list[str] = [] # Output keys from all visited nodes
# Initialize checkpoint store if checkpointing is enabled
checkpoint_store: CheckpointStore | None = None
if checkpoint_config and checkpoint_config.enabled and self._storage_path:
@@ -273,16 +325,26 @@ class GraphExecutor:
f"{type(memory_data).__name__}, expected dict"
)
else:
# Restore memory from previous session
# Restore memory from previous session.
# Skip validation — this data was already validated when
# originally written, and research text triggers false
# positives on the code-indicator heuristic.
for key, value in memory_data.items():
memory.write(key, value)
memory.write(key, value, validate=False)
self.logger.info(f"📥 Restored session state with {len(memory_data)} memory keys")
# Write new input data to memory (each key individually)
if input_data:
# Write new input data to memory (each key individually).
# Skip when resuming from a paused session — restored memory already
# contains all state including the original input, and re-writing
# input_data would overwrite intermediate results with stale values.
_is_resuming = bool(session_state and session_state.get("paused_at"))
if input_data and not _is_resuming:
for key, value in input_data.items():
memory.write(key, value)
# Detect event-triggered execution (timer/webhook) — no interactive user.
_event_triggered = bool(input_data and isinstance(input_data.get("event"), dict))
path: list[str] = []
total_tokens = 0
total_latency = 0
@@ -396,9 +458,76 @@ class GraphExecutor:
steps = 0
# Fresh shared-session execution: clear stale cursor so the entry
# node doesn't restore a filled OutputAccumulator from the previous
# webhook run (which would cause the judge to accept immediately).
# The conversation history is preserved (continuous memory).
_is_fresh_shared = bool(
session_state
and session_state.get("resume_session_id")
and not session_state.get("paused_at")
and not session_state.get("resume_from_checkpoint")
)
if _is_fresh_shared and is_continuous and self._storage_path:
try:
from framework.storage.conversation_store import FileConversationStore
entry_conv_path = self._storage_path / "conversations" / current_node_id
if entry_conv_path.exists():
_store = FileConversationStore(base_path=entry_conv_path)
# Read cursor to find next seq for the transition marker.
_cursor = await _store.read_cursor() or {}
_next_seq = _cursor.get("next_seq", 0)
if _next_seq == 0:
# Fallback: scan part files for max seq
_parts = await _store.read_parts()
if _parts:
_next_seq = max(p.get("seq", 0) for p in _parts) + 1
# Reset cursor — clears stale accumulator outputs and
# iteration counter so the node starts fresh work while
# the conversation thread carries forward.
await _store.write_cursor({})
# Append a transition marker so the LLM knows a new
# event arrived and previous results are outdated.
await _store.write_part(
_next_seq,
{
"role": "user",
"content": (
"--- NEW EVENT TRIGGER ---\n"
"A new event has been received. "
"Process this as a fresh request — "
"previous outputs are no longer valid."
),
"seq": _next_seq,
"is_transition_marker": True,
},
)
self.logger.info(
"🔄 Cleared stale cursor and added transition marker "
"for shared-session entry node '%s'",
current_node_id,
)
except Exception:
self.logger.debug(
"Could not prepare conversation store for shared-session entry node '%s'",
current_node_id,
exc_info=True,
)
if session_state and current_node_id != graph.entry_node:
self.logger.info(f"🔄 Resuming from: {current_node_id}")
# Emit resume event
if self._event_bus:
await self._event_bus.emit_execution_resumed(
stream_id=self._stream_id,
node_id=current_node_id,
)
# Start run
_run_id = self.runtime.start_run(
goal_id=goal.id,
@@ -435,6 +564,14 @@ class GraphExecutor:
if self._pause_requested.is_set():
self.logger.info("⏸ Pause detected - stopping at node boundary")
# Emit pause event
if self._event_bus:
await self._event_bus.emit_execution_paused(
stream_id=self._stream_id,
node_id=current_node_id,
reason="User requested pause (Ctrl+Z)",
)
# Create session state for pause
saved_memory = memory.read_all()
pause_session_state: dict[str, Any] = {
@@ -489,7 +626,7 @@ class GraphExecutor:
)
# Skip execution — follow outgoing edges using current memory
skip_result = NodeResult(success=True, output=memory.read_all())
next_node = self._follow_edges(
next_node = await self._follow_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
@@ -530,6 +667,17 @@ class GraphExecutor:
self.logger.info(f" Inputs: {node_spec.input_keys}")
self.logger.info(f" Outputs: {node_spec.output_keys}")
# Continuous mode: accumulate tools and output keys from this node
if is_continuous and node_spec.tools:
for t in self.tools:
if t.name in node_spec.tools and t.name not in cumulative_tool_names:
cumulative_tools.append(t)
cumulative_tool_names.add(t.name)
if is_continuous and node_spec.output_keys:
for k in node_spec.output_keys:
if k not in cumulative_output_keys:
cumulative_output_keys.append(k)
# Build context for node
ctx = self._build_context(
node_spec=node_spec,
@@ -537,6 +685,11 @@ class GraphExecutor:
goal=goal,
input_data=input_data or {},
max_tokens=graph.max_tokens,
continuous_mode=is_continuous,
inherited_conversation=continuous_conversation if is_continuous else None,
override_tools=cumulative_tools if is_continuous else None,
cumulative_output_keys=cumulative_output_keys if is_continuous else None,
event_triggered=_event_triggered,
)
# Log actual input data being read
@@ -680,9 +833,13 @@ class GraphExecutor:
# [CORRECTED] Use node_spec.max_retries instead of hardcoded 3
max_retries = getattr(node_spec, "max_retries", 3)
# Event loop nodes handle retry internally via judge —
# executor retry is catastrophic (retry multiplication)
if node_spec.node_type == "event_loop" and max_retries > 0:
# EventLoopNode instances handle retry internally via judge —
# executor retry would cause catastrophic retry multiplication.
# Only override for actual EventLoopNode instances, not custom
# NodeProtocol implementations that happen to use node_type="event_loop"
from framework.graph.event_loop_node import EventLoopNode
if isinstance(node_impl, EventLoopNode) and max_retries > 0:
self.logger.warning(
f"EventLoopNode '{node_spec.id}' has max_retries={max_retries}. "
"Overriding to 0 — event loop nodes handle retry internally via judge."
@@ -704,6 +861,17 @@ class GraphExecutor:
self.logger.info(
f" ↻ Retrying ({node_retry_counts[current_node_id]}/{max_retries})..."
)
# Emit retry event
if self._event_bus:
await self._event_bus.emit_node_retry(
stream_id=self._stream_id,
node_id=current_node_id,
retry_count=retry_count,
max_retries=max_retries,
error=result.error or "",
)
_is_retry = True
continue
else:
@@ -713,7 +881,7 @@ class GraphExecutor:
)
# Check if there's an ON_FAILURE edge to follow
next_node = self._follow_edges(
next_node = await self._follow_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
@@ -763,6 +931,7 @@ class GraphExecutor:
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
"resume_from": current_node_id,
}
return ExecutionResult(
@@ -789,11 +958,22 @@ class GraphExecutor:
# This must happen BEFORE determining next node, since pause nodes may have no edges
if node_spec.id in graph.pause_nodes:
self.logger.info("💾 Saving session state after pause node")
# Emit pause event
if self._event_bus:
await self._event_bus.emit_execution_paused(
stream_id=self._stream_id,
node_id=node_spec.id,
reason="HITL pause node",
)
saved_memory = memory.read_all()
session_state_out = {
"paused_at": node_spec.id,
"resume_from": f"{node_spec.id}_resume", # Resume key
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
"next_node": None, # Will resume from entry point
}
@@ -842,10 +1022,21 @@ class GraphExecutor:
if result.next_node:
# Router explicitly set next node
self.logger.info(f" → Router directing to: {result.next_node}")
# Emit edge traversed event for router-directed edge
if self._event_bus:
await self._event_bus.emit_edge_traversed(
stream_id=self._stream_id,
source_node=current_node_id,
target_node=result.next_node,
edge_condition="router",
)
current_node_id = result.next_node
self._write_progress(current_node_id, path, memory, node_visit_counts)
else:
# Get all traversable edges for fan-out detection
traversable_edges = self._get_all_traversable_edges(
traversable_edges = await self._get_all_traversable_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
@@ -864,6 +1055,18 @@ class GraphExecutor:
targets = [e.target for e in traversable_edges]
fan_in_node = self._find_convergence_node(graph, targets)
# Emit edge traversed events for fan-out branches
if self._event_bus:
for edge in traversable_edges:
await self._event_bus.emit_edge_traversed(
stream_id=self._stream_id,
source_node=current_node_id,
target_node=edge.target,
edge_condition=edge.condition.value
if hasattr(edge.condition, "value")
else str(edge.condition),
)
# Execute branches in parallel
(
_branch_results,
@@ -886,13 +1089,14 @@ class GraphExecutor:
if fan_in_node:
self.logger.info(f" ⑃ Fan-in: converging at {fan_in_node}")
current_node_id = fan_in_node
self._write_progress(current_node_id, path, memory, node_visit_counts)
else:
# No convergence point - branches are terminal
self.logger.info(" → Parallel branches completed (no convergence)")
break
else:
# Sequential: follow single edge (existing logic via _follow_edges)
next_node = self._follow_edges(
next_node = await self._follow_edges(
graph=graph,
goal=goal,
current_node_id=current_node_id,
@@ -906,6 +1110,14 @@ class GraphExecutor:
next_spec = graph.get_node(next_node)
self.logger.info(f" → Next: {next_spec.name if next_spec else next_node}")
# Emit edge traversed event for sequential edge
if self._event_bus:
await self._event_bus.emit_edge_traversed(
stream_id=self._stream_id,
source_node=current_node_id,
target_node=next_node,
)
# CHECKPOINT: node_complete (after determining next node)
if (
checkpoint_store
@@ -940,6 +1152,84 @@ class GraphExecutor:
current_node_id = next_node
# Write progress snapshot at node transition
self._write_progress(current_node_id, path, memory, node_visit_counts)
# Continuous mode: thread conversation forward with transition marker
if is_continuous and result.conversation is not None:
continuous_conversation = result.conversation
# Look up the next node spec for the transition marker
next_spec = graph.get_node(current_node_id)
if next_spec and next_spec.node_type == "event_loop":
from framework.graph.prompt_composer import (
build_narrative,
build_transition_marker,
compose_system_prompt,
)
# Build Layer 2 (narrative) from current state
narrative = build_narrative(memory, path, graph)
# Compose new system prompt (Layer 1 + 2 + 3)
new_system = compose_system_prompt(
identity_prompt=getattr(graph, "identity_prompt", None),
focus_prompt=next_spec.system_prompt,
narrative=narrative,
)
continuous_conversation.update_system_prompt(new_system)
# Switch conversation store to the next node's directory
# so the transition marker and all subsequent messages are
# persisted there instead of the first node's directory.
if self._storage_path:
from framework.storage.conversation_store import (
FileConversationStore,
)
next_store_path = self._storage_path / "conversations" / next_spec.id
next_store = FileConversationStore(base_path=next_store_path)
await continuous_conversation.switch_store(next_store)
# Insert transition marker into conversation
data_dir = str(self._storage_path / "data") if self._storage_path else None
marker = build_transition_marker(
previous_node=node_spec,
next_node=next_spec,
memory=memory,
cumulative_tool_names=sorted(cumulative_tool_names),
data_dir=data_dir,
)
await continuous_conversation.add_user_message(
marker,
is_transition_marker=True,
)
# Set current phase for phase-aware compaction
continuous_conversation.set_current_phase(next_spec.id)
# Opportunistic compaction at transition:
# 1. Over 50% usage: prune old tool results (free, no LLM call)
# 2. If still over 80%, do a phase-graduated compact
if continuous_conversation.usage_ratio() > 0.5:
await continuous_conversation.prune_old_tool_results(
protect_tokens=2000,
)
if continuous_conversation.needs_compaction():
self.logger.info(
" Phase-boundary compaction (%.0f%% usage)",
continuous_conversation.usage_ratio() * 100,
)
summary = (
f"Summary of earlier phases (before {next_spec.name}). "
"See transition markers for phase details."
)
await continuous_conversation.compact(
summary,
keep_recent=4,
phase_graduated=True,
)
# Update input_data for next node
input_data = result.output
@@ -993,6 +1283,11 @@ class GraphExecutor:
had_partial_failures=len(nodes_failed) > 0,
execution_quality=exec_quality,
node_visit_counts=dict(node_visit_counts),
session_state={
"memory": output, # output IS memory.read_all()
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
},
)
except asyncio.CancelledError:
@@ -1082,6 +1377,7 @@ class GraphExecutor:
"memory": saved_memory,
"execution_path": list(path),
"node_visit_counts": dict(node_visit_counts),
"resume_from": current_node_id,
}
# Mark latest checkpoint for resume on failure
@@ -1134,12 +1430,21 @@ class GraphExecutor:
goal: Goal,
input_data: dict[str, Any],
max_tokens: int = 4096,
continuous_mode: bool = False,
inherited_conversation: Any = None,
override_tools: list | None = None,
cumulative_output_keys: list[str] | None = None,
event_triggered: bool = False,
) -> NodeContext:
"""Build execution context for a node."""
# Filter tools to those available to this node
available_tools = []
if node_spec.tools:
available_tools = [t for t in self.tools if t.name in node_spec.tools]
if override_tools is not None:
# Continuous mode: use cumulative tool set
available_tools = list(override_tools)
else:
available_tools = []
if node_spec.tools:
available_tools = [t for t in self.tools if t.name in node_spec.tools]
# Create scoped memory view
scoped_memory = memory.with_permissions(
@@ -1160,18 +1465,23 @@ class GraphExecutor:
max_tokens=max_tokens,
runtime_logger=self.runtime_logger,
pause_event=self._pause_requested, # Pass pause event for granular control
continuous_mode=continuous_mode,
inherited_conversation=inherited_conversation,
cumulative_output_keys=cumulative_output_keys or [],
event_triggered=event_triggered,
)
# Valid node types - no ambiguous "llm" type allowed
VALID_NODE_TYPES = {
"llm_tool_use",
"llm_generate",
"router",
"function",
"human_input",
"event_loop",
}
DEPRECATED_NODE_TYPES = {"llm_tool_use": "event_loop", "llm_generate": "event_loop"}
# Node types removed in v0.5 — provide migration guidance
REMOVED_NODE_TYPES = {
"function": "event_loop",
"llm_tool_use": "event_loop",
"llm_generate": "event_loop",
"router": "event_loop", # Unused theoretical infrastructure
"human_input": "event_loop", # Use client_facing=True instead
}
def _get_node_implementation(
self, node_spec: NodeSpec, cleanup_llm_model: str | None = None
@@ -1181,62 +1491,23 @@ class GraphExecutor:
if node_spec.id in self.node_registry:
return self.node_registry[node_spec.id]
# Reject removed node types with migration guidance
if node_spec.node_type in self.REMOVED_NODE_TYPES:
replacement = self.REMOVED_NODE_TYPES[node_spec.node_type]
raise RuntimeError(
f"Node type '{node_spec.node_type}' was removed in v0.5. "
f"Migrate node '{node_spec.id}' to '{replacement}'. "
f"See https://github.com/adenhq/hive/issues/4753 for migration guide."
)
# Validate node type
if node_spec.node_type not in self.VALID_NODE_TYPES:
raise RuntimeError(
f"Invalid node type '{node_spec.node_type}' for node '{node_spec.id}'. "
f"Must be one of: {sorted(self.VALID_NODE_TYPES)}. "
f"Use 'llm_tool_use' for nodes that call tools, 'llm_generate' for text generation."
)
# Warn on deprecated node types
if node_spec.node_type in self.DEPRECATED_NODE_TYPES:
replacement = self.DEPRECATED_NODE_TYPES[node_spec.node_type]
warnings.warn(
f"Node type '{node_spec.node_type}' is deprecated. "
f"Use '{replacement}' instead. "
f"Node: '{node_spec.id}'",
DeprecationWarning,
stacklevel=2,
)
# Create based on type
if node_spec.node_type == "llm_tool_use":
if not node_spec.tools:
raise RuntimeError(
f"Node '{node_spec.id}' is type 'llm_tool_use' but declares no tools. "
"Either add tools to the node or change type to 'llm_generate'."
)
return LLMNode(
tool_executor=self.tool_executor,
require_tools=True,
cleanup_llm_model=cleanup_llm_model,
)
if node_spec.node_type == "llm_generate":
return LLMNode(
tool_executor=None,
require_tools=False,
cleanup_llm_model=cleanup_llm_model,
)
if node_spec.node_type == "router":
return RouterNode()
if node_spec.node_type == "function":
# Function nodes need explicit registration
raise RuntimeError(
f"Function node '{node_spec.id}' not registered. Register with node_registry."
)
if node_spec.node_type == "human_input":
# Human input nodes are handled specially by HITL mechanism
return LLMNode(
tool_executor=None,
require_tools=False,
cleanup_llm_model=cleanup_llm_model,
f"Must be one of: {sorted(self.VALID_NODE_TYPES)}."
)
# Create based on type (only event_loop is valid)
if node_spec.node_type == "event_loop":
# Auto-create EventLoopNode with sensible defaults.
# Custom configs can still be pre-registered via node_registry.
@@ -1284,7 +1555,7 @@ class GraphExecutor:
# Should never reach here due to validation above
raise RuntimeError(f"Unhandled node type: {node_spec.node_type}")
def _follow_edges(
async def _follow_edges(
self,
graph: GraphSpec,
goal: Goal,
@@ -1299,7 +1570,7 @@ class GraphExecutor:
for edge in edges:
target_node_spec = graph.get_node(edge.target)
if edge.should_traverse(
if await edge.should_traverse(
source_success=result.success,
source_output=result.output,
memory=memory.read_all(),
@@ -1325,7 +1596,7 @@ class GraphExecutor:
self.logger.warning(f"⚠ Output validation failed: {validation.errors}")
# Clean the output
cleaned_output = self.output_cleaner.clean_output(
cleaned_output = await self.output_cleaner.clean_output(
output=output_to_validate,
source_node_id=current_node_id,
target_node_spec=target_node_spec,
@@ -1363,7 +1634,7 @@ class GraphExecutor:
return None
def _get_all_traversable_edges(
async def _get_all_traversable_edges(
self,
graph: GraphSpec,
goal: Goal,
@@ -1383,7 +1654,7 @@ class GraphExecutor:
for edge in edges:
target_node_spec = graph.get_node(edge.target)
if edge.should_traverse(
if await edge.should_traverse(
source_success=result.success,
source_output=result.output,
memory=memory.read_all(),
@@ -1496,14 +1767,19 @@ class GraphExecutor:
branch.error = f"Node {branch.node_id} not found in graph"
return branch, RuntimeError(branch.error)
# Get node implementation to check its type
branch_impl = self._get_node_implementation(node_spec, graph.cleanup_llm_model)
effective_max_retries = node_spec.max_retries
if node_spec.node_type == "event_loop":
if effective_max_retries > 1:
self.logger.warning(
f"EventLoopNode '{node_spec.id}' has "
f"max_retries={effective_max_retries}. Overriding "
"to 1 — event loop nodes handle retry internally."
)
# Only override for actual EventLoopNode instances, not custom NodeProtocol impls
from framework.graph.event_loop_node import EventLoopNode
if isinstance(branch_impl, EventLoopNode) and effective_max_retries > 1:
self.logger.warning(
f"EventLoopNode '{node_spec.id}' has "
f"max_retries={effective_max_retries}. Overriding "
"to 1 — event loop nodes handle retry internally."
)
effective_max_retries = 1
branch.status = "running"
@@ -1525,7 +1801,7 @@ class GraphExecutor:
f"⚠ Output validation failed for branch "
f"{branch.node_id}: {validation.errors}"
)
cleaned_output = self.output_cleaner.clean_output(
cleaned_output = await self.output_cleaner.clean_output(
output=mem_snapshot,
source_node_id=source_node_spec.id if source_node_spec else "unknown",
target_node_spec=node_spec,
@@ -1669,10 +1945,6 @@ class GraphExecutor:
"""Register a custom node implementation."""
self.node_registry[node_id] = implementation
def register_function(self, node_id: str, func: Callable) -> None:
"""Register a function as a node."""
self.node_registry[node_id] = FunctionNode(func)
def request_pause(self) -> None:
"""
Request graceful pause of the current execution.
-552
@@ -1,552 +0,0 @@
"""
Flexible Graph Executor with Worker-Judge Loop.
Executes plans created by external planner (Claude Code, etc.)
using a Worker-Judge loop:
1. External planner creates Plan
2. FlexibleGraphExecutor receives Plan
3. Worker executes each step
4. Judge evaluates each result
5. If Judge says "replan" → return to external planner with feedback
6. If Judge says "escalate" → request human intervention
7. If all steps complete → return success
This keeps planning external while execution/evaluation is internal.
"""
from collections.abc import Callable
from dataclasses import dataclass
from datetime import datetime
from typing import Any
from framework.graph.code_sandbox import CodeSandbox
from framework.graph.goal import Goal
from framework.graph.judge import HybridJudge, create_default_judge
from framework.graph.plan import (
ApprovalDecision,
ApprovalRequest,
ApprovalResult,
ExecutionStatus,
Judgment,
JudgmentAction,
Plan,
PlanExecutionResult,
PlanStep,
StepStatus,
)
from framework.graph.worker_node import StepExecutionResult, WorkerNode
from framework.llm.provider import LLMProvider, Tool
from framework.runtime.core import Runtime
# Type alias for approval callback
ApprovalCallback = Callable[[ApprovalRequest], ApprovalResult]
@dataclass
class ExecutorConfig:
"""Configuration for FlexibleGraphExecutor."""
max_retries_per_step: int = 3
max_total_steps: int = 100
timeout_seconds: int = 300
enable_parallel_execution: bool = False # Future: parallel step execution
class FlexibleGraphExecutor:
"""
Executes plans with Worker-Judge loop.
Plans come from external source (Claude Code, etc.).
Returns feedback for replanning if needed.
Usage:
executor = FlexibleGraphExecutor(
runtime=runtime,
llm=llm_provider,
tools=tools,
)
result = await executor.execute_plan(plan, goal, context)
if result.status == ExecutionStatus.NEEDS_REPLAN:
# External planner should create new plan using result.feedback
new_plan = external_planner.replan(result.feedback_context)
result = await executor.execute_plan(new_plan, goal, result.feedback_context)
"""
def __init__(
self,
runtime: Runtime,
llm: LLMProvider | None = None,
tools: dict[str, Tool] | None = None,
tool_executor: Callable | None = None,
functions: dict[str, Callable] | None = None,
judge: HybridJudge | None = None,
config: ExecutorConfig | None = None,
approval_callback: ApprovalCallback | None = None,
):
"""
Initialize the FlexibleGraphExecutor.
Args:
runtime: Runtime for decision logging
llm: LLM provider for Worker and Judge
tools: Available tools
tool_executor: Function to execute tools
functions: Registered functions
judge: Custom judge (defaults to HybridJudge with default rules)
config: Executor configuration
approval_callback: Callback for human-in-the-loop approval.
If None, steps requiring approval will pause execution.
"""
self.runtime = runtime
self.llm = llm
self.tools = tools or {}
self.tool_executor = tool_executor
self.functions = functions or {}
self.config = config or ExecutorConfig()
self.approval_callback = approval_callback
# Create judge
self.judge = judge or create_default_judge(llm)
# Create worker
self.worker = WorkerNode(
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
functions=functions,
sandbox=CodeSandbox(),
)
async def execute_plan(
self,
plan: Plan,
goal: Goal,
context: dict[str, Any] | None = None,
) -> PlanExecutionResult:
"""
Execute a plan created by external planner.
Args:
plan: The plan to execute
goal: The goal context
context: Initial context (e.g., from previous execution)
Returns:
PlanExecutionResult with status and feedback
"""
context = context or {}
context.update(plan.context) # Merge plan's accumulated context
# Start run
_run_id = self.runtime.start_run(
goal_id=goal.id,
goal_description=goal.description,
input_data={"plan_id": plan.id, "revision": plan.revision},
)
steps_executed = 0
total_tokens = 0
total_latency = 0
try:
while steps_executed < self.config.max_total_steps:
# Get next ready steps
ready_steps = plan.get_ready_steps()
if not ready_steps:
# Check if we're done or stuck
if plan.is_complete():
break
else:
# No ready steps but not complete - something's wrong
return self._create_result(
status=ExecutionStatus.NEEDS_REPLAN,
plan=plan,
context=context,
feedback=(
"No executable steps available but plan not complete. "
"Check dependencies."
),
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
# Execute next step (for now, sequential; could be parallel)
step = ready_steps[0]
# Debug: show ready steps
# ready_ids = [s.id for s in ready_steps]
# print(f" [DEBUG] Ready steps: {ready_ids}, executing: {step.id}")
# APPROVAL CHECK - before execution
if step.requires_approval:
approval_result = await self._request_approval(step, context)
if approval_result is None:
# No callback, pause execution
step.status = StepStatus.AWAITING_APPROVAL
return self._create_result(
status=ExecutionStatus.AWAITING_APPROVAL,
plan=plan,
context=context,
feedback=f"Step '{step.id}' requires approval: {step.description}",
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
if approval_result.decision == ApprovalDecision.REJECT:
step.status = StepStatus.REJECTED
step.error = approval_result.reason or "Rejected by human"
# Skip this step and continue with dependents marked as skipped
self._skip_dependent_steps(plan, step.id)
continue
if approval_result.decision == ApprovalDecision.ABORT:
return self._create_result(
status=ExecutionStatus.ABORTED,
plan=plan,
context=context,
feedback=approval_result.reason or "Aborted by human",
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
if approval_result.decision == ApprovalDecision.MODIFY:
# Apply modifications to step
if approval_result.modifications:
self._apply_modifications(step, approval_result.modifications)
# APPROVE - continue to execution
step.status = StepStatus.IN_PROGRESS
step.started_at = datetime.now()
step.attempts += 1
# WORK
work_result = await self.worker.execute(step, context)
steps_executed += 1
total_tokens += work_result.tokens_used
total_latency += work_result.latency_ms
# JUDGE
judgment = await self.judge.evaluate(
step=step,
result=work_result.__dict__,
goal=goal,
context=context,
)
# Handle judgment
result = await self._handle_judgment(
step=step,
work_result=work_result,
judgment=judgment,
plan=plan,
goal=goal,
context=context,
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
if result is not None:
# Judgment resulted in early return (replan/escalate)
self.runtime.end_run(
success=False,
narrative=f"Execution stopped: {result.status.value}",
)
return result
# All steps completed successfully
self.runtime.end_run(
success=True,
output_data=context,
narrative=f"Plan completed: {steps_executed} steps executed",
)
return self._create_result(
status=ExecutionStatus.COMPLETED,
plan=plan,
context=context,
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
except Exception as e:
self.runtime.report_problem(
severity="critical",
description=str(e),
)
self.runtime.end_run(
success=False,
narrative=f"Execution failed: {e}",
)
return PlanExecutionResult(
status=ExecutionStatus.FAILED,
error=str(e),
feedback=f"Execution error: {e}",
feedback_context=plan.to_feedback_context(),
completed_steps=[s.id for s in plan.get_completed_steps()],
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency_ms=total_latency,
)
async def _handle_judgment(
self,
step: PlanStep,
work_result: StepExecutionResult,
judgment: Judgment,
plan: Plan,
goal: Goal,
context: dict[str, Any],
steps_executed: int,
total_tokens: int,
total_latency: int,
) -> PlanExecutionResult | None:
"""
Handle judgment and return result if execution should stop.
Returns None to continue execution, or PlanExecutionResult to stop.
"""
if judgment.action == JudgmentAction.ACCEPT:
# Step succeeded - update state and continue
step.status = StepStatus.COMPLETED
step.completed_at = datetime.now()
step.result = work_result.outputs
# Map outputs to expected output keys
# If output has generic "result" key but step expects specific keys, map it
outputs_to_store = work_result.outputs.copy()
if step.expected_outputs and "result" in outputs_to_store:
result_value = outputs_to_store["result"]
# For each expected output key that's not in outputs, map from "result"
for expected_key in step.expected_outputs:
if expected_key not in outputs_to_store:
outputs_to_store[expected_key] = result_value
# Update context with mapped outputs
context.update(outputs_to_store)
# Store in plan context for replanning feedback
plan.context[step.id] = outputs_to_store
return None # Continue execution
elif judgment.action == JudgmentAction.RETRY:
# Retry step if under limit
if step.attempts < step.max_retries:
step.status = StepStatus.PENDING
step.error = judgment.feedback
# Record retry decision
self.runtime.decide(
intent=f"Retry step {step.id}",
options=[{"id": "retry", "description": "Retry with feedback"}],
chosen="retry",
reasoning=judgment.reasoning,
context={"attempt": step.attempts, "feedback": judgment.feedback},
)
return None # Continue (step will be retried)
else:
# Max retries exceeded - escalate to replan
step.status = StepStatus.FAILED
step.error = f"Max retries ({step.max_retries}) exceeded: {judgment.feedback}"
return self._create_result(
status=ExecutionStatus.NEEDS_REPLAN,
plan=plan,
context=context,
feedback=(
f"Step '{step.id}' failed after {step.attempts} attempts: "
f"{judgment.feedback}"
),
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
elif judgment.action == JudgmentAction.REPLAN:
# Return to external planner
step.status = StepStatus.FAILED
step.error = judgment.feedback
return self._create_result(
status=ExecutionStatus.NEEDS_REPLAN,
plan=plan,
context=context,
feedback=judgment.feedback or f"Step '{step.id}' requires replanning",
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
elif judgment.action == JudgmentAction.ESCALATE:
# Request human intervention
return self._create_result(
status=ExecutionStatus.NEEDS_ESCALATION,
plan=plan,
context=context,
feedback=judgment.feedback or f"Step '{step.id}' requires human intervention",
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency=total_latency,
)
return None # Unknown action - continue
def _create_result(
self,
status: ExecutionStatus,
plan: Plan,
context: dict[str, Any],
feedback: str | None = None,
steps_executed: int = 0,
total_tokens: int = 0,
total_latency: int = 0,
) -> PlanExecutionResult:
"""Create a PlanExecutionResult."""
return PlanExecutionResult(
status=status,
results=context,
feedback=feedback,
feedback_context=plan.to_feedback_context(),
completed_steps=[s.id for s in plan.get_completed_steps()],
steps_executed=steps_executed,
total_tokens=total_tokens,
total_latency_ms=total_latency,
)
def register_function(self, name: str, func: Callable) -> None:
"""Register a function for FUNCTION actions."""
self.functions[name] = func
self.worker.register_function(name, func)
def register_tool(self, tool: Tool) -> None:
"""Register a tool for TOOL_USE actions."""
self.tools[tool.name] = tool
self.worker.register_tool(tool)
def add_evaluation_rule(self, rule) -> None:
"""Add an evaluation rule to the judge."""
self.judge.add_rule(rule)
async def _request_approval(
self,
step: PlanStep,
context: dict[str, Any],
) -> ApprovalResult | None:
"""
Request human approval for a step.
Returns None if no callback is set (execution should pause).
"""
if self.approval_callback is None:
return None
# Build preview of what will happen
preview_parts = []
if step.action.tool_name:
preview_parts.append(f"Tool: {step.action.tool_name}")
if step.action.tool_args:
import json
args_preview = json.dumps(step.action.tool_args, indent=2, default=str)
if len(args_preview) > 500:
args_preview = args_preview[:500] + "..."
preview_parts.append(f"Args: {args_preview}")
elif step.action.prompt:
prompt_preview = (
step.action.prompt[:300] + "..."
if len(step.action.prompt) > 300
else step.action.prompt
)
preview_parts.append(f"Prompt: {prompt_preview}")
# Include step inputs resolved from context (what will be sent/used)
relevant_context = {}
for input_key, input_value in step.inputs.items():
# Resolve variable references like "$email_sequence"
if isinstance(input_value, str) and input_value.startswith("$"):
context_key = input_value[1:] # Remove $ prefix
if context_key in context:
relevant_context[input_key] = context[context_key]
else:
relevant_context[input_key] = input_value
request = ApprovalRequest(
step_id=step.id,
step_description=step.description,
action_type=step.action.action_type.value,
action_details={
"tool_name": step.action.tool_name,
"tool_args": step.action.tool_args,
"prompt": step.action.prompt,
},
context=relevant_context,
approval_message=step.approval_message,
preview="\n".join(preview_parts) if preview_parts else None,
)
return self.approval_callback(request)
def _skip_dependent_steps(self, plan: Plan, rejected_step_id: str) -> None:
"""Mark steps that depend on a rejected step as skipped."""
for step in plan.steps:
if rejected_step_id in step.dependencies:
if step.status == StepStatus.PENDING:
step.status = StepStatus.SKIPPED
step.error = f"Skipped because dependency '{rejected_step_id}' was rejected"
# Recursively skip dependents
self._skip_dependent_steps(plan, step.id)
def _apply_modifications(self, step: PlanStep, modifications: dict[str, Any]) -> None:
"""Apply human modifications to a step before execution."""
# Allow modifying tool args
if "tool_args" in modifications and step.action.tool_args:
step.action.tool_args.update(modifications["tool_args"])
# Allow modifying prompt
if "prompt" in modifications:
step.action.prompt = modifications["prompt"]
# Allow modifying inputs
if "inputs" in modifications:
step.inputs.update(modifications["inputs"])
def set_approval_callback(self, callback: ApprovalCallback) -> None:
"""Set the approval callback for HITL steps."""
self.approval_callback = callback
# Convenience function for simple execution
async def execute_plan(
plan: Plan,
goal: Goal,
runtime: Runtime,
llm: LLMProvider | None = None,
tools: dict[str, Tool] | None = None,
tool_executor: Callable | None = None,
context: dict[str, Any] | None = None,
) -> PlanExecutionResult:
"""
Execute a plan with default configuration.
Convenience function for simple use cases.
"""
executor = FlexibleGraphExecutor(
runtime=runtime,
llm=llm,
tools=tools,
tool_executor=tool_executor,
)
return await executor.execute_plan(plan, goal, context)
-406
@@ -1,406 +0,0 @@
"""
Hybrid Judge for Evaluating Plan Step Results.
The HybridJudge evaluates step execution results using:
1. Rule-based evaluation (fast, deterministic)
2. LLM-based evaluation (fallback for ambiguous cases)
Escalation path: rules → LLM → human
"""
from dataclasses import dataclass, field
from typing import Any
from framework.graph.code_sandbox import safe_eval
from framework.graph.goal import Goal
from framework.graph.plan import (
EvaluationRule,
Judgment,
JudgmentAction,
PlanStep,
)
from framework.llm.provider import LLMProvider
@dataclass
class RuleEvaluationResult:
"""Result of rule-based evaluation."""
is_definitive: bool # True if a rule matched definitively
judgment: Judgment | None = None
context: dict[str, Any] = field(default_factory=dict)
rules_checked: int = 0
rule_matched: str | None = None
class HybridJudge:
"""
Evaluates plan step results using rules first, then LLM fallback.
Usage:
judge = HybridJudge(llm=llm_provider)
judge.add_rule(EvaluationRule(
id="success_check",
condition="result.get('success') == True",
action=JudgmentAction.ACCEPT,
))
judgment = await judge.evaluate(step, result, goal)
"""
def __init__(
self,
llm: LLMProvider | None = None,
rules: list[EvaluationRule] | None = None,
llm_confidence_threshold: float = 0.7,
):
"""
Initialize the HybridJudge.
Args:
llm: LLM provider for ambiguous cases
rules: Initial evaluation rules
llm_confidence_threshold: Confidence below this triggers escalation
"""
self.llm = llm
self.rules: list[EvaluationRule] = rules or []
self.llm_confidence_threshold = llm_confidence_threshold
# Sort rules by priority (higher first)
self._sort_rules()
def _sort_rules(self):
"""Sort rules by priority."""
self.rules.sort(key=lambda r: -r.priority)
def add_rule(self, rule: EvaluationRule) -> None:
"""Add an evaluation rule."""
self.rules.append(rule)
self._sort_rules()
def remove_rule(self, rule_id: str) -> bool:
"""Remove a rule by ID. Returns True if found and removed."""
for i, rule in enumerate(self.rules):
if rule.id == rule_id:
self.rules.pop(i)
return True
return False
async def evaluate(
self,
step: PlanStep,
result: Any,
goal: Goal,
context: dict[str, Any] | None = None,
) -> Judgment:
"""
Evaluate a step result.
Args:
step: The executed plan step
result: The result of executing the step
goal: The goal context for evaluation
context: Additional context from previous steps
Returns:
Judgment with action and feedback
"""
context = context or {}
# Try rule-based evaluation first
rule_result = self._evaluate_rules(step, result, goal, context)
if rule_result.is_definitive:
return rule_result.judgment
# Fall back to LLM evaluation
if self.llm:
return await self._evaluate_llm(step, result, goal, context, rule_result)
# No LLM available - default to accept with low confidence
return Judgment(
action=JudgmentAction.ACCEPT,
reasoning="No definitive rule matched and no LLM available for evaluation",
confidence=0.5,
llm_used=False,
)
def _evaluate_rules(
self,
step: PlanStep,
result: Any,
goal: Goal,
context: dict[str, Any],
) -> RuleEvaluationResult:
"""Evaluate step using rules."""
rules_checked = 0
# Build evaluation context
eval_context = {
"step": step.model_dump() if hasattr(step, "model_dump") else step,
"result": result,
"goal": goal.model_dump() if hasattr(goal, "model_dump") else goal,
"context": context,
"success": isinstance(result, dict) and result.get("success", False),
"error": isinstance(result, dict) and result.get("error"),
}
for rule in self.rules:
rules_checked += 1
# Evaluate rule condition
eval_result = safe_eval(rule.condition, eval_context)
if eval_result.success and eval_result.result:
# Rule matched!
feedback = self._format_feedback(rule.feedback_template, eval_context)
return RuleEvaluationResult(
is_definitive=True,
judgment=Judgment(
action=rule.action,
reasoning=rule.description,
feedback=feedback if feedback else None,
rule_matched=rule.id,
confidence=1.0,
llm_used=False,
),
rules_checked=rules_checked,
rule_matched=rule.id,
)
# No rule matched definitively
return RuleEvaluationResult(
is_definitive=False,
context=eval_context,
rules_checked=rules_checked,
)
def _format_feedback(
self,
template: str,
context: dict[str, Any],
) -> str:
"""Format feedback template with context values."""
if not template:
return ""
try:
return template.format(**context)
except (KeyError, ValueError):
return template
async def _evaluate_llm(
self,
step: PlanStep,
result: Any,
goal: Goal,
context: dict[str, Any],
rule_result: RuleEvaluationResult,
) -> Judgment:
"""Evaluate step using LLM."""
system_prompt = self._build_llm_system_prompt(goal)
user_prompt = self._build_llm_user_prompt(step, result, context, rule_result)
try:
response = self.llm.complete(
messages=[{"role": "user", "content": user_prompt}],
system=system_prompt,
)
# Parse LLM response
judgment = self._parse_llm_response(response.content)
judgment.llm_used = True
# Check confidence threshold
if judgment.confidence < self.llm_confidence_threshold:
# Low confidence - escalate
return Judgment(
action=JudgmentAction.ESCALATE,
reasoning=(
f"LLM confidence ({judgment.confidence:.2f}) "
f"below threshold ({self.llm_confidence_threshold})"
),
feedback=judgment.feedback,
confidence=judgment.confidence,
llm_used=True,
context={"original_judgment": judgment.model_dump()},
)
return judgment
except Exception as e:
# LLM failed - escalate
return Judgment(
action=JudgmentAction.ESCALATE,
reasoning=f"LLM evaluation failed: {e}",
feedback="Human review needed due to LLM error",
llm_used=True,
)
def _build_llm_system_prompt(self, goal: Goal) -> str:
"""Build system prompt for LLM judge."""
return f"""You are a judge evaluating the execution of a plan step.
GOAL: {goal.description}
SUCCESS CRITERIA:
{chr(10).join(f"- {sc.description}" for sc in goal.success_criteria)}
CONSTRAINTS:
{chr(10).join(f"- {c.description}" for c in goal.constraints)}
Your task is to evaluate whether the step was executed successfully and decide the next action.
Respond in this exact format:
ACTION: [ACCEPT|RETRY|REPLAN|ESCALATE]
CONFIDENCE: [0.0-1.0]
REASONING: [Your reasoning]
FEEDBACK: [Feedback for retry/replan, or empty if accepting]
Actions:
- ACCEPT: Step completed successfully, continue to next step
- RETRY: Step failed but can be retried with feedback
- REPLAN: Step failed in a way that requires replanning
- ESCALATE: Requires human intervention
"""
def _build_llm_user_prompt(
self,
step: PlanStep,
result: Any,
context: dict[str, Any],
rule_result: RuleEvaluationResult,
) -> str:
"""Build user prompt for LLM judge."""
return f"""Evaluate this step execution:
STEP: {step.description}
STEP ID: {step.id}
ACTION TYPE: {step.action.action_type}
EXPECTED OUTPUTS: {step.expected_outputs}
RESULT:
{result}
CONTEXT FROM PREVIOUS STEPS:
{context}
RULES CHECKED: {rule_result.rules_checked} (none matched definitively)
Please evaluate and provide your judgment."""
def _parse_llm_response(self, response: str) -> Judgment:
"""Parse LLM response into Judgment."""
lines = response.strip().split("\n")
action = JudgmentAction.ACCEPT
confidence = 0.8
reasoning = ""
feedback = ""
for line in lines:
line = line.strip()
if line.startswith("ACTION:"):
action_str = line.split(":", 1)[1].strip().upper()
try:
action = JudgmentAction(action_str.lower())
except ValueError:
action = JudgmentAction.ESCALATE
elif line.startswith("CONFIDENCE:"):
try:
confidence = float(line.split(":", 1)[1].strip())
except ValueError:
confidence = 0.5
elif line.startswith("REASONING:"):
reasoning = line.split(":", 1)[1].strip()
elif line.startswith("FEEDBACK:"):
feedback = line.split(":", 1)[1].strip()
return Judgment(
action=action,
reasoning=reasoning or "LLM evaluation",
feedback=feedback if feedback else None,
confidence=confidence,
)
# Factory function for creating judge with common rules
def create_default_judge(llm: LLMProvider | None = None) -> HybridJudge:
"""
Create a HybridJudge with commonly useful default rules.
Args:
llm: LLM provider for fallback evaluation
Returns:
Configured HybridJudge instance
"""
judge = HybridJudge(llm=llm)
# Rule: Accept on explicit success flag
judge.add_rule(
EvaluationRule(
id="explicit_success",
description="Step explicitly marked as successful",
condition="isinstance(result, dict) and result.get('success') == True",
action=JudgmentAction.ACCEPT,
priority=100,
)
)
# Rule: Retry on transient errors
judge.add_rule(
EvaluationRule(
id="transient_error_retry",
description="Transient error that can be retried",
condition=(
"isinstance(result, dict) and "
"result.get('error_type') in ['timeout', 'rate_limit', 'connection_error']"
),
action=JudgmentAction.RETRY,
feedback_template="Transient error: {result[error]}. Please retry.",
priority=90,
)
)
# Rule: Replan on missing data
judge.add_rule(
EvaluationRule(
id="missing_data_replan",
description="Required data not available",
condition="isinstance(result, dict) and result.get('error_type') == 'missing_data'",
action=JudgmentAction.REPLAN,
feedback_template="Missing required data: {result[error]}. Plan needs adjustment.",
priority=80,
)
)
# Rule: Escalate on security issues
judge.add_rule(
EvaluationRule(
id="security_escalate",
description="Security issue detected",
condition="isinstance(result, dict) and result.get('error_type') == 'security'",
action=JudgmentAction.ESCALATE,
feedback_template="Security issue detected: {result[error]}",
priority=200,
)
)
# Rule: Fail on max retries exceeded
judge.add_rule(
EvaluationRule(
id="max_retries_fail",
description="Maximum retries exceeded",
condition="step.get('attempts', 0) >= step.get('max_retries', 3)",
action=JudgmentAction.REPLAN,
feedback_template="Step '{step[id]}' failed after {step[attempts]} attempts",
priority=150,
)
)
return judge
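# A short sketch of the default rules in action, with a synthetic result
# payload; evaluate() is async, so this runs inside a coroutine, and the
# step/goal objects are assumed to exist:
#
#     judge = create_default_judge()      # rules only; no LLM fallback
#     judgment = await judge.evaluate(
#         step, {"error_type": "timeout", "error": "read timed out"}, goal
#     )
#     assert judgment.action == JudgmentAction.RETRY
#     assert judgment.rule_matched == "transient_error_retry"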
File diff suppressed because it is too large.
+2 -2
View File
@@ -206,7 +206,7 @@ class OutputCleaner:
warnings=warnings,
)
def clean_output(
async def clean_output(
self,
output: dict[str, Any],
source_node_id: str,
@@ -288,7 +288,7 @@ Return ONLY valid JSON matching the expected schema. No explanations, no markdow
f"🧹 Cleaning output from '{source_node_id}' using {self.config.fast_model}"
)
response = self.llm.complete(
response = await self.llm.acomplete(
messages=[{"role": "user", "content": prompt}],
system=(
"You clean malformed agent outputs. Return only valid JSON matching the schema."
-513
View File
@@ -1,513 +0,0 @@
"""
Plan Data Structures for Flexible Execution.
Plans are created externally (by Claude Code or another LLM agent) and
executed internally by the FlexibleGraphExecutor with Worker-Judge loop.
The Plan is the contract between the external planner and the executor:
- Planner creates a Plan with PlanSteps
- Executor runs steps and judges results
- If replanning needed, returns feedback to external planner
"""
from datetime import datetime
from enum import StrEnum
from typing import Any
from pydantic import BaseModel, Field
class ActionType(StrEnum):
"""Types of actions a PlanStep can perform."""
LLM_CALL = "llm_call" # Call LLM for generation
TOOL_USE = "tool_use" # Use a registered tool
SUB_GRAPH = "sub_graph" # Execute a sub-graph
FUNCTION = "function" # Call a Python function
CODE_EXECUTION = "code_execution" # Execute dynamic code (sandboxed)
class StepStatus(StrEnum):
"""Status of a plan step."""
PENDING = "pending"
AWAITING_APPROVAL = "awaiting_approval" # Waiting for human approval
IN_PROGRESS = "in_progress"
COMPLETED = "completed"
FAILED = "failed"
SKIPPED = "skipped"
REJECTED = "rejected" # Human rejected execution
def is_terminal(self) -> bool:
"""Check if this status represents a terminal (finished) state.
Terminal states are states where the step will not execute further,
either because it completed successfully or failed/was skipped.
"""
return self in (
StepStatus.COMPLETED,
StepStatus.FAILED,
StepStatus.SKIPPED,
StepStatus.REJECTED,
)
def is_successful(self) -> bool:
"""Check if this status represents successful completion."""
return self == StepStatus.COMPLETED
class ApprovalDecision(StrEnum):
"""Human decision on a step requiring approval."""
APPROVE = "approve" # Execute as planned
REJECT = "reject" # Skip this step
MODIFY = "modify" # Execute with modifications
ABORT = "abort" # Stop entire execution
class ApprovalRequest(BaseModel):
"""Request for human approval before executing a step."""
step_id: str
step_description: str
action_type: str
action_details: dict[str, Any] = Field(default_factory=dict)
context: dict[str, Any] = Field(default_factory=dict)
approval_message: str | None = None
# Preview of what will happen
preview: str | None = None
model_config = {"extra": "allow"}
class ApprovalResult(BaseModel):
"""Result of human approval decision."""
decision: ApprovalDecision
reason: str | None = None
modifications: dict[str, Any] = Field(default_factory=dict)
model_config = {"extra": "allow"}
class JudgmentAction(StrEnum):
"""Actions the judge can take after evaluating a step."""
ACCEPT = "accept" # Step completed successfully, continue
RETRY = "retry" # Retry the step with feedback
REPLAN = "replan" # Return to external planner for new plan
ESCALATE = "escalate" # Request human intervention
class ActionSpec(BaseModel):
"""
Specification for an action to be executed.
This is the "what to do" part of a PlanStep.
"""
action_type: ActionType
# For LLM_CALL
prompt: str | None = None
system_prompt: str | None = None
model: str | None = None
# For TOOL_USE
tool_name: str | None = None
tool_args: dict[str, Any] = Field(default_factory=dict)
# For SUB_GRAPH
graph_id: str | None = None
# For FUNCTION
function_name: str | None = None
function_args: dict[str, Any] = Field(default_factory=dict)
# For CODE_EXECUTION
code: str | None = None
language: str = "python"
model_config = {"extra": "allow"}
class PlanStep(BaseModel):
"""
A single step in a plan.
Created by external planner, executed by Worker, evaluated by Judge.
"""
id: str
description: str
action: ActionSpec
# Data flow
inputs: dict[str, Any] = Field(
default_factory=dict,
description="Input data for this step (can reference previous step outputs)",
)
expected_outputs: list[str] = Field(
default_factory=list, description="Keys this step should produce"
)
# Dependencies
dependencies: list[str] = Field(
default_factory=list, description="IDs of steps that must complete before this one"
)
# Human-in-the-loop (HITL)
requires_approval: bool = Field(
default=False, description="If True, requires human approval before execution"
)
approval_message: str | None = Field(
default=None, description="Message to show human when requesting approval"
)
# Execution state
status: StepStatus = StepStatus.PENDING
result: Any | None = None
error: str | None = None
attempts: int = 0
max_retries: int = 3
# Metadata
started_at: datetime | None = None
completed_at: datetime | None = None
model_config = {"extra": "allow"}
def is_ready(self, terminal_step_ids: set[str]) -> bool:
"""Check if this step is ready to execute (all dependencies finished).
A step is ready when:
1. Its status is PENDING (not yet started)
2. All its dependencies are in a terminal state (completed, failed, skipped, or rejected)
Note: This allows dependent steps to become "ready" even if their dependencies
failed. The executor should check if any dependencies failed and handle
accordingly (e.g., skip the step or mark it as blocked).
Args:
terminal_step_ids: Set of step IDs that are in a terminal state
"""
if self.status != StepStatus.PENDING:
return False
return all(dep in terminal_step_ids for dep in self.dependencies)
class Judgment(BaseModel):
"""
Result of judging a step execution.
The Judge evaluates step results and decides what to do next.
"""
action: JudgmentAction
reasoning: str
feedback: str | None = None # For retry/replan - what went wrong
# For rule-based judgments
rule_matched: str | None = None
# For LLM-based judgments
confidence: float = 1.0
llm_used: bool = False
# Context for replanning
context: dict[str, Any] = Field(default_factory=dict)
model_config = {"extra": "allow"}
class EvaluationRule(BaseModel):
"""
A rule for the HybridJudge to evaluate step results.
Rules are checked before falling back to LLM evaluation.
"""
id: str
description: str
# Condition (Python expression evaluated with result, step, goal context)
condition: str
# What to do if condition matches
action: JudgmentAction
feedback_template: str = "" # Can use {result}, {step}, etc.
# Priority (higher = checked first)
priority: int = 0
model_config = {"extra": "allow"}
class Plan(BaseModel):
"""
A complete execution plan.
Created by external planner (Claude Code, etc).
Executed by FlexibleGraphExecutor.
"""
id: str
goal_id: str
description: str
# Steps to execute
steps: list[PlanStep] = Field(default_factory=list)
# Execution state
revision: int = 1 # Incremented on replan
current_step_idx: int = 0
# Accumulated context from execution
context: dict[str, Any] = Field(default_factory=dict)
# Metadata
created_at: datetime = Field(default_factory=datetime.now)
created_by: str = "external" # Who created this plan
# Previous attempt info (for replanning)
previous_feedback: str | None = None
model_config = {"extra": "allow"}
@classmethod
def from_json(cls, data: str | dict) -> "Plan":
"""
Load a Plan from exported JSON.
This handles the output from export_graph() and properly converts
action_type strings to ActionType enums.
Args:
data: JSON string or dict from export_graph()
Returns:
Plan object ready for FlexibleGraphExecutor
Example:
# Load from export_graph() output
exported = export_graph()
plan = Plan.from_json(exported)
# Load from file
with open("plan.json") as f:
plan = Plan.from_json(json.load(f))
"""
import json as json_module
if isinstance(data, str):
data = json_module.loads(data)
# Handle nested "plan" key from export_graph output
if "plan" in data:
data = data["plan"]
# Convert steps
steps = []
for step_data in data.get("steps", []):
action_data = step_data.get("action", {})
# Convert action_type string to enum
action_type_str = action_data.get("action_type", "function")
action_type = ActionType(action_type_str)
action = ActionSpec(
action_type=action_type,
prompt=action_data.get("prompt"),
system_prompt=action_data.get("system_prompt"),
tool_name=action_data.get("tool_name"),
tool_args=action_data.get("tool_args", {}),
function_name=action_data.get("function_name"),
function_args=action_data.get("function_args", {}),
code=action_data.get("code"),
)
step = PlanStep(
id=step_data["id"],
description=step_data.get("description", ""),
action=action,
inputs=step_data.get("inputs", {}),
expected_outputs=step_data.get("expected_outputs", []),
dependencies=step_data.get("dependencies", []),
requires_approval=step_data.get("requires_approval", False),
approval_message=step_data.get("approval_message"),
)
steps.append(step)
return cls(
id=data.get("id", "plan"),
goal_id=data.get("goal_id", ""),
description=data.get("description", ""),
steps=steps,
context=data.get("context", {}),
revision=data.get("revision", 1),
)
def get_step(self, step_id: str) -> PlanStep | None:
"""Get a step by ID."""
for step in self.steps:
if step.id == step_id:
return step
return None
def get_ready_steps(self) -> list[PlanStep]:
"""Get all steps that are ready to execute.
A step is ready when all its dependencies are in terminal states
(completed, failed, skipped, or rejected).
"""
terminal_ids = {s.id for s in self.steps if s.status.is_terminal()}
return [s for s in self.steps if s.is_ready(terminal_ids)]
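# A sketch of the scheduling loop this enables; execution and judging are
# elided. Because readiness only requires terminal dependencies, the loop
# must itself skip steps whose dependencies did not complete successfully:
#
#     while not plan.is_complete():
#         for step in plan.get_ready_steps():
#             dep_ok = all(
#                 plan.get_step(d).status == StepStatus.COMPLETED
#                 for d in step.dependencies
#             )
#             if not dep_ok:
#                 step.status = StepStatus.SKIPPED
#                 continue
#             ...  # execute via the Worker, then judge the result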
def get_completed_steps(self) -> list[PlanStep]:
"""Get all completed steps."""
return [s for s in self.steps if s.status == StepStatus.COMPLETED]
def is_complete(self) -> bool:
"""Check if all steps are in terminal states (finished executing).
Returns True when all steps have reached a terminal state, regardless
of whether they succeeded or failed. Use has_failed_steps() to check
if any steps failed.
"""
return all(s.status.is_terminal() for s in self.steps)
def is_successful(self) -> bool:
"""Check if all steps completed successfully."""
return all(s.status == StepStatus.COMPLETED for s in self.steps)
def has_failed_steps(self) -> bool:
"""Check if any steps failed, were skipped, or were rejected."""
return any(
s.status in (StepStatus.FAILED, StepStatus.SKIPPED, StepStatus.REJECTED)
for s in self.steps
)
def get_failed_steps(self) -> list[PlanStep]:
"""Get all steps that failed, were skipped, or were rejected."""
return [
s
for s in self.steps
if s.status in (StepStatus.FAILED, StepStatus.SKIPPED, StepStatus.REJECTED)
]
def to_feedback_context(self) -> dict[str, Any]:
"""Create context for replanning."""
return {
"plan_id": self.id,
"revision": self.revision,
"completed_steps": [
{
"id": s.id,
"description": s.description,
"result": s.result,
}
for s in self.get_completed_steps()
],
"failed_steps": [
{
"id": s.id,
"description": s.description,
"error": s.error,
"attempts": s.attempts,
}
for s in self.steps
if s.status == StepStatus.FAILED
],
"context": self.context,
}
class ExecutionStatus(StrEnum):
"""Status of plan execution."""
COMPLETED = "completed"
AWAITING_APPROVAL = "awaiting_approval" # Paused for human approval
NEEDS_REPLAN = "needs_replan"
NEEDS_ESCALATION = "needs_escalation"
REJECTED = "rejected" # Human rejected a step
ABORTED = "aborted" # Human aborted execution
FAILED = "failed"
class PlanExecutionResult(BaseModel):
"""
Result of executing a plan.
Returned to external planner with status and feedback.
"""
status: ExecutionStatus
# Results from completed steps
results: dict[str, Any] = Field(default_factory=dict)
# For needs_replan - what to tell the planner
feedback: str | None = None
feedback_context: dict[str, Any] = Field(default_factory=dict)
# Steps that completed before stopping
completed_steps: list[str] = Field(default_factory=list)
# Metrics
steps_executed: int = 0
total_tokens: int = 0
total_latency_ms: int = 0
# Error info (for failed status)
error: str | None = None
model_config = {"extra": "allow"}
def load_export(data: str | dict) -> tuple["Plan", Any]:
"""
Load both Plan and Goal from export_graph() output.
The export_graph() MCP tool returns both the plan and the goal that was
defined and approved during the agent building process. This function
loads both so you can use them with FlexibleGraphExecutor.
Args:
data: JSON string or dict from export_graph()
Returns:
Tuple of (Plan, Goal) ready for FlexibleGraphExecutor
Example:
# Load from export_graph() output
exported = export_graph()
plan, goal = load_export(exported)
result = await executor.execute_plan(plan, goal, context)
"""
import json as json_module
from framework.graph.goal import Goal
if isinstance(data, str):
data = json_module.loads(data)
# Load plan
plan = Plan.from_json(data)
# Load goal
goal_data = data.get("goal", {})
if goal_data:
goal = Goal.model_validate(goal_data)
else:
# Fallback: create minimal goal from plan metadata
goal = Goal(
id=plan.goal_id,
name=plan.goal_id,
description=plan.description,
success_criteria=[],
constraints=[],
)
return plan, goal
+185
View File
@@ -0,0 +1,185 @@
"""Prompt composition for continuous agent mode.
Composes the three-layer system prompt (onion model) and generates
transition markers inserted into the conversation at phase boundaries.
Layer 1 - Identity (static, defined at agent level, never changes):
"You are a thorough research agent. You prefer clarity over jargon..."
Layer 2 - Narrative (auto-generated from conversation/memory state):
"We've finished scoping the project. The user wants to focus on..."
Layer 3 - Focus (per-node system_prompt, reframed as focus directive):
"Your current attention: synthesize findings into a report..."
"""
from __future__ import annotations
import logging
from pathlib import Path
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from framework.graph.edge import GraphSpec
from framework.graph.node import NodeSpec, SharedMemory
logger = logging.getLogger(__name__)
def compose_system_prompt(
identity_prompt: str | None,
focus_prompt: str | None,
narrative: str | None = None,
) -> str:
"""Compose the three-layer system prompt.
Args:
identity_prompt: Layer 1 static agent identity (from GraphSpec).
focus_prompt: Layer 3 per-node focus directive (from NodeSpec.system_prompt).
narrative: Layer 2 auto-generated from conversation state.
Returns:
Composed system prompt with all layers present.
"""
parts: list[str] = []
# Layer 1: Identity (always first, anchors the personality)
if identity_prompt:
parts.append(identity_prompt)
# Layer 2: Narrative (what's happened so far)
if narrative:
parts.append(f"\n--- Context (what has happened so far) ---\n{narrative}")
# Layer 3: Focus (current phase directive)
if focus_prompt:
parts.append(f"\n--- Current Focus ---\n{focus_prompt}")
return "\n".join(parts) if parts else ""
def build_narrative(
memory: SharedMemory,
execution_path: list[str],
graph: GraphSpec,
) -> str:
"""Build Layer 2 (narrative) from structured state.
Deterministic: no LLM call. Reads SharedMemory and the execution path
to describe what has happened so far. Cheap and fast.
Args:
memory: Current shared memory state.
execution_path: List of node IDs visited so far.
graph: Graph spec (for node names/descriptions).
Returns:
Narrative string describing the session state.
"""
parts: list[str] = []
# Describe execution path
if execution_path:
phase_descriptions: list[str] = []
for node_id in execution_path:
node_spec = graph.get_node(node_id)
if node_spec:
phase_descriptions.append(f"- {node_spec.name}: {node_spec.description}")
else:
phase_descriptions.append(f"- {node_id}")
parts.append("Phases completed:\n" + "\n".join(phase_descriptions))
# Describe key memory values (skip very long values)
all_memory = memory.read_all()
if all_memory:
memory_lines: list[str] = []
for key, value in all_memory.items():
if value is None:
continue
val_str = str(value)
if len(val_str) > 200:
val_str = val_str[:200] + "..."
memory_lines.append(f"- {key}: {val_str}")
if memory_lines:
parts.append("Current state:\n" + "\n".join(memory_lines))
return "\n\n".join(parts) if parts else ""
def build_transition_marker(
previous_node: NodeSpec,
next_node: NodeSpec,
memory: SharedMemory,
cumulative_tool_names: list[str],
data_dir: Path | str | None = None,
) -> str:
"""Build a 'State of the World' transition marker.
Inserted into the conversation as a user message at phase boundaries.
Gives the LLM full situational awareness: what happened, what's stored,
what tools are available, and what to focus on next.
Args:
previous_node: NodeSpec of the phase just completed.
next_node: NodeSpec of the phase about to start.
memory: Current shared memory state.
cumulative_tool_names: All tools available (cumulative set).
data_dir: Path to spillover data directory.
Returns:
Transition marker message text.
"""
sections: list[str] = []
# Header
sections.append(f"--- PHASE TRANSITION: {previous_node.name}{next_node.name} ---")
# What just completed
sections.append(f"\nCompleted: {previous_node.name}")
sections.append(f" {previous_node.description}")
# Outputs in memory
all_memory = memory.read_all()
if all_memory:
memory_lines: list[str] = []
for key, value in all_memory.items():
if value is None:
continue
val_str = str(value)
if len(val_str) > 300:
val_str = val_str[:300] + "..."
memory_lines.append(f" {key}: {val_str}")
if memory_lines:
sections.append("\nOutputs available:\n" + "\n".join(memory_lines))
# Files in data directory
if data_dir:
data_path = Path(data_dir)
if data_path.exists():
files = sorted(data_path.iterdir())
if files:
file_lines = [
f" {f.name} ({f.stat().st_size:,} bytes)" for f in files if f.is_file()
]
if file_lines:
sections.append(
"\nData files (use load_data to access):\n" + "\n".join(file_lines)
)
# Available tools
if cumulative_tool_names:
sections.append("\nAvailable tools: " + ", ".join(sorted(cumulative_tool_names)))
# Next phase
sections.append(f"\nNow entering: {next_node.name}")
sections.append(f" {next_node.description}")
# Reflection prompt (engineered metacognition)
sections.append(
"\nBefore proceeding, briefly reflect: what went well in the "
"previous phase? Are there any gaps or surprises worth noting?"
)
sections.append("\n--- END TRANSITION ---")
return "\n".join(sections)
+1 -1
View File
@@ -145,7 +145,7 @@ class SafeEvalVisitor(ast.NodeVisitor):
def visit_Attribute(self, node: ast.Attribute) -> Any:
# value.attr
# STIRCT CHECK: No access to private attributes (starting with _)
# STRICT CHECK: No access to private attributes (starting with _)
if node.attr.startswith("_"):
raise ValueError(f"Access to private attribute '{node.attr}' is not allowed")
-620
View File
@@ -1,620 +0,0 @@
"""
Worker Node for Executing Plan Steps.
The Worker executes individual plan steps by dispatching to the
appropriate executor based on action type:
- LLM calls
- Tool usage
- Sub-graph execution
- Function calls
- Code execution (sandboxed)
"""
import json
import logging
import re
import time
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any
from framework.graph.code_sandbox import CodeSandbox
from framework.graph.plan import (
ActionSpec,
ActionType,
PlanStep,
)
from framework.llm.provider import LLMProvider, Tool
from framework.runtime.core import Runtime
logger = logging.getLogger(__name__)
def parse_llm_json_response(text: str) -> tuple[Any | None, str]:
"""
Parse JSON from LLM response, handling markdown code blocks.
LLMs often return JSON wrapped in markdown code blocks like:
```json
{"key": "value"}
```
This function extracts and parses the JSON.
Args:
text: Raw LLM response text
Returns:
Tuple of (parsed_json_or_None, cleaned_text)
"""
if not isinstance(text, str):
return None, str(text)
cleaned = text.strip()
# Try to extract JSON from markdown code blocks
# Pattern: ```json ... ``` or ``` ... ```
code_block_pattern = r"```(?:json)?\s*([\s\S]*?)\s*```"
matches = re.findall(code_block_pattern, cleaned)
if matches:
# Try to parse each match
for match in matches:
try:
parsed = json.loads(match.strip())
return parsed, match.strip()
except json.JSONDecodeError as e:
logger.debug(
f"Failed to parse JSON from code block: {e}. "
f"Content preview: {match.strip()[:100]}..."
)
continue
# No code blocks or parsing failed - try parsing the whole response
try:
parsed = json.loads(cleaned)
return parsed, cleaned
except json.JSONDecodeError as e:
logger.debug(
f"Failed to parse entire response as JSON: {e}. Content preview: {cleaned[:100]}..."
)
# Try to find JSON-like content (starts with { or [)
json_start_pattern = r"(\{[\s\S]*\}|\[[\s\S]*\])"
json_matches = re.findall(json_start_pattern, cleaned)
for match in json_matches:
try:
parsed = json.loads(match)
return parsed, match
except json.JSONDecodeError as e:
logger.debug(f"Failed to parse JSON pattern: {e}. Content preview: {match[:100]}...")
continue
# Could not parse as JSON - log warning
logger.warning(
f"Could not parse LLM response as JSON after trying all strategies. "
f"Response preview: {cleaned[:200]}..."
)
return None, cleaned
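# A doctest-style sketch of the fallback chain (fenced block first, then
# the whole string, then brace-matching):
#
#     >>> parse_llm_json_response('```json\n{"ok": true}\n```')[0]
#     {'ok': True}
#     >>> parse_llm_json_response("no json here")[0] is None
#     True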
@dataclass
class StepExecutionResult:
"""Result of executing a plan step."""
success: bool
outputs: dict[str, Any] = field(default_factory=dict)
error: str | None = None
error_type: str | None = None # For judge rules: timeout, rate_limit, etc.
# Metadata
tokens_used: int = 0
latency_ms: int = 0
executor_type: str = ""
class WorkerNode:
"""
Executes plan steps by dispatching to appropriate executors.
Usage:
worker = WorkerNode(
runtime=runtime,
llm=llm_provider,
tools=tool_registry,
)
result = await worker.execute(step, context)
"""
def __init__(
self,
runtime: Runtime,
llm: LLMProvider | None = None,
tools: dict[str, Tool] | None = None,
tool_executor: Callable | None = None,
functions: dict[str, Callable] | None = None,
sub_graph_executor: Callable | None = None,
sandbox: CodeSandbox | None = None,
):
"""
Initialize the Worker.
Args:
runtime: Runtime for decision logging
llm: LLM provider for LLM_CALL actions
tools: Available tools for TOOL_USE actions
tool_executor: Function to execute tools
functions: Registered functions for FUNCTION actions
sub_graph_executor: Function to execute sub-graphs
sandbox: Code sandbox for CODE_EXECUTION actions
"""
self.runtime = runtime
self.llm = llm
self.tools = tools or {}
self.tool_executor = tool_executor
self.functions = functions or {}
self.sub_graph_executor = sub_graph_executor
self.sandbox = sandbox or CodeSandbox()
async def execute(
self,
step: PlanStep,
context: dict[str, Any],
) -> StepExecutionResult:
"""
Execute a plan step.
Args:
step: The step to execute
context: Current execution context
Returns:
StepExecutionResult with outputs and status
"""
# Record decision
decision_id = self.runtime.decide(
intent=f"Execute plan step: {step.description}",
options=[
{
"id": step.action.action_type.value,
"description": f"Execute {step.action.action_type.value} action",
"action_type": step.action.action_type.value,
}
],
chosen=step.action.action_type.value,
reasoning=f"Step requires {step.action.action_type.value}",
context={"step_id": step.id, "inputs": step.inputs},
)
start_time = time.time()
try:
# Resolve inputs from context
resolved_inputs = self._resolve_inputs(step.inputs, context)
# Dispatch to appropriate executor
result = await self._dispatch(step.action, resolved_inputs, context)
latency_ms = int((time.time() - start_time) * 1000)
result.latency_ms = latency_ms
# Record outcome
self.runtime.record_outcome(
decision_id=decision_id,
success=result.success,
result=result.outputs if result.success else result.error,
tokens_used=result.tokens_used,
latency_ms=latency_ms,
)
return result
except Exception as e:
latency_ms = int((time.time() - start_time) * 1000)
self.runtime.record_outcome(
decision_id=decision_id,
success=False,
error=str(e),
latency_ms=latency_ms,
)
return StepExecutionResult(
success=False,
error=str(e),
error_type="exception",
latency_ms=latency_ms,
)
def _resolve_inputs(
self,
inputs: dict[str, Any],
context: dict[str, Any],
) -> dict[str, Any]:
"""Resolve input references from context."""
resolved = {}
for key, value in inputs.items():
if isinstance(value, str) and value.startswith("$"):
# Reference to context variable
ref_key = value[1:] # Remove $
resolved[key] = context.get(ref_key, value)
else:
resolved[key] = value
return resolved
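# For example, with context == {"email_sequence": ["intro", "follow-up"]},
# inputs {"sequence": "$email_sequence", "subject": "Hi"} resolve to
# {"sequence": ["intro", "follow-up"], "subject": "Hi"}; a "$ref" with no
# matching context key passes through unchanged as a literal string.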
async def _dispatch(
self,
action: ActionSpec,
inputs: dict[str, Any],
context: dict[str, Any],
) -> StepExecutionResult:
"""Dispatch to appropriate executor based on action type."""
if action.action_type == ActionType.LLM_CALL:
return await self._execute_llm_call(action, inputs, context)
elif action.action_type == ActionType.TOOL_USE:
return await self._execute_tool_use(action, inputs)
elif action.action_type == ActionType.SUB_GRAPH:
return await self._execute_sub_graph(action, inputs, context)
elif action.action_type == ActionType.FUNCTION:
return await self._execute_function(action, inputs)
elif action.action_type == ActionType.CODE_EXECUTION:
return self._execute_code(action, inputs, context)
else:
return StepExecutionResult(
success=False,
error=f"Unknown action type: {action.action_type}",
error_type="invalid_action",
)
async def _execute_llm_call(
self,
action: ActionSpec,
inputs: dict[str, Any],
context: dict[str, Any],
) -> StepExecutionResult:
"""Execute an LLM call action."""
if self.llm is None:
return StepExecutionResult(
success=False,
error="No LLM provider configured",
error_type="configuration",
executor_type="llm_call",
)
try:
# Build prompt with context data
prompt = action.prompt or ""
# First try format placeholders (for prompts like "Hello {name}")
if inputs:
try:
prompt = prompt.format(**inputs)
except (KeyError, ValueError):
pass # Keep original prompt if formatting fails
# Always append context data so LLM can personalize
# This ensures the LLM has access to lead info, company context, etc.
if inputs:
context_section = "\n\n--- Context Data ---\n"
for key, value in inputs.items():
if isinstance(value, dict | list):
context_section += f"{key}: {json.dumps(value, indent=2)}\n"
else:
context_section += f"{key}: {value}\n"
prompt = prompt + context_section
messages = [{"role": "user", "content": prompt}]
response = self.llm.complete(
messages=messages,
system=action.system_prompt,
)
# Try to parse JSON from LLM response
# LLMs often return JSON wrapped in markdown code blocks
parsed_json, _ = parse_llm_json_response(response.content)
# If JSON was parsed successfully, use it as the result
# Otherwise, use the raw text
result_value = parsed_json if parsed_json is not None else response.content
return StepExecutionResult(
success=True,
outputs={
"result": result_value,
"response": response.content, # Always keep raw response
"parsed_json": parsed_json, # Explicit parsed JSON (or None)
},
tokens_used=response.input_tokens + response.output_tokens,
executor_type="llm_call",
)
except Exception as e:
error_type = "rate_limit" if "rate" in str(e).lower() else "llm_error"
return StepExecutionResult(
success=False,
error=str(e),
error_type=error_type,
executor_type="llm_call",
)
async def _execute_tool_use(
self,
action: ActionSpec,
inputs: dict[str, Any],
) -> StepExecutionResult:
"""Execute a tool use action."""
tool_name = action.tool_name
if not tool_name:
return StepExecutionResult(
success=False,
error="No tool name specified",
error_type="invalid_action",
executor_type="tool_use",
)
# Merge action args with inputs
args = {**action.tool_args, **inputs}
# Resolve any $variable references in the merged args
# (tool_args may contain $refs that should be resolved from inputs)
resolved_args = {}
for key, value in args.items():
if isinstance(value, str) and value.startswith("$"):
ref_key = value[1:] # Remove $
resolved_args[key] = args.get(ref_key, value)
else:
resolved_args[key] = value
args = resolved_args
# First, check if we have a registered function with this name
# This allows simpler tool registration without full Tool/ToolExecutor setup
if tool_name in self.functions:
try:
func = self.functions[tool_name]
result = func(**args)
# Handle async functions
if hasattr(result, "__await__"):
result = await result
# If result is already a dict with success/outputs, use it directly
if isinstance(result, dict) and "success" in result:
return StepExecutionResult(
success=result.get("success", False),
outputs=result.get("outputs", {}),
error=result.get("error"),
error_type=result.get("error_type"),
executor_type="tool_use",
)
# Otherwise wrap the result
return StepExecutionResult(
success=True,
outputs={"result": result},
executor_type="tool_use",
)
except Exception as e:
return StepExecutionResult(
success=False,
error=str(e),
error_type="tool_exception",
executor_type="tool_use",
)
# Fall back to formal Tool registry
if tool_name not in self.tools:
return StepExecutionResult(
success=False,
error=f"Tool '{tool_name}' not found",
error_type="missing_tool",
executor_type="tool_use",
)
if self.tool_executor is None:
return StepExecutionResult(
success=False,
error="No tool executor configured",
error_type="configuration",
executor_type="tool_use",
)
try:
# Execute tool via formal executor
from framework.llm.provider import ToolUse
tool_use = ToolUse(
id=f"step_{tool_name}",
name=tool_name,
input=args,
)
result = self.tool_executor(tool_use)
if result.is_error:
return StepExecutionResult(
success=False,
outputs={},
error=result.content,
error_type="tool_error",
executor_type="tool_use",
)
# Parse JSON result and unpack fields into outputs
# Tools return JSON like {"lead_email": "...", "company_name": "..."}
# We want each field as a separate output key
outputs = {"result": result.content}
try:
parsed = json.loads(result.content)
if isinstance(parsed, dict):
# Unpack all fields from the JSON response
outputs.update(parsed)
except (json.JSONDecodeError, TypeError):
pass # Keep result as-is if not valid JSON
return StepExecutionResult(
success=True,
outputs=outputs,
executor_type="tool_use",
)
except Exception as e:
return StepExecutionResult(
success=False,
error=str(e),
error_type="tool_exception",
executor_type="tool_use",
)
async def _execute_sub_graph(
self,
action: ActionSpec,
inputs: dict[str, Any],
context: dict[str, Any],
) -> StepExecutionResult:
"""Execute a sub-graph action."""
if self.sub_graph_executor is None:
return StepExecutionResult(
success=False,
error="No sub-graph executor configured",
error_type="configuration",
executor_type="sub_graph",
)
graph_id = action.graph_id
if not graph_id:
return StepExecutionResult(
success=False,
error="No graph ID specified",
error_type="invalid_action",
executor_type="sub_graph",
)
try:
result = await self.sub_graph_executor(graph_id, inputs, context)
return StepExecutionResult(
success=result.success,
outputs=result.output if result.success else {},
error=result.error if not result.success else None,
tokens_used=result.total_tokens,
executor_type="sub_graph",
)
except Exception as e:
return StepExecutionResult(
success=False,
error=str(e),
error_type="sub_graph_exception",
executor_type="sub_graph",
)
async def _execute_function(
self,
action: ActionSpec,
inputs: dict[str, Any],
) -> StepExecutionResult:
"""Execute a function action."""
func_name = action.function_name
if not func_name:
return StepExecutionResult(
success=False,
error="No function name specified",
error_type="invalid_action",
executor_type="function",
)
if func_name not in self.functions:
return StepExecutionResult(
success=False,
error=f"Function '{func_name}' not registered",
error_type="missing_function",
executor_type="function",
)
try:
func = self.functions[func_name]
# Merge action args with inputs
args = {**action.function_args, **inputs}
# Execute function
result = func(**args)
# Handle async functions
if hasattr(result, "__await__"):
result = await result
return StepExecutionResult(
success=True,
outputs={"result": result},
executor_type="function",
)
except Exception as e:
return StepExecutionResult(
success=False,
error=str(e),
error_type="function_exception",
executor_type="function",
)
def _execute_code(
self,
action: ActionSpec,
inputs: dict[str, Any],
context: dict[str, Any],
) -> StepExecutionResult:
"""Execute a code action in sandbox."""
code = action.code
if not code:
return StepExecutionResult(
success=False,
error="No code specified",
error_type="invalid_action",
executor_type="code_execution",
)
# Merge inputs with context for code
code_inputs = {**context, **inputs}
# Execute in sandbox
sandbox_result = self.sandbox.execute(code, code_inputs)
if sandbox_result.success:
return StepExecutionResult(
success=True,
outputs={
"result": sandbox_result.result,
**sandbox_result.variables,
},
executor_type="code_execution",
latency_ms=sandbox_result.execution_time_ms,
)
else:
error_type = "security" if "Security" in (sandbox_result.error or "") else "code_error"
return StepExecutionResult(
success=False,
error=sandbox_result.error,
error_type=error_type,
executor_type="code_execution",
latency_ms=sandbox_result.execution_time_ms,
)
def register_function(self, name: str, func: Callable) -> None:
"""Register a function for FUNCTION actions."""
self.functions[name] = func
def register_tool(self, tool: Tool) -> None:
"""Register a tool for TOOL_USE actions."""
self.tools[tool.name] = tool
+40
View File
@@ -70,6 +70,7 @@ class AnthropicProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Generate a completion from Claude (via LiteLLM)."""
return self._provider.complete(
@@ -79,6 +80,7 @@ class AnthropicProvider(LLMProvider):
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
max_retries=max_retries,
)
def complete_with_tools(
@@ -97,3 +99,41 @@ class AnthropicProvider(LLMProvider):
tool_executor=tool_executor,
max_iterations=max_iterations,
)
async def acomplete(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Async completion via LiteLLM."""
return await self._provider.acomplete(
messages=messages,
system=system,
tools=tools,
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
max_retries=max_retries,
)
async def acomplete_with_tools(
self,
messages: list[dict[str, Any]],
system: str,
tools: list[Tool],
tool_executor: Callable[[ToolUse], ToolResult],
max_iterations: int = 10,
) -> LLMResponse:
"""Async tool-use loop via LiteLLM."""
return await self._provider.acomplete_with_tools(
messages=messages,
system=system,
tools=tools,
tool_executor=tool_executor,
max_iterations=max_iterations,
)
+383 -16
View File
@@ -30,6 +30,7 @@ logger = logging.getLogger(__name__)
RATE_LIMIT_MAX_RETRIES = 10
RATE_LIMIT_BACKOFF_BASE = 2 # seconds
RATE_LIMIT_MAX_DELAY = 120 # seconds - cap to prevent absurd waits
# Directory for dumping failed requests
FAILED_REQUESTS_DIR = Path.home() / ".hive" / "failed_requests"
@@ -84,6 +85,91 @@ def _dump_failed_request(
return str(filepath)
def _compute_retry_delay(
attempt: int,
exception: BaseException | None = None,
backoff_base: int = RATE_LIMIT_BACKOFF_BASE,
max_delay: int = RATE_LIMIT_MAX_DELAY,
) -> float:
"""Compute retry delay, preferring server-provided Retry-After headers.
Priority:
1. retry-after-ms header (milliseconds, float)
2. retry-after header as seconds (float)
3. retry-after header as HTTP-date (RFC 7231)
4. Exponential backoff: backoff_base * 2^attempt
All values are capped at max_delay seconds.
"""
if exception is not None:
response = getattr(exception, "response", None)
if response is not None:
headers = getattr(response, "headers", None)
if headers is not None:
# Priority 1: retry-after-ms (milliseconds)
retry_after_ms = headers.get("retry-after-ms")
if retry_after_ms is not None:
try:
delay = float(retry_after_ms) / 1000.0
return min(max(delay, 0), max_delay)
except (ValueError, TypeError):
pass
# Priority 2: retry-after (seconds or HTTP-date)
retry_after = headers.get("retry-after")
if retry_after is not None:
# Try as seconds (float)
try:
delay = float(retry_after)
return min(max(delay, 0), max_delay)
except (ValueError, TypeError):
pass
# Try as HTTP-date (e.g., "Fri, 31 Dec 2025 23:59:59 GMT")
try:
from email.utils import parsedate_to_datetime
retry_date = parsedate_to_datetime(retry_after)
now = datetime.now(retry_date.tzinfo)
delay = (retry_date - now).total_seconds()
return min(max(delay, 0), max_delay)
except (ValueError, TypeError, OverflowError):
pass
# Fallback: exponential backoff
delay = backoff_base * (2**attempt)
return min(delay, max_delay)
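# A standalone sketch of the header priority, faking an exception that
# carries an httpx-style response object (the names are illustrative):
#
#     from types import SimpleNamespace
#     exc = SimpleNamespace(response=SimpleNamespace(headers={"retry-after-ms": "250"}))
#     _compute_retry_delay(0, exception=exc)   # -> 0.25, server hint wins
#     _compute_retry_delay(3)                  # -> 16, plain 2 * 2**3 backoff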
def _is_stream_transient_error(exc: BaseException) -> bool:
"""Classify whether a streaming exception is transient (recoverable).
Transient errors (recoverable=True): network issues, server errors, timeouts.
Permanent errors (recoverable=False): auth, bad request, context window, etc.
"""
try:
from litellm.exceptions import (
APIConnectionError,
BadGatewayError,
InternalServerError,
ServiceUnavailableError,
)
transient_types: tuple[type[BaseException], ...] = (
APIConnectionError,
InternalServerError,
BadGatewayError,
ServiceUnavailableError,
TimeoutError,
ConnectionError,
OSError,
)
except ImportError:
transient_types = (TimeoutError, ConnectionError, OSError)
return isinstance(exc, transient_types)
class LiteLLMProvider(LLMProvider):
"""
LiteLLM-based LLM provider for multi-provider support.
@@ -150,10 +236,13 @@ class LiteLLMProvider(LLMProvider):
"LiteLLM is not installed. Please install it with: uv pip install litellm"
)
def _completion_with_rate_limit_retry(self, **kwargs: Any) -> Any:
def _completion_with_rate_limit_retry(
self, max_retries: int | None = None, **kwargs: Any
) -> Any:
"""Call litellm.completion with retry on 429 rate limit errors and empty responses."""
model = kwargs.get("model", self.model)
for attempt in range(RATE_LIMIT_MAX_RETRIES + 1):
retries = max_retries if max_retries is not None else RATE_LIMIT_MAX_RETRIES
for attempt in range(retries + 1):
try:
response = litellm.completion(**kwargs) # type: ignore[union-attr]
@@ -194,22 +283,22 @@ class LiteLLMProvider(LLMProvider):
f"Full request dumped to: {dump_path}"
)
if attempt == RATE_LIMIT_MAX_RETRIES:
if attempt == retries:
logger.error(
f"[retry] GAVE UP on {model} after {RATE_LIMIT_MAX_RETRIES + 1} "
f"[retry] GAVE UP on {model} after {retries + 1} "
f"attempts — empty response "
f"(finish_reason={finish_reason}, "
f"choices={len(response.choices) if response.choices else 0})"
)
return response
wait = RATE_LIMIT_BACKOFF_BASE * (2**attempt)
wait = _compute_retry_delay(attempt)
logger.warning(
f"[retry] {model} returned empty response "
f"(finish_reason={finish_reason}, "
f"choices={len(response.choices) if response.choices else 0}) — "
f"likely rate limited or quota exceeded. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
f"(attempt {attempt + 1}/{retries})"
)
time.sleep(wait)
continue
@@ -225,21 +314,21 @@ class LiteLLMProvider(LLMProvider):
error_type="rate_limit",
attempt=attempt,
)
if attempt == RATE_LIMIT_MAX_RETRIES:
if attempt == retries:
logger.error(
f"[retry] GAVE UP on {model} after {RATE_LIMIT_MAX_RETRIES + 1} "
f"[retry] GAVE UP on {model} after {retries + 1} "
f"attempts — rate limit error: {e!s}. "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}"
)
raise
wait = RATE_LIMIT_BACKOFF_BASE * (2**attempt)
wait = _compute_retry_delay(attempt, exception=e)
logger.warning(
f"[retry] {model} rate limited (429): {e!s}. "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
f"(attempt {attempt + 1}/{retries})"
)
time.sleep(wait)
# unreachable, but satisfies type checker
@@ -253,6 +342,7 @@ class LiteLLMProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Generate a completion using LiteLLM."""
# Prepare messages with system prompt
@@ -293,12 +383,17 @@ class LiteLLMProvider(LLMProvider):
kwargs["response_format"] = response_format
# Make the call
response = self._completion_with_rate_limit_retry(**kwargs)
response = self._completion_with_rate_limit_retry(max_retries=max_retries, **kwargs)
# Extract content
content = response.choices[0].message.content or ""
# Get usage info
# Get usage info.
# NOTE: completion_tokens includes reasoning/thinking tokens for models
# that use them (o1, gpt-5-mini, etc.). LiteLLM does not reliably expose
# usage.completion_tokens_details.reasoning_tokens across all providers.
# This means output_tokens may be inflated for reasoning models.
# Compaction is unaffected — it uses prompt_tokens (input-side only).
usage = response.usage
input_tokens = usage.prompt_tokens if usage else 0
output_tokens = usage.completion_tokens if usage else 0
@@ -433,6 +528,267 @@ class LiteLLMProvider(LLMProvider):
raw_response=None,
)
# ------------------------------------------------------------------
# Async variants — non-blocking on the event loop
# ------------------------------------------------------------------
async def _acompletion_with_rate_limit_retry(
self, max_retries: int | None = None, **kwargs: Any
) -> Any:
"""Async version of _completion_with_rate_limit_retry.
Uses litellm.acompletion and asyncio.sleep instead of blocking calls.
"""
model = kwargs.get("model", self.model)
retries = max_retries if max_retries is not None else RATE_LIMIT_MAX_RETRIES
for attempt in range(retries + 1):
try:
response = await litellm.acompletion(**kwargs) # type: ignore[union-attr]
content = response.choices[0].message.content if response.choices else None
has_tool_calls = bool(response.choices and response.choices[0].message.tool_calls)
if not content and not has_tool_calls:
messages = kwargs.get("messages", [])
last_role = next(
(m["role"] for m in reversed(messages) if m.get("role") != "system"),
None,
)
if last_role == "assistant":
logger.debug(
"[async-retry] Empty response after assistant message — "
"expected, not retrying."
)
return response
finish_reason = (
response.choices[0].finish_reason if response.choices else "unknown"
)
token_count, token_method = _estimate_tokens(model, messages)
dump_path = _dump_failed_request(
model=model,
kwargs=kwargs,
error_type="empty_response",
attempt=attempt,
)
logger.warning(
f"[async-retry] Empty response - {len(messages)} messages, "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}"
)
if attempt == retries:
logger.error(
f"[async-retry] GAVE UP on {model} after {retries + 1} "
f"attempts — empty response "
f"(finish_reason={finish_reason}, "
f"choices={len(response.choices) if response.choices else 0})"
)
return response
wait = _compute_retry_delay(attempt)
logger.warning(
f"[async-retry] {model} returned empty response "
f"(finish_reason={finish_reason}, "
f"choices={len(response.choices) if response.choices else 0}) — "
f"likely rate limited or quota exceeded. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{retries})"
)
await asyncio.sleep(wait)
continue
return response
except RateLimitError as e:
messages = kwargs.get("messages", [])
token_count, token_method = _estimate_tokens(model, messages)
dump_path = _dump_failed_request(
model=model,
kwargs=kwargs,
error_type="rate_limit",
attempt=attempt,
)
if attempt == retries:
logger.error(
f"[async-retry] GAVE UP on {model} after {retries + 1} "
f"attempts — rate limit error: {e!s}. "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}"
)
raise
wait = _compute_retry_delay(attempt, exception=e)
logger.warning(
f"[async-retry] {model} rate limited (429): {e!s}. "
f"~{token_count} tokens ({token_method}). "
f"Full request dumped to: {dump_path}. "
f"Retrying in {wait}s "
f"(attempt {attempt + 1}/{retries})"
)
await asyncio.sleep(wait)
raise RuntimeError("Exhausted rate limit retries")
async def acomplete(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Async version of complete(). Uses litellm.acompletion — non-blocking."""
full_messages: list[dict[str, Any]] = []
if system:
full_messages.append({"role": "system", "content": system})
full_messages.extend(messages)
if json_mode:
json_instruction = "\n\nPlease respond with a valid JSON object."
if full_messages and full_messages[0]["role"] == "system":
full_messages[0]["content"] += json_instruction
else:
full_messages.insert(0, {"role": "system", "content": json_instruction.strip()})
kwargs: dict[str, Any] = {
"model": self.model,
"messages": full_messages,
"max_tokens": max_tokens,
**self.extra_kwargs,
}
if self.api_key:
kwargs["api_key"] = self.api_key
if self.api_base:
kwargs["api_base"] = self.api_base
if tools:
kwargs["tools"] = [self._tool_to_openai_format(t) for t in tools]
if response_format:
kwargs["response_format"] = response_format
response = await self._acompletion_with_rate_limit_retry(max_retries=max_retries, **kwargs)
content = response.choices[0].message.content or ""
usage = response.usage
input_tokens = usage.prompt_tokens if usage else 0
output_tokens = usage.completion_tokens if usage else 0
return LLMResponse(
content=content,
model=response.model or self.model,
input_tokens=input_tokens,
output_tokens=output_tokens,
stop_reason=response.choices[0].finish_reason or "",
raw_response=response,
)
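# A minimal async call sketch; the model string and provider construction
# are assumptions for illustration:
#
#     import asyncio
#     provider = LiteLLMProvider(model="openai/gpt-4o-mini")
#     resp = asyncio.run(provider.acomplete([{"role": "user", "content": "hi"}]))
#     print(resp.content, resp.stop_reason)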
async def acomplete_with_tools(
self,
messages: list[dict[str, Any]],
system: str,
tools: list[Tool],
tool_executor: Callable[[ToolUse], ToolResult],
max_iterations: int = 10,
max_tokens: int = 4096,
) -> LLMResponse:
"""Async version of complete_with_tools(). Uses litellm.acompletion — non-blocking."""
current_messages: list[dict[str, Any]] = []
if system:
current_messages.append({"role": "system", "content": system})
current_messages.extend(messages)
total_input_tokens = 0
total_output_tokens = 0
openai_tools = [self._tool_to_openai_format(t) for t in tools]
for _ in range(max_iterations):
kwargs: dict[str, Any] = {
"model": self.model,
"messages": current_messages,
"max_tokens": max_tokens,
"tools": openai_tools,
**self.extra_kwargs,
}
if self.api_key:
kwargs["api_key"] = self.api_key
if self.api_base:
kwargs["api_base"] = self.api_base
response = await self._acompletion_with_rate_limit_retry(**kwargs)
usage = response.usage
if usage:
total_input_tokens += usage.prompt_tokens
total_output_tokens += usage.completion_tokens
choice = response.choices[0]
message = choice.message
if choice.finish_reason == "stop" or not message.tool_calls:
return LLMResponse(
content=message.content or "",
model=response.model or self.model,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
stop_reason=choice.finish_reason or "stop",
raw_response=response,
)
current_messages.append(
{
"role": "assistant",
"content": message.content,
"tool_calls": [
{
"id": tc.id,
"type": "function",
"function": {
"name": tc.function.name,
"arguments": tc.function.arguments,
},
}
for tc in message.tool_calls
],
}
)
for tool_call in message.tool_calls:
try:
args = json.loads(tool_call.function.arguments)
except json.JSONDecodeError:
current_messages.append(
{
"role": "tool",
"tool_call_id": tool_call.id,
"content": "Invalid JSON arguments provided to tool.",
}
)
continue
tool_use = ToolUse(
id=tool_call.id,
name=tool_call.function.name,
input=args,
)
result = tool_executor(tool_use)
current_messages.append(
{
"role": "tool",
"tool_call_id": result.tool_use_id,
"content": result.content,
}
)
return LLMResponse(
content="Max tool iterations reached",
model=self.model,
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
stop_reason="max_iterations",
raw_response=None,
)
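The loop above only assumes that the executor maps a ToolUse to an object exposing tool_use_id and content; malformed JSON arguments never reach it, since the loop replies with an error tool message instead. A toy executor showing that contract (the framework's real Tool/ToolResult definitions are not shown in this diff, so the result type here is a stand-in):

from dataclasses import dataclass

@dataclass
class FakeToolResult:  # stand-in for the framework's ToolResult
    tool_use_id: str
    content: str

def tool_executor(tool_use) -> FakeToolResult:
    # tool_use carries .id, .name, .input, per the loop above
    if tool_use.name == "add":
        total = tool_use.input["a"] + tool_use.input["b"]
        return FakeToolResult(tool_use_id=tool_use.id, content=str(total))
    return FakeToolResult(tool_use_id=tool_use.id, content=f"Unknown tool: {tool_use.name}")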
def _tool_to_openai_format(self, tool: Tool) -> dict[str, Any]:
"""Convert Tool to OpenAI function calling format."""
return {
@@ -591,7 +947,7 @@ class LiteLLMProvider(LLMProvider):
for event in tail_events:
yield event
return
- wait = RATE_LIMIT_BACKOFF_BASE * (2**attempt)
+ wait = _compute_retry_delay(attempt)
token_count, token_method = _estimate_tokens(
self.model,
full_messages,
@@ -619,10 +975,10 @@ class LiteLLMProvider(LLMProvider):
except RateLimitError as e:
if attempt < RATE_LIMIT_MAX_RETRIES:
- wait = RATE_LIMIT_BACKOFF_BASE * (2**attempt)
+ wait = _compute_retry_delay(attempt, exception=e)
logger.warning(
f"[stream-retry] {self.model} rate limited (429): {e!s}. "
f"Retrying in {wait}s "
f"Retrying in {wait:.1f}s "
f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
)
await asyncio.sleep(wait)
@@ -631,5 +987,16 @@ class LiteLLMProvider(LLMProvider):
return
except Exception as e:
- yield StreamErrorEvent(error=str(e), recoverable=False)
+ if _is_stream_transient_error(e) and attempt < RATE_LIMIT_MAX_RETRIES:
+ wait = _compute_retry_delay(attempt, exception=e)
+ logger.warning(
+ f"[stream-retry] {self.model} transient error "
+ f"({type(e).__name__}): {e!s}. "
+ f"Retrying in {wait:.1f}s "
+ f"(attempt {attempt + 1}/{RATE_LIMIT_MAX_RETRIES})"
+ )
+ await asyncio.sleep(wait)
+ continue
+ recoverable = _is_stream_transient_error(e)
+ yield StreamErrorEvent(error=str(e), recoverable=recoverable)
+ return
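A hedged sketch of what a predicate like _is_stream_transient_error typically covers; its real body is not part of this diff, so the error classes and status codes below are assumptions:

import asyncio

TRANSIENT_TYPES = (TimeoutError, asyncio.TimeoutError, ConnectionError)

def is_transient(exc: Exception) -> bool:
    if isinstance(exc, TRANSIENT_TYPES):
        return True
    # Provider SDK exceptions often attach an HTTP status code.
    status = getattr(exc, "status_code", None)
    return status in (408, 429, 500, 502, 503, 504)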
+39
@@ -120,6 +120,7 @@ class MockLLMProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""
Generate a mock completion without calling a real LLM.
@@ -182,6 +183,44 @@ class MockLLMProvider(LLMProvider):
stop_reason="mock_complete",
)
async def acomplete(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""Async mock completion (no I/O, returns immediately)."""
return self.complete(
messages=messages,
system=system,
tools=tools,
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
max_retries=max_retries,
)
async def acomplete_with_tools(
self,
messages: list[dict[str, Any]],
system: str,
tools: list[Tool],
tool_executor: Callable[[ToolUse], ToolResult],
max_iterations: int = 10,
) -> LLMResponse:
"""Async mock tool-use completion (no I/O, returns immediately)."""
return self.complete_with_tools(
messages=messages,
system=system,
tools=tools,
tool_executor=tool_executor,
max_iterations=max_iterations,
)
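Because these async wrappers do no I/O, they can be awaited directly in unit tests without a running event loop fixture. A sketch, assuming the import path and default construction (neither is shown in this diff):

import asyncio

from framework.llm.mock import MockLLMProvider  # import path is an assumption

def test_mock_acomplete():
    provider = MockLLMProvider()  # assumes defaults are enough to construct
    response = asyncio.run(
        provider.acomplete(messages=[{"role": "user", "content": "hi"}])
    )
    assert response.stop_reason == "mock_complete"  # per the mock's complete() above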
async def stream(
self,
messages: list[dict[str, Any]],
+62 -1
@@ -1,8 +1,10 @@
"""LLM Provider abstraction for pluggable LLM backends."""
import asyncio
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator, Callable
from dataclasses import dataclass, field
from functools import partial
from typing import Any
@@ -65,6 +67,7 @@ class LLMProvider(ABC):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
"""
Generate a completion from the LLM.
@@ -79,6 +82,8 @@ class LLMProvider(ABC):
- {"type": "json_schema", "json_schema": {"name": "...", "schema": {...}}}
for strict JSON schema enforcement
json_mode: If True, request structured JSON output from the LLM
max_retries: Override retry count for rate-limit/empty-response retries.
None uses the provider default.
Returns:
LLMResponse with content and metadata
@@ -109,6 +114,62 @@ class LLMProvider(ABC):
"""
pass
async def acomplete(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list["Tool"] | None = None,
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> "LLMResponse":
"""Async version of complete(). Non-blocking on the event loop.
Default implementation offloads the sync complete() to a thread pool.
Subclasses SHOULD override for native async I/O.
"""
loop = asyncio.get_running_loop()
return await loop.run_in_executor(
None,
partial(
self.complete,
messages=messages,
system=system,
tools=tools,
max_tokens=max_tokens,
response_format=response_format,
json_mode=json_mode,
max_retries=max_retries,
),
)
async def acomplete_with_tools(
self,
messages: list[dict[str, Any]],
system: str,
tools: list["Tool"],
tool_executor: Callable[["ToolUse"], "ToolResult"],
max_iterations: int = 10,
) -> "LLMResponse":
"""Async version of complete_with_tools(). Non-blocking on the event loop.
Default implementation offloads the sync complete_with_tools() to a thread pool.
Subclasses SHOULD override for native async I/O.
"""
loop = asyncio.get_running_loop()
return await loop.run_in_executor(
None,
partial(
self.complete_with_tools,
messages=messages,
system=system,
tools=tools,
tool_executor=tool_executor,
max_iterations=max_iterations,
),
)
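The offload pattern used by both defaults, in isolation. run_in_executor accepts only positional arguments, which is why the code wraps the call in functools.partial:

import asyncio
from functools import partial

def blocking_call(x: int, y: int) -> int:
    return x + y  # stand-in for a synchronous LLM request

async def main() -> None:
    loop = asyncio.get_running_loop()
    # None selects the default ThreadPoolExecutor; partial binds the kwargs.
    result = await loop.run_in_executor(None, partial(blocking_call, 2, y=3))
    assert result == 5

asyncio.run(main())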
async def stream(
self,
messages: list[dict[str, Any]],
@@ -132,7 +193,7 @@ class LLMProvider(ABC):
TextEndEvent,
)
- response = self.complete(
+ response = await self.acomplete(
messages=messages,
system=system,
tools=tools,
+44 -609
@@ -36,12 +36,7 @@ from framework.graph import ( # noqa: E402
NodeSpec,
SuccessCriterion,
)
- from framework.graph.plan import Plan # noqa: E402
- # Testing framework imports
- from framework.testing.prompts import ( # noqa: E402
- PYTEST_TEST_FILE_HEADER,
- )
+ from framework.testing.prompts import PYTEST_TEST_FILE_HEADER # noqa: E402
from framework.utils.io import atomic_write # noqa: E402
# Initialize MCP server
@@ -587,13 +582,12 @@ def add_node(
description: Annotated[str, "What this node does"],
node_type: Annotated[
str,
"Type: event_loop (recommended), function, router. "
"Deprecated: llm_generate, llm_tool_use (use event_loop instead)",
"Type: event_loop (recommended), router.",
],
input_keys: Annotated[str, "JSON array of keys this node reads from shared memory"],
output_keys: Annotated[str, "JSON array of keys this node writes to shared memory"],
system_prompt: Annotated[str, "Instructions for LLM nodes"] = "",
- tools: Annotated[str, "JSON array of tool names for event_loop or llm_tool_use nodes"] = "[]",
+ tools: Annotated[str, "JSON array of tool names for event_loop nodes"] = "[]",
routes: Annotated[
str, "JSON object mapping conditions to target node IDs for router nodes"
] = "{}",
@@ -665,24 +659,18 @@ def add_node(
errors.append("Node must have an id")
if not name:
errors.append("Node must have a name")
if node_type == "llm_tool_use" and not tools_list:
errors.append(f"Node '{node_id}' of type llm_tool_use must specify tools")
# Reject removed node types
if node_type in ("function", "llm_tool_use", "llm_generate"):
errors.append(f"Node type '{node_type}' is no longer supported. Use 'event_loop' instead.")
if node_type == "router" and not routes_dict:
errors.append(f"Router node '{node_id}' must specify routes")
if node_type in ("llm_generate", "llm_tool_use") and not system_prompt:
warnings.append(f"LLM node '{node_id}' should have a system_prompt")
# EventLoopNode validation
if node_type == "event_loop" and not system_prompt:
warnings.append(f"Event loop node '{node_id}' should have a system_prompt")
# Deprecated type warnings
if node_type in ("llm_generate", "llm_tool_use"):
warnings.append(
f"Node type '{node_type}' is deprecated. Use 'event_loop' instead. "
"EventLoopNode supports tool use, streaming, and judge-based evaluation."
)
# Warn about client_facing on nodes with tools (likely autonomous work)
if node_type == "event_loop" and client_facing and tools_list:
warnings.append(
@@ -838,8 +826,7 @@ def update_node(
description: Annotated[str, "Updated description"] = "",
node_type: Annotated[
str,
"Updated type: event_loop (recommended), function, router. "
"Deprecated: llm_generate, llm_tool_use",
"Updated type: event_loop (recommended), router.",
] = "",
input_keys: Annotated[str, "Updated JSON array of input keys"] = "",
output_keys: Annotated[str, "Updated JSON array of output keys"] = "",
@@ -919,24 +906,19 @@ def update_node(
errors = []
warnings = []
if node.node_type == "llm_tool_use" and not node.tools:
errors.append(f"Node '{node_id}' of type llm_tool_use must specify tools")
# Reject removed node types
if node.node_type in ("function", "llm_tool_use", "llm_generate"):
errors.append(
f"Node type '{node.node_type}' is no longer supported. Use 'event_loop' instead."
)
if node.node_type == "router" and not node.routes:
errors.append(f"Router node '{node_id}' must specify routes")
if node.node_type in ("llm_generate", "llm_tool_use") and not node.system_prompt:
warnings.append(f"LLM node '{node_id}' should have a system_prompt")
# EventLoopNode validation
if node.node_type == "event_loop" and not node.system_prompt:
warnings.append(f"Event loop node '{node_id}' should have a system_prompt")
# Deprecated type warnings
if node.node_type in ("llm_generate", "llm_tool_use"):
warnings.append(
f"Node type '{node.node_type}' is deprecated. Use 'event_loop' instead. "
"EventLoopNode supports tool use, streaming, and judge-based evaluation."
)
# nullable_output_keys must be a subset of output_keys
if node.nullable_output_keys:
invalid_nullable = [k for k in node.nullable_output_keys if k not in node.output_keys]
@@ -1112,11 +1094,11 @@ def validate_graph() -> str:
if entry_candidates:
reachable = set()
# For pause/resume agents, start from ALL entry points (including resume)
- if is_pause_resume_agent:
- to_visit = list(entry_candidates) # All nodes without incoming edges
- else:
- to_visit = [entry_candidates[0]] # Just the primary entry
+ # Start from ALL entry candidates (nodes without incoming edges).
+ # This handles both pause/resume agents and async entry point agents
+ # where multiple nodes have no incoming edges (e.g., a primary entry
+ # node and an event-driven entry node).
+ to_visit = list(entry_candidates)
while to_visit:
current = to_visit.pop()
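The reachability walk above, reduced to a standalone function over a toy adjacency map (the session's real edge objects are richer than plain string lists):

def reachable_nodes(edges: dict[str, list[str]], entry_candidates: list[str]) -> set[str]:
    reachable: set[str] = set()
    to_visit = list(entry_candidates)  # start from ALL entry candidates
    while to_visit:
        current = to_visit.pop()
        if current in reachable:
            continue
        reachable.add(current)
        to_visit.extend(edges.get(current, []))
    return reachable

# Two entry candidates, one with a downstream node: all three are reachable.
assert reachable_nodes({"a": ["b"], "x": []}, ["a", "x"]) == {"a", "b", "x"}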
@@ -1390,17 +1372,6 @@ def validate_graph() -> str:
f"must be a subset of output_keys {node.output_keys}"
)
# Deprecated node type warnings
deprecated_nodes = [
{"node_id": n.id, "type": n.node_type, "replacement": "event_loop"}
for n in session.nodes
if n.node_type in ("llm_generate", "llm_tool_use")
]
for dn in deprecated_nodes:
warnings.append(
f"Node '{dn['node_id']}' uses deprecated type '{dn['type']}'. Use 'event_loop' instead."
)
# Warn if all event_loop nodes are client_facing (common misconfiguration)
el_nodes = [n for n in session.nodes if n.node_type == "event_loop"]
cf_el_nodes = [n for n in el_nodes if n.client_facing]
@@ -1436,7 +1407,6 @@ def validate_graph() -> str:
"event_loop_nodes": event_loop_nodes,
"client_facing_nodes": client_facing_nodes,
"feedback_edges": feedback_edges,
"deprecated_node_types": deprecated_nodes,
}
)
@@ -1646,9 +1616,8 @@ def export_graph() -> str:
"""
Export the validated graph as a GraphSpec for GraphExecutor.
- Exports the complete agent definition including nodes, edges, goal,
- and evaluation rules. The GraphExecutor runs the graph with dynamic
- edge traversal and routing logic.
+ Exports the complete agent definition including nodes, edges, and goal.
+ The GraphExecutor runs the graph with dynamic edge traversal and routing logic.
AUTOMATICALLY WRITES FILES TO DISK:
- exports/{agent-name}/agent.json - Full agent specification
@@ -1856,7 +1825,6 @@ def export_graph() -> str:
"files_written": files_written,
"graph": graph_spec,
"goal": session.goal.model_dump(),
"evaluation_rules": _evaluation_rules,
"required_tools": list(all_tools),
"node_count": len(session.nodes),
"edge_count": len(edges_list),
@@ -1966,9 +1934,6 @@ def get_session_status() -> str:
"mcp_servers": [s["name"] for s in session.mcp_servers],
"event_loop_nodes": [n.id for n in session.nodes if n.node_type == "event_loop"],
"client_facing_nodes": [n.id for n in session.nodes if n.client_facing],
"deprecated_nodes": [
n.id for n in session.nodes if n.node_type in ("llm_generate", "llm_tool_use")
],
"feedback_edges": [e.id for e in session.edges if e.priority < 0],
}
)
@@ -2139,7 +2104,7 @@ def add_mcp_server(
"total_mcp_servers": len(session.mcp_servers),
"note": (
f"MCP server '{name}' registered with {len(tool_names)} tools. "
"These tools can now be used in llm_tool_use nodes."
"These tools can now be used in event_loop nodes."
),
},
indent=2,
@@ -2240,7 +2205,7 @@ def list_mcp_tools(
"success": True,
"tools_by_server": all_tools,
"total_tools": total_tools,
"note": "Use these tool names in the 'tools' parameter when adding llm_tool_use nodes",
"note": "Use these tool names in the 'tools' parameter when adding event_loop nodes",
},
indent=2,
)
@@ -2339,23 +2304,6 @@ def test_node(
+ f"Max visits per graph run: {node_spec.max_node_visits}."
)
elif node_spec.node_type in ("llm_generate", "llm_tool_use"):
# Legacy LLM node types
result["system_prompt"] = node_spec.system_prompt
result["available_tools"] = node_spec.tools
result["deprecation_warning"] = (
f"Node type '{node_spec.node_type}' is deprecated. Use 'event_loop' instead."
)
if mock_llm_response:
result["mock_response"] = mock_llm_response
result["simulation"] = "LLM would receive prompt and produce response"
else:
result["simulation"] = "LLM would be called with the system prompt and input data"
elif node_spec.node_type == "function":
result["simulation"] = "Function node would execute deterministic logic"
# Show memory state after (simulated)
result["expected_memory_state"] = {
"inputs_available": {k: input_data.get(k, "<not provided>") for k in node_spec.input_keys},
@@ -2449,7 +2397,7 @@ def test_graph(
"writes": current_node.output_keys,
}
- if current_node.node_type in ("llm_generate", "llm_tool_use", "event_loop"):
+ if current_node.node_type == "event_loop":
step_info["prompt_preview"] = (
current_node.system_prompt[:200] + "..."
if current_node.system_prompt and len(current_node.system_prompt) > 200
@@ -2520,466 +2468,6 @@ def test_graph(
)
# =============================================================================
# FLEXIBLE EXECUTION TOOLS (Worker-Judge Pattern)
# =============================================================================
# Storage for evaluation rules
_evaluation_rules: list[dict] = []
@mcp.tool()
def add_evaluation_rule(
rule_id: Annotated[str, "Unique identifier for the rule"],
description: Annotated[str, "Human-readable description of what this rule checks"],
condition: Annotated[
str,
"Python expression with result, step, goal context. E.g., 'result.get(\"success\")'",
],
action: Annotated[str, "Action when rule matches: accept, retry, replan, escalate"],
feedback_template: Annotated[
str, "Template for feedback message, can use {result}, {step}"
] = "",
priority: Annotated[int, "Rule priority (higher = checked first)"] = 0,
) -> str:
"""
Add an evaluation rule for the HybridJudge.
Rules are checked in priority order before falling back to LLM evaluation.
Use this to define deterministic success/failure conditions.
Example conditions:
- 'result.get("success") == True' - Check for explicit success flag
- 'result.get("error_type") == "timeout"' - Check for specific error type
- 'len(result.get("data", [])) > 0' - Check for non-empty data
"""
global _evaluation_rules
# Validate action
valid_actions = ["accept", "retry", "replan", "escalate"]
if action.lower() not in valid_actions:
return json.dumps(
{
"success": False,
"error": f"Invalid action '{action}'. Must be one of: {valid_actions}",
}
)
# Check for duplicate
if any(r["id"] == rule_id for r in _evaluation_rules):
return json.dumps(
{
"success": False,
"error": f"Rule '{rule_id}' already exists",
}
)
rule = {
"id": rule_id,
"description": description,
"condition": condition,
"action": action.lower(),
"feedback_template": feedback_template,
"priority": priority,
}
_evaluation_rules.append(rule)
_evaluation_rules.sort(key=lambda r: -r["priority"])
return json.dumps(
{
"success": True,
"rule": rule,
"total_rules": len(_evaluation_rules),
}
)
@mcp.tool()
def list_evaluation_rules() -> str:
"""List all configured evaluation rules for the HybridJudge."""
return json.dumps(
{
"rules": _evaluation_rules,
"total": len(_evaluation_rules),
}
)
@mcp.tool()
def remove_evaluation_rule(
rule_id: Annotated[str, "ID of the rule to remove"],
) -> str:
"""Remove an evaluation rule."""
global _evaluation_rules
for i, rule in enumerate(_evaluation_rules):
if rule["id"] == rule_id:
_evaluation_rules.pop(i)
return json.dumps({"success": True, "removed": rule_id})
return json.dumps({"success": False, "error": f"Rule '{rule_id}' not found"})
@mcp.tool()
def create_plan(
plan_id: Annotated[str, "Unique identifier for the plan"],
goal_id: Annotated[str, "ID of the goal this plan achieves"],
description: Annotated[str, "Description of what this plan does"],
steps: Annotated[
str,
"JSON array of plan steps with id, description, action, inputs, outputs, deps",
],
context: Annotated[str, "JSON object with initial context for execution"] = "{}",
) -> str:
"""
Create a plan for flexible execution.
Plans are executed by the Worker-Judge loop. Each step specifies:
- id: Unique step identifier
- description: What this step does
- action: Object with action_type and parameters
- action_type: "llm_call", "tool_use", "function", "code_execution", "sub_graph"
- For llm_call: prompt, system_prompt
- For tool_use: tool_name, tool_args
- For function: function_name, function_args
- For code_execution: code
- inputs: Dict mapping input names to values or "$variable" references
- expected_outputs: List of output keys this step should produce
- dependencies: List of step IDs that must complete first (deps)
Example step:
{
"id": "step_1",
"description": "Fetch user data",
"action": {"action_type": "tool_use", "tool_name": "get_user", ...},
"inputs": {"user_id": "$input_user_id"},
"expected_outputs": ["user_data"],
"dependencies": []
}
"""
try:
steps_list = json.loads(steps)
context_dict = json.loads(context)
except json.JSONDecodeError as e:
return json.dumps({"success": False, "error": f"Invalid JSON: {e}"})
# Validate steps
errors = []
step_ids = set()
for i, step in enumerate(steps_list):
if "id" not in step:
errors.append(f"Step {i} missing 'id'")
else:
if step["id"] in step_ids:
errors.append(f"Duplicate step id: {step['id']}")
step_ids.add(step["id"])
if "description" not in step:
errors.append(f"Step {i} missing 'description'")
if "action" not in step:
errors.append(f"Step {i} missing 'action'")
elif "action_type" not in step.get("action", {}):
errors.append(f"Step {i} action missing 'action_type'")
# Check dependencies exist
for dep in step.get("dependencies", []):
if dep not in step_ids:
errors.append(f"Step {step.get('id', i)} has unknown dependency: {dep}")
if errors:
return json.dumps({"success": False, "errors": errors})
# Build plan object
plan = {
"id": plan_id,
"goal_id": goal_id,
"description": description,
"steps": steps_list,
"context": context_dict,
"revision": 1,
"created_at": datetime.now().isoformat(),
}
return json.dumps(
{
"success": True,
"plan": plan,
"step_count": len(steps_list),
"note": "Plan created. Use execute_plan to run it with the Worker-Judge loop.",
},
indent=2,
)
@mcp.tool()
def validate_plan(
plan_json: Annotated[str, "JSON string of the plan to validate"],
) -> str:
"""
Validate a plan structure before execution.
Checks:
- All required fields present
- No circular dependencies
- All dependencies reference existing steps
- Action types are valid
- Context flow: all $variable references can be resolved
"""
try:
plan = json.loads(plan_json)
except json.JSONDecodeError as e:
return json.dumps({"valid": False, "errors": [f"Invalid JSON: {e}"]})
errors = []
warnings = []
# Check required fields
required = ["id", "goal_id", "steps"]
for field in required:
if field not in plan:
errors.append(f"Missing required field: {field}")
if "steps" not in plan:
return json.dumps({"valid": False, "errors": errors})
steps = plan["steps"]
step_ids = {s.get("id") for s in steps if "id" in s}
steps_by_id = {s.get("id"): s for s in steps}
# Check each step
valid_action_types = ["llm_call", "tool_use", "function", "code_execution", "sub_graph"]
for i, step in enumerate(steps):
step_id = step.get("id", f"step_{i}")
# Check dependencies
for dep in step.get("dependencies", []):
if dep not in step_ids:
errors.append(f"Step '{step_id}': unknown dependency '{dep}'")
# Check action type
action = step.get("action", {})
action_type = action.get("action_type")
if action_type and action_type not in valid_action_types:
errors.append(f"Step '{step_id}': invalid action_type '{action_type}'")
# Check action has required params
if action_type == "llm_call" and not action.get("prompt"):
warnings.append(f"Step '{step_id}': llm_call without prompt")
if action_type == "tool_use" and not action.get("tool_name"):
errors.append(f"Step '{step_id}': tool_use requires tool_name")
if action_type == "code_execution" and not action.get("code"):
errors.append(f"Step '{step_id}': code_execution requires code")
# Check for circular dependencies
def has_cycle(step_id: str, visited: set, path: set) -> bool:
if step_id in path:
return True
if step_id in visited:
return False
visited.add(step_id)
path.add(step_id)
step = next((s for s in steps if s.get("id") == step_id), None)
if step:
for dep in step.get("dependencies", []):
if has_cycle(dep, visited, path):
return True
path.remove(step_id)
return False
for step in steps:
if has_cycle(step.get("id", ""), set(), set()):
errors.append(f"Circular dependency detected involving step '{step.get('id')}'")
break
# === CONTEXT FLOW VALIDATION ===
# Compute what keys each step can access (from dependencies' outputs)
# Build output map (step_id -> expected_outputs)
step_outputs: dict[str, set[str]] = {}
for step in steps:
step_outputs[step.get("id", "")] = set(step.get("expected_outputs", []))
# Compute available context for each step in topological order
available_context: dict[str, set[str]] = {}
computed = set()
remaining = set(step_ids)
# Get initial context keys from plan.context
initial_context = set(plan.get("context", {}).keys())
for _ in range(len(steps) * 2):
if not remaining:
break
for step_id in list(remaining):
step = steps_by_id.get(step_id)
if not step:
remaining.discard(step_id)
continue
deps = step.get("dependencies", [])
# Can compute if all dependencies are computed
if all(d in computed for d in deps):
# Collect outputs from all dependencies (transitive)
available = set(initial_context)
for dep_id in deps:
available.update(step_outputs.get(dep_id, set()))
available.update(available_context.get(dep_id, set()))
available_context[step_id] = available
computed.add(step_id)
remaining.discard(step_id)
break
# Check each step's inputs can be resolved
context_errors = []
context_warnings = []
for step in steps:
step_id = step.get("id", "")
available = available_context.get(step_id, set())
deps = step.get("dependencies", [])
inputs = step.get("inputs", {})
missing_vars = []
for _, input_value in inputs.items():
# Check $variable references
if isinstance(input_value, str) and input_value.startswith("$"):
var_name = input_value[1:] # Remove $ prefix
if var_name not in available:
missing_vars.append(var_name)
if missing_vars:
if not deps:
# Entry step - inputs must come from initial context
context_warnings.append(
f"Step '{step_id}' requires ${missing_vars} from initial context. "
f"Ensure these are provided when running the agent: {missing_vars}"
)
else:
# Find which step could provide each missing var
suggestions = []
for var in missing_vars:
producers = [s.get("id") for s in steps if var in s.get("expected_outputs", [])]
if producers:
suggestions.append(f"${var} is produced by {producers} - add as dependency")
else:
suggestions.append(
f"${var} is not produced by any step - add a step that outputs '{var}'"
)
context_errors.append(
f"Step '{step_id}' references ${missing_vars} but deps "
f"{deps} don't provide them. Suggestions: {'; '.join(suggestions)}"
)
errors.extend(context_errors)
warnings.extend(context_warnings)
return json.dumps(
{
"valid": len(errors) == 0,
"errors": errors,
"warnings": warnings,
"step_count": len(steps),
"context_flow": {step_id: list(keys) for step_id, keys in available_context.items()}
if available_context
else None,
}
)
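A usage sketch for validate_plan: a two-step plan in which step_2 consumes step_1's declared output through a $variable reference, so context-flow validation passes. The step contents are illustrative.

import json

plan = {
    "id": "demo",
    "goal_id": "g1",
    "steps": [
        {
            "id": "step_1",
            "description": "Fetch user data",
            "action": {"action_type": "tool_use", "tool_name": "get_user"},
            "inputs": {},
            "expected_outputs": ["user_data"],
            "dependencies": [],
        },
        {
            "id": "step_2",
            "description": "Summarize it",
            "action": {"action_type": "llm_call", "prompt": "Summarize the data"},
            "inputs": {"data": "$user_data"},
            "expected_outputs": ["summary"],
            "dependencies": ["step_1"],
        },
    ],
}
report = json.loads(validate_plan(json.dumps(plan)))
assert report["valid"], report["errors"]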
@mcp.tool()
def simulate_plan_execution(
plan_json: Annotated[str, "JSON string of the plan to simulate"],
max_steps: Annotated[int, "Maximum steps to simulate"] = 20,
) -> str:
"""
Simulate plan execution without actually running it.
Shows the order steps would execute based on dependencies.
Useful for understanding the execution flow before running.
"""
try:
plan = json.loads(plan_json)
except json.JSONDecodeError as e:
return json.dumps({"success": False, "error": f"Invalid JSON: {e}"})
# Validate first
validation = json.loads(validate_plan(plan_json))
if not validation["valid"]:
return json.dumps(
{
"success": False,
"error": "Plan is not valid",
"validation_errors": validation["errors"],
}
)
steps = plan.get("steps", [])
completed = set()
execution_order = []
iteration = 0
while len(completed) < len(steps) and iteration < max_steps:
iteration += 1
# Find ready steps
ready = []
for step in steps:
step_id = step.get("id")
if step_id in completed:
continue
deps = set(step.get("dependencies", []))
if deps.issubset(completed):
ready.append(step)
if not ready:
break
# Execute first ready step (in real execution, could be parallel)
step = ready[0]
step_id = step.get("id")
execution_order.append(
{
"iteration": iteration,
"step_id": step_id,
"description": step.get("description"),
"action_type": step.get("action", {}).get("action_type"),
"dependencies_met": list(step.get("dependencies", [])),
"parallel_candidates": [s.get("id") for s in ready[1:]],
}
)
completed.add(step_id)
remaining = [s.get("id") for s in steps if s.get("id") not in completed]
return json.dumps(
{
"success": True,
"execution_order": execution_order,
"steps_simulated": len(execution_order),
"remaining_steps": remaining,
"plan_complete": len(remaining) == 0,
"note": (
"This is a simulation. Actual execution may differ "
"based on step results and judge decisions."
),
},
indent=2,
)
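Continuing the validate_plan sketch above: simulating the same two-step plan yields a deterministic order, since step_2 only becomes ready once step_1 completes.

sim = json.loads(simulate_plan_execution(json.dumps(plan)))
assert [s["step_id"] for s in sim["execution_order"]] == ["step_1", "step_2"]
assert sim["plan_complete"] is True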
# =============================================================================
# TESTING TOOLS (Goal-Based Evaluation)
# =============================================================================
@@ -3713,90 +3201,37 @@ def list_tests(
)
# =============================================================================
# PLAN LOADING AND EXECUTION
# =============================================================================
def load_plan_from_json(plan_json: str | dict) -> Plan:
"""
Load a Plan object from exported JSON.
Args:
plan_json: JSON string or dict from export_graph()
Returns:
Plan object ready for FlexibleGraphExecutor
"""
from framework.graph.plan import Plan
return Plan.from_json(plan_json)
@mcp.tool()
def load_exported_plan(
plan_json: Annotated[str, "JSON string from export_graph() output"],
) -> str:
"""
Validate and load an exported plan, returning its structure.
Use this to verify a plan can be loaded before execution.
"""
try:
plan = load_plan_from_json(plan_json)
return json.dumps(
{
"success": True,
"plan_id": plan.id,
"goal_id": plan.goal_id,
"description": plan.description,
"step_count": len(plan.steps),
"steps": [
{
"id": s.id,
"description": s.description,
"action_type": s.action.action_type.value,
"dependencies": s.dependencies,
}
for s in plan.steps
],
},
indent=2,
)
except Exception as e:
return json.dumps({"success": False, "error": str(e)})
# =============================================================================
# CREDENTIAL STORE TOOLS
# =============================================================================
def _get_credential_store():
"""Get a CredentialStore that checks encrypted files and env vars.
"""Get a CredentialStore that checks encrypted files, env vars, and Aden sync.
Uses CompositeStorage: encrypted file storage (primary) with env var fallback.
This ensures credentials stored via `store_credential` AND env vars are both found.
Uses CredentialStoreAdapter.default() which handles:
- Aden sync + provider index (resolving hashed IDs for OAuth)
- CompositeStorage (encrypted primary + env fallback)
- Auto-refresh of OAuth tokens
- Graceful fallback if Aden is unavailable
"""
from framework.credentials import CredentialStore
from framework.credentials.storage import CompositeStorage, EncryptedFileStorage, EnvVarStorage
# Build env var mapping from CREDENTIAL_SPECS for the fallback
env_mapping: dict[str, str] = {}
try:
from aden_tools.credentials import CREDENTIAL_SPECS
from aden_tools.credentials.store_adapter import CredentialStoreAdapter
for name, spec in CREDENTIAL_SPECS.items():
cred_id = spec.credential_id or name
env_mapping[cred_id] = spec.env_var
return CredentialStoreAdapter.default().store
except ImportError:
pass
from framework.credentials import CredentialStore
from framework.credentials.storage import (
CompositeStorage,
EncryptedFileStorage,
EnvVarStorage,
)
storage = CompositeStorage(
primary=EncryptedFileStorage(),
fallbacks=[EnvVarStorage(env_mapping=env_mapping)],
)
return CredentialStore(storage=storage)
storage = CompositeStorage(
primary=EncryptedFileStorage(),
fallbacks=[EnvVarStorage(env_mapping={})],
)
return CredentialStore(storage=storage)
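The fallback path relies on composite lookup order: primary first, then each fallback in turn. A simplified stand-in for that behavior (the framework's storage interfaces are richer than this):

class DictStorage:
    def __init__(self, data: dict[str, str]):
        self._data = data

    def get(self, key: str) -> str | None:
        return self._data.get(key)

class Composite:
    def __init__(self, primary, fallbacks):
        self.primary, self.fallbacks = primary, fallbacks

    def get(self, key: str) -> str | None:
        for store in (self.primary, *self.fallbacks):
            value = store.get(key)
            if value is not None:
                return value  # first store that has the key wins
        return None

store = Composite(DictStorage({}), [DictStorage({"API_KEY": "from-env"})])
assert store.get("API_KEY") == "from-env"  # primary misses, fallback answers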
@mcp.tool()
+173 -3
@@ -331,6 +331,20 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
)
resume_parser.set_defaults(func=cmd_resume)
# setup-credentials command
setup_creds_parser = subparsers.add_parser(
"setup-credentials",
help="Interactive credential setup",
description="Guide through setting up required credentials for an agent.",
)
setup_creds_parser.add_argument(
"agent_path",
type=str,
nargs="?",
help="Path to agent folder (optional - runs general setup if not specified)",
)
setup_creds_parser.set_defaults(func=cmd_setup_credentials)
def _load_resume_state(
agent_path: str, session_id: str, checkpoint_id: str | None = None
@@ -362,6 +376,7 @@ def _load_resume_state(
except (json.JSONDecodeError, OSError):
return None
return {
"resume_session_id": session_id,
"memory": cp_data.get("shared_memory", {}),
"paused_at": cp_data.get("next_node") or cp_data.get("current_node"),
"execution_path": cp_data.get("execution_path", []),
@@ -379,6 +394,7 @@ def _load_resume_state(
progress = state_data.get("progress", {})
paused_at = progress.get("paused_at") or progress.get("resume_from")
return {
"resume_session_id": session_id,
"memory": state_data.get("memory", {}),
"paused_at": paused_at,
"execution_path": progress.get("path", []),
@@ -386,6 +402,40 @@ def _load_resume_state(
}
def _prompt_before_start(agent_path: str, runner, model: str | None = None):
"""Prompt user to start agent or update credentials.
Returns:
Updated runner if user proceeds, None if user aborts.
"""
from framework.credentials.setup import CredentialSetupSession
from framework.runner import AgentRunner
while True:
print()
try:
choice = input("Press Enter to start agent, or 'u' to update credentials: ").strip()
except (EOFError, KeyboardInterrupt):
print()
return None
if choice == "":
return runner
elif choice.lower() == "u":
session = CredentialSetupSession.from_agent_path(agent_path)
result = session.run_interactive()
if result.success:
# Reload runner with updated credentials
try:
runner = AgentRunner.load(agent_path, model=model)
except Exception as e:
print(f"Error reloading agent: {e}")
return None
# Loop back to prompt again
elif choice.lower() == "q":
return None
def cmd_run(args: argparse.Namespace) -> int:
"""Run an exported agent."""
import logging
@@ -432,11 +482,46 @@ def cmd_run(args: argparse.Namespace) -> int:
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
# Offer interactive credential setup if running in a terminal
if sys.stdin.isatty():
print()
try:
choice = input("Would you like to set up credentials now? [Y/n]: ")
choice = choice.strip()
except (EOFError, KeyboardInterrupt):
print()
return
if choice.lower() != "n":
from framework.credentials.setup import CredentialSetupSession
session = CredentialSetupSession.from_agent_path(args.agent_path)
result = session.run_interactive()
if result.success:
# Retry loading with credentials now configured
try:
runner = AgentRunner.load(args.agent_path, model=args.model)
except CredentialError as retry_e:
print(f"\n{retry_e}", file=sys.stderr)
return
except Exception as retry_e:
print(f"Error loading agent: {retry_e}")
return
else:
return
else:
return
else:
return
except Exception as e:
print(f"Error loading agent: {e}")
return
# Prompt before starting (allows credential updates)
if sys.stdin.isatty():
runner = _prompt_before_start(args.agent_path, runner, args.model)
if runner is None:
return
# Force setup inside the loop
if runner._agent_runtime is None:
runner._setup()
@@ -475,11 +560,45 @@ def cmd_run(args: argparse.Namespace) -> int:
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return 1
# Offer interactive credential setup if running in a terminal
if sys.stdin.isatty():
print()
try:
choice = input("Would you like to set up credentials now? [Y/n]: ").strip()
except (EOFError, KeyboardInterrupt):
print()
return 1
if choice.lower() != "n":
from framework.credentials.setup import CredentialSetupSession
session = CredentialSetupSession.from_agent_path(args.agent_path)
result = session.run_interactive()
if result.success:
# Retry loading with credentials now configured
try:
runner = AgentRunner.load(args.agent_path, model=args.model)
except CredentialError as retry_e:
print(f"\n{retry_e}", file=sys.stderr)
return 1
except Exception as retry_e:
print(f"Error loading agent: {retry_e}")
return 1
else:
return 1
else:
return 1
else:
return 1
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
# Prompt before starting (allows credential updates)
if sys.stdin.isatty() and not args.quiet:
runner = _prompt_before_start(args.agent_path, runner, args.model)
if runner is None:
return 1
# Load session/checkpoint state for resume (headless mode)
session_state = None
resume_session = getattr(args, "resume_session", None)
@@ -1281,7 +1400,35 @@ def cmd_tui(args: argparse.Namespace) -> int:
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
# Offer interactive credential setup if running in a terminal
if sys.stdin.isatty():
print()
try:
choice = input("Would you like to set up credentials now? [Y/n]: ").strip()
except (EOFError, KeyboardInterrupt):
print()
return
if choice.lower() != "n":
from framework.credentials.setup import CredentialSetupSession
session = CredentialSetupSession.from_agent_path(agent_path)
result = session.run_interactive()
if result.success:
# Retry loading with credentials now configured
try:
runner = AgentRunner.load(agent_path, model=args.model)
except CredentialError as retry_e:
print(f"\n{retry_e}", file=sys.stderr)
return
except Exception as retry_e:
print(f"Error loading agent: {retry_e}")
return
else:
return
else:
return
else:
return
except Exception as e:
print(f"Error loading agent: {e}")
return
@@ -1447,6 +1594,7 @@ def _select_agent(agents_dir: Path) -> str | None:
for path in agents_dir.iterdir():
if _is_valid_agent_dir(path):
agents.append(path)
agents.sort(key=lambda p: p.name)
if not agents:
print(f"No agents found in {agents_dir}", file=sys.stderr)
@@ -1716,3 +1864,25 @@ def cmd_resume(args: argparse.Namespace) -> int:
if args.tui:
print("Mode: TUI")
return 1
def cmd_setup_credentials(args: argparse.Namespace) -> int:
"""Interactive credential setup for an agent."""
from framework.credentials.setup import CredentialSetupSession
agent_path = getattr(args, "agent_path", None)
if agent_path:
# Setup credentials for a specific agent
session = CredentialSetupSession.from_agent_path(agent_path)
else:
# No agent specified - show usage
print("Usage: hive setup-credentials <agent_path>")
print()
print("Examples:")
print(" hive setup-credentials exports/my-agent")
print(" hive setup-credentials examples/templates/deep_research_agent")
return 1
result = session.run_interactive()
return 0 if result.success else 1
+5 -2
@@ -183,8 +183,11 @@ class MCPClient:
from mcp import ClientSession
from mcp.client.stdio import stdio_client
- # Create persistent stdio client context
- self._stdio_context = stdio_client(server_params)
+ # Create persistent stdio client context.
+ # Redirect server stderr to devnull to prevent raw
+ # output from leaking behind the TUI.
+ devnull = open(os.devnull, "w") # noqa: SIM115
+ self._stdio_context = stdio_client(server_params, errlog=devnull)
(
self._read_stream,
self._write_stream,
+1 -1
@@ -456,7 +456,7 @@ Respond with JSON only:
}}"""
try:
- response = self._llm.complete(
+ response = await self._llm.acomplete(
messages=[{"role": "user", "content": prompt}],
system="You are a request router. Respond with JSON only.",
max_tokens=256,
+68 -199
@@ -9,6 +9,10 @@ from pathlib import Path
from typing import TYPE_CHECKING, Any
from framework.config import get_hive_config, get_preferred_model
from framework.credentials.validation import (
ensure_credential_key_env as _ensure_credential_key_env,
validate_agent_credentials,
)
from framework.graph import Goal
from framework.graph.edge import (
DEFAULT_MAX_TOKENS,
@@ -21,7 +25,7 @@ from framework.graph.executor import ExecutionResult
from framework.graph.node import NodeSpec
from framework.llm.provider import LLMProvider, Tool
from framework.runner.tool_registry import ToolRegistry
- from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
+ from framework.runtime.agent_runtime import AgentRuntime, AgentRuntimeConfig, create_agent_runtime
from framework.runtime.execution_stream import EntryPointSpec
from framework.runtime.runtime_log_store import RuntimeLogStore
@@ -31,32 +35,6 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__)
def _ensure_credential_key_env() -> None:
"""Load HIVE_CREDENTIAL_KEY from shell config if not already in environment.
The setup-credentials skill writes the encryption key to ~/.zshrc or ~/.bashrc.
If the user hasn't sourced their config in the current shell, this reads it
directly so the runner (and any MCP subprocesses it spawns) can unlock the
encrypted credential store.
Only HIVE_CREDENTIAL_KEY is loaded this way; all other secrets (API keys, etc.)
come from the credential store itself.
"""
if os.environ.get("HIVE_CREDENTIAL_KEY"):
return
try:
from aden_tools.credentials.shell_config import check_env_var_in_shell_config
found, value = check_env_var_in_shell_config("HIVE_CREDENTIAL_KEY")
if found and value:
os.environ["HIVE_CREDENTIAL_KEY"] = value
logger.debug("Loaded HIVE_CREDENTIAL_KEY from shell config")
except ImportError:
pass
CLAUDE_CREDENTIALS_FILE = Path.home() / ".claude" / ".credentials.json"
@@ -268,6 +246,7 @@ class AgentRunner:
storage_path: Path | None = None,
model: str | None = None,
intro_message: str = "",
runtime_config: "AgentRuntimeConfig | None" = None,
):
"""
Initialize the runner (use AgentRunner.load() instead).
@@ -280,6 +259,7 @@ class AgentRunner:
storage_path: Path for runtime storage (defaults to temp)
model: Model to use (reads from agent config or ~/.hive/configuration.json if None)
intro_message: Optional greeting shown to user on TUI load
runtime_config: Optional AgentRuntimeConfig (webhook settings, etc.)
"""
self.agent_path = agent_path
self.graph = graph
@@ -287,6 +267,7 @@ class AgentRunner:
self.mock_mode = mock_mode
self.model = model or self._resolve_default_model()
self.intro_message = intro_message
self.runtime_config = runtime_config
# Set up storage
if storage_path:
@@ -331,88 +312,8 @@ class AgentRunner:
"""Check that required credentials are available before spawning MCP servers.
Raises CredentialError with actionable guidance if any are missing.
Uses graph node specs + CREDENTIAL_SPECS; no tool registry needed.
"""
required_tools: set[str] = set()
for node in self.graph.nodes:
if node.tools:
required_tools.update(node.tools)
node_types: set[str] = {node.node_type for node in self.graph.nodes}
try:
from aden_tools.credentials import CREDENTIAL_SPECS
from framework.credentials import CredentialStore
from framework.credentials.storage import (
CompositeStorage,
EncryptedFileStorage,
EnvVarStorage,
)
except ImportError:
return # aden_tools not installed, skip check
# Build credential store (same logic as validate())
env_mapping = {
(spec.credential_id or name): spec.env_var for name, spec in CREDENTIAL_SPECS.items()
}
storages: list = [EnvVarStorage(env_mapping=env_mapping)]
if os.environ.get("HIVE_CREDENTIAL_KEY"):
storages.insert(0, EncryptedFileStorage())
if len(storages) == 1:
storage = storages[0]
else:
storage = CompositeStorage(primary=storages[0], fallbacks=storages[1:])
store = CredentialStore(storage=storage)
# Build reverse mappings
tool_to_cred: dict[str, str] = {}
node_type_to_cred: dict[str, str] = {}
for cred_name, spec in CREDENTIAL_SPECS.items():
for tool_name in spec.tools:
tool_to_cred[tool_name] = cred_name
for nt in spec.node_types:
node_type_to_cred[nt] = cred_name
missing: list[str] = []
checked: set[str] = set()
# Check tool credentials
for tool_name in sorted(required_tools):
cred_name = tool_to_cred.get(tool_name)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
if spec.required and not store.is_available(cred_id):
affected = sorted(t for t in required_tools if t in spec.tools)
entry = f" {spec.env_var} for {', '.join(affected)}"
if spec.help_url:
entry += f"\n Get it at: {spec.help_url}"
missing.append(entry)
# Check node type credentials (e.g., ANTHROPIC_API_KEY for LLM nodes)
for nt in sorted(node_types):
cred_name = node_type_to_cred.get(nt)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
if spec.required and not store.is_available(cred_id):
affected_types = sorted(t for t in node_types if t in spec.node_types)
entry = f" {spec.env_var} for {', '.join(affected_types)} nodes"
if spec.help_url:
entry += f"\n Get it at: {spec.help_url}"
missing.append(entry)
if missing:
from framework.credentials.models import CredentialError
lines = ["Missing required credentials:\n"]
lines.extend(missing)
lines.append("\nTo fix: run /hive-credentials in Claude Code.")
raise CredentialError("\n".join(lines))
validate_agent_credentials(self.graph.nodes)
@staticmethod
def _import_agent_module(agent_path: Path):
@@ -510,18 +411,32 @@ class AgentRunner:
intro_message = agent_metadata.intro_message
# Build GraphSpec from module-level variables
graph = GraphSpec(
id=f"{agent_path.name}-graph",
goal_id=goal.id,
version="1.0.0",
entry_node=getattr(agent_module, "entry_node", nodes[0].id),
entry_points=getattr(agent_module, "entry_points", {}),
terminal_nodes=getattr(agent_module, "terminal_nodes", []),
pause_nodes=getattr(agent_module, "pause_nodes", []),
nodes=nodes,
edges=edges,
max_tokens=max_tokens,
)
graph_kwargs: dict = {
"id": f"{agent_path.name}-graph",
"goal_id": goal.id,
"version": "1.0.0",
"entry_node": getattr(agent_module, "entry_node", nodes[0].id),
"entry_points": getattr(agent_module, "entry_points", {}),
"async_entry_points": getattr(agent_module, "async_entry_points", []),
"terminal_nodes": getattr(agent_module, "terminal_nodes", []),
"pause_nodes": getattr(agent_module, "pause_nodes", []),
"nodes": nodes,
"edges": edges,
"max_tokens": max_tokens,
"loop_config": getattr(agent_module, "loop_config", {}),
}
# Only pass optional fields if explicitly defined by the agent module
conversation_mode = getattr(agent_module, "conversation_mode", None)
if conversation_mode is not None:
graph_kwargs["conversation_mode"] = conversation_mode
identity_prompt = getattr(agent_module, "identity_prompt", None)
if identity_prompt is not None:
graph_kwargs["identity_prompt"] = identity_prompt
graph = GraphSpec(**graph_kwargs)
# Read runtime config (webhook settings, etc.) if defined
agent_runtime_config = getattr(agent_module, "runtime_config", None)
return cls(
agent_path=agent_path,
@@ -531,6 +446,7 @@ class AgentRunner:
storage_path=storage_path,
model=model,
intro_message=intro_message,
runtime_config=agent_runtime_config,
)
# Fallback: load from agent.json (legacy JSON-based agents)
@@ -686,7 +602,9 @@ class AgentRunner:
else:
# Fall back to environment variable
# First check api_key_env_var from config (set by quickstart)
- api_key_env = llm_config.get("api_key_env_var") or self._get_api_key_env_var(self.model)
+ api_key_env = llm_config.get("api_key_env_var") or self._get_api_key_env_var(
+ self.model
+ )
if api_key_env and os.environ.get(api_key_env):
self._llm = LiteLLMProvider(model=self.model)
else:
@@ -784,17 +702,19 @@ class AgentRunner:
)
entry_points.append(ep)
- # Single-entry agent with no async entry points: create a default entry point
- if not entry_points and self.graph.entry_node:
- logger.info("Creating default entry point for single-entry agent")
- entry_points.append(
+ # Always create a primary entry point for the graph's entry node.
+ # For multi-entry-point agents this ensures the primary path (e.g.
+ # user-facing rule setup) is reachable alongside async entry points.
+ if self.graph.entry_node:
+ entry_points.insert(
+ 0,
EntryPointSpec(
id="default",
name="Default",
entry_node=self.graph.entry_node,
trigger_type="manual",
isolation_level="shared",
- )
+ ),
)
# Create AgentRuntime with all entry points
@@ -821,6 +741,7 @@ class AgentRunner:
tool_executor=tool_executor,
runtime_log_store=log_store,
checkpoint_config=checkpoint_config,
config=self.runtime_config,
)
# Pass intro_message through for TUI display
@@ -1143,88 +1064,36 @@ class AgentRunner:
warnings.append(f"Missing tool implementations: {', '.join(missing_tools)}")
# Check credentials for required tools and node types
- # Uses CredentialStore (encrypted files + env var fallback)
+ # Uses CredentialStoreAdapter.default() which includes Aden sync support
missing_credentials = []
try:
from aden_tools.credentials import CREDENTIAL_SPECS
from aden_tools.credentials.store_adapter import CredentialStoreAdapter
from framework.credentials import CredentialStore
from framework.credentials.storage import (
CompositeStorage,
EncryptedFileStorage,
EnvVarStorage,
)
# Build env mapping for credential lookup
env_mapping = {
(spec.credential_id or name): spec.env_var
for name, spec in CREDENTIAL_SPECS.items()
}
# Only use EncryptedFileStorage if the encryption key is configured;
# otherwise just check env vars (avoids generating a throwaway key)
storages: list = [EnvVarStorage(env_mapping=env_mapping)]
if os.environ.get("HIVE_CREDENTIAL_KEY"):
storages.insert(0, EncryptedFileStorage())
if len(storages) == 1:
storage = storages[0]
else:
storage = CompositeStorage(
primary=storages[0],
fallbacks=storages[1:],
)
store = CredentialStore(storage=storage)
# Build reverse mappings
tool_to_cred: dict[str, str] = {}
node_type_to_cred: dict[str, str] = {}
for cred_name, spec in CREDENTIAL_SPECS.items():
for tool_name in spec.tools:
tool_to_cred[tool_name] = cred_name
for nt in spec.node_types:
node_type_to_cred[nt] = cred_name
adapter = CredentialStoreAdapter.default()
# Check tool credentials
checked: set[str] = set()
for tool_name in info.required_tools:
cred_name = tool_to_cred.get(tool_name)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
if spec.required and not store.is_available(cred_id):
missing_credentials.append(spec.env_var)
affected_tools = [t for t in info.required_tools if t in spec.tools]
tools_str = ", ".join(affected_tools)
warning_msg = f"Missing {spec.env_var} for {tools_str}"
if spec.help_url:
warning_msg += f"\n Get it at: {spec.help_url}"
warnings.append(warning_msg)
for _cred_name, spec in adapter.get_missing_for_tools(list(info.required_tools)):
missing_credentials.append(spec.env_var)
affected_tools = [t for t in info.required_tools if t in spec.tools]
tools_str = ", ".join(affected_tools)
warning_msg = f"Missing {spec.env_var} for {tools_str}"
if spec.help_url:
warning_msg += f"\n Get it at: {spec.help_url}"
warnings.append(warning_msg)
# Check node type credentials (e.g., ANTHROPIC_API_KEY for LLM nodes)
node_types = list({node.node_type for node in self.graph.nodes})
for nt in node_types:
cred_name = node_type_to_cred.get(nt)
if cred_name is None or cred_name in checked:
continue
checked.add(cred_name)
spec = CREDENTIAL_SPECS[cred_name]
cred_id = spec.credential_id or cred_name
if spec.required and not store.is_available(cred_id):
missing_credentials.append(spec.env_var)
affected_types = [t for t in node_types if t in spec.node_types]
types_str = ", ".join(affected_types)
warning_msg = f"Missing {spec.env_var} for {types_str} nodes"
if spec.help_url:
warning_msg += f"\n Get it at: {spec.help_url}"
warnings.append(warning_msg)
for _cred_name, spec in adapter.get_missing_for_node_types(node_types):
missing_credentials.append(spec.env_var)
affected_types = [t for t in node_types if t in spec.node_types]
types_str = ", ".join(affected_types)
warning_msg = f"Missing {spec.env_var} for {types_str} nodes"
if spec.help_url:
warning_msg += f"\n Get it at: {spec.help_url}"
warnings.append(warning_msg)
except ImportError:
# aden_tools not installed - fall back to direct check
- has_llm_nodes = any(
- node.node_type in ("llm_generate", "llm_tool_use") for node in self.graph.nodes
- )
+ has_llm_nodes = any(node.node_type == "event_loop" for node in self.graph.nodes)
if has_llm_nodes:
api_key_env = self._get_api_key_env_var(self.model)
if api_key_env and not os.environ.get(api_key_env):
@@ -1306,7 +1175,7 @@ Respond with JSON only:
}}"""
try:
- response = eval_llm.complete(
+ response = await eval_llm.acomplete(
messages=[{"role": "user", "content": prompt}],
system="You are a capability evaluator. Respond with JSON only.",
max_tokens=256,
+244 -1
@@ -7,8 +7,9 @@ while preserving the goal-driven approach.
import asyncio
import logging
import time
from collections.abc import Callable
- from dataclasses import dataclass
+ from dataclasses import dataclass, field
from pathlib import Path
from typing import TYPE_CHECKING, Any
@@ -39,6 +40,11 @@ class AgentRuntimeConfig:
max_history: int = 1000
execution_result_max: int = 1000
execution_result_ttl_seconds: float | None = None
# Webhook server config (only starts if webhook_routes is non-empty)
webhook_host: str = "127.0.0.1"
webhook_port: int = 8080
webhook_routes: list[dict] = field(default_factory=list)
# Each dict: {"source_id": str, "path": str, "methods": ["POST"], "secret": str|None}
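A config instance following the route-dict shape documented in the comment above; every value here is a placeholder, and the sketch assumes the remaining config fields keep their defaults:

from framework.runtime.agent_runtime import AgentRuntimeConfig

config = AgentRuntimeConfig(
    webhook_host="127.0.0.1",
    webhook_port=8080,
    webhook_routes=[
        {
            "source_id": "inbound-email",  # hypothetical source id
            "path": "/hooks/email",
            "methods": ["POST"],
            "secret": None,  # set a shared secret to authenticate callers
        },
    ],
)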
class AgentRuntime:
@@ -150,6 +156,15 @@ class AgentRuntime:
self._entry_points: dict[str, EntryPointSpec] = {}
self._streams: dict[str, ExecutionStream] = {}
# Webhook server (created on start if webhook_routes configured)
self._webhook_server: Any = None
# Event-driven entry point subscriptions
self._event_subscriptions: list[str] = []
# Timer tasks for scheduled entry points
self._timer_tasks: list[asyncio.Task] = []
# Next fire time for each timer entry point (ep_id -> monotonic timestamp)
self._timer_next_fire: dict[str, float] = {}
# State
self._running = False
self._lock = asyncio.Lock()
@@ -234,6 +249,137 @@ class AgentRuntime:
await stream.start()
self._streams[ep_id] = stream
# Start webhook server if routes are configured
if self._config.webhook_routes:
from framework.runtime.webhook_server import (
WebhookRoute,
WebhookServer,
WebhookServerConfig,
)
wh_config = WebhookServerConfig(
host=self._config.webhook_host,
port=self._config.webhook_port,
)
self._webhook_server = WebhookServer(self._event_bus, wh_config)
for rc in self._config.webhook_routes:
route = WebhookRoute(
source_id=rc["source_id"],
path=rc["path"],
methods=rc.get("methods", ["POST"]),
secret=rc.get("secret"),
)
self._webhook_server.add_route(route)
await self._webhook_server.start()
# Subscribe event-driven entry points to EventBus
from framework.runtime.event_bus import EventType as _ET
for ep_id, spec in self._entry_points.items():
if spec.trigger_type != "event":
continue
tc = spec.trigger_config
event_types = [_ET(et) for et in tc.get("event_types", [])]
if not event_types:
logger.warning(
f"Entry point '{ep_id}' has trigger_type='event' "
"but no event_types in trigger_config"
)
continue
# Capture ep_id in closure
def _make_handler(entry_point_id: str):
async def _on_event(event):
if self._running and entry_point_id in self._streams:
# Run in the same session as the primary entry
# point so memory (e.g. user-defined rules) is
# shared and logs land in one session directory.
session_state = self._get_primary_session_state(
exclude_entry_point=entry_point_id
)
await self.trigger(
entry_point_id,
{"event": event.to_dict()},
session_state=session_state,
)
return _on_event
sub_id = self._event_bus.subscribe(
event_types=event_types,
handler=_make_handler(ep_id),
filter_stream=tc.get("filter_stream"),
filter_node=tc.get("filter_node"),
)
self._event_subscriptions.append(sub_id)
# Start timer-driven entry points
for ep_id, spec in self._entry_points.items():
if spec.trigger_type != "timer":
continue
tc = spec.trigger_config
interval = tc.get("interval_minutes")
if not interval or interval <= 0:
logger.warning(
f"Entry point '{ep_id}' has trigger_type='timer' "
"but no valid interval_minutes in trigger_config"
)
continue
run_immediately = tc.get("run_immediately", False)
def _make_timer(entry_point_id: str, mins: float, immediate: bool):
async def _timer_loop():
interval_secs = mins * 60
if not immediate:
self._timer_next_fire[entry_point_id] = time.monotonic() + interval_secs
await asyncio.sleep(interval_secs)
while self._running:
self._timer_next_fire.pop(entry_point_id, None)
try:
if self._should_skip_timer(entry_point_id):
logger.info(
"Timer '%s' skipped — primary stream busy",
entry_point_id,
)
else:
session_state = self._get_primary_session_state(
exclude_entry_point=entry_point_id
)
await self.trigger(
entry_point_id,
{"event": {"source": "timer", "reason": "scheduled"}},
session_state=session_state,
)
logger.info(
"Timer fired for entry point '%s' (next in %s min)",
entry_point_id,
mins,
)
except Exception:
logger.error(
"Timer trigger failed for '%s'",
entry_point_id,
exc_info=True,
)
self._timer_next_fire[entry_point_id] = time.monotonic() + interval_secs
await asyncio.sleep(interval_secs)
return _timer_loop
task = asyncio.create_task(_make_timer(ep_id, interval, run_immediately)())
self._timer_tasks.append(task)
logger.info(
"Started timer for entry point '%s' every %s min%s",
ep_id,
interval,
" (immediate first run)" if run_immediately else "",
)
self._running = True
logger.info(f"AgentRuntime started with {len(self._streams)} streams")
@@ -243,6 +389,21 @@ class AgentRuntime:
return
async with self._lock:
# Cancel timer tasks
for task in self._timer_tasks:
task.cancel()
self._timer_tasks.clear()
# Unsubscribe event-driven entry points
for sub_id in self._event_subscriptions:
self._event_bus.unsubscribe(sub_id)
self._event_subscriptions.clear()
# Stop webhook server
if self._webhook_server:
await self._webhook_server.stop()
self._webhook_server = None
# Stop all streams
for stream in self._streams.values():
await stream.stop()
@@ -314,6 +475,83 @@ class AgentRuntime:
raise ValueError(f"Entry point '{entry_point_id}' not found")
return await stream.wait_for_completion(exec_id, timeout)
def _should_skip_timer(self, timer_ep_id: str) -> bool:
"""Return True if a non-timer stream is actively running (not waiting for input).
Timers should only fire when the primary stream is idle (blocked
waiting for client input) or has no active execution. This prevents
concurrent pipeline runs that would race on shared memory.
"""
for ep_id, stream in self._streams.items():
if ep_id == timer_ep_id:
continue
spec = self._entry_points.get(ep_id)
if spec and spec.trigger_type == "timer":
continue
if stream.active_execution_ids and not stream.is_awaiting_input:
return True
return False
def _get_primary_session_state(self, exclude_entry_point: str) -> dict[str, Any] | None:
"""Build session_state so an async entry point runs in the primary session.
Looks for an active execution from another stream (the "primary"
session, e.g. the user-facing intake loop) and returns a
``session_state`` dict containing:
- ``resume_session_id``: reuse the same session directory
- ``memory``: only the keys that the async entry node declares
as inputs (e.g. ``rules``, ``max_emails``). Stale outputs
from previous runs (``emails``, ``actions_taken``, etc.) are
excluded so each trigger starts fresh.
The memory is read from the primary session's ``state.json``
which is kept up-to-date by ``GraphExecutor._write_progress()``
at every node transition.
Returns ``None`` if no primary session is active (the webhook
execution will just create its own session).
"""
import json as _json
# Determine which memory keys the async entry node needs.
allowed_keys: set[str] | None = None
ep_spec = self._entry_points.get(exclude_entry_point)
if ep_spec:
entry_node = self.graph.get_node(ep_spec.entry_node)
if entry_node and entry_node.input_keys:
allowed_keys = set(entry_node.input_keys)
for ep_id, stream in self._streams.items():
if ep_id == exclude_entry_point:
continue
for exec_id in stream.active_execution_ids:
state_path = self._storage.base_path / "sessions" / exec_id / "state.json"
try:
if state_path.exists():
data = _json.loads(state_path.read_text(encoding="utf-8"))
full_memory = data.get("memory", {})
if not full_memory:
continue
# Filter to only input keys so stale outputs
# from previous triggers don't leak through.
if allowed_keys is not None:
memory = {k: v for k, v in full_memory.items() if k in allowed_keys}
else:
memory = full_memory
if memory:
return {
"resume_session_id": exec_id,
"memory": memory,
}
except Exception:
logger.debug(
"Could not read state.json for %s: skipping",
exec_id,
exc_info=True,
)
return None
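# Shape of the returned mapping, assuming a hypothetical primary session
# and an async entry node with input_keys=["rules", "max_emails"]:
#
#   {
#       "resume_session_id": "session_20260218_103000",  # primary session id
#       "memory": {"rules": [...], "max_emails": 25},    # input keys only
#   }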
async def inject_input(self, node_id: str, content: str) -> bool:
"""Inject user input into a running client-facing node.
@@ -448,6 +686,11 @@ class AgentRuntime:
"""Access the outcome aggregator."""
return self._outcome_aggregator
@property
def webhook_server(self) -> Any:
"""Access the webhook server (None if no webhook entry points)."""
return self._webhook_server
@property
def is_running(self) -> bool:
"""Check if runtime is running."""
+6
@@ -56,6 +56,12 @@ class Runtime:
"""
def __init__(self, storage_path: str | Path):
# Validate and create storage path if needed
path = Path(storage_path) if isinstance(storage_path, str) else storage_path
if not path.exists():
logger.warning(f"Storage path does not exist, creating: {path}")
path.mkdir(parents=True, exist_ok=True)
self.storage = FileStorage(storage_path)
self._current_run: Run | None = None
self._current_node: str = "unknown"
+187
@@ -62,6 +62,23 @@ class EventType(StrEnum):
NODE_INTERNAL_OUTPUT = "node_internal_output"
NODE_INPUT_BLOCKED = "node_input_blocked"
NODE_STALLED = "node_stalled"
NODE_TOOL_DOOM_LOOP = "node_tool_doom_loop"
# Judge decisions
JUDGE_VERDICT = "judge_verdict"
# Output tracking
OUTPUT_KEY_SET = "output_key_set"
# Retry / edge tracking
NODE_RETRY = "node_retry"
EDGE_TRAVERSED = "edge_traversed"
# Context management
CONTEXT_COMPACTED = "context_compacted"
# External triggers
WEBHOOK_RECEIVED = "webhook_received"
# Custom events
CUSTOM = "custom"
@@ -615,6 +632,24 @@ class EventBus:
)
)
async def emit_tool_doom_loop(
self,
stream_id: str,
node_id: str,
description: str = "",
execution_id: str | None = None,
) -> None:
"""Emit tool doom loop detection event."""
await self.publish(
AgentEvent(
type=EventType.NODE_TOOL_DOOM_LOOP,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"description": description},
)
)
async def emit_node_input_blocked(
self,
stream_id: str,
@@ -633,6 +668,158 @@ class EventBus:
)
)
# === JUDGE / OUTPUT / RETRY / EDGE PUBLISHERS ===
async def emit_judge_verdict(
self,
stream_id: str,
node_id: str,
action: str,
feedback: str = "",
judge_type: str = "implicit",
iteration: int = 0,
execution_id: str | None = None,
) -> None:
"""Emit judge verdict event."""
await self.publish(
AgentEvent(
type=EventType.JUDGE_VERDICT,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={
"action": action,
"feedback": feedback,
"judge_type": judge_type,
"iteration": iteration,
},
)
)
async def emit_output_key_set(
self,
stream_id: str,
node_id: str,
key: str,
execution_id: str | None = None,
) -> None:
"""Emit output key set event."""
await self.publish(
AgentEvent(
type=EventType.OUTPUT_KEY_SET,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"key": key},
)
)
async def emit_node_retry(
self,
stream_id: str,
node_id: str,
retry_count: int,
max_retries: int,
error: str = "",
execution_id: str | None = None,
) -> None:
"""Emit node retry event."""
await self.publish(
AgentEvent(
type=EventType.NODE_RETRY,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={
"retry_count": retry_count,
"max_retries": max_retries,
"error": error,
},
)
)
async def emit_edge_traversed(
self,
stream_id: str,
source_node: str,
target_node: str,
edge_condition: str = "",
execution_id: str | None = None,
) -> None:
"""Emit edge traversed event."""
await self.publish(
AgentEvent(
type=EventType.EDGE_TRAVERSED,
stream_id=stream_id,
node_id=source_node,
execution_id=execution_id,
data={
"source_node": source_node,
"target_node": target_node,
"edge_condition": edge_condition,
},
)
)
async def emit_execution_paused(
self,
stream_id: str,
node_id: str,
reason: str = "",
execution_id: str | None = None,
) -> None:
"""Emit execution paused event."""
await self.publish(
AgentEvent(
type=EventType.EXECUTION_PAUSED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"reason": reason},
)
)
async def emit_execution_resumed(
self,
stream_id: str,
node_id: str,
execution_id: str | None = None,
) -> None:
"""Emit execution resumed event."""
await self.publish(
AgentEvent(
type=EventType.EXECUTION_RESUMED,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={},
)
)
async def emit_webhook_received(
self,
source_id: str,
path: str,
method: str,
headers: dict[str, str],
payload: dict[str, Any],
query_params: dict[str, str] | None = None,
) -> None:
"""Emit webhook received event."""
await self.publish(
AgentEvent(
type=EventType.WEBHOOK_RECEIVED,
stream_id=source_id,
data={
"path": path,
"method": method,
"headers": headers,
"payload": payload,
"query_params": query_params or {},
},
)
)
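# A subscriber sketch for the publishers above (handler name illustrative),
# mirroring how AgentRuntime wires event-driven entry points:
#
#   async def on_webhook(event: AgentEvent) -> None:
#       print(event.data["path"], event.data["payload"])
#
#   bus.subscribe([EventType.WEBHOOK_RECEIVED], on_webhook, filter_stream="gh")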
# === QUERY OPERATIONS ===
def get_history(
+85 -21
@@ -196,6 +196,22 @@ class ExecutionStream:
)
)
@property
def active_execution_ids(self) -> list[str]:
"""Return IDs of all currently active executions."""
return list(self._active_executions.keys())
@property
def is_awaiting_input(self) -> bool:
"""True when an active execution is blocked waiting for client input."""
if not self._active_executors:
return False
for executor in self._active_executors.values():
for node in executor.node_registry.values():
if getattr(node, "_awaiting_input", False):
return True
return False
def _record_execution_result(self, execution_id: str, result: ExecutionResult) -> None:
"""Record a completed execution result with retention pruning."""
self._execution_results[execution_id] = result
@@ -293,8 +309,13 @@ class ExecutionStream:
if not self._running:
raise RuntimeError(f"ExecutionStream '{self.stream_id}' is not running")
# Generate execution ID using unified session format
if self._session_store:
# When resuming, reuse the original session ID so the execution
# continues in the same session directory instead of creating a new one.
resume_session_id = session_state.get("resume_session_id") if session_state else None
if resume_session_id:
execution_id = resume_session_id
elif self._session_store:
execution_id = self._session_store.generate_session_id()
else:
# Fallback to old format if SessionStore not available (shouldn't happen)
@@ -337,6 +358,11 @@ class ExecutionStream:
"""Run a single execution within the stream."""
execution_id = ctx.id
# When sharing a session with another entry point (resume_session_id),
# skip writing initial/final session state — the primary execution
# owns the state.json and _write_progress() keeps memory up-to-date.
_is_shared_session = bool(ctx.session_state and ctx.session_state.get("resume_session_id"))
# Acquire semaphore to limit concurrency
async with self._semaphore:
ctx.status = "running"
@@ -399,7 +425,8 @@ class ExecutionStream:
self._active_executors[execution_id] = executor
# Write initial session state
await self._write_session_state(execution_id, ctx)
if not _is_shared_session:
await self._write_session_state(execution_id, ctx)
# Create modified graph with entry point
# We need to override the entry_node to use our entry point
@@ -433,8 +460,9 @@ class ExecutionStream:
if result.paused_at:
ctx.status = "paused"
# Write final session state
await self._write_session_state(execution_id, ctx, result=result)
# Write final session state (skip for shared-session executions)
if not _is_shared_session:
await self._write_session_state(execution_id, ctx, result=result)
# Emit completion/failure event
if self._event_bus:
@@ -485,11 +513,14 @@ class ExecutionStream:
# Store result with retention
self._record_execution_result(execution_id, result)
# Write session state
if has_result and result.paused_at:
await self._write_session_state(execution_id, ctx, result=result)
else:
await self._write_session_state(execution_id, ctx, error="Execution cancelled")
# Write session state (skip for shared-session executions)
if not _is_shared_session:
if has_result and result.paused_at:
await self._write_session_state(execution_id, ctx, result=result)
else:
await self._write_session_state(
execution_id, ctx, error="Execution cancelled"
)
# Don't re-raise - we've handled it and saved state
@@ -506,8 +537,9 @@ class ExecutionStream:
),
)
# Write error session state
await self._write_session_state(execution_id, ctx, error=str(e))
# Write error session state (skip for shared-session executions)
if not _is_shared_session:
await self._write_session_state(execution_id, ctx, error=str(e))
# End run with failure (for observability)
try:
@@ -597,10 +629,22 @@ class ExecutionStream:
entry_point=self.entry_spec.id,
)
else:
# Create initial state
from framework.schemas.session_state import SessionTimestamps
# Create initial state — when resuming, preserve the previous
# execution's progress so crashes don't lose track of state.
from framework.schemas.session_state import (
SessionProgress,
SessionTimestamps,
)
now = datetime.now().isoformat()
ss = ctx.session_state or {}
progress = SessionProgress(
current_node=ss.get("paused_at") or ss.get("resume_from"),
paused_at=ss.get("paused_at"),
resume_from=ss.get("paused_at") or ss.get("resume_from"),
path=ss.get("execution_path", []),
node_visit_counts=ss.get("node_visit_counts", {}),
)
state = SessionState(
session_id=execution_id,
stream_id=self.stream_id,
@@ -613,6 +657,8 @@ class ExecutionStream:
started_at=ctx.started_at.isoformat(),
updated_at=now,
),
progress=progress,
memory=ss.get("memory", {}),
input_data=ctx.input_data,
)
@@ -629,20 +675,35 @@ class ExecutionStream:
logger.error(f"Failed to write state.json for {execution_id}: {e}")
def _create_modified_graph(self) -> "GraphSpec":
"""Create a graph with the entry point overridden."""
# Use the existing graph but override entry_node
"""Create a graph with the entry point overridden.
Preserves the original graph's entry_points and async_entry_points
so that validation correctly considers ALL entry nodes reachable.
Each stream only executes from its own entry_node, but the full
graph must validate with all entry points accounted for.
"""
from framework.graph.edge import GraphSpec
# Create a copy with modified entry node
# Merge entry points: this stream's entry + original graph's primary
# entry + any other entry points. This ensures all nodes are
# reachable during validation even though this stream only starts
# from self.entry_spec.entry_node.
merged_entry_points = {
"start": self.entry_spec.entry_node,
}
# Preserve the original graph's primary entry node
if self.graph.entry_node:
merged_entry_points["primary"] = self.graph.entry_node
# Include any explicitly defined entry points from the graph
merged_entry_points.update(self.graph.entry_points)
return GraphSpec(
id=self.graph.id,
goal_id=self.graph.goal_id,
version=self.graph.version,
entry_node=self.entry_spec.entry_node, # Use our entry point
entry_points={
"start": self.entry_spec.entry_node,
**self.graph.entry_points,
},
entry_points=merged_entry_points,
async_entry_points=self.graph.async_entry_points,
terminal_nodes=self.graph.terminal_nodes,
pause_nodes=self.graph.pause_nodes,
nodes=self.graph.nodes,
@@ -651,6 +712,9 @@ class ExecutionStream:
max_tokens=self.graph.max_tokens,
max_steps=self.graph.max_steps,
cleanup_llm_model=self.graph.cleanup_llm_model,
loop_config=self.graph.loop_config,
conversation_mode=self.graph.conversation_mode,
identity_prompt=self.graph.identity_prompt,
)
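# Sketch of the merge above with illustrative node ids: a stream whose
# entry node is "gh-handler-node" on a graph whose primary entry is
# "intake" yields
#
#   {"start": "gh-handler-node", "primary": "intake", **self.graph.entry_points}
#
# so validation still treats every entry node as reachable.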
async def wait_for_completion(
@@ -30,14 +30,14 @@ class NodeStepLog(BaseModel):
"""Full tool and LLM details for one step within a node.
For EventLoopNode, each iteration is a step. For single-step nodes
(LLMNode, FunctionNode, RouterNode), step_index is 0.
(e.g. RouterNode), step_index is 0.
OTel-aligned fields (trace_id, span_id, execution_id) enable correlation
and future OpenTelemetry export without schema changes.
"""
node_id: str
node_type: str = "" # "event_loop"|"llm_tool_use"|"llm_generate"|"function"|"router"
node_type: str = "" # "event_loop" (the only valid type)
step_index: int = 0 # iteration number for event_loop, 0 for single-step nodes
llm_text: str = ""
tool_calls: list[ToolCallLog] = Field(default_factory=list)
@@ -64,7 +64,7 @@ def sample_graph():
id="process-webhook",
name="Process Webhook",
description="Process incoming webhook",
node_type="llm_generate",
node_type="event_loop",
input_keys=["webhook_data"],
output_keys=["result"],
),
@@ -72,7 +72,7 @@ def sample_graph():
id="process-api",
name="Process API Request",
description="Process API request",
node_type="llm_generate",
node_type="event_loop",
input_keys=["request_data"],
output_keys=["result"],
),
@@ -538,7 +538,7 @@ class TestGraphSpecValidation:
id="valid-node",
name="Valid Node",
description="A valid node",
node_type="llm_generate",
node_type="event_loop",
input_keys=[],
output_keys=[],
),
@@ -0,0 +1,717 @@
"""
Tests for WebhookServer and event-driven entry points.
"""
import asyncio
import hashlib
import hmac as hmac_mod
import json
import tempfile
from pathlib import Path
from unittest.mock import patch
import aiohttp
import pytest
from framework.runtime.agent_runtime import AgentRuntime, AgentRuntimeConfig
from framework.runtime.event_bus import AgentEvent, EventBus, EventType
from framework.runtime.execution_stream import EntryPointSpec
from framework.runtime.webhook_server import (
WebhookRoute,
WebhookServer,
WebhookServerConfig,
)
def _make_server(event_bus: EventBus, routes: list[WebhookRoute] | None = None):
"""Helper to create a WebhookServer with port=0 for OS-assigned port."""
config = WebhookServerConfig(host="127.0.0.1", port=0)
server = WebhookServer(event_bus, config)
for route in routes or []:
server.add_route(route)
return server
def _base_url(server: WebhookServer) -> str:
"""Get the base URL for a running server."""
return f"http://127.0.0.1:{server.port}"
class TestWebhookServerLifecycle:
"""Tests for server start/stop."""
@pytest.mark.asyncio
async def test_start_stop(self):
bus = EventBus()
server = _make_server(
bus,
[
WebhookRoute(source_id="test", path="/webhooks/test", methods=["POST"]),
],
)
await server.start()
assert server.is_running
assert server.port is not None
await server.stop()
assert not server.is_running
assert server.port is None
@pytest.mark.asyncio
async def test_no_routes_skips_start(self):
bus = EventBus()
server = _make_server(bus) # no routes
await server.start()
assert not server.is_running
@pytest.mark.asyncio
async def test_stop_when_not_started(self):
bus = EventBus()
server = _make_server(bus)
# Should be a no-op, not raise
await server.stop()
assert not server.is_running
class TestWebhookEventPublishing:
"""Tests for HTTP request -> EventBus event publishing."""
@pytest.mark.asyncio
async def test_post_publishes_webhook_received(self):
bus = EventBus()
received = []
async def handler(event):
received.append(event)
bus.subscribe([EventType.WEBHOOK_RECEIVED], handler)
server = _make_server(
bus,
[
WebhookRoute(source_id="gh", path="/webhooks/github", methods=["POST"]),
],
)
await server.start()
try:
async with aiohttp.ClientSession() as session:
async with session.post(
f"{_base_url(server)}/webhooks/github",
json={"action": "opened", "number": 42},
) as resp:
assert resp.status == 202
body = await resp.json()
assert body["status"] == "accepted"
# Give event bus time to dispatch
await asyncio.sleep(0.05)
assert len(received) == 1
event = received[0]
assert event.type == EventType.WEBHOOK_RECEIVED
assert event.stream_id == "gh"
assert event.data["path"] == "/webhooks/github"
assert event.data["method"] == "POST"
assert event.data["payload"] == {"action": "opened", "number": 42}
assert isinstance(event.data["headers"], dict)
assert event.data["query_params"] == {}
finally:
await server.stop()
@pytest.mark.asyncio
async def test_query_params_included(self):
bus = EventBus()
received = []
async def handler(event):
received.append(event)
bus.subscribe([EventType.WEBHOOK_RECEIVED], handler)
server = _make_server(
bus,
[
WebhookRoute(source_id="hook", path="/webhooks/hook", methods=["POST"]),
],
)
await server.start()
try:
async with aiohttp.ClientSession() as session:
async with session.post(
f"{_base_url(server)}/webhooks/hook?source=test&v=2",
json={"data": "hello"},
) as resp:
assert resp.status == 202
await asyncio.sleep(0.05)
assert len(received) == 1
assert received[0].data["query_params"] == {"source": "test", "v": "2"}
finally:
await server.stop()
@pytest.mark.asyncio
async def test_non_json_body(self):
bus = EventBus()
received = []
async def handler(event):
received.append(event)
bus.subscribe([EventType.WEBHOOK_RECEIVED], handler)
server = _make_server(
bus,
[
WebhookRoute(source_id="raw", path="/webhooks/raw", methods=["POST"]),
],
)
await server.start()
try:
async with aiohttp.ClientSession() as session:
async with session.post(
f"{_base_url(server)}/webhooks/raw",
data=b"plain text body",
headers={"Content-Type": "text/plain"},
) as resp:
assert resp.status == 202
await asyncio.sleep(0.05)
assert len(received) == 1
assert received[0].data["payload"] == {"raw_body": "plain text body"}
finally:
await server.stop()
@pytest.mark.asyncio
async def test_empty_body(self):
bus = EventBus()
received = []
async def handler(event):
received.append(event)
bus.subscribe([EventType.WEBHOOK_RECEIVED], handler)
server = _make_server(
bus,
[
WebhookRoute(source_id="empty", path="/webhooks/empty", methods=["POST"]),
],
)
await server.start()
try:
async with aiohttp.ClientSession() as session:
async with session.post(f"{_base_url(server)}/webhooks/empty") as resp:
assert resp.status == 202
await asyncio.sleep(0.05)
assert len(received) == 1
assert received[0].data["payload"] == {}
finally:
await server.stop()
@pytest.mark.asyncio
async def test_multiple_routes(self):
bus = EventBus()
received = []
async def handler(event):
received.append(event)
bus.subscribe([EventType.WEBHOOK_RECEIVED], handler)
server = _make_server(
bus,
[
WebhookRoute(source_id="a", path="/webhooks/a", methods=["POST"]),
WebhookRoute(source_id="b", path="/webhooks/b", methods=["POST"]),
],
)
await server.start()
try:
async with aiohttp.ClientSession() as session:
async with session.post(
f"{_base_url(server)}/webhooks/a", json={"from": "a"}
) as resp:
assert resp.status == 202
async with session.post(
f"{_base_url(server)}/webhooks/b", json={"from": "b"}
) as resp:
assert resp.status == 202
await asyncio.sleep(0.05)
assert len(received) == 2
stream_ids = {e.stream_id for e in received}
assert stream_ids == {"a", "b"}
finally:
await server.stop()
@pytest.mark.asyncio
async def test_filter_stream_subscription(self):
"""Subscribers can filter by stream_id (source_id)."""
bus = EventBus()
a_events = []
b_events = []
async def handle_a(event):
a_events.append(event)
async def handle_b(event):
b_events.append(event)
bus.subscribe([EventType.WEBHOOK_RECEIVED], handle_a, filter_stream="a")
bus.subscribe([EventType.WEBHOOK_RECEIVED], handle_b, filter_stream="b")
server = _make_server(
bus,
[
WebhookRoute(source_id="a", path="/webhooks/a", methods=["POST"]),
WebhookRoute(source_id="b", path="/webhooks/b", methods=["POST"]),
],
)
await server.start()
try:
async with aiohttp.ClientSession() as session:
await session.post(f"{_base_url(server)}/webhooks/a", json={"x": 1})
await session.post(f"{_base_url(server)}/webhooks/b", json={"x": 2})
await asyncio.sleep(0.05)
assert len(a_events) == 1
assert a_events[0].data["payload"] == {"x": 1}
assert len(b_events) == 1
assert b_events[0].data["payload"] == {"x": 2}
finally:
await server.stop()
class TestHMACVerification:
"""Tests for HMAC-SHA256 signature verification."""
@pytest.mark.asyncio
async def test_valid_signature_accepted(self):
bus = EventBus()
received = []
async def handler(event):
received.append(event)
bus.subscribe([EventType.WEBHOOK_RECEIVED], handler)
secret = "test-secret-key"
server = _make_server(
bus,
[
WebhookRoute(
source_id="secure",
path="/webhooks/secure",
methods=["POST"],
secret=secret,
),
],
)
await server.start()
try:
body = json.dumps({"event": "push"}).encode()
sig = hmac_mod.new(secret.encode(), body, hashlib.sha256).hexdigest()
async with aiohttp.ClientSession() as session:
async with session.post(
f"{_base_url(server)}/webhooks/secure",
data=body,
headers={
"Content-Type": "application/json",
"X-Hub-Signature-256": f"sha256={sig}",
},
) as resp:
assert resp.status == 202
await asyncio.sleep(0.05)
assert len(received) == 1
finally:
await server.stop()
@pytest.mark.asyncio
async def test_invalid_signature_rejected(self):
bus = EventBus()
received = []
async def handler(event):
received.append(event)
bus.subscribe([EventType.WEBHOOK_RECEIVED], handler)
server = _make_server(
bus,
[
WebhookRoute(
source_id="secure",
path="/webhooks/secure",
methods=["POST"],
secret="real-secret",
),
],
)
await server.start()
try:
async with aiohttp.ClientSession() as session:
async with session.post(
f"{_base_url(server)}/webhooks/secure",
json={"event": "push"},
headers={"X-Hub-Signature-256": "sha256=invalidsignature"},
) as resp:
assert resp.status == 401
await asyncio.sleep(0.05)
assert len(received) == 0 # No event published
finally:
await server.stop()
@pytest.mark.asyncio
async def test_missing_signature_rejected(self):
bus = EventBus()
received = []
async def handler(event):
received.append(event)
bus.subscribe([EventType.WEBHOOK_RECEIVED], handler)
server = _make_server(
bus,
[
WebhookRoute(
source_id="secure",
path="/webhooks/secure",
methods=["POST"],
secret="my-secret",
),
],
)
await server.start()
try:
async with aiohttp.ClientSession() as session:
# No X-Hub-Signature-256 header
async with session.post(
f"{_base_url(server)}/webhooks/secure",
json={"event": "push"},
) as resp:
assert resp.status == 401
await asyncio.sleep(0.05)
assert len(received) == 0
finally:
await server.stop()
@pytest.mark.asyncio
async def test_no_secret_skips_verification(self):
"""Routes without a secret accept any request."""
bus = EventBus()
received = []
async def handler(event):
received.append(event)
bus.subscribe([EventType.WEBHOOK_RECEIVED], handler)
server = _make_server(
bus,
[
WebhookRoute(
source_id="open",
path="/webhooks/open",
methods=["POST"],
secret=None,
),
],
)
await server.start()
try:
async with aiohttp.ClientSession() as session:
async with session.post(
f"{_base_url(server)}/webhooks/open",
json={"data": "test"},
) as resp:
assert resp.status == 202
await asyncio.sleep(0.05)
assert len(received) == 1
finally:
await server.stop()
class TestEventDrivenEntryPoints:
"""Tests for event-driven entry points wired through AgentRuntime."""
def _make_graph_and_goal(self):
"""Minimal graph + goal for testing entry point triggering."""
from framework.graph import Goal
from framework.graph.edge import GraphSpec
from framework.graph.goal import SuccessCriterion
from framework.graph.node import NodeSpec
nodes = [
NodeSpec(
id="process-event",
name="Process Event",
description="Process incoming event",
node_type="event_loop",
input_keys=["event"],
output_keys=["result"],
),
]
graph = GraphSpec(
id="test-graph",
goal_id="test-goal",
version="1.0.0",
entry_node="process-event",
entry_points={"start": "process-event"},
async_entry_points=[],
terminal_nodes=[],
pause_nodes=[],
nodes=nodes,
edges=[],
)
goal = Goal(
id="test-goal",
name="Test Goal",
description="Test",
success_criteria=[
SuccessCriterion(
id="sc-1",
description="Done",
metric="done",
target="yes",
weight=1.0,
),
],
)
return graph, goal
@pytest.mark.asyncio
async def test_event_entry_point_subscribes_to_bus(self):
"""Entry point with trigger_type='event' subscribes and triggers on matching events."""
graph, goal = self._make_graph_and_goal()
config = AgentRuntimeConfig(
webhook_host="127.0.0.1",
webhook_port=0,
webhook_routes=[
{"source_id": "gh", "path": "/webhooks/github"},
],
)
with tempfile.TemporaryDirectory() as tmpdir:
runtime = AgentRuntime(
graph=graph,
goal=goal,
storage_path=Path(tmpdir),
config=config,
)
runtime.register_entry_point(
EntryPointSpec(
id="gh-handler",
name="GitHub Handler",
entry_node="process-event",
trigger_type="event",
trigger_config={
"event_types": ["webhook_received"],
"filter_stream": "gh",
},
)
)
trigger_calls = []
async def mock_trigger(ep_id, data, **kwargs):
trigger_calls.append((ep_id, data))
with patch.object(runtime, "trigger", side_effect=mock_trigger):
await runtime.start()
try:
assert runtime.webhook_server is not None
assert runtime.webhook_server.is_running
port = runtime.webhook_server.port
async with aiohttp.ClientSession() as session:
async with session.post(
f"http://127.0.0.1:{port}/webhooks/github",
json={"action": "push", "ref": "main"},
) as resp:
assert resp.status == 202
await asyncio.sleep(0.1)
assert len(trigger_calls) == 1
ep_id, data = trigger_calls[0]
assert ep_id == "gh-handler"
assert "event" in data
assert data["event"]["type"] == "webhook_received"
assert data["event"]["stream_id"] == "gh"
assert data["event"]["data"]["payload"] == {
"action": "push",
"ref": "main",
}
finally:
await runtime.stop()
assert runtime.webhook_server is None
@pytest.mark.asyncio
async def test_event_entry_point_filter_stream(self):
"""Entry point only triggers for matching stream_id (source_id)."""
graph, goal = self._make_graph_and_goal()
config = AgentRuntimeConfig(
webhook_routes=[
{"source_id": "github", "path": "/webhooks/github"},
{"source_id": "stripe", "path": "/webhooks/stripe"},
],
webhook_port=0,
)
with tempfile.TemporaryDirectory() as tmpdir:
runtime = AgentRuntime(
graph=graph,
goal=goal,
storage_path=Path(tmpdir),
config=config,
)
runtime.register_entry_point(
EntryPointSpec(
id="gh-only",
name="GitHub Only",
entry_node="process-event",
trigger_type="event",
trigger_config={
"event_types": ["webhook_received"],
"filter_stream": "github",
},
)
)
trigger_calls = []
async def mock_trigger(ep_id, data, **kwargs):
trigger_calls.append((ep_id, data))
with patch.object(runtime, "trigger", side_effect=mock_trigger):
await runtime.start()
try:
port = runtime.webhook_server.port
async with aiohttp.ClientSession() as session:
# POST to stripe — should NOT trigger
await session.post(
f"http://127.0.0.1:{port}/webhooks/stripe",
json={"type": "payment"},
)
# POST to github — should trigger
await session.post(
f"http://127.0.0.1:{port}/webhooks/github",
json={"action": "opened"},
)
await asyncio.sleep(0.1)
assert len(trigger_calls) == 1
assert trigger_calls[0][0] == "gh-only"
finally:
await runtime.stop()
@pytest.mark.asyncio
async def test_no_webhook_routes_skips_server(self):
"""Runtime without webhook_routes does not start a webhook server."""
graph, goal = self._make_graph_and_goal()
with tempfile.TemporaryDirectory() as tmpdir:
runtime = AgentRuntime(
graph=graph,
goal=goal,
storage_path=Path(tmpdir),
)
runtime.register_entry_point(
EntryPointSpec(
id="manual",
name="Manual",
entry_node="process-event",
trigger_type="manual",
)
)
await runtime.start()
try:
assert runtime.webhook_server is None
finally:
await runtime.stop()
@pytest.mark.asyncio
async def test_event_entry_point_custom_event(self):
"""Entry point can subscribe to CUSTOM events, not just webhooks."""
graph, goal = self._make_graph_and_goal()
with tempfile.TemporaryDirectory() as tmpdir:
runtime = AgentRuntime(
graph=graph,
goal=goal,
storage_path=Path(tmpdir),
)
runtime.register_entry_point(
EntryPointSpec(
id="custom-handler",
name="Custom Handler",
entry_node="process-event",
trigger_type="event",
trigger_config={
"event_types": ["custom"],
},
)
)
trigger_calls = []
async def mock_trigger(ep_id, data, **kwargs):
trigger_calls.append((ep_id, data))
with patch.object(runtime, "trigger", side_effect=mock_trigger):
await runtime.start()
try:
await runtime.event_bus.publish(
AgentEvent(
type=EventType.CUSTOM,
stream_id="some-source",
data={"key": "value"},
)
)
await asyncio.sleep(0.1)
assert len(trigger_calls) == 1
assert trigger_calls[0][0] == "custom-handler"
assert trigger_calls[0][1]["event"]["type"] == "custom"
assert trigger_calls[0][1]["event"]["data"]["key"] == "value"
finally:
await runtime.stop()
+175
@@ -0,0 +1,175 @@
"""
Webhook HTTP Server - Receives HTTP requests and publishes them as EventBus events.
Only starts if webhook-type entry points are registered. Uses aiohttp for
a lightweight embedded HTTP server that runs within the existing asyncio loop.
"""
import hashlib
import hmac
import json
import logging
from dataclasses import dataclass
from aiohttp import web
from framework.runtime.event_bus import EventBus
logger = logging.getLogger(__name__)
@dataclass
class WebhookRoute:
"""A registered webhook route derived from an EntryPointSpec."""
source_id: str
path: str
methods: list[str]
secret: str | None = None # For HMAC-SHA256 signature verification
@dataclass
class WebhookServerConfig:
"""Configuration for the webhook HTTP server."""
host: str = "127.0.0.1"
port: int = 8080
class WebhookServer:
"""
Embedded HTTP server that receives webhook requests and publishes
them as WEBHOOK_RECEIVED events on the EventBus.
The server's only job is: receive HTTP -> publish AgentEvent.
Subscribers decide what to do with the event.
Lifecycle:
server = WebhookServer(event_bus, config)
server.add_route(WebhookRoute(...))
await server.start()
# ... server running ...
await server.stop()
"""
def __init__(
self,
event_bus: EventBus,
config: WebhookServerConfig | None = None,
):
self._event_bus = event_bus
self._config = config or WebhookServerConfig()
self._routes: dict[str, WebhookRoute] = {} # path -> route
self._app: web.Application | None = None
self._runner: web.AppRunner | None = None
self._site: web.TCPSite | None = None
def add_route(self, route: WebhookRoute) -> None:
"""Register a webhook route."""
self._routes[route.path] = route
async def start(self) -> None:
"""Start the HTTP server. No-op if no routes registered."""
if not self._routes:
logger.debug("No webhook routes registered, skipping server start")
return
self._app = web.Application()
for path, route in self._routes.items():
for method in route.methods:
self._app.router.add_route(method, path, self._handle_request)
self._runner = web.AppRunner(self._app)
await self._runner.setup()
self._site = web.TCPSite(
self._runner,
self._config.host,
self._config.port,
)
await self._site.start()
logger.info(
f"Webhook server started on {self._config.host}:{self._config.port} "
f"with {len(self._routes)} route(s)"
)
async def stop(self) -> None:
"""Stop the HTTP server gracefully."""
if self._runner:
await self._runner.cleanup()
self._runner = None
self._app = None
self._site = None
logger.info("Webhook server stopped")
async def _handle_request(self, request: web.Request) -> web.Response:
"""Handle an incoming webhook request."""
path = request.path
route = self._routes.get(path)
if route is None:
return web.json_response({"error": "Not found"}, status=404)
# Read body
try:
body = await request.read()
except Exception:
return web.json_response(
{"error": "Failed to read request body"},
status=400,
)
# Verify HMAC signature if secret is configured
if route.secret:
if not self._verify_signature(request, body, route.secret):
return web.json_response({"error": "Invalid signature"}, status=401)
# Parse body as JSON (fall back to raw text for non-JSON)
try:
payload = json.loads(body) if body else {}
except (json.JSONDecodeError, ValueError):
payload = {"raw_body": body.decode("utf-8", errors="replace")}
# Publish event to bus
await self._event_bus.emit_webhook_received(
source_id=route.source_id,
path=path,
method=request.method,
headers=dict(request.headers),
payload=payload,
query_params=dict(request.query),
)
return web.json_response({"status": "accepted"}, status=202)
def _verify_signature(
self,
request: web.Request,
body: bytes,
secret: str,
) -> bool:
"""Verify HMAC-SHA256 signature from X-Hub-Signature-256 header."""
signature_header = request.headers.get("X-Hub-Signature-256", "")
if not signature_header.startswith("sha256="):
return False
expected_sig = signature_header[7:] # strip "sha256="
computed_sig = hmac.new(
secret.encode("utf-8"),
body,
hashlib.sha256,
).hexdigest()
return hmac.compare_digest(expected_sig, computed_sig)
@property
def is_running(self) -> bool:
"""Check if the server is running."""
return self._site is not None
@property
def port(self) -> int | None:
"""Return the actual listening port (useful when configured with port=0)."""
if self._site and self._site._server and self._site._server.sockets:
return self._site._server.sockets[0].getsockname()[1]
return None
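# A minimal sender-side sketch matching _verify_signature above: sign the
# raw request body with HMAC-SHA256 and send the hex digest in the
# X-Hub-Signature-256 header. The URL, port, and secret are illustrative.
import asyncio
import hashlib
import hmac
import json

import aiohttp


async def send_signed_webhook() -> None:
    secret = "my-secret"  # must match the WebhookRoute's configured secret
    body = json.dumps({"event": "push"}).encode()
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "http://127.0.0.1:8080/webhooks/secure",
            data=body,
            headers={
                "Content-Type": "application/json",
                "X-Hub-Signature-256": f"sha256={digest}",
            },
        ) as resp:
            assert resp.status == 202  # accepted and published to the bus


# asyncio.run(send_signed_webhook())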
+17 -5
@@ -156,8 +156,13 @@ class SessionState(BaseModel):
@computed_field
@property
def is_resumable(self) -> bool:
"""Can this session be resumed?"""
return self.status == SessionStatus.PAUSED and self.progress.resume_from is not None
"""Can this session be resumed?
Every non-completed session is resumable. If resume_from/paused_at
aren't set, the executor falls back to the graph entry point —
so we don't gate on those. Even catastrophic failures are resumable.
"""
return self.status != SessionStatus.COMPLETED
@computed_field
@property
@@ -279,9 +284,16 @@ class SessionState(BaseModel):
def to_session_state_dict(self) -> dict[str, Any]:
"""Convert to session_state format for GraphExecutor.execute()."""
# Derive resume target: explicit > last node in path > entry point
resume_from = (
self.progress.resume_from
or self.progress.paused_at
or (self.progress.path[-1] if self.progress.path else None)
)
return {
"paused_at": self.progress.paused_at,
"resume_from": self.progress.resume_from,
"paused_at": resume_from,
"resume_from": resume_from,
"memory": self.memory,
"next_node": None,
"execution_path": self.progress.path,
"node_visit_counts": self.progress.node_visit_counts,
}
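# Worked example (node names illustrative): with resume_from=None,
# paused_at=None, and path=["intake", "triage"], the derived resume target
# is "triage" (last node in the path); only when the path is also empty
# does the executor fall back to the graph entry point.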
+137 -47
@@ -1,18 +1,18 @@
import logging
import platform
import subprocess
import threading
import time
from textual.app import App, ComposeResult
from textual.binding import Binding
from textual.containers import Container, Horizontal, Vertical
from textual.widgets import Footer, Input, Label
from textual.containers import Container, Horizontal
from textual.widgets import Footer, Label
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import AgentEvent, EventType
from framework.tui.widgets.chat_repl import ChatRepl
from framework.tui.widgets.graph_view import GraphOverview
from framework.tui.widgets.log_pane import LogPane
from framework.tui.widgets.selectable_rich_log import SelectableRichLog
@@ -136,28 +136,15 @@ class AdenTUI(App):
background: $surface;
}
#left-pane {
width: 60%;
height: 100%;
layout: vertical;
background: $surface;
}
GraphOverview {
height: 40%;
width: 40%;
height: 100%;
background: $panel;
padding: 0;
}
LogPane {
height: 60%;
background: $surface;
padding: 0;
margin-bottom: 1;
}
ChatRepl {
width: 40%;
width: 60%;
height: 100%;
background: $panel;
border-left: tall $primary;
@@ -180,13 +167,13 @@ class AdenTUI(App):
scrollbar-color: $primary;
}
Input {
ChatTextArea {
background: $surface;
border: tall $primary;
margin-top: 1;
}
Input:focus {
ChatTextArea:focus {
border: tall $accent;
}
@@ -208,8 +195,10 @@ class AdenTUI(App):
Binding("ctrl+c", "ctrl_c", "Interrupt", show=False, priority=True),
Binding("super+c", "ctrl_c", "Copy", show=False, priority=True),
Binding("ctrl+s", "screenshot", "Screenshot (SVG)", show=True, priority=True),
Binding("ctrl+l", "toggle_logs", "Toggle Logs", show=True, priority=True),
Binding("ctrl+z", "pause_execution", "Pause", show=True, priority=True),
Binding("ctrl+r", "show_sessions", "Sessions", show=True, priority=True),
Binding("ctrl+p", "attach_pdf", "Attach PDF", show=True, priority=True),
Binding("tab", "focus_next", "Next Panel", show=True),
Binding("shift+tab", "focus_previous", "Previous Panel", show=False),
]
@@ -223,7 +212,6 @@ class AdenTUI(App):
super().__init__()
self.runtime = runtime
self.log_pane = LogPane()
self.graph_view = GraphOverview(runtime)
self.chat_repl = ChatRepl(runtime, resume_session, resume_checkpoint)
self.status_bar = StatusBar(graph_id=runtime.graph.id)
@@ -253,11 +241,7 @@ class AdenTUI(App):
yield self.status_bar
yield Horizontal(
Vertical(
self.log_pane,
self.graph_view,
id="left-pane",
),
self.graph_view,
self.chat_repl,
)
@@ -328,7 +312,7 @@ class AdenTUI(App):
if record.name.startswith(("textual", "LiteLLM", "litellm")):
continue
self.log_pane.write_python_log(record)
self.chat_repl.write_python_log(record)
except Exception:
pass
@@ -350,6 +334,14 @@ class AdenTUI(App):
EventType.CONSTRAINT_VIOLATION,
EventType.STATE_CHANGED,
EventType.NODE_INPUT_BLOCKED,
EventType.CONTEXT_COMPACTED,
EventType.NODE_INTERNAL_OUTPUT,
EventType.JUDGE_VERDICT,
EventType.OUTPUT_KEY_SET,
EventType.NODE_RETRY,
EventType.EDGE_TRAVERSED,
EventType.EXECUTION_PAUSED,
EventType.EXECUTION_RESUMED,
]
_LOG_PANE_EVENTS = frozenset(_EVENT_TYPES) - {
@@ -368,15 +360,36 @@ class AdenTUI(App):
pass
async def _handle_event(self, event: AgentEvent) -> None:
"""Called from the agent thread — bridge to Textual's main thread."""
"""Bridge events to Textual's main thread for UI updates.
Events may arrive from the agent-execution thread (normal LLM/tool
work) or from the Textual thread itself (e.g. webhook server events).
``call_from_thread`` requires a *different* thread, so we detect
which thread we're on and act accordingly.
"""
try:
self.call_from_thread(self._route_event, event)
except Exception:
pass
if threading.get_ident() == self._thread_id:
# Already on Textual's thread — call directly.
self._route_event(event)
else:
# On a different thread — bridge via call_from_thread.
self.call_from_thread(self._route_event, event)
except Exception as e:
logging.getLogger("tui.events").error(
"call_from_thread failed for %s (node=%s): %s",
event.type.value,
event.node_id or "?",
e,
)
def _route_event(self, event: AgentEvent) -> None:
"""Route incoming events to widgets. Runs on Textual's main thread."""
if not self.is_ready:
logging.getLogger("tui.events").warning(
"Event dropped (not ready): %s node=%s",
event.type.value,
event.node_id or "?",
)
return
try:
@@ -407,6 +420,35 @@ class AdenTUI(App):
self.chat_repl.handle_input_requested(
event.node_id or event.data.get("node_id", ""),
)
elif et == EventType.NODE_LOOP_STARTED:
# Track active node in chat_repl for mid-execution input
self.chat_repl.handle_node_started(event.node_id or "")
elif et == EventType.NODE_LOOP_ITERATION:
self.chat_repl.handle_loop_iteration(event.data.get("iteration", 0))
elif et == EventType.NODE_LOOP_COMPLETED:
self.chat_repl.handle_node_completed(event.node_id or "")
# Non-client-facing node output → chat repl
if et == EventType.NODE_INTERNAL_OUTPUT:
content = event.data.get("content", "")
if content.strip():
self.chat_repl.handle_internal_output(event.node_id or "", content)
# Execution paused/resumed → chat repl
if et == EventType.EXECUTION_PAUSED:
reason = event.data.get("reason", "")
self.chat_repl.handle_execution_paused(event.node_id or "", reason)
elif et == EventType.EXECUTION_RESUMED:
self.chat_repl.handle_execution_resumed(event.node_id or "")
# Goal achieved / constraint violation → chat repl
if et == EventType.GOAL_ACHIEVED:
self.chat_repl.handle_goal_achieved(event.data)
elif et == EventType.CONSTRAINT_VIOLATION:
self.chat_repl.handle_constraint_violation(event.data)
# --- Graph view events ---
if et in (
@@ -444,6 +486,13 @@ class AdenTUI(App):
started=False,
)
# Edge traversal → graph view
if et == EventType.EDGE_TRAVERSED:
self.graph_view.handle_edge_traversed(
event.data.get("source_node", ""),
event.data.get("target_node", ""),
)
# --- Status bar events ---
if et == EventType.EXECUTION_STARTED:
entry_node = event.data.get("entry_node") or (
@@ -464,12 +513,36 @@ class AdenTUI(App):
self.status_bar.set_node_detail("thinking...")
elif et == EventType.NODE_STALLED:
self.status_bar.set_node_detail(f"stalled: {event.data.get('reason', '')}")
elif et == EventType.CONTEXT_COMPACTED:
before = event.data.get("usage_before", "?")
after = event.data.get("usage_after", "?")
self.status_bar.set_node_detail(f"compacted: {before}% \u2192 {after}%")
elif et == EventType.JUDGE_VERDICT:
action = event.data.get("action", "?")
self.status_bar.set_node_detail(f"judge: {action}")
elif et == EventType.OUTPUT_KEY_SET:
key = event.data.get("key", "?")
self.status_bar.set_node_detail(f"set: {key}")
elif et == EventType.NODE_RETRY:
retry = event.data.get("retry_count", "?")
max_r = event.data.get("max_retries", "?")
self.status_bar.set_node_detail(f"retry {retry}/{max_r}")
elif et == EventType.EXECUTION_PAUSED:
self.status_bar.set_node_detail("paused")
elif et == EventType.EXECUTION_RESUMED:
self.status_bar.set_node_detail("resumed")
# --- Log pane events ---
# --- Log events (inline in chat) ---
if et in self._LOG_PANE_EVENTS:
self.log_pane.write_event(event)
except Exception:
pass
self.chat_repl.write_log_event(event)
except Exception as e:
logging.getLogger("tui.events").error(
"Route failed for %s (node=%s): %s",
event.type.value,
event.node_id or "?",
e,
exc_info=True,
)
def save_screenshot(self, filename: str | None = None) -> str:
"""Save a screenshot of the current screen as SVG (viewable in browsers).
@@ -504,8 +577,8 @@ class AdenTUI(App):
original_chat_border = chat_widget.styles.border_left
chat_widget.styles.border_left = ("none", "transparent")
# Hide all Input widget borders
input_widgets = self.query("Input")
# Hide all TextArea widget borders
input_widgets = self.query("ChatTextArea")
original_input_borders = []
for input_widget in input_widgets:
original_input_borders.append(input_widget.styles.border)
@@ -535,6 +608,12 @@ class AdenTUI(App):
except Exception as e:
self.notify(f"Screenshot failed: {e}", severity="error", timeout=5)
def action_toggle_logs(self) -> None:
"""Toggle inline log display in chat (bound to Ctrl+L)."""
self.chat_repl.toggle_logs()
mode = "ON" if self.chat_repl._show_logs else "OFF"
self.notify(f"Logs {mode}", severity="information", timeout=2)
def action_pause_execution(self) -> None:
"""Immediately pause execution by cancelling task (bound to Ctrl+Z)."""
try:
@@ -575,19 +654,12 @@ class AdenTUI(App):
timeout=5,
)
def action_show_sessions(self) -> None:
async def action_show_sessions(self) -> None:
"""Show sessions list (bound to Ctrl+R)."""
# Send /sessions command to chat input
try:
chat_repl = self.query_one(ChatRepl)
chat_input = chat_repl.query_one("#chat-input", Input)
chat_input.value = "/sessions"
# Trigger submission
self.notify(
"💡 Type /sessions in the chat to see all sessions",
severity="information",
timeout=3,
)
await chat_repl._submit_input("/sessions")
except Exception:
self.notify(
"Use /sessions command to see all sessions",
@@ -595,6 +667,24 @@ class AdenTUI(App):
timeout=3,
)
async def action_attach_pdf(self) -> None:
"""Open native OS file dialog for PDF selection (bound to Ctrl+P)."""
from framework.tui.widgets.file_browser import _has_gui, pick_pdf_file
if not _has_gui():
self.notify(
"No GUI available. Use /attach <path> instead.",
severity="warning",
timeout=5,
)
return
self.notify("Opening file dialog...", severity="information", timeout=2)
path = await pick_pdf_file()
if path is not None:
self.chat_repl.attach_pdf(path)
async def on_unmount(self) -> None:
"""Cleanup on app shutdown - cancel execution which will save state."""
self.is_ready = False
+247 -31
@@ -15,19 +15,49 @@ Client-facing input:
"""
import asyncio
import logging
import re
import shutil
import threading
from pathlib import Path
from typing import Any
from textual.app import ComposeResult
from textual.containers import Vertical
from textual.widgets import Input, Label
from textual.message import Message
from textual.widgets import Label, TextArea
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import AgentEvent
from framework.tui.widgets.log_pane import format_event, format_python_log
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
class ChatTextArea(TextArea):
"""TextArea that submits on Enter and inserts newlines on Shift+Enter."""
class Submitted(Message):
"""Posted when the user presses Enter."""
def __init__(self, text: str) -> None:
super().__init__()
self.text = text
async def _on_key(self, event) -> None:
if event.key == "enter":
text = self.text.strip()
self.clear()
if text:
self.post_message(self.Submitted(text))
event.stop()
event.prevent_default()
elif event.key == "shift+enter":
# Remap to a plain Enter so TextArea's default handling inserts a newline.
event.key = "enter"
await super()._on_key(event)
else:
await super()._on_key(event)
class ChatRepl(Vertical):
"""Widget for interactive chat/REPL."""
@@ -56,16 +86,17 @@ class ChatRepl(Vertical):
display: none;
}
ChatRepl > Input {
ChatRepl > ChatTextArea {
width: 100%;
height: auto;
max-height: 7;
dock: bottom;
background: $surface;
border: tall $primary;
margin-top: 1;
}
ChatRepl > Input:focus {
ChatRepl > ChatTextArea:focus {
border: tall $accent;
}
"""
@@ -83,9 +114,13 @@ class ChatRepl(Vertical):
self._waiting_for_input: bool = False
self._input_node_id: str | None = None
self._pending_ask_question: str = ""
self._active_node_id: str | None = None # Currently executing node
self._resume_session = resume_session
self._resume_checkpoint = resume_checkpoint
self._session_index: list[str] = [] # IDs from last listing
self._show_logs: bool = False # Clean mode by default
self._log_buffer: list[str] = [] # Buffered log lines for backfill on toggle ON
self._attached_pdf: dict | None = None # Pending PDF attachment for next message
# Dedicated event loop for agent execution.
# Keeps blocking runtime code (LLM calls, MCP tools) off
@@ -108,7 +143,7 @@ class ChatRepl(Vertical):
min_width=0,
)
yield Label("Agent is processing...", id="processing-indicator")
yield Input(placeholder="Enter input for agent...", id="chat-input")
yield ChatTextArea(id="chat-input", placeholder="Enter input for agent...")
# Regex for file:// URIs that are NOT already inside Rich [link=...] markup
_FILE_URI_RE = re.compile(r"(?<!\[link=)(file://[^\s)\]>*]+)")
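# e.g. the URI in "see file:///tmp/report.pdf" matches, while one already
# inside "[link=file:///tmp/report.pdf]" markup is skipped by the negative
# lookbehind.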
@@ -131,6 +166,31 @@ class ChatRepl(Vertical):
if was_at_bottom:
history.scroll_end(animate=False)
def toggle_logs(self) -> None:
"""Toggle inline log display on/off. Backfills buffered logs on toggle ON."""
self._show_logs = not self._show_logs
if self._show_logs and self._log_buffer:
self._write_history("[dim]--- Backfilling logs ---[/dim]")
for line in self._log_buffer:
self._write_history(line)
self._write_history("[dim]--- Live logs ---[/dim]")
mode = "ON (dirty)" if self._show_logs else "OFF (clean)"
self._write_history(f"[dim]Logs {mode}[/dim]")
def write_log_event(self, event: AgentEvent) -> None:
"""Buffer a formatted agent event. Display inline if logs are ON."""
formatted = format_event(event)
self._log_buffer.append(formatted)
if self._show_logs:
self._write_history(formatted)
def write_python_log(self, record: logging.LogRecord) -> None:
"""Buffer a formatted Python log record. Display inline if logs are ON."""
formatted = format_python_log(record)
self._log_buffer.append(formatted)
if self._show_logs:
self._write_history(formatted)
async def _handle_command(self, command: str) -> None:
"""Handle slash commands for session and checkpoint operations."""
parts = command.split(maxsplit=2)
@@ -138,6 +198,9 @@ class ChatRepl(Vertical):
if cmd == "/help":
self._write_history("""[bold cyan]Available Commands:[/bold cyan]
[bold]/attach[/bold] - Open file dialog to attach a PDF
[bold]/attach[/bold] <file_path> - Attach a PDF from a specific path
[bold]/detach[/bold] - Remove the currently attached PDF
[bold]/sessions[/bold] - List all sessions for this agent
[bold]/sessions[/bold] <session_id> - Show session details and checkpoints
[bold]/resume[/bold] - List sessions and pick one to resume
@@ -148,12 +211,11 @@ class ChatRepl(Vertical):
[bold]/help[/bold] - Show this help message
[dim]Examples:[/dim]
/attach [dim]# Open file picker dialog[/dim]
/attach ~/Documents/report.pdf [dim]# Attach a specific PDF[/dim]
/detach [dim]# Remove attached PDF[/dim]
/sessions [dim]# List all sessions[/dim]
/sessions session_20260208_143022 [dim]# Show session details[/dim]
/resume [dim]# Show numbered session list[/dim]
/resume 1 [dim]# Resume first listed session[/dim]
/resume session_20260208_143022 [dim]# Resume by full session ID[/dim]
/recover session_20260208_143022 cp_xxx [dim]# Recover from specific checkpoint[/dim]
/pause [dim]# Pause (or Ctrl+Z)[/dim]
""")
elif cmd == "/sessions":
@@ -194,6 +256,16 @@ class ChatRepl(Vertical):
session_id = parts[1].strip()
checkpoint_id = parts[2].strip()
await self._cmd_recover(session_id, checkpoint_id)
elif cmd == "/attach":
file_path = parts[1].strip() if len(parts) > 1 else None
await self._cmd_attach(file_path)
elif cmd == "/detach":
if self._attached_pdf:
name = self._attached_pdf["filename"]
self._attached_pdf = None
self._write_history(f"[dim]Detached: {name}[/dim]")
else:
self._write_history("[dim]No PDF attached.[/dim]")
elif cmd == "/pause":
await self._cmd_pause()
else:
@@ -202,6 +274,63 @@ class ChatRepl(Vertical):
"Type [bold]/help[/bold] for available commands"
)
def attach_pdf(self, path: Path) -> None:
"""Validate and stage a PDF file for the next message.
Copies the PDF to ~/.hive/assets/ and stores the path. The agent's
pdf_read tool handles text extraction at runtime.
Called by /attach <path> or by the native file dialog.
"""
path = Path(path).expanduser().resolve()
if not path.exists():
self._write_history(f"[bold red]Error:[/bold red] File not found: {path}")
return
if path.suffix.lower() != ".pdf":
self._write_history("[bold red]Error:[/bold red] Only PDF files are supported")
return
# Copy to ~/.hive/assets/, deduplicating like a normal filesystem:
# resume.pdf → resume(1).pdf → resume(2).pdf
assets_dir = Path.home() / ".hive" / "assets"
assets_dir.mkdir(parents=True, exist_ok=True)
dest = assets_dir / path.name
counter = 1
while dest.exists():
dest = assets_dir / f"{path.stem}({counter}){path.suffix}"
counter += 1
shutil.copy2(path, dest)
self._attached_pdf = {
"filename": path.name,
"path": str(dest),
}
self._write_history(f"[green]Attached:[/green] {path.name}")
self._write_history("[dim]PDF will be read by the agent on your next message.[/dim]")
async def _cmd_attach(self, file_path: str | None = None) -> None:
"""Attach a PDF file for context injection into the next message."""
if file_path is None:
from framework.tui.widgets.file_browser import _has_gui, pick_pdf_file
if not _has_gui():
self._write_history(
"[bold yellow]No GUI available.[/bold yellow] "
"Provide a path: [bold]/attach /path/to/file.pdf[/bold]"
)
return
self._write_history("[dim]Opening file dialog...[/dim]")
path = await pick_pdf_file()
if path is not None:
self.attach_pdf(path)
return
self.attach_pdf(Path(file_path))
async def _cmd_sessions(self, session_id: str | None) -> None:
"""List sessions or show details of a specific session."""
try:
@@ -451,6 +580,7 @@ class ChatRepl(Vertical):
if paused_at:
# Has paused_at - resume from there
resume_session_state = {
"resume_session_id": session_id,
"paused_at": paused_at,
"memory": state.get("memory", {}),
"execution_path": progress.get("path", []),
@@ -458,8 +588,13 @@ class ChatRepl(Vertical):
}
resume_info = f"From node: [cyan]{paused_at}[/cyan]"
else:
# No paused_at - just retry with same input
resume_session_state = {}
# No paused_at - retry with same input but reuse session directory
resume_session_state = {
"resume_session_id": session_id,
"memory": state.get("memory", {}),
"execution_path": progress.get("path", []),
"node_visit_counts": progress.get("node_visit_counts", {}),
}
resume_info = "Retrying with same input"
# Display resume info
@@ -485,7 +620,7 @@ class ChatRepl(Vertical):
indicator.display = True
# Update placeholder
chat_input = self.query_one("#chat-input", Input)
chat_input = self.query_one("#chat-input", ChatTextArea)
chat_input.placeholder = "Commands: /pause, /sessions (agent resuming...)"
# Trigger execution with resume state
@@ -563,6 +698,7 @@ class ChatRepl(Vertical):
# Create session_state for checkpoint recovery
recover_session_state = {
"resume_session_id": session_id,
"resume_from_checkpoint": checkpoint_id,
}
@@ -572,7 +708,7 @@ class ChatRepl(Vertical):
indicator.display = True
# Update placeholder
chat_input = self.query_one("#chat-input", Input)
chat_input = self.query_one("#chat-input", ChatTextArea)
chat_input.placeholder = "Commands: /pause, /sessions (agent recovering...)"
# Trigger execution with checkpoint recovery
@@ -739,9 +875,12 @@ class ChatRepl(Vertical):
# Silently fail - don't block TUI startup
pass
async def on_input_submitted(self, message: Input.Submitted) -> None:
"""Handle input submission — either start new execution or inject input."""
user_input = message.value.strip()
async def on_chat_text_area_submitted(self, message: ChatTextArea.Submitted) -> None:
"""Handle chat input submission."""
await self._submit_input(message.text)
async def _submit_input(self, user_input: str) -> None:
"""Handle submitted text — either start new execution or inject input."""
if not user_input:
return
@@ -749,16 +888,14 @@ class ChatRepl(Vertical):
# Commands work during execution, during client-facing input, anytime
if user_input.startswith("/"):
await self._handle_command(user_input)
message.input.value = ""
return
# Client-facing input: route to the waiting node
if self._waiting_for_input and self._input_node_id:
self._write_history(f"[bold green]You:[/bold green] {user_input}")
message.input.value = ""
# Keep input enabled for commands (but change placeholder)
chat_input = self.query_one("#chat-input", Input)
chat_input = self.query_one("#chat-input", ChatTextArea)
chat_input.placeholder = "Commands: /pause, /sessions (agent processing...)"
self._waiting_for_input = False
@@ -778,16 +915,29 @@ class ChatRepl(Vertical):
self._write_history(f"[bold red]Error delivering input:[/bold red] {e}")
return
# Double-submit guard: reject input while an execution is in-flight
# Mid-execution input: inject into the active node's conversation
if self._current_exec_id is not None and self._active_node_id:
self._write_history(f"[bold green]You:[/bold green] {user_input}")
node_id = self._active_node_id
try:
future = asyncio.run_coroutine_threadsafe(
self.runtime.inject_input(node_id, user_input),
self._agent_loop,
)
await asyncio.wrap_future(future)
except Exception as e:
self._write_history(f"[bold red]Error delivering input:[/bold red] {e}")
return
# Double-submit guard: no active node to inject into
if self._current_exec_id is not None:
self._write_history("[dim]Agent is still running — please wait.[/dim]")
return
indicator = self.query_one("#processing-indicator", Label)
# Append user message and clear input
# Append user message
self._write_history(f"[bold green]You:[/bold green] {user_input}")
message.input.value = ""
try:
# Get entry point
@@ -813,9 +963,16 @@ class ChatRepl(Vertical):
indicator.display = True
# Keep input enabled for commands during execution
chat_input = self.query_one("#chat-input", Input)
chat_input = self.query_one("#chat-input", ChatTextArea)
chat_input.placeholder = "Commands available: /pause, /sessions, /help"
# Build input data, injecting attached PDF file path if present
input_data = {input_key: user_input}
if self._attached_pdf:
input_data["pdf_file_path"] = self._attached_pdf["path"]
self._write_history(f"[dim]Including PDF: {self._attached_pdf['filename']}[/dim]")
self._attached_pdf = None
# Submit execution to the dedicated agent loop so blocking
# runtime code (LLM, MCP tools) never touches Textual's loop.
# trigger() returns immediately with an exec_id; the heavy
@@ -823,7 +980,7 @@ class ChatRepl(Vertical):
future = asyncio.run_coroutine_threadsafe(
self.runtime.trigger(
entry_point_id=entry_point.id,
input_data={input_key: user_input},
input_data=input_data,
),
self._agent_loop,
)
@@ -834,12 +991,32 @@ class ChatRepl(Vertical):
indicator.display = False
self._current_exec_id = None
# Re-enable input on error
chat_input = self.query_one("#chat-input", Input)
chat_input = self.query_one("#chat-input", ChatTextArea)
chat_input.disabled = False
self._write_history(f"[bold red]Error:[/bold red] {e}")
# -- Event handlers called by app.py _handle_event --
def handle_node_started(self, node_id: str) -> None:
"""Reset streaming state and track active node when a new node begins.
Flushes any stale ``_streaming_snapshot`` left over from the
previous node and resets the processing indicator so the user
sees a clean transition between graph nodes.
"""
self._active_node_id = node_id
if self._streaming_snapshot:
self._write_history(f"[bold blue]Agent:[/bold blue] {self._streaming_snapshot}")
self._streaming_snapshot = ""
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Thinking...")
def handle_loop_iteration(self, iteration: int) -> None:
"""Flush accumulated streaming text when a new loop iteration starts."""
if self._streaming_snapshot:
self._write_history(f"[bold blue]Agent:[/bold blue] {self._streaming_snapshot}")
self._streaming_snapshot = ""
def handle_text_delta(self, content: str, snapshot: str) -> None:
"""Handle a streaming text token from the LLM."""
self._streaming_snapshot = snapshot
@@ -867,8 +1044,11 @@ class ChatRepl(Vertical):
# Update indicator to show tool activity
indicator.update(f"Using tool: {tool_name}...")
# Write a discrete status line to history
self._write_history(f"[dim]Tool: {tool_name}[/dim]")
# Buffer and conditionally display tool status line
line = f"[dim]Tool: {tool_name}[/dim]"
self._log_buffer.append(line)
if self._show_logs:
self._write_history(line)
def handle_tool_completed(self, tool_name: str, result: str, is_error: bool) -> None:
"""Handle a tool call completing."""
@@ -882,9 +1062,12 @@ class ChatRepl(Vertical):
preview = preview.replace("\n", " ")
if is_error:
self._write_history(f"[dim red]Tool {tool_name} error: {preview}[/dim red]")
line = f"[dim red]Tool {tool_name} error: {preview}[/dim red]"
else:
self._write_history(f"[dim]Tool {tool_name} result: {preview}[/dim]")
line = f"[dim]Tool {tool_name} result: {preview}[/dim]"
self._log_buffer.append(line)
if self._show_logs:
self._write_history(line)
# Restore thinking indicator
indicator = self.query_one("#processing-indicator", Label)
@@ -907,10 +1090,12 @@ class ChatRepl(Vertical):
self._streaming_snapshot = ""
self._waiting_for_input = False
self._input_node_id = None
self._active_node_id = None
self._pending_ask_question = ""
self._log_buffer.clear()
# Re-enable input
chat_input = self.query_one("#chat-input", Input)
chat_input = self.query_one("#chat-input", ChatTextArea)
chat_input.disabled = False
chat_input.placeholder = "Enter input for agent..."
chat_input.focus()
@@ -928,9 +1113,11 @@ class ChatRepl(Vertical):
self._waiting_for_input = False
self._pending_ask_question = ""
self._input_node_id = None
self._active_node_id = None
self._log_buffer.clear()
# Re-enable input
chat_input = self.query_one("#chat-input", Input)
chat_input = self.query_one("#chat-input", ChatTextArea)
chat_input.disabled = False
chat_input.placeholder = "Enter input for agent..."
chat_input.focus()
@@ -961,7 +1148,36 @@ class ChatRepl(Vertical):
indicator = self.query_one("#processing-indicator", Label)
indicator.update("Waiting for your input...")
chat_input = self.query_one("#chat-input", Input)
chat_input = self.query_one("#chat-input", ChatTextArea)
chat_input.disabled = False
chat_input.placeholder = "Type your response..."
chat_input.focus()
def handle_node_completed(self, node_id: str) -> None:
"""Clear active node when it finishes."""
if self._active_node_id == node_id:
self._active_node_id = None
def handle_internal_output(self, node_id: str, content: str) -> None:
"""Show output from non-client-facing nodes."""
self._write_history(f"[dim cyan]⟨{node_id}⟩[/dim cyan] {content}")
def handle_execution_paused(self, node_id: str, reason: str) -> None:
"""Show that execution has been paused."""
msg = f"[bold yellow]⏸ Paused[/bold yellow] at [cyan]{node_id}[/cyan]"
if reason:
msg += f" [dim]({reason})[/dim]"
self._write_history(msg)
def handle_execution_resumed(self, node_id: str) -> None:
"""Show that execution has been resumed."""
self._write_history(f"[bold green]▶ Resumed[/bold green] from [cyan]{node_id}[/cyan]")
def handle_goal_achieved(self, data: dict[str, Any]) -> None:
"""Show goal achievement prominently."""
self._write_history("[bold green]★ Goal achieved![/bold green]")
def handle_constraint_violation(self, data: dict[str, Any]) -> None:
"""Show constraint violation as a warning."""
desc = data.get("description", "Unknown constraint")
self._write_history(f"[bold red]⚠ Constraint violation:[/bold red] {desc}")
+135
@@ -0,0 +1,135 @@
"""
Native OS file dialog for PDF selection.
Launches the platform's native file picker (macOS: NSOpenPanel via osascript,
Linux: zenity/kdialog, Windows: PowerShell OpenFileDialog) in a background
thread so Textual's event loop stays responsive.
Falls back to None when no GUI is available (SSH, headless).
"""
import asyncio
import os
import subprocess
import sys
from pathlib import Path
def _has_gui() -> bool:
"""Detect whether a GUI display is available."""
if sys.platform == "darwin":
# macOS: GUI is available unless running over SSH without display forwarding.
return "SSH_CONNECTION" not in os.environ or "DISPLAY" in os.environ
elif sys.platform == "win32":
return True
else:
# Linux/BSD: Need X11 or Wayland.
return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))
def _linux_file_dialog() -> subprocess.CompletedProcess | None:
"""Try zenity, then kdialog, on Linux. Returns CompletedProcess or None."""
# Try zenity (GTK)
try:
return subprocess.run(
[
"zenity",
"--file-selection",
"--title=Select a PDF file",
"--file-filter=PDF files (*.pdf)|*.pdf",
],
capture_output=True,
text=True,
timeout=300,
)
except FileNotFoundError:
pass
# Try kdialog (KDE)
try:
return subprocess.run(
[
"kdialog",
"--getopenfilename",
".",
"PDF files (*.pdf)",
],
capture_output=True,
text=True,
timeout=300,
)
except FileNotFoundError:
pass
return None
def _pick_pdf_subprocess() -> Path | None:
"""Run the native file dialog. BLOCKS until user picks or cancels.
Returns a Path on success, None on cancel or error.
Must be called from a non-main thread (via asyncio.to_thread).
"""
try:
if sys.platform == "darwin":
result = subprocess.run(
[
"osascript",
"-e",
'POSIX path of (choose file of type {"com.adobe.pdf"} '
'with prompt "Select a PDF file")',
],
capture_output=True,
text=True,
timeout=300,
)
elif sys.platform == "win32":
ps_script = (
"Add-Type -AssemblyName System.Windows.Forms; "
"$f = New-Object System.Windows.Forms.OpenFileDialog; "
"$f.Filter = 'PDF files (*.pdf)|*.pdf'; "
"$f.Title = 'Select a PDF file'; "
"if ($f.ShowDialog() -eq 'OK') { $f.FileName }"
)
result = subprocess.run(
["powershell", "-NoProfile", "-Command", ps_script],
capture_output=True,
text=True,
timeout=300,
)
else:
result = _linux_file_dialog()
if result is None:
return None
if result.returncode != 0:
return None
path_str = result.stdout.strip()
if not path_str:
return None
path = Path(path_str)
if path.is_file() and path.suffix.lower() == ".pdf":
return path
return None
except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
return None
async def pick_pdf_file() -> Path | None:
"""Open a native OS file dialog to pick a PDF file.
Non-blocking: runs the dialog subprocess in a background thread via
asyncio.to_thread(), so the calling event loop stays responsive.
Returns:
Path to the selected PDF, or None if the user cancelled,
no GUI is available, or the dialog command was not found.
"""
if not _has_gui():
return None
return await asyncio.to_thread(_pick_pdf_subprocess)
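Because the blocking dialog runs on a worker thread, pick_pdf_file() is safe to await from any coroutine. A minimal standalone usage sketch:

import asyncio
from pathlib import Path

async def main() -> None:
    pdf: Path | None = await pick_pdf_file()  # defined above
    print(pdf if pdf else "no selection (cancelled or no GUI)")

asyncio.run(main())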
+384 -12
@@ -1,7 +1,15 @@
"""
Graph/Tree Overview Widget - Displays real agent graph structure.
Supports rendering loops (back-edges) via right-side return channels:
arrows drawn on the right margin that visually point back up to earlier nodes.
"""
from __future__ import annotations
import re
import time
from textual.app import ComposeResult
from textual.containers import Vertical
@@ -9,6 +17,17 @@ from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import EventType
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
# Width of each return-channel column (padding + │ + gap)
_CHANNEL_WIDTH = 5
# Regex to strip Rich markup tags for measuring visible width
_MARKUP_RE = re.compile(r"\[/?[^\]]*\]")
def _plain_len(s: str) -> int:
"""Return the visible character length of a Rich-markup string."""
return len(_MARKUP_RE.sub("", s))
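# Illustrative example (not from the diff): _plain_len("[bold cyan]Agent[/bold cyan]")
# returns 5, because markup tags are stripped before measuring, so all padding
# math below operates on visible width only.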
class GraphOverview(Vertical):
"""Widget to display Agent execution graph/tree with real data."""
@@ -46,6 +65,13 @@ class GraphOverview(Vertical):
def on_mount(self) -> None:
"""Display initial graph structure."""
self._display_graph()
# Refresh every 1s so timer countdowns stay current
if self.runtime._timer_next_fire is not None:
self.set_interval(1.0, self._display_graph)
# ------------------------------------------------------------------
# Graph analysis helpers
# ------------------------------------------------------------------
def _topo_order(self) -> list[str]:
"""BFS from entry_node following edges."""
@@ -68,6 +94,39 @@ class GraphOverview(Vertical):
visited.append(node.id)
return visited
def _detect_back_edges(self, ordered: list[str]) -> list[dict]:
"""Find edges where target appears before (or equal to) source in topo order.
Returns a list of dicts with keys: edge, source, target, source_idx, target_idx.
"""
order_idx = {nid: i for i, nid in enumerate(ordered)}
back_edges: list[dict] = []
for node_id in ordered:
for edge in self.runtime.graph.get_outgoing_edges(node_id):
target_idx = order_idx.get(edge.target, -1)
source_idx = order_idx.get(node_id, -1)
if target_idx != -1 and target_idx <= source_idx:
back_edges.append(
{
"edge": edge,
"source": node_id,
"target": edge.target,
"source_idx": source_idx,
"target_idx": target_idx,
}
)
return back_edges
def _is_back_edge(self, source: str, target: str, order_idx: dict[str, int]) -> bool:
"""Check whether an edge from *source* to *target* is a back-edge."""
si = order_idx.get(source, -1)
ti = order_idx.get(target, -1)
return ti != -1 and ti <= si
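# Illustrative example (node ids assumed): with topo order [plan, act, review],
# an edge review -> act has target_idx 1 <= source_idx 2, so it is reported as
# a back-edge and drawn as a right-margin return channel in Pass 2.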
# ------------------------------------------------------------------
# Line rendering (Pass 1)
# ------------------------------------------------------------------
def _render_node_line(self, node_id: str) -> str:
"""Render a single node with status symbol and optional status text."""
graph = self.runtime.graph
@@ -95,43 +154,349 @@ class GraphOverview(Vertical):
suffix = f" [italic]{status}[/italic]" if status else ""
return f" {sym} {name}{suffix}"
def _render_edges(self, node_id: str) -> list[str]:
"""Render edge connectors from this node to its targets."""
edges = self.runtime.graph.get_outgoing_edges(node_id)
if not edges:
def _render_edges(self, node_id: str, order_idx: dict[str, int]) -> list[str]:
"""Render forward-edge connectors from *node_id*.
Back-edges are excluded here; they are drawn by the return-channel
overlay in Pass 2.
"""
all_edges = self.runtime.graph.get_outgoing_edges(node_id)
if not all_edges:
return []
if len(edges) == 1:
# Split into forward and back
forward = [e for e in all_edges if not self._is_back_edge(node_id, e.target, order_idx)]
if not forward:
# All edges are back-edges — nothing to render here
return []
if len(forward) == 1:
return ["", ""]
# Fan-out: show branches
lines: list[str] = []
for i, edge in enumerate(edges):
connector = "" if i == len(edges) - 1 else ""
for i, edge in enumerate(forward):
connector = "" if i == len(forward) - 1 else ""
cond = ""
if edge.condition.value not in ("always", "on_success"):
cond = f" [dim]({edge.condition.value})[/dim]"
lines.append(f" {connector}──▶ {edge.target}{cond}")
return lines
# ------------------------------------------------------------------
# Return-channel overlay (Pass 2)
# ------------------------------------------------------------------
def _overlay_return_channels(
self,
lines: list[str],
node_line_map: dict[str, int],
back_edges: list[dict],
available_width: int,
) -> list[str]:
"""Overlay right-side return channels onto the line buffer.
Each back-edge gets a vertical channel on the right margin. Channels
are allocated left-to-right by increasing span length so that shorter
(inner) loops are closer to the graph body and longer (outer) loops are
further right.
If the terminal is too narrow to fit even one channel, we fall back to
simple inline ``↺`` annotations instead.
"""
if not back_edges:
return lines
num_channels = len(back_edges)
# Sort by span length ascending → inner loops get nearest channel
sorted_be = sorted(back_edges, key=lambda b: b["source_idx"] - b["target_idx"])
# --- Insert dedicated connector lines for back-edge sources ---
# Each back-edge source gets a blank line inserted after its node
# section (after any forward-edge lines). We process insertions in
# reverse order so that earlier indices remain valid.
all_node_lines_set = set(node_line_map.values())
insertions: list[tuple[int, int]] = [] # (insert_after_line, be_index)
for be_idx, be in enumerate(sorted_be):
source_node_line = node_line_map.get(be["source"])
if source_node_line is None:
continue
# Walk forward to find the last line in this node's section
last_section_line = source_node_line
for li in range(source_node_line + 1, len(lines)):
if li in all_node_lines_set:
break
last_section_line = li
insertions.append((last_section_line, be_idx))
source_line_for_be: dict[int, int] = {}
for insert_after, be_idx in sorted(insertions, reverse=True):
insert_at = insert_after + 1
lines.insert(insert_at, "") # placeholder for connector
source_line_for_be[be_idx] = insert_at
# Shift node_line_map entries that come after the insertion point
for nid in node_line_map:
if node_line_map[nid] > insert_after:
node_line_map[nid] += 1
# Also shift already-assigned source lines
for prev_idx in source_line_for_be:
if prev_idx != be_idx and source_line_for_be[prev_idx] > insert_after:
source_line_for_be[prev_idx] += 1
# Recompute max content width after insertions
max_content_w = max(_plain_len(ln) for ln in lines) if lines else 0
# Check if we have room for channels
channels_total_w = num_channels * _CHANNEL_WIDTH
if max_content_w + channels_total_w + 2 > available_width:
return self._inline_back_edge_fallback(lines, node_line_map, back_edges)
content_pad = max_content_w + 3 # gap between content and first channel
# Build channel info with final line positions
channel_info: list[dict] = []
for ch_idx, be in enumerate(sorted_be):
target_line = node_line_map.get(be["target"])
source_line = source_line_for_be.get(ch_idx)
if target_line is None or source_line is None:
continue
col = content_pad + ch_idx * _CHANNEL_WIDTH
channel_info.append(
{
"target_line": target_line,
"source_line": source_line,
"col": col,
}
)
if not channel_info:
return lines
# Build overlay grid — one row per line, columns for channel area
total_width = content_pad + num_channels * _CHANNEL_WIDTH + 1
overlay_width = total_width - max_content_w
overlays: list[list[str]] = [[" "] * overlay_width for _ in range(len(lines))]
for ci in channel_info:
tl = ci["target_line"]
sl = ci["source_line"]
col_offset = ci["col"] - max_content_w
if col_offset < 0 or col_offset >= overlay_width:
continue
# Target line: ◄──...──┐
if 0 <= tl < len(overlays):
for c in range(col_offset):
if overlays[tl][c] == " ":
overlays[tl][c] = ""
overlays[tl][col_offset] = ""
# Source line: ──...──┘
if 0 <= sl < len(overlays):
for c in range(col_offset):
if overlays[sl][c] == " ":
overlays[sl][c] = ""
overlays[sl][col_offset] = ""
# Vertical lines between target+1 and source-1
for li in range(tl + 1, sl):
if 0 <= li < len(overlays) and overlays[li][col_offset] == " ":
overlays[li][col_offset] = ""
# Merge overlays into the line strings
result: list[str] = []
for i, line in enumerate(lines):
pw = _plain_len(line)
pad = max_content_w - pw
overlay_chars = overlays[i] if i < len(overlays) else []
overlay_str = "".join(overlay_chars)
overlay_trimmed = overlay_str.rstrip()
if overlay_trimmed:
is_target_line = any(ci["target_line"] == i for ci in channel_info)
if is_target_line:
overlay_trimmed = "" + overlay_trimmed[1:]
is_source_line = any(ci["source_line"] == i for ci in channel_info)
if is_source_line and not line.strip():
# Inserted blank line → build └───┘ connector.
# " └" = 3 chars of content prefix, so remaining pad = max_content_w - 3
remaining_pad = max_content_w - 3
full = list(" " * remaining_pad + overlay_trimmed)
# Find the ┘ corner for this source connector
corner_pos = -1
for ci_s in channel_info:
if ci_s["source_line"] == i:
corner_pos = remaining_pad + (ci_s["col"] - max_content_w)
break
# Fill everything up to the corner with ─
if corner_pos >= 0:
for c in range(corner_pos):
if full[c] not in ("│", "┐", "┘"):
full[c] = "─"
connector = " └" + "".join(full).rstrip()
result.append(f"[dim]{connector}[/dim]")
continue
colored_overlay = f"[dim]{' ' * pad}{overlay_trimmed}[/dim]"
result.append(f"{line}{colored_overlay}")
else:
result.append(line)
return result
def _inline_back_edge_fallback(
self,
lines: list[str],
node_line_map: dict[str, int],
back_edges: list[dict],
) -> list[str]:
"""Fallback: add inline ↺ annotations when terminal is too narrow for channels."""
# Group back-edges by source node
source_to_be: dict[str, list[dict]] = {}
for be in back_edges:
source_to_be.setdefault(be["source"], []).append(be)
result = list(lines)
# Insert annotation lines after each source node's section
offset = 0
all_node_lines = sorted(node_line_map.values())
for source, bes in source_to_be.items():
source_line = node_line_map.get(source)
if source_line is None:
continue
# Find end of source node section
end_line = source_line
for nl in all_node_lines:
if nl > source_line:
end_line = nl - 1
break
else:
end_line = len(lines) - 1
# Insert after last content line of this node's section
insert_at = end_line + offset + 1
for be in bes:
cond = ""
edge = be["edge"]
if edge.condition.value not in ("always", "on_success"):
cond = f" [dim]({edge.condition.value})[/dim]"
annotation = f" [yellow]↺[/yellow] {be['target']}{cond}"
result.insert(insert_at, annotation)
insert_at += 1
offset += 1
return result
# ------------------------------------------------------------------
# Main display
# ------------------------------------------------------------------
def _display_graph(self) -> None:
"""Display the graph as an ASCII DAG with edge connectors."""
"""Display the graph as an ASCII DAG with edge connectors and loop channels."""
display = self.query_one("#graph-display", RichLog)
display.clear()
graph = self.runtime.graph
display.write(f"[bold cyan]Agent Graph:[/bold cyan] {graph.id}\n")
# Render each node in topological order with edges
ordered = self._topo_order()
order_idx = {nid: i for i, nid in enumerate(ordered)}
# --- Pass 1: Build line buffer ---
lines: list[str] = []
node_line_map: dict[str, int] = {}
for node_id in ordered:
display.write(self._render_node_line(node_id))
for edge_line in self._render_edges(node_id):
display.write(edge_line)
node_line_map[node_id] = len(lines)
lines.append(self._render_node_line(node_id))
for edge_line in self._render_edges(node_id, order_idx):
lines.append(edge_line)
# --- Pass 2: Overlay return channels for back-edges ---
back_edges = self._detect_back_edges(ordered)
if back_edges:
# Try to get actual widget width; default to a reasonable value
try:
available_width = self.size.width or 60
except Exception:
available_width = 60
lines = self._overlay_return_channels(lines, node_line_map, back_edges, available_width)
# Write all lines
for line in lines:
display.write(line)
# Execution path footer
if self.execution_path:
display.write("")
display.write(f"[dim]Path:[/dim] {''.join(self.execution_path[-5:])}")
# Event sources section
self._render_event_sources(display)
# ------------------------------------------------------------------
# Event sources display
# ------------------------------------------------------------------
def _render_event_sources(self, display: RichLog) -> None:
"""Render event source info (webhooks, timers) below the graph."""
entry_points = self.runtime.get_entry_points()
# Filter to non-manual entry points (webhooks, timers, events)
event_sources = [ep for ep in entry_points if ep.trigger_type not in ("manual",)]
if not event_sources:
return
display.write("")
display.write("[bold cyan]Event Sources[/bold cyan]")
config = self.runtime._config
for ep in event_sources:
if ep.trigger_type == "timer":
interval = ep.trigger_config.get("interval_minutes", "?")
display.write(f" [green]⏱[/green] {ep.name} [dim]→ {ep.entry_node}[/dim]")
# Show interval + next fire countdown
next_fire = self.runtime._timer_next_fire.get(ep.id)
if next_fire is not None:
remaining = max(0, next_fire - time.monotonic())
mins, secs = divmod(int(remaining), 60)
display.write(
f" [dim]every {interval} min — next in {mins}m {secs:02d}s[/dim]"
)
else:
display.write(f" [dim]every {interval} min[/dim]")
elif ep.trigger_type in ("event", "webhook"):
display.write(f" [yellow]⚡[/yellow] {ep.name} [dim]→ {ep.entry_node}[/dim]")
# Show webhook endpoint if configured
route = None
for r in config.webhook_routes:
src = r.get("source_id", "")
if src and src in ep.id:
route = r
break
if not route and config.webhook_routes:
# Fall back to first route
route = config.webhook_routes[0]
if route:
host = config.webhook_host
port = config.webhook_port
path = route.get("path", "/webhook")
display.write(f" [dim]{host}:{port}{path}[/dim]")
else:
event_types = ep.trigger_config.get("event_types", [])
if event_types:
display.write(f" [dim]events: {', '.join(event_types)}[/dim]")
# ------------------------------------------------------------------
# Public API (called by app.py)
# ------------------------------------------------------------------
def update_active_node(self, node_id: str) -> None:
"""Update the currently active node."""
self.active_node = node_id
@@ -177,6 +542,8 @@ class GraphOverview(Vertical):
def handle_node_loop_completed(self, node_id: str) -> None:
"""A node's event loop completed."""
self._node_status.pop(node_id, None)
if self.active_node == node_id:
self.active_node = None
self._display_graph()
def handle_tool_call(self, node_id: str, tool_name: str, *, started: bool) -> None:
@@ -192,3 +559,8 @@ class GraphOverview(Vertical):
"""Highlight a stalled node."""
self._node_status[node_id] = f"[red]stalled: {reason}[/red]"
self._display_graph()
def handle_edge_traversed(self, source_node: str, target_node: str) -> None:
"""Highlight an edge being traversed."""
self._node_status[source_node] = f"[dim]→ {target_node}[/dim]"
self._display_graph()
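For orientation, a loop edge from a Review node back to an Act node renders roughly like this under the two-pass scheme above (node symbols, spacing, and channel width are illustrative, not captured from a real run):

Agent Graph: demo

  ○ Plan
  ○ Act ◄─────────┐
  ○ Review        │
   └──────────────┘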
+104 -86
@@ -1,5 +1,8 @@
"""
Log Pane Widget - Uses RichLog for reliable rendering.
Log formatting utilities and LogPane widget.
The module-level functions (format_event, extract_event_text, format_python_log)
can be used by any widget that needs to render log lines without instantiating LogPane.
"""
import logging
@@ -11,36 +14,108 @@ from textual.containers import Container
from framework.runtime.event_bus import AgentEvent, EventType
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
# --- Module-level formatting constants ---
EVENT_FORMAT: dict[EventType, tuple[str, str]] = {
EventType.EXECUTION_STARTED: (">>", "bold cyan"),
EventType.EXECUTION_COMPLETED: ("<<", "bold green"),
EventType.EXECUTION_FAILED: ("!!", "bold red"),
EventType.TOOL_CALL_STARTED: ("->", "yellow"),
EventType.TOOL_CALL_COMPLETED: ("<-", "green"),
EventType.NODE_LOOP_STARTED: ("@@", "cyan"),
EventType.NODE_LOOP_ITERATION: ("..", "dim"),
EventType.NODE_LOOP_COMPLETED: ("@@", "dim"),
EventType.NODE_STALLED: ("!!", "bold yellow"),
EventType.NODE_INPUT_BLOCKED: ("!!", "yellow"),
EventType.GOAL_PROGRESS: ("%%", "blue"),
EventType.GOAL_ACHIEVED: ("**", "bold green"),
EventType.CONSTRAINT_VIOLATION: ("!!", "bold red"),
EventType.STATE_CHANGED: ("~~", "dim"),
EventType.CLIENT_INPUT_REQUESTED: ("??", "magenta"),
}
LOG_LEVEL_COLORS: dict[int, str] = {
logging.DEBUG: "dim",
logging.INFO: "",
logging.WARNING: "yellow",
logging.ERROR: "red",
logging.CRITICAL: "bold red",
}
# --- Module-level formatting functions ---
def extract_event_text(event: AgentEvent) -> str:
"""Extract human-readable text from an event's data dict."""
et = event.type
data = event.data
if et == EventType.EXECUTION_STARTED:
return "Execution started"
elif et == EventType.EXECUTION_COMPLETED:
return "Execution completed"
elif et == EventType.EXECUTION_FAILED:
return f"Execution FAILED: {data.get('error', 'unknown')}"
elif et == EventType.TOOL_CALL_STARTED:
return f"Tool call: {data.get('tool_name', 'unknown')}"
elif et == EventType.TOOL_CALL_COMPLETED:
name = data.get("tool_name", "unknown")
if data.get("is_error"):
preview = str(data.get("result", ""))[:80]
return f"Tool error: {name} - {preview}"
return f"Tool done: {name}"
elif et == EventType.NODE_LOOP_STARTED:
return f"Node started: {event.node_id or 'unknown'}"
elif et == EventType.NODE_LOOP_ITERATION:
return f"{event.node_id or 'unknown'} iteration {data.get('iteration', '?')}"
elif et == EventType.NODE_LOOP_COMPLETED:
return f"Node done: {event.node_id or 'unknown'}"
elif et == EventType.NODE_STALLED:
reason = data.get("reason", "")
node = event.node_id or "unknown"
return f"Node stalled: {node} - {reason}" if reason else f"Node stalled: {node}"
elif et == EventType.NODE_INPUT_BLOCKED:
return f"Node input blocked: {event.node_id or 'unknown'}"
elif et == EventType.GOAL_PROGRESS:
return f"Goal progress: {data.get('progress', '?')}"
elif et == EventType.GOAL_ACHIEVED:
return "Goal achieved"
elif et == EventType.CONSTRAINT_VIOLATION:
return f"Constraint violated: {data.get('description', 'unknown')}"
elif et == EventType.STATE_CHANGED:
return f"State changed: {data.get('key', 'unknown')}"
elif et == EventType.CLIENT_INPUT_REQUESTED:
return "Waiting for user input"
else:
return f"{et.value}: {data}"
def format_event(event: AgentEvent) -> str:
"""Format an AgentEvent as a Rich markup string with timestamp + symbol."""
ts = event.timestamp.strftime("%H:%M:%S")
symbol, color = EVENT_FORMAT.get(event.type, ("--", "dim"))
text = extract_event_text(event)
return f"[dim]{ts}[/dim] [{color}]{symbol} {text}[/{color}]"
def format_python_log(record: logging.LogRecord) -> str:
"""Format a Python log record as a Rich markup string with timestamp and severity color."""
ts = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
color = LOG_LEVEL_COLORS.get(record.levelno, "")
msg = record.getMessage()
if color:
return f"[dim]{ts}[/dim] [{color}]{record.levelname}[/{color}] {msg}"
else:
return f"[dim]{ts}[/dim] {record.levelname} {msg}"
# --- LogPane widget (kept for backward compatibility) ---
class LogPane(Container):
"""Widget to display logs with reliable rendering."""
_EVENT_FORMAT: dict[EventType, tuple[str, str]] = {
EventType.EXECUTION_STARTED: (">>", "bold cyan"),
EventType.EXECUTION_COMPLETED: ("<<", "bold green"),
EventType.EXECUTION_FAILED: ("!!", "bold red"),
EventType.TOOL_CALL_STARTED: ("->", "yellow"),
EventType.TOOL_CALL_COMPLETED: ("<-", "green"),
EventType.NODE_LOOP_STARTED: ("@@", "cyan"),
EventType.NODE_LOOP_ITERATION: ("..", "dim"),
EventType.NODE_LOOP_COMPLETED: ("@@", "dim"),
EventType.NODE_STALLED: ("!!", "bold yellow"),
EventType.NODE_INPUT_BLOCKED: ("!!", "yellow"),
EventType.GOAL_PROGRESS: ("%%", "blue"),
EventType.GOAL_ACHIEVED: ("**", "bold green"),
EventType.CONSTRAINT_VIOLATION: ("!!", "bold red"),
EventType.STATE_CHANGED: ("~~", "dim"),
EventType.CLIENT_INPUT_REQUESTED: ("??", "magenta"),
}
_LOG_LEVEL_COLORS = {
logging.DEBUG: "dim",
logging.INFO: "",
logging.WARNING: "yellow",
logging.ERROR: "red",
logging.CRITICAL: "bold red",
}
DEFAULT_CSS = """
LogPane {
width: 100%;
@@ -58,84 +133,27 @@ class LogPane(Container):
"""
def compose(self) -> ComposeResult:
# RichLog is designed for log display and doesn't have TextArea's rendering issues
yield RichLog(id="main-log", highlight=True, markup=True, auto_scroll=False)
def write_event(self, event: AgentEvent) -> None:
"""Format an AgentEvent with timestamp + symbol and write to the log."""
ts = event.timestamp.strftime("%H:%M:%S")
symbol, color = self._EVENT_FORMAT.get(event.type, ("--", "dim"))
text = self._extract_event_text(event)
self.write_log(f"[dim]{ts}[/dim] [{color}]{symbol} {text}[/{color}]")
def _extract_event_text(self, event: AgentEvent) -> str:
"""Extract human-readable text from an event's data dict."""
et = event.type
data = event.data
if et == EventType.EXECUTION_STARTED:
return "Execution started"
elif et == EventType.EXECUTION_COMPLETED:
return "Execution completed"
elif et == EventType.EXECUTION_FAILED:
return f"Execution FAILED: {data.get('error', 'unknown')}"
elif et == EventType.TOOL_CALL_STARTED:
return f"Tool call: {data.get('tool_name', 'unknown')}"
elif et == EventType.TOOL_CALL_COMPLETED:
name = data.get("tool_name", "unknown")
if data.get("is_error"):
preview = str(data.get("result", ""))[:80]
return f"Tool error: {name} - {preview}"
return f"Tool done: {name}"
elif et == EventType.NODE_LOOP_STARTED:
return f"Node started: {event.node_id or 'unknown'}"
elif et == EventType.NODE_LOOP_ITERATION:
return f"{event.node_id or 'unknown'} iteration {data.get('iteration', '?')}"
elif et == EventType.NODE_LOOP_COMPLETED:
return f"Node done: {event.node_id or 'unknown'}"
elif et == EventType.NODE_STALLED:
reason = data.get("reason", "")
node = event.node_id or "unknown"
return f"Node stalled: {node} - {reason}" if reason else f"Node stalled: {node}"
elif et == EventType.NODE_INPUT_BLOCKED:
return f"Node input blocked: {event.node_id or 'unknown'}"
elif et == EventType.GOAL_PROGRESS:
return f"Goal progress: {data.get('progress', '?')}"
elif et == EventType.GOAL_ACHIEVED:
return "Goal achieved"
elif et == EventType.CONSTRAINT_VIOLATION:
return f"Constraint violated: {data.get('description', 'unknown')}"
elif et == EventType.STATE_CHANGED:
return f"State changed: {data.get('key', 'unknown')}"
elif et == EventType.CLIENT_INPUT_REQUESTED:
return "Waiting for user input"
else:
return f"{et.value}: {data}"
self.write_log(format_event(event))
def write_python_log(self, record: logging.LogRecord) -> None:
"""Format a Python log record with timestamp and severity color."""
ts = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
color = self._LOG_LEVEL_COLORS.get(record.levelno, "")
msg = record.getMessage()
if color:
self.write_log(f"[dim]{ts}[/dim] [{color}]{record.levelname}[/{color}] {msg}")
else:
self.write_log(f"[dim]{ts}[/dim] {record.levelname} {msg}")
self.write_log(format_python_log(record))
def write_log(self, message: str) -> None:
"""Write a log message to the log pane."""
try:
# Check if widget is mounted
if not self.is_mounted:
return
log = self.query_one("#main-log", RichLog)
# Check if log is mounted
if not log.is_mounted:
return
# Only auto-scroll if user is already at the bottom
was_at_bottom = log.is_vertical_scroll_end
log.write(message)
@@ -195,12 +195,27 @@ def _copy_to_clipboard(text: str) -> None:
try:
if sys.platform == "darwin":
subprocess.run(["pbcopy"], input=text.encode(), check=True, timeout=5)
elif sys.platform.startswith("linux"):
elif sys.platform == "win32":
subprocess.run(
["xclip", "-selection", "clipboard"],
input=text.encode(),
["clip.exe"],
input=text.encode("utf-16le"),
check=True,
timeout=5,
)
elif sys.platform.startswith("linux"):
try:
subprocess.run(
["xclip", "-selection", "clipboard"],
input=text.encode(),
check=True,
timeout=5,
)
except (subprocess.SubprocessError, FileNotFoundError):
subprocess.run(
["xsel", "--clipboard", "--input"],
input=text.encode(),
check=True,
timeout=5,
)
except (subprocess.SubprocessError, FileNotFoundError):
pass
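Because the formatting helpers are now module level, they can be exercised without mounting a LogPane. A small hedged example (the timestamp will differ at runtime):

import logging

record = logging.LogRecord("demo", logging.WARNING, __file__, 1, "disk nearly full", None, None)
print(format_python_log(record))
# e.g. "[dim]12:00:00[/dim] [yellow]WARNING[/yellow] disk nearly full"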
+5
@@ -0,0 +1,5 @@
"""Utility functions for the Hive framework."""
from framework.utils.io import atomic_write
__all__ = ["atomic_write"]
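framework/utils/io.py itself is not part of this excerpt, so atomic_write's real signature is not shown here; the pattern the name describes is write-to-temp-then-rename. An independent sketch of that pattern (not the framework's implementation):

import os
import tempfile
from pathlib import Path

def atomic_write_sketch(path: Path, text: str) -> None:
    # Write to a sibling temp file, then atomically swap it into place.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise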
+1
@@ -20,6 +20,7 @@ dependencies = [
[project.optional-dependencies]
tui = ["textual>=0.75.0"]
webhook = ["aiohttp>=3.9.0"]
[project.scripts]
hive = "framework.cli:main"
@@ -157,39 +157,6 @@ class TestEventLoopOutputKeyOverlap:
key_errors = [e for e in errors if "output_key" in e]
assert len(key_errors) == 0
def test_overlapping_keys_non_event_loop_no_error(self):
"""Non-event_loop nodes with overlapping keys -> no error (last-wins OK)."""
graph = GraphSpec(
id="g1",
goal_id="goal1",
entry_node="src",
nodes=[
NodeSpec(id="src", name="src", description="Source node"),
NodeSpec(
id="a",
name="a",
description="Node a",
node_type="llm_generate",
output_keys=["shared"],
),
NodeSpec(
id="b",
name="b",
description="Node b",
node_type="llm_generate",
output_keys=["shared"],
),
],
edges=[
EdgeSpec(id="src->a", source="src", target="a", condition=EdgeCondition.ON_SUCCESS),
EdgeSpec(id="src->b", source="src", target="b", condition=EdgeCondition.ON_SUCCESS),
],
)
errors = graph.validate()
key_errors = [e for e in errors if "output_key" in e]
assert len(key_errors) == 0
# ---------------------------------------------------------------------------
# Baseline: no fan-out -> no errors from these rules
@@ -85,14 +85,14 @@ async def test_direct_key_access_in_conditional_edge():
id="score_node",
name="ScoreNode",
description="Outputs a score",
node_type="function",
node_type="event_loop",
output_keys=["score"],
),
NodeSpec(
id="high_score_node",
name="HighScoreNode",
description="Handles high scores",
node_type="function",
node_type="event_loop",
input_keys=["score"],
output_keys=["result"],
),
@@ -153,14 +153,14 @@ async def test_backward_compatibility_output_syntax():
id="score_node",
name="ScoreNode",
description="Outputs a score",
node_type="function",
node_type="event_loop",
output_keys=["score"],
),
NodeSpec(
id="consumer_node",
name="ConsumerNode",
description="Consumer",
node_type="function",
node_type="event_loop",
input_keys=["score"],
output_keys=["processed"],
),
@@ -221,14 +221,14 @@ async def test_multiple_keys_in_expression():
id="multi_key_node",
name="MultiKeyNode",
description="Outputs multiple keys",
node_type="function",
node_type="event_loop",
output_keys=["x", "y"],
),
NodeSpec(
id="consumer_node",
name="ConsumerNode",
description="Consumer",
node_type="function",
node_type="event_loop",
input_keys=["x", "y"],
output_keys=["processed"],
),
@@ -295,14 +295,14 @@ async def test_negative_case_condition_false():
id="low_score_node",
name="LowScoreNode",
description="Outputs low score",
node_type="function",
node_type="event_loop",
output_keys=["score"],
),
NodeSpec(
id="high_score_handler",
name="HighScoreHandler",
description="Should NOT execute",
node_type="function",
node_type="event_loop",
input_keys=["score"],
output_keys=["result"],
),
+538
@@ -0,0 +1,538 @@
"""Tests for the Continuous Agent architecture (conversation threading + cumulative tools).
Validates:
- conversation_mode="isolated" preserves existing behavior
- conversation_mode="continuous" threads one conversation across nodes
- Transition markers are inserted at phase boundaries
- System prompt updates at each transition (layered prompt composition)
- Tools accumulate across nodes in continuous mode
- prompt_composer functions work correctly
"""
from __future__ import annotations
from collections.abc import AsyncIterator
from typing import Any
from unittest.mock import MagicMock
import pytest
from framework.graph.conversation import NodeConversation
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.executor import GraphExecutor
from framework.graph.goal import Goal
from framework.graph.node import NodeResult, NodeSpec, SharedMemory
from framework.graph.prompt_composer import (
build_narrative,
build_transition_marker,
compose_system_prompt,
)
from framework.llm.provider import LLMProvider, LLMResponse, Tool
from framework.llm.stream_events import FinishEvent, TextDeltaEvent, ToolCallEvent
from framework.runtime.core import Runtime
# ---------------------------------------------------------------------------
# Mock LLM
# ---------------------------------------------------------------------------
class MockStreamingLLM(LLMProvider):
"""Mock LLM that yields pre-programmed StreamEvent sequences."""
def __init__(self, scenarios: list[list] | None = None):
self.scenarios = scenarios or []
self._call_index = 0
self.stream_calls: list[dict] = []
async def stream(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 4096,
) -> AsyncIterator:
self.stream_calls.append({"messages": messages, "system": system, "tools": tools})
if not self.scenarios:
return
events = self.scenarios[self._call_index % len(self.scenarios)]
self._call_index += 1
for event in events:
yield event
def complete(self, messages, system="", **kwargs) -> LLMResponse:
return LLMResponse(content="Summary.", model="mock", stop_reason="stop")
def complete_with_tools(self, messages, system, tools, tool_executor, **kwargs) -> LLMResponse:
return LLMResponse(content="", model="mock", stop_reason="stop")
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _set_output_scenario(key: str, value: str) -> list:
"""LLM calls set_output then finishes."""
return [
ToolCallEvent(
tool_use_id=f"call_{key}",
tool_name="set_output",
tool_input={"key": key, "value": value},
),
FinishEvent(stop_reason="tool_calls", input_tokens=10, output_tokens=5, model="mock"),
]
def _text_then_set_output(text: str, key: str, value: str) -> list:
"""LLM produces text, then calls set_output, then finishes (2 turns needed)."""
return [
TextDeltaEvent(content=text, snapshot=text),
ToolCallEvent(
tool_use_id=f"call_{key}",
tool_name="set_output",
tool_input={"key": key, "value": value},
),
FinishEvent(stop_reason="tool_calls", input_tokens=10, output_tokens=5, model="mock"),
]
def _text_finish(text: str) -> list:
"""LLM produces text and stops (triggers judge)."""
return [
TextDeltaEvent(content=text, snapshot=text),
FinishEvent(stop_reason="stop", input_tokens=10, output_tokens=5, model="mock"),
]
def _make_runtime():
rt = MagicMock(spec=Runtime)
rt.start_run = MagicMock(return_value="run_1")
rt.end_run = MagicMock()
rt.report_problem = MagicMock()
rt.decide = MagicMock(return_value="dec_1")
rt.record_outcome = MagicMock()
rt.set_node = MagicMock()
return rt
def _make_goal():
return Goal(id="g1", name="test", description="test goal")
def _make_tool(name: str) -> Tool:
return Tool(
name=name,
description=f"Tool {name}",
parameters={"type": "object", "properties": {}},
)
# ===========================================================================
# prompt_composer unit tests
# ===========================================================================
class TestComposeSystemPrompt:
def test_all_layers(self):
result = compose_system_prompt(
identity_prompt="I am a research agent.",
focus_prompt="Focus on writing the report.",
narrative="We found 5 sources on topic X.",
)
assert "I am a research agent." in result
assert "Focus on writing the report." in result
assert "We found 5 sources on topic X." in result
# Identity comes first
assert result.index("I am a research agent.") < result.index("Focus on writing")
def test_identity_only(self):
result = compose_system_prompt(identity_prompt="I am an agent.", focus_prompt=None)
assert result == "I am an agent."
def test_focus_only(self):
result = compose_system_prompt(identity_prompt=None, focus_prompt="Do the thing.")
assert "Current Focus" in result
assert "Do the thing." in result
def test_empty(self):
result = compose_system_prompt(identity_prompt=None, focus_prompt=None)
assert result == ""
class TestBuildNarrative:
def test_with_execution_path(self):
memory = SharedMemory()
memory.write("findings", "some findings")
node_a = NodeSpec(
id="a", name="Research", description="Research the topic", node_type="event_loop"
)
node_b = NodeSpec(id="b", name="Report", description="Write report", node_type="event_loop")
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="a",
nodes=[node_a, node_b],
edges=[],
)
result = build_narrative(memory, ["a"], graph)
assert "Research" in result
assert "findings" in result
def test_empty_state(self):
memory = SharedMemory()
graph = GraphSpec(id="g1", goal_id="g1", entry_node="a", nodes=[], edges=[])
result = build_narrative(memory, [], graph)
assert result == ""
class TestBuildTransitionMarker:
def test_basic_marker(self):
prev = NodeSpec(
id="research", name="Research", description="Find sources", node_type="event_loop"
)
next_n = NodeSpec(
id="report", name="Report", description="Write report", node_type="event_loop"
)
memory = SharedMemory()
memory.write("findings", "important stuff")
marker = build_transition_marker(
previous_node=prev,
next_node=next_n,
memory=memory,
cumulative_tool_names=["web_search", "save_data"],
)
assert "PHASE TRANSITION" in marker
assert "Research" in marker
assert "Report" in marker
assert "findings" in marker
assert "web_search" in marker
assert "reflect" in marker.lower()
# ===========================================================================
# NodeConversation.update_system_prompt
# ===========================================================================
class TestUpdateSystemPrompt:
def test_update(self):
conv = NodeConversation(system_prompt="original")
assert conv.system_prompt == "original"
conv.update_system_prompt("updated")
assert conv.system_prompt == "updated"
# ===========================================================================
# Conversation threading through executor
# ===========================================================================
class TestContinuousConversation:
"""Test that conversation_mode='continuous' threads a single conversation."""
@pytest.mark.asyncio
async def test_isolated_mode_no_conversation_in_result(self):
"""In isolated mode, NodeResult.conversation should be None."""
runtime = _make_runtime()
llm = MockStreamingLLM(
scenarios=[
_set_output_scenario("result", "done"),
_text_finish("accepted"),
]
)
spec = NodeSpec(
id="n1",
name="Node1",
description="test",
node_type="event_loop",
output_keys=["result"],
)
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="n1",
nodes=[spec],
edges=[],
conversation_mode="isolated",
)
executor = GraphExecutor(runtime=runtime, llm=llm)
result = await executor.execute(graph=graph, goal=_make_goal())
assert result.success
@pytest.mark.asyncio
async def test_continuous_threads_conversation(self):
"""In continuous mode, second node sees messages from first node."""
runtime = _make_runtime()
# Node A: set_output("brief", "the brief"), then finish (accept)
# Node B: set_output("report", "the report"), then finish (accept)
llm = MockStreamingLLM(
scenarios=[
_text_then_set_output("I'll research this.", "brief", "the brief"),
_text_finish(""), # triggers accept for node A (all keys set)
_text_then_set_output("Here's the report.", "report", "the report"),
_text_finish(""), # triggers accept for node B
]
)
node_a = NodeSpec(
id="a",
name="Intake",
description="Gather requirements",
node_type="event_loop",
output_keys=["brief"],
)
node_b = NodeSpec(
id="b",
name="Report",
description="Write report",
node_type="event_loop",
input_keys=["brief"],
output_keys=["report"],
)
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="a",
nodes=[node_a, node_b],
edges=[EdgeSpec(id="e1", source="a", target="b", condition=EdgeCondition.ON_SUCCESS)],
terminal_nodes=["b"],
conversation_mode="continuous",
identity_prompt="You are a thorough research agent.",
)
executor = GraphExecutor(runtime=runtime, llm=llm)
result = await executor.execute(graph=graph, goal=_make_goal())
assert result.success
assert result.path == ["a", "b"]
# Verify the LLM saw the identity prompt in system messages
# The second node's system prompt should contain the identity
if len(llm.stream_calls) >= 3:
system_at_node_b = llm.stream_calls[2]["system"]
assert "thorough research agent" in system_at_node_b
@pytest.mark.asyncio
async def test_continuous_transition_marker_present(self):
"""Transition marker should appear in messages when switching nodes."""
runtime = _make_runtime()
llm = MockStreamingLLM(
scenarios=[
_text_then_set_output("Research done.", "brief", "the brief"),
_text_finish(""),
_text_then_set_output("Report done.", "report", "the report"),
_text_finish(""),
]
)
node_a = NodeSpec(
id="a",
name="Research",
description="Do research",
node_type="event_loop",
output_keys=["brief"],
)
node_b = NodeSpec(
id="b",
name="Report",
description="Write report",
node_type="event_loop",
input_keys=["brief"],
output_keys=["report"],
)
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="a",
nodes=[node_a, node_b],
edges=[EdgeSpec(id="e1", source="a", target="b", condition=EdgeCondition.ON_SUCCESS)],
terminal_nodes=["b"],
conversation_mode="continuous",
)
executor = GraphExecutor(runtime=runtime, llm=llm)
result = await executor.execute(graph=graph, goal=_make_goal())
assert result.success
# When node B's first LLM call happens, its messages should contain
# the transition marker from the executor
if len(llm.stream_calls) >= 3:
node_b_messages = llm.stream_calls[2]["messages"]
all_content = " ".join(
m.get("content", "") for m in node_b_messages if isinstance(m.get("content"), str)
)
assert "PHASE TRANSITION" in all_content
# ===========================================================================
# Cumulative tools
# ===========================================================================
class TestCumulativeTools:
"""Test that tools accumulate in continuous mode."""
@pytest.mark.asyncio
async def test_isolated_mode_tools_scoped(self):
"""In isolated mode, each node only gets its own declared tools."""
runtime = _make_runtime()
tool_a = _make_tool("web_search")
tool_b = _make_tool("save_data")
llm = MockStreamingLLM(
scenarios=[
_text_then_set_output("Done.", "brief", "brief"),
_text_finish(""),
_text_then_set_output("Done.", "report", "report"),
_text_finish(""),
]
)
node_a = NodeSpec(
id="a",
name="Research",
description="Research",
node_type="event_loop",
output_keys=["brief"],
tools=["web_search"],
)
node_b = NodeSpec(
id="b",
name="Report",
description="Report",
node_type="event_loop",
input_keys=["brief"],
output_keys=["report"],
tools=["save_data"],
)
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="a",
nodes=[node_a, node_b],
edges=[EdgeSpec(id="e1", source="a", target="b", condition=EdgeCondition.ON_SUCCESS)],
terminal_nodes=["b"],
conversation_mode="isolated",
)
executor = GraphExecutor(
runtime=runtime,
llm=llm,
tools=[tool_a, tool_b],
)
result = await executor.execute(graph=graph, goal=_make_goal())
assert result.success
# In isolated mode, node B should NOT have web_search
if len(llm.stream_calls) >= 3:
node_b_tools = llm.stream_calls[2].get("tools") or []
tool_names = [t.name for t in node_b_tools]
assert "save_data" in tool_names or "set_output" in tool_names
# web_search should NOT be present (only set_output + save_data)
real_tools = [n for n in tool_names if n != "set_output"]
assert "web_search" not in real_tools
@pytest.mark.asyncio
async def test_continuous_mode_tools_accumulate(self):
"""In continuous mode, node B should have both web_search and save_data."""
runtime = _make_runtime()
tool_a = _make_tool("web_search")
tool_b = _make_tool("save_data")
llm = MockStreamingLLM(
scenarios=[
_text_then_set_output("Done.", "brief", "brief"),
_text_finish(""),
_text_then_set_output("Done.", "report", "report"),
_text_finish(""),
]
)
node_a = NodeSpec(
id="a",
name="Research",
description="Research",
node_type="event_loop",
output_keys=["brief"],
tools=["web_search"],
)
node_b = NodeSpec(
id="b",
name="Report",
description="Report",
node_type="event_loop",
input_keys=["brief"],
output_keys=["report"],
tools=["save_data"],
)
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="a",
nodes=[node_a, node_b],
edges=[EdgeSpec(id="e1", source="a", target="b", condition=EdgeCondition.ON_SUCCESS)],
terminal_nodes=["b"],
conversation_mode="continuous",
)
executor = GraphExecutor(
runtime=runtime,
llm=llm,
tools=[tool_a, tool_b],
)
result = await executor.execute(graph=graph, goal=_make_goal())
assert result.success
# In continuous mode, node B should have BOTH tools
if len(llm.stream_calls) >= 3:
node_b_tools = llm.stream_calls[2].get("tools") or []
tool_names = [t.name for t in node_b_tools]
real_tools = [n for n in tool_names if n != "set_output"]
assert "web_search" in real_tools
assert "save_data" in real_tools
# ===========================================================================
# Schema field defaults
# ===========================================================================
class TestSchemaDefaults:
def test_graphspec_defaults(self):
"""New fields should have safe defaults."""
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="n1",
nodes=[],
edges=[],
)
assert graph.conversation_mode == "continuous"
assert graph.identity_prompt is None
def test_nodespec_defaults(self):
"""NodeSpec.success_criteria should default to None."""
spec = NodeSpec(
id="n1",
name="test",
description="test",
node_type="event_loop",
)
assert spec.success_criteria is None
def test_noderesult_defaults(self):
"""NodeResult.conversation should default to None."""
result = NodeResult(success=True)
assert result.conversation is None
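MockStreamingLLM replays its scenarios round-robin, one scenario per stream() call. A hedged standalone driver (outside the executor) shows the event sequence a node loop would consume:

import asyncio

async def demo() -> None:
    llm = MockStreamingLLM(scenarios=[_text_finish("hello")])
    async for event in llm.stream(messages=[{"role": "user", "content": "hi"}]):
        print(type(event).__name__)  # TextDeltaEvent, then FinishEvent

asyncio.run(demo())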
+380
@@ -0,0 +1,380 @@
"""Tests for Level 2 conversation-aware judge.
Validates:
- No success_criteria → Level 0 only (existing behavior)
- success_criteria set, good conversation → Level 2 ACCEPT
- success_criteria set, poor conversation → Level 2 RETRY with feedback
- Custom explicit judge takes priority over Level 2
- Level 2 fires only when Level 0 passes (all keys set)
- _parse_verdict correctly parses LLM responses
"""
from __future__ import annotations
from collections.abc import AsyncIterator
from typing import Any
from unittest.mock import MagicMock
import pytest
from framework.graph.conversation import NodeConversation
from framework.graph.conversation_judge import (
_parse_verdict,
evaluate_phase_completion,
)
from framework.graph.edge import GraphSpec
from framework.graph.executor import GraphExecutor
from framework.graph.goal import Goal
from framework.graph.node import NodeSpec
from framework.llm.provider import LLMProvider, LLMResponse, Tool
from framework.llm.stream_events import FinishEvent, TextDeltaEvent, ToolCallEvent
from framework.runtime.core import Runtime
# ---------------------------------------------------------------------------
# Mock LLM
# ---------------------------------------------------------------------------
class MockStreamingLLM(LLMProvider):
"""Mock LLM that yields pre-programmed StreamEvent sequences."""
def __init__(self, scenarios: list[list] | None = None, complete_response: str = ""):
self.scenarios = scenarios or []
self._call_index = 0
self.stream_calls: list[dict] = []
self.complete_response = complete_response
self.complete_calls: list[dict] = []
async def stream(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 4096,
) -> AsyncIterator:
self.stream_calls.append({"messages": messages, "system": system, "tools": tools})
if not self.scenarios:
return
events = self.scenarios[self._call_index % len(self.scenarios)]
self._call_index += 1
for event in events:
yield event
def complete(self, messages, system="", **kwargs) -> LLMResponse:
self.complete_calls.append({"messages": messages, "system": system})
return LLMResponse(content=self.complete_response, model="mock", stop_reason="stop")
def complete_with_tools(self, messages, system, tools, tool_executor, **kwargs) -> LLMResponse:
return LLMResponse(content="", model="mock", stop_reason="stop")
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _set_output_scenario(key: str, value: str) -> list:
return [
ToolCallEvent(
tool_use_id=f"call_{key}",
tool_name="set_output",
tool_input={"key": key, "value": value},
),
FinishEvent(stop_reason="tool_calls", input_tokens=10, output_tokens=5, model="mock"),
]
def _text_then_set_output(text: str, key: str, value: str) -> list:
return [
TextDeltaEvent(content=text, snapshot=text),
ToolCallEvent(
tool_use_id=f"call_{key}",
tool_name="set_output",
tool_input={"key": key, "value": value},
),
FinishEvent(stop_reason="tool_calls", input_tokens=10, output_tokens=5, model="mock"),
]
def _text_finish(text: str) -> list:
return [
TextDeltaEvent(content=text, snapshot=text),
FinishEvent(stop_reason="stop", input_tokens=10, output_tokens=5, model="mock"),
]
def _make_runtime():
rt = MagicMock(spec=Runtime)
rt.start_run = MagicMock(return_value="run_1")
rt.end_run = MagicMock()
rt.report_problem = MagicMock()
rt.decide = MagicMock(return_value="dec_1")
rt.record_outcome = MagicMock()
rt.set_node = MagicMock()
return rt
def _make_goal():
return Goal(id="g1", name="test", description="test goal")
# ===========================================================================
# Unit tests for _parse_verdict
# ===========================================================================
class TestParseVerdict:
def test_accept(self):
v = _parse_verdict("ACTION: ACCEPT\nCONFIDENCE: 0.9\nFEEDBACK:")
assert v.action == "ACCEPT"
assert v.confidence == 0.9
assert v.feedback == ""
def test_retry_with_feedback(self):
v = _parse_verdict("ACTION: RETRY\nCONFIDENCE: 0.6\nFEEDBACK: Research is too shallow.")
assert v.action == "RETRY"
assert v.confidence == 0.6
assert "shallow" in v.feedback
def test_defaults_on_garbage(self):
v = _parse_verdict("some random text\nno structured output")
assert v.action == "ACCEPT" # default
assert v.confidence == 0.8 # default
def test_invalid_action_defaults_to_accept(self):
v = _parse_verdict("ACTION: ESCALATE\nCONFIDENCE: 0.5")
assert v.action == "ACCEPT" # ESCALATE not valid for Level 2
# ===========================================================================
# Unit tests for evaluate_phase_completion
# ===========================================================================
class TestEvaluatePhaseCompletion:
@pytest.mark.asyncio
async def test_accept_on_good_response(self):
"""LLM says ACCEPT → verdict is ACCEPT."""
llm = MockStreamingLLM(complete_response="ACTION: ACCEPT\nCONFIDENCE: 0.95\nFEEDBACK:")
conv = NodeConversation(system_prompt="test")
await conv.add_user_message("Do research on topic X")
await conv.add_assistant_message("I found 5 high-quality sources on X.")
verdict = await evaluate_phase_completion(
llm=llm,
conversation=conv,
phase_name="Research",
phase_description="Research the topic",
success_criteria="Find at least 3 credible sources",
accumulator_state={"findings": "5 sources found"},
)
assert verdict.action == "ACCEPT"
assert verdict.confidence == 0.95
@pytest.mark.asyncio
async def test_retry_on_poor_response(self):
"""LLM says RETRY → verdict is RETRY with feedback."""
llm = MockStreamingLLM(
complete_response=(
"ACTION: RETRY\nCONFIDENCE: 0.4\nFEEDBACK: Only found 1 source, need 3."
)
)
conv = NodeConversation(system_prompt="test")
await conv.add_user_message("Do research")
await conv.add_assistant_message("I found 1 source.")
verdict = await evaluate_phase_completion(
llm=llm,
conversation=conv,
phase_name="Research",
phase_description="Research the topic",
success_criteria="Find at least 3 credible sources",
accumulator_state={"findings": "1 source"},
)
assert verdict.action == "RETRY"
assert "1 source" in verdict.feedback
@pytest.mark.asyncio
async def test_llm_failure_defaults_to_accept(self):
"""When LLM fails, Level 2 should not block (Level 0 already passed)."""
llm = MockStreamingLLM()
# Make complete() raise an exception
llm.complete = MagicMock(side_effect=RuntimeError("LLM unavailable"))
conv = NodeConversation(system_prompt="test")
await conv.add_assistant_message("Done.")
verdict = await evaluate_phase_completion(
llm=llm,
conversation=conv,
phase_name="Test",
phase_description="Test phase",
success_criteria="Do the thing",
accumulator_state={"result": "done"},
)
assert verdict.action == "ACCEPT"
assert verdict.confidence == 0.5
# ===========================================================================
# Integration: Level 2 in EventLoopNode implicit judge
# ===========================================================================
class TestLevel2InImplicitJudge:
@pytest.mark.asyncio
async def test_no_success_criteria_level0_only(self):
"""Without success_criteria, Level 0 accepts normally (existing behavior)."""
runtime = _make_runtime()
llm = MockStreamingLLM(
scenarios=[
_set_output_scenario("result", "done"),
_text_finish("accepted"),
]
)
spec = NodeSpec(
id="n1",
name="Node1",
description="test",
node_type="event_loop",
output_keys=["result"],
# No success_criteria!
)
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="n1",
nodes=[spec],
edges=[],
)
executor = GraphExecutor(runtime=runtime, llm=llm)
result = await executor.execute(graph=graph, goal=_make_goal())
assert result.success
# LLM.complete should NOT have been called for Level 2
assert len(llm.complete_calls) == 0
@pytest.mark.asyncio
async def test_success_criteria_accept(self):
"""With success_criteria and good work, Level 2 accepts."""
runtime = _make_runtime()
llm = MockStreamingLLM(
scenarios=[
_text_then_set_output("I did thorough research.", "result", "done"),
_text_finish(""), # triggers judge
],
complete_response="ACTION: ACCEPT\nCONFIDENCE: 0.9\nFEEDBACK:",
)
spec = NodeSpec(
id="n1",
name="Research",
description="Do research",
node_type="event_loop",
output_keys=["result"],
success_criteria="Provide thorough research with multiple sources.",
)
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="n1",
nodes=[spec],
edges=[],
)
executor = GraphExecutor(runtime=runtime, llm=llm)
result = await executor.execute(graph=graph, goal=_make_goal())
assert result.success
# LLM.complete should have been called for Level 2
assert len(llm.complete_calls) >= 1
@pytest.mark.asyncio
async def test_success_criteria_retry_then_accept(self):
"""Level 2 rejects first attempt, LLM tries again, Level 2 accepts."""
runtime = _make_runtime()
# Track complete calls to alternate responses
complete_responses = [
"ACTION: RETRY\nCONFIDENCE: 0.4\nFEEDBACK: Need more detail.",
"ACTION: ACCEPT\nCONFIDENCE: 0.9\nFEEDBACK:",
]
call_count = [0]
class SequentialLLM(MockStreamingLLM):
def complete(self, messages, system="", **kwargs):
idx = call_count[0]
call_count[0] += 1
resp = complete_responses[idx % len(complete_responses)]
return LLMResponse(content=resp, model="mock", stop_reason="stop")
llm = SequentialLLM(
scenarios=[
# Turn 1: set output, then stop → Level 2 RETRY
_text_then_set_output("Brief research.", "result", "brief"),
_text_finish(""), # triggers judge → Level 2 RETRY
# Turn 2: after retry feedback, set output again, stop → Level 2 ACCEPT
_text_then_set_output("Much more detailed research.", "result", "detailed"),
_text_finish(""), # triggers judge → Level 2 ACCEPT
]
)
spec = NodeSpec(
id="n1",
name="Research",
description="Do research",
node_type="event_loop",
output_keys=["result"],
success_criteria="Provide thorough research with multiple sources.",
)
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="n1",
nodes=[spec],
edges=[],
)
executor = GraphExecutor(runtime=runtime, llm=llm)
result = await executor.execute(graph=graph, goal=_make_goal())
assert result.success
# Should have had 2 complete calls (first RETRY, second ACCEPT)
assert call_count[0] >= 2
@pytest.mark.asyncio
async def test_level2_only_fires_when_level0_passes(self):
"""Level 2 should NOT fire when output keys are missing."""
runtime = _make_runtime()
llm = MockStreamingLLM(
scenarios=[
# Turn 1: just text, no set_output → Level 0 RETRY (missing keys)
_text_finish("I did some thinking."),
# Turn 2: set output → Level 0 ACCEPT, Level 2 check
_text_then_set_output("Now I have output.", "result", "done"),
_text_finish(""), # triggers judge
],
complete_response="ACTION: ACCEPT\nCONFIDENCE: 0.9\nFEEDBACK:",
)
spec = NodeSpec(
id="n1",
name="Research",
description="Do research",
node_type="event_loop",
output_keys=["result"],
success_criteria="Provide results.",
)
graph = GraphSpec(
id="g1",
goal_id="g1",
entry_node="n1",
nodes=[spec],
edges=[],
)
executor = GraphExecutor(runtime=runtime, llm=llm)
result = await executor.execute(graph=graph, goal=_make_goal())
assert result.success
# Level 2 should only fire once (when Level 0 passes)
assert len(llm.complete_calls) == 1
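# Hedged sketch of the layered gating these integration tests rely on: Level 0
# checks output keys mechanically, and the Level 2 LLM judge only runs once
# Level 0 passes AND the spec declares success_criteria. Names are illustrative,
# not the executor's actual internals.
async def _sketch_implicit_judge(spec, outputs, run_level2_judge):
    # Level 0: mechanical output-key check; RETRY without spending an LLM call.
    missing = [key for key in spec.output_keys if key not in outputs]
    if missing:
        return "RETRY", f"missing output keys: {missing}"
    # No success_criteria declared: Level 0 acceptance is final (complete() never fires).
    if not getattr(spec, "success_criteria", None):
        return "ACCEPT", ""
    # Level 2: delegate to the LLM judge (evaluate_phase_completion in these tests).
    verdict = await run_level2_judge()
    return verdict.action, verdict.feedback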
+20 -19
@@ -86,6 +86,7 @@ class ScriptableMockLLMProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, Any] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
return LLMResponse(
content="Conversation summary for compaction.",
@@ -825,7 +826,7 @@ async def test_event_loop_no_executor_retry(runtime):
result = await executor.execute(graph, goal, {})
assert not result.success
assert failing_node.attempt_count == 1 # Executor forced max_retries to 0
assert failing_node.attempt_count == 3 # Custom nodes keep their max_retries
# ===========================================================================
@@ -929,6 +930,7 @@ async def test_context_handoff_between_nodes(runtime):
@pytest.mark.asyncio
@pytest.mark.skip(reason="Hangs in non-interactive shells (client-facing blocks on stdin)")
async def test_client_facing_node_streams_output():
"""Client-facing node emits CLIENT_OUTPUT_DELTA events."""
recorded: list[AgentEvent] = []
@@ -1005,11 +1007,20 @@ async def test_internal_node_no_client_output():
@pytest.mark.asyncio
async def test_mixed_node_graph(runtime):
"""function -> event_loop -> function end-to-end."""
"""Simple node -> event_loop -> simple node end-to-end."""
# Function 1: write leads to memory
def load_leads(**kwargs):
return ["lead_A", "lead_B", "lead_C"]
class LoadLeadsNode(NodeProtocol):
async def execute(self, ctx: NodeContext) -> NodeResult:
leads = ["lead_A", "lead_B", "lead_C"]
ctx.memory.write("leads", leads)
return NodeResult(success=True, output={"leads": leads})
class FormatOutputNode(NodeProtocol):
async def execute(self, ctx: NodeContext) -> NodeResult:
summary = ctx.input_data.get("summary", ctx.memory.read("summary") or "no summary")
report = f"Report: {summary}"
ctx.memory.write("report", report)
return NodeResult(success=True, output={"report": report})
# Event loop: process leads, produce summary
el_scripts = [
@@ -1026,18 +1037,12 @@ async def test_mixed_node_graph(runtime):
]
el_llm = ScriptableMockLLMProvider(el_scripts)
# Function 2: format final output
def format_output(**kwargs):
summary = kwargs.get("summary", "no summary")
return f"Report: {summary}"
# Node specs
load_spec = NodeSpec(
id="load",
name="Load Leads",
description="Load lead data",
node_type="function",
function="load_leads",
node_type="event_loop",
output_keys=["leads"],
)
process_spec = NodeSpec(
@@ -1045,17 +1050,13 @@ async def test_mixed_node_graph(runtime):
name="Process Leads",
description="Process leads with LLM",
node_type="event_loop",
# input_keys left empty: EventLoopNode._check_pause() reads "pause_requested"
# from memory, and a restrictive scope would block it. Data flows via input_data.
output_keys=["summary"],
)
format_spec = NodeSpec(
id="format",
name="Format Output",
description="Format final report",
node_type="function",
function="format_output",
# input_keys left empty for the same scoping reason as above
node_type="event_loop",
output_keys=["report"],
)
@@ -1076,9 +1077,9 @@ async def test_mixed_node_graph(runtime):
goal = Goal(id="test_goal", name="Pipeline Test", description="test full pipeline")
executor = GraphExecutor(runtime=runtime, llm=el_llm)
executor.register_function("load", load_leads)
executor.register_node("load", LoadLeadsNode())
executor.register_node("process", EventLoopNode(config=LoopConfig(max_iterations=5)))
executor.register_function("format", format_output)
executor.register_node("format", FormatOutputNode())
result = await executor.execute(graph, goal, {})
+697
@@ -425,6 +425,7 @@ class TestEventBusLifecycle:
assert EventType.NODE_LOOP_COMPLETED in received_events
@pytest.mark.asyncio
@pytest.mark.skip(reason="Hangs in non-interactive shells (client-facing blocks on stdin)")
async def test_client_facing_uses_client_output_delta(self, runtime, memory):
"""client_facing=True should emit CLIENT_OUTPUT_DELTA instead of LLM_TEXT_DELTA."""
spec = NodeSpec(
@@ -475,6 +476,7 @@ class TestClientFacingBlocking:
)
@pytest.mark.asyncio
@pytest.mark.skip(reason="Hangs in non-interactive shells (client-facing blocks on stdin)")
async def test_text_only_no_blocking(self, runtime, memory, client_spec):
"""client_facing + text-only (no ask_user) should NOT block."""
llm = MockStreamingLLM(
@@ -630,6 +632,7 @@ class TestClientFacingBlocking:
assert received[0].type == EventType.CLIENT_INPUT_REQUESTED
@pytest.mark.asyncio
@pytest.mark.skip(reason="Hangs in non-interactive shells (client-facing blocks on stdin)")
async def test_ask_user_with_real_tools(self, runtime, memory):
"""ask_user alongside real tool calls still triggers blocking."""
spec = NodeSpec(
@@ -993,3 +996,697 @@ class TestOutputAccumulator:
assert acc.get("key1") == "val1"
assert acc.get("key2") == "val2"
assert acc.has_all_keys(["key1", "key2"]) is True
# ===========================================================================
# Transient error retry (ITEM 2)
# ===========================================================================
class ErrorThenSuccessLLM(LLMProvider):
"""LLM that raises on the first N calls, then succeeds.
Used to test the retry-with-backoff wrapper around _run_single_turn().
"""
def __init__(self, error: Exception, fail_count: int, success_scenario: list):
self.error = error
self.fail_count = fail_count
self.success_scenario = success_scenario
self._call_index = 0
async def stream(self, messages, system="", tools=None, max_tokens=4096):
call_num = self._call_index
self._call_index += 1
if call_num < self.fail_count:
raise self.error
for event in self.success_scenario:
yield event
def complete(self, messages, system="", **kwargs) -> LLMResponse:
return LLMResponse(content="ok", model="mock", stop_reason="stop")
def complete_with_tools(self, messages, system, tools, tool_executor, **kwargs) -> LLMResponse:
return LLMResponse(content="", model="mock", stop_reason="stop")
class TestTransientErrorRetry:
"""Test retry-with-backoff for transient LLM errors in EventLoopNode."""
@pytest.mark.asyncio
async def test_transient_error_retries_then_succeeds(self, runtime, node_spec, memory):
"""A transient error on the first try should retry and succeed."""
node_spec.output_keys = []
llm = ErrorThenSuccessLLM(
error=ConnectionError("connection reset"),
fail_count=1,
success_scenario=text_scenario("success"),
)
ctx = build_ctx(runtime, node_spec, memory, llm)
node = EventLoopNode(
config=LoopConfig(
max_iterations=5,
max_stream_retries=3,
stream_retry_backoff_base=0.01, # fast for tests
),
)
result = await node.execute(ctx)
assert result.success is True
assert llm._call_index == 2 # 1 failure + 1 success
@pytest.mark.asyncio
async def test_permanent_error_no_retry(self, runtime, node_spec, memory):
"""A permanent error (ValueError) should NOT be retried."""
node_spec.output_keys = []
llm = ErrorThenSuccessLLM(
error=ValueError("bad request: invalid model"),
fail_count=1,
success_scenario=text_scenario("success"),
)
ctx = build_ctx(runtime, node_spec, memory, llm)
node = EventLoopNode(
config=LoopConfig(
max_iterations=5,
max_stream_retries=3,
stream_retry_backoff_base=0.01,
),
)
with pytest.raises(ValueError, match="bad request"):
await node.execute(ctx)
assert llm._call_index == 1 # only tried once
@pytest.mark.asyncio
async def test_transient_error_exhausts_retries(self, runtime, node_spec, memory):
"""Transient errors that exhaust retries should raise."""
node_spec.output_keys = []
llm = ErrorThenSuccessLLM(
error=TimeoutError("request timed out"),
fail_count=100, # always fails
success_scenario=text_scenario("unreachable"),
)
ctx = build_ctx(runtime, node_spec, memory, llm)
node = EventLoopNode(
config=LoopConfig(
max_iterations=5,
max_stream_retries=2,
stream_retry_backoff_base=0.01,
),
)
with pytest.raises(TimeoutError, match="request timed out"):
await node.execute(ctx)
assert llm._call_index == 3 # 1 initial + 2 retries
@pytest.mark.asyncio
async def test_stream_error_event_retried_as_runtime_error(self, runtime, node_spec, memory):
"""StreamErrorEvent(recoverable=False) raises RuntimeError caught by retry."""
node_spec.output_keys = []
# Scenario: non-recoverable StreamErrorEvent with transient keywords
error_scenario = [
StreamErrorEvent(
error="Stream error: 503 service unavailable",
recoverable=False,
)
]
success_scenario = text_scenario("recovered")
call_index = 0
class StreamErrorThenSuccessLLM(LLMProvider):
async def stream(self, messages, system="", tools=None, max_tokens=4096):
nonlocal call_index
idx = call_index
call_index += 1
if idx == 0:
for event in error_scenario:
yield event
else:
for event in success_scenario:
yield event
def complete(self, messages, system="", **kwargs):
return LLMResponse(
content="ok",
model="mock",
stop_reason="stop",
)
def complete_with_tools(
self,
messages,
system,
tools,
tool_executor,
**kwargs,
):
return LLMResponse(
content="",
model="mock",
stop_reason="stop",
)
llm = StreamErrorThenSuccessLLM()
ctx = build_ctx(runtime, node_spec, memory, llm)
node = EventLoopNode(
config=LoopConfig(
max_iterations=5,
max_stream_retries=3,
stream_retry_backoff_base=0.01,
),
)
result = await node.execute(ctx)
assert result.success is True
assert call_index == 2
@pytest.mark.asyncio
async def test_retry_emits_event_bus_event(self, runtime, node_spec, memory):
"""Retry should emit NODE_RETRY event on the event bus."""
node_spec.output_keys = []
llm = ErrorThenSuccessLLM(
error=ConnectionError("network down"),
fail_count=1,
success_scenario=text_scenario("ok"),
)
bus = EventBus()
retry_events = []
bus.subscribe(
event_types=[EventType.NODE_RETRY],
handler=lambda e: retry_events.append(e),
)
ctx = build_ctx(runtime, node_spec, memory, llm)
node = EventLoopNode(
event_bus=bus,
config=LoopConfig(
max_iterations=5,
max_stream_retries=3,
stream_retry_backoff_base=0.01,
),
)
result = await node.execute(ctx)
assert result.success is True
assert len(retry_events) == 1
assert retry_events[0].data["retry_count"] == 1
@pytest.mark.asyncio
async def test_recoverable_stream_error_retried_not_silent(self, runtime, node_spec, memory):
"""Recoverable StreamErrorEvent with empty response should raise ConnectionError.
Previously, recoverable stream errors were silently swallowed,
producing empty responses that the judge retried creating an
infinite loop of 50+ empty-response iterations. Now they raise
ConnectionError so the outer transient-error retry handles them
with proper backoff.
"""
node_spec.output_keys = ["result"]
call_index = 0
class RecoverableErrorThenSuccessLLM(LLMProvider):
async def stream(self, messages, system="", tools=None, max_tokens=4096):
nonlocal call_index
idx = call_index
call_index += 1
if idx == 0:
# Recoverable error with no content
yield StreamErrorEvent(
error="503 service unavailable",
recoverable=True,
)
elif idx == 1:
# Success: set output
for event in tool_call_scenario(
"set_output", {"key": "result", "value": "done"}
):
yield event
else:
# Subsequent calls: text-only (no more tool calls)
for event in text_scenario("done"):
yield event
def complete(self, messages, system="", **kwargs):
return LLMResponse(content="ok", model="mock", stop_reason="stop")
def complete_with_tools(self, messages, system, tools, tool_executor, **kwargs):
return LLMResponse(content="", model="mock", stop_reason="stop")
llm = RecoverableErrorThenSuccessLLM()
ctx = build_ctx(runtime, node_spec, memory, llm)
node = EventLoopNode(
config=LoopConfig(
max_iterations=5,
max_stream_retries=3,
stream_retry_backoff_base=0.01,
),
)
result = await node.execute(ctx)
assert result.success is True
assert result.output.get("result") == "done"
# call 0: recoverable error → ConnectionError raised → outer retry
# call 1: set_output tool call succeeds
# call 2: inner tool loop re-invokes LLM after tool result → text "done"
assert call_index == 3
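# Hedged sketch of the fix the previous test describes: instead of swallowing a
# recoverable StreamErrorEvent (and returning an empty response the judge would
# retry forever), the stream consumer re-raises it as ConnectionError so the
# outer transient-error retry takes over. Structure is illustrative.
async def _sketch_consume_stream(stream):
    text_parts = []
    async for event in stream:
        if isinstance(event, StreamErrorEvent):
            if event.recoverable:
                # Previously: silently continue -> empty response -> judge RETRY loop.
                raise ConnectionError(event.error)
            raise RuntimeError(f"Stream error: {event.error}")
        if isinstance(event, TextDeltaEvent):
            text_parts.append(event.content)
    return "".join(text_parts)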
class TestIsTransientError:
"""Unit tests for _is_transient_error() classification."""
def test_timeout_error(self):
assert EventLoopNode._is_transient_error(TimeoutError("timed out")) is True
def test_connection_error(self):
assert EventLoopNode._is_transient_error(ConnectionError("reset")) is True
def test_os_error(self):
assert EventLoopNode._is_transient_error(OSError("network unreachable")) is True
def test_value_error_not_transient(self):
assert EventLoopNode._is_transient_error(ValueError("bad input")) is False
def test_type_error_not_transient(self):
assert EventLoopNode._is_transient_error(TypeError("wrong type")) is False
def test_runtime_error_with_transient_keywords(self):
check = EventLoopNode._is_transient_error
assert check(RuntimeError("Stream error: 429 rate limit")) is True
assert check(RuntimeError("Stream error: 503")) is True
assert check(RuntimeError("Stream error: connection reset")) is True
assert check(RuntimeError("Stream error: timeout exceeded")) is True
def test_runtime_error_without_transient_keywords(self):
assert EventLoopNode._is_transient_error(RuntimeError("authentication failed")) is False
assert EventLoopNode._is_transient_error(RuntimeError("invalid JSON in response")) is False
# ===========================================================================
# Tool doom loop detection (ITEM 1)
# ===========================================================================
class TestFingerprintToolCalls:
"""Unit tests for _fingerprint_tool_calls()."""
def test_basic_fingerprint(self):
results = [
{"tool_name": "search", "tool_input": {"q": "hello"}},
]
fps = EventLoopNode._fingerprint_tool_calls(results)
assert len(fps) == 1
assert fps[0][0] == "search"
# Args should be JSON with sort_keys
assert fps[0][1] == '{"q": "hello"}'
def test_order_sensitive(self):
r1 = [
{"tool_name": "search", "tool_input": {"q": "a"}},
{"tool_name": "fetch", "tool_input": {"url": "b"}},
]
r2 = [
{"tool_name": "fetch", "tool_input": {"url": "b"}},
{"tool_name": "search", "tool_input": {"q": "a"}},
]
assert EventLoopNode._fingerprint_tool_calls(r1) != (
EventLoopNode._fingerprint_tool_calls(r2)
)
def test_sort_keys_deterministic(self):
r1 = [{"tool_name": "t", "tool_input": {"b": 2, "a": 1}}]
r2 = [{"tool_name": "t", "tool_input": {"a": 1, "b": 2}}]
assert EventLoopNode._fingerprint_tool_calls(r1) == EventLoopNode._fingerprint_tool_calls(
r2
)
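# Hedged sketch matching the fingerprint behavior asserted above: one
# (tool_name, canonical-JSON-args) tuple per call, order-preserving, with
# sort_keys=True making argument order irrelevant. Illustrative only.
def _sketch_fingerprint_tool_calls(results):
    import json

    return [
        (result["tool_name"], json.dumps(result["tool_input"], sort_keys=True))
        for result in results
    ]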
class TestIsToolDoomLoop:
"""Unit tests for _is_tool_doom_loop()."""
def test_below_threshold(self):
node = EventLoopNode(config=LoopConfig(tool_doom_loop_threshold=3))
fp = [("search", '{"q": "hello"}')]
is_doom, _ = node._is_tool_doom_loop([fp, fp])
assert is_doom is False
def test_at_threshold_identical(self):
node = EventLoopNode(config=LoopConfig(tool_doom_loop_threshold=3))
fp = [("search", '{"q": "hello"}')]
is_doom, desc = node._is_tool_doom_loop([fp, fp, fp])
assert is_doom is True
assert "search" in desc
def test_different_args_no_doom(self):
node = EventLoopNode(config=LoopConfig(tool_doom_loop_threshold=3))
fp1 = [("search", '{"q": "a"}')]
fp2 = [("search", '{"q": "b"}')]
fp3 = [("search", '{"q": "c"}')]
is_doom, _ = node._is_tool_doom_loop([fp1, fp2, fp3])
assert is_doom is False
def test_disabled_via_config(self):
node = EventLoopNode(
config=LoopConfig(tool_doom_loop_enabled=False),
)
fp = [("search", '{"q": "hello"}')]
is_doom, _ = node._is_tool_doom_loop([fp, fp, fp])
assert is_doom is False
def test_empty_fingerprints_no_doom(self):
node = EventLoopNode(config=LoopConfig(tool_doom_loop_threshold=3))
is_doom, _ = node._is_tool_doom_loop([[], [], []])
assert is_doom is False
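# Hedged sketch of the doom-loop check consistent with the unit tests above:
# the last `threshold` fingerprint lists must be identical and non-empty.
# Parameter names are assumptions mirroring LoopConfig as used here.
def _sketch_is_tool_doom_loop(history, threshold=3, enabled=True):
    if not enabled or len(history) < threshold:
        return False, ""
    window = history[-threshold:]
    if not window[0] or any(fingerprints != window[0] for fingerprints in window[1:]):
        return False, ""
    tools = ", ".join(name for name, _ in window[0])
    return True, f"identical tool calls repeated {threshold}x: {tools}"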
class ToolRepeatLLM(LLMProvider):
"""LLM that produces identical tool calls across outer iterations.
Alternates: even calls tool call, odd calls text (exits inner loop).
This ensures each outer iteration = 2 LLM calls with 1 tool executed.
After tool_turns outer iterations, always returns text.
"""
def __init__(
self,
tool_name: str,
tool_input: dict,
tool_turns: int,
final_text: str = "done",
):
self.tool_name = tool_name
self.tool_input = tool_input
self.tool_turns = tool_turns
self.final_text = final_text
self._call_index = 0
async def stream(self, messages, system="", tools=None, max_tokens=4096):
idx = self._call_index
self._call_index += 1
# Which outer iteration we're in (2 calls per iteration)
outer_iter = idx // 2
is_tool_call = (idx % 2 == 0) and outer_iter < self.tool_turns
if is_tool_call:
yield ToolCallEvent(
tool_use_id=f"call_{outer_iter}",
tool_name=self.tool_name,
tool_input=self.tool_input,
)
yield FinishEvent(
stop_reason="tool_calls",
input_tokens=10,
output_tokens=5,
model="mock",
)
else:
# Unique text per call to avoid stall detection
text = f"{self.final_text} (call {idx})"
yield TextDeltaEvent(content=text, snapshot=text)
yield FinishEvent(
stop_reason="stop",
input_tokens=10,
output_tokens=5,
model="mock",
)
def complete(self, messages, system="", **kwargs) -> LLMResponse:
return LLMResponse(
content="ok",
model="mock",
stop_reason="stop",
)
def complete_with_tools(
self,
messages,
system,
tools,
tool_executor,
**kwargs,
) -> LLMResponse:
return LLMResponse(
content="",
model="mock",
stop_reason="stop",
)
class TestToolDoomLoopIntegration:
"""Integration tests for doom loop detection in execute().
Uses ToolRepeatLLM: returns tool calls for first N calls, then text.
Each outer iteration = 2 LLM calls (tool call + text exit for inner loop).
logged_tool_calls accumulates across inner iterations.
"""
@pytest.mark.asyncio
async def test_doom_loop_injects_warning(
self,
runtime,
node_spec,
memory,
):
"""3 identical tool call turns should inject a warning."""
node_spec.output_keys = []
judge = AsyncMock(spec=JudgeProtocol)
eval_count = 0
async def judge_eval(*args, **kwargs):
nonlocal eval_count
eval_count += 1
if eval_count >= 4:
return JudgeVerdict(action="ACCEPT")
return JudgeVerdict(action="RETRY")
judge.evaluate = judge_eval
# 3 tool calls (6 LLM calls: tool+text each), then 1 text
llm = ToolRepeatLLM("search", {"q": "hello"}, tool_turns=3)
def tool_exec(tool_use: ToolUse) -> ToolResult:
return ToolResult(
tool_use_id=tool_use.id,
content="result",
is_error=False,
)
ctx = build_ctx(
runtime,
node_spec,
memory,
llm,
tools=[Tool(name="search", description="s", parameters={})],
)
node = EventLoopNode(
judge=judge,
tool_executor=tool_exec,
config=LoopConfig(
max_iterations=10,
tool_doom_loop_threshold=3,
),
)
result = await node.execute(ctx)
assert result.success is True
@pytest.mark.asyncio
async def test_doom_loop_emits_event(
self,
runtime,
node_spec,
memory,
):
"""Doom loop should emit NODE_TOOL_DOOM_LOOP event."""
node_spec.output_keys = []
judge = AsyncMock(spec=JudgeProtocol)
eval_count = 0
async def judge_eval(*args, **kwargs):
nonlocal eval_count
eval_count += 1
if eval_count >= 4:
return JudgeVerdict(action="ACCEPT")
return JudgeVerdict(action="RETRY")
judge.evaluate = judge_eval
llm = ToolRepeatLLM("search", {"q": "hello"}, tool_turns=3)
bus = EventBus()
doom_events: list = []
bus.subscribe(
event_types=[EventType.NODE_TOOL_DOOM_LOOP],
handler=lambda e: doom_events.append(e),
)
def tool_exec(tool_use: ToolUse) -> ToolResult:
return ToolResult(
tool_use_id=tool_use.id,
content="result",
is_error=False,
)
ctx = build_ctx(
runtime,
node_spec,
memory,
llm,
tools=[Tool(name="search", description="s", parameters={})],
)
node = EventLoopNode(
judge=judge,
tool_executor=tool_exec,
event_bus=bus,
config=LoopConfig(
max_iterations=10,
tool_doom_loop_threshold=3,
),
)
result = await node.execute(ctx)
assert result.success is True
assert len(doom_events) == 1
assert "search" in doom_events[0].data["description"]
@pytest.mark.asyncio
async def test_doom_loop_disabled(
self,
runtime,
node_spec,
memory,
):
"""Disabled doom loop should not trigger with identical calls."""
node_spec.output_keys = []
judge = AsyncMock(spec=JudgeProtocol)
eval_count = 0
async def judge_eval(*args, **kwargs):
nonlocal eval_count
eval_count += 1
if eval_count >= 4:
return JudgeVerdict(action="ACCEPT")
return JudgeVerdict(action="RETRY")
judge.evaluate = judge_eval
llm = ToolRepeatLLM("search", {"q": "hello"}, tool_turns=4)
def tool_exec(tool_use: ToolUse) -> ToolResult:
return ToolResult(
tool_use_id=tool_use.id,
content="result",
is_error=False,
)
ctx = build_ctx(
runtime,
node_spec,
memory,
llm,
tools=[Tool(name="search", description="s", parameters={})],
)
node = EventLoopNode(
judge=judge,
tool_executor=tool_exec,
config=LoopConfig(
max_iterations=10,
tool_doom_loop_enabled=False,
),
)
result = await node.execute(ctx)
assert result.success is True
@pytest.mark.asyncio
async def test_different_args_no_doom_loop(
self,
runtime,
node_spec,
memory,
):
"""Different tool args each turn should NOT trigger doom loop."""
node_spec.output_keys = []
judge = AsyncMock(spec=JudgeProtocol)
eval_count = 0
async def judge_eval(*args, **kwargs):
nonlocal eval_count
eval_count += 1
if eval_count >= 4:
return JudgeVerdict(action="ACCEPT")
return JudgeVerdict(action="RETRY")
judge.evaluate = judge_eval
# LLM that returns different args each call
call_idx = 0
class DiffArgsLLM(LLMProvider):
async def stream(self, messages, **kwargs):
nonlocal call_idx
idx = call_idx
call_idx += 1
if idx < 3:
yield ToolCallEvent(
tool_use_id=f"c{idx}",
tool_name="search",
tool_input={"q": f"query_{idx}"},
)
yield FinishEvent(
stop_reason="tool_calls",
input_tokens=10,
output_tokens=5,
model="mock",
)
else:
text = f"done (call {idx})"
yield TextDeltaEvent(
content=text,
snapshot=text,
)
yield FinishEvent(
stop_reason="stop",
input_tokens=10,
output_tokens=5,
model="mock",
)
def complete(self, messages, **kwargs):
return LLMResponse(
content="ok",
model="mock",
stop_reason="stop",
)
def complete_with_tools(
self,
messages,
system,
tools,
tool_executor,
**kw,
):
return LLMResponse(
content="",
model="mock",
stop_reason="stop",
)
llm = DiffArgsLLM()
def tool_exec(tool_use: ToolUse) -> ToolResult:
return ToolResult(
tool_use_id=tool_use.id,
content="result",
is_error=False,
)
ctx = build_ctx(
runtime,
node_spec,
memory,
llm,
tools=[Tool(name="search", description="s", parameters={})],
)
node = EventLoopNode(
judge=judge,
tool_executor=tool_exec,
config=LoopConfig(
max_iterations=10,
tool_doom_loop_threshold=3,
),
)
result = await node.execute(ctx)
assert result.success is True
+11 -11
@@ -65,7 +65,7 @@ def test_client_facing_defaults_false():
id="n1",
name="Node 1",
description="test",
node_type="llm_generate",
node_type="event_loop",
)
assert spec.client_facing is False
@@ -143,7 +143,7 @@ def test_registered_event_loop_returns_impl(runtime):
@pytest.mark.asyncio
async def test_event_loop_max_retries_forced_zero(runtime):
"""An event_loop node with max_retries=3 should only execute once (no retry)."""
"""Custom NodeProtocol impls with node_type=event_loop keep their max_retries."""
node_spec = NodeSpec(
id="el_fail",
name="Failing Event Loop",
@@ -171,9 +171,9 @@ async def test_event_loop_max_retries_forced_zero(runtime):
result = await executor.execute(graph, goal, {})
# Event loop nodes get max_retries overridden to 0, meaning execute once then fail
# Custom nodes (not EventLoopNode instances) keep their max_retries
assert not result.success
assert failing_node.attempt_count == 1
assert failing_node.attempt_count == 3
@pytest.mark.asyncio
@@ -246,21 +246,21 @@ async def test_event_loop_max_retries_positive_logs_warning(runtime, caplog):
with caplog.at_level(logging.WARNING):
await executor.execute(graph, goal, {})
assert "Overriding to 0" in caplog.text
assert "el_warn" in caplog.text
# Custom nodes (not EventLoopNode instances) don't get override warning
assert "Overriding to 0" not in caplog.text
# --- Existing node types unaffected ---
def test_existing_node_types_unchanged():
"""All pre-existing node types must still be in VALID_NODE_TYPES with defaults preserved."""
expected = {"llm_tool_use", "llm_generate", "router", "function", "human_input"}
assert expected.issubset(GraphExecutor.VALID_NODE_TYPES)
"""Only event_loop is a valid node type."""
expected = {"event_loop"}
assert expected == GraphExecutor.VALID_NODE_TYPES
# Default node_type is still llm_tool_use
# Default node_type is event_loop
spec = NodeSpec(id="x", name="X", description="x")
assert spec.node_type == "llm_tool_use"
assert spec.node_type == "event_loop"
# Default max_retries is still 3
assert spec.max_retries == 3
+9 -6
@@ -106,7 +106,7 @@ class TestExecutionQuality:
id="node1",
name="Always Succeeds",
description="Never fails",
node_type="function",
node_type="event_loop",
output_keys=["result"],
),
],
@@ -151,6 +151,7 @@ class TestExecutionQuality:
)
# Create graph with flaky node (fails 2 times before succeeding)
# (actual impl from registry is FlakyNode)
graph = GraphSpec(
id="test-graph",
goal_id=goal.id,
@@ -159,7 +160,7 @@ class TestExecutionQuality:
id="flaky",
name="Flaky Node",
description="Fails then succeeds",
node_type="function",
node_type="event_loop",
output_keys=["result"],
max_retries=3, # Allow retries
),
@@ -206,6 +207,7 @@ class TestExecutionQuality:
)
# Create graph with always-failing node
# (actual impl from registry is AlwaysFailsNode)
graph = GraphSpec(
id="test-graph",
goal_id=goal.id,
@@ -214,7 +216,7 @@ class TestExecutionQuality:
id="fails",
name="Always Fails",
description="Never succeeds",
node_type="function",
node_type="event_loop",
output_keys=["result"],
max_retries=2, # Will retry twice then fail
),
@@ -261,6 +263,7 @@ class TestExecutionQuality:
)
# Create graph with multiple flaky nodes
# (actual impls from registry are FlakyNode instances)
graph = GraphSpec(
id="test-graph",
goal_id=goal.id,
@@ -269,7 +272,7 @@ class TestExecutionQuality:
id="flaky1",
name="Flaky Node 1",
description="Fails once",
node_type="function",
node_type="event_loop",
output_keys=["result1"],
max_retries=3,
),
@@ -277,7 +280,7 @@ class TestExecutionQuality:
id="flaky2",
name="Flaky Node 2",
description="Fails twice",
node_type="function",
node_type="event_loop",
input_keys=["result1"],
output_keys=["result2"],
max_retries=3,
@@ -286,7 +289,7 @@ class TestExecutionQuality:
id="success",
name="Success Node",
description="Always succeeds",
node_type="function",
node_type="event_loop",
input_keys=["result2"],
output_keys=["final"],
),
+179 -5
@@ -1,13 +1,15 @@
"""Tests for ExecutionStream retention behavior."""
import json
from collections.abc import Callable
from collections.abc import AsyncIterator, Callable
from typing import Any
import pytest
from framework.graph import Goal, NodeSpec, SuccessCriterion
from framework.graph.edge import GraphSpec
from framework.llm.provider import LLMProvider, LLMResponse, Tool
from framework.llm.stream_events import FinishEvent, StreamEvent, TextDeltaEvent, ToolCallEvent
from framework.runtime.event_bus import EventBus
from framework.runtime.execution_stream import EntryPointSpec, ExecutionStream
from framework.runtime.outcome_aggregator import OutcomeAggregator
@@ -16,7 +18,13 @@ from framework.storage.concurrent import ConcurrentStorage
class DummyLLMProvider(LLMProvider):
"""Deterministic LLM provider for execution stream tests."""
"""Deterministic LLM provider for execution stream tests.
Uses a set_output tool call to set outputs properly, avoiding stall detection.
"""
def __init__(self):
self._call_count = 0
def complete(
self,
@@ -26,8 +34,9 @@ class DummyLLMProvider(LLMProvider):
max_tokens: int = 1024,
response_format: dict[str, object] | None = None,
json_mode: bool = False,
max_retries: int | None = None,
) -> LLMResponse:
return LLMResponse(content=json.dumps({"result": "ok"}), model="dummy")
return LLMResponse(content="Summary for compaction.", model="dummy")
def complete_with_tools(
self,
@@ -37,7 +46,29 @@ class DummyLLMProvider(LLMProvider):
tool_executor: Callable,
max_iterations: int = 10,
) -> LLMResponse:
return LLMResponse(content=json.dumps({"result": "ok"}), model="dummy")
return LLMResponse(content="Summary for compaction.", model="dummy")
async def stream(
self,
messages: list[dict[str, Any]],
system: str = "",
tools: list[Tool] | None = None,
max_tokens: int = 4096,
) -> AsyncIterator[StreamEvent]:
self._call_count += 1
if self._call_count == 1:
# First call: set the output via tool call
yield ToolCallEvent(
tool_use_id=f"tc_{self._call_count}",
tool_name="set_output",
tool_input={"key": "result", "value": "ok"},
)
yield FinishEvent(stop_reason="tool_use", input_tokens=10, output_tokens=10)
else:
# Subsequent calls: just finish with text
yield TextDeltaEvent(content="Done.", snapshot="Done.")
yield FinishEvent(stop_reason="end_turn", input_tokens=5, output_tokens=5)
@pytest.mark.asyncio
@@ -61,7 +92,7 @@ async def test_execution_stream_retention(tmp_path):
id="hello",
name="Hello",
description="Return a result",
node_type="llm_generate",
node_type="event_loop",
input_keys=["user_name"],
output_keys=["result"],
system_prompt='Return JSON: {"result": "ok"}',
@@ -120,3 +151,146 @@ async def test_execution_stream_retention(tmp_path):
await stream.stop()
await storage.stop()
@pytest.mark.asyncio
async def test_shared_session_reuses_directory_and_memory(tmp_path):
"""When an async entry point uses resume_session_id, it should:
1. Run in the same session directory as the primary execution
2. Have access to the primary session's memory
3. NOT overwrite the primary session's state.json
"""
goal = Goal(
id="test-goal",
name="Test",
description="Shared session test",
success_criteria=[
SuccessCriterion(
id="result",
description="Result present",
metric="output_contains",
target="result",
)
],
constraints=[],
)
node = NodeSpec(
id="hello",
name="Hello",
description="Return a result",
node_type="event_loop",
input_keys=["user_name"],
output_keys=["result"],
system_prompt='Return JSON: {"result": "ok"}',
)
graph = GraphSpec(
id="test-graph",
goal_id=goal.id,
version="1.0.0",
entry_node="hello",
entry_points={"start": "hello"},
terminal_nodes=["hello"],
pause_nodes=[],
nodes=[node],
edges=[],
default_model="dummy",
max_tokens=10,
)
storage = ConcurrentStorage(tmp_path)
await storage.start()
from framework.storage.session_store import SessionStore
session_store = SessionStore(tmp_path)
# Primary stream
primary_stream = ExecutionStream(
stream_id="primary",
entry_spec=EntryPointSpec(
id="primary",
name="Primary",
entry_node="hello",
trigger_type="manual",
isolation_level="shared",
),
graph=graph,
goal=goal,
state_manager=SharedStateManager(),
storage=storage,
outcome_aggregator=OutcomeAggregator(goal, EventBus()),
event_bus=None,
llm=DummyLLMProvider(),
tools=[],
tool_executor=None,
session_store=session_store,
)
await primary_stream.start()
# Run primary execution — creates session directory and state.json
primary_exec_id = await primary_stream.execute({"user_name": "alice"})
primary_result = await primary_stream.wait_for_completion(primary_exec_id, timeout=5)
assert primary_result is not None
assert primary_result.success
# Verify primary session's state.json exists and has the primary entry_point
primary_state_path = tmp_path / "sessions" / primary_exec_id / "state.json"
assert primary_state_path.exists()
primary_state = json.loads(primary_state_path.read_text())
assert primary_state["entry_point"] == "primary"
# Async stream — simulates a webhook entry point sharing the session
async_stream = ExecutionStream(
stream_id="webhook",
entry_spec=EntryPointSpec(
id="webhook",
name="Webhook",
entry_node="hello",
trigger_type="event",
isolation_level="shared",
),
graph=graph,
goal=goal,
state_manager=SharedStateManager(),
storage=storage,
outcome_aggregator=OutcomeAggregator(goal, EventBus()),
event_bus=None,
llm=DummyLLMProvider(),
tools=[],
tool_executor=None,
session_store=session_store,
)
await async_stream.start()
# Run async execution with resume_session_id pointing to primary session
session_state = {
"resume_session_id": primary_exec_id,
"memory": {"rules": "star important emails"},
}
async_exec_id = await async_stream.execute({"event": "new_email"}, session_state=session_state)
# Should reuse the primary session ID
assert async_exec_id == primary_exec_id
async_result = await async_stream.wait_for_completion(async_exec_id, timeout=5)
assert async_result is not None
assert async_result.success
# State.json should NOT have been overwritten by the async execution
# (it should still show the primary entry point)
final_state = json.loads(primary_state_path.read_text())
assert final_state["entry_point"] == "primary"
# Verify only ONE session directory exists (not two)
sessions_dir = tmp_path / "sessions"
session_dirs = [d for d in sessions_dir.iterdir() if d.is_dir()]
assert len(session_dirs) == 1
assert session_dirs[0].name == primary_exec_id
await primary_stream.stop()
await async_stream.stop()
await storage.stop()
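# Hedged sketch of the session-reuse rule the test above verifies: an execution
# carrying resume_session_id runs inside that existing session directory and
# must not rewrite its state.json. Field names follow the test; the ownership
# flag is an illustrative assumption, not the stream's actual API.
def _sketch_resolve_session_id(new_exec_id, session_state):
    state = session_state or {}
    resume_id = state.get("resume_session_id")
    if resume_id:
        # Shared session: reuse the directory and memory, don't own state.json.
        return resume_id, False
    return new_exec_id, True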
+20 -18
@@ -81,7 +81,9 @@ def goal():
def test_max_node_visits_default():
"""NodeSpec.max_node_visits should default to 1."""
spec = NodeSpec(id="n", name="N", description="test", node_type="function", output_keys=["out"])
spec = NodeSpec(
id="n", name="N", description="test", node_type="event_loop", output_keys=["out"]
)
assert spec.max_node_visits == 1
@@ -101,7 +103,7 @@ async def test_visit_limit_skips_node(runtime, goal):
id="a",
name="A",
description="entry with visit limit",
node_type="function",
node_type="event_loop",
output_keys=["a_out"],
max_node_visits=1,
)
@@ -109,7 +111,7 @@ async def test_visit_limit_skips_node(runtime, goal):
id="b",
name="B",
description="middle node",
node_type="function",
node_type="event_loop",
output_keys=["b_out"],
max_node_visits=0, # unlimited — let max_steps guard
)
@@ -159,7 +161,7 @@ async def test_visit_limit_allows_multiple(runtime, goal):
id="a",
name="A",
description="entry allows two visits",
node_type="function",
node_type="event_loop",
output_keys=["a_out"],
max_node_visits=2,
)
@@ -167,7 +169,7 @@ async def test_visit_limit_allows_multiple(runtime, goal):
id="b",
name="B",
description="middle node",
node_type="function",
node_type="event_loop",
output_keys=["b_out"],
max_node_visits=0, # unlimited
)
@@ -215,7 +217,7 @@ async def test_visit_limit_zero_unlimited(runtime, goal):
id="a",
name="A",
description="unlimited visits",
node_type="function",
node_type="event_loop",
output_keys=["a_out"],
max_node_visits=0,
)
@@ -223,7 +225,7 @@ async def test_visit_limit_zero_unlimited(runtime, goal):
id="b",
name="B",
description="middle node",
node_type="function",
node_type="event_loop",
output_keys=["b_out"],
max_node_visits=0,
)
@@ -274,7 +276,7 @@ async def test_conditional_feedback_edge(runtime, goal):
id="director",
name="Director",
description="plans work",
node_type="function",
node_type="event_loop",
output_keys=["plan"],
max_node_visits=2,
)
@@ -282,7 +284,7 @@ async def test_conditional_feedback_edge(runtime, goal):
id="writer",
name="Writer",
description="writes draft",
node_type="function",
node_type="event_loop",
output_keys=["draft", "needs_revision"],
max_node_visits=2,
)
@@ -290,7 +292,7 @@ async def test_conditional_feedback_edge(runtime, goal):
id="output",
name="Output",
description="final output",
node_type="function",
node_type="event_loop",
output_keys=["final"],
)
@@ -370,7 +372,7 @@ async def test_conditional_feedback_false(runtime, goal):
id="director",
name="Director",
description="plans work",
node_type="function",
node_type="event_loop",
output_keys=["plan"],
max_node_visits=2,
)
@@ -378,14 +380,14 @@ async def test_conditional_feedback_false(runtime, goal):
id="writer",
name="Writer",
description="writes draft",
node_type="function",
node_type="event_loop",
output_keys=["draft", "needs_revision"],
)
output_node = NodeSpec(
id="output",
name="Output",
description="final output",
node_type="function",
node_type="event_loop",
output_keys=["final"],
)
@@ -458,14 +460,14 @@ async def test_visit_counts_in_result(runtime, goal):
id="a",
name="A",
description="entry",
node_type="function",
node_type="event_loop",
output_keys=["a_out"],
)
node_b = NodeSpec(
id="b",
name="B",
description="terminal",
node_type="function",
node_type="event_loop",
input_keys=["a_out"],
output_keys=["b_out"],
)
@@ -509,21 +511,21 @@ async def test_conditional_priority_prevents_fanout(runtime, goal):
id="writer",
name="Writer",
description="produces output",
node_type="function",
node_type="event_loop",
output_keys=["draft", "needs_revision"],
)
output_node = NodeSpec(
id="output",
name="Output",
description="forward target",
node_type="function",
node_type="event_loop",
output_keys=["final"],
)
director = NodeSpec(
id="director",
name="Director",
description="feedback target",
node_type="function",
node_type="event_loop",
output_keys=["plan"],
max_node_visits=2,
)
+6 -6
@@ -79,7 +79,7 @@ async def test_executor_respects_custom_max_retries_high(runtime):
name="Flaky Node",
description="A node that fails multiple times before succeeding",
max_retries=10, # Should allow 10 retries
node_type="function",
node_type="event_loop",
output_keys=["result"],
)
@@ -123,7 +123,7 @@ async def test_executor_respects_custom_max_retries_low(runtime):
name="Fragile Node",
description="A node with low retry tolerance",
max_retries=2, # max_retries=N means N total attempts allowed
node_type="function",
node_type="event_loop",
output_keys=["result"],
)
@@ -166,7 +166,7 @@ async def test_executor_respects_default_max_retries(runtime):
name="Default Node",
description="A node using default retry settings",
# max_retries not specified, should default to 3
node_type="function",
node_type="event_loop",
output_keys=["result"],
)
@@ -211,7 +211,7 @@ async def test_executor_max_retries_two_succeeds_on_second(runtime):
name="Two Retry Node",
description="A node with two attempts allowed",
max_retries=2, # max_retries=N means N total attempts allowed
node_type="function",
node_type="event_loop",
output_keys=["result"],
)
@@ -253,7 +253,7 @@ async def test_executor_different_nodes_different_max_retries(runtime):
name="Node 1",
description="First node in multi-node test",
max_retries=2,
node_type="function",
node_type="event_loop",
output_keys=["result1"],
)
@@ -262,7 +262,7 @@ async def test_executor_different_nodes_different_max_retries(runtime):
name="Node 2",
description="Second node in multi-node test",
max_retries=5,
node_type="function",
node_type="event_loop",
input_keys=["result1"],
output_keys=["result2"],
)
+29 -25
@@ -116,7 +116,7 @@ def _make_fanout_graph(
id="source",
name="Source",
description="entry",
node_type="function",
node_type="event_loop",
output_keys=["data"],
)
@@ -164,10 +164,10 @@ def _make_fanout_graph(
async def test_fanout_triggers_on_multiple_success_edges(runtime, goal):
"""Fan-out should activate when a node has >1 ON_SUCCESS outgoing edges."""
b1 = NodeSpec(
id="b1", name="B1", description="branch 1", node_type="function", output_keys=["b1_out"]
id="b1", name="B1", description="branch 1", node_type="event_loop", output_keys=["b1_out"]
)
b2 = NodeSpec(
id="b2", name="B2", description="branch 2", node_type="function", output_keys=["b2_out"]
id="b2", name="B2", description="branch 2", node_type="event_loop", output_keys=["b2_out"]
)
graph = _make_fanout_graph([b1, b2])
@@ -195,10 +195,10 @@ async def test_branches_execute_concurrently(runtime, goal):
"""All fan-out branches should be launched via asyncio.gather (concurrent)."""
order = []
b1 = NodeSpec(
id="b1", name="B1", description="branch 1", node_type="function", output_keys=["b1_done"]
id="b1", name="B1", description="branch 1", node_type="event_loop", output_keys=["b1_done"]
)
b2 = NodeSpec(
id="b2", name="B2", description="branch 2", node_type="function", output_keys=["b2_done"]
id="b2", name="B2", description="branch 2", node_type="event_loop", output_keys=["b2_done"]
)
graph = _make_fanout_graph([b1, b2])
@@ -223,13 +223,17 @@ async def test_branches_execute_concurrently(runtime, goal):
async def test_convergence_at_fan_in_node(runtime, goal):
"""After fan-out branches complete, execution should continue at convergence node."""
b1 = NodeSpec(
id="b1", name="B1", description="branch 1", node_type="function", output_keys=["b1_out"]
id="b1", name="B1", description="branch 1", node_type="event_loop", output_keys=["b1_out"]
)
b2 = NodeSpec(
id="b2", name="B2", description="branch 2", node_type="function", output_keys=["b2_out"]
id="b2", name="B2", description="branch 2", node_type="event_loop", output_keys=["b2_out"]
)
merge = NodeSpec(
id="merge", name="Merge", description="fan-in", node_type="function", output_keys=["merged"]
id="merge",
name="Merge",
description="fan-in",
node_type="event_loop",
output_keys=["merged"],
)
graph = _make_fanout_graph([b1, b2], fan_in_node=merge)
@@ -255,13 +259,13 @@ async def test_convergence_at_fan_in_node(runtime, goal):
async def test_fail_all_strategy_raises_on_branch_failure(runtime, goal):
"""fail_all should raise RuntimeError if any branch fails."""
b1 = NodeSpec(
id="b1", name="B1", description="ok branch", node_type="function", output_keys=["b1_out"]
id="b1", name="B1", description="ok branch", node_type="event_loop", output_keys=["b1_out"]
)
b2 = NodeSpec(
id="b2",
name="B2",
description="bad branch",
node_type="function",
node_type="event_loop",
output_keys=["b2_out"],
max_retries=1,
)
@@ -290,13 +294,13 @@ async def test_fail_all_strategy_raises_on_branch_failure(runtime, goal):
async def test_continue_others_strategy_allows_partial_success(runtime, goal):
"""continue_others should let successful branches complete even if one fails."""
b1 = NodeSpec(
id="b1", name="B1", description="ok", node_type="function", output_keys=["b1_out"]
id="b1", name="B1", description="ok", node_type="event_loop", output_keys=["b1_out"]
)
b2 = NodeSpec(
id="b2",
name="B2",
description="fail",
node_type="function",
node_type="event_loop",
output_keys=["b2_out"],
max_retries=1,
)
@@ -325,13 +329,13 @@ async def test_continue_others_strategy_allows_partial_success(runtime, goal):
async def test_wait_all_strategy_collects_all_results(runtime, goal):
"""wait_all should wait for all branches before proceeding."""
b1 = NodeSpec(
id="b1", name="B1", description="ok", node_type="function", output_keys=["b1_out"]
id="b1", name="B1", description="ok", node_type="event_loop", output_keys=["b1_out"]
)
b2 = NodeSpec(
id="b2",
name="B2",
description="fail",
node_type="function",
node_type="event_loop",
output_keys=["b2_out"],
max_retries=1,
)
@@ -365,12 +369,12 @@ async def test_per_branch_retry(runtime, goal):
id="b1",
name="B1",
description="flaky",
node_type="function",
node_type="event_loop",
output_keys=["b1_out"],
max_retries=5,
)
b2 = NodeSpec(
id="b2", name="B2", description="solid", node_type="function", output_keys=["b2_out"]
id="b2", name="B2", description="solid", node_type="event_loop", output_keys=["b2_out"]
)
graph = _make_fanout_graph([b1, b2])
@@ -394,13 +398,13 @@ async def test_per_branch_retry(runtime, goal):
async def test_single_edge_no_parallel_overhead(runtime, goal):
"""A single outgoing edge should follow normal sequential path, not fan-out."""
n1 = NodeSpec(
id="n1", name="N1", description="entry", node_type="function", output_keys=["out1"]
id="n1", name="N1", description="entry", node_type="event_loop", output_keys=["out1"]
)
n2 = NodeSpec(
id="n2",
name="N2",
description="next",
node_type="function",
node_type="event_loop",
input_keys=["out1"],
output_keys=["out2"],
)
@@ -432,8 +436,8 @@ async def test_single_edge_no_parallel_overhead(runtime, goal):
def test_detect_fan_out_nodes():
"""GraphSpec.detect_fan_out_nodes should identify fan-out topology."""
b1 = NodeSpec(id="b1", name="B1", description="b", node_type="function", output_keys=["x"])
b2 = NodeSpec(id="b2", name="B2", description="b", node_type="function", output_keys=["y"])
b1 = NodeSpec(id="b1", name="B1", description="b", node_type="event_loop", output_keys=["x"])
b2 = NodeSpec(id="b2", name="B2", description="b", node_type="event_loop", output_keys=["y"])
graph = _make_fanout_graph([b1, b2])
fan_outs = graph.detect_fan_out_nodes()
@@ -447,10 +451,10 @@ def test_detect_fan_out_nodes():
def test_detect_fan_in_nodes():
"""GraphSpec.detect_fan_in_nodes should identify convergence topology."""
b1 = NodeSpec(id="b1", name="B1", description="b", node_type="function", output_keys=["x"])
b2 = NodeSpec(id="b2", name="B2", description="b", node_type="function", output_keys=["y"])
b1 = NodeSpec(id="b1", name="B1", description="b", node_type="event_loop", output_keys=["x"])
b2 = NodeSpec(id="b2", name="B2", description="b", node_type="event_loop", output_keys=["y"])
merge = NodeSpec(
id="merge", name="Merge", description="m", node_type="function", output_keys=["z"]
id="merge", name="Merge", description="m", node_type="event_loop", output_keys=["z"]
)
graph = _make_fanout_graph([b1, b2], fan_in_node=merge)
@@ -467,10 +471,10 @@ def test_detect_fan_in_nodes():
async def test_parallel_disabled_uses_sequential(runtime, goal):
"""When enable_parallel_execution=False, multi-edge should follow first match only."""
b1 = NodeSpec(
id="b1", name="B1", description="b1", node_type="function", output_keys=["b1_out"]
id="b1", name="B1", description="b1", node_type="event_loop", output_keys=["b1_out"]
)
b2 = NodeSpec(
id="b2", name="B2", description="b2", node_type="function", output_keys=["b2_out"]
id="b2", name="B2", description="b2", node_type="event_loop", output_keys=["b2_out"]
)
graph = _make_fanout_graph([b1, b2])
-442
@@ -1,442 +0,0 @@
"""
Tests for the Worker-Judge flexible execution pattern.
Tests cover:
- Plan and PlanStep data structures
- Code sandbox security
- HybridJudge rule evaluation
- WorkerNode action dispatch
- FlexibleGraphExecutor end-to-end
"""
import asyncio
import pytest
from framework.graph.code_sandbox import (
CodeSandbox,
safe_eval,
safe_exec,
)
from framework.graph.goal import Goal, SuccessCriterion
from framework.graph.judge import HybridJudge, create_default_judge
from framework.graph.plan import (
ActionSpec,
ActionType,
EvaluationRule,
ExecutionStatus,
Judgment,
JudgmentAction,
Plan,
PlanExecutionResult,
PlanStep,
StepStatus,
)
class TestPlanDataStructures:
"""Tests for Plan and PlanStep."""
def test_plan_step_creation(self):
"""Test creating a PlanStep."""
action = ActionSpec(
action_type=ActionType.LLM_CALL,
prompt="Hello, world!",
)
step = PlanStep(
id="step_1",
description="Say hello",
action=action,
expected_outputs=["greeting"],
)
assert step.id == "step_1"
assert step.status == StepStatus.PENDING
assert step.action.action_type == ActionType.LLM_CALL
def test_plan_step_is_ready(self):
"""Test PlanStep.is_ready() with dependencies."""
step1 = PlanStep(
id="step_1",
description="First step",
action=ActionSpec(action_type=ActionType.FUNCTION),
dependencies=[],
)
step2 = PlanStep(
id="step_2",
description="Second step",
action=ActionSpec(action_type=ActionType.FUNCTION),
dependencies=["step_1"],
)
# Step 1 is ready (no deps)
assert step1.is_ready(set()) is True
# Step 2 is not ready (dep not met)
assert step2.is_ready(set()) is False
# Step 2 is ready after step 1 completes
assert step2.is_ready({"step_1"}) is True
def test_plan_get_ready_steps(self):
"""Test Plan.get_ready_steps()."""
plan = Plan(
id="test_plan",
goal_id="goal_1",
description="Test plan",
steps=[
PlanStep(
id="step_1",
description="First",
action=ActionSpec(action_type=ActionType.FUNCTION),
dependencies=[],
),
PlanStep(
id="step_2",
description="Second",
action=ActionSpec(action_type=ActionType.FUNCTION),
dependencies=["step_1"],
),
],
)
ready = plan.get_ready_steps()
assert len(ready) == 1
assert ready[0].id == "step_1"
def test_plan_is_complete(self):
"""Test Plan.is_complete()."""
plan = Plan(
id="test_plan",
goal_id="goal_1",
description="Test plan",
steps=[
PlanStep(
id="step_1",
description="First",
action=ActionSpec(action_type=ActionType.FUNCTION),
status=StepStatus.COMPLETED,
),
],
)
assert plan.is_complete() is True
def test_plan_to_feedback_context(self):
"""Test Plan.to_feedback_context()."""
plan = Plan(
id="test_plan",
goal_id="goal_1",
description="Test plan",
steps=[
PlanStep(
id="step_1",
description="Completed step",
action=ActionSpec(action_type=ActionType.FUNCTION),
status=StepStatus.COMPLETED,
result={"data": "value"},
),
PlanStep(
id="step_2",
description="Failed step",
action=ActionSpec(action_type=ActionType.FUNCTION),
status=StepStatus.FAILED,
error="Something went wrong",
attempts=3,
),
],
)
context = plan.to_feedback_context()
assert context["plan_id"] == "test_plan"
assert len(context["completed_steps"]) == 1
assert len(context["failed_steps"]) == 1
assert context["failed_steps"][0]["error"] == "Something went wrong"
class TestCodeSandbox:
"""Tests for code sandbox security."""
def test_simple_execution(self):
"""Test simple code execution."""
result = safe_exec("x = 1 + 2\nresult = x * 3")
assert result.success is True
assert result.variables.get("x") == 3
assert result.result == 9
def test_input_injection(self):
"""Test passing inputs to sandbox."""
result = safe_exec(
"result = x + y",
inputs={"x": 10, "y": 20},
)
assert result.success is True
assert result.result == 30
def test_blocked_import(self):
"""Test that dangerous imports are blocked."""
result = safe_exec("import os")
assert result.success is False
assert "blocked" in result.error.lower() or "import" in result.error.lower()
def test_blocked_private_access(self):
"""Test that private attribute access is blocked."""
result = safe_exec("x = [].__class__.__bases__")
assert result.success is False
def test_blocked_exec_eval(self):
"""Test that exec/eval are blocked."""
result = safe_exec("exec('print(1)')")
assert result.success is False
def test_safe_eval_expression(self):
"""Test safe_eval for expressions."""
result = safe_eval("x + y", inputs={"x": 5, "y": 3})
assert result.success is True
assert result.result == 8
def test_allowed_modules(self):
"""Test that allowed modules work."""
sandbox = CodeSandbox()
# math is in ALLOWED_MODULES
result = sandbox.execute(
"""
import math
result = math.sqrt(16)
""",
inputs={},
)
# Note: imports are blocked by default in validation
# This test documents current behavior
assert result.success is False # imports blocked by validator
class TestHybridJudge:
"""Tests for the HybridJudge."""
def test_rule_based_accept(self):
"""Test rule-based accept judgment."""
judge = HybridJudge()
judge.add_rule(
EvaluationRule(
id="success_check",
description="Accept on success flag",
condition="result.get('success') == True",
action=JudgmentAction.ACCEPT,
)
)
step = PlanStep(
id="test_step",
description="Test",
action=ActionSpec(action_type=ActionType.FUNCTION),
)
goal = Goal(
id="goal_1",
name="Test Goal",
description="A test goal",
success_criteria=[
SuccessCriterion(
id="sc1", description="Complete task", metric="completion", target="100%"
),
],
)
# Use sync version for testing
judgment = asyncio.run(judge.evaluate(step, {"success": True}, goal))
assert judgment.action == JudgmentAction.ACCEPT
assert judgment.rule_matched == "success_check"
def test_rule_based_retry(self):
"""Test rule-based retry judgment."""
judge = HybridJudge()
judge.add_rule(
EvaluationRule(
id="timeout_retry",
description="Retry on timeout",
condition="result.get('error_type') == 'timeout'",
action=JudgmentAction.RETRY,
feedback_template="Timeout occurred, please retry",
)
)
step = PlanStep(
id="test_step",
description="Test",
action=ActionSpec(action_type=ActionType.FUNCTION),
)
goal = Goal(
id="goal_1",
name="Test Goal",
description="A test goal",
success_criteria=[
SuccessCriterion(
id="sc1", description="Complete task", metric="completion", target="100%"
),
],
)
judgment = asyncio.run(judge.evaluate(step, {"error_type": "timeout"}, goal))
assert judgment.action == JudgmentAction.RETRY
def test_rule_priority(self):
"""Test that higher priority rules are checked first."""
judge = HybridJudge()
# Lower priority - would match
judge.add_rule(
EvaluationRule(
id="low_priority",
description="Low priority accept",
condition="True",
action=JudgmentAction.ACCEPT,
priority=1,
)
)
# Higher priority - should match first
judge.add_rule(
EvaluationRule(
id="high_priority",
description="High priority escalate",
condition="True",
action=JudgmentAction.ESCALATE,
priority=100,
)
)
step = PlanStep(
id="test_step",
description="Test",
action=ActionSpec(action_type=ActionType.FUNCTION),
)
goal = Goal(
id="goal_1",
name="Test Goal",
description="A test goal",
success_criteria=[
SuccessCriterion(
id="sc1", description="Complete task", metric="completion", target="100%"
),
],
)
judgment = asyncio.run(judge.evaluate(step, {}, goal))
assert judgment.rule_matched == "high_priority"
assert judgment.action == JudgmentAction.ESCALATE
def test_default_judge_rules(self):
"""Test that create_default_judge includes useful rules."""
judge = create_default_judge()
# Should have rules for common cases
rule_ids = {r.id for r in judge.rules}
assert "explicit_success" in rule_ids
assert "transient_error_retry" in rule_ids
assert "security_escalate" in rule_ids
class TestJudgment:
    """Tests for Judgment data structure."""

    def test_judgment_creation(self):
        """Test creating a Judgment."""
        judgment = Judgment(
            action=JudgmentAction.ACCEPT,
            reasoning="Step completed successfully",
            confidence=0.95,
        )
        assert judgment.action == JudgmentAction.ACCEPT
        assert judgment.confidence == 0.95
        assert judgment.llm_used is False

    def test_judgment_with_feedback(self):
        """Test Judgment with feedback for retry/replan."""
        judgment = Judgment(
            action=JudgmentAction.REPLAN,
            reasoning="Missing required data",
            feedback="Need to fetch user data first",
            context={"missing": ["user_id", "email"]},
        )
        assert judgment.action == JudgmentAction.REPLAN
        assert judgment.feedback is not None
        assert "user_id" in judgment.context["missing"]
class TestPlanExecutionResult:
    """Tests for PlanExecutionResult."""

    def test_completed_result(self):
        """Test completed execution result."""
        result = PlanExecutionResult(
            status=ExecutionStatus.COMPLETED,
            results={"output": "success"},
            steps_executed=5,
            total_tokens=1000,
        )
        assert result.status == ExecutionStatus.COMPLETED
        assert result.steps_executed == 5

    def test_needs_replan_result(self):
        """Test needs_replan execution result."""
        result = PlanExecutionResult(
            status=ExecutionStatus.NEEDS_REPLAN,
            feedback="Step 3 failed: missing data",
            feedback_context={
                "completed_steps": ["step_1", "step_2"],
                "failed_step": "step_3",
            },
            completed_steps=["step_1", "step_2"],
        )
        assert result.status == ExecutionStatus.NEEDS_REPLAN
        assert result.feedback is not None
        assert len(result.completed_steps) == 2
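
# A sketch of how a caller might consume a NEEDS_REPLAN result. The control
# flow is illustrative; execute_plan and replan here are hypothetical
# callables, not framework APIs:
def _replan_loop_sketch(execute_plan, replan, plan, max_replans: int = 3):
    result = execute_plan(plan)
    for _ in range(max_replans):
        if result.status != ExecutionStatus.NEEDS_REPLAN:
            break
        # Feed the judge's feedback plus the completed-step context back to
        # the planner so it can patch only the remaining steps.
        plan = replan(plan, result.feedback, result.feedback_context)
        result = execute_plan(plan)
    return result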
# Integration tests would require mocking Runtime and LLM
class TestFlexibleExecutorIntegration:
    """Integration tests for FlexibleGraphExecutor."""

    def test_executor_creation(self, tmp_path):
        """Test creating a FlexibleGraphExecutor."""
        from framework.graph.flexible_executor import FlexibleGraphExecutor
        from framework.runtime.core import Runtime

        runtime = Runtime(storage_path=tmp_path / "runtime")
        executor = FlexibleGraphExecutor(runtime=runtime)
        assert executor.runtime == runtime
        assert executor.judge is not None
        assert executor.worker is not None

    def test_executor_with_custom_judge(self, tmp_path):
        """Test executor with custom judge."""
        from framework.graph.flexible_executor import FlexibleGraphExecutor
        from framework.runtime.core import Runtime

        runtime = Runtime(storage_path=tmp_path / "runtime")
        custom_judge = HybridJudge()
        custom_judge.add_rule(
            EvaluationRule(
                id="custom_rule",
                description="Custom rule",
                condition="True",
                action=JudgmentAction.ACCEPT,
            )
        )
        executor = FlexibleGraphExecutor(runtime=runtime, judge=custom_judge)
        assert len(executor.judge.rules) == 1
        assert executor.judge.rules[0].id == "custom_rule"


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
+16 -73
@@ -5,7 +5,7 @@ Focused on minimal success and failure scenarios.
import pytest

-from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
+from framework.graph.edge import GraphSpec
from framework.graph.executor import GraphExecutor
from framework.graph.goal import Goal
from framework.graph.node import NodeResult, NodeSpec
@@ -49,7 +49,7 @@ async def test_executor_single_node_success():
        id="n1",
        name="node1",
        description="test node",
-        node_type="llm_generate",
+        node_type="event_loop",
        input_keys=[],
        output_keys=["result"],
        max_retries=0,
@@ -104,7 +104,7 @@ async def test_executor_single_node_failure():
        id="n1",
        name="node1",
        description="failing node",
-        node_type="llm_generate",
+        node_type="event_loop",
        input_keys=[],
        output_keys=["result"],
        max_retries=0,
@@ -143,77 +143,20 @@ class FakeEventBus:
    async def emit_node_loop_completed(self, **kwargs):
        self.events.append(("completed", kwargs))

    async def emit_edge_traversed(self, **kwargs):
        self.events.append(("edge_traversed", kwargs))

    async def emit_execution_paused(self, **kwargs):
        self.events.append(("execution_paused", kwargs))

    async def emit_execution_resumed(self, **kwargs):
        self.events.append(("execution_resumed", kwargs))

    async def emit_node_retry(self, **kwargs):
        self.events.append(("node_retry", kwargs))


@pytest.mark.asyncio
async def test_executor_emits_node_events():
    """Executor should emit NODE_LOOP_STARTED/COMPLETED for each non-event_loop node."""
    runtime = DummyRuntime()
    event_bus = FakeEventBus()
    graph = GraphSpec(
        id="graph-ev",
        goal_id="g-ev",
        nodes=[
            NodeSpec(
                id="n1",
                name="first",
                description="first node",
                node_type="llm_generate",
                input_keys=[],
                output_keys=["result"],
                max_retries=0,
            ),
            NodeSpec(
                id="n2",
                name="second",
                description="second node",
                node_type="llm_generate",
                input_keys=["result"],
                output_keys=["result"],
                max_retries=0,
            ),
        ],
        edges=[
            EdgeSpec(
                id="e1",
                source="n1",
                target="n2",
                condition=EdgeCondition.ON_SUCCESS,
            ),
        ],
        entry_node="n1",
        terminal_nodes=["n2"],
    )
    executor = GraphExecutor(
        runtime=runtime,
        node_registry={
            "n1": SuccessNode(),
            "n2": SuccessNode(),
        },
        event_bus=event_bus,
        stream_id="test-stream",
    )
    goal = Goal(id="g-ev", name="event-test", description="test events")
    result = await executor.execute(graph=graph, goal=goal)
    assert result.success is True
    assert result.path == ["n1", "n2"]
    # Should have 4 events: started/completed for n1, then started/completed for n2
    assert len(event_bus.events) == 4
    assert event_bus.events[0] == ("started", {"stream_id": "test-stream", "node_id": "n1"})
    assert event_bus.events[1] == (
        "completed",
        {"stream_id": "test-stream", "node_id": "n1", "iterations": 1},
    )
    assert event_bus.events[2] == ("started", {"stream_id": "test-stream", "node_id": "n2"})
    assert event_bus.events[3] == (
        "completed",
        {"stream_id": "test-stream", "node_id": "n2", "iterations": 1},
    )


# ---- Fake event_loop node (registered, so executor won't emit for it) ----
class FakeEventLoopNode:
@@ -276,7 +219,7 @@ async def test_executor_no_events_without_event_bus():
        id="n1",
        name="node1",
        description="test node",
-        node_type="llm_generate",
+        node_type="event_loop",
        input_keys=[],
        output_keys=["result"],
        max_retries=0,
+296 -2
@@ -9,12 +9,18 @@ For live tests (requires API keys):
    OPENAI_API_KEY=sk-... pytest tests/test_litellm_provider.py -v -m live
"""

import asyncio
import os
import threading
import time
from datetime import UTC, datetime, timedelta
from unittest.mock import MagicMock, patch

import pytest

from framework.llm.anthropic import AnthropicProvider
-from framework.llm.litellm import LiteLLMProvider
-from framework.llm.provider import LLMProvider, Tool, ToolResult, ToolUse
+from framework.llm.litellm import LiteLLMProvider, _compute_retry_delay
+from framework.llm.provider import LLMProvider, LLMResponse, Tool, ToolResult, ToolUse
class TestLiteLLMProviderInit:
@@ -532,3 +538,291 @@ class TestJsonMode:
        messages = call_kwargs["messages"]
        assert messages[0]["role"] == "system"
        assert "Please respond with a valid JSON object" in messages[0]["content"]
class TestComputeRetryDelay:
    """Test _compute_retry_delay() header parsing and fallback logic."""

    def test_fallback_exponential_backoff(self):
        """No exception -> exponential backoff."""
        assert _compute_retry_delay(0) == 2  # 2 * 2^0
        assert _compute_retry_delay(1) == 4  # 2 * 2^1
        assert _compute_retry_delay(2) == 8  # 2 * 2^2
        assert _compute_retry_delay(3) == 16  # 2 * 2^3

    def test_max_delay_cap(self):
        """Backoff should be capped at RATE_LIMIT_MAX_DELAY."""
        # 2 * 2^10 = 2048, should be capped at 120
        assert _compute_retry_delay(10) == 120

    def test_custom_max_delay(self):
        """Custom max_delay should be respected."""
        assert _compute_retry_delay(5, max_delay=10) == 10

    def test_retry_after_ms_header(self):
        """retry-after-ms header should be parsed as milliseconds."""
        exc = _make_exception_with_headers({"retry-after-ms": "5000"})
        assert _compute_retry_delay(0, exception=exc) == 5.0

    def test_retry_after_ms_fractional(self):
        """retry-after-ms should handle fractional values."""
        exc = _make_exception_with_headers({"retry-after-ms": "1500"})
        assert _compute_retry_delay(0, exception=exc) == 1.5

    def test_retry_after_seconds_header(self):
        """retry-after header as seconds should be parsed."""
        exc = _make_exception_with_headers({"retry-after": "3"})
        assert _compute_retry_delay(0, exception=exc) == 3.0

    def test_retry_after_seconds_fractional(self):
        """retry-after header should handle fractional seconds."""
        exc = _make_exception_with_headers({"retry-after": "2.5"})
        assert _compute_retry_delay(0, exception=exc) == 2.5

    def test_retry_after_ms_takes_priority(self):
        """retry-after-ms should take priority over retry-after."""
        exc = _make_exception_with_headers(
            {
                "retry-after-ms": "2000",
                "retry-after": "10",
            }
        )
        assert _compute_retry_delay(0, exception=exc) == 2.0

    def test_retry_after_http_date(self):
        """retry-after as HTTP-date should be parsed."""
        from email.utils import format_datetime

        future = datetime.now(UTC) + timedelta(seconds=5)
        date_str = format_datetime(future, usegmt=True)
        exc = _make_exception_with_headers({"retry-after": date_str})
        delay = _compute_retry_delay(0, exception=exc)
        assert 3.0 <= delay <= 6.0  # within tolerance

    def test_exception_without_response(self):
        """Exception with response=None should fall back to exponential."""
        exc = Exception("test")
        exc.response = None  # type: ignore[attr-defined]
        assert _compute_retry_delay(0, exception=exc) == 2  # exponential fallback

    def test_exception_without_response_attr(self):
        """Exception without .response attr should fall back to exponential."""
        exc = ValueError("no response attr")
        assert _compute_retry_delay(0, exception=exc) == 2

    def test_negative_retry_after_clamped_to_zero(self):
        """Negative retry-after should be clamped to 0."""
        exc = _make_exception_with_headers({"retry-after": "-5"})
        assert _compute_retry_delay(0, exception=exc) == 0

    def test_negative_retry_after_ms_clamped_to_zero(self):
        """Negative retry-after-ms should be clamped to 0."""
        exc = _make_exception_with_headers({"retry-after-ms": "-1000"})
        assert _compute_retry_delay(0, exception=exc) == 0

    def test_invalid_retry_after_falls_back(self):
        """Non-numeric, non-date retry-after should fall back to exponential."""
        exc = _make_exception_with_headers({"retry-after": "not-a-number-or-date"})
        assert _compute_retry_delay(0, exception=exc) == 2  # exponential fallback

    def test_invalid_retry_after_ms_falls_back_to_retry_after(self):
        """Invalid retry-after-ms should fall through to retry-after."""
        exc = _make_exception_with_headers(
            {
                "retry-after-ms": "garbage",
                "retry-after": "7",
            }
        )
        assert _compute_retry_delay(0, exception=exc) == 7.0

    def test_retry_after_capped_at_max_delay(self):
        """Server-provided delay should be capped at max_delay."""
        exc = _make_exception_with_headers({"retry-after": "3600"})
        assert _compute_retry_delay(0, exception=exc) == 120  # capped

    def test_retry_after_ms_capped_at_max_delay(self):
        """Server-provided ms delay should be capped at max_delay."""
        exc = _make_exception_with_headers({"retry-after-ms": "300000"})  # 300s
        assert _compute_retry_delay(0, exception=exc) == 120  # capped
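
# The parsing order these tests pin down, restated as a standalone sketch.
# This is illustrative; the real logic is _compute_retry_delay in
# framework.llm.litellm and may differ in detail:
#   1. retry-after-ms header (milliseconds), clamped to [0, max_delay]
#   2. retry-after header (seconds or HTTP-date), clamped to [0, max_delay]
#   3. fallback: exponential backoff min(2 * 2**attempt, max_delay)
def _retry_delay_sketch(attempt, headers, max_delay=120.0):
    from email.utils import parsedate_to_datetime

    ms = headers.get("retry-after-ms")
    if ms is not None:
        try:
            return min(max(float(ms) / 1000.0, 0.0), max_delay)
        except ValueError:
            pass  # invalid ms value: fall through to retry-after
    ra = headers.get("retry-after")
    if ra is not None:
        try:
            return min(max(float(ra), 0.0), max_delay)
        except ValueError:
            try:
                target = parsedate_to_datetime(ra)
                remaining = (target - datetime.now(UTC)).total_seconds()
                return min(max(remaining, 0.0), max_delay)
            except (TypeError, ValueError):
                pass  # not a date either: fall back to exponential
    return min(2 * 2**attempt, max_delay)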
def _make_exception_with_headers(headers: dict[str, str]) -> BaseException:
    """Create a mock exception with response headers for testing."""
    exc = Exception("rate limited")
    response = MagicMock()
    response.headers = headers
    exc.response = response  # type: ignore[attr-defined]
    return exc
# ---------------------------------------------------------------------------
# Async LLM methods — non-blocking event loop tests
# ---------------------------------------------------------------------------
class TestAsyncComplete:
    """Test that acomplete/acomplete_with_tools don't block the event loop."""

    @pytest.mark.asyncio
    @patch("litellm.acompletion")
    async def test_acomplete_uses_acompletion(self, mock_acompletion):
        """acomplete() should call litellm.acompletion (async), not litellm.completion."""
        mock_response = MagicMock()
        mock_response.choices = [MagicMock()]
        mock_response.choices[0].message.content = "async hello"
        mock_response.choices[0].message.tool_calls = None
        mock_response.choices[0].finish_reason = "stop"
        mock_response.model = "gpt-4o-mini"
        mock_response.usage.prompt_tokens = 10
        mock_response.usage.completion_tokens = 5

        # acompletion is async, so mock must return a coroutine
        async def async_return(*args, **kwargs):
            return mock_response

        mock_acompletion.side_effect = async_return
        provider = LiteLLMProvider(model="gpt-4o-mini", api_key="test-key")
        result = await provider.acomplete(
            messages=[{"role": "user", "content": "Hello"}],
            system="You are helpful.",
        )
        assert result.content == "async hello"
        assert result.model == "gpt-4o-mini"
        assert result.input_tokens == 10
        assert result.output_tokens == 5
        mock_acompletion.assert_called_once()

    @pytest.mark.asyncio
    @patch("litellm.acompletion")
    async def test_acomplete_does_not_block_event_loop(self, mock_acompletion):
        """Verify event loop stays responsive during acomplete()."""
        heartbeat_ticks = []

        async def heartbeat():
            start = time.monotonic()
            for _ in range(10):
                heartbeat_ticks.append(time.monotonic() - start)
                await asyncio.sleep(0.05)

        async def slow_acompletion(*args, **kwargs):
            # Simulate a 300ms LLM call — async, so event loop should stay free
            await asyncio.sleep(0.3)
            resp = MagicMock()
            resp.choices = [MagicMock()]
            resp.choices[0].message.content = "done"
            resp.choices[0].message.tool_calls = None
            resp.choices[0].finish_reason = "stop"
            resp.model = "gpt-4o-mini"
            resp.usage.prompt_tokens = 5
            resp.usage.completion_tokens = 3
            return resp

        mock_acompletion.side_effect = slow_acompletion
        provider = LiteLLMProvider(model="gpt-4o-mini", api_key="test-key")
        # Run heartbeat + acomplete concurrently
        _, result = await asyncio.gather(
            heartbeat(),
            provider.acomplete(
                messages=[{"role": "user", "content": "hi"}],
            ),
        )
        assert result.content == "done"
        # Heartbeat should have ticked multiple times during the 300ms LLM call
        # (if the event loop were blocked, we'd see 0-1 ticks)
        assert len(heartbeat_ticks) >= 3, (
            f"Event loop was blocked — only {len(heartbeat_ticks)} heartbeat ticks"
        )

    @pytest.mark.asyncio
    @patch("litellm.acompletion")
    async def test_acomplete_with_tools_uses_acompletion(self, mock_acompletion):
        """acomplete_with_tools() should use litellm.acompletion."""
        mock_response = MagicMock()
        mock_response.choices = [MagicMock()]
        mock_response.choices[0].message.content = "tool result"
        mock_response.choices[0].message.tool_calls = None
        mock_response.choices[0].finish_reason = "stop"
        mock_response.model = "gpt-4o-mini"
        mock_response.usage.prompt_tokens = 10
        mock_response.usage.completion_tokens = 5

        async def async_return(*args, **kwargs):
            return mock_response

        mock_acompletion.side_effect = async_return
        provider = LiteLLMProvider(model="gpt-4o-mini", api_key="test-key")
        tools = [
            Tool(
                name="search",
                description="Search the web",
                parameters={"properties": {"q": {"type": "string"}}, "required": ["q"]},
            )
        ]
        result = await provider.acomplete_with_tools(
            messages=[{"role": "user", "content": "Search for cats"}],
            system="You are helpful.",
            tools=tools,
            tool_executor=lambda tu: ToolResult(tool_use_id=tu.id, content="cats"),
        )
        assert result.content == "tool result"
        mock_acompletion.assert_called_once()

    @pytest.mark.asyncio
    async def test_mock_provider_acomplete(self):
        """MockLLMProvider.acomplete() should work without blocking."""
        from framework.llm.mock import MockLLMProvider

        provider = MockLLMProvider()
        result = await provider.acomplete(
            messages=[{"role": "user", "content": "test"}],
            system="Be helpful.",
        )
        assert result.content  # Should have some mock content
        assert result.model == "mock-model"

    @pytest.mark.asyncio
    async def test_base_provider_acomplete_offloads_to_executor(self):
        """Base LLMProvider.acomplete() should offload sync complete() to thread pool."""
        call_thread_ids = []

        class SlowSyncProvider(LLMProvider):
            def complete(
                self,
                messages,
                system="",
                tools=None,
                max_tokens=1024,
                response_format=None,
                json_mode=False,
                max_retries=None,
            ):
                call_thread_ids.append(threading.current_thread().ident)
                time.sleep(0.1)  # Sync blocking
                return LLMResponse(content="sync done", model="slow")

            def complete_with_tools(
                self, messages, system, tools, tool_executor, max_iterations=10
            ):
                return LLMResponse(content="sync tools done", model="slow")

        provider = SlowSyncProvider()
        main_thread_id = threading.current_thread().ident
        result = await provider.acomplete(
            messages=[{"role": "user", "content": "hi"}],
        )
        assert result.content == "sync done"
        # The sync complete() should have run on a different thread
        assert call_thread_ids[0] != main_thread_id, (
            "Base acomplete() should offload sync complete() to a thread pool"
        )
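
# The offload pattern the test above pins down, as a minimal sketch. The
# assumption is that the base class wraps its own sync complete(); the
# actual implementation in framework.llm.provider may differ:
async def _acomplete_sketch(provider: LLMProvider, messages, **kwargs) -> LLMResponse:
    # asyncio.to_thread runs the blocking complete() on a worker thread,
    # so the event loop keeps servicing other tasks in the meantime.
    return await asyncio.to_thread(provider.complete, messages, **kwargs)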
+2
@@ -143,6 +143,7 @@ def _has_api_key(env_var: str) -> bool:
# ---------------------------------------------------------------------------
# Real API tests — text streaming
# ---------------------------------------------------------------------------
@pytest.mark.skip(reason="Requires valid live API keys — run manually")
class TestRealAPITextStreaming:
    """Stream a simple text response from each provider and dump events."""


@@ -204,6 +205,7 @@ class TestRealAPITextStreaming:
# ---------------------------------------------------------------------------
# Real API tests — tool call streaming
# ---------------------------------------------------------------------------
@pytest.mark.skip(reason="Requires valid live API keys — run manually")
class TestRealAPIToolCallStreaming:
    """Stream a tool call response from each provider and dump events."""
+1
@@ -35,6 +35,7 @@ class MockLLMProvider(LLMProvider):
        max_tokens=1024,
        response_format=None,
        json_mode=False,
        max_retries=None,
    ):
        self.complete_calls.append(
            {
-117
@@ -1,117 +0,0 @@
"""Tests for LLMNode JSON extraction logic.
Run with:
cd core
pytest tests/test_node_json_extraction.py -v
"""
import pytest
from framework.graph.node import LLMNode
class TestJsonExtraction:
"""Test _extract_json JSON extraction without LLM calls."""
@pytest.fixture
def node(self):
"""Create an LLMNode instance for testing."""
return LLMNode()
def test_clean_json(self, node):
"""Test parsing clean JSON directly."""
result = node._extract_json('{"key": "value"}', ["key"])
assert result == {"key": "value"}
def test_json_with_whitespace(self, node):
"""Test parsing JSON with surrounding whitespace."""
result = node._extract_json(' {"key": "value"} \n', ["key"])
assert result == {"key": "value"}
def test_markdown_code_block_at_start(self, node):
"""Test extracting JSON from markdown code block at start."""
input_text = '```json\n{"key": "value"}\n```'
result = node._extract_json(input_text, ["key"])
assert result == {"key": "value"}
def test_markdown_code_block_without_json_label(self, node):
"""Test extracting JSON from markdown code block without 'json' label."""
input_text = '```\n{"key": "value"}\n```'
result = node._extract_json(input_text, ["key"])
assert result == {"key": "value"}
def test_prose_around_markdown_block(self, node):
"""Test extracting JSON when prose surrounds the markdown block."""
input_text = 'Here is the result:\n```json\n{"key": "value"}\n```\nHope this helps!'
result = node._extract_json(input_text, ["key"])
assert result == {"key": "value"}
def test_json_embedded_in_prose(self, node):
"""Test extracting JSON embedded in prose text."""
input_text = 'The answer is {"key": "value"} as requested.'
result = node._extract_json(input_text, ["key"])
assert result == {"key": "value"}
def test_nested_json(self, node):
"""Test parsing nested JSON objects."""
input_text = '{"outer": {"inner": "value"}}'
result = node._extract_json(input_text, ["outer"])
assert result == {"outer": {"inner": "value"}}
def test_deeply_nested_json(self, node):
"""Test parsing deeply nested JSON objects."""
input_text = '{"a": {"b": {"c": {"d": "deep"}}}}'
result = node._extract_json(input_text, ["a"])
assert result == {"a": {"b": {"c": {"d": "deep"}}}}
def test_json_with_array(self, node):
"""Test parsing JSON with array values."""
input_text = '{"items": [1, 2, 3]}'
result = node._extract_json(input_text, ["items"])
assert result == {"items": [1, 2, 3]}
def test_json_with_string_containing_braces(self, node):
"""Test parsing JSON where string values contain braces."""
input_text = '{"code": "function() { return 1; }"}'
result = node._extract_json(input_text, ["code"])
assert result == {"code": "function() { return 1; }"}
def test_json_with_escaped_quotes(self, node):
"""Test parsing JSON with escaped quotes in strings."""
input_text = '{"message": "He said \\"hello\\""}'
result = node._extract_json(input_text, ["message"])
assert result == {"message": 'He said "hello"'}
def test_multiple_json_objects_takes_first(self, node):
"""Test that when multiple JSON objects exist, first is taken."""
input_text = '{"first": 1} and then {"second": 2}'
result = node._extract_json(input_text, ["first"])
assert result == {"first": 1}
def test_json_with_boolean_and_null(self, node):
"""Test parsing JSON with boolean and null values."""
input_text = '{"active": true, "deleted": false, "data": null}'
result = node._extract_json(input_text, ["active", "deleted", "data"])
assert result == {"active": True, "deleted": False, "data": None}
def test_json_with_numbers(self, node):
"""Test parsing JSON with integer and float values."""
input_text = '{"count": 42, "price": 19.99}'
result = node._extract_json(input_text, ["count", "price"])
assert result == {"count": 42, "price": 19.99}
def test_invalid_json_raises_error(self, node, monkeypatch):
"""Test that completely invalid JSON raises an error when no LLM fallback available."""
# Remove API keys so LLM fallback is not attempted
monkeypatch.delenv("CEREBRAS_API_KEY", raising=False)
monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
with pytest.raises(ValueError, match="Cannot parse JSON"):
node._extract_json("This is not JSON at all", ["key"])
def test_empty_string_raises_error(self, node, monkeypatch):
"""Test that empty string raises an error when no LLM fallback available."""
# Remove API keys so LLM fallback is not attempted
monkeypatch.delenv("CEREBRAS_API_KEY", raising=False)
monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
with pytest.raises(ValueError, match="Cannot parse JSON"):
node._extract_json("", ["key"])

Some files were not shown because too many files have changed in this diff.