Compare commits

248 Commits

Author SHA1 Message Date
RichardTang-Aden 52b1a3f472 Merge pull request #6282 from aden-hive/feat/refactor-session
Refactor session lifecycle with flowchart planning and triggers
2026-03-12 21:15:10 -07:00
Richard Tang 079e00c8f7 Merge remote-tracking branch 'origin/main' into feat/refactor-session 2026-03-12 21:13:15 -07:00
Richard Tang 60bba38941 chore: ruff lint 2026-03-12 21:01:47 -07:00
Richard Tang ea8e7b11c6 Merge remote-tracking branch 'origin/feature/flowchart-linked-experimental' into feat/refactor-session 2026-03-12 20:54:08 -07:00
Richard Tang 3dc2b25b01 fix: adding the trigger helpers 2026-03-12 20:53:45 -07:00
bryan 543b90b34f chore: tooltip update 2026-03-12 20:50:39 -07:00
Richard Tang 2ad78ec8a2 Merge remote-tracking branch 'origin/feature/flowchart-linked-experimental' into feat/refactor-session 2026-03-12 20:48:09 -07:00
Timothy 412658e9f2 fix: remove subagent shapes 2026-03-12 20:46:09 -07:00
Richard Tang 9bfddec322 fix: missing _FLOWCHART_TYPES reference 2026-03-12 20:43:03 -07:00
Timothy bbd9c10169 fix: decision node cannot have subagents 2026-03-12 20:36:04 -07:00
Richard Tang 51fdc4ddde fix: always new session for new agent 2026-03-12 20:34:42 -07:00
Richard Tang 04685d33ca fix: solve the problem from merge conflict 2026-03-12 20:28:25 -07:00
Richard Tang 729a0e0cec fix: resolve merge conflict 2026-03-12 20:23:58 -07:00
bryan 2bcb0cacee added pause/run button 2026-03-12 20:15:25 -07:00
Timothy 44bf191f53 fix: no orphaned node by bfs 2026-03-12 20:04:00 -07:00
Richard Tang 993b31f19b Merge remote-tracking branch 'origin/feature/flowchart-linked-experimental' into feat/refactor-session 2026-03-12 20:00:45 -07:00
Richard Tang 41b3b9619f Merge remote-tracking branch 'origin/feature/flowchart-linked-experimental' into feature/flowchart-linked-experimental 2026-03-12 19:45:45 -07:00
Richard Tang 2a4fe4020c feat: force the planning agent to ask questions 2026-03-12 19:45:07 -07:00
Ishan Chaurasia 9d1f268078 fix(server): honor session_id in one-step session creation (#6233)
Align POST /api/sessions behavior across queen-only and one-step worker creation so callers can rely on deterministic session IDs. Add a regression test covering the forwarded session_id contract.

Made-with: Cursor
2026-03-13 10:43:12 +08:00
bryan 2185e127b1 style: coder tools formatting and template quote fixes 2026-03-12 19:39:53 -07:00
bryan 99ed885fd0 fix: add cached_tokens to finish event test assertion 2026-03-12 19:39:53 -07:00
bryan d8a390a685 feat: flowchart rendering in DraftGraph with node shapes and layout 2026-03-12 19:39:53 -07:00
bryan f50cf1735b feat: CSS variable theming for agent graph components 2026-03-12 19:39:53 -07:00
bryan 04eb57f54e feat: auto-load worker on cold restore when queen resumes 2026-03-12 19:39:53 -07:00
bryan 7378408eb8 feat: add flowchart type system and draft-to-graph dissolution 2026-03-12 19:39:53 -07:00
bryan cf05420417 style: formatting and import cleanup across framework modules 2026-03-12 19:38:55 -07:00
Timothy f5ed4c7d43 fix: validate orphaned gcu node 2026-03-12 19:38:44 -07:00
Timothy 5547432b6e fix: queen defaults to global max context tokens 2026-03-12 19:29:14 -07:00
Ishan Chaurasia 336557d7c7 fix: pass browser_wait text as data (#6235)
Pass browser_wait text through Playwright's function argument channel so quoted and multiline strings do not break the generated wait expression. Add a regression test covering text that previously would have been interpolated unsafely.

Made-with: Cursor
2026-03-13 10:08:16 +08:00
Timothy 87c172227c fix: mandate flowchart topology correction 2026-03-12 19:03:46 -07:00
Richard Tang c2c4929de8 feat: remove the phase in the label 2026-03-12 18:55:24 -07:00
Timothy a978338738 fix: allow replanning 2026-03-12 18:54:01 -07:00
Timothy 8eb59b1f66 fix: mandate usage of ask tools and change pending behavior 2026-03-12 18:34:15 -07:00
Richard Tang f9d5f95936 Merge remote-tracking branch 'origin/feature/flowchart-linked-experimental' into feat/refactor-session 2026-03-12 18:32:26 -07:00
Timothy 651e99ffe3 Merge branch 'feature/multiple-asks' into feature/flowchart-linked-experimental 2026-03-12 17:57:11 -07:00
Timothy 2564f1b948 feat: allow multiple questions 2026-03-12 17:56:58 -07:00
Richard Tang c01cd528d2 feat: planning phase prompt improvements 2026-03-12 17:44:06 -07:00
Timothy bc194ee4e9 Merge branch 'main' into feature/flowchart-linked-experimental 2026-03-12 16:50:17 -07:00
Timothy @aden 2bac100c03 Merge pull request #6283 from vincentjiang777/main
docs: rename and expand contributing guidelines
2026-03-12 16:46:59 -07:00
Timothy @aden 425d37f868 Merge branch 'main' into main 2026-03-12 16:44:29 -07:00
Vincent Jiang 99b127e2da docs: revert filename to CONTRIBUTING.md for GitHub compliance
Changed HOW_TO_CONTRIBUTE.md back to CONTRIBUTING.md to comply with
GitHub's standard for contributing guidelines files.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-12 16:42:42 -07:00
Timothy 43b759bf61 fix: ensure flowchart existence 2026-03-12 16:40:18 -07:00
Vincent Jiang 20d8d52f12 docs: rename and expand contributing guidelines
Renamed CONTRIBUTING.md to HOW_TO_CONTRIBUTE.md and significantly expanded
the documentation with detailed sections on development setup, OS support,
tooling requirements, performance metrics, and contribution workflows.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-12 16:29:13 -07:00
Richard Tang 944567dc31 chore: ruff lint 2026-03-12 16:23:13 -07:00
nightcityblade 7e09588e4e fix: reject path-like agent names in hive dispatch --agents (#6211)
Validate that agent names passed to --agents do not contain path
separators. Previously, passing 'exports/my_agent' would result in
the doubled path 'exports/exports/my_agent' with a confusing error.
Now a clear error message is shown suggesting the correct usage.

Fixes #6208

Co-authored-by: nightcityblade <nightcityblade@gmail.com>
2026-03-12 16:22:37 -07:00
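A minimal sketch of the validation described in the commit above, assuming a standalone helper. The name `validate_agent_name` and the error wording are hypothetical and are not taken from the hive codebase; only the behaviour (reject names containing path separators, suggest the bare name) comes from the commit message.

```python
import os


def validate_agent_name(name: str) -> str:
    """Return the name unchanged, or raise ValueError if it looks like a path.

    Hypothetical helper: illustrates rejecting path separators in a value
    passed to an option such as --agents.
    """
    separators = {"/", "\\", os.sep}
    if os.altsep:
        separators.add(os.altsep)
    if any(sep in name for sep in separators):
        # Suggest the last path component as the likely intended agent name.
        bare = name.replace("\\", "/").rstrip("/").split("/")[-1]
        raise ValueError(
            f"Agent name {name!r} looks like a path; pass the bare name "
            f"(for example {bare!r}) instead."
        )
    return name
```

Failing early with a message that names the bare form avoids the doubled `exports/exports/my_agent` lookup the commit mentions.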
Priyanka Bhallamudi 7bf69d2263 fix: read nodes from graph object in discovery.py for correct node count (#6227)
Co-authored-by: Lakshmi Priyanka Bhallamudi <priyanka@Lakshmis-MacBook-Air.local>
2026-03-12 16:22:37 -07:00
bryan 99d2b0c003 chore: update readme 2026-03-12 16:22:37 -07:00
bryan 8868416baa chore: update the tests and readme 2026-03-12 16:22:37 -07:00
bryan 405b120674 feat: fixed google credentials to use the google oauth credential 2026-03-12 16:22:37 -07:00
Trisha 66a7b43199 [bug:6117:docs]: fix inconsistent configuration and troubleshooting guidance (#6118) 2026-03-12 16:22:36 -07:00
Trisha a8f9d83723 docs: fix typos and awkward copy (#6115)
* [bug:6109:README]: fix typos and awkward copy

* trigger ci

* rerun checks
2026-03-12 16:22:36 -07:00
bryan d95d5804ca fix: align the credential functions to be the same 2026-03-12 16:22:36 -07:00
Richard Tang 674cf05601 feat: track the number of runs 2026-03-12 15:19:13 -07:00
Timothy 86349c78d0 Merge branch 'feature/guardrails' into feature/flowchart-linked-experimental 2026-03-12 15:11:12 -07:00
Timothy 2232f49191 fix: queen flowcharting behavior 2026-03-12 15:10:32 -07:00
Richard Tang 6fa71fa27d feat: track queen phase by message 2026-03-12 14:58:35 -07:00
Vincent Jiang 1ac9ba69d6 docs: replace recipe examples with 100 sample agent prompts
Replace individual recipe READMEs with a comprehensive collection of 100 real-world agent prompt examples across marketing, sales, operations, engineering, and finance. This provides users with a broader range of use case inspiration in a single, organized reference document.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-12 14:46:09 -07:00
Vincent Jiang 9e16be8f03 docs: replace recipe examples with 100 sample agent prompts
Replace individual recipe READMEs with a comprehensive collection of 100 real-world agent prompt examples across marketing, sales, operations, engineering, and finance. This provides users with a broader range of use case inspiration in a single, organized reference document.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-12 14:44:32 -07:00
Richard Tang 8c7065ad37 refactor: remove the parts conversion logic 2026-03-12 14:36:27 -07:00
Richard Tang a18ed5bbe6 feat: restore queen phase 2026-03-12 14:29:01 -07:00
Timothy 8f55170c1e fix: compaction ratio reporting 2026-03-12 14:17:42 -07:00
Richard Tang ed3d4bfe33 feat: resume cold session from event logs 2026-03-12 14:07:57 -07:00
Timothy 31a98a5f95 feat: cached token handling 2026-03-12 14:03:58 -07:00
Timothy 7667b773f2 fix: 18x tool discovery efficiency by progressive disclosure 2026-03-12 13:12:43 -07:00
Timothy 49560260de fix: token counts 2026-03-12 11:52:08 -07:00
Richard Tang 596ce9878d feat: unique run id 2026-03-12 11:09:36 -07:00
Timothy 1cc75f89bd feat: replanning 2026-03-12 09:55:42 -07:00
Timothy bb3c69cff1 fix: proper guardrail on combined context window 2026-03-12 09:37:17 -07:00
Timothy 70d11f537e feat: merge subagent nodes 2026-03-12 09:06:41 -07:00
Timothy b15dd2f623 fix: better logging 2026-03-12 09:03:29 -07:00
Timothy ce308312ae fix: usage tracking 2026-03-12 08:56:33 -07:00
nightcityblade f757c724cc fix: reject path-like agent names in hive dispatch --agents (#6211)
Validate that agent names passed to --agents do not contain path
separators. Previously, passing 'exports/my_agent' would result in
the doubled path 'exports/exports/my_agent' with a confusing error.
Now a clear error message is shown suggesting the correct usage.

Fixes #6208

Co-authored-by: nightcityblade <nightcityblade@gmail.com>
2026-03-12 21:11:02 +08:00
Priyanka Bhallamudi a4c758403e fix: read nodes from graph object in discovery.py for correct node count (#6227)
Co-authored-by: Lakshmi Priyanka Bhallamudi <priyanka@Lakshmis-MacBook-Air.local>
2026-03-12 18:34:47 +08:00
Timothy a67563850b feat: flowchart reconciliation 2026-03-11 19:58:27 -07:00
Bryan @ Aden b48465b778 Merge pull request #6230 from aden-hive/feat/google-doc-credential-alignment
micro-fix: Feat/google doc credential alignment
2026-03-12 02:52:03 +00:00
bryan d3baaaab24 chore: update readme 2026-03-11 19:48:00 -07:00
Timothy c764b4dc3b Merge branch 'main' into feature/flowchart-linked-experimental 2026-03-11 19:12:51 -07:00
bryan ad6077bd7b chore: update the tests and readme 2026-03-11 19:12:38 -07:00
Timothy ce2a91b1c0 feat: flowchart mapping 2026-03-11 19:12:25 -07:00
bryan c2e7afeb5e feat: fixed google credentials to use the google oauth credential 2026-03-11 19:12:25 -07:00
Timothy 0c9680ca89 feat: dissolution graph structure 2026-03-11 18:38:17 -07:00
Richard Tang 726016d24a fix: remove the duplicated session logic 2026-03-11 17:11:03 -07:00
Richard Tang 4895cea08a chore: lint and micro-fix 2026-03-11 16:55:29 -07:00
Richard Tang c9723a3ff2 feat(wip): always resume the previous session 2026-03-11 16:48:31 -07:00
Richard Tang 6cb73a6fea refactor: remove the remaining old trigger format and change the trigger format in examples to the latest format 2026-03-11 16:13:37 -07:00
Richard Tang 0c7f43f595 refactor: remove reference of the unused session judge 2026-03-11 16:01:00 -07:00
Richard Tang ea5cfcc5d6 refactor: remove the unused session judge 2026-03-11 15:57:19 -07:00
Richard Tang 34e85019c3 feat: stop supporting the old scheduler 2026-03-11 15:54:48 -07:00
Timothy 8011b72673 fix: flowchart display 2026-03-11 15:41:55 -07:00
RichardTang-Aden d87dfca1ab Merge pull request #6075 from aden-hive/fix/credential-function-alignment
fix: align the credential functions to be the same
2026-03-11 15:11:57 -07:00
Richard Tang c979dba958 fix: reference error from the rename 2026-03-11 14:33:42 -07:00
Richard Tang b4caa045e1 Merge remote-tracking branch 'origin/main' into feat/agent-trigger 2026-03-11 14:32:36 -07:00
Timothy b0fd4bc356 fix: draft flowchart display 2026-03-11 11:05:33 -07:00
Trisha a79d7de482 [bug:6117:docs]: fix inconsistent configuration and troubleshooting guidance (#6118) 2026-03-11 14:41:54 +08:00
Trisha e5e57302fa docs: fix typos and awkward copy (#6115)
* [bug:6109:README]: fix typos and awkward copy

* trigger ci

* rerun checks
2026-03-11 14:38:37 +08:00
Emmanuel Nwanguma c69cf1aea5 test(security): add comprehensive unit tests for 7 security scanning tools (#6151)
* test(security): add comprehensive unit tests for 7 security scanning tools

Add dedicated test files for all security scanning tools:
- test_dns_security_scanner.py (12 tests)
- test_http_headers_scanner.py (13 tests)
- test_ssl_tls_scanner.py (14 tests)
- test_subdomain_enumerator.py (15 tests)
- test_port_scanner.py (17 tests)
- test_tech_stack_detector.py (20 tests)
- test_risk_scorer.py (24 tests)

Total: 115 new tests covering:
- Input validation and cleaning
- Connection error handling
- Core scanning logic with mocked responses
- Grade/risk calculation
- Edge cases

Fixes #5920

* fix(tests): strengthen weak assertions in security scanner tests

- SSL scanner: replace always-true `or` assertions with specific checks
  that verify hostname stripping actually happened
- Port scanner: verify timeout clamp value, not just absence of error
- DNS scanner: remove unused helper method

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-03-11 13:29:11 +08:00
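The "always-true `or` assertion" problem fixed in the commit above can be illustrated with a small sketch. Everything here is hypothetical: `strip_hostname`, `weak_check`, and `strong_check` are illustrative names, and the exact assertions in the real test files are not shown in this log. The sketch shows one common form of the bug, where the right-hand side of `or` is a truthy literal.

```python
def strip_hostname(raw: str) -> str:
    """Hypothetical cleaner: drop scheme, path, and port from a URL-like input."""
    host = raw.split("://", 1)[-1]
    host = host.split("/", 1)[0]
    return host.split(":", 1)[0]


def weak_check(result: str) -> bool:
    # Always true: when the comparison fails, `or` falls through to the
    # non-empty string literal, which is truthy.
    return bool(result == "example.com" or "example.com")


def strong_check(result: str) -> bool:
    # Specific check: the value must equal the bare hostname, so any
    # leftover scheme or path makes it fail.
    return result == "example.com" and "://" not in result and "/" not in result
```

With these definitions, `weak_check` accepts an unstripped URL, while `strong_check` only accepts the output of a cleaner that actually stripped the hostname.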
Emmanuel Nwanguma 2f4cd8c36f fix(credentials): improve exception handling in key_storage.py (#6153)
Replace bare except Exception: clauses with specific exception handling:

- delete_aden_api_key(): Catch FileNotFoundError, PermissionError at debug
  level; log unexpected errors at WARNING with exc_info=True
- _read_credential_key_file(): Catch FileNotFoundError, PermissionError at
  debug level; log unexpected errors at WARNING with exc_info=True
- _read_aden_from_encrypted_store(): Catch FileNotFoundError, PermissionError,
  KeyError at debug level; log unexpected errors at WARNING with exc_info=True

This makes credential issues easier to diagnose by:
- Logging unexpected errors at WARNING level (visible in production)
- Including full stack traces with exc_info=True
- Keeping expected failures (file not found, permissions) at debug level

Fixes #5931
2026-03-11 13:05:10 +08:00
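The logging pattern described in the commit above might look roughly like the following. This is a sketch under assumptions: `read_key_file` and the logger name are hypothetical stand-ins, not the real functions in key_storage.py. Only the split between expected failures (debug level) and unexpected ones (WARNING with `exc_info=True`) is taken from the commit message.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger("key_storage_sketch")  # hypothetical logger name


def read_key_file(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if the key cannot be read."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except (FileNotFoundError, PermissionError) as exc:
        # Expected failures: keep them quiet at debug level.
        logger.debug("Credential key file unavailable at %s: %s", path, exc)
    except Exception:
        # Unexpected failures: visible in production, with a full traceback.
        logger.warning("Unexpected error reading %s", path, exc_info=True)
    return None
```

Compared with a bare `except Exception:` that swallows everything, this keeps routine "no key configured" cases out of the logs while still surfacing real faults.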
Aaryann Chandola 6f571e6d00 [BUG] fix: use ReplaceFileW for atomic writes on Windows to preserve ACLs (#5849)
* [BUG] fix: use ReplaceFileW for atomic writes on Windows to preserve ACLs

* fix: ensure atomic_replace checks for Windows API availability
2026-03-11 12:59:14 +08:00
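A rough sketch of the approach named in the commit above, assuming a helper called `atomic_replace` (the name appears in the commit message, but this body is illustrative and not the project's implementation). It calls the Win32 `ReplaceFileW` function through `ctypes` when it is available and the destination already exists, and otherwise falls back to `os.replace`. The fallback on a failed `ReplaceFileW` call is an assumption of this sketch.

```python
import ctypes
import os
import sys


def atomic_replace(src: str, dst: str) -> None:
    """Move src over dst, preferring ReplaceFileW on Windows."""
    if sys.platform == "win32" and os.path.exists(dst):
        # Check that the Windows API is actually reachable before using it.
        kernel32 = getattr(getattr(ctypes, "windll", None), "kernel32", None)
        replace_file = getattr(kernel32, "ReplaceFileW", None)
        if replace_file is not None:
            replace_file.argtypes = [
                ctypes.c_wchar_p,  # lpReplacedFileName (the destination)
                ctypes.c_wchar_p,  # lpReplacementFileName (the new file)
                ctypes.c_wchar_p,  # lpBackupFileName (unused here)
                ctypes.c_ulong,    # dwReplaceFlags
                ctypes.c_void_p,   # lpExclude (reserved)
                ctypes.c_void_p,   # lpReserved
            ]
            replace_file.restype = ctypes.c_int
            # ReplaceFileW keeps the replaced file's attributes and ACLs.
            if replace_file(dst, src, None, 0, None, None):
                return
    # Non-Windows, missing destination, or a failed API call.
    os.replace(src, dst)
```

`ReplaceFileW` requires the destination to exist, which is why a first-time write still goes through `os.replace`.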
Emmanuel Nwanguma 31bc84106f test: add API integration tests for hubspot, intercom, google_docs tools (#6167)
Resolves #5921

- test_hubspot_tool.py: 51 tests covering 15 MCP tools
- test_intercom_tool.py: 50 tests covering 11 MCP tools
- test_google_docs_tool.py: 57 tests covering 11 MCP tools
2026-03-11 12:55:03 +08:00
Timothy bdd6194203 feature: hive flowchart at planning phase 2026-03-10 19:54:02 -07:00
RichardTang-Aden fd79dceb0f Merge pull request #6166 from aden-hive/fix/subagent-reply-stall
micro-fix: update escalation tests for new ESCALATION_REQUESTED flow
2026-03-10 19:47:00 -07:00
Richard Tang ad50139d67 chore: lint 2026-03-10 19:46:35 -07:00
Richard Tang 12fb40c110 test: update escalation tests for ESCALATION_REQUESTED flow
Tests were asserting the old CLIENT_OUTPUT_DELTA + CLIENT_INPUT_REQUESTED
pattern; the fix in 89ccd66f routes escalations through the queen via
ESCALATION_REQUESTED instead.
2026-03-10 19:45:21 -07:00
RichardTang-Aden 738e469d96 Merge pull request #6165 from aden-hive/feature/provider-moonshotai-kimi
feat: support MoonShot AI Kimi subscription
2026-03-10 19:39:25 -07:00
Timothy 80ccbcc827 chore: lint 2026-03-10 19:37:18 -07:00
RichardTang-Aden 08fac31a9d Merge pull request #6159 from aden-hive/fix/subagent-reply-stall
fix: route subagent report_to_parent escalations to queen instead of user
2026-03-10 18:24:33 -07:00
Richard Tang 89ccd66fb9 fix: subagent _EscalationReceiver 2026-03-10 18:21:50 -07:00
Timothy 7c47e367de feat: support moonshotai kimi subscription 2026-03-10 18:03:44 -07:00
Timothy b8741bf94c fix: queen agent system prompt hooks 2026-03-10 16:25:07 -07:00
RichardTang-Aden c90dcbb32f Merge pull request #6152 from aden-hive/refactor/remove-dead-code
refactor: remove deprecated codes
2026-03-10 15:31:34 -07:00
Richard Tang ac3a5f5e93 chore: remove the ai generated temp doc 2026-03-10 15:29:21 -07:00
Timothy 1ccfdbbf7d chore: minimax key check 2026-03-10 15:24:09 -07:00
Timothy 1de37d2747 chore: lint 2026-03-10 15:00:14 -07:00
Timothy 2aefdf5b5f refactor: remove deprecated codes 2026-03-10 14:57:54 -07:00
Hundao 4caaa79900 Merge pull request #5988 from roberthallers/docs/fix-tui-deprecation-5941
docs: fix TUI deprecation inconsistency in roadmap
2026-03-10 16:46:41 +08:00
Hundao 296089d4cd Merge pull request #6108 from Hundao/fix/subagent-judge-feedback
fix: SubagentJudge and implicit judge return feedback=None on ACCEPT
2026-03-10 15:39:29 +08:00
hundao cae5f971cf fix: update test assertions for newly added tools
Tool counts and expected lists were outdated after new tools were added
to stripe, linear, apollo, discord, and google_analytics.
2026-03-10 15:36:12 +08:00
hundao bac716eea3 fix: pass feedback="" on evaluated ACCEPT verdicts in SubagentJudge and implicit judge
Fixes #6107
2026-03-10 15:24:39 +08:00
Navya Bijoy 14daf672e8 Fix: SessionManager._cleanup_stale_active_sessions indiscriminately cancels healthy concurrent agent sessions (#6081)
* fixes a bug in the SessionManager

* chore: remove debug print from test

---------

Co-authored-by: hundao <alchemy_wimp@hotmail.com>
2026-03-10 15:18:11 +08:00
Emmanuel Nwanguma e352ae5145 fix(mcp): close errlog file handle to prevent resource leak (#6094)
Track the errlog file handle opened on non-Windows systems and
properly close it during cleanup to prevent file descriptor leaks.

Changes:
- Add _errlog_handle instance variable to track the file handle
- Store handle reference when opening os.devnull
- Close handle in _cleanup_stdio_async() after other cleanup
- Clear reference in disconnect() for safety

Fixes #6002
2026-03-10 15:06:51 +08:00
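The handle-tracking change described in the commit above can be sketched as follows. `StdioClientSketch`, `connect`, and `cleanup` are hypothetical names; the real class and its `_cleanup_stdio_async()` and `disconnect()` methods are not shown in this log. Using `sys.stderr` on Windows is also an assumption of the sketch. Only the `_errlog_handle` attribute and the open, store, close, clear sequence follow the commit message.

```python
import os
import sys
from typing import IO, Optional


class StdioClientSketch:
    """Hypothetical client that tracks the error-log handle it opens."""

    def __init__(self) -> None:
        self._errlog_handle: Optional[IO[str]] = None

    def connect(self) -> IO[str]:
        if sys.platform == "win32":
            # Assumption: nothing is opened on Windows, so nothing to track.
            return sys.stderr
        # Keep a reference so the descriptor can be closed later.
        self._errlog_handle = open(os.devnull, "w")
        return self._errlog_handle

    def cleanup(self) -> None:
        # Clear the reference first so repeated cleanup calls are harmless.
        handle, self._errlog_handle = self._errlog_handle, None
        if handle is not None:
            handle.close()
```

Without the stored reference, each connection would leave one open descriptor on `os.devnull` until garbage collection, which is the leak the commit describes.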
Pushkal a58ffc2669 fix(server): use session.phase_state instead of session.mode_state in handle_pause (#6069)
The handle_pause endpoint referenced session.mode_state (lines 360-361),
which does not exist on the Session dataclass. This caused an
AttributeError every time the pause endpoint reached the phase transition
step, preventing the queen phase from transitioning to staging and
returning a 500 error to the frontend.

Changed to session.phase_state, consistent with handle_stop (line 412),
handle_run (line 75), and the Session dataclass definition
(session_manager.py line 44).
2026-03-10 15:03:19 +08:00
RichardTang-Aden 3fefea52be Merge pull request #6102 from aden-hive/micro-fix/report-to-parent-empty-check
micro-fix: track reported_to_parent to prevent false empty-turn detection
2026-03-09 21:12:23 -07:00
Richard Tang 06fd045b3e micro-fix: track reported_to_parent to prevent false empty-turn detection
Turns that call report_to_parent were incorrectly treated as "truly
empty" because the flag was not propagated. Thread it through
_run_single_turn and include it in the empty-turn guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:10:47 -07:00
RichardTang-Aden 2e43d2af46 Merge pull request #6100 from aden-hive/feature/integration-extended
micro-fix: wrong reference for hive_coder
2026-03-09 19:52:35 -07:00
Richard Tang 2c9790c65d Merge remote-tracking branch 'origin' into feature/integration-extended 2026-03-09 19:52:17 -07:00
Richard Tang 9700ac71bb micro-fix: wrong reference for hive_coder 2026-03-09 19:50:07 -07:00
RichardTang-Aden 61ed67b068 Merge pull request #6097 from aden-hive/feature/integration-extended
Expand integration tool coverage across 40 vendors
2026-03-09 19:47:34 -07:00
Richard Tang c3bea8685a Merge remote-tracking branch 'origin/main' into feature/integration-extended 2026-03-09 19:47:21 -07:00
RichardTang-Aden 98c57b795a Merge pull request #6050 from aden-hive/feat/queen-planning-phase
Add queen planning phase, global memory, and refactor hive_coder
2026-03-09 19:46:23 -07:00
Richard Tang 9be1d03b5c chore ruff lint 2026-03-09 19:45:36 -07:00
Richard Tang 0d09510539 Merge remote-tracking branch 'origin/main' into feat/queen-planning-phase 2026-03-09 19:42:10 -07:00
Richard Tang 639c37ba17 feat: prompt to init the agent 2026-03-09 19:34:01 -07:00
Richard Tang 2258c23254 Merge branch 'feature/queen-global-memory' into feat/queen-planning-phase 2026-03-09 19:11:32 -07:00
Richard Tang 9714ea106d feat: improve initialize_and_build_agent clarity 2026-03-09 18:54:48 -07:00
Timothy f4ad500177 chore: lint 2026-03-09 18:53:01 -07:00
Timothy 9154a4d9f8 fix: resolve E501 line-too-long lint errors across 7 tool files 2026-03-09 18:51:01 -07:00
Timothy add6efe6f1 fix(micro-fix): increase stall threshold 2026-03-09 18:40:13 -07:00
Richard Tang 7ceb1efd02 fix: replace old tool name reference 2026-03-09 18:40:01 -07:00
Timothy a29ecf8435 chore(micro-fix): fix ci test blockage 2026-03-09 18:27:21 -07:00
Richard Tang d0ba5ef4f4 fix: update the wrong variable name 2026-03-09 18:12:29 -07:00
Richard Tang 860f637491 feat: add validation for module import 2026-03-09 17:53:50 -07:00
Richard Tang acb2cab317 feat: minor prompt change for switching to building mode 2026-03-09 17:41:23 -07:00
Richard Tang b453806918 feat: execution end message 2026-03-09 17:29:58 -07:00
Richard Tang 7ba8a0f51b feat: strengthen validation logic when loading 2026-03-09 17:08:20 -07:00
Richard Tang f6f398b6b1 feat: add GCU knowledge to planning 2026-03-09 17:02:13 -07:00
Timothy c4b22fa5c4 feat(postgres): update credential spec with new tool names 2026-03-09 16:47:27 -07:00
Timothy 0e64f977cd feat(postgres): add table stats, indexes, and foreign keys tools
Add pg_get_table_stats for row counts and size info,
pg_list_indexes for index details, and pg_get_foreign_keys
for relationship discovery with both outgoing and incoming FKs.
2026-03-09 16:47:09 -07:00
Timothy f24c9708fc feat(lusha): update credential spec with new tool names 2026-03-09 16:45:33 -07:00
Timothy bb4436e277 feat(lusha): add bulk enrich, technologies, and decision makers tools
Add lusha_bulk_enrich_persons for batch enrichment,
lusha_get_technologies for company tech stack lookup, and
lusha_search_decision_makers for senior contact discovery.
2026-03-09 16:45:17 -07:00
Timothy 795f66c90b feat(gsc): update credential spec with new tool names 2026-03-09 16:44:33 -07:00
Timothy 9ef6d51573 feat(gsc): add top queries, top pages, and delete sitemap tools
Add gsc_top_queries and gsc_top_pages convenience wrappers for
click-sorted analytics, and gsc_delete_sitemap for sitemap removal.
2026-03-09 16:44:20 -07:00
Timothy 3fed4e3409 feat(aws-s3): update credential specs with new tool names 2026-03-09 16:43:37 -07:00
Timothy 670e69f2ce feat(aws-s3): add copy, metadata, and presigned URL tools
Add s3_copy_object for copying within/between buckets,
s3_get_object_metadata for HEAD-based metadata retrieval, and
s3_generate_presigned_url for temporary access URL generation.
2026-03-09 16:42:46 -07:00
Timothy f6c4747905 feat(pushover): update credential spec with new tool names 2026-03-09 16:42:04 -07:00
Timothy 7b78f6c12f feat(pushover): add cancel receipt, glance update, and limits tools
Add pushover_cancel_receipt for stopping emergency retries,
pushover_send_glance for widget data updates, and
pushover_get_limits for checking message usage.
2026-03-09 16:41:52 -07:00
Timothy 1c75100f59 feat(news): update credential spec with new tool names 2026-03-09 16:41:15 -07:00
Timothy b325e103c6 feat(news): add latest, by-source, and by-topic search tools
Add news_latest for breaking news without query, news_by_source
for source-filtered articles, and news_by_topic for topic-based
discovery with automatic date ranges.
2026-03-09 16:40:54 -07:00
Timothy aef2d2d474 feat(serpapi): update credential spec with new tool names 2026-03-09 16:40:05 -07:00
Timothy 95a2b6711e feat(serpapi): add cited-by, profile search, and Google web search tools
Add scholar_cited_by for finding papers citing a given paper,
scholar_search_profiles for author profile discovery, and
serpapi_google_search for structured Google web results.
2026-03-09 16:38:50 -07:00
Timothy 7fb5e8145c feat(exa-search): update credential spec with new tool names 2026-03-09 16:37:56 -07:00
Timothy 8e45d0df83 feat(exa-search): add news, papers, and company search tools
Add exa_search_news, exa_search_papers, and exa_search_companies
convenience wrappers with pre-configured category filters and
automatic date/domain filtering.
2026-03-09 16:37:44 -07:00
Timothy 3d175a6d54 feat(greenhouse): update credential spec with new tool names
Add greenhouse_list_offers, greenhouse_add_candidate_note, greenhouse_list_scorecards.
2026-03-09 16:02:53 -07:00
Timothy b9debaf957 feat(greenhouse): add list offers, candidate notes, and scorecards tools
- greenhouse_list_offers: GET /offers or /applications/{id}/offers
- greenhouse_add_candidate_note: POST /candidates/{id}/activity_feed/notes
- greenhouse_list_scorecards: GET /applications/{id}/scorecards
- Add _post helper for POST requests
2026-03-09 16:02:08 -07:00
Timothy d2d7bdc374 feat(brevo): update credential spec with new tool names
Add brevo_list_contacts, brevo_delete_contact, brevo_list_email_campaigns.
2026-03-09 16:01:16 -07:00
Timothy 40e494b15d feat(brevo): add list contacts, delete contact, and list campaigns tools
- brevo_list_contacts: GET /contacts with pagination and modified_since filter
- brevo_delete_contact: DELETE /contacts/{email} to remove contacts
- brevo_list_email_campaigns: GET /emailCampaigns with status filter and stats
2026-03-09 16:00:42 -07:00
Timothy b5e840c0cb feat(quickbooks): update credential specs with new tool names
Add quickbooks_list_invoices, quickbooks_get_customer, quickbooks_create_payment
to both credential specs (token and realm_id).
2026-03-09 15:59:46 -07:00
Timothy f3d74c9ae4 feat(quickbooks): add list invoices, get customer, and create payment tools
- quickbooks_list_invoices: query invoices with status/customer filters
- quickbooks_get_customer: GET /customer/{id} with address and contact info
- quickbooks_create_payment: POST /payment with optional invoice linking
2026-03-09 15:59:23 -07:00
Timothy 2e7dbad118 feat(cloudinary): update credential specs with new tool names
Add cloudinary_get_usage, cloudinary_rename_resource, cloudinary_add_tag
to all three credential specs (cloud_name, key, secret).
2026-03-09 15:31:42 -07:00
Timothy 6183d1b65b feat(cloudinary): add usage, rename, and add tag tools
- cloudinary_get_usage: GET /usage for storage, bandwidth, transformation limits
- cloudinary_rename_resource: POST /rename to change public_id
- cloudinary_add_tag: POST /tags to add tags to resources
2026-03-09 15:31:22 -07:00
Timothy 09931e6d98 feat(twitter): update credential spec with new tool names
Add twitter_get_user_followers, twitter_get_tweet_replies, twitter_get_list_tweets.
2026-03-09 15:25:21 -07:00
Timothy cb394127d1 feat(twitter): add user followers, tweet replies, and list tweets tools
- twitter_get_user_followers: GET /users/{id}/followers with profile details
- twitter_get_tweet_replies: search recent replies via conversation_id
- twitter_get_list_tweets: GET /lists/{id}/tweets with author expansion
2026-03-09 15:21:47 -07:00
Timothy 588fa1f9ea feat(google-analytics): update credential spec with new tool names
Add ga_get_user_demographics, ga_get_conversion_events, ga_get_landing_pages.
2026-03-09 15:21:09 -07:00
Timothy 73325c280c feat(google-analytics): add demographics, conversion events, and landing pages tools
- ga_get_user_demographics: country/language/device breakdown
- ga_get_conversion_events: event counts, conversions, and revenue
- ga_get_landing_pages: top landing pages with bounce rate and session duration
2026-03-09 15:20:51 -07:00
Timothy 8c5ae8ffa8 feat(docker-hub): update credential spec with new tool names
Add docker_hub_get_tag_detail, docker_hub_delete_tag, docker_hub_list_webhooks.
2026-03-09 15:19:58 -07:00
Timothy 7389423c70 feat(docker-hub): add tag detail, delete tag, and list webhooks tools
- docker_hub_get_tag_detail: GET /repositories/{repo}/tags/{tag} with image architectures
- docker_hub_delete_tag: DELETE /repositories/{repo}/tags/{tag}
- docker_hub_list_webhooks: GET /repositories/{repo}/webhooks
- Add _delete helper for DELETE requests
2026-03-09 15:18:46 -07:00
Timothy 20c15446a7 feat(apollo): update credential spec with new tool names
Add apollo_get_person_activities, apollo_list_email_accounts,
apollo_bulk_enrich_people.
2026-03-09 15:17:38 -07:00
Timothy bcd2fb76bd feat(apollo): add person activities, email accounts, and bulk enrich tools
- apollo_get_person_activities: GET /activities for contact activity history
- apollo_list_email_accounts: GET /email_accounts for connected sending accounts
- apollo_bulk_enrich_people: POST /people/bulk_match for batch enrichment (up to 10)
2026-03-09 15:03:21 -07:00
Timothy 5fb97ab6df feat(calendly): update credential spec with new tool names
Add calendly_cancel_event, calendly_list_webhooks, calendly_get_event_type.
2026-03-09 15:00:46 -07:00
Timothy 0224ebc800 feat(calendly): add cancel event, list webhooks, and get event type tools
- calendly_cancel_event: POST /scheduled_events/{id}/cancellation
- calendly_list_webhooks: GET /webhook_subscriptions for org/user scope
- calendly_get_event_type: GET /event_types/{id} for meeting template details
- Add _post helper for POST requests
2026-03-09 15:00:34 -07:00
Timothy af88f7299a feat(pagerduty): update credential specs with new tool names
Add pagerduty_list_oncalls, pagerduty_add_incident_note,
pagerduty_list_escalation_policies to api_key spec.
Add pagerduty_add_incident_note to from_email spec (write operation).
2026-03-09 14:59:53 -07:00
Timothy 81729706ae feat(pagerduty): add oncalls, incident notes, and escalation policies tools
- pagerduty_list_oncalls: GET /oncalls with schedule/policy filters
- pagerduty_add_incident_note: POST /incidents/{id}/notes to add notes
- pagerduty_list_escalation_policies: GET /escalation_policies with search
2026-03-09 14:59:33 -07:00
Timothy bbb1b43ebe feat(airtable): update credential spec with new tool names
Add airtable_delete_records, airtable_search_records, airtable_list_collaborators.
2026-03-09 14:58:57 -07:00
Timothy 70ed5fa8df feat(airtable): add delete records, search records, and list collaborators tools
- airtable_delete_records: DELETE records by comma-separated IDs (up to 10)
- airtable_search_records: search records using FIND formula for partial matching
- airtable_list_collaborators: list base collaborators via meta API
- Add _delete helper for DELETE requests
2026-03-09 14:58:42 -07:00
Timothy 312db6620d feat(reddit): update credential specs with new tool names
Add reddit_get_subreddit_info, reddit_get_post_detail, reddit_get_user_posts
to both credential specs (client_id and client_secret).
2026-03-09 14:57:50 -07:00
Timothy 93c1fc5488 feat(reddit): add subreddit info, post detail, and user posts tools
- reddit_get_subreddit_info: GET /r/{name}/about for subscriber count, description
- reddit_get_post_detail: GET /by_id/t3_{id} for full post details with flair, ratios
- reddit_get_user_posts: GET /user/{name}/submitted for user's post history
2026-03-09 14:57:33 -07:00
Timothy 801443027d feat(pipedrive): update credential spec with new tool names
Add pipedrive_update_deal, pipedrive_create_person, pipedrive_create_activity
to the credential spec tools list.
2026-03-09 14:54:22 -07:00
Timothy ca2ead76cd feat(pipedrive): add deal update, person creation, and activity creation tools
Add pipedrive_update_deal, pipedrive_create_person, and
pipedrive_create_activity tools using Pipedrive REST API v1.
2026-03-09 14:52:27 -07:00
Timothy d562144a6d feat(confluence): register new tools in credential specs
Add confluence_update_page, confluence_delete_page, and
confluence_get_page_children to all three Confluence credential specs.
2026-03-09 14:51:39 -07:00
Timothy af7fb7da27 feat(confluence): add page update, delete, and children listing tools
Add confluence_update_page, confluence_delete_page, and
confluence_get_page_children tools using Confluence REST API v2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:51:26 -07:00
Timothy c17dd63b4a feat(intercom): register new tools in credential spec
Add intercom_close_conversation, intercom_create_contact, and
intercom_list_conversations to Intercom credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:50:49 -07:00
Timothy 866db289e2 feat(intercom): add close conversation, create contact, and list conversations tools
Add close_conversation, create_contact, and list_conversations client
methods plus intercom_close_conversation, intercom_create_contact, and
intercom_list_conversations MCP tools using Intercom API v2.11.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:50:30 -07:00
Timothy b4ac5e9607 feat(gitlab): register new tools in credential spec
Add gitlab_update_issue, gitlab_get_merge_request, and
gitlab_create_merge_request_note to GitLab credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:49:01 -07:00
Timothy 3ca7af4242 feat(gitlab): add issue update, MR detail, and MR comment tools
Add _put helper and gitlab_update_issue, gitlab_get_merge_request,
and gitlab_create_merge_request_note tools using GitLab REST API v4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:48:40 -07:00
RichardTang-Aden 5e1ab3ca37 Merge pull request #5029 from karthik-kotra/docs/setup-troubleshooting
docs(setup): add troubleshooting steps for common WSL setup issues
2026-03-09 14:06:28 -07:00
Timothy 79c32c9f47 feat(slack): register new tools in credential spec
Add slack_get_channel_info, slack_list_files, and slack_get_file_info
to Slack credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:58:14 -07:00
Timothy 35ee29a843 feat(slack): add channel info, file listing, and file detail tools
Add get_channel_info, list_files, and get_file_info client methods
plus slack_get_channel_info, slack_list_files, and slack_get_file_info
MCP tools using Slack Web API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:57:45 -07:00
Timothy 573aea1d9c feat(stripe): register new tools in credential spec
Add stripe_list_disputes, stripe_list_events, and
stripe_create_checkout_session to Stripe credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:56:20 -07:00
Timothy 6ecbc30293 feat(stripe): add disputes, events, and checkout session tools
Add list_disputes, list_events, and create_checkout_session client
methods plus stripe_list_disputes, stripe_list_events, and
stripe_create_checkout_session MCP tools using Stripe API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:56:07 -07:00
Timothy 843b1f2e1d feat(linear): register new tools in credential spec
Add linear_cycles_list, linear_issue_comments_list, and
linear_issue_relation_create to Linear credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:54:48 -07:00
Timothy 89f6c8e4ef feat(linear): add cycle listing, issue comments, and issue relations tools
Add list_cycles, list_issue_comments, and create_issue_relation client
methods plus linear_cycles_list, linear_issue_comments_list, and
linear_issue_relation_create MCP tools using Linear GraphQL API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:52:09 -07:00
Timothy 304ac07bd8 feat(zoom): register new tools in credential spec
Add zoom_update_meeting, zoom_list_meeting_participants, and
zoom_list_meeting_registrants to Zoom credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:50:27 -07:00
Timothy 82f0684b83 feat(zoom): add meeting update, participants, and registrants tools
Add zoom_update_meeting (PATCH), zoom_list_meeting_participants
(past meeting attendees), and zoom_list_meeting_registrants
using Zoom REST API v2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:45:11 -07:00
Timothy 963c37dc31 feat(twilio): register new tools in credential specs
Add twilio_list_phone_numbers, twilio_list_calls, and
twilio_delete_message to both Twilio credential specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:41:26 -07:00
Timothy c02da3ba5a feat(twilio): add phone number listing, call history, and message deletion tools
Add twilio_list_phone_numbers, twilio_list_calls, and
twilio_delete_message tools using Twilio REST API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:40:58 -07:00
Timothy 7f34e95ec6 feat(shopify): register new tools in credential specs
Add shopify_update_product, shopify_get_customer, and
shopify_create_draft_order to both Shopify credential specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:40:28 -07:00
Timothy f2998fe098 feat(shopify): add product update, customer detail, and draft order tools
Add shopify_update_product, shopify_get_customer, and
shopify_create_draft_order tools using Shopify Admin REST API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:40:15 -07:00
Timothy 323a2489b8 feat(zendesk): register new tools in credential specs
Add zendesk_get_ticket_comments, zendesk_add_ticket_comment, and
zendesk_list_users to all three Zendesk credential specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:39:35 -07:00
Timothy f6d1cd640e feat(zendesk): add ticket comments and user listing tools
Add zendesk_get_ticket_comments, zendesk_add_ticket_comment, and
zendesk_list_users tools using Zendesk Support API v2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:39:25 -07:00
Timothy ddf89a04fe feat(asana): update credential spec for new tools
Register asana_update_task, asana_add_comment, and
asana_create_subtask in the Asana credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:35:16 -07:00
Timothy c5dc89f5ee feat(asana): add update_task, add_comment, create_subtask tools
Add _put helper and three new Asana MCP tools:
- asana_update_task: modify name, notes, completion, due date, assignee
- asana_add_comment: post comment stories on tasks
- asana_create_subtask: create subtasks under existing tasks

API ref: https://developers.asana.com/docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:35:05 -07:00
Timothy 6ade34b759 feat(trello): register get_card, create_list, search_cards tools
Add three new Trello MCP tools:
- trello_get_card: retrieve full card details with members/checklists/attachments
- trello_create_list: create new lists on boards
- trello_search_cards: full-text search across cards with board scoping

Update credential spec to include the new tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:20:43 -07:00
Timothy 09d5f0a9df feat(trello): add client methods for get_card, create_list, search
Add TrelloClient methods for:
- get_card: GET /1/cards/{id} with members, checklists, attachments
- create_list: POST /1/lists to create new board lists
- search: GET /1/search for full-text search across cards

API ref: https://developer.atlassian.com/cloud/trello/rest/api-group-cards/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:19:59 -07:00
Timothy a60d63cca2 feat(github): register list_commits, create_release, list_workflow_runs
Add three new GitHub MCP tools:
- github_list_commits: query commits with author/date/branch filters
- github_create_release: create tagged releases with notes and draft support
- github_list_workflow_runs: monitor CI/CD pipeline runs with status filters

Update credential spec to include the new tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:19:16 -07:00
Timothy 8616975fc5 feat(github): add client methods for commits, releases, workflow runs
Add _GitHubClient methods for:
- list_commits: GET /repos/{owner}/{repo}/commits with sha/author/date filters
- create_release: POST /repos/{owner}/{repo}/releases with tag, notes, draft
- list_workflow_runs: GET /repos/{owner}/{repo}/actions/runs with filters

API ref: https://docs.github.com/en/rest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:18:33 -07:00
Timothy e5ae919d8f feat(telegram): register get_chat_member_count, send_video, set_description
Add three new Telegram MCP tools:
- telegram_get_chat_member_count: retrieve group/channel membership size
- telegram_send_video: send video files via URL or file_id
- telegram_set_chat_description: update group/channel descriptions

Update credential spec to include the new tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:17:30 -07:00
Timothy 8e7f5eaaba feat(telegram): add client methods for member count, video, description
Add _TelegramClient methods for:
- get_chat_member_count: getChatMemberCount API endpoint
- send_video: sendVideo with caption, parse_mode, duration support
- set_chat_description: setChatDescription for groups/channels

API ref: https://core.telegram.org/bots/api

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:13:06 -07:00
Timothy 4d1ff8b054 feat(salesforce): update credential spec for new tools
Register salesforce_delete_record, salesforce_search_records, and
salesforce_get_record_count in both Salesforce credential specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:12:25 -07:00
Timothy 9fa81e8599 feat(salesforce): add delete_record, search_records, get_record_count
Add three new Salesforce MCP tools:
- salesforce_delete_record: DELETE /sobjects/{type}/{id}
- salesforce_search_records: SOSL full-text search via /search/
- salesforce_get_record_count: efficient COUNT() query for any SObject

API ref: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:12:11 -07:00
Timothy cf8e19b059 feat(discord): register get_channel, create_reaction, delete_message tools
Add three new Discord MCP tools:
- discord_get_channel: retrieve channel metadata (name, topic, type)
- discord_create_reaction: add emoji reactions to messages
- discord_delete_message: remove messages from channels

Update credential spec to include the new tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:11:25 -07:00
Timothy dfa3f60fcf feat(discord): add client methods for get_channel, reactions, delete
Add _DiscordClient methods for:
- get_channel: retrieve channel metadata via GET /channels/{id}
- create_reaction: add emoji reaction via PUT reactions endpoint
- delete_message: remove a message via DELETE messages endpoint

API ref: https://discord.com/developers/docs/resources

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:10:49 -07:00
Timothy b795f1b253 feat(notion): update credential spec for new tools
Register notion_update_page, notion_archive_page, and
notion_append_blocks in the Notion credential spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:06:20 -07:00
Timothy 73423c0dd2 feat(notion): add update_page, archive_page, append_blocks tools
Add three new Notion MCP tools:
- notion_update_page: modify page properties via PATCH /pages/{id}
- notion_archive_page: archive or restore pages
- notion_append_blocks: add paragraphs, headings, lists, todos, etc.

API ref: https://developers.notion.com/reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:06:08 -07:00
Timothy 3d844e1539 feat(jira): update credential spec for new tools
Register jira_update_issue, jira_list_transitions, and
jira_transition_issue in all three Jira credential specs
(domain, email, token).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:04:32 -07:00
Timothy b619119eb5 feat(jira): add update_issue, list_transitions, transition_issue tools
Add three new Jira MCP tools:
- jira_update_issue: modify summary, description, priority, labels, assignee
- jira_list_transitions: discover available status transitions for an issue
- jira_transition_issue: move an issue to a new status with optional comment

API ref: https://developer.atlassian.com/cloud/jira/platform/rest/v3/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:04:19 -07:00
Timothy b00ed4fc70 feat(hubspot): register delete_object, list/create_associations tools
Add three new MCP tools:
- hubspot_delete_object: archive contacts, companies, or deals
- hubspot_list_associations: query links between CRM objects (v4 API)
- hubspot_create_association: link two CRM records together

Update credential spec to include the new tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:03:37 -07:00
Timothy 5ec5fbe998 feat(hubspot): add client methods for delete, associations
Add _HubSpotClient methods for:
- delete_object: archive a CRM object via DELETE /crm/v3/objects
- list_associations: query associations via GET /crm/v4/objects associations endpoint
- create_association: link two CRM objects via PUT /crm/v4/objects associations endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:02:49 -07:00
Bryan @ Aden 402bb38267 Merge pull request #6079 from Waryjustice/fix/google-sheets-credentials-orphan
fix(credentials): remove orphaned google_sheets.py credential spec
2026-03-09 18:37:27 +00:00
Waryjustice 0a55928872 fix(credentials): remove orphaned google_sheets.py credential spec
The google_sheets.py file defined GOOGLE_SHEETS_CREDENTIALS (an API-key
based credential for reading public sheets via GOOGLE_SHEETS_API_KEY) but
was never wired into the package:

- Never imported in credentials/__init__.py
- Never merged into CREDENTIAL_SPECS
- Never listed in __all__
- Tool never calls credentials.get('google_sheets_key') — uses 'google' (OAuth2)
- Tool names in the spec were stale and did not match actual function names

The 'google' credential in email.py already correctly covers all Google
Sheets tools via OAuth2. This file was dead code with no referencing
imports anywhere in the repository.

Closes #6077

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-09 23:44:26 +05:30
bryan 4ad0d0e077 fix: align the credential functions to be the same 2026-03-09 10:14:21 -07:00
Timothy @aden b55a77634b Delete .github/ISSUE_TEMPLATE/link-discord.yml 2026-03-08 19:44:48 -07:00
bryan cba0ec110f fix: linter update 2026-03-08 19:37:57 -07:00
bryan 0256e0c944 Merge branch 'main' into feat/agent-trigger 2026-03-08 19:28:36 -07:00
Bryan @ Aden f7db603922 Merge pull request #6048 from aden-hive/fix/draft-email-tool
(micro-fix): draft email tool
2026-03-09 02:26:58 +00:00
bryan b4a47a12ff fix: linter formatting 2026-03-08 19:26:06 -07:00
bryan 2228851b16 feat: added reply in thread to draft email tool 2026-03-08 19:24:38 -07:00
Bryan @ Aden ed0a211906 Merge pull request #6047 from aden-hive/fix/reply-email-tool
(micro-fix): reply email tool
2026-03-09 02:00:03 +00:00
bryan 63744ddaef fix: update to pass linter 2026-03-08 18:58:50 -07:00
bryan 82331acb77 feat: update reply email tool to contain the email thread in the body 2026-03-08 18:53:53 -07:00
bryan 4d9d0362a0 fixes to make the timer trigger properly 2026-03-08 18:44:42 -07:00
bryan f474d0bc8e Merge branch 'main' into feat/agent-trigger 2026-03-08 16:59:14 -07:00
bryan 6a0681b9aa feat: fixing phase 4, continuing to test 2026-03-08 16:52:00 -07:00
Robert Hallers 7a467ef9b8 docs: mark TUI as deprecated in roadmap to match CLAUDE.md
Resolves inconsistency between CLAUDE.md/AGENTS.md (TUI deprecated) and
docs/roadmap.md (TUI listed as completed feature).

- Strike through TUI items in 3 roadmap sections
- Add deprecation note to TUI-to-GUI upgrade section
- Reference AGENTS.md and hive open as replacement

Fixes #5941

Signed-off-by: Robert Hallers <robert@terplabs.ai>
2026-03-07 02:36:04 -05:00
bryan c7e634851b feat: phase 4 of trigger plan 2026-03-06 19:21:32 -08:00
bryan cdb7155960 feat: phase 3 of trigger plan 2026-03-06 18:07:26 -08:00
bryan 3f7790c26a feat: phase 2 of trigger plan 2026-03-06 17:22:57 -08:00
bryan 5676b115f4 Merge branch 'feat/queen-responsibility' into feat/agent-trigger 2026-03-06 16:58:06 -08:00
bryan 61c59d57e8 feat: phase 1 of trigger plan 2026-03-06 15:11:36 -08:00
karthik-kotra 41cd11d5c9 docs(setup): add troubleshooting steps for common WSL setup issues 2026-02-17 07:30:00 +00:00
262 changed files with 21989 additions and 10372 deletions
-31
@@ -1,31 +0,0 @@
name: Link Discord Account
description: Connect your GitHub and Discord for the bounty program
title: "link: @{{ github.actor }}"
labels: ["link-discord"]
body:
- type: markdown
attributes:
value: |
Link your Discord account to receive XP and role rewards when your bounty PRs are merged.
**How to find your Discord ID:**
1. Open Discord Settings > Advanced > Enable **Developer Mode**
2. Right-click your username > **Copy User ID**
- type: input
id: discord_id
attributes:
label: Discord User ID
description: "Your numeric Discord ID (not your username). Example: 123456789012345678"
placeholder: "123456789012345678"
validations:
required: true
- type: input
id: display_name
attributes:
label: Display Name (optional)
description: How you'd like to be credited
placeholder: "Jane Doe"
validations:
required: false
-4
@@ -2,10 +2,6 @@
Shared agent instructions for this workspace.
## Deprecations
- **TUI is deprecated.** The terminal UI (`hive tui`) is no longer maintained. Use the browser-based interface (`hive open`) instead.
## Coding Agent Notes
-
+1020 -17
File diff suppressed because it is too large
+10 -10
@@ -5,20 +5,20 @@ help: ## Show this help
awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-15s\033[0m %s\n", $$1, $$2}'
lint: ## Run ruff linter and formatter (with auto-fix)
cd core && ruff check --fix .
cd tools && ruff check --fix .
cd core && ruff format .
cd tools && ruff format .
cd core && uv run ruff check --fix .
cd tools && uv run ruff check --fix .
cd core && uv run ruff format .
cd tools && uv run ruff format .
format: ## Run ruff formatter
cd core && ruff format .
cd tools && ruff format .
cd core && uv run ruff format .
cd tools && uv run ruff format .
check: ## Run all checks without modifying files (CI-safe)
cd core && ruff check .
cd tools && ruff check .
cd core && ruff format --check .
cd tools && ruff format --check .
cd core && uv run ruff check .
cd tools && uv run ruff check .
cd core && uv run ruff format --check .
cd tools && uv run ruff format --check .
test: ## Run all tests (core + tools, excludes live)
cd core && uv run python -m pytest tests/ -v
+4 -4
@@ -111,7 +111,7 @@ This sets up:
- **LLM provider** - Interactive default model configuration
- All required Python dependencies with `uv`
- At last, it will initiate the open hive interface in your browser
- Finally, it will open the Hive interface in your browser
> **Tip:** To reopen the dashboard later, run `hive open` from the project directory.
@@ -125,18 +125,18 @@ Type the agent you want to build in the home input box
### Use Template Agents
Click "Try a sample agent" and check the templates. You can run a templates directly or choose to build your version on top of the existing template.
Click "Try a sample agent" and check the templates. You can run a template directly or choose to build your version on top of the existing template.
### Run Agents
Now you can run an agent by selectiing the agent (either an existing agent or example agent). You can click the Run button on the top left, or talk to the queen agent and it can run the agent for you.
Now you can run an agent by selecting the agent (either an existing agent or example agent). You can click the Run button on the top left, or talk to the queen agent and it can run the agent for you.
<img width="2500" height="1214" alt="Image" src="https://github.com/user-attachments/assets/71c38206-2ad5-49aa-bde8-6698d0bc55f5" />
## Features
- **Browser-Use** - Control the browser on your computer to achieve hard tasks
- **Parallel Execution** - Execute the generated graph in parallel. This way you can have multiple agent compelteing the jobs for you
- **Parallel Execution** - Execute the generated graph in parallel. This way you can have multiple agents completing the jobs for you
- **[Goal-Driven Generation](docs/key_concepts/goals_outcome.md)** - Define objectives in natural language; the coding agent generates the agent graph and connection code to achieve them
- **[Adaptiveness](docs/key_concepts/evolution.md)** - Framework captures failures, calibrates according to the objectives, and evolves the agent graph
- **[Dynamic Node Connections](docs/key_concepts/graph.md)** - No predefined edges; connection code is generated by any capable LLM based on your goals
+2 -2
@@ -39,8 +39,8 @@ We consider security research conducted in accordance with this policy to be:
## Security Best Practices for Users
1. **Keep Updated**: Always run the latest version
2. **Secure Configuration**: Review `config.yaml` settings, especially in production
3. **Environment Variables**: Never commit `.env` files or `config.yaml` with secrets
2. **Secure Configuration**: Review your `~/.hive/configuration.json`, `.mcp.json`, and environment variable settings, especially in production
3. **Environment Variables**: Never commit `.env` files or any configuration files that contain secrets
4. **Network Security**: Use HTTPS in production, configure firewalls appropriately
5. **Database Security**: Use strong passwords, limit network access
+1 -1
@@ -1,6 +1,6 @@
# MCP Server Guide - Agent Building Tools
> **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_agent_package` tool, with underlying logic in `framework.builder.package_generator`.
> **Note:** The standalone `agent-builder` MCP server (`framework.mcp.agent_builder_server`) has been replaced. Agent building is now done via the `coder-tools` server's `initialize_and_build_agent` tool, with underlying logic in `tools/coder_tools_server.py`.
This guide covers the MCP tools available for building goal-driven agents.
+1 -1
@@ -19,7 +19,7 @@ uv pip install -e .
## Agent Building
Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_agent_package` tool and related utilities. The underlying package generation logic lives in `framework.builder.package_generator`.
Agent scaffolding is handled by the `coder-tools` MCP server (in `tools/coder_tools_server.py`), which provides the `initialize_and_build_agent` tool and related utilities. The package generation logic lives directly in `tools/coder_tools_server.py`.
See the [Getting Started Guide](../docs/getting-started.md) for building agents.
+1 -1
@@ -601,7 +601,7 @@ async def handle_ws(websocket):
)
node = EventLoopNode(
event_bus=bus,
config=LoopConfig(max_iterations=10_000, max_history_tokens=32_000),
config=LoopConfig(max_iterations=10_000, max_context_tokens=32_000),
conversation_store=STORE,
tool_executor=tool_executor,
)
+1 -1
@@ -1769,7 +1769,7 @@ async def _run_pipeline(websocket, initial_message: str):
config=LoopConfig(
max_iterations=30,
max_tool_calls_per_turn=30,
max_history_tokens=64000,
max_context_tokens=64000,
max_tool_result_chars=8_000,
spillover_dir=str(_DATA_DIR),
),
+2 -2
@@ -752,7 +752,7 @@ async def _run_pipeline(websocket, topic: str):
config=LoopConfig(
max_iterations=20,
max_tool_calls_per_turn=30,
max_history_tokens=32_000,
max_context_tokens=32_000,
),
conversation_store=store_a,
tool_executor=tool_executor,
@@ -850,7 +850,7 @@ async def _run_pipeline(websocket, topic: str):
config=LoopConfig(
max_iterations=10,
max_tool_calls_per_turn=30,
max_history_tokens=32_000,
max_context_tokens=32_000,
),
conversation_store=store_b,
)
+1 -1
@@ -1258,7 +1258,7 @@ async def _run_org_pipeline(websocket, topic: str):
config=LoopConfig(
max_iterations=30,
max_tool_calls_per_turn=30,
max_history_tokens=32_000,
max_context_tokens=32_000,
),
conversation_store=store,
tool_executor=executor,
-3
@@ -22,7 +22,6 @@ The framework includes a Goal-Based Testing system (Goal → Agent → Eval):
See `framework.testing` for details.
"""
from framework.builder.query import BuilderQuery
from framework.llm import AnthropicProvider, LLMProvider
from framework.runner import AgentOrchestrator, AgentRunner
from framework.runtime.core import Runtime
@@ -51,8 +50,6 @@ __all__ = [
"Problem",
# Runtime
"Runtime",
# Builder
"BuilderQuery",
# LLM
"LLMProvider",
"AnthropicProvider",
@@ -1,8 +1,6 @@
"""CLI entry point for Credential Tester agent."""
import asyncio
import logging
import sys
import click
@@ -10,13 +8,14 @@ from .agent import CredentialTesterAgent
def setup_logging(verbose=False, debug=False):
from framework.observability import configure_logging
if debug:
level, fmt = logging.DEBUG, "%(asctime)s %(name)s: %(message)s"
configure_logging(level="DEBUG")
elif verbose:
level, fmt = logging.INFO, "%(message)s"
configure_logging(level="INFO")
else:
level, fmt = logging.WARNING, "%(levelname)s: %(message)s"
logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
configure_logging(level="WARNING")
def pick_account(agent: CredentialTesterAgent) -> dict | None:
@@ -51,42 +50,6 @@ def cli():
pass
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
@click.option("--debug", is_flag=True)
def tui(verbose, debug):
"""Launch TUI to test a credential interactively."""
setup_logging(verbose=verbose, debug=debug)
try:
from framework.tui.app import AdenTUI
except ImportError:
click.echo("TUI requires 'textual'. Install with: pip install textual")
sys.exit(1)
agent = CredentialTesterAgent()
account = pick_account(agent)
if account is None:
sys.exit(1)
agent.select_account(account)
provider = account.get("provider", "?")
alias = account.get("alias", "?")
click.echo(f"\nTesting {provider}/{alias}...\n")
async def run_tui():
agent._setup()
runtime = agent._agent_runtime
await runtime.start()
try:
app = AdenTUI(runtime)
await app.run_async()
finally:
await runtime.stop()
asyncio.run(run_tui())
@cli.command()
@click.option("--verbose", "-v", is_flag=True)
@click.option("--debug", is_flag=True)
@@ -19,6 +19,7 @@ from __future__ import annotations
from pathlib import Path
from typing import TYPE_CHECKING
from framework.config import get_max_context_tokens
from framework.graph import Goal, NodeSpec, SuccessCriterion
from framework.graph.checkpoint_config import CheckpointConfig
from framework.graph.edge import GraphSpec
@@ -455,7 +456,6 @@ identity_prompt = (
loop_config = {
"max_iterations": 50,
"max_tool_calls_per_turn": 30,
"max_history_tokens": 32000,
}
# ---------------------------------------------------------------------------
@@ -541,7 +541,7 @@ class CredentialTesterAgent:
loop_config={
"max_iterations": 50,
"max_tool_calls_per_turn": 30,
"max_history_tokens": 32000,
"max_context_tokens": get_max_context_tokens(),
},
conversation_mode="continuous",
identity_prompt=(
+178
@@ -0,0 +1,178 @@
"""Agent discovery — scan known directories and return categorised AgentEntry lists."""
from __future__ import annotations
import json
from dataclasses import dataclass, field
from pathlib import Path
@dataclass
class AgentEntry:
"""Lightweight agent metadata for the picker / API discover endpoint."""
path: Path
name: str
description: str
category: str
session_count: int = 0
run_count: int = 0
node_count: int = 0
tool_count: int = 0
tags: list[str] = field(default_factory=list)
last_active: str | None = None
def _get_last_active(agent_name: str) -> str | None:
"""Return the most recent updated_at timestamp across all sessions."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return None
latest: str | None = None
for session_dir in sessions_dir.iterdir():
if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
continue
state_file = session_dir / "state.json"
if not state_file.exists():
continue
try:
data = json.loads(state_file.read_text(encoding="utf-8"))
ts = data.get("timestamps", {}).get("updated_at")
if ts and (latest is None or ts > latest):
latest = ts
except Exception:
continue
return latest
def _count_sessions(agent_name: str) -> int:
"""Count session directories under ~/.hive/agents/{agent_name}/sessions/."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
return sum(1 for d in sessions_dir.iterdir() if d.is_dir() and d.name.startswith("session_"))
def _count_runs(agent_name: str) -> int:
"""Count unique run_ids across all sessions for an agent."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
run_ids: set[str] = set()
for session_dir in sessions_dir.iterdir():
if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
continue
# runs.jsonl lives inside workspace subdirectories
for runs_file in session_dir.rglob("runs.jsonl"):
try:
for line in runs_file.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line:
continue
record = json.loads(line)
rid = record.get("run_id")
if rid:
run_ids.add(rid)
except Exception:
continue
return len(run_ids)
def _extract_agent_stats(agent_path: Path) -> tuple[int, int, list[str]]:
"""Extract node count, tool count, and tags from an agent directory.
Prefers agent.py (AST-parsed) over agent.json for node/tool counts
since agent.json may be stale. Tags are only available from agent.json.
"""
import ast
node_count, tool_count, tags = 0, 0, []
agent_py = agent_path / "agent.py"
if agent_py.exists():
try:
tree = ast.parse(agent_py.read_text(encoding="utf-8"))
for node in ast.walk(tree):
if isinstance(node, ast.Assign):
for target in node.targets:
if isinstance(target, ast.Name) and target.id == "nodes":
if isinstance(node.value, ast.List):
node_count = len(node.value.elts)
except Exception:
pass
agent_json = agent_path / "agent.json"
if agent_json.exists():
try:
data = json.loads(agent_json.read_text(encoding="utf-8"))
json_nodes = data.get("graph", {}).get("nodes", []) or data.get("nodes", [])
if node_count == 0:
node_count = len(json_nodes)
tools: set[str] = set()
for n in json_nodes:
tools.update(n.get("tools", []))
tool_count = len(tools)
tags = data.get("agent", {}).get("tags", [])
except Exception:
pass
return node_count, tool_count, tags
def discover_agents() -> dict[str, list[AgentEntry]]:
    """Discover agents from all known sources grouped by category."""
    from framework.runner.cli import (
        _extract_python_agent_metadata,
        _get_framework_agents_dir,
        _is_valid_agent_dir,
    )
    groups: dict[str, list[AgentEntry]] = {}
    sources = [
        ("Your Agents", Path("exports")),
        ("Framework", _get_framework_agents_dir()),
        ("Examples", Path("examples/templates")),
    ]
    for category, base_dir in sources:
        if not base_dir.exists():
            continue
        entries: list[AgentEntry] = []
        for path in sorted(base_dir.iterdir(), key=lambda p: p.name):
            if not _is_valid_agent_dir(path):
                continue
            name, desc = _extract_python_agent_metadata(path)
            config_fallback_name = path.name.replace("_", " ").title()
            used_config = name != config_fallback_name
            node_count, tool_count, tags = _extract_agent_stats(path)
            if not used_config:
                agent_json = path / "agent.json"
                if agent_json.exists():
                    try:
                        data = json.loads(agent_json.read_text(encoding="utf-8"))
                        meta = data.get("agent", {})
                        name = meta.get("name", name)
                        desc = meta.get("description", desc)
                    except Exception:
                        pass
            entries.append(
                AgentEntry(
                    path=path,
                    name=name,
                    description=desc,
                    category=category,
                    session_count=_count_sessions(path.name),
                    run_count=_count_runs(path.name),
                    node_count=node_count,
                    tool_count=tool_count,
                    tags=tags,
                    last_active=_get_last_active(path.name),
                )
            )
        if entries:
            groups[category] = entries
    return groups
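The AST walk used by the stats helper above can be tried in isolation. The following is a minimal sketch, not the framework's code: `count_nodes_in_source` is a hypothetical helper that takes source text directly instead of reading `agent.py`, and it keeps the same rule of counting the elements of a `nodes = [...]` list literal.

```python
import ast


def count_nodes_in_source(source: str) -> int:
    """Return the length of the last `nodes = [...]` list literal found, or 0.

    Hypothetical stand-alone variant of the AST walk shown above.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Unparseable source is treated as "no nodes found".
        return 0
    count = 0
    for node in ast.walk(tree):
        # Only plain assignments whose value is a list literal are counted.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.List):
            if any(isinstance(t, ast.Name) and t.id == "nodes" for t in node.targets):
                count = len(node.value.elts)
    return count


sample = "nodes = [intake, research, review]\nedges = []\n"
print(count_nodes_in_source(sample))
```

The walk never imports or executes the agent module, so it is safe on untrusted or half-written files. One limitation of this approach: an annotated assignment such as `nodes: list = [...]` is an `ast.AnnAssign`, not an `ast.Assign`, so it is not counted.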
+1 -3
@@ -14,8 +14,7 @@ queen_goal = Goal(
id="queen-manager",
name="Queen Manager",
description=(
"Manage the worker agent lifecycle and serve as the user's primary "
"interactive interface. Triage health escalations from the judge."
"Manage the worker agent lifecycle and serve as the user's primary interactive interface."
),
success_criteria=[],
constraints=[],
@@ -35,6 +34,5 @@ queen_graph = GraphSpec(
loop_config={
"max_iterations": 999_999,
"max_tool_calls_per_turn": 30,
"max_history_tokens": 32000,
},
)
+461 -222
@@ -35,15 +35,14 @@ def _build_appendices() -> str:
# Shared appendices — appended to every coding node's system prompt.
_appendices = _build_appendices()
# GCU first-class section for building phase (when GCU is enabled).
# This is placed prominently in the main prompt body, not as an appendix.
_gcu_building_section = (
# GCU guide — shared between planning and building via _shared_building_knowledge.
_gcu_section = (
("\n\n# GCU Nodes — Browser Automation\n\n" + _gcu_guide)
if _is_gcu_enabled() and _gcu_guide
else ""
)
# Tools available to both coder (worker) and queen.
# Tools available to phases.
_SHARED_TOOLS = [
# File I/O
"read_file",
@@ -78,6 +77,10 @@ _QUEEN_PLANNING_TOOLS = [
"list_agent_sessions",
"list_agent_checkpoints",
"get_agent_checkpoint",
# Draft graph (visual-only, no code) — new planning workflow
"save_agent_draft",
"confirm_and_build",
# Scaffold + transition to building (requires confirm_and_build first)
"initialize_and_build_agent",
# Load existing agent (after user confirms)
"load_built_agent",
@@ -88,6 +91,7 @@ _QUEEN_BUILDING_TOOLS = _SHARED_TOOLS + [
"load_built_agent",
"list_credentials",
"replan_agent",
"save_agent_draft", # Re-draft during building → auto-dissolves + updates flowchart
"write_to_diary", # Episodic memory — available in all phases
]
@@ -106,6 +110,10 @@ _QUEEN_STAGING_TOOLS = [
"stop_worker_and_edit",
"stop_worker_and_plan",
"write_to_diary", # Episodic memory — available in all phases
# Trigger management
"set_trigger",
"remove_trigger",
"list_triggers",
]
# Running phase: worker is executing — monitor and control.
@@ -122,11 +130,16 @@ _QUEEN_RUNNING_TOOLS = [
"stop_worker_and_edit",
"stop_worker_and_plan",
"get_worker_status",
"run_agent_with_input",
"inject_worker_message",
# Monitoring
"get_worker_health_summary",
"notify_operator",
"write_to_diary", # Episodic memory — available in all phases
# Trigger management
"set_trigger",
"remove_trigger",
"list_triggers",
]
@@ -137,7 +150,8 @@ _QUEEN_RUNNING_TOOLS = [
# additions.
# ---------------------------------------------------------------------------
_shared_building_knowledge = """\
_shared_building_knowledge = (
"""\
# Shared Rules (Planning & Building)
## Paths (MANDATORY)
@@ -164,62 +178,50 @@ generate a clickable file URI for the user
IMPORTANT: Do NOT tell workers to use read_file, write_file, edit_file, \
search_files, or list_directory; those are YOUR tools, not theirs.
"""
+ _gcu_section
)
_planning_knowledge = """\
**A responsible engineer doesn't jump into building. First, \
understand the problem and be transparent about what the framework can and cannot do.**
Use the user's selection (or their custom description if they chose "Other") \
as context when shaping the goal below. If the user already described \
what they want before this step, skip the question and proceed directly.
**Be responsible: understand the problem by asking practical qualifying questions, \
and be transparent about what the framework can and cannot do.**
# Core Mandates (Planning)
- **DO NOT propose a complete goal on your own.** Instead, \
collaborate with the user to define it.
- **NEVER call `initialize_and_build_agent` without explicit user approval.** \
Present the full design first and wait for the user to confirm before building.
- **Discover tools dynamically.** NEVER reference tools from static \
docs. Always run list_agent_tools() to see what actually exists.
# Tool Discovery (MANDATORY before designing)
Before designing any agent, run list_agent_tools() with NO arguments \
to see ALL available tools (names + descriptions, grouped by category). \
ONLY use tools from this list in your node definitions. \
Before designing any agent, discover tools progressively: start compact, then drill into \
what you need. ONLY use tools from this list in your node definitions. \
NEVER guess or fabricate tool names from memory.
list_agent_tools() # ALWAYS call this first (simple mode)
list_agent_tools(group="google", output_schema="full") # drill into a provider
list_agent_tools() # Step 1: provider summary
list_agent_tools(group="google", output_schema="summary") # Step 2: service breakdown
list_agent_tools(group="google", service="gmail") # Step 3: tool names
list_agent_tools( # Step 4: full detail
group="google", service="gmail", output_schema="full"
)
NEVER skip the first call. Always start with the full list \
so you know what providers and tools exist before drilling in. \
Simple mode truncates long descriptions; use group + "full" to \
get the complete description and input_schema for the tools you need.
Step 1 is MANDATORY. Returns provider names, tool counts, credential availability (very compact). \
Step 2 breaks a provider into services (e.g. google -> gmail/calendar/sheets/drive). Only do this \
for providers that are relevant to the task. \
Step 3 gets tool names for a specific service: no descriptions, minimal tokens. \
Step 4 is only for services you plan to actually use. \
Use credentials="available" at any step to filter to tools whose credentials are already configured.
# Discovery & Design Workflow
## 1: Fast Discovery (3-6 Turns)
## 1: Discovery (3-6 Turns)
**The core principle**: Discovery should feel like progress, not paperwork. \
The stakeholder should walk away feeling like you understood them faster \
than anyone else would have.
**Communication style**: Be concise. Say less. Mean more. Impatient stakeholders \
don't want a wall of text — they want to know you get it. Every sentence you say \
should either move the conversation forward or prove you understood something. \
If it does neither, cut it.
**Ask Question Rules: Respect Their Time.** Every question must earn its place by:
1. **Preventing a costly wrong turn**: you're about to build the wrong thing
2. **Unlocking a shortcut**: their answer lets you simplify the design
3. **Surfacing a dealbreaker**: there's a constraint that changes everything
4. **Provide Options** - Provide options to your questions if possible, \
but also always allow the user to type something beyond the options.
If a question doesn't do one of these, don't ask it. Make an assumption, state it, and move on.
---
### 1.1: Let Them Talk, But Listen Like a Solution Architect
Ask questions to help the user bridge the goal and the solution. \
When the stakeholder describes what they want, mentally construct:
- **The pain**: What about today's situation is broken, slow, or missing?
@@ -230,57 +232,6 @@ When the stakeholder describes what they want, mentally construct:
---
### 1.2: Use Domain Knowledge to Fill In the Blanks
You have broad knowledge of how systems work. Use it aggressively.
If they say "I need a research agent," you already know it probably involves: \
search, summarization, source tracking, and iteration. Don't ask about each — \
use them as your starting mental model and let their specifics override your defaults.
If they say "I need to monitor files and alert me," you know this probably involves: \
watch patterns, triggers, notifications, and state tracking.
---
### 1.3: Play Back a Proposed Model (Not a List of Questions)
After listening, present a **concrete picture** of what you think they need. \
Make it specific enough that they can spot what's wrong. \
You can use ASCII art to show the user.
**Pattern: "Here's what I heard — tell me where I'm off"**
> "OK here's how I'm picturing this: [User type] needs to [core action]. \
Right now they're [current painful workflow]. \
What you want is [proposed solution that replaces the pain].
> The way I'd structure this: [key entities] connected by [key relationships], \
with the main flow being [trigger -> steps -> outcome].
> For the MVP, I'd focus on [the one thing that delivers the most value] \
and hold off on [things that can wait].
> Before I start: [1-2 specific questions you genuinely can't infer]."
---
### 1.4: Ask Only What You Cannot Infer
Your questions should be **narrow, specific, and consequential**. \
Never ask what you could answer yourself.
**Good questions** (high-stakes, can't infer):
- "Who's the primary user — you or your end customers?"
- "Is this replacing a spreadsheet, or is there literally nothing today?"
- "Does this need to integrate with anything, or standalone?"
- "Is there existing data to migrate, or starting fresh?"
**Bad questions** (low-stakes, inferable):
- "What should happen if there's an error?" *(handle gracefully, obviously)*
- "Should it have search?" *(if there's a list, yes)*
- "How should we handle permissions?" *(follow standard patterns)*
- "What tools should I use?" *(your call, not theirs)*
---
## 2: Capability Assessment & Gap Analysis
**After the user responds, assess fit and gaps together.** Be honest and specific. \
@@ -295,70 +246,153 @@ Present a short **Framework Fit Assessment**:
- **Gaps/Deal-breakers**: Only list genuinely missing capabilities after checking \
both list_agent_tools() and built-in features like GCU
## 3: Design Graph and Propose
### Credential Check (MANDATORY)
Act like an experienced AI solution architect. Design the agent architecture:
- Goal: id, name, description, 3-5 success criteria, 2-4 constraints
- Nodes: **3-6 nodes** (HARD RULE: never fewer than 3, never more than 6). \
2 nodes is ALWAYS wrong; it means you under-decomposed the task. \
Use as many nodes as the use case requires, but don't create nodes without \
tools; merge them into nodes that do real work.
- Edges: on_success for linear, conditional for routing
- Lifecycle: ALWAYS have terminal_nodes
The summary from list_agent_tools() includes `credentials_required` and \
`credentials_available` per provider. **Before designing the graph**, check \
which providers the design will need and whether credentials are available.
**MERGE nodes when:**
- Node has NO tools (pure LLM reasoning): merge into predecessor/successor
- Node sets only 1 trivial output: collapse into predecessor
For each provider whose tools you plan to use and where \
`credentials_available` is false:
- Tell the user which credential is missing and what it's needed for
- Ask if they have access to set it up (e.g., API key, OAuth, service account)
- If they don't have access, adjust the design to work without that provider \
or suggest alternatives
**SEPARATE nodes when:**
- Fundamentally different tool sets (e.g., search vs. write vs. validate)
- Fan-out parallelism (parallel branches MUST be separate)
- Different failure/retry semantics (e.g., gather can retry, transform cannot)
- Distinct phases of work (e.g., research, transform, validate, deliver)
- A node would need more than ~5 tools: split by responsibility
**Do NOT proceed to the design step with tools that require unavailable \
credentials without the user acknowledging it.** Finding out at runtime that \
credentials are missing wastes everyone's time. Surface this early.
**Typical patterns (queen manages all user interaction):**
- 3 nodes: `gather -> work -> review`
- 4 nodes: `gather -> analyze -> transform -> review`
- 5 nodes: `gather -> research -> transform -> validate -> deliver`
- WRONG: 2 nodes where everything is crammed into one giant node
- WRONG: 7 nodes where half have no tools and just do LLM reasoning
Example:
> "The design needs Google Sheets tools, but the `google` credential isn't \
configured yet. Do you have a Google service account or OAuth credentials \
you can set up? If not, I can use CSV file output instead."
Read reference agents before designing:
list_agents()
read_file("exports/deep_research_agent/agent.py")
read_file("exports/deep_research_agent/nodes/__init__.py")
## 3: Design flowchart
Present the design to the user. Lead with a large ASCII graph inside \
a code block so it renders in monospace. Make it visually prominent: \
use box-drawing characters and clear flow arrows:
Act like an experienced AI solution architect. Design the agent architecture \
in the flowchart.
The flowchart is the shared canvas. Every structural change should be \
visible to the user immediately. The draft captures business logic \
(node purposes, data flow, tools) without requiring executable code. \
Include in each node: id, name, description, planned tools, \
input/output keys, and success criteria as high-level hints.
Each node is auto-classified into an ISO 5807 flowchart symbol type \
with a unique color. You can override auto-detection by setting \
`flowchart_type` explicitly on a node. Common types:
**Core symbols:**
- **start** (green, stadium): Entry point / trigger
- **terminal** (red, stadium): End of flow
- **process** (blue, rectangle): Standard processing step
- **decision** (amber, diamond): Conditional branching
- **io** (purple, parallelogram): External data input/output
- **document** (blue-grey, wavy rect): Report or document generation
- **subprocess** (teal, subroutine): Delegated sub-agent / predefined process
- **preparation** (brown, hexagon): Setup / initialization step
- **manual_operation** (pink, trapezoid): Human-in-the-loop / manual review
- **delay** (orange, D-shape): Wait / throttle / cooldown
- **display** (cyan): Present results to user
**Data storage:**
- **database** (light green, cylinder): Database or data store
- **stored_data** (lime): Generic persistent data
- **internal_storage** (amber): In-memory / cache
**Flow operations:**
- **merge** (indigo, inv. triangle): Combine multiple inputs
- **extract** (indigo, triangle): Split or filter data
- **connector** (grey, circle): On-page link
- **offpage_connector** (dark grey, pentagon): Cross-page link
**Domain-specific:**
- **browser** (dark indigo, hexagon): GCU browser automation / sub-agent \
delegation. At build time, browser nodes are dissolved into the parent \
node's sub_agents list. Use for any GCU or sub-agent leaf node.
Auto-detection works well for most cases: first node -> start, nodes with \
no outgoing edges -> terminal, nodes with multiple conditional outgoing \
edges -> decision, GCU nodes -> browser, nodes mentioning "database" -> \
database, nodes mentioning "report/document" -> document, etc. Set \
flowchart_type explicitly only when auto-detection would be wrong.
## Decision Nodes — Planning-Only Conditional Branching
Decision nodes (amber diamonds) are **planning-only** visual elements. They \
let you show explicit conditional logic in the flowchart so the user can see \
and approve branching behavior. At `confirm_and_build()`, decision nodes are \
automatically **dissolved** into the runtime graph:
- The decision clause is merged into the predecessor node's `success_criteria`
- The yes/no edges are rewired as the predecessor's `on_success`/`on_failure` edges
- The original flowchart (with decision diamonds) is preserved for display
**When to use decision nodes:**
- When a workflow has a meaningful condition that determines the next step \
(e.g., "Did we find enough results?", "Is the data valid?", "Amount > $100?")
- When the branching logic is important for the user to understand and approve
- When different outcomes lead to genuinely different processing paths
**How to create a decision node:**
- Set `flowchart_type: "decision"` on the node
- Set `decision_clause` to the condition text (e.g., "Data passes validation?")
- Add two outgoing edges with `label: "Yes"` and `label: "No"` pointing \
to the respective target nodes
**Good flowcharts display conditions explicitly.** During planning, the user \
sees the full flowchart with decision diamonds. This is different from the \
building/running phase where conditions are embedded inside node criteria. \
The flowchart is the user-facing contract; make branching logic visible.
Example with a decision node:
```
gather
subagent: gcu_search
input: user_request
tools: web_search,
save_data
on_success
work
subagent: gcu_interact
tools: load_data,
save_data
on_success
review
tools: save_data
serve_file_to_user
on_failure
back to gather
gather -> [Valid data?] --Yes--> transform -> deliver
                        --No--> notify_user
```
In the draft: the `[Valid data?]` node has `flowchart_type: "decision"`, \
`decision_clause: "Data passes validation checks?"`, with labeled yes/no edges.
## Sub-Agent Nodes — Planning-Only Delegation
Sub-agent nodes (dark teal subroutines) are **planning-only** visual elements \
that show which nodes delegate to sub-agents. At `confirm_and_build()`, \
sub-agent nodes are **dissolved** into their parent node:
- The sub-agent node's ID is added to the predecessor's `sub_agents` list
- The sub-agent node and its connecting edge are removed
- At runtime, the parent node can invoke the sub-agent via `delegate_to_sub_agent`
**Rules for sub-agent nodes (INCLUDING GCU nodes):**
- GCU nodes are auto-detected as `flowchart_type: "browser"` (hexagon)
- Connect from the managing parent node to the sub-agent node
- Sub-agent nodes must be **leaf nodes**: NO outgoing edges to other nodes
- At build time, browser/GCU nodes are dissolved into the parent's \
`sub_agents` list, just like decision nodes are dissolved into criteria
**CRITICAL: GCU nodes (`node_type: "gcu"`) are ALWAYS sub-agents.** \
They MUST NOT appear in the linear flow. NEVER chain GCU nodes \
sequentially (A -> gcu1 -> gcu2 -> B is WRONG). Instead, attach them \
as leaves to the parent that orchestrates them:
```
WRONG: intake -> gcu_find_prospect -> gcu_scan_mutuals -> check_results
WRONG: decision_node -> gcu_node (as a yes/no branch)
RIGHT: intake (sub_agents: [gcu_find, gcu_scan]) -> check_results
```
The parent node delegates to its GCU sub-agents and collects results. \
The main flow continues from the parent, not from the GCU node. \
GCU nodes MUST NOT be children of decision nodes; decision nodes \
dissolve at build time, which would leave the GCU as a dangling \
workflow step.
**How to show delegation in the flowchart:**
```
research -> (deep_searcher)       <- browser/GCU node, leaf
research -> [Enough results?]     <- decision node
```
After dissolution: `research` node gets `sub_agents: ["deep_searcher"]` \
and `success_criteria: "Enough results?"`.
If the worker agent starts from some initial input, that is okay. \
The queen (you) owns intake: you gather user requirements, then call \
@@ -367,17 +401,25 @@ When building the agent, design the entry node's `input_keys` to \
match what the queen will provide at run time. Worker nodes should \
use `escalate` for blockers.
Follow the graph with a brief summary of each node's purpose. \
Get user approval before implementing.
## 4: Get User Confirmation (MANDATORY GATE)
## 4: Get User Confirmation by ask_user
**This is a hard boundary between planning and building.** \
You MUST get explicit user approval before ANY code is generated.
**WAIT for user response.**
- If **Proceed**: Move on to implementing
- If **Adjust scope**: Discuss what to change, update your notes, re-assess if needed
- If **More questions**: Answer them honestly, then ask again
- If **Reconsider**: Discuss alternatives. If they decide to proceed anyway, \
that's their informed choice
1. Call ask_user() with options like \
["Approve and build", "Adjust the design", "I have questions"]
2. **WAIT for user response.** Do NOT proceed without it.
3. Handle the response:
- If **Approve / Proceed**: Call confirm_and_build(), then \
initialize_and_build_agent(agent_name, nodes)
- If **Adjust scope**: Discuss changes, update the draft with \
save_agent_draft() again, and re-ask
- If **More questions**: Answer them honestly, then ask again
- If **Reconsider**: Discuss alternatives. If they decide to proceed, \
that's their informed choice
**NEVER call initialize_and_build_agent without first calling \
confirm_and_build().** The system will block the transition if you try.
"""
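The decision-node dissolution described in the planning prompt above can be sketched on plain dictionaries. This is an illustration under assumptions, not the code behind `confirm_and_build()`: the edge keys `source`, `target`, `label`, and `condition`, the `"; "` join used when merging criteria, and the function name `dissolve_decisions` are all hypothetical. Chained decision nodes are not handled.

```python
from copy import deepcopy


def dissolve_decisions(nodes, edges):
    """Fold decision nodes into their predecessors (illustrative only)."""
    nodes = deepcopy(nodes)  # leave the caller's draft untouched
    by_id = {n["id"]: n for n in nodes}
    decision_ids = [n["id"] for n in nodes if n.get("flowchart_type") == "decision"]
    new_edges = []
    for d in decision_ids:
        clause = by_id[d].get("decision_clause", "")
        # Map the lower-cased edge label ("yes"/"no") to its target node.
        branches = {
            e.get("label", "").lower(): e["target"] for e in edges if e["source"] == d
        }
        for e in edges:
            if e["target"] != d:
                continue
            pred = by_id[e["source"]]
            existing = pred.get("success_criteria", "")
            # Assumed merge rule: append the clause to any existing criteria.
            pred["success_criteria"] = f"{existing}; {clause}" if existing else clause
            if "yes" in branches:
                new_edges.append(
                    {"source": pred["id"], "target": branches["yes"], "condition": "on_success"}
                )
            if "no" in branches:
                new_edges.append(
                    {"source": pred["id"], "target": branches["no"], "condition": "on_failure"}
                )
    kept = [
        e for e in edges
        if e["source"] not in decision_ids and e["target"] not in decision_ids
    ]
    runtime_nodes = [n for n in nodes if n["id"] not in decision_ids]
    return runtime_nodes, kept + new_edges


draft_nodes = [
    {"id": "gather"},
    {"id": "check", "flowchart_type": "decision",
     "decision_clause": "Data passes validation?"},
    {"id": "transform"},
    {"id": "notify_user"},
]
draft_edges = [
    {"source": "gather", "target": "check"},
    {"source": "check", "target": "transform", "label": "Yes"},
    {"source": "check", "target": "notify_user", "label": "No"},
]
runtime_nodes, runtime_edges = dissolve_decisions(draft_nodes, draft_edges)
print([n["id"] for n in runtime_nodes])
print(runtime_edges)
```

The original draft lists are deep-copied rather than mutated, which matches the stated behavior that the flowchart with decision diamonds is preserved for display.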
_building_knowledge = """\
@@ -405,11 +447,10 @@ hashline=True for anchors in results
- undo_changes(path?) - restore from git snapshot
## Meta-Agent
- list_agent_tools(server_config_path?, output_schema?, group?) discover \
available tools grouped by category. output_schema: "simple" (default, \
descriptions truncated to ~200 chars) or "full" (complete descriptions + \
input_schema). group: "all" (default) or a provider like "google". \
Call FIRST before designing.
- list_agent_tools(group?, service?, output_schema?, credentials?) - discover tools \
progressively: no args=provider summary; group+output_schema="summary"=service breakdown; \
group+service=tool names; group+service+output_schema="full"=full details. \
credentials="available" filters to configured tools. Call FIRST before designing.
- validate_agent_package(agent_name) - run ALL validation checks in one call \
(class validation, runner load, tool validation, tests). Call after building.
- list_agents() - list all agent packages in exports/ with session counts
@@ -435,25 +476,44 @@ When a user says "my agent is failing" or "debug this agent":
## 5. Implement
**Please make sure you have proposed the design to the user before implementing.**
**You should only reach this step after the user has approved the draft design \
in the planning phase. The draft metadata will pre-populate descriptions, \
goals, success criteria, and node metadata in the generated files.**
Call `initialize_and_build_agent(agent_name)` to generate all package files \
from your graph session. The agent_name must be snake_case (e.g., "my_agent").
Call `initialize_and_build_agent(agent_name, nodes)` to generate all package \
files. The agent_name must be snake_case (e.g., "my_agent"). Pass node names \
as a comma-separated string (e.g., "gather,process,review").
The tool creates: config.py, nodes/__init__.py, agent.py, \
__init__.py, __main__.py, mcp_servers.json, tests/conftest.py, \
agent.json, README.md.
__init__.py, __main__.py, mcp_servers.json, tests/conftest.py.
The generated files are **structurally complete** with correct imports, \
class definition, `validate()` method, `default_agent` export, and \
`__init__.py` re-exports. They pass validation as-is.
`mcp_servers.json` is auto-generated with hive-tools as the default. \
Do NOT manually create or overwrite `mcp_servers.json`.
After initialization, review and customize if needed:
- System prompts in nodes/__init__.py
- CLI options in __main__.py
- Identity prompt in agent.py
- For async entry points (timers/webhooks), add AsyncEntryPointSpec \
and AgentRuntimeConfig to agent.py manually
### Customizing generated files
Do NOT manually write these files from scratch; always use the tool.
**CRITICAL: Use `edit_file` to customize TODO placeholders. \
NEVER use `write_file` to rewrite generated files from scratch. \
Rewriting breaks imports, class structure, and causes validation failures.**
Safe to edit with `edit_file`:
- System prompts, tools, input_keys, output_keys, success_criteria in \
nodes/__init__.py
- Goal description, success criteria values, constraint values, edge \
definitions, identity_prompt in agent.py
- CLI options in __main__.py
- For triggers (timers/webhooks), add entries to triggers.json in the \
agent's export directory
Do NOT modify or rewrite:
- Import statements at top of agent.py (they are correct)
- The agent class definition, `validate()`, `_build_graph()`, `_setup()`, \
or lifecycle methods (start/stop/run)
- `__init__.py` exports (all required variables are already re-exported)
- `default_agent = ClassName()` at bottom of agent.py
## 6. Verify and Load
@@ -481,12 +541,15 @@ _package_builder_knowledge = _shared_building_knowledge + _planning_knowledge +
_queen_identity_planning = """\
You are an experienced, responsible and curious Solution Architect. \
"Queen" is the internal alias. \
You ask smart questions to guide the user to the solution. \
You are in PLANNING phase; your job is to either: \
(a) understand what the user wants and design a new agent, or \
(b) diagnose issues with an existing agent, discuss a fix plan with the user, \
then transition to building to implement. \
You have read-only tools for exploration but no write/edit tools. \
Focus on conversation, research, and design.\
Focus on conversation, research, and design. \
You MUST use ask_user / ask_user_multiple tools for ALL questions; \
never ask questions in plain text without calling the tool.\
"""
_queen_identity_building = """\
@@ -529,24 +592,45 @@ but no write/edit tools.
- run_command(command, cwd?, timeout?) - Read-only commands only (grep, ls, git log). \
Never use this to write files, run scripts, or modify the filesystem; transition \
to BUILDING phase for that.
- list_agent_tools(server_config_path?, output_schema?, group?) - \
Discover available tools for design
- list_agent_tools(server_config_path?, output_schema?, group?, credentials?) - \
Discover available tools for design (summary -> names -> full)
- list_agents() - See existing agent packages for reference
- list_agent_sessions(agent_name, status?, limit?) - Inspect past runs of an agent
- list_agent_checkpoints(agent_name, session_id) - View execution history
- get_agent_checkpoint(agent_name, session_id, checkpoint_id?) - Load a checkpoint
- initialize_and_build_agent(agent_name?, nodes?) - With agent_name: scaffold a \
new agent and transition to BUILDING phase. Without agent_name: transition to \
BUILDING to fix the currently loaded agent (requires a loaded worker).
## Draft Graph Workflow (new agents)
- save_agent_draft(agent_name, goal, nodes, edges?, terminal_nodes?, ...) - \
Create an ISO 5807 color-coded flowchart draft. No code is generated. Each \
node is auto-classified into a standard flowchart symbol (process, decision, \
document, database, subprocess, etc.) with unique shapes and colors. Set \
flowchart_type on a node to override. Nodes need only an id. \
Use decision nodes (flowchart_type: "decision", with decision_clause and \
labeled yes/no edges) to make conditional branching explicit. \
GCU/sub-agent nodes (node_type: "gcu") are auto-detected as browser \
hexagons; connect them as leaf nodes to their parent.
- confirm_and_build() - Record user confirmation of the draft. Dissolves \
planning-only nodes (decision -> predecessor criteria; browser/GCU -> \
predecessor sub_agents list). Call this ONLY after the user explicitly \
approves via ask_user.
- initialize_and_build_agent(agent_name?, nodes?) - Scaffold the agent package \
and transition to BUILDING phase. For new agents, this REQUIRES \
save_agent_draft() + confirm_and_build() first. The draft metadata is used to \
pre-populate the generated files. Without agent_name: transition to BUILDING \
to fix the currently loaded agent (no draft required).
## Loading existing agents
- load_built_agent(agent_path) - Load an existing agent and switch to STAGING \
phase. Only use this when the user explicitly asks to work with an existing agent \
(e.g. "load my_agent", "run the research agent"). Confirm with the user first.
Focus on understanding requirements and proposing an agent architecture \
with ASCII graph art. Use ask_user to get user approval, then call \
initialize_and_build_agent to begin building. If the user wants to work with \
an existing agent instead, use load_built_agent after confirming. \
If you are diagnosing an existing agent, call initialize_and_build_agent() \
## Workflow summary
1. Understand requirements -> discover tools -> design graph
2. Call save_agent_draft() to create visual draft -> present to user
3. Call ask_user() to get explicit approval
4. Call confirm_and_build() to record approval
5. Call initialize_and_build_agent() to scaffold and start building
For diagnosis of existing agents, call initialize_and_build_agent() \
(no args) after agreeing on a fix plan with the user.
"""
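The ordering in the workflow summary above (draft, then confirmation, then scaffold) can be pictured as a small state gate. The class below is a toy model, not the framework's implementation: `PlanningGate` and its return strings are hypothetical, and resetting the confirmation when a new draft is saved is an assumption made for the sketch.

```python
class PlanningGate:
    """Toy model of the draft -> confirm -> build ordering (hypothetical)."""

    def __init__(self):
        self.draft = None
        self.confirmed = False

    def save_agent_draft(self, draft):
        # Assumption: saving a new draft invalidates any earlier confirmation.
        self.draft = draft
        self.confirmed = False

    def confirm_and_build(self):
        if self.draft is None:
            raise RuntimeError("no draft to confirm; call save_agent_draft() first")
        self.confirmed = True

    def initialize_and_build_agent(self, agent_name=None):
        if agent_name is None:
            # Fixing an already loaded agent: no draft required.
            return "building (existing agent)"
        if not self.confirmed:
            raise RuntimeError("blocked: call confirm_and_build() first")
        return f"building {agent_name}"


gate = PlanningGate()
try:
    gate.initialize_and_build_agent("my_agent")
except RuntimeError as exc:
    print(exc)

gate.save_agent_draft({"nodes": ["gather", "process", "review"]})
gate.confirm_and_build()
print(gate.initialize_and_build_agent("my_agent"))
```

Keeping the check inside the build call, rather than trusting the prompt alone, is what lets the system block the transition even if the model skips a step.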
@@ -561,6 +645,15 @@ list_agents, list_agent_sessions, \
list_agent_checkpoints, get_agent_checkpoint
- load_built_agent(agent_path) - Load the agent and switch to STAGING phase
- list_credentials(credential_id?) - List authorized credentials
- save_agent_draft(...) - **Re-draft the flowchart during building.** When \
called during building, planning-only nodes (decision, browser/GCU) are \
dissolved automatically; no re-confirmation needed. The user sees the \
updated flowchart immediately. Use this when you make structural changes \
(add/remove nodes, change edges) so the flowchart stays in sync.
- replan_agent() - Switch back to PLANNING phase. The previous draft is \
restored (with decision/browser nodes intact) so you can edit it. Use \
when the user wants to change integrations, swap tools, rethink the \
flow, or discuss any design changes before you build them.
When you finish building an agent, call load_built_agent(path) to stage it.
"""
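The browser/GCU dissolution mentioned in the building-phase tool notes above follows the same shape as decision dissolution. The sketch below is a guess at the mechanics on plain dictionaries, not the framework's code: `dissolve_sub_agents`, the `source`/`target` edge keys, and the `ValueError` for non-leaf sub-agents are all assumptions.

```python
from copy import deepcopy


def dissolve_sub_agents(nodes, edges):
    """Move browser/GCU leaf nodes into their parent's sub_agents list (illustrative)."""
    nodes = deepcopy(nodes)  # keep the draft flowchart intact
    by_id = {n["id"]: n for n in nodes}
    browser_ids = [n["id"] for n in nodes if n.get("flowchart_type") == "browser"]
    for b in browser_ids:
        # Sub-agent nodes must be leaves: reject any outgoing edge.
        if any(e["source"] == b for e in edges):
            raise ValueError(f"sub-agent node {b!r} must be a leaf (no outgoing edges)")
        for e in edges:
            if e["target"] == b:
                by_id[e["source"]].setdefault("sub_agents", []).append(b)
    kept_edges = [e for e in edges if e["target"] not in browser_ids]
    kept_nodes = [n for n in nodes if n["id"] not in browser_ids]
    return kept_nodes, kept_edges


flow_nodes = [
    {"id": "intake"},
    {"id": "gcu_find", "flowchart_type": "browser"},
    {"id": "gcu_scan", "flowchart_type": "browser"},
    {"id": "check_results"},
]
flow_edges = [
    {"source": "intake", "target": "gcu_find"},
    {"source": "intake", "target": "gcu_scan"},
    {"source": "intake", "target": "check_results"},
]
built_nodes, built_edges = dissolve_sub_agents(flow_nodes, flow_edges)
print(built_nodes)
print(built_edges)
```

Rejecting outgoing edges up front mirrors the rule that GCU nodes never sit in the linear flow; the main flow continues from the parent.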
@@ -576,6 +669,9 @@ The agent is loaded and ready to run. You can inspect it and launch it:
- stop_worker_and_plan() - Go to PLANNING phase to discuss changes with the user \
first (DEFAULT for most modification requests)
- stop_worker_and_edit() - Go to BUILDING phase for immediate, specific fixes
- set_trigger(trigger_id, trigger_type?, trigger_config?) - Activate a trigger (timer)
- remove_trigger(trigger_id) - Deactivate a trigger
- list_triggers() - List all triggers and their active/inactive status
You do NOT have write tools. To modify the agent, prefer \
stop_worker_and_plan() unless the user gave a specific instruction.
@@ -598,6 +694,15 @@ with the user first (DEFAULT for most modification requests)
You do NOT have write tools. To modify the agent, prefer \
stop_worker_and_plan() unless the user gave a specific instruction. \
To just stop without modifying, call stop_worker().
- stop_worker_and_edit() - Stop the worker and switch back to BUILDING phase
- set_trigger(trigger_id, trigger_type?, trigger_config?) - Activate a trigger (timer)
- remove_trigger(trigger_id) - Deactivate a trigger
- list_triggers() - List all triggers and their active/inactive status
You do NOT have write tools or agent construction tools. \
If you need to modify the agent, call stop_worker_and_edit() to switch back \
to BUILDING phase. To stop the worker and ask the user what to do next, call \
stop_worker() to return to STAGING phase.
"""
# -- Behavior shared across all phases --
@@ -605,25 +710,57 @@ To just stop without modifying, call stop_worker().
_queen_behavior_always = """
# Behavior
## CRITICAL RULE — ask_user tool
## CRITICAL RULE — ask_user / ask_user_multiple
Every response that ends with a question, a prompt, or expects user \
input MUST finish with a call to ask_user(prompt, options). \
input MUST finish with a call to ask_user or ask_user_multiple. \
The system CANNOT detect that you are waiting for \
input unless you call ask_user. You MUST call ask_user as the LAST \
input unless you call one of these tools. You MUST call it as the LAST \
action in your response.
NEVER end a response with a question in text without calling ask_user. \
NEVER rely on the user seeing your text and replying; call ask_user.
NEVER rely on the user seeing your text and replying; call ask_user. \
NEVER list options as text bullets; the tool renders interactive buttons.
**When you have 2+ questions**, use ask_user_multiple instead of ask_user. \
This renders all questions at once so the user answers in one interaction \
instead of going back and forth. ALWAYS prefer ask_user_multiple when \
you need to clarify multiple things. \
**IMPORTANT: When using ask_user_multiple, do NOT repeat the questions \
in your text response.** The widget renders the questions with options; \
duplicating them in text wastes the user's time and delays the widget \
appearing. Keep your text to a brief context/intro sentence only.
Always provide 2-4 short options that cover the most likely answers. \
The user can always type a custom response.
Examples:
- ask_user("What do you need?",
["Build a new agent", "Run the loaded worker", "Help with code"])
- ask_user("Which pattern?",
["Simple 3-node", "Rich with feedback", "Custom"])
### WRONG — never do this:
```
I need a few details:
- Documentation Source: Where should the agent look?
- Trigger: Should the agent poll or get a URL?
- Review Channel: Slack, Email, or Sheets?
Which of these would you like to define first?
1. Documentation source
2. Trigger
3. Review channel
```
This lists questions as plain text with NO tool call, so the user has no \
interactive widget and the system doesn't know you're waiting for input.
### RIGHT — always do this:
Write a brief intro (1-2 sentences), then call the tool:
- ask_user_multiple(questions=[
{"id": "docs", "prompt": "Where should the agent find answers?",
"options": ["GitHub repo", "Documentation website", "Internal wiki"]},
{"id": "trigger", "prompt": "How should questions be discovered?",
"options": ["Poll search automatically", "I provide a URL"]},
{"id": "review", "prompt": "Where to send drafted responses?",
"options": ["Slack", "Email", "Google Sheets"]}
])
Examples (single question):
- ask_user("Ready to proceed?",
["Yes, go ahead", "Let me change something"])
@@ -663,16 +800,49 @@ _queen_behavior_planning = """
## Planning phase
You are in planning mode. Your job is to:
1. Understand what the user wants (3-6 turns)
2. Discover available tools with list_agent_tools()
3. Assess framework fit and gaps
4. Design the agent graph and present it as ASCII art
5. Use ask_user to get explicit user approval
6. Call initialize_and_build_agent(agent_name, nodes) to scaffold and start building
1. Thoroughly explore the code for the worker agent you're working on
2. Understand what the user wants (3-6 turns)
3. Discover available tools with list_agent_tools()
4. Assess framework fit and gaps
5. Consider multiple approaches and their trade-offs
6. Design the agent graph: call save_agent_draft() **as soon as you have a \
rough shape**, even before finalizing all details
7. **Iterate on the draft interactively**: every time the user gives feedback \
that changes the structure, call save_agent_draft() again so they see the \
update in real-time. The flowchart is a live collaboration tool.
8. When the design is stable, use ask_user to get explicit approval
9. Call confirm_and_build() after the user approves
10. Call initialize_and_build_agent(agent_name, nodes) to scaffold and start building
Do NOT skip ahead to implementation. You have read-only tools but no write/edit \
tools in this phase. If the user asks you to write code, explain that you need \
to finalize the plan first.
**The flowchart is your shared whiteboard.** Don't describe changes in text \
and then ask "should I update the draft?"; just update it. If the user says \
"add a validation step," immediately call save_agent_draft() with the new \
node added. If they say "remove that," update and re-draft. The user should \
see every structural change reflected in the visualizer as you discuss it.
**CRITICAL: Planning → Building boundary.** You MUST get explicit user \
confirmation before moving to building. The sequence is:
save_agent_draft() → iterate with user → ask_user() → confirm_and_build() → \
initialize_and_build_agent()
Skipping any of these steps will be blocked by the system.
Remember: DO NOT write or edit any files yet. This is a read-only exploration \
and planning phase. You have read-only tools but no write/edit tools in this \
phase. If the user asks you to write code, explain that you need to finalize \
the plan first.
## Diagnosis mode (returning from staging/running)
If you entered planning from a running/staged agent (via stop_worker_and_plan), \
your priority is diagnosis, not new design:
1. Inspect the agent's checkpoints, sessions, and logs to understand what went wrong
2. Summarize the root cause to the user
3. Propose a fix plan (what to change, what behavior to adjust)
4. Get user approval via ask_user
5. Call initialize_and_build_agent() (no args) to transition to building and implement the fix
Do NOT start the full discovery workflow (tool discovery, gap analysis) in \
diagnosis mode: you already have a built agent, you just need to fix it.
"""
_queen_memory_instructions = """
@@ -707,6 +877,41 @@ run_agent_with_input(task) (if in staging) or load then run (if in building)
subtasks to justify delegation.
- Building, modifying, or configuring agents is ALWAYS your job. Never \
delegate agent construction to the worker, even as a "research" subtask.
## Keeping the flowchart in sync during building
When you make structural changes to the agent (add/remove/rename nodes, \
change edges, modify sub-agent assignments), call save_agent_draft() to \
update the flowchart. During building, this auto-dissolves planning-only \
nodes without needing user re-confirmation. The user sees the updated \
flowchart immediately.
- **Minor changes** (add a node, rename, adjust edges): call \
save_agent_draft() with the updated graph and keep building.
- **User wants to discuss, redesign, or change integrations/tools**: call \
replan_agent(). The previous draft is restored so you can edit it with \
the user. After they approve, confirm_and_build() → continue building.
**When to call replan_agent():** Changing which tools or integrations a \
node uses, swapping data sources, rethinking the flow, or any time the \
user says "replan", "go back", "let's redesign", "change the approach", \
"use a different tool/API", etc. Do NOT stay in building to handle these; \
switch to planning so the user can review and approve the new design.
## CRITICAL — Graph topology errors require replanning, not code edits
If you discover that the agent graph has structural problems (GCU nodes \
in the linear flow, missing edges, wrong node connections, incorrect \
sub-agent assignments), you MUST call replan_agent() and fix the draft. \
Do NOT attempt to fix topology by editing agent.py directly. The graph \
structure is defined by the draft → dissolution → code-gen pipeline. \
Editing code to rewire nodes bypasses the flowchart and creates drift \
between what the user sees and what the code does.
**WRONG:** "Let me fix agent.py to remove GCU nodes from edges..."
**RIGHT:** Call replan_agent(), fix the draft with save_agent_draft(), \
get user approval, then confirm_and_build(); the corrected code is \
generated automatically.
"""
# -- STAGING phase behavior --
@@ -772,15 +977,7 @@ stages, tools, and edges from the loaded worker. Do NOT enter the \
agent building workflow; you are describing what already exists, not \
building something new.
## Modifying the loaded worker
When the user asks to change, modify, or update the loaded worker \
(e.g., "change the report node", "add a node", "delete node X"):
**Default: use stop_worker_and_plan().** Most modification requests need \
discussion first: what to change, why, and how. Only skip planning if \
the user gave you an explicit, unambiguous instruction (e.g., "delete node X", \
"change the model to gpt-4o").
## Fixing or Modifying the loaded worker
Use stop_worker_and_plan() when:
- The user says "modify", "improve", "fix", or "change" without specifics
@@ -792,6 +989,33 @@ Use stop_worker_and_edit() only when:
- The user gave a specific, concrete instruction ("add save_data to the gather node")
- You already discussed the fix in a previous planning session
- The change is trivial and unambiguous (rename, toggle a flag)
## Trigger Management
Use list_triggers() to see available triggers from the loaded worker.
Use set_trigger(trigger_id) to activate a timer. Once active, triggers \
fire periodically and inject [TRIGGER: ...] messages so you can decide \
whether to call run_agent_with_input(task).
### When the user says "Enable trigger <id>" (or clicks Enable in the UI):
1. Call get_worker_status(focus="memory") to check if the worker has \
saved configuration (rules, preferences, settings from a prior run).
2. If memory contains saved config: compose a task string from it \
(e.g. "Process inbox emails using saved rules") and call \
set_trigger(trigger_id, task="...") immediately. Tell the user the \
trigger is now active and what schedule it uses. Do NOT ask them to \
provide the task; you derive it from memory.
3. If memory is empty (no prior run): tell the user the agent needs to \
run once first so its configuration can be saved. Offer to run it now. \
Once the worker finishes, enable the trigger.
4. If the user just provided config this session (rules/task context \
already in conversation): use that directly, no memory lookup needed. \
Enable the trigger immediately.
Never ask "what should the task be?" when enabling a trigger for an \
agent with a clear purpose. The task string is a brief description of \
what the worker does, derived from its saved state or your current context.
"""
# -- RUNNING phase behavior --
@@ -806,7 +1030,6 @@ NOT ask the user directly.
You wake up when:
- The user explicitly addresses you
- A worker escalation arrives (`[WORKER_ESCALATION_REQUEST]`)
- An escalation ticket arrives from the judge
- The worker finishes (`[WORKER_TERMINAL]`)
If the user asks for progress, call get_worker_status() ONCE and report. \
@@ -876,14 +1099,29 @@ building something new.
- Call get_worker_status(focus="issues") for more details when needed.
## Modifying the loaded worker
## Fixing or Modifying the loaded worker
When the user asks to change, modify, or update the loaded worker \
When the user asks to fix, change, modify, or update the loaded worker \
(e.g., "change the report node", "add a node", "delete node X"):
**Default: use stop_worker_and_plan().** Most modification requests need \
discussion first. Only use stop_worker_and_edit() when the user gave a \
specific, unambiguous instruction or you already agreed on the fix.
## Trigger Handling
You will receive [TRIGGER: ...] messages when a scheduled timer fires. \
These are framework-level signals, not user messages.
Rules:
- Check get_worker_status() before calling run_agent_with_input(task). If the worker \
is already RUNNING, decide: skip this trigger, or note it for after completion.
- When multiple [TRIGGER] messages arrive at once, read them all before acting. \
Batch your response; do not call run_agent_with_input() once per trigger.
- If a trigger fires but the task no longer makes sense (e.g., user changed \
config since last run), skip it and inform the user.
- Never disable a trigger without telling the user. Use remove_trigger() only \
when explicitly asked or when the trigger is clearly obsolete.
"""
# -- Backward-compatible composed versions (used by queen_node.system_prompt default) --
@@ -901,14 +1139,16 @@ _queen_tools_docs = (
+ "\n\n### RUNNING phase (worker is executing)\n"
+ _queen_tools_running.strip()
+ "\n\n### Phase transitions\n"
"- initialize_and_build_agent(agent_name?, nodes?) → with name: scaffolds package; "
"without name: switches to BUILDING for existing agent\n"
"- save_agent_draft(...) → creates visual-only draft graph (stays in PLANNING)\n"
"- confirm_and_build() → records user approval of draft (stays in PLANNING)\n"
"- initialize_and_build_agent(agent_name?, nodes?) → scaffolds package + switches to "
"BUILDING (requires draft + confirmation for new agents)\n"
"- replan_agent() → switches back to PLANNING phase (only when user explicitly requests)\n"
"- load_built_agent(path) → switches to STAGING phase\n"
"- run_agent_with_input(task) → starts worker, switches to RUNNING phase\n"
"- stop_worker() → stops worker, switches to STAGING phase (ask user: re-run or edit?)\n"
"- stop_worker_and_edit() → stops worker (if running), switches to BUILDING phase\n"
"- stop_worker_and_plan() → stops worker (if running), switches to PLANNING phase for diagnosis\n"
"- stop_worker_and_plan() → stops worker (if running), switches to PLANNING phase\n"
)
_queen_behavior = (
@@ -945,8 +1185,8 @@ ticket_triage_node = NodeSpec(
id="ticket_triage",
name="Ticket Triage",
description=(
"Queen's triage node. Receives an EscalationTicket from the Health Judge "
"via event-driven entry point and decides: dismiss or notify the operator."
"Queen's triage node. Receives an EscalationTicket via event-driven "
"entry point and decides: dismiss or notify the operator."
),
node_type="event_loop",
client_facing=True, # Operator can chat with queen once connected (Ctrl+Q)
@@ -960,8 +1200,8 @@ ticket_triage_node = NodeSpec(
),
tools=["notify_operator"],
system_prompt="""\
You are the Queen. The Worker Health Judge has escalated a worker \
issue to you. The ticket is in your memory under key "ticket". Read it carefully.
You are the Queen. A worker health issue has been escalated to you. \
The ticket is in your memory under key "ticket". Read it carefully.
## Dismiss criteria — do NOT call notify_operator:
- severity is "low" AND steps_since_last_accept < 8
@@ -1000,7 +1240,7 @@ queen_node = NodeSpec(
description=(
"User's primary interactive interface with full coding capability. "
"Can build agents directly or delegate to the worker. Manages the "
"worker agent lifecycle and triages health escalations from the judge."
"worker agent lifecycle."
),
node_type="event_loop",
client_facing=True,
@@ -1021,7 +1261,6 @@ queen_node = NodeSpec(
_queen_identity_building
+ _queen_style
+ _package_builder_knowledge
+ _gcu_building_section # GCU as first-class citizen (not appendix)
+ _queen_tools_docs
+ _queen_behavior
+ _queen_phase_7
@@ -1062,5 +1301,5 @@ __all__ = [
"_building_knowledge",
"_package_builder_knowledge",
"_appendices",
"_gcu_building_section",
"_gcu_section",
]
+15 -17
@@ -74,11 +74,7 @@ def format_for_injection() -> str:
return ""
body = "\n\n---\n\n".join(parts)
return (
"--- Your Cross-Session Memory ---\n\n"
+ body
+ "\n\n--- End Cross-Session Memory ---"
)
return "--- Your Cross-Session Memory ---\n\n" + body + "\n\n--- End Cross-Session Memory ---"
_SEED_TEMPLATE = """\
@@ -143,7 +139,8 @@ Rules:
- Do NOT include diary sections, daily logs, or session summaries. Those belong elsewhere.
MEMORY.md is about who they are, what they want, what works, not what happened today.
- Reference dates only when noting a lasting milestone (e.g. "since March 8th they prefer X").
- If the session had no meaningful new information about the person, return the existing text unchanged.
- If the session had no meaningful new information about the person,
return the existing text unchanged.
- Do not add fictional details. Only reflect what is evidenced in the notes.
- Stay concise. Prune rather than accumulate. A lean, accurate file is more useful than a
dense one. If something was true once but has been resolved or superseded, remove it.
@@ -154,14 +151,16 @@ _DIARY_SYSTEM = """\
You maintain the daily episodic diary of an AI assistant called the Queen.
You receive: (1) today's existing diary so far, and (2) notes from the latest session.
Rewrite the complete diary for today as a single unified narrative: first person, reflective, honest.
Rewrite the complete diary for today as a single unified narrative:
first person, reflective, honest.
Merge and deduplicate: if the same story (e.g. a research agent stalling) recurred several times,
describe it once with appropriate weight rather than retelling it. Weave in new developments from
the session notes. Preserve important milestones, emotional texture, and session path references.
If today's diary is empty, write the initial entry based on the session notes alone.
Output only the full diary prose: no date heading, no timestamp headers, no preamble, no code fences.
Output only the full diary prose: no date heading, no timestamp headers,
no preamble, no code fences.
"""
@@ -195,10 +194,7 @@ def read_session_context(session_dir: Path, max_messages: int = 80) -> str:
if role == "tool":
continue # skip verbose tool results
if role == "assistant" and tool_calls and not content:
names = [
tc.get("function", {}).get("name", "?")
for tc in tool_calls
]
names = [tc.get("function", {}).get("name", "?") for tc in tool_calls]
lines.append(f"[queen calls: {', '.join(names)}]")
elif content:
label = "user" if role == "user" else "queen"
@@ -299,9 +295,7 @@ async def consolidate_queen_memory(
len(session_context),
)
session_context = await _compact_context(session_context, llm)
logger.info(
"queen_memory: compacted to %d chars", len(session_context)
)
logger.info("queen_memory: compacted to %d chars", len(session_context))
existing_semantic = read_semantic_memory()
today_journal = read_episodic_memory()
@@ -325,7 +319,7 @@ async def consolidate_queen_memory(
len(user_msg) // 4,
)
from framework.agents.hive_coder.config import default_config
from framework.agents.queen.config import default_config
semantic_resp, diary_resp = await asyncio.gather(
llm.acomplete(
@@ -356,7 +350,11 @@ async def consolidate_queen_memory(
ep_path.parent.mkdir(parents=True, exist_ok=True)
heading = f"# {today_str}"
ep_path.write_text(f"{heading}\n\n{diary_entry}\n", encoding="utf-8")
logger.info("queen_memory: episodic diary rewritten for %s (%d chars)", today_str, len(diary_entry))
logger.info(
"queen_memory: episodic diary rewritten for %s (%d chars)",
today_str,
len(diary_entry),
)
except Exception:
tb = traceback.format_exc()
@@ -180,7 +180,7 @@ terminal_nodes = [] # Forever-alive
# Module-level vars read by AgentRunner.load()
conversation_mode = "continuous"
identity_prompt = "You are a helpful agent."
loop_config = {"max_iterations": 100, "max_tool_calls_per_turn": 20, "max_history_tokens": 32000}
loop_config = {"max_iterations": 100, "max_tool_calls_per_turn": 20, "max_context_tokens": 32000}
class MyAgent:
@@ -332,81 +332,46 @@ class MyAgent:
default_agent = MyAgent()
```
## agent.py — Async Entry Points Variant
## triggers.json — Timer and Webhook Triggers
When an agent needs timers, webhooks, or event-driven triggers, add
`async_entry_points` and optionally `runtime_config` as module-level variables.
These are IN ADDITION to the standard variables above.
When an agent needs timers, webhooks, or event-driven triggers, create a
`triggers.json` file in the agent's directory (alongside `agent.py`).
The queen loads these at session start and the user can manage them via
the `set_trigger` / `remove_trigger` tools at runtime.
```python
# Additional imports for async entry points
from framework.graph.edge import GraphSpec, AsyncEntryPointSpec
from framework.runtime.agent_runtime import (
AgentRuntime, AgentRuntimeConfig, create_agent_runtime,
)
# ... (goal, nodes, edges, entry_node, entry_points, etc. as above) ...
# Async entry points — event-driven triggers
async_entry_points = [
# Timer with cron: daily at 9am
AsyncEntryPointSpec(
id="daily-check",
name="Daily Check",
entry_node="process-node",
trigger_type="timer",
trigger_config={"cron": "0 9 * * *"},
isolation_level="shared",
max_concurrent=1,
),
# Timer with fixed interval: every 20 minutes
AsyncEntryPointSpec(
id="scheduled-check",
name="Scheduled Check",
entry_node="process-node",
trigger_type="timer",
trigger_config={"interval_minutes": 20, "run_immediately": False},
isolation_level="shared",
max_concurrent=1,
),
# Event: reacts to webhook events
AsyncEntryPointSpec(
id="webhook-event",
name="Webhook Event Handler",
entry_node="process-node",
trigger_type="event",
trigger_config={"event_types": ["webhook_received"]},
isolation_level="shared",
max_concurrent=10,
),
```json
[
{
"id": "daily-check",
"name": "Daily Check",
"trigger_type": "timer",
"trigger_config": {"cron": "0 9 * * *"},
"task": "Run the daily check process"
},
{
"id": "scheduled-check",
"name": "Scheduled Check",
"trigger_type": "timer",
"trigger_config": {"interval_minutes": 20},
"task": "Run the scheduled check"
},
{
"id": "webhook-event",
"name": "Webhook Event Handler",
"trigger_type": "webhook",
"trigger_config": {"event_types": ["webhook_received"]},
"task": "Process incoming webhook event"
}
]
# Webhook server config (only needed if using webhooks)
runtime_config = AgentRuntimeConfig(
webhook_host="127.0.0.1",
webhook_port=8080,
webhook_routes=[
{
"source_id": "my-source",
"path": "/webhooks/my-source",
"methods": ["POST"],
},
],
)
```
**Key rules for async entry points:**
- `async_entry_points` is a list of `AsyncEntryPointSpec` (NOT `EntryPointSpec`)
- `runtime_config` is `AgentRuntimeConfig` (NOT `RuntimeConfig` from config.py)
- Valid trigger_types: `timer`, `event`, `webhook`, `manual`, `api`
- Valid isolation_levels: `isolated`, `shared`, `synchronized`
**Key rules for triggers.json:**
- Valid trigger_types: `timer`, `webhook`
- Timer trigger_config (cron): `{"cron": "0 9 * * *"}` — standard 5-field cron expression
- Timer trigger_config (interval): `{"interval_minutes": float, "run_immediately": bool}`
- Event trigger_config: `{"event_types": ["webhook_received"], "filter_stream": "...", "filter_node": "..."}`
- Use `isolation_level="shared"` for async entry points that need to read
the primary session's memory (e.g., user-configured rules)
- The `_build_graph()` method passes `async_entry_points` to GraphSpec
- Reference: `exports/gmail_inbox_guardian/agent.py`
- Timer trigger_config (interval): `{"interval_minutes": float}`
- Each trigger must have a unique `id`
- The `task` field describes what the worker should do when the trigger fires
- Triggers are persisted back to `triggers.json` when modified via queen tools
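The key rules above can be checked mechanically before a `triggers.json` file is handed to the queen. The following is a minimal sketch, not the framework's own validation: `validate_triggers` and `VALID_TRIGGER_TYPES` are hypothetical names, and treating `task` as required and `interval_minutes` as strictly positive are assumptions made for illustration.

```python
import json

# Trigger types listed in the key rules above.
VALID_TRIGGER_TYPES = {"timer", "webhook"}


def validate_triggers(triggers):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    errors = []
    seen_ids = set()
    for index, trigger in enumerate(triggers):
        trigger_id = trigger.get("id")
        label = trigger_id or f"entry {index}"

        # Each trigger must have a unique id.
        if not trigger_id:
            errors.append(f"{label}: missing 'id'")
        elif trigger_id in seen_ids:
            errors.append(f"{label}: duplicate 'id'")
        else:
            seen_ids.add(trigger_id)

        trigger_type = trigger.get("trigger_type")
        if trigger_type not in VALID_TRIGGER_TYPES:
            errors.append(f"{label}: invalid trigger_type {trigger_type!r}")

        # Timers need either a 5-field cron expression or an interval.
        config = trigger.get("trigger_config") or {}
        if trigger_type == "timer":
            cron = config.get("cron")
            interval = config.get("interval_minutes")
            if cron is not None:
                if len(str(cron).split()) != 5:
                    errors.append(f"{label}: cron must have 5 fields")
            elif (
                isinstance(interval, bool)
                or not isinstance(interval, (int, float))
                or interval <= 0  # assumption: intervals must be positive
            ):
                errors.append(f"{label}: timer needs 'cron' or 'interval_minutes'")

        # Assumption: a non-empty task string is required.
        if not trigger.get("task"):
            errors.append(f"{label}: missing 'task'")
    return errors


example = json.loads("""
[
  {"id": "daily-check", "name": "Daily Check", "trigger_type": "timer",
   "trigger_config": {"cron": "0 9 * * *"}, "task": "Run the daily check process"},
  {"id": "daily-check", "name": "Duplicate", "trigger_type": "event",
   "trigger_config": {}, "task": ""}
]
""")
problems = validate_triggers(example)
```

Here the first entry passes, while the second is flagged for a duplicate `id`, an unsupported `trigger_type`, and an empty `task`.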
## __init__.py
@@ -453,21 +418,6 @@ __all__ = [
]
```
**If the agent uses async entry points**, also import and export:
```python
from .agent import (
...,
async_entry_points,
runtime_config, # Only if using webhooks
)
__all__ = [
...,
"async_entry_points",
"runtime_config",
]
```
## __main__.py
```python
@@ -559,7 +509,7 @@ if __name__ == "__main__":
## mcp_servers.json
> **Auto-generated.** `initialize_agent_package` creates this file with hive-tools
> **Auto-generated.** `initialize_and_build_agent` creates this file with hive-tools
> as the default. Only edit manually to add additional MCP servers.
```json
@@ -31,8 +31,7 @@ module-level variables via `getattr()`:
| `conversation_mode` | no | not passed | Isolated mode (no context carryover) |
| `identity_prompt` | no | not passed | No agent-level identity |
| `loop_config` | no | `{}` | No iteration limits |
| `async_entry_points` | no | `[]` | No async triggers (timers, webhooks, events) |
| `runtime_config` | no | `None` | No webhook server |
| `triggers.json` (file) | no | not present | No triggers (timers, webhooks) |
**CRITICAL:** `__init__.py` MUST import and re-export ALL of these from
`agent.py`. Missing exports silently fall back to defaults, causing
@@ -226,7 +225,7 @@ Only three valid keys:
loop_config = {
"max_iterations": 100, # Max LLM turns per node visit
"max_tool_calls_per_turn": 20, # Max tool calls per LLM response
"max_history_tokens": 32000, # Triggers conversation compaction
"max_context_tokens": 32000, # Triggers conversation compaction
}
```
**INVALID keys** (do NOT use): `"strategy"`, `"mode"`, `"timeout"`,
@@ -257,44 +256,28 @@ Multiple ON_SUCCESS edges from same source → parallel execution via asyncio.ga
Judge is the SOLE acceptance mechanism — no ad-hoc framework gating.
## Async Entry Points (Webhooks, Timers, Events)
## Triggers (Timers, Webhooks)
For agents that react to external events, use `AsyncEntryPointSpec`:
For agents that react to external events, create a `triggers.json` file
in the agent's export directory:
```python
from framework.graph.edge import AsyncEntryPointSpec
from framework.runtime.agent_runtime import AgentRuntimeConfig
# Timer trigger (cron or interval)
async_entry_points = [
AsyncEntryPointSpec(
id="daily-check",
name="Daily Check",
entry_node="process",
trigger_type="timer",
trigger_config={"cron": "0 9 * * *"}, # daily at 9am
isolation_level="shared",
)
```json
[
{
"id": "daily-check",
"name": "Daily Check",
"trigger_type": "timer",
"trigger_config": {"cron": "0 9 * * *"},
"task": "Run the daily check process"
}
]
# Webhook server (optional)
runtime_config = AgentRuntimeConfig(
webhook_host="127.0.0.1",
webhook_port=8080,
webhook_routes=[{"source_id": "gmail", "path": "/webhooks/gmail", "methods": ["POST"]}],
)
```
### Key Fields
- `trigger_type`: `"timer"`, `"event"`, `"webhook"`, `"manual"`
- `trigger_type`: `"timer"` or `"webhook"`
- `trigger_config`: `{"cron": "0 9 * * *"}` or `{"interval_minutes": 20}`
- `isolation_level`: `"shared"` (recommended), `"isolated"`, `"synchronized"`
- `event_types`: For event triggers, e.g., `["webhook_received"]`
### Exports Required
Both `async_entry_points` and `runtime_config` must be exported from `__init__.py`.
See `exports/gmail_inbox_guardian/agent.py` for complete example.
- `task`: describes what the worker should do when the trigger fires
- Triggers can also be created/removed at runtime via `set_trigger` / `remove_trigger` queen tools
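As an illustration of the file layout described above, the sketch below writes a `triggers.json` next to an agent's code and reads it back. `write_triggers` and `read_triggers` are hypothetical helpers written for this example, not framework functions; the only behavior taken from this document is that a missing file means the agent has no triggers.

```python
import json
import tempfile
from pathlib import Path


def write_triggers(agent_dir, triggers):
    """Hypothetical helper: save the trigger list as triggers.json in agent_dir."""
    path = Path(agent_dir) / "triggers.json"
    path.write_text(json.dumps(triggers, indent=2) + "\n", encoding="utf-8")
    return path


def read_triggers(agent_dir):
    """Hypothetical helper: load triggers.json; a missing file means no triggers."""
    path = Path(agent_dir) / "triggers.json"
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


daily_check = {
    "id": "daily-check",
    "name": "Daily Check",
    "trigger_type": "timer",
    "trigger_config": {"cron": "0 9 * * *"},  # daily at 9am
    "task": "Run the daily check process",
}

# A temporary directory stands in for the agent's export directory.
with tempfile.TemporaryDirectory() as agent_dir:
    empty = read_triggers(agent_dir)          # no file yet
    write_triggers(agent_dir, [daily_check])
    loaded = read_triggers(agent_dir)         # round-trips the entry
```

Keeping the file as a plain JSON list makes it easy to inspect and edit by hand alongside the runtime `set_trigger` / `remove_trigger` tools.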
## Tool Discovery
@@ -1,8 +1,8 @@
"""Queen's ticket receiver entry point.
When the Worker Health Judge emits a WORKER_ESCALATION_TICKET event on the
shared EventBus, this entry point fires and routes to the ``ticket_triage``
node, where the Queen deliberates and decides whether to notify the operator.
When a WORKER_ESCALATION_TICKET event is emitted on the shared EventBus,
this entry point fires and routes to the ``ticket_triage`` node, where the
Queen deliberates and decides whether to notify the operator.
Isolation level is ``isolated``: the queen's triage memory is kept separate
from the worker's shared memory. Each ticket triage runs in its own context.
-7
@@ -1,7 +0,0 @@
"""Builder interface for analyzing and building agents."""
from framework.builder.query import BuilderQuery
__all__ = [
"BuilderQuery",
]
-501
@@ -1,501 +0,0 @@
"""
Builder Query Interface - How I (Builder) analyze agent runs.
This is designed around the questions I need to answer:
1. What happened? (summaries, narratives)
2. Why did it fail? (failure analysis, decision traces)
3. What patterns emerge? (across runs, across nodes)
4. What should we change? (suggestions)
"""
from collections import defaultdict
from pathlib import Path
from typing import Any
from framework.schemas.decision import Decision
from framework.schemas.run import Run, RunStatus, RunSummary
from framework.storage.backend import FileStorage
class FailureAnalysis:
"""Structured analysis of why a run failed."""
def __init__(
self,
run_id: str,
failure_point: str,
root_cause: str,
decision_chain: list[str],
problems: list[str],
suggestions: list[str],
):
self.run_id = run_id
self.failure_point = failure_point
self.root_cause = root_cause
self.decision_chain = decision_chain
self.problems = problems
self.suggestions = suggestions
def to_dict(self) -> dict[str, Any]:
return {
"run_id": self.run_id,
"failure_point": self.failure_point,
"root_cause": self.root_cause,
"decision_chain": self.decision_chain,
"problems": self.problems,
"suggestions": self.suggestions,
}
def __str__(self) -> str:
lines = [
f"=== Failure Analysis for {self.run_id} ===",
"",
f"Failure Point: {self.failure_point}",
f"Root Cause: {self.root_cause}",
"",
"Decision Chain Leading to Failure:",
]
for i, dec in enumerate(self.decision_chain, 1):
lines.append(f" {i}. {dec}")
if self.problems:
lines.append("")
lines.append("Reported Problems:")
for prob in self.problems:
lines.append(f" - {prob}")
if self.suggestions:
lines.append("")
lines.append("Suggestions:")
for sug in self.suggestions:
lines.append(f"{sug}")
return "\n".join(lines)
class PatternAnalysis:
"""Patterns detected across multiple runs."""
def __init__(
self,
goal_id: str,
run_count: int,
success_rate: float,
common_failures: list[tuple[str, int]],
problematic_nodes: list[tuple[str, float]],
decision_patterns: dict[str, Any],
):
self.goal_id = goal_id
self.run_count = run_count
self.success_rate = success_rate
self.common_failures = common_failures
self.problematic_nodes = problematic_nodes
self.decision_patterns = decision_patterns
def to_dict(self) -> dict[str, Any]:
return {
"goal_id": self.goal_id,
"run_count": self.run_count,
"success_rate": self.success_rate,
"common_failures": self.common_failures,
"problematic_nodes": self.problematic_nodes,
"decision_patterns": self.decision_patterns,
}
def __str__(self) -> str:
lines = [
f"=== Pattern Analysis for Goal {self.goal_id} ===",
"",
f"Runs Analyzed: {self.run_count}",
f"Success Rate: {self.success_rate:.1%}",
]
if self.common_failures:
lines.append("")
lines.append("Common Failures:")
for failure, count in self.common_failures:
lines.append(f" - {failure} ({count} occurrences)")
if self.problematic_nodes:
lines.append("")
lines.append("Problematic Nodes (failure rate):")
for node, rate in self.problematic_nodes:
lines.append(f" - {node}: {rate:.1%} failure rate")
return "\n".join(lines)
class BuilderQuery:
"""
The interface I (Builder) use to understand what agents are doing.
This is optimized for the questions I need to answer when analyzing
agent behavior and deciding what to improve.
"""
def __init__(self, storage_path: str | Path):
self.storage = FileStorage(storage_path)
# === WHAT HAPPENED? ===
def get_run_summary(self, run_id: str) -> RunSummary | None:
"""Get a quick summary of a run."""
return self.storage.load_summary(run_id)
def get_full_run(self, run_id: str) -> Run | None:
"""Get the complete run with all decisions."""
return self.storage.load_run(run_id)
def list_runs_for_goal(self, goal_id: str) -> list[RunSummary]:
"""Get summaries of all runs for a goal."""
run_ids = self.storage.get_runs_by_goal(goal_id)
summaries = []
for run_id in run_ids:
summary = self.storage.load_summary(run_id)
if summary:
summaries.append(summary)
return summaries
def get_recent_failures(self, limit: int = 10) -> list[RunSummary]:
"""Get recent failed runs."""
run_ids = self.storage.get_runs_by_status(RunStatus.FAILED)
summaries = []
for run_id in run_ids[:limit]:
summary = self.storage.load_summary(run_id)
if summary:
summaries.append(summary)
return summaries
# === WHY DID IT FAIL? ===
def analyze_failure(self, run_id: str) -> FailureAnalysis | None:
"""
Deep analysis of why a run failed.
This is my primary tool for understanding what went wrong.
"""
run = self.storage.load_run(run_id)
if run is None or run.status != RunStatus.FAILED:
return None
# Find the first failed decision
failed_decisions = [d for d in run.decisions if not d.was_successful]
if not failed_decisions:
failure_point = "Unknown - no decision marked as failed"
root_cause = "Run failed but all decisions succeeded (external cause?)"
else:
first_failure = failed_decisions[0]
failure_point = first_failure.summary_for_builder()
root_cause = first_failure.outcome.error if first_failure.outcome else "Unknown"
# Build the decision chain leading to failure
decision_chain = []
for d in run.decisions:
decision_chain.append(d.summary_for_builder())
if not d.was_successful:
break
# Extract problems
problems = [f"[{p.severity}] {p.description}" for p in run.problems]
# Generate suggestions based on the failure
suggestions = self._generate_suggestions(run, failed_decisions)
return FailureAnalysis(
run_id=run_id,
failure_point=failure_point,
root_cause=root_cause,
decision_chain=decision_chain,
problems=problems,
suggestions=suggestions,
)
def get_decision_trace(self, run_id: str) -> list[str]:
"""Get a readable trace of all decisions in a run."""
run = self.storage.load_run(run_id)
if run is None:
return []
return [d.summary_for_builder() for d in run.decisions]
# === WHAT PATTERNS EMERGE? ===
def find_patterns(self, goal_id: str) -> PatternAnalysis | None:
"""
Find patterns across runs for a goal.
This helps me understand systemic issues vs one-off failures.
"""
run_ids = self.storage.get_runs_by_goal(goal_id)
if not run_ids:
return None
runs = []
for run_id in run_ids:
run = self.storage.load_run(run_id)
if run:
runs.append(run)
if not runs:
return None
# Calculate success rate
completed = [r for r in runs if r.status == RunStatus.COMPLETED]
success_rate = len(completed) / len(runs) if runs else 0.0
# Find common failures
failure_counts: dict[str, int] = defaultdict(int)
for run in runs:
for decision in run.decisions:
if not decision.was_successful and decision.outcome:
error = decision.outcome.error or "Unknown error"
failure_counts[error] += 1
common_failures = sorted(failure_counts.items(), key=lambda x: x[1], reverse=True)[:5]
# Find problematic nodes
node_stats: dict[str, dict[str, int]] = defaultdict(lambda: {"total": 0, "failed": 0})
for run in runs:
for decision in run.decisions:
node_stats[decision.node_id]["total"] += 1
if not decision.was_successful:
node_stats[decision.node_id]["failed"] += 1
problematic_nodes = []
for node_id, stats in node_stats.items():
if stats["total"] > 0:
failure_rate = stats["failed"] / stats["total"]
if failure_rate > 0.1: # More than 10% failure rate
problematic_nodes.append((node_id, failure_rate))
problematic_nodes.sort(key=lambda x: x[1], reverse=True)
# Decision patterns
decision_patterns = self._analyze_decision_patterns(runs)
return PatternAnalysis(
goal_id=goal_id,
run_count=len(runs),
success_rate=success_rate,
common_failures=common_failures,
problematic_nodes=problematic_nodes,
decision_patterns=decision_patterns,
)
def compare_runs(self, run_id_1: str, run_id_2: str) -> dict[str, Any]:
"""Compare two runs to understand what differed."""
run1 = self.storage.load_run(run_id_1)
run2 = self.storage.load_run(run_id_2)
if run1 is None or run2 is None:
return {"error": "One or both runs not found"}
return {
"run_1": {
"id": run1.id,
"status": run1.status.value,
"decisions": len(run1.decisions),
"success_rate": run1.metrics.success_rate,
},
"run_2": {
"id": run2.id,
"status": run2.status.value,
"decisions": len(run2.decisions),
"success_rate": run2.metrics.success_rate,
},
"differences": self._find_differences(run1, run2),
}
# === WHAT SHOULD WE CHANGE? ===
def suggest_improvements(self, goal_id: str) -> list[dict[str, Any]]:
"""
Generate improvement suggestions based on run analysis.
This is what I use to propose changes to the human engineer.
"""
patterns = self.find_patterns(goal_id)
if patterns is None:
return []
suggestions = []
# Suggestion: Fix problematic nodes
for node_id, failure_rate in patterns.problematic_nodes:
suggestions.append(
{
"type": "node_improvement",
"target": node_id,
"reason": f"Node has {failure_rate:.1%} failure rate",
"recommendation": (
f"Review and improve node '{node_id}' - "
"high failure rate suggests prompt or tool issues"
),
"priority": "high" if failure_rate > 0.3 else "medium",
}
)
# Suggestion: Address common failures
for failure, count in patterns.common_failures:
if count >= 2:
suggestions.append(
{
"type": "error_handling",
"target": failure,
"reason": f"Error occurred {count} times",
"recommendation": f"Add handling for: {failure}",
"priority": "high" if count >= 5 else "medium",
}
)
# Suggestion: Overall success rate
if patterns.success_rate < 0.8:
suggestions.append(
{
"type": "architecture",
"target": goal_id,
"reason": f"Goal success rate is only {patterns.success_rate:.1%}",
"recommendation": (
"Consider restructuring the agent graph or improving goal definition"
),
"priority": "high",
}
)
return suggestions
def get_node_performance(self, node_id: str) -> dict[str, Any]:
"""Get performance metrics for a specific node across all runs."""
run_ids = self.storage.get_runs_by_node(node_id)
total_decisions = 0
successful_decisions = 0
total_latency = 0
total_tokens = 0
decision_types: dict[str, int] = defaultdict(int)
for run_id in run_ids:
run = self.storage.load_run(run_id)
if run:
for decision in run.decisions:
if decision.node_id == node_id:
total_decisions += 1
if decision.was_successful:
successful_decisions += 1
if decision.outcome:
total_latency += decision.outcome.latency_ms
total_tokens += decision.outcome.tokens_used
decision_types[decision.decision_type.value] += 1
return {
"node_id": node_id,
"total_decisions": total_decisions,
"success_rate": successful_decisions / total_decisions if total_decisions > 0 else 0,
"avg_latency_ms": total_latency / total_decisions if total_decisions > 0 else 0,
"total_tokens": total_tokens,
"decision_type_distribution": dict(decision_types),
}
# === PRIVATE HELPERS ===
def _generate_suggestions(
self,
run: Run,
failed_decisions: list[Decision],
) -> list[str]:
"""Generate suggestions based on failure analysis."""
suggestions = []
for decision in failed_decisions:
# Check if there were alternatives
if len(decision.options) > 1:
chosen = decision.chosen_option
alternatives = [o for o in decision.options if o.id != decision.chosen_option_id]
if alternatives:
alt_desc = alternatives[0].description
chosen_desc = chosen.description if chosen else "unknown"
suggestions.append(
f"Consider alternative: '{alt_desc}' instead of '{chosen_desc}'"
)
# Check for missing context
if not decision.input_context:
suggestions.append(
f"Decision '{decision.intent}' had no input context - "
"ensure relevant data is passed"
)
# Check for constraint issues
if decision.active_constraints:
constraints = ", ".join(decision.active_constraints)
suggestions.append(f"Review constraints: {constraints} - may be too restrictive")
# Check for reported problems with suggestions
for problem in run.problems:
if problem.suggested_fix:
suggestions.append(problem.suggested_fix)
return suggestions
def _analyze_decision_patterns(self, runs: list[Run]) -> dict[str, Any]:
"""Analyze decision patterns across runs."""
type_counts: dict[str, int] = defaultdict(int)
option_counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
for run in runs:
for decision in run.decisions:
type_counts[decision.decision_type.value] += 1
# Track which options are chosen for similar intents
intent_key = decision.intent[:50] # Truncate for grouping
if decision.chosen_option:
option_counts[intent_key][decision.chosen_option.description] += 1
# Find most common choices per intent
common_choices = {}
for intent, choices in option_counts.items():
if choices:
most_common = max(choices.items(), key=lambda x: x[1])
common_choices[intent] = {
"choice": most_common[0],
"count": most_common[1],
"alternatives": len(choices) - 1,
}
return {
"decision_type_distribution": dict(type_counts),
"common_choices": common_choices,
}
def _find_differences(self, run1: Run, run2: Run) -> list[str]:
"""Find key differences between two runs."""
differences = []
# Status difference
if run1.status != run2.status:
differences.append(f"Status: {run1.status.value} vs {run2.status.value}")
# Decision count difference
if len(run1.decisions) != len(run2.decisions):
differences.append(f"Decision count: {len(run1.decisions)} vs {len(run2.decisions)}")
# Find first divergence point
for i, (d1, d2) in enumerate(zip(run1.decisions, run2.decisions, strict=False)):
if d1.chosen_option_id != d2.chosen_option_id:
differences.append(
f"Diverged at decision {i}: "
f"chose '{d1.chosen_option_id}' vs '{d2.chosen_option_id}'"
)
break
# Node differences
nodes1 = set(run1.metrics.nodes_executed)
nodes2 = set(run2.metrics.nodes_executed)
if nodes1 != nodes2:
only_1 = nodes1 - nodes2
only_2 = nodes2 - nodes1
if only_1:
differences.append(f"Nodes only in run 1: {only_1}")
if only_2:
differences.append(f"Nodes only in run 2: {only_2}")
return differences
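The per-node failure-rate calculation used by `find_patterns` can be sketched on its own. This is a minimal sketch, assuming each decision is reduced to a plain `(node_id, was_successful)` tuple instead of the framework's `Decision` object; the function name `problematic_nodes` and the sample data are hypothetical.

```python
from collections import defaultdict


def problematic_nodes(decisions, threshold=0.1):
    """Return (node_id, failure_rate) pairs above `threshold`, worst first.

    `decisions` is a hypothetical stand-in for run.decisions: an iterable
    of (node_id, was_successful) tuples.
    """
    stats = defaultdict(lambda: {"total": 0, "failed": 0})
    for node_id, ok in decisions:
        stats[node_id]["total"] += 1
        if not ok:
            stats[node_id]["failed"] += 1

    flagged = []
    for node_id, s in stats.items():
        # total is always >= 1 here because the entry was created by a decision
        rate = s["failed"] / s["total"]
        if rate > threshold:
            flagged.append((node_id, rate))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


sample = [("fetch", True), ("fetch", False), ("parse", True), ("parse", True)]
print(problematic_nodes(sample))
```

Only nodes whose failure rate is strictly above the threshold are reported, which mirrors the `failure_rate > 0.1` comparison in the code above.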
+23
@@ -56,6 +56,14 @@ def get_max_tokens() -> int:
return get_hive_config().get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
+DEFAULT_MAX_CONTEXT_TOKENS = 32_000
+def get_max_context_tokens() -> int:
+"""Return the configured max_context_tokens, falling back to DEFAULT_MAX_CONTEXT_TOKENS."""
+return get_hive_config().get("llm", {}).get("max_context_tokens", DEFAULT_MAX_CONTEXT_TOKENS)
def get_api_key() -> str | None:
"""Return the API key, supporting env var, Claude Code subscription, Codex, and ZAI Code.
@@ -90,6 +98,17 @@ def get_api_key() -> str | None:
except ImportError:
pass
+# Kimi Code subscription: read API key from ~/.kimi/config.toml
+if llm.get("use_kimi_code_subscription"):
+try:
+from framework.runner.runner import get_kimi_code_token
+token = get_kimi_code_token()
+if token:
+return token
+except ImportError:
+pass
# Standard env-var path (covers ZAI Code and all API-key providers)
api_key_env_var = llm.get("api_key_env_var")
if api_key_env_var:
@@ -108,6 +127,9 @@ def get_api_base() -> str | None:
if llm.get("use_codex_subscription"):
# Codex subscription routes through the ChatGPT backend, not api.openai.com.
return "https://chatgpt.com/backend-api/codex"
+if llm.get("use_kimi_code_subscription"):
+# Kimi Code uses an Anthropic-compatible endpoint (no /v1 suffix).
+return "https://api.kimi.com/coding"
return llm.get("api_base")
@@ -164,6 +186,7 @@ class RuntimeConfig:
model: str = field(default_factory=get_preferred_model)
temperature: float = 0.7
max_tokens: int = field(default_factory=get_max_tokens)
+max_context_tokens: int = field(default_factory=get_max_context_tokens)
api_key: str | None = field(default_factory=get_api_key)
api_base: str | None = field(default_factory=get_api_base)
extra_kwargs: dict[str, Any] = field(default_factory=get_llm_extra_kwargs)
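The nested-lookup-with-default pattern behind `get_max_context_tokens` can be illustrated without the real config loader. This is a minimal sketch, assuming the config is an ordinary dict; `max_context_tokens_from` is a hypothetical helper, not part of the framework.

```python
DEFAULT_MAX_CONTEXT_TOKENS = 32_000


def max_context_tokens_from(config: dict) -> int:
    """Read config["llm"]["max_context_tokens"], falling back to the default.

    Hypothetical helper: takes the config dict as an argument instead of
    calling get_hive_config().
    """
    return config.get("llm", {}).get("max_context_tokens", DEFAULT_MAX_CONTEXT_TOKENS)


print(max_context_tokens_from({}))
print(max_context_tokens_from({"llm": {"max_context_tokens": 128_000}}))
```

The chained `.get(..., {})` only protects against a missing `llm` key; a config where `llm` is present but set to `None` would still raise `AttributeError`.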
+1 -3
@@ -6,7 +6,7 @@ This module provides secure credential storage with:
- Template-based usage: {{cred.key}} patterns for injection
- Bipartisan model: Store stores values, tools define usage
- Provider system: Extensible lifecycle management (refresh, validate)
- Multiple backends: Encrypted files, env vars, HashiCorp Vault
- Multiple backends: Encrypted files, env vars
Quick Start:
from core.framework.credentials import CredentialStore, CredentialObject
@@ -38,8 +38,6 @@ For Aden server sync:
AdenSyncProvider,
)
For Vault integration:
from core.framework.credentials.vault import HashiCorpVaultStorage
"""
from .key_storage import (
+17 -3
@@ -149,8 +149,14 @@ def delete_aden_api_key() -> None:
storage = EncryptedFileStorage()
storage.delete(ADEN_CREDENTIAL_ID)
+except (FileNotFoundError, PermissionError) as e:
+logger.debug("Could not delete %s from encrypted store: %s", ADEN_CREDENTIAL_ID, e)
except Exception:
-logger.debug("Could not delete %s from encrypted store", ADEN_CREDENTIAL_ID)
+logger.warning(
+"Unexpected error deleting %s from encrypted store",
+ADEN_CREDENTIAL_ID,
+exc_info=True,
+)
os.environ.pop(ADEN_ENV_VAR, None)
@@ -167,8 +173,10 @@ def _read_credential_key_file() -> str | None:
value = CREDENTIAL_KEY_PATH.read_text(encoding="utf-8").strip()
if value:
return value
+except (FileNotFoundError, PermissionError) as e:
+logger.debug("Could not read %s: %s", CREDENTIAL_KEY_PATH, e)
except Exception:
-logger.debug("Could not read %s", CREDENTIAL_KEY_PATH)
+logger.warning("Unexpected error reading %s", CREDENTIAL_KEY_PATH, exc_info=True)
return None
@@ -196,6 +204,12 @@ def _read_aden_from_encrypted_store() -> str | None:
cred = storage.load(ADEN_CREDENTIAL_ID)
if cred:
return cred.get_key("api_key")
+except (FileNotFoundError, PermissionError, KeyError) as e:
+logger.debug("Could not load %s from encrypted store: %s", ADEN_CREDENTIAL_ID, e)
except Exception:
-logger.debug("Could not load %s from encrypted store", ADEN_CREDENTIAL_ID)
+logger.warning(
+"Unexpected error loading %s from encrypted store",
+ADEN_CREDENTIAL_ID,
+exc_info=True,
+)
return None
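The three hunks above share one pattern: expected failures are caught narrowly and logged at debug level, while anything else is logged as a warning with a traceback. This is a minimal sketch of that pattern, assuming a plain file read; `read_key_file` is a hypothetical stand-in for `_read_credential_key_file` and takes the path as an argument.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def read_key_file(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if unreadable or empty."""
    try:
        value = path.read_text(encoding="utf-8").strip()
        if value:
            return value
    except (FileNotFoundError, PermissionError) as e:
        # Expected in normal operation: the key may simply not exist yet.
        logger.debug("Could not read %s: %s", path, e)
    except Exception:
        # Anything else is surprising; exc_info=True attaches the traceback.
        logger.warning("Unexpected error reading %s", path, exc_info=True)
    return None


print(read_key_file(Path("definitely-missing-key-file")))
```

Ordering matters here: the narrow `except` clause has to come before `except Exception`, otherwise the broad handler would catch the expected errors first.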
@@ -1,55 +0,0 @@
"""
HashiCorp Vault integration for the credential store.
This module provides enterprise-grade secret management through
HashiCorp Vault integration.
Quick Start:
from core.framework.credentials import CredentialStore
from core.framework.credentials.vault import HashiCorpVaultStorage
# Configure Vault storage
storage = HashiCorpVaultStorage(
url="https://vault.example.com:8200",
# token read from VAULT_TOKEN env var
mount_point="secret",
path_prefix="hive/agents/prod"
)
# Create credential store with Vault backend
store = CredentialStore(storage=storage)
# Use normally - credentials are stored in Vault
credential = store.get_credential("my_api")
Requirements:
pip install hvac
Authentication:
Set the VAULT_TOKEN environment variable or pass the token directly:
export VAULT_TOKEN="hvs.xxxxxxxxxxxxx"
For production, consider using Vault auth methods:
- Kubernetes auth
- AppRole auth
- AWS IAM auth
Vault Configuration:
Ensure KV v2 secrets engine is enabled:
vault secrets enable -path=secret kv-v2
Grant appropriate policies:
path "secret/data/hive/credentials/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}
path "secret/metadata/hive/credentials/*" {
capabilities = ["list", "delete"]
}
"""
from .hashicorp import HashiCorpVaultStorage
__all__ = ["HashiCorpVaultStorage"]
@@ -1,394 +0,0 @@
"""
HashiCorp Vault storage adapter.
Provides integration with HashiCorp Vault for enterprise secret management.
Requires the 'hvac' package: uv pip install hvac
"""
from __future__ import annotations
import logging
import os
from datetime import datetime
from typing import Any
from pydantic import SecretStr
from ..models import CredentialKey, CredentialObject, CredentialType
from ..storage import CredentialStorage
logger = logging.getLogger(__name__)
class HashiCorpVaultStorage(CredentialStorage):
"""
HashiCorp Vault storage adapter.
Features:
- KV v2 secrets engine support
- Namespace support (Enterprise)
- Automatic secret versioning
- Audit logging via Vault
The adapter stores credentials in Vault's KV v2 secrets engine with
the following structure:
{mount_point}/data/{path_prefix}/{credential_id}
data:
_type: "oauth2"
access_token: "xxx"
refresh_token: "yyy"
_expires_access_token: "2024-01-26T12:00:00"
_provider_id: "oauth2"
Example:
storage = HashiCorpVaultStorage(
url="https://vault.example.com:8200",
token="hvs.xxx", # Or use VAULT_TOKEN env var
mount_point="secret",
path_prefix="hive/credentials"
)
store = CredentialStore(storage=storage)
# Credentials are now stored in Vault
store.save_credential(credential)
credential = store.get_credential("my_api")
Authentication:
The adapter uses token-based authentication. The token can be provided:
1. Directly via the 'token' parameter
2. Via the VAULT_TOKEN environment variable
For production, consider using:
- Kubernetes auth method
- AppRole auth method
- AWS IAM auth method
Requirements:
uv pip install hvac
"""
def __init__(
self,
url: str,
token: str | None = None,
mount_point: str = "secret",
path_prefix: str = "hive/credentials",
namespace: str | None = None,
verify_ssl: bool = True,
):
"""
Initialize Vault storage.
Args:
url: Vault server URL (e.g., https://vault.example.com:8200)
token: Vault token. If None, reads from VAULT_TOKEN env var
mount_point: KV secrets engine mount point (default: "secret")
path_prefix: Path prefix for all credentials
namespace: Vault namespace (Enterprise feature)
verify_ssl: Whether to verify SSL certificates
Raises:
ImportError: If hvac is not installed
ValueError: If authentication fails
"""
try:
import hvac
except ImportError as e:
raise ImportError(
"HashiCorp Vault support requires 'hvac'. Install with: uv pip install hvac"
) from e
self._url = url
self._token = token or os.environ.get("VAULT_TOKEN")
self._mount = mount_point
self._prefix = path_prefix
self._namespace = namespace
if not self._token:
raise ValueError(
"Vault token required. Set VAULT_TOKEN env var or pass token parameter."
)
self._client = hvac.Client(
url=url,
token=self._token,
namespace=namespace,
verify=verify_ssl,
)
if not self._client.is_authenticated():
raise ValueError("Vault authentication failed. Check token and server URL.")
logger.info(f"Connected to HashiCorp Vault at {url}")
def _path(self, credential_id: str) -> str:
"""Build Vault path for credential."""
# Sanitize credential_id
safe_id = credential_id.replace("/", "_").replace("\\", "_")
return f"{self._prefix}/{safe_id}"
def save(self, credential: CredentialObject) -> None:
"""Save credential to Vault KV v2."""
path = self._path(credential.id)
data = self._serialize_for_vault(credential)
try:
self._client.secrets.kv.v2.create_or_update_secret(
path=path,
secret=data,
mount_point=self._mount,
)
logger.debug(f"Saved credential '{credential.id}' to Vault at {path}")
except Exception as e:
logger.error(f"Failed to save credential '{credential.id}' to Vault: {e}")
raise
def load(self, credential_id: str) -> CredentialObject | None:
"""Load credential from Vault."""
path = self._path(credential_id)
try:
response = self._client.secrets.kv.v2.read_secret_version(
path=path,
mount_point=self._mount,
)
data = response["data"]["data"]
return self._deserialize_from_vault(credential_id, data)
except Exception as e:
# Check if it's a "not found" error
error_str = str(e).lower()
if "not found" in error_str or "404" in error_str:
logger.debug(f"Credential '{credential_id}' not found in Vault")
return None
logger.error(f"Failed to load credential '{credential_id}' from Vault: {e}")
raise
def delete(self, credential_id: str) -> bool:
"""Delete credential from Vault (all versions)."""
path = self._path(credential_id)
try:
self._client.secrets.kv.v2.delete_metadata_and_all_versions(
path=path,
mount_point=self._mount,
)
logger.debug(f"Deleted credential '{credential_id}' from Vault")
return True
except Exception as e:
error_str = str(e).lower()
if "not found" in error_str or "404" in error_str:
return False
logger.error(f"Failed to delete credential '{credential_id}' from Vault: {e}")
raise
def list_all(self) -> list[str]:
"""List all credentials under the prefix."""
try:
response = self._client.secrets.kv.v2.list_secrets(
path=self._prefix,
mount_point=self._mount,
)
keys = response.get("data", {}).get("keys", [])
# Remove trailing slashes from folder names
return [k.rstrip("/") for k in keys]
except Exception as e:
error_str = str(e).lower()
if "not found" in error_str or "404" in error_str:
return []
logger.error(f"Failed to list credentials from Vault: {e}")
raise
def exists(self, credential_id: str) -> bool:
"""Check if credential exists in Vault."""
try:
path = self._path(credential_id)
self._client.secrets.kv.v2.read_secret_version(
path=path,
mount_point=self._mount,
)
return True
except Exception:
return False
def _serialize_for_vault(self, credential: CredentialObject) -> dict[str, Any]:
"""Convert credential to Vault secret format."""
data: dict[str, Any] = {
"_type": credential.credential_type.value,
}
if credential.provider_id:
data["_provider_id"] = credential.provider_id
if credential.description:
data["_description"] = credential.description
if credential.auto_refresh:
data["_auto_refresh"] = "true"
# Store each key
for key_name, key in credential.keys.items():
data[key_name] = key.get_secret_value()
if key.expires_at:
data[f"_expires_{key_name}"] = key.expires_at.isoformat()
if key.metadata:
data[f"_metadata_{key_name}"] = str(key.metadata)
return data
def _deserialize_from_vault(self, credential_id: str, data: dict[str, Any]) -> CredentialObject:
"""Reconstruct credential from Vault secret."""
# Extract metadata fields
cred_type = CredentialType(data.pop("_type", "api_key"))
provider_id = data.pop("_provider_id", None)
description = data.pop("_description", "")
auto_refresh = data.pop("_auto_refresh", "") == "true"
# Build keys dict
keys: dict[str, CredentialKey] = {}
# Find all non-metadata keys
key_names = [k for k in data.keys() if not k.startswith("_")]
for key_name in key_names:
value = data[key_name]
# Check for expiration
expires_at = None
expires_key = f"_expires_{key_name}"
if expires_key in data:
try:
expires_at = datetime.fromisoformat(data[expires_key])
except (ValueError, TypeError):
pass
# Check for metadata
metadata: dict[str, Any] = {}
metadata_key = f"_metadata_{key_name}"
if metadata_key in data:
try:
import ast
metadata = ast.literal_eval(data[metadata_key])
except (ValueError, SyntaxError):
pass
keys[key_name] = CredentialKey(
name=key_name,
value=SecretStr(value),
expires_at=expires_at,
metadata=metadata,
)
return CredentialObject(
id=credential_id,
credential_type=cred_type,
keys=keys,
provider_id=provider_id,
description=description,
auto_refresh=auto_refresh,
)
# --- Vault-Specific Operations ---
def get_secret_metadata(self, credential_id: str) -> dict[str, Any] | None:
"""
Get Vault metadata for a secret (version info, timestamps, etc.).
Args:
credential_id: The credential identifier
Returns:
Metadata dict or None if not found
"""
path = self._path(credential_id)
try:
response = self._client.secrets.kv.v2.read_secret_metadata(
path=path,
mount_point=self._mount,
)
return response.get("data", {})
except Exception:
return None
def soft_delete(self, credential_id: str, versions: list[int] | None = None) -> bool:
"""
Soft delete specific versions (can be recovered).
Args:
credential_id: The credential identifier
versions: Version numbers to delete. If None, deletes latest.
Returns:
True if successful
"""
path = self._path(credential_id)
try:
if versions:
self._client.secrets.kv.v2.delete_secret_versions(
path=path,
versions=versions,
mount_point=self._mount,
)
else:
self._client.secrets.kv.v2.delete_latest_version_of_secret(
path=path,
mount_point=self._mount,
)
return True
except Exception as e:
logger.error(f"Soft delete failed for '{credential_id}': {e}")
return False
def undelete(self, credential_id: str, versions: list[int]) -> bool:
"""
Recover soft-deleted versions.
Args:
credential_id: The credential identifier
versions: Version numbers to recover
Returns:
True if successful
"""
path = self._path(credential_id)
try:
self._client.secrets.kv.v2.undelete_secret_versions(
path=path,
versions=versions,
mount_point=self._mount,
)
return True
except Exception as e:
logger.error(f"Undelete failed for '{credential_id}': {e}")
return False
def load_version(self, credential_id: str, version: int) -> CredentialObject | None:
"""
Load a specific version of a credential.
Args:
credential_id: The credential identifier
version: Version number to load
Returns:
CredentialObject or None
"""
path = self._path(credential_id)
try:
response = self._client.secrets.kv.v2.read_secret_version(
path=path,
version=version,
mount_point=self._mount,
)
data = response["data"]["data"]
return self._deserialize_from_vault(credential_id, data)
except Exception:
return None
+9 -9
@@ -307,13 +307,13 @@ class NodeConversation:
def __init__(
self,
system_prompt: str = "",
-max_history_tokens: int = 32000,
+max_context_tokens: int = 32000,
compaction_threshold: float = 0.8,
output_keys: list[str] | None = None,
store: ConversationStore | None = None,
) -> None:
self._system_prompt = system_prompt
-self._max_history_tokens = max_history_tokens
+self._max_context_tokens = max_context_tokens
self._compaction_threshold = compaction_threshold
self._output_keys = output_keys
self._store = store
@@ -525,16 +525,16 @@ class NodeConversation:
self._last_api_input_tokens = actual_input_tokens
def usage_ratio(self) -> float:
-"""Current token usage as a fraction of *max_history_tokens*.
+"""Current token usage as a fraction of *max_context_tokens*.
-Returns 0.0 when ``max_history_tokens`` is zero (unlimited).
+Returns 0.0 when ``max_context_tokens`` is zero (unlimited).
"""
-if self._max_history_tokens <= 0:
+if self._max_context_tokens <= 0:
return 0.0
-return self.estimate_tokens() / self._max_history_tokens
+return self.estimate_tokens() / self._max_context_tokens
def needs_compaction(self) -> bool:
-return self.estimate_tokens() >= self._max_history_tokens * self._compaction_threshold
+return self.estimate_tokens() >= self._max_context_tokens * self._compaction_threshold
# --- Output-key extraction ---------------------------------------------
@@ -1029,7 +1029,7 @@ class NodeConversation:
await self._store.write_meta(
{
"system_prompt": self._system_prompt,
-"max_history_tokens": self._max_history_tokens,
+"max_context_tokens": self._max_context_tokens,
"compaction_threshold": self._compaction_threshold,
"output_keys": self._output_keys,
}
@@ -1062,7 +1062,7 @@ class NodeConversation:
conv = cls(
system_prompt=meta.get("system_prompt", ""),
-max_history_tokens=meta.get("max_history_tokens", 32000),
+max_context_tokens=meta.get("max_context_tokens", 32000),
compaction_threshold=meta.get("compaction_threshold", 0.8),
output_keys=meta.get("output_keys"),
store=store,
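The budget arithmetic in `usage_ratio` and `needs_compaction` can be checked in isolation. This is a minimal sketch, assuming the token estimate is passed in as a number; the free functions below are hypothetical stand-ins for the `NodeConversation` methods.

```python
def usage_ratio(estimated_tokens: int, max_context_tokens: int) -> float:
    """Fraction of the budget in use; 0.0 when the budget is zero (unlimited)."""
    if max_context_tokens <= 0:
        return 0.0
    return estimated_tokens / max_context_tokens


def needs_compaction(
    estimated_tokens: int,
    max_context_tokens: int = 32000,
    compaction_threshold: float = 0.8,
) -> bool:
    """True once the estimate reaches threshold * budget (mirrors the diff)."""
    return estimated_tokens >= max_context_tokens * compaction_threshold


print(usage_ratio(16000, 32000))
print(needs_compaction(20000), needs_compaction(26000))
```

With the defaults, compaction triggers at roughly 25,600 tokens. Unlike `usage_ratio`, the comparison as written in the diff has no guard for a zero budget, so with `max_context_tokens=0` it evaluates `estimate >= 0` and returns true; whether that case is reachable in practice is not visible from this diff.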
+3 -3
@@ -37,7 +37,7 @@ async def evaluate_phase_completion(
phase_description: str,
success_criteria: str,
accumulator_state: dict[str, Any],
-max_history_tokens: int = 8_196,
+max_context_tokens: int = 8_196,
) -> PhaseVerdict:
"""Level 2 judge: read the conversation and evaluate quality.
@@ -50,7 +50,7 @@ async def evaluate_phase_completion(
phase_description: Description of the phase
success_criteria: Natural-language criteria for phase completion
accumulator_state: Current output key values
-max_history_tokens: Main conversation token budget (judge gets 20%)
+max_context_tokens: Main conversation token budget (judge gets 20%)
Returns:
PhaseVerdict with action and optional feedback
@@ -89,7 +89,7 @@ FEEDBACK: (reason if RETRY, empty if ACCEPT)"""
response = await llm.acomplete(
messages=[{"role": "user", "content": user_prompt}],
system=system_prompt,
-max_tokens=max(1024, max_history_tokens // 5),
+max_tokens=max(1024, max_context_tokens // 5),
max_retries=1,
)
if not response.content or not response.content.strip():
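The judge's output budget is one fifth of the main conversation budget, with a floor of 1024 tokens. A minimal worked sketch of that arithmetic follows; `judge_max_tokens` is a hypothetical name for the inline expression in the diff.

```python
def judge_max_tokens(max_context_tokens: int) -> int:
    """20% of the main budget (integer division), never below 1024."""
    return max(1024, max_context_tokens // 5)


for budget in (4_000, 8_196, 32_000):
    print(budget, judge_max_tokens(budget))
```

For the function's default of 8_196 this gives 1639; budgets below 5,120 tokens fall back to the 1024 floor.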
+13 -85
@@ -322,7 +322,11 @@ class AsyncEntryPointSpec(BaseModel):
id: str = Field(description="Unique identifier for this entry point")
name: str = Field(description="Human-readable name")
-entry_node: str = Field(description="Node ID to start execution from")
+entry_node: str = Field(
+default="",
+description="Deprecated: Node ID to start execution from. "
+"Triggers are graph-level; worker always enters at GraphSpec.entry_node.",
+)
trigger_type: str = Field(
default="manual",
description="How this entry point is triggered: webhook, api, timer, event, manual",
@@ -331,6 +335,10 @@ class AsyncEntryPointSpec(BaseModel):
default_factory=dict,
description="Trigger-specific configuration (e.g., webhook URL, timer interval)",
)
+task: str = Field(
+default="",
+description="Worker task string when this trigger fires autonomously",
+)
isolation_level: str = Field(
default="shared", description="State isolation: isolated, shared, or synchronized"
)
@@ -368,28 +376,8 @@ class GraphSpec(BaseModel):
edges=[...],
)
-For multi-entry-point agents (concurrent streams):
-GraphSpec(
-id="support-agent-graph",
-goal_id="support-001",
-entry_node="process-webhook", # Default entry
-async_entry_points=[
-AsyncEntryPointSpec(
-id="webhook",
-name="Zendesk Webhook",
-entry_node="process-webhook",
-trigger_type="webhook",
-),
-AsyncEntryPointSpec(
-id="api",
-name="API Handler",
-entry_node="process-request",
-trigger_type="api",
-),
-],
-nodes=[...],
-edges=[...],
-)
+Triggers (timer, webhook, event) are now defined in ``triggers.json``
+alongside the agent directory, not embedded in the graph spec.
"""
id: str
@@ -402,12 +390,6 @@ class GraphSpec(BaseModel):
default_factory=dict,
description="Named entry points for resuming execution. Format: {name: node_id}",
)
-async_entry_points: list[AsyncEntryPointSpec] = Field(
-default_factory=list,
-description=(
-"Asynchronous entry points for concurrent execution streams (used with AgentRuntime)"
-),
-)
terminal_nodes: list[str] = Field(
default_factory=list, description="IDs of nodes that end execution"
)
@@ -486,17 +468,6 @@ class GraphSpec(BaseModel):
return node
return None
-def has_async_entry_points(self) -> bool:
-"""Check if this graph uses async entry points (multi-stream execution)."""
-return len(self.async_entry_points) > 0
-def get_async_entry_point(self, entry_point_id: str) -> AsyncEntryPointSpec | None:
-"""Get an async entry point by ID."""
-for ep in self.async_entry_points:
-if ep.id == entry_point_id:
-return ep
-return None
def get_outgoing_edges(self, node_id: str) -> list[EdgeSpec]:
"""Get all edges leaving a node, sorted by priority."""
edges = [e for e in self.edges if e.source == node_id]
@@ -587,37 +558,6 @@ class GraphSpec(BaseModel):
if not self.get_node(self.entry_node):
errors.append(f"Entry node '{self.entry_node}' not found")
-# Check async entry points
-seen_entry_ids = set()
-for entry_point in self.async_entry_points:
-# Check for duplicate IDs
-if entry_point.id in seen_entry_ids:
-errors.append(f"Duplicate async entry point ID: '{entry_point.id}'")
-seen_entry_ids.add(entry_point.id)
-# Check entry node exists
-if not self.get_node(entry_point.entry_node):
-errors.append(
-f"Async entry point '{entry_point.id}' references "
-f"missing node '{entry_point.entry_node}'"
-)
-# Validate isolation level
-valid_isolation = {"isolated", "shared", "synchronized"}
-if entry_point.isolation_level not in valid_isolation:
-errors.append(
-f"Async entry point '{entry_point.id}' has invalid isolation_level "
-f"'{entry_point.isolation_level}'. Valid: {valid_isolation}"
-)
-# Validate trigger type
-valid_triggers = {"webhook", "api", "timer", "event", "manual"}
-if entry_point.trigger_type not in valid_triggers:
-errors.append(
-f"Async entry point '{entry_point.id}' has invalid trigger_type "
-f"'{entry_point.trigger_type}'. Valid: {valid_triggers}"
-)
# Check terminal nodes exist
for term in self.terminal_nodes:
if not self.get_node(term):
@@ -646,10 +586,6 @@ class GraphSpec(BaseModel):
for entry_point_node in self.entry_points.values():
to_visit.append(entry_point_node)
-# Add all async entry points as valid starting points
-for async_entry in self.async_entry_points:
-to_visit.append(async_entry.entry_node)
# Traverse from all entry points
while to_visit:
current = to_visit.pop()
@@ -666,18 +602,10 @@ class GraphSpec(BaseModel):
for sub_agent_id in sub_agents:
reachable.add(sub_agent_id)
-# Build set of async entry point nodes for quick lookup
-async_entry_nodes = {ep.entry_node for ep in self.async_entry_points}
for node in self.nodes:
if node.id not in reachable:
-# Skip if node is a pause node, entry point target, or async entry
-# (pause/resume architecture and async entry points make reachable)
-if (
-node.id in self.pause_nodes
-or node.id in self.entry_points.values()
-or node.id in async_entry_nodes
-):
+# Skip if node is a pause node or entry point target
+if node.id in self.pause_nodes or node.id in self.entry_points.values():
continue
errors.append(f"Node '{node.id}' is unreachable from entry")
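The reachability check in `GraphSpec.validate` is a worklist traversal from the entry node plus any named entry points. This is a minimal sketch of that idea, assuming nodes are plain string IDs and edges are `(source, target)` tuples; `unreachable_nodes` and its parameters are hypothetical, and sub-agent and pause-node handling are left out.

```python
def unreachable_nodes(nodes, edges, entry_node, extra_entries=()):
    """Return node IDs that cannot be reached from any starting point."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    reachable = set()
    to_visit = [entry_node, *extra_entries]
    while to_visit:
        current = to_visit.pop()
        if current in reachable:
            continue  # already expanded; also guards against cycles
        reachable.add(current)
        to_visit.extend(adjacency.get(current, []))

    return [n for n in nodes if n not in reachable]


nodes = ["start", "work", "done", "orphan"]
edges = [("start", "work"), ("work", "done"), ("done", "start")]
print(unreachable_nodes(nodes, edges, "start"))
```

After this change only `entry_node` and the `entry_points` values seed the traversal, so a node that was previously reachable solely through an async entry point would now be reported as unreachable.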
+318 -81
@@ -36,6 +36,21 @@ from framework.runtime.llm_debug_logger import log_llm_turn
logger = logging.getLogger(__name__)
+@dataclass
+class TriggerEvent:
+"""A framework-level trigger signal (timer tick or webhook hit).
+Triggers are queued separately from user messages / external events
+and drained atomically so the LLM sees all pending triggers at once.
+"""
+trigger_type: str # "timer" | "webhook"
+source_id: str # entry point ID or webhook route ID
+payload: dict[str, Any] = field(default_factory=dict)
+timestamp: float = field(default_factory=time.time)
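The docstring says triggers are drained atomically so the model sees every pending trigger at once. The body of `_drain_trigger_queue` is not part of this diff, so the following is only a sketch of one way to do that with `asyncio.Queue.get_nowait()`; the `drain` helper and the sample events are assumptions, and `TriggerEvent` is redeclared here so the example is self-contained.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TriggerEvent:
    trigger_type: str  # "timer" | "webhook"
    source_id: str
    payload: dict[str, Any] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


def drain(queue: asyncio.Queue) -> list[TriggerEvent]:
    """Remove and return everything currently queued, without awaiting."""
    events = []
    while True:
        try:
            events.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            break
    return events


async def main() -> list[TriggerEvent]:
    queue: asyncio.Queue = asyncio.Queue()
    await queue.put(TriggerEvent("timer", "daily-report"))
    await queue.put(TriggerEvent("webhook", "zendesk", {"ticket": 42}))
    return drain(queue)


events = asyncio.run(main())
print([e.trigger_type for e in events])
```

Because `drain` never awaits, no other coroutine can enqueue a trigger halfway through, which is what makes the batch consistent within a single event-loop iteration.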
# Pattern for detecting context-window-exceeded errors across LLM providers.
_CONTEXT_TOO_LARGE_RE = re.compile(
r"context.{0,20}(length|window|limit|size)|"
@@ -73,6 +88,7 @@ class _EscalationReceiver:
def __init__(self) -> None:
self._event = asyncio.Event()
self._response: str | None = None
+self._awaiting_input = True # So inject_worker_message() can prefer us
async def inject_event(self, content: str, *, is_client_input: bool = False) -> None:
"""Called by ExecutionStream.inject_input() when the user responds."""
@@ -134,7 +150,7 @@ class SubagentJudge:
async def evaluate(self, context: dict[str, Any]) -> JudgeVerdict:
missing = context.get("missing_keys", [])
if not missing:
-return JudgeVerdict(action="ACCEPT")
+return JudgeVerdict(action="ACCEPT", feedback="")
iteration = context.get("iteration", 0)
remaining = self._max_iterations - iteration - 1
@@ -168,8 +184,8 @@ class LoopConfig:
max_tool_calls_per_turn: int = 30
judge_every_n_turns: int = 1
stall_detection_threshold: int = 3
-stall_similarity_threshold: float = 0.7
-max_history_tokens: int = 32_000
+stall_similarity_threshold: float = 0.85
+max_context_tokens: int = 32_000
store_prefix: str = ""
# Overflow margin for max_tool_calls_per_turn. Tool calls are only
@@ -345,6 +361,7 @@ class EventLoopNode(NodeProtocol):
self._tool_executor = tool_executor
self._conversation_store = conversation_store
self._injection_queue: asyncio.Queue[tuple[str, bool]] = asyncio.Queue()
+self._trigger_queue: asyncio.Queue[TriggerEvent] = asyncio.Queue()
# Client-facing input blocking state
self._input_ready = asyncio.Event()
self._awaiting_input = False
@@ -511,7 +528,7 @@ class EventLoopNode(NodeProtocol):
conversation = NodeConversation(
system_prompt=system_prompt,
max_history_tokens=self._config.max_history_tokens,
max_context_tokens=self._config.max_context_tokens,
output_keys=ctx.node_spec.output_keys or None,
store=self._conversation_store,
)
@@ -548,6 +565,8 @@ class EventLoopNode(NodeProtocol):
tools.append(set_output_tool)
if ctx.node_spec.client_facing and not ctx.event_triggered:
tools.append(self._build_ask_user_tool())
if stream_id == "queen":
tools.append(self._build_ask_user_multiple_tool())
# Workers/subagents can escalate blockers to the queen.
if stream_id not in ("queen", "judge"):
tools.append(self._build_escalate_tool())
@@ -628,12 +647,15 @@ class EventLoopNode(NodeProtocol):
# 6b. Drain injection queue
await self._drain_injection_queue(conversation)
# 6b1. Drain trigger queue (framework-level signals)
await self._drain_trigger_queue(conversation)
# 6b2. Dynamic tool refresh (mode switching)
if ctx.dynamic_tools_provider is not None:
_synthetic_names = {
"set_output",
"ask_user",
"ask_user_multiple",
"escalate",
"delegate_to_sub_agent",
"report_to_parent",
@@ -652,8 +674,20 @@ class EventLoopNode(NodeProtocol):
conversation.update_system_prompt(_new_prompt)
logger.info("[%s] Dynamic prompt updated (phase switch)", node_id)
# 6c. Publish iteration event
await self._publish_iteration(stream_id, node_id, iteration, execution_id)
# 6c. Publish iteration event (with per-iteration metadata when available)
_iter_meta = None
if ctx.iteration_metadata_provider is not None:
try:
_iter_meta = ctx.iteration_metadata_provider()
except Exception:
pass
await self._publish_iteration(
stream_id,
node_id,
iteration,
execution_id,
extra_data=_iter_meta,
)
# 6d. Pre-turn compaction check (tiered)
_compacted_this_iter = False
@@ -684,6 +718,7 @@ class EventLoopNode(NodeProtocol):
queen_input_requested,
request_system_prompt,
request_messages,
reported_to_parent,
) = await self._run_single_turn(
ctx, conversation, tools, iteration, accumulator
)
@@ -710,6 +745,7 @@ class EventLoopNode(NodeProtocol):
model=turn_tokens.get("model", ""),
input_tokens=turn_tokens.get("input", 0),
output_tokens=turn_tokens.get("output", 0),
cached_tokens=turn_tokens.get("cached", 0),
execution_id=execution_id,
iteration=iteration,
)
@@ -885,6 +921,7 @@ class EventLoopNode(NodeProtocol):
and not outputs_set
and not user_input_requested
and not queen_input_requested
and not reported_to_parent
)
if truly_empty and accumulator is not None:
missing = self._get_missing_output_keys(
@@ -1055,7 +1092,13 @@ class EventLoopNode(NodeProtocol):
mcp_tool_calls = [
tc
for tc in logged_tool_calls
if tc.get("tool_name") not in ("set_output", "ask_user", "escalate")
if tc.get("tool_name")
not in (
"set_output",
"ask_user",
"ask_user_multiple",
"escalate",
)
]
if mcp_tool_calls:
fps = self._fingerprint_tool_calls(mcp_tool_calls)
@@ -1249,9 +1292,28 @@ class EventLoopNode(NodeProtocol):
iteration,
_cf_auto,
)
# Check for multi-question batch from ask_user_multiple
multi_qs = getattr(self, "_pending_multi_questions", None)
self._pending_multi_questions = None
got_input = await self._await_user_input(
ctx, prompt=_cf_prompt, options=ask_user_options
ctx,
prompt=_cf_prompt,
options=ask_user_options,
questions=multi_qs,
)
# Emit deferred tool_call_completed for ask_user / ask_user_multiple
deferred = getattr(self, "_deferred_tool_complete", None)
if deferred:
self._deferred_tool_complete = None
await self._publish_tool_completed(
deferred["stream_id"],
deferred["node_id"],
deferred["tool_use_id"],
deferred["tool_name"],
deferred["content"],
deferred["is_error"],
deferred["execution_id"],
)
logger.info("[%s] iter=%d: unblocked, got_input=%s", node_id, iteration, got_input)
if not got_input:
await self._publish_loop_completed(
@@ -1335,8 +1397,8 @@ class EventLoopNode(NodeProtocol):
# Auto-block beyond grace -- fall through to judge (6i)
# 6h''. Worker wait for queen guidance
# If a worker escalates with wait_for_response=true, pause here and
# skip judge evaluation until queen injects guidance.
# When a worker escalates, pause here and skip judge evaluation
# until the queen injects guidance.
if queen_input_requested:
if self._shutdown:
await self._publish_loop_completed(
@@ -1557,7 +1619,7 @@ class EventLoopNode(NodeProtocol):
node_type="event_loop",
step_index=iteration,
verdict="ACCEPT",
verdict_feedback=verdict.feedback,
verdict_feedback=verdict.feedback or "",
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
@@ -1600,7 +1662,7 @@ class EventLoopNode(NodeProtocol):
node_type="event_loop",
step_index=iteration,
verdict="ESCALATE",
verdict_feedback=verdict.feedback,
verdict_feedback=verdict.feedback or "",
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
@@ -1642,7 +1704,7 @@ class EventLoopNode(NodeProtocol):
node_type="event_loop",
step_index=iteration,
verdict="RETRY",
verdict_feedback=verdict.feedback,
verdict_feedback=verdict.feedback or "",
tool_calls=logged_tool_calls,
llm_text=assistant_text,
input_tokens=turn_tokens.get("input", 0),
@@ -1706,6 +1768,15 @@ class EventLoopNode(NodeProtocol):
await self._injection_queue.put((content, is_client_input))
self._input_ready.set()
async def inject_trigger(self, trigger: TriggerEvent) -> None:
"""Inject a framework-level trigger into the running queen loop.
Triggers are queued separately from user messages and drained
atomically via _drain_trigger_queue().
"""
await self._trigger_queue.put(trigger)
self._input_ready.set()
def signal_shutdown(self) -> None:
"""Signal the node to exit its loop cleanly.
@@ -1733,6 +1804,7 @@ class EventLoopNode(NodeProtocol):
prompt: str = "",
*,
options: list[str] | None = None,
questions: list[dict] | None = None,
emit_client_request: bool = True,
) -> bool:
"""Block until user input arrives or shutdown is signaled.
@@ -1747,15 +1819,17 @@ class EventLoopNode(NodeProtocol):
options: Optional predefined choices for the user (from ask_user).
Passed through to the CLIENT_INPUT_REQUESTED event so the
frontend can render a QuestionWidget with buttons.
questions: Optional list of question dicts for ask_user_multiple.
Each dict has id, prompt, and optional options.
emit_client_request: When False, wait silently without publishing
CLIENT_INPUT_REQUESTED. Used for worker waits where input is
expected from the queen via inject_worker_message().
Returns True if input arrived, False if shutdown was signaled.
"""
# If messages arrived while the LLM was processing, skip blocking
# entirely — the next _drain_injection_queue() will pick them up.
if not self._injection_queue.empty():
# If messages or triggers arrived while the LLM was processing, skip
# blocking — the next drain pass will pick them up.
if not self._injection_queue.empty() or not self._trigger_queue.empty():
return True
# Clear BEFORE emitting so that synchronous handlers (e.g. the
@@ -1771,6 +1845,7 @@ class EventLoopNode(NodeProtocol):
prompt=prompt,
execution_id=ctx.execution_id or "",
options=options,
questions=questions,
)
self._awaiting_input = True
@@ -1803,12 +1878,13 @@ class EventLoopNode(NodeProtocol):
bool,
str,
list[dict[str, Any]],
bool,
]:
"""Run a single LLM turn with streaming and tool execution.
Returns (assistant_text, real_tool_results, outputs_set, token_counts, logged_tool_calls,
user_input_requested, ask_user_prompt, ask_user_options, queen_input_requested,
system_prompt, messages).
system_prompt, messages, reported_to_parent).
``real_tool_results`` contains only results from actual tools (web_search,
etc.), NOT from synthetic framework tools such as ``set_output``,
@@ -1818,8 +1894,8 @@ class EventLoopNode(NodeProtocol):
``ask_user`` during this turn. This separation lets the caller treat
synthetic tools as framework concerns rather than tool-execution concerns.
``queen_input_requested`` is True when the worker called
``escalate(wait_for_response=true)`` and should wait for
queen guidance before judge evaluation.
``escalate`` and should wait for queen guidance before judge
evaluation.
``logged_tool_calls`` accumulates ALL tool calls across inner iterations
(real tools, set_output, and discarded calls) for L3 logging. Unlike
@@ -1829,7 +1905,7 @@ class EventLoopNode(NodeProtocol):
stream_id = ctx.stream_id or ctx.node_id
node_id = ctx.node_id
execution_id = ctx.execution_id or ""
token_counts: dict[str, int] = {"input": 0, "output": 0}
token_counts: dict[str, int] = {"input": 0, "output": 0, "cached": 0}
tool_call_count = 0
final_text = ""
final_system_prompt = conversation.system_prompt
@@ -1840,6 +1916,7 @@ class EventLoopNode(NodeProtocol):
ask_user_prompt = ""
ask_user_options: list[str] | None = None
queen_input_requested = False
reported_to_parent = False
# Accumulate ALL tool calls across inner iterations for L3 logging.
# Unlike real_tool_results (reset each inner iteration), this persists.
logged_tool_calls: list[dict] = []
@@ -1909,6 +1986,7 @@ class EventLoopNode(NodeProtocol):
elif isinstance(event, FinishEvent):
token_counts["input"] += event.input_tokens
token_counts["output"] += event.output_tokens
token_counts["cached"] += event.cached_tokens
token_counts["stop_reason"] = event.stop_reason
token_counts["model"] = event.model
@@ -1993,6 +2071,7 @@ class EventLoopNode(NodeProtocol):
queen_input_requested,
final_system_prompt,
final_messages,
reported_to_parent,
)
# Execute tool calls — framework tools (set_output, ask_user)
@@ -2136,11 +2215,65 @@ class EventLoopNode(NodeProtocol):
)
results_by_id[tc.tool_use_id] = result
elif tc.tool_name == "ask_user_multiple":
# --- Framework-level ask_user_multiple ---
user_input_requested = True
raw_questions = tc.tool_input.get("questions", [])
if not isinstance(raw_questions, list) or len(raw_questions) < 2:
result = ToolResult(
tool_use_id=tc.tool_use_id,
content=(
"ERROR: questions must be an array of at "
"least 2 question objects. Use ask_user "
"for single questions."
),
is_error=True,
)
results_by_id[tc.tool_use_id] = result
user_input_requested = False
continue
# Normalize each question entry
questions: list[dict] = []
for i, q in enumerate(raw_questions):
if not isinstance(q, dict):
continue
qid = str(q.get("id", f"q{i + 1}"))
prompt = str(q.get("prompt", ""))
opts = q.get("options", None)
if isinstance(opts, list):
opts = [str(o) for o in opts if o]
if len(opts) < 2:
opts = None
else:
opts = None
questions.append(
{
"id": qid,
"prompt": prompt,
**({"options": opts} if opts else {}),
}
)
# Store as multi-question prompt/options for
# the event emission path
ask_user_prompt = ""
ask_user_options = None
# Stash the full questions list on the instance; the
# input-wait path reads and clears it before blocking
self._pending_multi_questions = questions
result = ToolResult(
tool_use_id=tc.tool_use_id,
content="Waiting for user input...",
is_error=False,
)
results_by_id[tc.tool_use_id] = result
elif tc.tool_name == "escalate":
# --- Framework-level escalate handling ---
reason = str(tc.tool_input.get("reason", "")).strip()
context = str(tc.tool_input.get("context", "")).strip()
# Always wait for queen guidance
if stream_id in ("queen", "judge"):
result = ToolResult(
@@ -2195,6 +2328,7 @@ class EventLoopNode(NodeProtocol):
elif tc.tool_name == "report_to_parent":
# --- Report from sub-agent to parent (optionally blocking) ---
reported_to_parent = True
msg = tc.tool_input.get("message", "")
data = tc.tool_input.get("data")
wait = tc.tool_input.get("wait_for_response", False)
@@ -2382,6 +2516,7 @@ class EventLoopNode(NodeProtocol):
if tc.tool_name not in (
"set_output",
"ask_user",
"ask_user_multiple",
"escalate",
"delegate_to_sub_agent",
"report_to_parent",
@@ -2402,15 +2537,27 @@ class EventLoopNode(NodeProtocol):
content=result.content,
is_error=result.is_error,
)
await self._publish_tool_completed(
stream_id,
node_id,
tc.tool_use_id,
tc.tool_name,
result.content,
result.is_error,
execution_id,
)
if tc.tool_name in ("ask_user", "ask_user_multiple"):
# Defer tool_call_completed until after user responds
self._deferred_tool_complete = {
"stream_id": stream_id,
"node_id": node_id,
"tool_use_id": tc.tool_use_id,
"tool_name": tc.tool_name,
"content": result.content,
"is_error": result.is_error,
"execution_id": execution_id,
}
else:
await self._publish_tool_completed(
stream_id,
node_id,
tc.tool_use_id,
tc.tool_name,
result.content,
result.is_error,
execution_id,
)
# If the limit was hit, add error results for every remaining
# tool call so the conversation stays consistent. Without this,
@@ -2451,7 +2598,7 @@ class EventLoopNode(NodeProtocol):
# next turn. The char-based token estimator underestimates
# actual API tokens, so the standard compaction check in the
# outer loop may not trigger in time.
protect = max(2000, self._config.max_history_tokens // 12)
protect = max(2000, self._config.max_context_tokens // 12)
pruned = await conversation.prune_old_tool_results(
protect_tokens=protect,
min_prune_tokens=max(1000, protect // 3),
@@ -2460,7 +2607,7 @@ class EventLoopNode(NodeProtocol):
logger.info(
"Post-limit pruning: cleared %d old tool results (budget: %d)",
pruned,
self._config.max_history_tokens,
self._config.max_context_tokens,
)
# Limit hit — return from this turn so the judge can
# evaluate instead of looping back for another stream.
@@ -2476,11 +2623,12 @@ class EventLoopNode(NodeProtocol):
queen_input_requested,
final_system_prompt,
final_messages,
reported_to_parent,
)
# --- Mid-turn pruning: prevent context blowup within a single turn ---
if conversation.usage_ratio() >= 0.6:
protect = max(2000, self._config.max_history_tokens // 12)
protect = max(2000, self._config.max_context_tokens // 12)
pruned = await conversation.prune_old_tool_results(
protect_tokens=protect,
min_prune_tokens=max(1000, protect // 3),
@@ -2507,6 +2655,7 @@ class EventLoopNode(NodeProtocol):
queen_input_requested,
final_system_prompt,
final_messages,
reported_to_parent,
)
# Tool calls processed -- loop back to stream with updated conversation
@@ -2572,6 +2721,72 @@ class EventLoopNode(NodeProtocol):
},
)
def _build_ask_user_multiple_tool(self) -> Tool:
"""Build the synthetic ask_user_multiple tool for batched questions.
Queen-only tool that presents multiple questions at once so the user
can answer them all in a single interaction rather than one at a time.
"""
return Tool(
name="ask_user_multiple",
description=(
"Ask the user multiple questions at once. Use this instead of "
"ask_user when you have 2 or more questions to ask in the same "
"turn — it lets the user answer everything in one go rather than "
"going back and forth. Each question can have its own predefined "
"options (2-3 choices) or be free-form. The UI renders all "
"questions together with a single Submit button. "
"ALWAYS prefer this over ask_user when you have multiple things "
"to clarify. "
"IMPORTANT: Do NOT repeat the questions in your text response — "
"the widget renders them. Keep your text to a brief intro only. "
'Example: {"questions": ['
' {"id": "scope", "prompt": "What scope?", "options": ["Full", "Partial"]},'
' {"id": "format", "prompt": "Output format?", "options": ["PDF", "CSV", "JSON"]},'
' {"id": "details", "prompt": "Any special requirements?"}'
"]}"
),
parameters={
"type": "object",
"properties": {
"questions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": (
"Short identifier for this question (used in the response)."
),
},
"prompt": {
"type": "string",
"description": "The question text shown to the user.",
},
"options": {
"type": "array",
"items": {"type": "string"},
"description": (
"2-3 predefined choices. The UI appends an "
"'Other' free-text input automatically. "
"Omit only when the user must type a free-form answer."
),
"minItems": 2,
"maxItems": 3,
},
},
"required": ["id", "prompt"],
},
"minItems": 2,
"maxItems": 8,
"description": "List of questions to present to the user.",
},
},
"required": ["questions"],
},
)
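The handler for `ask_user_multiple` shown earlier in this diff rejects batches with fewer than two questions, then normalizes each entry. A minimal standalone sketch of that normalization follows. The function name `normalize_questions` and the use of `ValueError` are assumptions for illustration; the diff itself returns an error `ToolResult` instead of raising.

```python
from typing import Any


def normalize_questions(raw: Any) -> list[dict]:
    """Hypothetical helper mirroring the normalization in the diff."""
    if not isinstance(raw, list) or len(raw) < 2:
        # The diff returns an error ToolResult here; raising is an assumption.
        raise ValueError("questions must be an array of at least 2 question objects")

    questions: list[dict] = []
    for i, q in enumerate(raw):
        if not isinstance(q, dict):
            continue  # non-dict entries are skipped silently
        qid = str(q.get("id", f"q{i + 1}"))  # fall back to a positional id
        prompt = str(q.get("prompt", ""))
        opts = q.get("options")
        if isinstance(opts, list):
            opts = [str(o) for o in opts if o]  # drop empty/falsy options
            if len(opts) < 2:
                opts = None  # fewer than 2 choices degrades to free-form
        else:
            opts = None
        entry = {"id": qid, "prompt": prompt}
        if opts:
            entry["options"] = opts
        questions.append(entry)
    return questions
```

Note that the length check runs on the raw list, so a batch can still shrink below two questions after non-dict entries are skipped.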
def _build_set_output_tool(self, output_keys: list[str] | None) -> Tool | None:
"""Build the synthetic set_output tool for explicit output declaration."""
if not output_keys:
@@ -2906,7 +3121,7 @@ class EventLoopNode(NodeProtocol):
phase_description=ctx.node_spec.description,
success_criteria=ctx.node_spec.success_criteria,
accumulator_state=accumulator.to_dict(),
max_history_tokens=self._config.max_history_tokens,
max_context_tokens=self._config.max_context_tokens,
)
if verdict.action != "ACCEPT":
return JudgeVerdict(
@@ -2914,7 +3129,7 @@ class EventLoopNode(NodeProtocol):
feedback=verdict.feedback or "Phase criteria not met.",
)
return JudgeVerdict(action="ACCEPT")
return JudgeVerdict(action="ACCEPT", feedback="")
# -------------------------------------------------------------------
# Helpers
@@ -2994,8 +3209,10 @@ class EventLoopNode(NodeProtocol):
def _is_stalled(self, recent_responses: list[str]) -> bool:
"""Detect stall using n-gram similarity.
Detects when N consecutive responses have similarity >= threshold.
This catches phrases like "I'm still stuck" vs "I'm stuck".
Detects when ALL N consecutive responses are mutually similar
(>= threshold). A single dissimilar response resets the signal.
This catches phrases like "I'm still stuck" vs "I'm stuck"
without false-positives on "attempt 1" vs "attempt 2".
"""
if len(recent_responses) < self._config.stall_detection_threshold:
return False
@@ -3003,13 +3220,11 @@ class EventLoopNode(NodeProtocol):
return False
threshold = self._config.stall_similarity_threshold
# Check similarity against all recent responses (excluding self)
for i, resp in enumerate(recent_responses):
# Compare against all previous responses
for prev in recent_responses[:i]:
if self._ngram_similarity(resp, prev) >= threshold:
return True
return False
# Every consecutive pair must be similar
for i in range(1, len(recent_responses)):
if self._ngram_similarity(recent_responses[i], recent_responses[i - 1]) < threshold:
return False
return True
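The behavioral change in `_is_stalled` (any similar pair versus every consecutive pair) can be sketched in isolation. The real `_ngram_similarity` is not shown in this diff, so the character-trigram Jaccard below is a hypothetical stand-in, and the function names and `window` parameter are assumptions.

```python
def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Hypothetical stand-in: Jaccard similarity over character n-grams."""
    def grams(s: str) -> set[str]:
        s = s.lower()
        return {s[i:i + n] for i in range(max(1, len(s) - n + 1))}

    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def is_stalled_any_pair(responses: list[str], threshold: float = 0.85) -> bool:
    """Old behavior: any two similar responses trip the detector."""
    return any(
        ngram_similarity(resp, prev) >= threshold
        for i, resp in enumerate(responses)
        for prev in responses[:i]
    )


def is_stalled_consecutive(
    responses: list[str], window: int = 3, threshold: float = 0.85
) -> bool:
    """New behavior: every consecutive pair must be similar."""
    if len(responses) < window:
        return False
    return all(
        ngram_similarity(responses[i], responses[i - 1]) >= threshold
        for i in range(1, len(responses))
    )
```

With the responses `["I'm stuck", "Trying a completely different approach now", "I'm stuck"]`, the old check fires because the first and third match, while the new check does not because the middle response breaks the chain.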
@staticmethod
def _is_transient_error(exc: BaseException) -> bool:
@@ -3088,10 +3303,11 @@ class EventLoopNode(NodeProtocol):
self,
recent_tool_fingerprints: list[list[tuple[str, str]]],
) -> tuple[bool, str]:
"""Detect doom loop using n-gram similarity on tool inputs.
"""Detect doom loop via exact fingerprint match.
Detects when N consecutive turns have similar tool calls.
Similarity applies to the canonicalized tool input strings.
Detects when N consecutive turns invoke the same tools with
identical (canonicalized) arguments. Different arguments mean
different work, so only exact matches count.
Returns (is_doom_loop, description).
"""
@@ -3104,23 +3320,12 @@ class EventLoopNode(NodeProtocol):
if not first:
return False, ""
# Convert a turn's list of (name, args) pairs to a single comparable string.
def _turn_sig(fp: list[tuple[str, str]]) -> str:
return "|".join(f"{name}:{args}" for name, args in fp)
first_sig = _turn_sig(first)
similarity_threshold = self._config.stall_similarity_threshold
similar_count = sum(
1
for fp in recent_tool_fingerprints
if self._ngram_similarity(_turn_sig(fp), first_sig) >= similarity_threshold
)
if similar_count >= threshold:
tool_names = [name for fp in recent_tool_fingerprints for name, _ in fp]
# All turns in the window must match the first exactly
if all(fp == first for fp in recent_tool_fingerprints[1:]):
tool_names = [name for name, _ in first]
desc = (
f"Doom loop detected: {similar_count}/{len(recent_tool_fingerprints)} "
f"consecutive similar tool calls ({', '.join(tool_names)})"
f"Doom loop detected: {len(recent_tool_fingerprints)} "
f"identical consecutive tool calls ({', '.join(tool_names)})"
)
return True, desc
return False, ""
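A minimal sketch of the exact-match doom-loop check follows. The body of `_fingerprint_tool_calls` is not part of this diff, so the `fingerprint` helper (JSON with sorted keys as the canonical form) and the length guard are assumptions for illustration.

```python
import json


def fingerprint(calls: list[dict]) -> list[tuple[str, str]]:
    """Hypothetical canonicalization: tool name plus sorted-key JSON of its input."""
    return [
        (c["tool_name"], json.dumps(c.get("tool_input", {}), sort_keys=True, default=str))
        for c in calls
    ]


def is_doom_loop(
    recent: list[list[tuple[str, str]]], threshold: int = 3
) -> tuple[bool, str]:
    if len(recent) < threshold:  # assumed guard; the corresponding lines are elided
        return False, ""
    first = recent[0]
    if not first:
        return False, ""
    # All turns in the window must match the first exactly
    if all(fp == first for fp in recent[1:]):
        names = [name for name, _ in first]
        return True, (
            f"Doom loop detected: {len(recent)} "
            f"identical consecutive tool calls ({', '.join(names)})"
        )
    return False, ""
```

Because comparison is on canonicalized strings, argument key order does not matter, but any change in an argument value counts as different work.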
@@ -3356,7 +3561,7 @@ class EventLoopNode(NodeProtocol):
phase_grad = getattr(ctx, "continuous_mode", False)
# --- Step 1: Prune old tool results (free, no LLM) ---
protect = max(2000, self._config.max_history_tokens // 12)
protect = max(2000, self._config.max_context_tokens // 12)
pruned = await conversation.prune_old_tool_results(
protect_tokens=protect,
min_prune_tokens=max(1000, protect // 3),
@@ -3462,7 +3667,7 @@ class EventLoopNode(NodeProtocol):
accumulator,
formatted,
)
summary_budget = max(1024, self._config.max_history_tokens // 2)
summary_budget = max(1024, self._config.max_context_tokens // 2)
try:
response = await ctx.llm.acomplete(
messages=[{"role": "user", "content": prompt}],
@@ -3565,7 +3770,7 @@ class EventLoopNode(NodeProtocol):
elif spec.output_keys:
ctx_lines.append(f"OUTPUTS STILL NEEDED: {', '.join(spec.output_keys)}")
target_tokens = self._config.max_history_tokens // 2
target_tokens = self._config.max_context_tokens // 2
target_chars = target_tokens * 4
node_ctx = "\n".join(ctx_lines)
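The budgets derived from `max_context_tokens` in these hunks can be worked through for the default of 32,000. The sketch below only restates the arithmetic visible in the diff; the function name `compaction_budgets` and the dictionary keys are assumptions.

```python
def compaction_budgets(max_context_tokens: int) -> dict[str, int]:
    """Hypothetical helper collecting the budget formulas used in the diff."""
    protect = max(2000, max_context_tokens // 12)
    target_tokens = max_context_tokens // 2
    return {
        "protect_tokens": protect,
        "min_prune_tokens": max(1000, protect // 3),
        "summary_budget": max(1024, max_context_tokens // 2),
        "target_tokens": target_tokens,
        "target_chars": target_tokens * 4,  # rough 4 chars-per-token estimate
    }


budgets = compaction_budgets(32_000)
```

For 32,000 tokens this gives a protected tail of 2,666 tokens, a minimum prune size of 1,000, and a summary target of 16,000 tokens (64,000 characters).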
@@ -3881,6 +4086,34 @@ class EventLoopNode(NodeProtocol):
break
return count
async def _drain_trigger_queue(self, conversation: NodeConversation) -> int:
"""Drain all pending trigger events as a single batched user message.
Multiple triggers are merged so the LLM sees them atomically and can
reason about all pending triggers before acting.
"""
triggers: list[TriggerEvent] = []
while not self._trigger_queue.empty():
try:
triggers.append(self._trigger_queue.get_nowait())
except asyncio.QueueEmpty:
break
if not triggers:
return 0
parts: list[str] = []
for t in triggers:
task = t.payload.get("task", "")
task_line = f"\nTask: {task}" if task else ""
payload_str = json.dumps(t.payload, default=str)
parts.append(f"[TRIGGER: {t.trigger_type}/{t.source_id}]{task_line}\n{payload_str}")
combined = "\n\n".join(parts)
logger.info("[drain] %d trigger(s): %s", len(triggers), combined[:200])
await conversation.add_user_message(combined)
return len(triggers)
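The drain-and-batch behavior can be exercised without the rest of `EventLoopNode`. The sketch below copies the `TriggerEvent` dataclass and the formatting logic from this diff into free functions; the names `drain`, `format_triggers`, and `demo`, and the sample trigger values, are assumptions for illustration.

```python
import asyncio
import json
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TriggerEvent:
    trigger_type: str  # "timer" | "webhook"
    source_id: str
    payload: dict[str, Any] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


async def drain(queue: "asyncio.Queue[TriggerEvent]") -> list[TriggerEvent]:
    """Pull everything currently queued without blocking."""
    drained: list[TriggerEvent] = []
    while not queue.empty():
        try:
            drained.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            break
    return drained


def format_triggers(triggers: list[TriggerEvent]) -> str:
    """Merge triggers into the single message body used in the diff."""
    parts: list[str] = []
    for t in triggers:
        task = t.payload.get("task", "")
        task_line = f"\nTask: {task}" if task else ""
        payload_str = json.dumps(t.payload, default=str)
        parts.append(f"[TRIGGER: {t.trigger_type}/{t.source_id}]{task_line}\n{payload_str}")
    return "\n\n".join(parts)


async def demo() -> str:
    queue: asyncio.Queue[TriggerEvent] = asyncio.Queue()
    # Sample values are made up for the example.
    await queue.put(TriggerEvent("timer", "daily-report", {"task": "Send the report"}))
    await queue.put(TriggerEvent("webhook", "route-1", {"status": "ok"}))
    return format_triggers(await drain(queue))
```

Running `asyncio.run(demo())` yields one string with two `[TRIGGER: ...]` sections separated by a blank line, which is what the LLM would see as a single user message.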
async def _check_pause(
self,
ctx: NodeContext,
@@ -4015,7 +4248,12 @@ class EventLoopNode(NodeProtocol):
await conversation.add_user_message(result.inject)
async def _publish_iteration(
self, stream_id: str, node_id: str, iteration: int, execution_id: str = ""
self,
stream_id: str,
node_id: str,
iteration: int,
execution_id: str = "",
extra_data: dict | None = None,
) -> None:
if self._event_bus:
await self._event_bus.emit_node_loop_iteration(
@@ -4023,6 +4261,7 @@ class EventLoopNode(NodeProtocol):
node_id=node_id,
iteration=iteration,
execution_id=execution_id,
extra_data=extra_data,
)
async def _publish_llm_turn_complete(
@@ -4033,6 +4272,7 @@ class EventLoopNode(NodeProtocol):
model: str,
input_tokens: int,
output_tokens: int,
cached_tokens: int = 0,
execution_id: str = "",
iteration: int | None = None,
) -> None:
@@ -4044,6 +4284,7 @@ class EventLoopNode(NodeProtocol):
model=model,
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
execution_id=execution_id,
iteration=iteration,
)
@@ -4326,22 +4567,18 @@ class EventLoopNode(NodeProtocol):
registry[escalation_id] = receiver
try:
# Stream message to user (parent's node_id so TUI shows parent talking)
await self._event_bus.emit_client_output_delta(
stream_id=ctx.node_id,
node_id=ctx.node_id,
content=message,
snapshot=message,
execution_id=ctx.execution_id,
)
# Request input (escalation_id for routing response back)
await self._event_bus.emit_client_input_requested(
stream_id=ctx.node_id,
# Escalate to the queen instead of asking the user directly.
# The queen handles the request and injects the response via
# inject_worker_message(), which finds this receiver through
# its _awaiting_input flag.
await self._event_bus.emit_escalation_requested(
stream_id=ctx.stream_id or ctx.node_id,
node_id=escalation_id,
prompt=message,
reason=f"Subagent report (wait_for_response) from {agent_id}",
context=message,
execution_id=ctx.execution_id,
)
# Block until user responds
# Block until queen responds
return await receiver.wait()
finally:
registry.pop(escalation_id, None)
@@ -4448,7 +4685,7 @@ class EventLoopNode(NodeProtocol):
max_iterations=max_iter, # Tighter budget
max_tool_calls_per_turn=self._config.max_tool_calls_per_turn,
tool_call_overflow_margin=self._config.tool_call_overflow_margin,
max_history_tokens=self._config.max_history_tokens,
max_context_tokens=self._config.max_context_tokens,
stall_detection_threshold=self._config.stall_detection_threshold,
max_tool_result_chars=self._config.max_tool_result_chars,
spillover_dir=subagent_spillover,
+16 -3
@@ -34,6 +34,16 @@ from framework.schemas.checkpoint import Checkpoint
from framework.storage.checkpoint_store import CheckpointStore
def _default_max_context_tokens() -> int:
"""Resolve max_context_tokens from global config, falling back to 32000."""
try:
from framework.config import get_max_context_tokens
return get_max_context_tokens()
except Exception:
return 32_000
@dataclass
class ExecutionResult:
"""Result of executing a graph."""
@@ -138,6 +148,7 @@ class GraphExecutor:
tool_provider_map: dict[str, str] | None = None,
dynamic_tools_provider: Callable | None = None,
dynamic_prompt_provider: Callable | None = None,
iteration_metadata_provider: Callable | None = None,
):
"""
Initialize the executor.
@@ -183,6 +194,7 @@ class GraphExecutor:
self.tool_provider_map = tool_provider_map
self.dynamic_tools_provider = dynamic_tools_provider
self.dynamic_prompt_provider = dynamic_prompt_provider
self.iteration_metadata_provider = iteration_metadata_provider
# Parallel execution settings
self.enable_parallel_execution = enable_parallel_execution
@@ -330,7 +342,7 @@ class GraphExecutor:
_depth,
)
else:
max_tokens = getattr(conversation, "_max_history_tokens", 32000)
max_tokens = getattr(conversation, "_max_context_tokens", 32000)
target_tokens = max_tokens // 2
target_chars = target_tokens * 4
@@ -1604,7 +1616,7 @@ class GraphExecutor:
# Return with paused status
return ExecutionResult(
success=False,
error="Execution paused by user",
error="Execution cancelled",
output=saved_memory,
steps_executed=steps,
total_tokens=total_tokens,
@@ -1799,6 +1811,7 @@ class GraphExecutor:
shared_node_registry=self.node_registry, # For subagent escalation routing
dynamic_tools_provider=self.dynamic_tools_provider,
dynamic_prompt_provider=self.dynamic_prompt_provider,
iteration_metadata_provider=self.iteration_metadata_provider,
)
VALID_NODE_TYPES = {
@@ -1872,7 +1885,7 @@ class GraphExecutor:
max_tool_calls_per_turn=lc.get("max_tool_calls_per_turn", 30),
tool_call_overflow_margin=lc.get("tool_call_overflow_margin", 0.5),
stall_detection_threshold=lc.get("stall_detection_threshold", 3),
max_history_tokens=lc.get("max_history_tokens", 32000),
max_context_tokens=lc.get("max_context_tokens", _default_max_context_tokens()),
max_tool_result_chars=lc.get("max_tool_result_chars", 30_000),
spillover_dir=spillover,
hooks=lc.get("hooks", {}),
-203
@@ -1,203 +0,0 @@
"""
Standardized HITL (Human-In-The-Loop) Protocol
This module defines the formal structure for pause/resume interactions
where agents need to gather input from humans.
"""
from dataclasses import dataclass, field
from enum import StrEnum
from typing import Any
class HITLInputType(StrEnum):
"""Type of input expected from human."""
FREE_TEXT = "free_text" # Open-ended text response
STRUCTURED = "structured" # Specific fields to fill
SELECTION = "selection" # Choose from options
APPROVAL = "approval" # Yes/no/modify decision
MULTI_FIELD = "multi_field" # Multiple related inputs
@dataclass
class HITLQuestion:
"""A single question to ask the human."""
id: str
question: str
input_type: HITLInputType = HITLInputType.FREE_TEXT
# For SELECTION type
options: list[str] = field(default_factory=list)
# For STRUCTURED type
fields: dict[str, str] = field(default_factory=dict) # {field_name: description}
# Metadata
required: bool = True
help_text: str = ""
@dataclass
class HITLRequest:
"""
Formal request for human input at a pause node.
This is what the agent produces when it needs human input.
"""
# Context
objective: str # What we're trying to accomplish
current_state: str # Where we are in the process
# What we need
questions: list[HITLQuestion] = field(default_factory=list)
missing_info: list[str] = field(default_factory=list)
# Guidance
instructions: str = ""
examples: list[str] = field(default_factory=list)
# Metadata
request_id: str = ""
node_id: str = ""
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary for serialization."""
return {
"objective": self.objective,
"current_state": self.current_state,
"questions": [
{
"id": q.id,
"question": q.question,
"input_type": q.input_type.value,
"options": q.options,
"fields": q.fields,
"required": q.required,
"help_text": q.help_text,
}
for q in self.questions
],
"missing_info": self.missing_info,
"instructions": self.instructions,
"examples": self.examples,
"request_id": self.request_id,
"node_id": self.node_id,
}
@dataclass
class HITLResponse:
"""
Human's response to a HITL request.
This is what gets passed back when resuming from a pause.
"""
# Original request reference
request_id: str
# Human's answers
answers: dict[str, Any] = field(default_factory=dict) # {question_id: answer}
raw_input: str = "" # Raw text if provided
# Metadata
response_time_ms: int = 0
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary for serialization."""
return {
"request_id": self.request_id,
"answers": self.answers,
"raw_input": self.raw_input,
"response_time_ms": self.response_time_ms,
}
class HITLProtocol:
"""
Standardized protocol for HITL interactions.
Usage in pause nodes:
1. Pause Node: Generates HITLRequest with questions
2. Executor: Saves state and returns request to user
3. User: Provides HITLResponse with answers
4. Resume Node: Processes response and merges into context
"""
@staticmethod
def create_request(
objective: str,
questions: list[HITLQuestion],
missing_info: list[str] | None = None,
node_id: str = "",
) -> HITLRequest:
"""Create a standardized HITL request."""
return HITLRequest(
objective=objective,
current_state="Awaiting clarification",
questions=questions,
missing_info=missing_info or [],
request_id=f"{node_id}_{hash(objective) % 10000}",
node_id=node_id,
)
@staticmethod
def parse_response(
raw_input: str,
request: HITLRequest,
use_haiku: bool = True,
) -> HITLResponse:
"""
Parse human's raw input into structured response.
Maps the raw input to the first question. For multi-question HITL,
the caller should present one question at a time.
"""
response = HITLResponse(request_id=request.request_id, raw_input=raw_input)
# If no questions, just return raw input
if not request.questions:
return response
# Map raw input to first question
response.answers[request.questions[0].id] = raw_input
return response
@staticmethod
def format_for_display(request: HITLRequest) -> str:
"""Format HITL request for user-friendly display."""
parts = []
if request.objective:
parts.append(f"📋 Objective: {request.objective}")
if request.current_state:
parts.append(f"📍 Current State: {request.current_state}")
if request.instructions:
parts.append(f"\n{request.instructions}")
if request.questions:
parts.append(f"\n❓ Questions ({len(request.questions)}):")
for i, q in enumerate(request.questions, 1):
parts.append(f"{i}. {q.question}")
if q.help_text:
parts.append(f" 💡 {q.help_text}")
if q.options:
parts.append(f" Options: {', '.join(q.options)}")
if request.missing_info:
parts.append("\n📝 Missing Information:")
for info in request.missing_info:
parts.append(f"{info}")
if request.examples:
parts.append("\n📚 Examples:")
for example in request.examples:
parts.append(f"{example}")
return "\n".join(parts)
+5
@@ -565,6 +565,11 @@ class NodeContext:
# staging / running) without restarting the conversation.
dynamic_prompt_provider: Any = None # Callable[[], str] | None
# Per-iteration metadata provider — when set, EventLoopNode merges
# the returned dict into node_loop_iteration event data. Used by
# the queen to record the current phase per iteration.
iteration_metadata_provider: Any = None # Callable[[], dict] | None
@dataclass
class NodeResult:
+137 -5
@@ -119,6 +119,29 @@ RATE_LIMIT_BACKOFF_BASE = 2 # seconds
RATE_LIMIT_MAX_DELAY = 120 # seconds - cap to prevent absurd waits
MINIMAX_API_BASE = "https://api.minimax.io/v1"
# Providers that accept cache_control on message content blocks.
# Anthropic: native ephemeral caching. MiniMax & Z-AI/GLM: pass-through to their APIs.
# (OpenAI caches automatically server-side; Groq/Gemini/etc. strip the header.)
_CACHE_CONTROL_PREFIXES = (
"anthropic/",
"claude-",
"minimax/",
"minimax-",
"MiniMax-",
"zai-glm",
"glm-",
)
def _model_supports_cache_control(model: str) -> bool:
return any(model.startswith(p) for p in _CACHE_CONTROL_PREFIXES)
# Kimi For Coding uses an Anthropic-compatible endpoint (no /v1 suffix).
# Claude Code integration uses this format; the /v1 OpenAI-compatible endpoint
# enforces a coding-agent whitelist that blocks unknown User-Agents.
KIMI_API_BASE = "https://api.kimi.com/coding"
# Empty-stream retries use a short fixed delay, not the rate-limit backoff.
# Conversation-structure issues are deterministic — long waits don't help.
EMPTY_STREAM_MAX_RETRIES = 3
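The prefix check for cache-control support earlier in this hunk can be exercised on its own. A small sketch that copies the prefix tuple from the hunk and relies on `str.startswith` accepting a tuple; the model names in the usage lines are illustrative only:

```python
_CACHE_CONTROL_PREFIXES = (
    "anthropic/",
    "claude-",
    "minimax/",
    "minimax-",
    "MiniMax-",
    "zai-glm",
    "glm-",
)


def model_supports_cache_control(model: str) -> bool:
    # str.startswith accepts a tuple, which is equivalent to the
    # any(...) generator used in the hunk.
    return model.startswith(_CACHE_CONTROL_PREFIXES)


supported = model_supports_cache_control("anthropic/some-model")
unsupported = model_supports_cache_control("gpt-some-model")
wrong_case = model_supports_cache_control("Claude-some-model")
```

The match is case-sensitive, which is presumably why both `minimax-` and `MiniMax-` are listed.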
@@ -323,9 +346,21 @@ class LiteLLMProvider(LLMProvider):
api_base: Custom API base URL (for proxies or local deployments)
**kwargs: Additional arguments passed to litellm.completion()
"""
# Kimi For Coding exposes an Anthropic-compatible endpoint at
# https://api.kimi.com/coding (the same format Claude Code uses natively).
# Translate kimi/ prefix to anthropic/ so litellm uses the Anthropic
# Messages API handler and routes to that endpoint — no special headers needed.
_original_model = model
if model.lower().startswith("kimi/"):
model = "anthropic/" + model[len("kimi/") :]
# Normalise api_base: litellm's Anthropic handler appends /v1/messages,
# so the base must be https://api.kimi.com/coding (no /v1 suffix).
# Strip a trailing /v1 in case the user's saved config has the old value.
if api_base and api_base.rstrip("/").endswith("/v1"):
api_base = api_base.rstrip("/")[:-3]
self.model = model
self.api_key = api_key
self.api_base = api_base or self._default_api_base_for_model(model)
self.api_base = api_base or self._default_api_base_for_model(_original_model)
self.extra_kwargs = kwargs
# The Codex ChatGPT backend (chatgpt.com/backend-api/codex) rejects
# several standard OpenAI params: max_output_tokens, stream_options.
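The Kimi routing in this hunk rewrites the model prefix and normalises the base URL. A standalone sketch under two assumptions: that the `/v1` stripping applies only to `kimi/` models (the flattened diff does not show the indentation), and that the default base is resolved in the same step. `normalise_kimi` is a hypothetical helper name and `kimi/k2` an illustrative model name.

```python
from typing import Optional

KIMI_API_BASE = "https://api.kimi.com/coding"


def normalise_kimi(model: str, api_base: Optional[str] = None) -> tuple[str, Optional[str]]:
    """Return (routed_model, api_base) for a possibly kimi/-prefixed model."""
    if not model.lower().startswith("kimi/"):
        return model, api_base
    # Route through the Anthropic handler by swapping the prefix.
    routed = "anthropic/" + model[len("kimi/"):]
    # The Anthropic handler appends /v1/messages itself, so drop a
    # trailing /v1 left over from an older saved config.
    if api_base and api_base.rstrip("/").endswith("/v1"):
        api_base = api_base.rstrip("/")[:-3]
    return routed, api_base or KIMI_API_BASE


routed_model, base = normalise_kimi("kimi/k2", "https://api.kimi.com/coding/v1/")
```

Resolving the default from the original model name matters because the rewritten `anthropic/` name no longer matches the `kimi/` prefix check.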
@@ -350,6 +385,8 @@ class LiteLLMProvider(LLMProvider):
model_lower = model.lower()
if model_lower.startswith("minimax/") or model_lower.startswith("minimax-"):
return MINIMAX_API_BASE
if model_lower.startswith("kimi/"):
return KIMI_API_BASE
return None
def _completion_with_rate_limit_retry(
@@ -689,7 +726,10 @@ class LiteLLMProvider(LLMProvider):
full_messages: list[dict[str, Any]] = []
if system:
full_messages.append({"role": "system", "content": system})
sys_msg: dict[str, Any] = {"role": "system", "content": system}
if _model_supports_cache_control(self.model):
sys_msg["cache_control"] = {"type": "ephemeral"}
full_messages.append(sys_msg)
full_messages.extend(messages)
if json_mode:
@@ -860,7 +900,10 @@ class LiteLLMProvider(LLMProvider):
full_messages: list[dict[str, Any]] = []
if system:
full_messages.append({"role": "system", "content": system})
sys_msg: dict[str, Any] = {"role": "system", "content": system}
if _model_supports_cache_control(self.model):
sys_msg["cache_control"] = {"type": "ephemeral"}
full_messages.append(sys_msg)
full_messages.extend(messages)
# Codex Responses API requires an `instructions` field (system prompt).
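Both hunks above build the system message the same way. A minimal sketch of that step as a pure function; `build_messages` and its boolean parameter are hypothetical, standing in for the provider method and the `_model_supports_cache_control(self.model)` call:

```python
from typing import Any


def build_messages(
    system: str,
    messages: list[dict[str, Any]],
    supports_cache_control: bool,
) -> list[dict[str, Any]]:
    """Prepend the system message, tagging it for caching when supported."""
    full_messages: list[dict[str, Any]] = []
    if system:
        sys_msg: dict[str, Any] = {"role": "system", "content": system}
        if supports_cache_control:
            sys_msg["cache_control"] = {"type": "ephemeral"}
        full_messages.append(sys_msg)
    full_messages.extend(messages)
    return full_messages


user = [{"role": "user", "content": "hi"}]
cached = build_messages("You are helpful.", user, supports_cache_control=True)
plain = build_messages("You are helpful.", user, supports_cache_control=False)
```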
@@ -925,9 +968,26 @@ class LiteLLMProvider(LLMProvider):
response = await litellm.acompletion(**kwargs) # type: ignore[union-attr]
async for chunk in response:
choice = chunk.choices[0] if chunk.choices else None
if not choice:
# Capture usage from the trailing usage-only chunk that
# stream_options={"include_usage": True} sends with empty choices.
if not chunk.choices:
usage = getattr(chunk, "usage", None)
if usage:
input_tokens = getattr(usage, "prompt_tokens", 0) or 0
output_tokens = getattr(usage, "completion_tokens", 0) or 0
logger.debug(
"[tokens] trailing usage chunk: input=%d output=%d model=%s",
input_tokens,
output_tokens,
self.model,
)
else:
logger.debug(
"[tokens] empty-choices chunk with no usage (model=%s)",
self.model,
)
continue
choice = chunk.choices[0]
delta = choice.delta
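The trailing usage-only chunk handling can be illustrated without a live stream. A sketch using `types.SimpleNamespace` objects as stand-ins for stream chunks; `read_trailing_usage` is a hypothetical helper, and only the attribute names (`choices`, `usage`, `prompt_tokens`, `completion_tokens`) come from the hunk:

```python
from types import SimpleNamespace
from typing import Any, Optional


def read_trailing_usage(chunk: Any) -> Optional[tuple[int, int]]:
    """Return (input_tokens, output_tokens) for a usage-only chunk, else None."""
    if chunk.choices:
        return None  # normal content chunk, handled elsewhere
    usage = getattr(chunk, "usage", None)
    if not usage:
        return None  # empty-choices chunk that carries no usage
    return (
        getattr(usage, "prompt_tokens", 0) or 0,
        getattr(usage, "completion_tokens", 0) or 0,
    )


usage_chunk = SimpleNamespace(
    choices=[],
    usage=SimpleNamespace(prompt_tokens=120, completion_tokens=None),
)
empty_chunk = SimpleNamespace(choices=[], usage=None)
tokens = read_trailing_usage(usage_chunk)
```

The `or 0` guards cover providers that send the attribute with a `None` value rather than omitting it.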
@@ -1000,19 +1060,91 @@ class LiteLLMProvider(LLMProvider):
tail_events.append(TextEndEvent(full_text=accumulated_text))
usage = getattr(chunk, "usage", None)
logger.debug(
"[tokens] finish-chunk raw usage: %r (type=%s)",
usage,
type(usage).__name__,
)
cached_tokens = 0
if usage:
input_tokens = getattr(usage, "prompt_tokens", 0) or 0
output_tokens = getattr(usage, "completion_tokens", 0) or 0
_details = getattr(usage, "prompt_tokens_details", None)
cached_tokens = (
getattr(_details, "cached_tokens", 0) or 0
if _details is not None
else getattr(usage, "cache_read_input_tokens", 0) or 0
)
logger.debug(
"[tokens] finish-chunk usage: "
"input=%d output=%d cached=%d model=%s",
input_tokens,
output_tokens,
cached_tokens,
self.model,
)
logger.debug(
"[tokens] finish event: input=%d output=%d cached=%d stop=%s model=%s",
input_tokens,
output_tokens,
cached_tokens,
choice.finish_reason,
self.model,
)
tail_events.append(
FinishEvent(
stop_reason=choice.finish_reason,
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
model=self.model,
)
)
# Fallback: LiteLLM strips usage from yielded chunks before
# returning them to us, but appends the original chunk (with
# usage intact) to response.chunks first. Use LiteLLM's own
# calculate_total_usage() on that accumulated list.
if input_tokens == 0 and output_tokens == 0:
try:
from litellm.litellm_core_utils.streaming_handler import (
calculate_total_usage,
)
_chunks = getattr(response, "chunks", None)
if _chunks:
_usage = calculate_total_usage(chunks=_chunks)
input_tokens = _usage.prompt_tokens or 0
output_tokens = _usage.completion_tokens or 0
_details = getattr(_usage, "prompt_tokens_details", None)
cached_tokens = (
getattr(_details, "cached_tokens", 0) or 0
if _details is not None
else getattr(_usage, "cache_read_input_tokens", 0) or 0
)
logger.debug(
"[tokens] post-loop chunks fallback:"
" input=%d output=%d cached=%d model=%s",
input_tokens,
output_tokens,
cached_tokens,
self.model,
)
# Patch the FinishEvent already queued with 0 tokens
for _i, _ev in enumerate(tail_events):
if isinstance(_ev, FinishEvent) and _ev.input_tokens == 0:
tail_events[_i] = FinishEvent(
stop_reason=_ev.stop_reason,
input_tokens=input_tokens,
output_tokens=output_tokens,
cached_tokens=cached_tokens,
model=_ev.model,
)
break
except Exception as _e:
logger.debug("[tokens] chunks fallback failed: %s", _e)
# Check whether the stream produced any real content.
# (If text deltas were yielded above, has_content is True
# and we skip the retry path — nothing was yielded in vain.)
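The cached-token lookup appears twice in this hunk as a conditional expression. Written out as a helper it reads as follows; a sketch only, with `cached_tokens_from` as a hypothetical name and `SimpleNamespace` objects standing in for usage payloads:

```python
from types import SimpleNamespace
from typing import Any


def cached_tokens_from(usage: Any) -> int:
    """Prefer prompt_tokens_details.cached_tokens, else cache_read_input_tokens."""
    details = getattr(usage, "prompt_tokens_details", None)
    if details is not None:
        return getattr(details, "cached_tokens", 0) or 0
    return getattr(usage, "cache_read_input_tokens", 0) or 0


with_details = SimpleNamespace(prompt_tokens_details=SimpleNamespace(cached_tokens=64))
flat_field = SimpleNamespace(prompt_tokens_details=None, cache_read_input_tokens=32)
neither = SimpleNamespace()
```

Note that when `prompt_tokens_details` is present but reports no cached tokens, the flat field is not consulted, which matches the expression in the hunk.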
+1
@@ -71,6 +71,7 @@ class FinishEvent:
stop_reason: str = ""
input_tokens: int = 0
output_tokens: int = 0
cached_tokens: int = 0
model: str = ""
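With `cached_tokens` on the event, the "patch the queued FinishEvent" step from the provider diff could also be written with `dataclasses.replace`. This is an alternative to the explicit re-construction shown in the diff, not what the diff does, and it assumes `FinishEvent` is a dataclass (the decorator is not visible in this hunk).

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass
class FinishEvent:  # fields copied from the hunk; the decorator is assumed
    stop_reason: str = ""
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0
    model: str = ""


def patch_zero_finish(events: list[Any], inp: int, out: int, cached: int) -> None:
    """Replace the first FinishEvent that was queued with zero input tokens."""
    for i, ev in enumerate(events):
        if isinstance(ev, FinishEvent) and ev.input_tokens == 0:
            events[i] = replace(
                ev, input_tokens=inp, output_tokens=out, cached_tokens=cached
            )
            break


tail = ["text-end", FinishEvent(stop_reason="stop", model="m")]
patch_zero_finish(tail, 100, 20, 40)
```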
-4
@@ -1,4 +0,0 @@
"""MCP servers for worker-bee."""
# Don't auto-import servers to avoid double-import issues when running with -m
__all__ = []
+1 -33
@@ -1,33 +1 @@
"""Framework-level worker monitoring package.
Provides the Worker Health Judge: a reusable secondary graph that attaches to
any worker agent runtime and monitors its execution health via periodic log
inspection. Emits structured EscalationTickets when degradation is detected.
Usage::
from framework.monitoring import HEALTH_JUDGE_ENTRY_POINT, judge_goal, judge_graph
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
# Register tools bound to the worker runtime's EventBus
monitoring_registry = ToolRegistry()
register_worker_monitoring_tools(monitoring_registry, worker_runtime._event_bus, storage_path)
# Load judge as secondary graph on the worker runtime
await worker_runtime.add_graph(
graph_id="judge",
graph=judge_graph,
goal=judge_goal,
entry_points={"health_check": HEALTH_JUDGE_ENTRY_POINT},
storage_subpath="graphs/judge",
)
"""
from .judge import HEALTH_JUDGE_ENTRY_POINT, judge_goal, judge_graph, judge_node
__all__ = [
"HEALTH_JUDGE_ENTRY_POINT",
"judge_goal",
"judge_graph",
"judge_node",
]
"""Framework-level worker monitoring package."""
-258
@@ -1,258 +0,0 @@
"""Worker Health Judge — framework-level reusable monitoring graph.
Attaches to any worker agent runtime as a secondary graph. Fires on a
2-minute timer, reads the worker's session logs via ``get_worker_health_summary``,
accumulates observations in a continuous conversation context, and emits a
structured ``EscalationTicket`` when it detects a degradation pattern.
Usage::
from framework.monitoring import judge_graph, judge_goal, HEALTH_JUDGE_ENTRY_POINT
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
# Register tools bound to the worker runtime's event bus
monitoring_registry = ToolRegistry()
register_worker_monitoring_tools(
monitoring_registry, worker_runtime._event_bus, storage_path
)
monitoring_tools = list(monitoring_registry.get_tools().values())
monitoring_executor = monitoring_registry.get_executor()
# Load judge as secondary graph on the worker runtime
await worker_runtime.add_graph(
graph_id="judge",
graph=judge_graph,
goal=judge_goal,
entry_points={"health_check": HEALTH_JUDGE_ENTRY_POINT},
storage_subpath="graphs/judge",
)
Design:
- ``isolation_level="isolated"``: the judge has its own memory, not
polluting the worker's shared memory namespace.
- ``conversation_mode="continuous"``: the judge's conversation carries
across timer ticks. The conversation IS the judge's memory. It tracks
trends by referring to its own prior messages ("Last check I saw 47
steps; now 52; 5 new steps, 3 RETRY").
- No shared memory keys. No external state files.
"""
from __future__ import annotations
from framework.graph import Constraint, Goal, NodeSpec, SuccessCriterion
from framework.graph.edge import AsyncEntryPointSpec, GraphSpec
# ---------------------------------------------------------------------------
# Goal
# ---------------------------------------------------------------------------
judge_goal = Goal(
id="worker-health-monitor",
name="Worker Health Monitor",
description=(
"Periodically assess the health of the worker agent by reading its "
"execution logs. Detect degradation patterns (excessive retries, "
"stalls, doom loops) and emit structured EscalationTickets when the "
"worker needs attention."
),
success_criteria=[
SuccessCriterion(
id="accurate-detection",
description="Only escalates genuine degradation, not normal retry cycles",
metric="false_positive_rate",
target="low",
weight=0.5,
),
SuccessCriterion(
id="timely-detection",
description="Detects genuine stalls within 2 timer ticks (≤4 minutes)",
metric="detection_latency_minutes",
target="<=4",
weight=0.5,
),
],
constraints=[
Constraint(
id="conservative-escalation",
description=(
"Do not escalate on a single bad verdict or a brief stall. "
"Require clear patterns (10+ consecutive bad verdicts or 4+ minute stall) "
"before creating a ticket."
),
constraint_type="hard",
category="quality",
),
Constraint(
id="complete-ticket",
description=(
"Every EscalationTicket must have all required fields filled. "
"Do not emit partial or placeholder tickets."
),
constraint_type="hard",
category="correctness",
),
],
)
# ---------------------------------------------------------------------------
# Node
# ---------------------------------------------------------------------------
judge_node = NodeSpec(
id="judge",
name="Worker Health Judge",
description=(
"Autonomous health monitor for worker agents. Reads execution logs "
"on each timer tick, compares to prior observations (via conversation "
"history), and emits a structured EscalationTicket when a genuine "
"degradation pattern is detected."
),
node_type="event_loop",
client_facing=False, # Autonomous monitor, not interactive
max_node_visits=0, # Unbounded — runs on every timer tick
input_keys=[],
output_keys=["health_verdict"],
nullable_output_keys=["health_verdict"],
success_criteria=(
"A clear health verdict is produced each check: either 'healthy' with "
"a brief observation, or a complete EscalationTicket is emitted via "
"emit_escalation_ticket and health_verdict describes the issue."
),
tools=[
"get_worker_health_summary",
"emit_escalation_ticket",
],
system_prompt="""\
You are the Worker Health Judge. You run every 2 minutes alongside a worker \
agent to monitor its execution health.
# Your Role
You observe the worker's iteration patterns over time and escalate only when \
you see genuine degradation, not normal retry cycles. Your conversation history \
IS your memory. On each check, refer to your previous observations to track trends.
# Check Procedure
On each timer tick (every 2 minutes):
## Step 1: Read health snapshot
Call get_worker_health_summary() with no arguments to auto-discover the active \
session. This returns:
- worker_agent_id: the worker's agent name — use this for ticket identity fields
- worker_graph_id: the worker's primary graph ID — use this for ticket identity fields
- session_id: the session being monitored; use this for worker_session_id in tickets
- total_steps: how many log steps have been recorded
- recent_verdicts: list of recent ACCEPT/RETRY/CONTINUE verdicts
- steps_since_last_accept: consecutive non-ACCEPT steps
- stall_minutes: wall-clock since last step (null if active)
- evidence_snippet: recent LLM output
## Step 2: Compare to prior check
Look at your conversation history. What was total_steps last time?
- If total_steps is UNCHANGED from prior check AND prior check was also unchanged:
STALL confirmed (worker has produced no new iterations in 4+ minutes).
Escalate with severity="high" or "critical" depending on stall duration.
- If total_steps increased: worker is making progress. Examine verdicts.
## Step 3: Analyze verdict pattern
- Healthy: Mix of ACCEPT and RETRY, steps_since_last_accept < 5. No action.
- Warning: steps_since_last_accept is 5-9. Note it, no escalation yet.
- Degraded: steps_since_last_accept >= 10. Examine evidence_snippet.
- If evidence shows the agent is making real progress (complex reasoning,
exploring solutions, productive tool use): may be a hard problem. Note it.
- If evidence shows a loop (same error, same tool call, no new information):
Escalate with severity="medium" or "high".
- Critical: steps_since_last_accept >= 20, OR stall_minutes >= 4.
Escalate with severity="critical".
## Step 4: Decide
### If healthy:
set_output("health_verdict", "healthy: <brief observation>")
Done.
### If escalating:
Build an EscalationTicket JSON string with ALL required fields:
{
"worker_agent_id": "<worker_agent_id from get_worker_health_summary>",
"worker_session_id": "<session_id from get_worker_health_summary>",
"worker_node_id": "<worker_graph_id from get_worker_health_summary>",
"worker_graph_id": "<worker_graph_id from get_worker_health_summary>",
"severity": "<low|medium|high|critical>",
"cause": "<what you observed — concrete, specific>",
"judge_reasoning": "<why you decided to escalate, not just dismiss>",
"suggested_action": "<what you recommend: restart, human review, etc.>",
"recent_verdicts": [<list from get_worker_health_summary>],
"total_steps_checked": <int>,
"steps_since_last_accept": <int>,
"stall_minutes": <float or null>,
"evidence_snippet": "<from get_worker_health_summary>"
}
Call: emit_escalation_ticket(ticket_json=<the JSON string above>)
Then: set_output("health_verdict", "escalated: <one-line summary>")
# Severity Guide
- low: Mild concern, worth noting. 5-9 consecutive bad verdicts.
- medium: Clear degradation pattern. 10-15 bad verdicts or brief stall (1-2 min).
- high: Serious issue. 15+ bad verdicts or stall 2-4 minutes or clear doom loop.
- critical: Worker is definitively stuck. 20+ bad verdicts or stall > 4 minutes.
# Conservative Bias
You MUST resist the urge to escalate prematurely. Worker agents naturally retry.
A node may legitimately need 5-8 retries before succeeding. Do not escalate unless:
1. The pattern is clear and sustained across your observation window, AND
2. The evidence shows no genuine progress
One missed escalation is less costly than two false alarms. The Queen will filter \
further. But do not be passive: genuine stalls and doom loops must be caught.
# Rules
- Never escalate on the FIRST check unless stall_minutes > 4
- Always call get_worker_health_summary FIRST before deciding anything
- All ticket fields are REQUIRED; do not submit partial tickets
- After any emit_escalation_ticket call, always set_output to complete the check
""",
)
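The numeric thresholds from Step 3 of the judge prompt can be summarised as a small function. This is a sketch of the thresholds only: the removed judge is an LLM node that also weighs `evidence_snippet` and its own conversation history, none of which is modelled here, and `classify_health` is a hypothetical name.

```python
from typing import Optional


def classify_health(steps_since_last_accept: int, stall_minutes: Optional[float]) -> str:
    """Map the two numeric signals onto the prompt's four bands."""
    stalled = stall_minutes is not None and stall_minutes >= 4
    if steps_since_last_accept >= 20 or stalled:
        return "critical"
    if steps_since_last_accept >= 10:
        return "degraded"  # the prompt then asks for an evidence review
    if steps_since_last_accept >= 5:
        return "warning"   # noted, but no escalation yet
    return "healthy"


band = classify_health(steps_since_last_accept=12, stall_minutes=0.5)
```

`stall_minutes` is `None` while the worker is active, so the stall check has to guard against it before comparing.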
# ---------------------------------------------------------------------------
# Entry Point
# ---------------------------------------------------------------------------
HEALTH_JUDGE_ENTRY_POINT = AsyncEntryPointSpec(
id="health_check",
name="Worker Health Check",
entry_node="judge",
trigger_type="timer",
trigger_config={
"interval_minutes": 2,
"run_immediately": True, # Fire immediately to establish a baseline
},
isolation_level="isolated", # Own memory namespace, not polluting worker's
)
# ---------------------------------------------------------------------------
# Graph
# ---------------------------------------------------------------------------
judge_graph = GraphSpec(
id="judge-graph",
goal_id=judge_goal.id,
version="1.0.0",
entry_node="judge",
entry_points={"health_check": "judge"},
terminal_nodes=["judge"], # Judge node can terminate after each check
pause_nodes=[],
nodes=[judge_node],
edges=[],
conversation_mode="continuous", # Conversation persists across timer ticks
async_entry_points=[HEALTH_JUDGE_ENTRY_POINT],
loop_config={
"max_iterations": 10, # One check shouldn't take many turns
"max_tool_calls_per_turn": 3, # get_summary + optionally emit_ticket
"max_history_tokens": 16000, # Compact — judge only needs recent context
},
)
+3 -2
@@ -148,8 +148,9 @@ class HumanReadableFormatter(logging.Formatter):
if record_event is not None:
event = f" [{record_event}]"
# Format message: [LEVEL] [trace context] message
return f"{color}[{level}]{reset} {context_prefix}{record.getMessage()}{event}"
timestamp = self.formatTime(record, "%Y-%m-%d %H:%M:%S")
# Format message: TIMESTAMP [LEVEL] [trace context] message
return f"{timestamp} {color}[{level}]{reset} {context_prefix}{record.getMessage()}{event}"
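The timestamp prefix added in this hunk comes from `logging.Formatter.formatTime`. A reduced sketch of the same idea, leaving out the colour codes, trace context and event suffix that the real formatter adds; `TimestampedFormatter` is a hypothetical class name:

```python
import logging


class TimestampedFormatter(logging.Formatter):
    """Prefix each line with a wall-clock timestamp, then [LEVEL] message."""

    def format(self, record: logging.LogRecord) -> str:
        timestamp = self.formatTime(record, "%Y-%m-%d %H:%M:%S")
        return f"{timestamp} [{record.levelname}] {record.getMessage()}"


record = logging.LogRecord(
    name="demo",
    level=logging.INFO,
    pathname="demo.py",
    lineno=1,
    msg="server %s",
    args=("started",),
    exc_info=None,
)
line = TimestampedFormatter().format(record)
```

With this date format the timestamp is always 19 characters, which keeps log columns aligned.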
def configure_logging(
+81 -391
@@ -51,11 +51,7 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
action="store_true",
help="Show detailed execution logs (steps, LLM calls, etc.)",
)
run_parser.add_argument(
"--tui",
action="store_true",
help="Launch interactive terminal dashboard",
)
run_parser.add_argument(
"--model",
"-m",
@@ -194,143 +190,6 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
shell_parser.set_defaults(func=cmd_shell)
# tui command (interactive agent dashboard)
tui_parser = subparsers.add_parser(
"tui",
help="Launch interactive TUI dashboard",
description="Browse available agents and launch the terminal dashboard.",
)
tui_parser.add_argument(
"--model",
"-m",
type=str,
default=None,
help="LLM model to use (any LiteLLM-compatible name)",
)
tui_parser.set_defaults(func=cmd_tui)
# sessions command group (checkpoint/resume management)
sessions_parser = subparsers.add_parser(
"sessions",
help="Manage agent sessions",
description="List, inspect, and manage agent execution sessions.",
)
sessions_subparsers = sessions_parser.add_subparsers(
dest="sessions_cmd",
help="Session management commands",
)
# sessions list
sessions_list_parser = sessions_subparsers.add_parser(
"list",
help="List agent sessions",
description="List all sessions for an agent.",
)
sessions_list_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_list_parser.add_argument(
"--status",
choices=["all", "active", "failed", "completed", "paused"],
default="all",
help="Filter by session status (default: all)",
)
sessions_list_parser.add_argument(
"--has-checkpoints",
action="store_true",
help="Show only sessions with checkpoints",
)
sessions_list_parser.set_defaults(func=cmd_sessions_list)
# sessions show
sessions_show_parser = sessions_subparsers.add_parser(
"show",
help="Show session details",
description="Display detailed information about a specific session.",
)
sessions_show_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_show_parser.add_argument(
"session_id",
type=str,
help="Session ID to inspect",
)
sessions_show_parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
sessions_show_parser.set_defaults(func=cmd_sessions_show)
# sessions checkpoints
sessions_checkpoints_parser = sessions_subparsers.add_parser(
"checkpoints",
help="List session checkpoints",
description="List all checkpoints for a session.",
)
sessions_checkpoints_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
sessions_checkpoints_parser.add_argument(
"session_id",
type=str,
help="Session ID",
)
sessions_checkpoints_parser.set_defaults(func=cmd_sessions_checkpoints)
# pause command
pause_parser = subparsers.add_parser(
"pause",
help="Pause running session",
description="Request graceful pause of a running agent session.",
)
pause_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
pause_parser.add_argument(
"session_id",
type=str,
help="Session ID to pause",
)
pause_parser.set_defaults(func=cmd_pause)
# resume command
resume_parser = subparsers.add_parser(
"resume",
help="Resume session from checkpoint",
description="Resume a paused or failed session from a checkpoint.",
)
resume_parser.add_argument(
"agent_path",
type=str,
help="Path to agent folder",
)
resume_parser.add_argument(
"session_id",
type=str,
help="Session ID to resume",
)
resume_parser.add_argument(
"--checkpoint",
"-c",
type=str,
help="Specific checkpoint ID to resume from (default: latest)",
)
resume_parser.add_argument(
"--tui",
action="store_true",
help="Resume in TUI dashboard mode",
)
resume_parser.set_defaults(func=cmd_resume)
# setup-credentials command
setup_creds_parser = subparsers.add_parser(
"setup-credentials",
@@ -384,6 +243,8 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
action="store_true",
help="Open dashboard in browser after server starts",
)
serve_parser.add_argument("--verbose", "-v", action="store_true", help="Enable INFO log level")
serve_parser.add_argument("--debug", action="store_true", help="Enable DEBUG log level")
serve_parser.set_defaults(func=cmd_serve)
# open command (serve + auto-open browser)
@@ -421,6 +282,8 @@ def register_commands(subparsers: argparse._SubParsersAction) -> None:
default=None,
help="LLM model for preloaded agents",
)
open_parser.add_argument("--verbose", "-v", action="store_true", help="Enable INFO log level")
open_parser.add_argument("--debug", action="store_true", help="Enable DEBUG log level")
open_parser.set_defaults(func=cmd_open)
@@ -516,18 +379,18 @@ def _prompt_before_start(agent_path: str, runner, model: str | None = None):
def cmd_run(args: argparse.Namespace) -> int:
"""Run an exported agent."""
import logging
from framework.credentials.models import CredentialError
from framework.observability import configure_logging
from framework.runner import AgentRunner
# Set logging level (quiet by default for cleaner output)
if args.quiet:
logging.basicConfig(level=logging.ERROR, format="%(message)s")
configure_logging(level="ERROR")
elif getattr(args, "verbose", False):
logging.basicConfig(level=logging.INFO, format="%(message)s")
configure_logging(level="INFO")
else:
logging.basicConfig(level=logging.WARNING, format="%(message)s")
configure_logging(level="WARNING")
# Load input context
context = {}
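The level selection in `cmd_run` reduces to a three-way choice between the flags. A sketch of just that choice, returning the level string that the hunk passes to `configure_logging`; `resolve_run_log_level` is a hypothetical helper and `configure_logging` itself is not reproduced here:

```python
import argparse


def resolve_run_log_level(args: argparse.Namespace) -> str:
    """Quiet wins over verbose; the default stays at WARNING."""
    if getattr(args, "quiet", False):
        return "ERROR"
    if getattr(args, "verbose", False):
        return "INFO"
    return "WARNING"


level = resolve_run_log_level(argparse.Namespace(quiet=False, verbose=True))
```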
@@ -562,128 +425,67 @@ def cmd_run(args: argparse.Namespace) -> int:
)
return 1
# Run the agent (with TUI or standard)
if getattr(args, "tui", False):
from framework.tui.app import AdenTUI
# Standard execution
# AgentRunner handles credential setup interactively when stdin is a TTY.
try:
runner = AgentRunner.load(
args.agent_path,
model=args.model,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return 1
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
async def run_with_tui():
try:
# Load runner inside the async loop to ensure strict loop affinity
# (only one load — avoids spawning duplicate MCP subprocesses)
# AgentRunner handles credential setup interactively when stdin is a TTY.
try:
runner = AgentRunner.load(
args.agent_path,
model=args.model,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
except Exception as e:
print(f"Error loading agent: {e}")
return
# Prompt before starting (allows credential updates)
if sys.stdin.isatty() and not args.quiet:
runner = _prompt_before_start(args.agent_path, runner, args.model)
if runner is None:
return 1
# Prompt before starting (allows credential updates)
if sys.stdin.isatty():
runner = _prompt_before_start(args.agent_path, runner, args.model)
if runner is None:
return
# Force setup inside the loop
if runner._agent_runtime is None:
try:
runner._setup()
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
# Start runtime before TUI so it's ready for user input
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
app = AdenTUI(
runner._agent_runtime,
resume_session=getattr(args, "resume_session", None),
resume_checkpoint=getattr(args, "checkpoint", None),
)
# TUI handles execution via ChatRepl — user submits input,
# ChatRepl calls runtime.trigger_and_wait(). No auto-launch.
await app.run_async()
except Exception as e:
import traceback
traceback.print_exc()
print(f"TUI error: {e}")
await runner.cleanup_async()
return None
asyncio.run(run_with_tui())
print("TUI session ended.")
return 0
else:
# Standard execution — load runner here (not shared with TUI path)
# AgentRunner handles credential setup interactively when stdin is a TTY.
try:
runner = AgentRunner.load(
args.agent_path,
model=args.model,
# Load session/checkpoint state for resume (headless mode)
session_state = None
resume_session = getattr(args, "resume_session", None)
checkpoint = getattr(args, "checkpoint", None)
if resume_session:
session_state = _load_resume_state(args.agent_path, resume_session, checkpoint)
if session_state is None:
print(
f"Error: Could not load session state for {resume_session}",
file=sys.stderr,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return 1
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
# Prompt before starting (allows credential updates)
if sys.stdin.isatty() and not args.quiet:
runner = _prompt_before_start(args.agent_path, runner, args.model)
if runner is None:
return 1
# Load session/checkpoint state for resume (headless mode)
session_state = None
resume_session = getattr(args, "resume_session", None)
checkpoint = getattr(args, "checkpoint", None)
if resume_session:
session_state = _load_resume_state(args.agent_path, resume_session, checkpoint)
if session_state is None:
print(
f"Error: Could not load session state for {resume_session}",
file=sys.stderr,
)
return 1
if not args.quiet:
resume_node = session_state.get("paused_at", "unknown")
if checkpoint:
print(f"Resuming from checkpoint: {checkpoint}")
else:
print(f"Resuming session: {resume_session}")
print(f"Resume point: {resume_node}")
print()
# Auto-inject user_id if the agent expects it but it's not provided
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
if "user_id" in entry_input_keys and context.get("user_id") is None:
import os
context["user_id"] = os.environ.get("USER", "default_user")
if not args.quiet:
info = runner.info()
print(f"Agent: {info.name}")
print(f"Goal: {info.goal_name}")
print(f"Steps: {info.node_count}")
print(f"Input: {json.dumps(context)}")
print()
print("=" * 60)
print("Executing agent...")
print("=" * 60)
resume_node = session_state.get("paused_at", "unknown")
if checkpoint:
print(f"Resuming from checkpoint: {checkpoint}")
else:
print(f"Resuming session: {resume_session}")
print(f"Resume point: {resume_node}")
print()
result = asyncio.run(runner.run(context, session_state=session_state))
# Auto-inject user_id if the agent expects it but it's not provided
entry_input_keys = runner.graph.nodes[0].input_keys if runner.graph.nodes else []
if "user_id" in entry_input_keys and context.get("user_id") is None:
import os
context["user_id"] = os.environ.get("USER", "default_user")
if not args.quiet:
info = runner.info()
print(f"Agent: {info.name}")
print(f"Goal: {info.goal_name}")
print(f"Steps: {info.node_count}")
print(f"Input: {json.dumps(context)}")
print()
print("=" * 60)
print("Executing agent...")
print("=" * 60)
print()
result = asyncio.run(runner.run(context, session_state=session_state))
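The `user_id` auto-injection above depends only on the entry node's input keys and the environment. A sketch with the environment passed in explicitly so it can be exercised without touching `os.environ`; `inject_user_id` is a hypothetical helper name:

```python
import os
from typing import Any, Mapping


def inject_user_id(
    context: dict[str, Any],
    entry_input_keys: list[str],
    environ: Mapping[str, str] = os.environ,
) -> dict[str, Any]:
    """Fill in user_id from $USER when the entry node expects it and it is missing."""
    if "user_id" in entry_input_keys and context.get("user_id") is None:
        context["user_id"] = environ.get("USER", "default_user")
    return context


filled = inject_user_id({}, ["user_id", "query"], environ={"USER": "alice"})
```

An explicitly supplied `user_id` is left untouched, and agents that do not declare the key never receive one.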
# Format output
output = {
@@ -944,6 +746,17 @@ def cmd_dispatch(args: argparse.Namespace) -> int:
if args.agents:
# Use specific agents
for agent_name in args.agents:
# Guard against full paths: if the name contains path separators
# (e.g. "exports/my_agent"), joining it onto agents_dir would repeat the directory prefix.
agent_name_path = Path(agent_name)
if len(agent_name_path.parts) > 1:
print(
f"Error: --agents expects agent names, not paths. "
f"Use: --agents {agent_name_path.name} "
f"instead of --agents {agent_name}",
file=sys.stderr,
)
return 1
agent_path = agents_dir / agent_name
if not _is_valid_agent_dir(agent_path):
print(f"Agent not found: {agent_path}", file=sys.stderr)
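The path guard in this hunk relies on `Path.parts` having more than one element when the argument contains a separator. A sketch of the check in isolation; it uses `PurePosixPath` so the result does not depend on the host OS, whereas the hunk uses `Path`, and `is_bare_agent_name` is a hypothetical helper name:

```python
from pathlib import PurePosixPath


def is_bare_agent_name(agent_name: str) -> bool:
    """True when the argument is a single path component, not a path."""
    return len(PurePosixPath(agent_name).parts) <= 1


ok = is_bare_agent_name("my_agent")
rejected = is_bare_agent_name("exports/my_agent")
suggested = PurePosixPath("exports/my_agent").name  # what the error message recommends
```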
@@ -1109,16 +922,12 @@ def _format_natural_language_to_json(
def cmd_shell(args: argparse.Namespace) -> int:
"""Start an interactive agent session."""
import logging
from framework.credentials.models import CredentialError
from framework.observability import configure_logging
from framework.runner import AgentRunner
# Configure logging to show runtime visibility
logging.basicConfig(
level=logging.INFO,
format="%(message)s", # Simple format for clean output
)
configure_logging(level="INFO")
agents_dir = Path(args.agents_dir)
@@ -1349,75 +1158,6 @@ def _get_framework_agents_dir() -> Path:
return Path(__file__).resolve().parent.parent / "agents"
def _launch_agent_tui(
agent_path: str | Path,
model: str | None = None,
) -> int:
"""Load an agent and launch the TUI. Shared by cmd_tui and cmd_code."""
from framework.credentials.models import CredentialError
from framework.runner import AgentRunner
from framework.tui.app import AdenTUI
async def run_with_tui():
# AgentRunner handles credential setup interactively when stdin is a TTY.
try:
runner = AgentRunner.load(
agent_path,
model=model,
)
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
except Exception as e:
print(f"Error loading agent: {e}")
return
if runner._agent_runtime is None:
try:
runner._setup()
except CredentialError as e:
print(f"\n{e}", file=sys.stderr)
return
if runner._agent_runtime and not runner._agent_runtime.is_running:
await runner._agent_runtime.start()
app = AdenTUI(runner._agent_runtime)
try:
await app.run_async()
except Exception as e:
import traceback
traceback.print_exc()
print(f"TUI error: {e}")
await runner.cleanup_async()
asyncio.run(run_with_tui())
print("TUI session ended.")
return 0
def cmd_tui(args: argparse.Namespace) -> int:
"""Launch the interactive TUI dashboard with in-app agent picker."""
import logging
logging.basicConfig(level=logging.WARNING, format="%(message)s")
from framework.tui.app import AdenTUI
async def run_tui():
app = AdenTUI(
model=args.model,
)
await app.run_async()
asyncio.run(run_tui())
print("TUI session ended.")
return 0
def _extract_python_agent_metadata(agent_path: Path) -> tuple[str, str]:
"""Extract name and description from a Python-based agent's config.py.
@@ -1770,56 +1510,6 @@ def _interactive_multi(agents_dir: Path) -> int:
return 0
def cmd_sessions_list(args: argparse.Namespace) -> int:
"""List agent sessions."""
print("⚠ Sessions list command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Status filter: {args.status}")
print(f"Has checkpoints: {args.has_checkpoints}")
return 1
def cmd_sessions_show(args: argparse.Namespace) -> int:
"""Show detailed session information."""
print("⚠ Session show command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_sessions_checkpoints(args: argparse.Namespace) -> int:
"""List checkpoints for a session."""
print("⚠ Session checkpoints command not yet implemented")
print("This will be available once checkpoint infrastructure is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_pause(args: argparse.Namespace) -> int:
"""Pause a running session."""
print("⚠ Pause command not yet implemented")
print("This will be available once executor pause integration is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
return 1
def cmd_resume(args: argparse.Namespace) -> int:
"""Resume a session from checkpoint."""
print("⚠ Resume command not yet implemented")
print("This will be available once checkpoint resume integration is complete.")
print(f"\nAgent: {args.agent_path}")
print(f"Session: {args.session_id}")
if args.checkpoint:
print(f"Checkpoint: {args.checkpoint}")
if args.tui:
print("Mode: TUI")
return 1
def cmd_setup_credentials(args: argparse.Namespace) -> int:
"""Interactive credential setup for an agent."""
from framework.credentials.setup import CredentialSetupSession
@@ -1935,18 +1625,18 @@ def _build_frontend() -> bool:
def cmd_serve(args: argparse.Namespace) -> int:
"""Start the HTTP API server."""
import logging
from aiohttp import web
_build_frontend()
from framework.observability import configure_logging
from framework.server.app import create_app
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
if getattr(args, "debug", False):
configure_logging(level="DEBUG")
else:
configure_logging(level="INFO")
model = getattr(args, "model", None)
app = create_app(model=model)
+13 -1
@@ -68,6 +68,7 @@ class MCPClient:
self._read_stream = None
self._write_stream = None
self._stdio_context = None # Context manager for stdio_client
self._errlog_handle = None # Track errlog file handle for cleanup
self._http_client: httpx.Client | None = None
self._tools: dict[str, MCPTool] = {}
self._connected = False
@@ -200,7 +201,8 @@ class MCPClient:
if os.name == "nt":
errlog = sys.stderr
else:
errlog = open(os.devnull, "w") # noqa: SIM115
self._errlog_handle = open(os.devnull, "w")
errlog = self._errlog_handle
self._stdio_context = stdio_client(server_params, errlog=errlog)
(
self._read_stream,
@@ -475,6 +477,15 @@ class MCPClient:
finally:
self._stdio_context = None
# Third: close errlog file handle if we opened one
if self._errlog_handle is not None:
try:
self._errlog_handle.close()
except Exception as e:
logger.debug(f"Error closing errlog handle: {e}")
finally:
self._errlog_handle = None
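The errlog change above follows a common pattern: keep a reference to any file the object opens itself, so cleanup can close it later. A minimal sketch of that pattern, assuming a hypothetical `StderrSink` class standing in for the relevant part of `MCPClient` (the class and its method names are illustrative, not part of the codebase):

```python
import os
import sys


class StderrSink:
    """Hypothetical stand-in for the part of MCPClient that owns errlog."""

    def __init__(self) -> None:
        # Set only when this object opened the file itself.
        self._errlog_handle = None

    def open(self):
        """Return a stream for subprocess stderr, tracking any file we open."""
        if os.name == "nt":
            return sys.stderr  # Shared stream: not ours to close
        self._errlog_handle = open(os.devnull, "w")
        return self._errlog_handle

    def close(self) -> None:
        """Close the tracked handle, if any. Safe to call repeatedly."""
        if self._errlog_handle is not None:
            try:
                self._errlog_handle.close()
            finally:
                self._errlog_handle = None
```

Resetting the attribute in `finally` keeps a second `close()` call from touching an already-closed file.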
def disconnect(self) -> None:
"""Disconnect from the MCP server."""
# Clean up persistent STDIO connection
@@ -545,6 +556,7 @@ class MCPClient:
self._write_stream = None
self._loop = None
self._loop_thread = None
self._errlog_handle = None
# Clean up HTTP client
if self._http_client:
+78 -67
@@ -9,14 +9,13 @@ from datetime import UTC
from pathlib import Path
from typing import TYPE_CHECKING, Any
from framework.config import get_hive_config, get_preferred_model
from framework.config import get_hive_config, get_max_context_tokens, get_preferred_model
from framework.credentials.validation import (
ensure_credential_key_env as _ensure_credential_key_env,
)
from framework.graph import Goal
from framework.graph.edge import (
DEFAULT_MAX_TOKENS,
AsyncEntryPointSpec,
EdgeCondition,
EdgeSpec,
GraphSpec,
@@ -517,6 +516,41 @@ def get_codex_account_id() -> str | None:
return None
# ---------------------------------------------------------------------------
# Kimi Code subscription token helpers
# ---------------------------------------------------------------------------
def get_kimi_code_token() -> str | None:
"""Get the API key from a Kimi Code CLI installation.
Reads the API key from ``~/.kimi/config.toml``, which is created when
the user runs ``kimi /login`` in the Kimi Code CLI.
Returns:
The API key if available, None otherwise.
"""
import tomllib
config_path = Path.home() / ".kimi" / "config.toml"
if not config_path.exists():
return None
try:
with open(config_path, "rb") as f:
config = tomllib.load(f)
providers = config.get("providers", {})
# kimi-cli stores credentials under providers.kimi-for-coding
for provider_cfg in providers.values():
if isinstance(provider_cfg, dict):
key = provider_cfg.get("api_key")
if key:
return key
except Exception:
pass
return None
@dataclass
class AgentInfo:
"""Information about an exported agent."""
@@ -535,9 +569,6 @@ class AgentInfo:
constraints: list[dict]
required_tools: list[str]
has_tools_module: bool
# Multi-entry-point support
async_entry_points: list[dict] = field(default_factory=list)
is_multi_entry_point: bool = False
@dataclass
@@ -595,22 +626,6 @@ def load_agent_export(data: str | dict) -> tuple[GraphSpec, Goal]:
)
edges.append(edge)
# Build AsyncEntryPointSpec objects for multi-entry-point support
async_entry_points = []
for aep_data in graph_data.get("async_entry_points", []):
async_entry_points.append(
AsyncEntryPointSpec(
id=aep_data["id"],
name=aep_data.get("name", aep_data["id"]),
entry_node=aep_data["entry_node"],
trigger_type=aep_data.get("trigger_type", "manual"),
trigger_config=aep_data.get("trigger_config", {}),
isolation_level=aep_data.get("isolation_level", "shared"),
priority=aep_data.get("priority", 0),
max_concurrent=aep_data.get("max_concurrent", 10),
)
)
# Build GraphSpec
graph = GraphSpec(
id=graph_data.get("id", "agent-graph"),
@@ -618,7 +633,6 @@ def load_agent_export(data: str | dict) -> tuple[GraphSpec, Goal]:
version=graph_data.get("version", "1.0.0"),
entry_node=graph_data.get("entry_node", ""),
entry_points=graph_data.get("entry_points", {}), # Support pause/resume architecture
async_entry_points=async_entry_points, # Support multi-entry-point agents
terminal_nodes=graph_data.get("terminal_nodes", []),
pause_nodes=graph_data.get("pause_nodes", []), # Support pause/resume architecture
nodes=nodes,
@@ -770,8 +784,6 @@ class AgentRunner:
# AgentRuntime — unified execution path for all agents
self._agent_runtime: AgentRuntime | None = None
self._uses_async_entry_points = self.graph.has_async_entry_points()
# Pre-load validation: structural checks + credentials.
# Fails fast with actionable guidance — no MCP noise on screen.
run_preload_validation(
@@ -891,10 +903,32 @@ class AgentRunner:
if agent_config and hasattr(agent_config, "max_tokens"):
max_tokens = agent_config.max_tokens
logger.info(
"Agent default_config overrides max_tokens: %d "
"(configuration.json value ignored)",
max_tokens,
)
else:
hive_config = get_hive_config()
max_tokens = hive_config.get("llm", {}).get("max_tokens", DEFAULT_MAX_TOKENS)
# Resolve max_context_tokens with priority:
# 1. agent loop_config["max_context_tokens"] (explicit, wins silently)
# 2. agent default_config.max_context_tokens (logged)
# 3. configuration.json llm.max_context_tokens
# 4. hardcoded default (32_000)
agent_loop_config: dict = dict(getattr(agent_module, "loop_config", {}))
if "max_context_tokens" not in agent_loop_config:
if agent_config and hasattr(agent_config, "max_context_tokens"):
agent_loop_config["max_context_tokens"] = agent_config.max_context_tokens
logger.info(
"Agent default_config overrides max_context_tokens: %d"
" (configuration.json value ignored)",
agent_config.max_context_tokens,
)
else:
agent_loop_config["max_context_tokens"] = get_max_context_tokens()
# Read intro_message from agent metadata (shown on TUI load)
agent_metadata = getattr(agent_module, "metadata", None)
intro_message = ""
@@ -908,13 +942,12 @@ class AgentRunner:
"version": "1.0.0",
"entry_node": getattr(agent_module, "entry_node", nodes[0].id),
"entry_points": getattr(agent_module, "entry_points", {}),
"async_entry_points": getattr(agent_module, "async_entry_points", []),
"terminal_nodes": getattr(agent_module, "terminal_nodes", []),
"pause_nodes": getattr(agent_module, "pause_nodes", []),
"nodes": nodes,
"edges": edges,
"max_tokens": max_tokens,
"loop_config": getattr(agent_module, "loop_config", {}),
"loop_config": agent_loop_config,
}
# Only pass optional fields if explicitly defined by the agent module
conversation_mode = getattr(agent_module, "conversation_mode", None)
@@ -1104,6 +1137,7 @@ class AgentRunner:
llm_config = config.get("llm", {})
use_claude_code = llm_config.get("use_claude_code_subscription", False)
use_codex = llm_config.get("use_codex_subscription", False)
use_kimi_code = llm_config.get("use_kimi_code_subscription", False)
api_base = llm_config.get("api_base")
api_key = None
@@ -1119,6 +1153,12 @@ class AgentRunner:
if not api_key:
print("Warning: Codex subscription configured but no token found.")
print("Run 'codex' to authenticate, then try again.")
elif use_kimi_code:
# Get API key from Kimi Code CLI config (~/.kimi/config.toml)
api_key = get_kimi_code_token()
if not api_key:
print("Warning: Kimi Code subscription configured but no key found.")
print("Run 'kimi /login' to authenticate, then try again.")
if api_key and use_claude_code:
# Use litellm's built-in Anthropic OAuth support.
@@ -1149,6 +1189,14 @@ class AgentRunner:
store=False,
allowed_openai_params=["store"],
)
elif api_key and use_kimi_code:
# Kimi Code subscription uses the Kimi coding API (OpenAI-compatible).
# The api_base is set automatically by LiteLLMProvider for kimi/ models.
self._llm = LiteLLMProvider(
model=self.model,
api_key=api_key,
api_base=api_base,
)
else:
# Local models (e.g. Ollama) don't need an API key
if self._is_local_model(self.model):
@@ -1314,6 +1362,8 @@ class AgentRunner:
return "TOGETHER_API_KEY"
elif model_lower.startswith("minimax/") or model_lower.startswith("minimax-"):
return "MINIMAX_API_KEY"
elif model_lower.startswith("kimi/"):
return "KIMI_API_KEY"
else:
# Default: assume OpenAI-compatible
return "OPENAI_API_KEY"
@@ -1334,6 +1384,8 @@ class AgentRunner:
cred_id = "anthropic"
elif model_lower.startswith("minimax/") or model_lower.startswith("minimax-"):
cred_id = "minimax"
elif model_lower.startswith("kimi/"):
cred_id = "kimi"
# Add more mappings as providers are added to LLM_CREDENTIALS
if cred_id is None:
@@ -1375,21 +1427,7 @@ class AgentRunner:
event_bus=None,
) -> None:
"""Set up multi-entry-point execution using AgentRuntime."""
# Convert AsyncEntryPointSpec to EntryPointSpec for AgentRuntime
entry_points = []
for async_ep in self.graph.async_entry_points:
ep = EntryPointSpec(
id=async_ep.id,
name=async_ep.name,
entry_node=async_ep.entry_node,
trigger_type=async_ep.trigger_type,
trigger_config=async_ep.trigger_config,
isolation_level=async_ep.isolation_level,
priority=async_ep.priority,
max_concurrent=async_ep.max_concurrent,
max_resurrections=async_ep.max_resurrections,
)
entry_points.append(ep)
# Always create a primary entry point for the graph's entry node.
# For multi-entry-point agents this ensures the primary path (e.g.
@@ -1696,19 +1734,6 @@ class AgentRunner:
for edge in self.graph.edges
]
# Build async entry points info
async_entry_points_info = [
{
"id": ep.id,
"name": ep.name,
"entry_node": ep.entry_node,
"trigger_type": ep.trigger_type,
"isolation_level": ep.isolation_level,
"max_concurrent": ep.max_concurrent,
}
for ep in self.graph.async_entry_points
]
return AgentInfo(
name=self.graph.id,
description=self.graph.description,
@@ -1735,8 +1760,6 @@ class AgentRunner:
],
required_tools=sorted(required_tools),
has_tools_module=(self.agent_path / "tools.py").exists(),
async_entry_points=async_entry_points_info,
is_multi_entry_point=self._uses_async_entry_points,
)
def validate(self) -> ValidationResult:
@@ -2051,18 +2074,6 @@ Respond with JSON only:
trigger_type="manual",
isolation_level="shared",
)
for aep in runner.graph.async_entry_points:
entry_points[aep.id] = EntryPointSpec(
id=aep.id,
name=aep.name,
entry_node=aep.entry_node,
trigger_type=aep.trigger_type,
trigger_config=aep.trigger_config,
isolation_level=aep.isolation_level,
priority=aep.priority,
max_concurrent=aep.max_concurrent,
)
await runtime.add_graph(
graph_id=gid,
graph=runner.graph,
+2 -2
@@ -454,11 +454,11 @@ An agent has requested handoff to the Hive Coder (via the `escalate` synthetic t
## Worker Health Monitoring
These events form the **judge → queen → operator** escalation pipeline.
These events form the **queen → operator** escalation pipeline.
### `worker_escalation_ticket`
The Worker Health Judge has detected a degradation pattern and is escalating to the Queen.
A worker degradation pattern has been detected and is being escalated to the Queen.
| Data Field | Type | Description |
| ---------- | ------ | ------------------------------------ |
+10 -3
@@ -8,6 +8,7 @@ while preserving the goal-driven approach.
import asyncio
import logging
import time
import uuid
from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import datetime
@@ -822,7 +823,8 @@ class AgentRuntime:
if stream is None:
raise ValueError(f"Entry point '{entry_point_id}' not found")
return await stream.execute(input_data, correlation_id, session_state)
run_id = uuid.uuid4().hex[:12]
return await stream.execute(input_data, correlation_id, session_state, run_id=run_id)
async def trigger_and_wait(
self,
@@ -1359,8 +1361,8 @@ class AgentRuntime:
allowed_keys = set(entry_node.input_keys)
# Search primary graph's streams for an active session.
# Skip isolated streams (e.g. health judge) — they have their own
# session directories and must never be used as a shared session.
# Skip isolated streams — they have their own session directories
# and must never be used as a shared session.
all_streams: list[tuple[str, ExecutionStream]] = []
for _gid, reg in self._graphs.items():
for ep_id, stream in reg.streams.items():
@@ -1531,6 +1533,11 @@ class AgentRuntime:
for executor in stream._active_executors.values():
for node_id, node in executor.node_registry.items():
if getattr(node, "_awaiting_input", False):
# Skip escalation receivers — those are handled
# by the queen via inject_worker_message(), not
# by the user directly.
if ":escalation:" in node_id:
continue
return node_id, graph_id
return None, None
+5 -5
@@ -1,4 +1,4 @@
"""EscalationTicket — structured schema for worker health judge escalations."""
"""EscalationTicket — structured schema for worker health escalations."""
from __future__ import annotations
@@ -10,10 +10,10 @@ from pydantic import BaseModel, Field
class EscalationTicket(BaseModel):
"""Structured escalation report emitted by the Worker Health Judge.
"""Structured escalation report for worker health monitoring.
The judge must fill every field before calling emit_escalation_ticket.
Pydantic validation rejects partial tickets, preventing impulsive escalation.
All fields must be filled before calling emit_escalation_ticket.
Pydantic validation rejects partial tickets.
"""
ticket_id: str = Field(default_factory=lambda: str(uuid4()))
@@ -25,7 +25,7 @@ class EscalationTicket(BaseModel):
worker_node_id: str
worker_graph_id: str
# Problem characterization (filled by judge via LLM deliberation)
# Problem characterization
severity: Literal["low", "medium", "high", "critical"]
cause: str # Human-readable: "Node has produced 18 RETRY verdicts..."
judge_reasoning: str # Judge's own deliberation chain
+186 -6
@@ -97,6 +97,7 @@ class EventType(StrEnum):
# Client I/O (client_facing=True nodes only)
CLIENT_OUTPUT_DELTA = "client_output_delta"
CLIENT_INPUT_REQUESTED = "client_input_requested"
CLIENT_INPUT_RECEIVED = "client_input_received"
# Internal node observability (client_facing=False nodes)
NODE_INTERNAL_OUTPUT = "node_internal_output"
@@ -104,7 +105,7 @@ class EventType(StrEnum):
NODE_STALLED = "node_stalled"
NODE_TOOL_DOOM_LOOP = "node_tool_doom_loop"
# Judge decisions
# Judge decisions (implicit judge in event loop nodes)
JUDGE_VERDICT = "judge_verdict"
# Output tracking
@@ -126,7 +127,7 @@ class EventType(StrEnum):
# Escalation (agent requests handoff to queen)
ESCALATION_REQUESTED = "escalation_requested"
# Worker health monitoring (judge → queen → operator)
# Worker health monitoring
WORKER_ESCALATION_TICKET = "worker_escalation_ticket"
QUEEN_INTERVENTION_REQUESTED = "queen_intervention_requested"
@@ -137,6 +138,12 @@ class EventType(StrEnum):
WORKER_LOADED = "worker_loaded"
CREDENTIALS_REQUIRED = "credentials_required"
# Draft graph (planning phase — lightweight graph preview)
DRAFT_GRAPH_UPDATED = "draft_graph_updated"
# Flowchart map updated (after reconciliation with runtime graph)
FLOWCHART_MAP_UPDATED = "flowchart_map_updated"
# Queen phase changes (building <-> staging <-> running)
QUEEN_PHASE_CHANGED = "queen_phase_changed"
@@ -146,6 +153,13 @@ class EventType(StrEnum):
# Subagent reports (one-way progress updates from sub-agents)
SUBAGENT_REPORT = "subagent_report"
# Trigger lifecycle (queen-level triggers / heartbeats)
TRIGGER_AVAILABLE = "trigger_available"
TRIGGER_ACTIVATED = "trigger_activated"
TRIGGER_DEACTIVATED = "trigger_deactivated"
TRIGGER_FIRED = "trigger_fired"
TRIGGER_REMOVED = "trigger_removed"
@dataclass
class AgentEvent:
@@ -159,10 +173,11 @@ class AgentEvent:
timestamp: datetime = field(default_factory=datetime.now)
correlation_id: str | None = None # For tracking related events
graph_id: str | None = None # Which graph emitted this event (multi-graph sessions)
run_id: str | None = None # Unique ID per trigger() invocation — used for run dividers
def to_dict(self) -> dict:
"""Convert to dictionary for serialization."""
return {
d = {
"type": self.type.value,
"stream_id": self.stream_id,
"node_id": self.node_id,
@@ -172,6 +187,9 @@ class AgentEvent:
"correlation_id": self.correlation_id,
"graph_id": self.graph_id,
}
if self.run_id is not None:
d["run_id"] = self.run_id
return d
# Type for event handlers
@@ -240,6 +258,127 @@ class EventBus:
self._semaphore = asyncio.Semaphore(max_concurrent_handlers)
self._subscription_counter = 0
self._lock = asyncio.Lock()
# Per-session persistent event log (always-on, survives restarts)
self._session_log: IO[str] | None = None
self._session_log_iteration_offset: int = 0
# Accumulator for streaming delta snapshots, flushed on llm_turn_complete.
# Key: (stream_id, node_id, execution_id, iteration) → latest AgentEvent
self._pending_output_snapshots: dict[tuple, AgentEvent] = {}
def set_session_log(self, path: Path, *, iteration_offset: int = 0) -> None:
"""Enable per-session event persistence to a JSONL file.
Called once when the queen starts so that all events survive server
restarts and can be replayed to reconstruct the frontend state.
``iteration_offset`` is added to the ``iteration`` field in logged
events so that cold-resumed sessions produce monotonically increasing
iteration values, preventing frontend message ID collisions between
the original run and resumed runs.
"""
if self._session_log is not None:
try:
self._session_log.close()
except Exception:
pass
path.parent.mkdir(parents=True, exist_ok=True)
self._session_log = open(path, "a", encoding="utf-8") # noqa: SIM115
self._session_log_iteration_offset = iteration_offset
logger.info("Session event log → %s (iteration_offset=%d)", path, iteration_offset)
def close_session_log(self) -> None:
"""Close the per-session event log file."""
# Flush any pending output snapshots before closing
self._flush_pending_snapshots()
if self._session_log is not None:
try:
self._session_log.close()
except Exception:
pass
self._session_log = None
# Event types that are high-frequency streaming deltas — accumulated rather
# than written individually to the session log.
_STREAMING_DELTA_TYPES = frozenset(
{
EventType.CLIENT_OUTPUT_DELTA,
EventType.LLM_TEXT_DELTA,
EventType.LLM_REASONING_DELTA,
}
)
def _write_session_log_event(self, event: AgentEvent) -> None:
"""Write an event to the per-session log with streaming coalescing.
Streaming deltas (client_output_delta, llm_text_delta, llm_reasoning_delta) are accumulated
in memory. When llm_turn_complete fires, any pending snapshots for that
(stream_id, node_id, execution_id) are flushed as single consolidated
events before the turn-complete event itself is written.
Note: iteration offset is already applied in publish() before this is
called, so events here already have correct iteration values.
"""
if self._session_log is None:
return
if event.type in self._STREAMING_DELTA_TYPES:
# Accumulate — keep only the latest event (which carries the full snapshot)
key = (
event.stream_id,
event.node_id,
event.execution_id,
event.data.get("iteration"),
)
self._pending_output_snapshots[key] = event
return
# On turn-complete, flush accumulated snapshots for this stream first
if event.type == EventType.LLM_TURN_COMPLETE:
self._flush_pending_snapshots(
stream_id=event.stream_id,
node_id=event.node_id,
execution_id=event.execution_id,
)
line = json.dumps(event.to_dict(), default=str)
self._session_log.write(line + "\n")
self._session_log.flush()
def _flush_pending_snapshots(
self,
stream_id: str | None = None,
node_id: str | None = None,
execution_id: str | None = None,
) -> None:
"""Flush accumulated streaming snapshots to the session log.
When called with filters, only matching entries are flushed.
When called without filters (e.g. on close), everything is flushed.
"""
if self._session_log is None or not self._pending_output_snapshots:
return
to_flush: list[tuple] = []
for key, _evt in self._pending_output_snapshots.items():
if stream_id is not None:
k_stream, k_node, k_exec, _ = key
if k_stream != stream_id or k_node != node_id or k_exec != execution_id:
continue
to_flush.append(key)
for key in to_flush:
evt = self._pending_output_snapshots.pop(key)
try:
line = json.dumps(evt.to_dict(), default=str)
self._session_log.write(line + "\n")
except Exception:
pass
if to_flush:
try:
self._session_log.flush()
except Exception:
pass
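The coalescing behaviour described above (hold only the latest delta per key, then flush it just before the turn-complete event) can be sketched with plain dicts and any text sink. `CoalescingLog` is a hypothetical class for illustration. It uses the same key tuple as the diff, but represents events as dicts rather than `AgentEvent` objects.

```python
import io
import json

STREAMING_TYPES = frozenset(
    {"client_output_delta", "llm_text_delta", "llm_reasoning_delta"}
)


class CoalescingLog:
    """Hypothetical JSONL writer that coalesces streaming deltas."""

    def __init__(self, sink) -> None:
        self._sink = sink
        self._pending: dict[tuple, dict] = {}

    def write(self, event: dict) -> None:
        if event["type"] in STREAMING_TYPES:
            # Keep only the latest delta per key; it carries the full snapshot.
            key = (
                event["stream_id"],
                event["node_id"],
                event["execution_id"],
                event.get("iteration"),
            )
            self._pending[key] = event
            return
        if event["type"] == "llm_turn_complete":
            self.flush(event["stream_id"], event["node_id"], event["execution_id"])
        self._sink.write(json.dumps(event) + "\n")

    def flush(self, stream_id=None, node_id=None, execution_id=None) -> None:
        """Flush matching pending snapshots, or everything when unfiltered."""
        keys = [
            k
            for k in self._pending
            if stream_id is None or k[:3] == (stream_id, node_id, execution_id)
        ]
        for k in keys:
            self._sink.write(json.dumps(self._pending.pop(k)) + "\n")
```

Because the key does not include the event type, two delta types emitted for the same node and iteration share one slot in this sketch, and the later one replaces the earlier.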
def subscribe(
self,
@@ -305,6 +444,19 @@ class EventBus:
Args:
event: Event to publish
"""
# Apply iteration offset at the source so ALL consumers (SSE subscribers,
# event history, session log) see the same monotonically increasing
# iteration values. Without this, live SSE would use raw iterations
# while events.jsonl would use offset iterations, causing ID collisions
# on the frontend when replaying after cold resume.
if (
self._session_log_iteration_offset
and isinstance(event.data, dict)
and "iteration" in event.data
):
offset = self._session_log_iteration_offset
event.data = {**event.data, "iteration": event.data["iteration"] + offset}
# Add to history
async with self._lock:
self._event_history.append(event)
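The offset step above builds a new `data` dict instead of mutating the one the caller passed in. A minimal sketch of that step as a standalone function (`apply_iteration_offset` is a hypothetical name, not a function in the codebase):

```python
def apply_iteration_offset(data: dict, offset: int) -> dict:
    """Return data with `iteration` shifted by offset, leaving the input untouched."""
    if offset and "iteration" in data:
        return {**data, "iteration": data["iteration"] + offset}
    return data


# Example: a session resumed after 7 earlier iterations continues numbering at 7.
original = {"iteration": 0, "text": "hi"}
shifted = apply_iteration_offset(original, 7)
```

Copying with `{**data, ...}` means any other holder of the original dict still sees the raw iteration value.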
@@ -325,6 +477,15 @@ class EventBus:
except Exception:
pass # never break event delivery
# Per-session persistent log (always-on when set_session_log was called).
# Streaming deltas are coalesced: client_output_delta, llm_text_delta, and
# llm_reasoning_delta are accumulated and flushed as a single snapshot event
# on llm_turn_complete.
if self._session_log is not None:
try:
self._write_session_log_event(event)
except Exception:
pass # never break event delivery
# Find matching subscriptions
matching_handlers: list[EventHandler] = []
@@ -385,6 +546,7 @@ class EventBus:
execution_id: str,
input_data: dict[str, Any] | None = None,
correlation_id: str | None = None,
run_id: str | None = None,
) -> None:
"""Emit execution started event."""
await self.publish(
@@ -394,6 +556,7 @@ class EventBus:
execution_id=execution_id,
data={"input": input_data or {}},
correlation_id=correlation_id,
run_id=run_id,
)
)
@@ -403,6 +566,7 @@ class EventBus:
execution_id: str,
output: dict[str, Any] | None = None,
correlation_id: str | None = None,
run_id: str | None = None,
) -> None:
"""Emit execution completed event."""
await self.publish(
@@ -412,6 +576,7 @@ class EventBus:
execution_id=execution_id,
data={"output": output or {}},
correlation_id=correlation_id,
run_id=run_id,
)
)
@@ -421,6 +586,7 @@ class EventBus:
execution_id: str,
error: str,
correlation_id: str | None = None,
run_id: str | None = None,
) -> None:
"""Emit execution failed event."""
await self.publish(
@@ -430,6 +596,7 @@ class EventBus:
execution_id=execution_id,
data={"error": error},
correlation_id=correlation_id,
run_id=run_id,
)
)
@@ -521,15 +688,19 @@ class EventBus:
node_id: str,
iteration: int,
execution_id: str | None = None,
extra_data: dict[str, Any] | None = None,
) -> None:
"""Emit node loop iteration event."""
data: dict[str, Any] = {"iteration": iteration}
if extra_data:
data.update(extra_data)
await self.publish(
AgentEvent(
type=EventType.NODE_LOOP_ITERATION,
stream_id=stream_id,
node_id=node_id,
execution_id=execution_id,
data={"iteration": iteration},
data=data,
)
)
@@ -616,6 +787,7 @@ class EventBus:
model: str,
input_tokens: int,
output_tokens: int,
cached_tokens: int = 0,
execution_id: str | None = None,
iteration: int | None = None,
) -> None:
@@ -625,6 +797,7 @@ class EventBus:
"model": model,
"input_tokens": input_tokens,
"output_tokens": output_tokens,
"cached_tokens": cached_tokens,
}
if iteration is not None:
data["iteration"] = iteration
@@ -722,16 +895,23 @@ class EventBus:
prompt: str = "",
execution_id: str | None = None,
options: list[str] | None = None,
questions: list[dict] | None = None,
) -> None:
"""Emit client input requested event (client_facing=True nodes).
Args:
options: Optional predefined choices for the user (1-3 items).
The frontend appends an "Other" free-text option automatically.
The frontend appends an "Other" free-text option
automatically.
questions: Optional list of question dicts for multi-question
batches (from ask_user_multiple). Each dict has id,
prompt, and optional options.
"""
data: dict[str, Any] = {"prompt": prompt}
if options:
data["options"] = options
if questions:
data["questions"] = questions
await self.publish(
AgentEvent(
type=EventType.CLIENT_INPUT_REQUESTED,
@@ -994,7 +1174,7 @@ class EventBus:
ticket: dict,
execution_id: str | None = None,
) -> None:
"""Emitted by health judge when worker shows a degradation pattern."""
"""Emitted when worker shows a degradation pattern."""
await self.publish(
AgentEvent(
type=EventType.WORKER_ESCALATION_TICKET,
+97 -7
@@ -9,6 +9,7 @@ Each stream has:
import asyncio
import logging
import os
import time
import uuid
from collections import OrderedDict
@@ -126,6 +127,7 @@ class ExecutionContext:
input_data: dict[str, Any]
isolation_level: IsolationLevel
session_state: dict[str, Any] | None = None # For resuming from pause
run_id: str | None = None # Unique ID per trigger() invocation
started_at: datetime = field(default_factory=datetime.now)
completed_at: datetime | None = None
status: str = "pending" # pending, running, completed, failed, paused
@@ -240,6 +242,7 @@ class ExecutionStream:
self._active_executions: dict[str, ExecutionContext] = {}
self._execution_tasks: dict[str, asyncio.Task] = {}
self._active_executors: dict[str, GraphExecutor] = {}
self._cancel_reasons: dict[str, str] = {}
self._execution_results: OrderedDict[str, ExecutionResult] = OrderedDict()
self._execution_result_times: dict[str, float] = {}
self._completion_events: dict[str, asyncio.Event] = {}
@@ -423,11 +426,36 @@ class ExecutionStream:
return True
return False
async def inject_trigger(
self,
node_id: str,
trigger: Any,
) -> bool:
"""Inject a trigger event into a running queen EventLoopNode.
Searches active executors for a node matching ``node_id`` and calls
its ``inject_trigger()`` method to wake the queen.
Args:
node_id: The queen EventLoopNode ID.
trigger: A ``TriggerEvent`` instance (typed as Any to avoid
circular imports with graph layer).
Returns True if the trigger was delivered, False otherwise.
"""
for executor in self._active_executors.values():
node = executor.node_registry.get(node_id)
if node is not None and hasattr(node, "inject_trigger"):
await node.inject_trigger(trigger)
return True
return False
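The delivery logic of `inject_trigger` is a duck-typed search: the first registry holding a node with that ID and an `inject_trigger` method receives the event. The sketch below models executors as plain dicts. `FakeQueenNode` and `deliver_trigger` are hypothetical names used only for illustration.

```python
import asyncio


class FakeQueenNode:
    """Hypothetical node that records the triggers it receives."""

    def __init__(self) -> None:
        self.received: list = []

    async def inject_trigger(self, trigger) -> None:
        self.received.append(trigger)


async def deliver_trigger(registries: list[dict], node_id: str, trigger) -> bool:
    """Deliver trigger to the first matching node; report whether it landed."""
    for registry in registries:
        node = registry.get(node_id)
        # Duck typing: nodes without inject_trigger are skipped, not errors.
        if node is not None and hasattr(node, "inject_trigger"):
            await node.inject_trigger(trigger)
            return True
    return False
```

A caller can use the boolean result to decide whether to retry or drop the trigger when no queen node is currently active.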
async def execute(
self,
input_data: dict[str, Any],
correlation_id: str | None = None,
session_state: dict[str, Any] | None = None,
run_id: str | None = None,
) -> str:
"""
Queue an execution and return its ID.
@@ -438,6 +466,7 @@ class ExecutionStream:
input_data: Input data for this execution
correlation_id: Optional ID to correlate related executions
session_state: Optional session state to resume from (with paused_at, memory)
run_id: Unique ID for this trigger invocation (for run dividers)
Returns:
Execution ID for tracking
@@ -464,7 +493,7 @@ class ExecutionStream:
node.signal_shutdown()
if hasattr(node, "cancel_current_turn"):
node.cancel_current_turn()
await self.cancel_execution(eid)
await self.cancel_execution(eid, reason="Restarted with new execution")
# When resuming, reuse the original session ID so the execution
# continues in the same session directory instead of creating a new one.
@@ -498,6 +527,7 @@ class ExecutionStream:
input_data=input_data,
isolation_level=self.entry_spec.get_isolation_level(),
session_state=session_state,
run_id=run_id,
)
async with self._lock:
@@ -573,7 +603,9 @@ class ExecutionStream:
execution_id=execution_id,
input_data=ctx.input_data,
correlation_id=ctx.correlation_id,
run_id=ctx.run_id,
)
self._write_run_event(execution_id, ctx.run_id, "run_started")
# Create execution-scoped memory
self._state_manager.create_memory(
@@ -738,6 +770,7 @@ class ExecutionStream:
execution_id=execution_id,
output=result.output,
correlation_id=ctx.correlation_id,
run_id=ctx.run_id,
)
elif result.paused_at:
# The executor returns paused_at on CancelledError but
@@ -755,8 +788,22 @@ class ExecutionStream:
execution_id=execution_id,
error=result.error or "Unknown error",
correlation_id=ctx.correlation_id,
run_id=ctx.run_id,
)
# Write run event for historical restoration
if result.success:
self._write_run_event(execution_id, ctx.run_id, "run_completed")
elif result.paused_at:
self._write_run_event(execution_id, ctx.run_id, "run_paused")
else:
self._write_run_event(
execution_id,
ctx.run_id,
"run_failed",
{"error": result.error or "Unknown error"},
)
logger.debug(f"Execution {execution_id} completed: success={result.success}")
except asyncio.CancelledError:
@@ -801,22 +848,25 @@ class ExecutionStream:
# Emit SSE event so the frontend knows the execution stopped.
# The executor does NOT emit on CancelledError, so there is no
# risk of double-emitting.
cancel_reason = self._cancel_reasons.pop(execution_id, "Execution cancelled")
if self._scoped_event_bus:
if has_result and result.paused_at:
await self._scoped_event_bus.emit_execution_paused(
stream_id=self.stream_id,
node_id=result.paused_at,
reason="Execution cancelled",
reason=cancel_reason,
execution_id=execution_id,
)
else:
await self._scoped_event_bus.emit_execution_failed(
stream_id=self.stream_id,
execution_id=execution_id,
error="Execution cancelled",
error=cancel_reason,
correlation_id=ctx.correlation_id,
run_id=ctx.run_id,
)
self._write_run_event(execution_id, ctx.run_id, "run_cancelled")
# Don't re-raise - we've handled it and saved state
except Exception as e:
@@ -853,7 +903,9 @@ class ExecutionStream:
execution_id=execution_id,
error=str(e),
correlation_id=ctx.correlation_id,
run_id=ctx.run_id,
)
self._write_run_event(execution_id, ctx.run_id, "run_failed", {"error": str(e)})
finally:
# Clean up state
@@ -869,6 +921,36 @@ class ExecutionStream:
self._completion_events.pop(execution_id, None)
self._execution_tasks.pop(execution_id, None)
def _write_run_event(
self,
execution_id: str,
run_id: str | None,
event: str,
extra: dict[str, Any] | None = None,
) -> None:
"""Append a run lifecycle event to runs.jsonl for historical restoration."""
if not self._session_store or not run_id:
return
import json as _json
session_dir = self._session_store.get_session_path(execution_id)
runs_file = session_dir / "runs.jsonl"
now = datetime.now()
record = {
"run_id": run_id,
"event": event,
"timestamp": now.isoformat(),
"created_at": now.timestamp(),
}
if extra:
record.update(extra)
try:
runs_file.parent.mkdir(parents=True, exist_ok=True)
with open(runs_file, "a", encoding="utf-8") as f:
f.write(_json.dumps(record) + "\n")
except OSError:
pass # Non-critical — don't break execution
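The append-only pattern used by `_write_run_event` can be sketched on its own. This is a minimal illustration, not the project's implementation: the function name `append_run_event` and the explicit `runs_file` parameter are assumptions made so the sketch runs without a session store.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional


def append_run_event(
    runs_file: Path,
    run_id: str,
    event: str,
    extra: Optional[Dict[str, Any]] = None,
) -> bool:
    """Append one lifecycle record as a single JSON line.

    Returns False instead of raising on I/O failure, so a logging
    problem never interrupts the execution being logged.
    """
    now = datetime.now()
    record: Dict[str, Any] = {
        "run_id": run_id,
        "event": event,
        "timestamp": now.isoformat(),   # human-readable
        "created_at": now.timestamp(),  # numeric, used for sorting later
    }
    if extra:
        record.update(extra)
    try:
        runs_file.parent.mkdir(parents=True, exist_ok=True)
        # Mode "a" keeps earlier records intact; one record per line.
        with open(runs_file, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
    except OSError:
        return False
    return True
```

Because each record is a complete line, a reader can process the file line by line and skip a line that was cut short by a crash.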
async def _write_session_state(
self,
execution_id: str,
@@ -961,6 +1043,9 @@ class ExecutionStream:
if error:
state.result.error = error
# Stamp the owning process ID for cross-process stale detection
state.pid = os.getpid()
# Write state.json
await self._session_store.write_state(execution_id, state)
logger.debug(f"Wrote state.json for session {execution_id} (status={status})")
@@ -972,8 +1057,8 @@ class ExecutionStream:
def _create_modified_graph(self) -> "GraphSpec":
"""Create a graph with the entry point overridden.
Preserves the original graph's entry_points and async_entry_points
so that validation correctly considers ALL entry nodes reachable.
Preserves the original graph's entry_points so that validation
correctly considers ALL entry nodes reachable.
Each stream only executes from its own entry_node, but the full
graph must validate with all entry points accounted for.
"""
@@ -998,7 +1083,6 @@ class ExecutionStream:
version=self.graph.version,
entry_node=self.entry_spec.entry_node, # Use our entry point
entry_points=merged_entry_points,
async_entry_points=self.graph.async_entry_points,
terminal_nodes=self.graph.terminal_nodes,
pause_nodes=self.graph.pause_nodes,
nodes=self.graph.nodes,
@@ -1054,18 +1138,24 @@ class ExecutionStream:
"""Get execution context."""
return self._active_executions.get(execution_id)
async def cancel_execution(self, execution_id: str) -> bool:
async def cancel_execution(self, execution_id: str, *, reason: str | None = None) -> bool:
"""
Cancel a running execution.
Args:
execution_id: Execution to cancel
reason: Human-readable reason for the cancellation (e.g.
"Stopped by queen", "User requested pause"). If not
provided, defaults to "Execution cancelled".
Returns:
True if cancelled, False if not found
"""
task = self._execution_tasks.get(execution_id)
if task and not task.done():
# Store the reason so the CancelledError handler can use it
# when emitting the pause/fail event.
self._cancel_reasons[execution_id] = reason or "Execution cancelled"
task.cancel()
# Wait briefly for the task to finish. Don't block indefinitely —
# the task may be stuck in a long LLM API call that doesn't
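The hand-off between `cancel_execution` and the `CancelledError` handler can be illustrated in isolation. The sketch below is a toy stand-in, assuming only what the diff shows (a `_cancel_reasons` dict written before `task.cancel()` and popped in the handler). The class name `ReasonedCanceller`, the `emitted` list, and the one-second wait are hypothetical.

```python
import asyncio
from typing import Dict, List, Optional


class ReasonedCanceller:
    """Toy model of the cancel-reason bookkeeping only."""

    def __init__(self) -> None:
        self._tasks: Dict[str, asyncio.Task] = {}
        self._cancel_reasons: Dict[str, str] = {}
        self.emitted: List[str] = []  # stands in for event-bus emissions

    async def _run(self, execution_id: str) -> None:
        try:
            await asyncio.sleep(3600)  # stands in for long-running work
        except asyncio.CancelledError:
            # pop() so a stale reason cannot leak into a later execution.
            reason = self._cancel_reasons.pop(execution_id, "Execution cancelled")
            self.emitted.append(reason)
            # Not re-raised, mirroring the handler in the diff.

    def start(self, execution_id: str) -> None:
        self._tasks[execution_id] = asyncio.create_task(self._run(execution_id))

    async def cancel(self, execution_id: str, *, reason: Optional[str] = None) -> bool:
        task = self._tasks.get(execution_id)
        if task is None or task.done():
            return False
        # Record the reason BEFORE cancelling so the handler can read it.
        self._cancel_reasons[execution_id] = reason or "Execution cancelled"
        task.cancel()
        await asyncio.wait([task], timeout=1.0)  # bounded wait, never indefinite
        return True


async def demo() -> List[str]:
    stream = ReasonedCanceller()
    stream.start("a")
    stream.start("b")
    await asyncio.sleep(0)  # let both tasks reach their first await
    await stream.cancel("a", reason="Execution paused by user")
    await stream.cancel("b")  # falls back to the default reason
    return stream.emitted
```

Storing the reason in a side table avoids relying on the message argument of `Task.cancel()`, which is only available on newer Python versions.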
@@ -17,7 +17,7 @@ from pathlib import Path
import pytest
from framework.graph import Goal
from framework.graph.edge import AsyncEntryPointSpec, EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.edge import EdgeCondition, EdgeSpec, GraphSpec
from framework.graph.goal import Constraint, SuccessCriterion
from framework.graph.node import NodeSpec
from framework.runtime.agent_runtime import AgentRuntime, create_agent_runtime
@@ -101,30 +101,12 @@ def sample_graph():
),
]
async_entry_points = [
AsyncEntryPointSpec(
id="webhook",
name="Webhook Handler",
entry_node="process-webhook",
trigger_type="webhook",
isolation_level="shared",
),
AsyncEntryPointSpec(
id="api",
name="API Handler",
entry_node="process-api",
trigger_type="api",
isolation_level="shared",
),
]
return GraphSpec(
id="test-graph",
goal_id="test-goal",
version="1.0.0",
entry_node="process-webhook",
entry_points={"start": "process-webhook"},
async_entry_points=async_entry_points,
terminal_nodes=["complete"],
pause_nodes=[],
nodes=nodes,
@@ -504,108 +486,6 @@ class TestAgentRuntime:
# === GraphSpec Validation Tests ===
class TestGraphSpecValidation:
"""Tests for GraphSpec with async_entry_points."""
def test_has_async_entry_points(self, sample_graph):
"""Test checking for async entry points."""
assert sample_graph.has_async_entry_points() is True
# Graph without async entry points
simple_graph = GraphSpec(
id="simple",
goal_id="goal",
entry_node="start",
nodes=[],
edges=[],
)
assert simple_graph.has_async_entry_points() is False
def test_get_async_entry_point(self, sample_graph):
"""Test getting async entry point by ID."""
ep = sample_graph.get_async_entry_point("webhook")
assert ep is not None
assert ep.id == "webhook"
assert ep.entry_node == "process-webhook"
ep_not_found = sample_graph.get_async_entry_point("nonexistent")
assert ep_not_found is None
def test_validate_async_entry_points(self):
"""Test validation catches async entry point errors."""
nodes = [
NodeSpec(
id="valid-node",
name="Valid Node",
description="A valid node",
node_type="event_loop",
input_keys=[],
output_keys=[],
),
]
# Invalid entry node
graph = GraphSpec(
id="test",
goal_id="goal",
entry_node="valid-node",
async_entry_points=[
AsyncEntryPointSpec(
id="invalid",
name="Invalid",
entry_node="nonexistent-node",
trigger_type="webhook",
),
],
nodes=nodes,
edges=[],
)
errors = graph.validate()["errors"]
assert any("nonexistent-node" in e for e in errors)
# Invalid isolation level
graph2 = GraphSpec(
id="test",
goal_id="goal",
entry_node="valid-node",
async_entry_points=[
AsyncEntryPointSpec(
id="bad-isolation",
name="Bad Isolation",
entry_node="valid-node",
trigger_type="webhook",
isolation_level="invalid",
),
],
nodes=nodes,
edges=[],
)
errors2 = graph2.validate()["errors"]
assert any("isolation_level" in e for e in errors2)
# Invalid trigger type
graph3 = GraphSpec(
id="test",
goal_id="goal",
entry_node="valid-node",
async_entry_points=[
AsyncEntryPointSpec(
id="bad-trigger",
name="Bad Trigger",
entry_node="valid-node",
trigger_type="invalid_trigger",
),
],
nodes=nodes,
edges=[],
)
errors3 = graph3.validate()["errors"]
assert any("trigger_type" in e for e in errors3)
# === Integration Tests ===
@@ -483,7 +483,6 @@ class TestEventDrivenEntryPoints:
version="1.0.0",
entry_node="process-event",
entry_points={"start": "process-event"},
async_entry_points=[],
terminal_nodes=[],
pause_nodes=[],
nodes=nodes,
+22
@@ -0,0 +1,22 @@
"""Trigger definitions for queen-level heartbeats (timers, webhooks)."""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any
@dataclass
class TriggerDefinition:
"""A registered trigger that can be activated on the queen runtime.
Trigger *definitions* come from the worker's ``triggers.json``.
Activation state is per-session (persisted in ``SessionState.active_triggers``).
"""
id: str
trigger_type: str # "timer" | "webhook"
trigger_config: dict[str, Any] = field(default_factory=dict)
description: str = ""
task: str = ""
active: bool = False
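How definitions and per-session activation might be combined can be sketched as follows. The on-disk layout of `triggers.json` is not shown in this diff, so the JSON array of objects used here is an assumption, as are the helper names `parse_triggers` and `apply_active`. The dataclass is copied from above so the sketch runs on its own.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class TriggerDefinition:
    id: str
    trigger_type: str  # "timer" | "webhook"
    trigger_config: Dict[str, Any] = field(default_factory=dict)
    description: str = ""
    task: str = ""
    active: bool = False


def parse_triggers(raw: str) -> Dict[str, TriggerDefinition]:
    """Parse a JSON array of trigger objects (ASSUMED layout), keyed by ID."""
    out: Dict[str, TriggerDefinition] = {}
    for item in json.loads(raw):
        tdef = TriggerDefinition(
            id=item["id"],
            trigger_type=item["trigger_type"],
            trigger_config=item.get("trigger_config", {}),
            description=item.get("description", ""),
            task=item.get("task", ""),
        )
        out[tdef.id] = tdef
    return out


def apply_active(triggers: Dict[str, TriggerDefinition], active_ids: Iterable[str]) -> None:
    """Mark definitions active from a session's saved ID list.

    IDs that no longer exist in the definitions are ignored.
    """
    wanted = set(active_ids)
    for tdef in triggers.values():
        tdef.active = tdef.id in wanted
```

Keeping definitions separate from activation state means the same agent can have a trigger switched on in one session and off in another.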
+10
@@ -134,6 +134,9 @@ class SessionState(BaseModel):
# Input data (for debugging/replay)
input_data: dict[str, Any] = Field(default_factory=dict)
# Process ID of the owning process (for cross-process stale session detection)
pid: int | None = None
# Isolation level (from ExecutionContext)
isolation_level: str = "shared"
@@ -141,6 +144,13 @@ class SessionState(BaseModel):
checkpoint_enabled: bool = False
latest_checkpoint_id: str | None = None
# Trigger activation state (IDs of triggers the queen/user turned on)
active_triggers: list[str] = Field(default_factory=list)
# Per-trigger task strings (user overrides, keyed by trigger ID)
trigger_tasks: dict[str, str] = Field(default_factory=dict)
# True after first successful worker execution (gates trigger delivery on restart)
worker_configured: bool = Field(default=False)
model_config = {"extra": "allow"}
@computed_field
-36
@@ -1,36 +0,0 @@
"""Backward-compatibility shim.
The primary implementation is now in ``session_manager.py``.
This module re-exports ``SessionManager`` as ``AgentManager`` and
keeps ``AgentSlot`` for test compatibility.
"""
import asyncio
from dataclasses import dataclass
from pathlib import Path
from typing import Any
from framework.server.session_manager import Session, SessionManager # noqa: F401
@dataclass
class AgentSlot:
"""Legacy data class — kept for test compatibility only.
New code should use ``Session`` from ``session_manager``.
"""
id: str
agent_path: Path
runner: Any
runtime: Any
info: Any
loaded_at: float
queen_executor: Any = None
queen_task: asyncio.Task | None = None
judge_task: asyncio.Task | None = None
escalation_sub: str | None = None
# Backward compat alias
AgentManager = SessionManager
+23
@@ -94,6 +94,29 @@ def sessions_dir(session: Session) -> Path:
return Path.home() / ".hive" / "agents" / agent_name / "sessions"
def cold_sessions_dir(session_id: str) -> Path | None:
"""Resolve the worker sessions directory from disk for a cold/stopped session.
Reads agent_path from the queen session's meta.json to find the agent name,
then returns ~/.hive/agents/{agent_name}/sessions/.
Returns None if meta.json is missing or has no agent_path.
"""
import json
meta_path = Path.home() / ".hive" / "queen" / "session" / session_id / "meta.json"
if not meta_path.exists():
return None
try:
meta = json.loads(meta_path.read_text(encoding="utf-8"))
agent_path = meta.get("agent_path")
if not agent_path:
return None
agent_name = Path(agent_path).name
return Path.home() / ".hive" / "agents" / agent_name / "sessions"
except (json.JSONDecodeError, OSError):
return None
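The lookup performed by `cold_sessions_dir` can be exercised without touching the real home directory if the root is passed in. This is a sketch under that assumption: `resolve_cold_sessions_dir` and its `hive_root` parameter are hypothetical, and it catches `ValueError` (the parent of `json.JSONDecodeError`) so an undecodable file is also treated as missing, which is slightly broader than the original.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_cold_sessions_dir(hive_root: Path, session_id: str) -> Optional[Path]:
    """Map a queen session ID to the worker sessions directory, or None.

    Layout taken from the diff:
      <root>/queen/session/<session_id>/meta.json  ->  {"agent_path": ...}
      <root>/agents/<agent_name>/sessions
    """
    meta_path = hive_root / "queen" / "session" / session_id / "meta.json"
    try:
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    except (ValueError, OSError):
        # Missing file, unreadable file, or invalid JSON: nothing to resolve.
        return None
    agent_path = meta.get("agent_path") if isinstance(meta, dict) else None
    if not agent_path:
        return None
    # Only the final path component names the agent.
    return hive_root / "agents" / Path(agent_path).name / "sessions"
```

Calling it with `Path.home() / ".hive"` as the root would reproduce the paths used by the function above.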
# Allowed CORS origins (localhost on any port)
_CORS_ORIGINS = {"http://localhost", "http://127.0.0.1"}
+34 -6
@@ -41,9 +41,7 @@ async def create_queen(
_QUEEN_STAGING_TOOLS,
_appendices,
_building_knowledge,
_gcu_building_section,
_planning_knowledge,
_shared_building_knowledge,
_queen_behavior_always,
_queen_behavior_building,
_queen_behavior_planning,
@@ -59,6 +57,7 @@ async def create_queen(
_queen_tools_planning,
_queen_tools_running,
_queen_tools_staging,
_shared_building_knowledge,
)
from framework.agents.queen.nodes.thinking_hook import select_expert_persona
from framework.graph.event_loop_node import HookContext, HookResult
@@ -91,6 +90,28 @@ async def create_queen(
phase_state = QueenPhaseState(phase=initial_phase, event_bus=session.event_bus)
session.phase_state = phase_state
# ---- Track ask rounds during planning ----------------------------
# Increment planning_ask_rounds each time the queen requests user
# input (ask_user or ask_user_multiple) while in the planning phase.
async def _track_planning_asks(event: AgentEvent) -> None:
if phase_state.phase != "planning":
return
# Only count explicit ask_user / ask_user_multiple calls, not
# auto-block (text-only turns emit CLIENT_INPUT_REQUESTED with
# an empty prompt and no options/questions).
data = event.data or {}
has_prompt = bool(data.get("prompt"))
has_questions = bool(data.get("questions"))
has_options = bool(data.get("options"))
if has_prompt or has_questions or has_options:
phase_state.planning_ask_rounds += 1
session.event_bus.subscribe(
[EventType.CLIENT_INPUT_REQUESTED],
_track_planning_asks,
filter_stream="queen",
)
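The rule that separates an explicit ask from an auto-block can be pulled out as a pure predicate, which makes it easy to check against sample payloads. The function name `is_explicit_ask` and the payload shapes in the usage note are illustrative assumptions. Only the three keys come from the handler above.

```python
from typing import Any, Dict, Optional


def is_explicit_ask(data: Optional[Dict[str, Any]]) -> bool:
    """True when an input-request payload carries a prompt, questions, or options.

    A text-only turn produces an empty prompt and neither list, so it
    does not count as an ask round.
    """
    data = data or {}
    return (
        bool(data.get("prompt"))
        or bool(data.get("questions"))
        or bool(data.get("options"))
    )
```

For example, `is_explicit_ask({"prompt": ""})` is false, while a payload with a non-empty `questions` list is true even when `prompt` is empty.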
# ---- Lifecycle tools (always registered) --------------------------
register_queen_lifecycle_tools(
queen_registry,
@@ -138,6 +159,11 @@ async def create_queen(
phase_state.staging_tools = [t for t in queen_tools if t.name in staging_names]
phase_state.running_tools = [t for t in queen_tools if t.name in running_names]
# ---- Cross-session memory ----------------------------------------
from framework.agents.queen.queen_memory import seed_if_missing
seed_if_missing()
# ---- Compose phase-specific prompts ------------------------------
_orig_node = _queen_graph.nodes[0]
@@ -145,7 +171,8 @@ async def create_queen(
worker_identity = (
"\n\n# Worker Profile\n"
"No worker agent loaded. You are operating independently.\n"
"Handle all tasks directly using your coding tools."
"Design or build the agent to solve the user's problem "
"according to your current phase."
)
_planning_body = (
@@ -166,7 +193,6 @@ async def create_queen(
+ _queen_behavior_always
+ _queen_behavior_building
+ _building_knowledge
+ _gcu_building_section
+ _queen_phase_7
+ _appendices
+ worker_identity
@@ -205,8 +231,7 @@ async def create_queen(
data={"persona": persona},
)
)
body = _planning_body if phase_state.phase == "planning" else _building_body
return HookResult(system_prompt=persona + "\n\n" + body)
return HookResult(system_prompt=persona + "\n\n" + phase_state.get_current_prompt())
# ---- Graph preparation -------------------------------------------
initial_prompt_text = phase_state.get_current_prompt()
@@ -250,6 +275,7 @@ async def create_queen(
execution_id=session.id,
dynamic_tools_provider=phase_state.get_current_tools,
dynamic_prompt_provider=phase_state.get_current_prompt,
iteration_metadata_provider=lambda: {"phase": phase_state.phase},
)
session.queen_executor = executor
@@ -267,6 +293,8 @@ async def create_queen(
return
if phase_state.phase == "running":
if event.type == EventType.EXECUTION_COMPLETED:
# Mark worker as configured after first successful run
session.worker_configured = True
output = event.data.get("output", {})
output_summary = ""
if output:
+9
@@ -15,6 +15,7 @@ logger = logging.getLogger(__name__)
DEFAULT_EVENT_TYPES = [
EventType.CLIENT_OUTPUT_DELTA,
EventType.CLIENT_INPUT_REQUESTED,
EventType.CLIENT_INPUT_RECEIVED,
EventType.LLM_TEXT_DELTA,
EventType.TOOL_CALL_STARTED,
EventType.TOOL_CALL_COMPLETED,
@@ -40,6 +41,12 @@ DEFAULT_EVENT_TYPES = [
EventType.CREDENTIALS_REQUIRED,
EventType.SUBAGENT_REPORT,
EventType.QUEEN_PHASE_CHANGED,
EventType.TRIGGER_AVAILABLE,
EventType.TRIGGER_ACTIVATED,
EventType.TRIGGER_DEACTIVATED,
EventType.TRIGGER_FIRED,
EventType.TRIGGER_REMOVED,
EventType.DRAFT_GRAPH_UPDATED,
]
# Keepalive interval in seconds
@@ -89,6 +96,7 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
"execution_failed",
"execution_paused",
"client_input_requested",
"client_input_received",
"node_loop_iteration",
"node_loop_started",
"credentials_required",
@@ -142,6 +150,7 @@ async def handle_events(request: web.Request) -> web.StreamResponse:
EventType.CLIENT_OUTPUT_DELTA.value,
EventType.EXECUTION_STARTED.value,
EventType.CLIENT_INPUT_REQUESTED.value,
EventType.CLIENT_INPUT_RECEIVED.value,
}
event_type_values = {et.value for et in event_types}
replay_types = _REPLAY_TYPES & event_type_values
+18 -4
@@ -125,6 +125,18 @@ async def handle_chat(request: web.Request) -> web.Response:
node = queen_executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
await node.inject_event(message, is_client_input=True)
# Publish to EventBus so the session event log captures user messages
from framework.runtime.event_bus import AgentEvent, EventType
await session.event_bus.publish(
AgentEvent(
type=EventType.CLIENT_INPUT_RECEIVED,
stream_id="queen",
node_id="queen",
execution_id=session.id,
data={"content": message},
)
)
return web.json_response(
{
"status": "queen",
@@ -347,7 +359,7 @@ async def handle_pause(request: web.Request) -> web.Response:
for exec_id in list(stream.active_execution_ids):
try:
ok = await stream.cancel_execution(exec_id)
ok = await stream.cancel_execution(exec_id, reason="Execution paused by user")
if ok:
cancelled.append(exec_id)
except Exception:
@@ -357,8 +369,8 @@ async def handle_pause(request: web.Request) -> web.Response:
runtime.pause_timers()
# Switch to staging (agent still loaded, ready to re-run)
if session.mode_state is not None:
await session.mode_state.switch_to_staging(source="frontend")
if session.phase_state is not None:
await session.phase_state.switch_to_staging(source="frontend")
return web.json_response(
{
@@ -400,7 +412,9 @@ async def handle_stop(request: web.Request) -> web.Response:
if hasattr(node, "cancel_current_turn"):
node.cancel_current_turn()
cancelled = await stream.cancel_execution(execution_id)
cancelled = await stream.cancel_execution(
execution_id, reason="Execution stopped by user"
)
if cancelled:
# Cancel queen's in-progress LLM turn
if session.queen_executor:
+80
@@ -2,6 +2,7 @@
import json
import logging
import time
from aiohttp import web
@@ -116,6 +117,20 @@ async def handle_list_nodes(request: web.Request) -> web.Response:
}
for ep in reg.entry_points.values()
]
# Append triggers from triggers.json (stored on session)
for t in getattr(session, "available_triggers", {}).values():
entry = {
"id": t.id,
"name": t.description or t.id,
"entry_node": graph.entry_node,
"trigger_type": t.trigger_type,
"trigger_config": t.trigger_config,
"task": t.task,
}
mono = getattr(session, "trigger_next_fire", {}).get(t.id)
if mono is not None:
entry["next_fire_in"] = max(0.0, mono - time.monotonic())
entry_points.append(entry)
return web.json_response(
{
"nodes": nodes,
@@ -234,8 +249,73 @@ async def handle_node_tools(request: web.Request) -> web.Response:
return web.json_response({"tools": tools_out})
async def handle_draft_graph(request: web.Request) -> web.Response:
"""Return the current draft graph from planning phase (if any)."""
session, err = resolve_session(request)
if err:
return err
phase_state = getattr(session, "phase_state", None)
if phase_state is None or phase_state.draft_graph is None:
return web.json_response({"draft": None})
return web.json_response({"draft": phase_state.draft_graph})
async def handle_flowchart_map(request: web.Request) -> web.Response:
"""Return the flowchart→runtime node mapping and the original (pre-dissolution) draft.
Available after confirm_and_build() dissolves decision nodes, or loaded
from the agent's flowchart.json file, or synthesized from the runtime graph.
"""
session, err = resolve_session(request)
if err:
return err
phase_state = getattr(session, "phase_state", None)
# Fast path: already in memory
if phase_state is not None and phase_state.original_draft_graph is not None:
return web.json_response(
{
"map": phase_state.flowchart_map,
"original_draft": phase_state.original_draft_graph,
}
)
# Try loading from flowchart.json in the agent folder
worker_path = getattr(session, "worker_path", None)
if worker_path is not None:
from pathlib import Path
target = Path(worker_path) / "flowchart.json"
if target.is_file():
try:
data = json.loads(target.read_text(encoding="utf-8"))
original_draft = data.get("original_draft")
fmap = data.get("flowchart_map")
# Cache in phase_state for future requests
if phase_state is not None and original_draft:
phase_state.original_draft_graph = original_draft
phase_state.flowchart_map = fmap
return web.json_response(
{
"map": fmap,
"original_draft": original_draft,
}
)
except Exception:
logger.warning("Failed to read flowchart.json from %s", worker_path)
return web.json_response({"map": None, "original_draft": None})
def register_routes(app: web.Application) -> None:
"""Register graph/node inspection routes."""
# Draft graph (planning phase — visual only, no loaded worker required)
app.router.add_get("/api/sessions/{session_id}/draft-graph", handle_draft_graph)
# Flowchart map (post-dissolution — maps runtime nodes to original draft nodes)
app.router.add_get("/api/sessions/{session_id}/flowchart-map", handle_flowchart_map)
# Session-primary routes
app.router.add_get("/api/sessions/{session_id}/graphs/{graph_id}/nodes", handle_list_nodes)
app.router.add_get(
+218 -50
@@ -9,8 +9,10 @@ Session-primary routes:
- DELETE /api/sessions/{session_id}/worker unload worker from session
- GET /api/sessions/{session_id}/stats runtime statistics
- GET /api/sessions/{session_id}/entry-points list entry points
- PATCH /api/sessions/{session_id}/triggers/{id} update trigger task
- GET /api/sessions/{session_id}/graphs list graph IDs
- GET /api/sessions/{session_id}/queen-messages queen conversation history
- GET /api/sessions/{session_id}/events/history persisted eventbus log (for replay)
Worker session browsing (persisted execution runs on disk):
- GET /api/sessions/{session_id}/worker-sessions list
@@ -31,6 +33,7 @@ from pathlib import Path
from aiohttp import web
from framework.server.app import (
cold_sessions_dir,
resolve_session,
safe_path_segment,
sessions_dir,
@@ -140,6 +143,7 @@ async def handle_create_session(request: web.Request) -> web.Response:
session = await manager.create_session_with_worker(
agent_path,
agent_id=agent_id,
session_id=session_id,
model=model,
initial_prompt=initial_prompt,
queen_resume_from=queen_resume_from,
@@ -228,6 +232,22 @@ async def handle_get_live_session(request: web.Request) -> web.Response:
}
for ep in rt.get_entry_points()
]
# Append triggers from triggers.json (stored on session)
runner = getattr(session, "runner", None)
graph_entry = runner.graph.entry_node if runner else ""
for t in getattr(session, "available_triggers", {}).values():
entry = {
"id": t.id,
"name": t.description or t.id,
"entry_node": graph_entry,
"trigger_type": t.trigger_type,
"trigger_config": t.trigger_config,
"task": t.task,
}
mono = getattr(session, "trigger_next_fire", {}).get(t.id)
if mono is not None:
entry["next_fire_in"] = max(0.0, mono - time.monotonic())
data["entry_points"].append(entry)
data["graphs"] = session.worker_runtime.list_graphs()
return web.json_response(data)
@@ -351,23 +371,84 @@ async def handle_session_entry_points(request: web.Request) -> web.Response:
rt = session.worker_runtime
eps = rt.get_entry_points() if rt else []
entry_points = [
{
"id": ep.id,
"name": ep.name,
"entry_node": ep.entry_node,
"trigger_type": ep.trigger_type,
"trigger_config": ep.trigger_config,
**(
{"next_fire_in": nf}
if rt and (nf := rt.get_timer_next_fire_in(ep.id)) is not None
else {}
),
}
for ep in eps
]
# Append triggers from triggers.json (stored on session)
runner = getattr(session, "runner", None)
graph_entry = runner.graph.entry_node if runner else ""
for t in getattr(session, "available_triggers", {}).values():
entry = {
"id": t.id,
"name": t.description or t.id,
"entry_node": graph_entry,
"trigger_type": t.trigger_type,
"trigger_config": t.trigger_config,
"task": t.task,
}
mono = getattr(session, "trigger_next_fire", {}).get(t.id)
if mono is not None:
entry["next_fire_in"] = max(0.0, mono - time.monotonic())
entry_points.append(entry)
return web.json_response({"entry_points": entry_points})
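The block that turns a trigger into an entry-point dict appears in three handlers in this diff. One way it could be shared is sketched below. The helper name `trigger_entry` and the injectable `now` parameter are assumptions and are not part of this change. The dict keys and the clamped countdown follow the code above.

```python
import time
from typing import Any, Dict, Optional


def trigger_entry(
    t: Any,
    entry_node: str,
    next_fire: Dict[str, float],
    now: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the entry-point dict for one trigger definition.

    `next_fire` maps trigger ID to a time.monotonic() deadline.
    `now` is injectable so the countdown can be tested deterministically.
    """
    entry: Dict[str, Any] = {
        "id": t.id,
        "name": t.description or t.id,  # fall back to the ID when undescribed
        "entry_node": entry_node,
        "trigger_type": t.trigger_type,
        "trigger_config": t.trigger_config,
        "task": t.task,
    }
    mono = next_fire.get(t.id)
    if mono is not None:
        current = time.monotonic() if now is None else now
        # Clamp at zero: an overdue timer reports 0.0, never a negative value.
        entry["next_fire_in"] = max(0.0, mono - current)
    return entry
```

Monotonic deadlines are only meaningful inside the process that created them, which is why the response carries a relative `next_fire_in` rather than the raw value.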
async def handle_update_trigger_task(request: web.Request) -> web.Response:
"""PATCH /api/sessions/{session_id}/triggers/{trigger_id} — update trigger task."""
session, err = resolve_session(request)
if err:
return err
trigger_id = request.match_info["trigger_id"]
available = getattr(session, "available_triggers", {})
tdef = available.get(trigger_id)
if tdef is None:
return web.json_response(
{"error": f"Trigger '{trigger_id}' not found"},
status=404,
)
try:
body = await request.json()
except Exception:
return web.json_response({"error": "Invalid JSON body"}, status=400)
task = body.get("task")
if task is None:
return web.json_response({"error": "Missing 'task' field"}, status=400)
if not isinstance(task, str):
return web.json_response({"error": "'task' must be a string"}, status=400)
tdef.task = task
# Persist to session state and agent definition
from framework.tools.queen_lifecycle_tools import (
_persist_active_triggers,
_save_trigger_to_agent,
)
if trigger_id in getattr(session, "active_trigger_ids", set()):
session_id = request.match_info["session_id"]
await _persist_active_triggers(session, session_id)
_save_trigger_to_agent(session, trigger_id, tdef)
return web.json_response(
{
"entry_points": [
{
"id": ep.id,
"name": ep.name,
"entry_node": ep.entry_node,
"trigger_type": ep.trigger_type,
"trigger_config": ep.trigger_config,
**(
{"next_fire_in": nf}
if rt and (nf := rt.get_timer_next_fire_in(ep.id)) is not None
else {}
),
}
for ep in eps
]
"trigger_id": trigger_id,
"task": tdef.task,
}
)
@@ -397,12 +478,15 @@ async def handle_list_worker_sessions(request: web.Request) -> web.Response:
"""List worker sessions on disk."""
session, err = resolve_session(request)
if err:
return err
if not session.worker_path:
return web.json_response({"sessions": []})
sess_dir = sessions_dir(session)
# Fall back to cold session lookup from disk
sid = request.match_info["session_id"]
sess_dir = cold_sessions_dir(sid)
if sess_dir is None:
return err
else:
if not session.worker_path:
return web.json_response({"sessions": []})
sess_dir = sessions_dir(session)
if not sess_dir.exists():
return web.json_response({"sessions": []})
@@ -564,48 +648,85 @@ async def handle_messages(request: web.Request) -> web.Response:
"""Get messages for a worker session."""
session, err = resolve_session(request)
if err:
return err
if not session.worker_path:
return web.json_response({"error": "No worker loaded"}, status=503)
# Fall back to cold session lookup from disk
sid = request.match_info["session_id"]
sess_dir = cold_sessions_dir(sid)
if sess_dir is None:
return err
else:
if not session.worker_path:
return web.json_response({"error": "No worker loaded"}, status=503)
sess_dir = sessions_dir(session)
ws_id = request.match_info.get("ws_id") or request.match_info.get("session_id", "")
ws_id = safe_path_segment(ws_id)
convs_dir = sessions_dir(session) / ws_id / "conversations"
convs_dir = sess_dir / ws_id / "conversations"
if not convs_dir.exists():
return web.json_response({"messages": []})
filter_node = request.query.get("node_id")
all_messages = []
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir():
continue
if filter_node and node_dir.name != filter_node:
continue
parts_dir = node_dir / "parts"
def _collect_msg_parts(parts_dir: Path, node_id: str) -> None:
if not parts_dir.exists():
continue
return
for part_file in sorted(parts_dir.iterdir()):
if part_file.suffix != ".json":
continue
try:
part = json.loads(part_file.read_text(encoding="utf-8"))
part["_node_id"] = node_dir.name
part["_node_id"] = node_id
part.setdefault("created_at", part_file.stat().st_mtime)
all_messages.append(part)
except (json.JSONDecodeError, OSError):
continue
# Flat layout: conversations/parts/*.json
if not filter_node:
_collect_msg_parts(convs_dir / "parts", "worker")
# Node-based layout: conversations/<node_id>/parts/*.json
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir() or node_dir.name == "parts":
continue
if filter_node and node_dir.name != filter_node:
continue
_collect_msg_parts(node_dir / "parts", node_dir.name)
# Merge run lifecycle markers from runs.jsonl (for historical dividers)
runs_file = sess_dir / ws_id / "runs.jsonl"
if runs_file.exists():
try:
for line in runs_file.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line:
continue
try:
record = json.loads(line)
all_messages.append(
{
"seq": -1,
"role": "system",
"content": "",
"_node_id": "_run_marker",
"is_run_marker": True,
"run_id": record.get("run_id"),
"run_event": record.get("event"),
"created_at": record.get("created_at", 0),
}
)
except json.JSONDecodeError:
continue
except OSError:
pass
all_messages.sort(key=lambda m: m.get("created_at", m.get("seq", 0)))
client_only = request.query.get("client_only", "").lower() in ("true", "1")
if client_only:
client_facing_nodes: set[str] = set()
if session.runner and hasattr(session.runner, "graph"):
if session and session.runner and hasattr(session.runner, "graph"):
for node in session.runner.graph.nodes:
if node.client_facing:
client_facing_nodes.add(node.id)
@@ -614,12 +735,15 @@ async def handle_messages(request: web.Request) -> web.Response:
all_messages = [
m
for m in all_messages
if not m.get("is_transition_marker")
and m["role"] != "tool"
and not (m["role"] == "assistant" and m.get("tool_calls"))
and (
(m["role"] == "user" and m.get("is_client_input"))
or (m["role"] == "assistant" and m.get("_node_id") in client_facing_nodes)
if m.get("is_run_marker")
or (
not m.get("is_transition_marker")
and m["role"] != "tool"
and not (m["role"] == "assistant" and m.get("tool_calls"))
and (
(m["role"] == "user" and m.get("is_client_input"))
or (m["role"] == "assistant" and m.get("_node_id") in client_facing_nodes)
)
)
]
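The `client_only` condition is easier to follow when written as a predicate with early returns. The sketch below is meant to be logically equivalent to the new comprehension above. The function name `keep_for_client` is hypothetical.

```python
from typing import Any, Dict, Set


def keep_for_client(m: Dict[str, Any], client_facing_nodes: Set[str]) -> bool:
    """Decide whether a message survives the client_only filter."""
    # Run markers always pass so the UI can draw historical dividers.
    if m.get("is_run_marker"):
        return True
    if m.get("is_transition_marker") or m["role"] == "tool":
        return False
    # Assistant turns that only carry tool calls are internal.
    if m["role"] == "assistant" and m.get("tool_calls"):
        return False
    if m["role"] == "user":
        return bool(m.get("is_client_input"))
    if m["role"] == "assistant":
        return m.get("_node_id") in client_facing_nodes
    return False
```

Note that run markers are emitted with role `system`, so without the first branch they would be dropped by the final `return False`.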
@@ -640,18 +764,16 @@ async def handle_queen_messages(request: web.Request) -> web.Response:
return web.json_response({"messages": [], "session_id": session_id})
all_messages: list[dict] = []
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir():
continue
parts_dir = node_dir / "parts"
def _read_parts(parts_dir: Path, node_id: str) -> None:
if not parts_dir.exists():
continue
return
for part_file in sorted(parts_dir.iterdir()):
if part_file.suffix != ".json":
continue
try:
part = json.loads(part_file.read_text(encoding="utf-8"))
part["_node_id"] = node_dir.name
part["_node_id"] = node_id
# Use file mtime as created_at so frontend can order
# queen and worker messages chronologically.
part.setdefault("created_at", part_file.stat().st_mtime)
@@ -659,6 +781,15 @@ async def handle_queen_messages(request: web.Request) -> web.Response:
except (json.JSONDecodeError, OSError):
continue
# Flat layout: conversations/parts/*.json
_read_parts(convs_dir / "parts", "queen")
# Node-based layout: conversations/<node_id>/parts/*.json
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir() or node_dir.name == "parts":
continue
_read_parts(node_dir / "parts", node_dir.name)
all_messages.sort(key=lambda m: m.get("created_at", m.get("seq", 0)))
# Filter to client-facing messages only
@@ -673,6 +804,38 @@ async def handle_queen_messages(request: web.Request) -> web.Response:
return web.json_response({"messages": all_messages, "session_id": session_id})
async def handle_session_events_history(request: web.Request) -> web.Response:
"""GET /api/sessions/{session_id}/events/history — persisted eventbus log.
Reads ``events.jsonl`` from the session directory on disk so it works for
both live sessions and cold (post-server-restart) sessions. The frontend
replays these events through ``sseEventToChatMessage`` to fully reconstruct
the UI state on resume.
"""
session_id = request.match_info["session_id"]
queen_dir = Path.home() / ".hive" / "queen" / "session" / session_id
events_path = queen_dir / "events.jsonl"
if not events_path.exists():
return web.json_response({"events": [], "session_id": session_id})
events: list[dict] = []
try:
with open(events_path, encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
try:
events.append(json.loads(line))
except json.JSONDecodeError:
continue
except OSError:
return web.json_response({"events": [], "session_id": session_id})
return web.json_response({"events": events, "session_id": session_id})
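The tolerant read used for `events.jsonl` can be sketched as a standalone helper. The name `read_jsonl_tolerant` is an assumption. The behavior (skip blank and malformed lines, return an empty list on I/O errors) follows the handler above.

```python
import json
from pathlib import Path
from typing import Any, List


def read_jsonl_tolerant(path: Path) -> List[Any]:
    """Read a JSON Lines file, skipping blank or malformed lines.

    A missing or unreadable file yields an empty list rather than an error,
    so a half-written log never breaks session restoration.
    """
    events: List[Any] = []
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    events.append(json.loads(line))
                except json.JSONDecodeError:
                    # Typically a final line truncated by a crash mid-write.
                    continue
    except OSError:
        return []
    return events
```

Skipping bad lines instead of failing fits an append-only log, where the only line likely to be damaged is the last one.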
async def handle_session_history(request: web.Request) -> web.Response:
"""GET /api/sessions/history — all queen sessions on disk (live + cold).
@@ -731,7 +894,7 @@ async def handle_delete_history_session(request: web.Request) -> web.Response:
async def handle_discover(request: web.Request) -> web.Response:
"""GET /api/discover — discover agents from filesystem."""
from framework.tui.screens.agent_picker import discover_agents
from framework.agents.discovery import discover_agents
manager = _get_manager(request)
loaded_paths = {str(s.worker_path) for s in manager.list_sessions() if s.worker_path}
@@ -746,6 +909,7 @@ async def handle_discover(request: web.Request) -> web.Response:
"description": entry.description,
"category": entry.category,
"session_count": entry.session_count,
"run_count": entry.run_count,
"node_count": entry.node_count,
"tool_count": entry.tool_count,
"tags": entry.tags,
@@ -783,8 +947,12 @@ def register_routes(app: web.Application) -> None:
# Session info
app.router.add_get("/api/sessions/{session_id}/stats", handle_session_stats)
app.router.add_get("/api/sessions/{session_id}/entry-points", handle_session_entry_points)
app.router.add_patch(
"/api/sessions/{session_id}/triggers/{trigger_id}", handle_update_trigger_task
)
app.router.add_get("/api/sessions/{session_id}/graphs", handle_session_graphs)
app.router.add_get("/api/sessions/{session_id}/queen-messages", handle_queen_messages)
app.router.add_get("/api/sessions/{session_id}/events/history", handle_session_events_history)
# Worker session browsing (session-primary)
app.router.add_get("/api/sessions/{session_id}/worker-sessions", handle_list_worker_sessions)
+308 -146
@@ -7,7 +7,6 @@ Architecture:
- Session owns EventBus + LLM, shared with queen and worker
- Queen is always present once a session starts
- Worker is optional and is loaded into an existing session
- Judge is active only when a worker is loaded
"""
import asyncio
@@ -15,11 +14,13 @@ import json
import logging
import time
import uuid
from dataclasses import dataclass
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any
from framework.runtime.triggers import TriggerDefinition
logger = logging.getLogger(__name__)
@@ -42,12 +43,23 @@ class Session:
worker_info: Any | None = None # AgentInfo
# Queen phase state (building/staging/running)
phase_state: Any = None # QueenPhaseState
# Judge (active when worker is loaded)
judge_task: asyncio.Task | None = None
escalation_sub: str | None = None
# Worker handoff subscription
worker_handoff_sub: str | None = None
# Memory consolidation subscription (fires on CONTEXT_COMPACTED)
memory_consolidation_sub: str | None = None
# Trigger definitions loaded from agent's triggers.json (available but inactive)
available_triggers: dict[str, TriggerDefinition] = field(default_factory=dict)
# Active trigger tracking (IDs currently firing + their asyncio tasks)
active_trigger_ids: set[str] = field(default_factory=set)
active_timer_tasks: dict[str, asyncio.Task] = field(default_factory=dict)
# Queen-owned webhook server (lazy singleton, created on first webhook trigger activation)
queen_webhook_server: Any = None
# EventBus subscription IDs for active webhook triggers (trigger_id -> sub_id)
active_webhook_subs: dict[str, str] = field(default_factory=dict)
# True after first successful worker execution (gates trigger delivery)
worker_configured: bool = False
# Monotonic timestamps for next trigger fire (mirrors AgentRuntime._timer_next_fire)
trigger_next_fire: dict[str, float] = field(default_factory=dict)
# Session directory resumption:
# When set, _start_queen writes queen conversations to this existing session's
# directory instead of creating a new one. This lets cold-restores accumulate
@@ -130,7 +142,9 @@ class SessionManager:
to that existing session's directory instead of creating a new one.
This preserves full conversation history across server restarts.
"""
session = await self._create_session_core(session_id=session_id, model=model)
# Reuse the original session ID when cold-restoring
resolved_session_id = queen_resume_from or session_id
session = await self._create_session_core(session_id=resolved_session_id, model=model)
session.queen_resume_from = queen_resume_from
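The ID resolution in this hunk can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's code: `resolve_session_id` and the generated fallback format are hypothetical, since the diff does not show how `_create_session_core` picks an ID when none is supplied.

```python
from __future__ import annotations

import uuid


def resolve_session_id(session_id: str | None, queen_resume_from: str | None) -> str:
    """Prefer the resumed session's ID, then the caller's ID, then a fresh one.

    The generated fallback format is an assumption made for this sketch.
    """
    if queen_resume_from:
        # Cold restore: reuse the original ID so the frontend sees one session.
        return queen_resume_from
    if session_id:
        return session_id
    return f"session_{uuid.uuid4().hex[:8]}"
```

With this ordering a cold restore always wins over a caller-supplied ID, which matches the `queen_resume_from or session_id` expression in the hunk.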
# Start queen immediately (queen-only, no worker tools yet)
@@ -147,22 +161,28 @@ class SessionManager:
self,
agent_path: str | Path,
agent_id: str | None = None,
session_id: str | None = None,
model: str | None = None,
initial_prompt: str | None = None,
queen_resume_from: str | None = None,
) -> Session:
"""Create a session and load a worker in one step.
When ``queen_resume_from`` is set the queen writes conversation messages
to that existing session's directory instead of creating a new one.
When ``queen_resume_from`` is set the session reuses the original session
ID so the frontend sees a single continuous session. The queen writes
conversation messages to that existing directory, preserving full history.
"""
from framework.tools.queen_lifecycle_tools import build_worker_profile
agent_path = Path(agent_path)
resolved_worker_id = agent_id or agent_path.name
# Auto-generate session ID (not the agent name)
session = await self._create_session_core(model=model)
# Reuse the original session ID when cold-restoring so the frontend
# sees one continuous session instead of a new one each time.
session = await self._create_session_core(
session_id=queen_resume_from or session_id,
model=model,
)
session.queen_resume_from = queen_resume_from
try:
# Load worker FIRST (before queen) so queen gets full tools
@@ -202,8 +222,8 @@ class SessionManager:
) -> None:
"""Load a worker agent into a session (core logic).
Sets up the runner, runtime, and session fields. Does NOT start the
judge or notify the queen; callers handle those steps.
Sets up the runner, runtime, and session fields. Does NOT notify
the queen; callers handle that step.
"""
from framework.runner import AgentRunner
@@ -242,6 +262,25 @@ class SessionManager:
runtime = runner._agent_runtime
# Load triggers from the agent's triggers.json definition file.
from framework.tools.queen_lifecycle_tools import _read_agent_triggers_json
for tdata in _read_agent_triggers_json(agent_path):
tid = tdata.get("id", "")
ttype = tdata.get("trigger_type", "")
if tid and ttype in ("timer", "webhook"):
session.available_triggers[tid] = TriggerDefinition(
id=tid,
trigger_type=ttype,
trigger_config=tdata.get("trigger_config", {}),
description=tdata.get("name", tid),
task=tdata.get("task", ""),
)
logger.info("Loaded trigger '%s' (%s) from triggers.json", tid, ttype)
if session.available_triggers:
await self._emit_trigger_events(session, "available", session.available_triggers)
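The filtering done while loading `triggers.json` can be sketched as a pure function. This is a hedged illustration only: `TriggerSketch` is a hypothetical stand-in for `TriggerDefinition` (whose full field list is not shown in this diff), and `parse_triggers` is an invented helper name.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TriggerSketch:
    """Hypothetical stand-in for TriggerDefinition, for illustration only."""

    id: str
    trigger_type: str
    trigger_config: dict = field(default_factory=dict)
    description: str = ""
    task: str = ""


def parse_triggers(raw: list) -> dict:
    """Keep only entries that have an ID and a supported trigger type."""
    triggers: dict = {}
    for tdata in raw:
        tid = tdata.get("id", "")
        ttype = tdata.get("trigger_type", "")
        if not tid or ttype not in ("timer", "webhook"):
            continue  # skip unnamed or unsupported triggers
        triggers[tid] = TriggerSketch(
            id=tid,
            trigger_type=ttype,
            trigger_config=tdata.get("trigger_config", {}),
            description=tdata.get("name", tid),  # the name doubles as description
            task=tdata.get("task", ""),
        )
    return triggers
```

Entries without an `id`, or with a type other than `timer` or `webhook`, are dropped silently, as in the hunk above.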
# Start runtime on event loop
if runtime and not runtime.is_running:
await runtime.start()
@@ -278,11 +317,20 @@ class SessionManager:
When a new runtime starts, any on-disk session still marked 'active'
is from a process that no longer exists. 'Paused' sessions are left
intact so they remain resumable.
Two-layer protection against corrupting live sessions:
1. In-memory: skip any session ID currently tracked in self._sessions
(guaranteed alive in this process).
2. PID validation: if state.json contains a ``pid`` field, check whether
that process is still running on the host. If it is, the session is
owned by another healthy worker process, so leave it alone.
"""
sessions_path = Path.home() / ".hive" / "agents" / agent_path.name / "sessions"
if not sessions_path.exists():
return
live_session_ids = set(self._sessions.keys())
for d in sessions_path.iterdir():
if not d.is_dir() or not d.name.startswith("session_"):
continue
@@ -293,6 +341,26 @@ class SessionManager:
state = json.loads(state_path.read_text(encoding="utf-8"))
if state.get("status") != "active":
continue
# Layer 1: skip sessions that are alive in this process
session_id = state.get("session_id", d.name)
if session_id in live_session_ids or d.name in live_session_ids:
logger.debug(
"Skipping live in-memory session '%s' during stale cleanup",
d.name,
)
continue
# Layer 2: skip sessions whose owning process is still alive
recorded_pid = state.get("pid")
if recorded_pid is not None and self._is_pid_alive(recorded_pid):
logger.debug(
"Skipping session '%s' — owning process %d is still running",
d.name,
recorded_pid,
)
continue
state["status"] = "cancelled"
state.setdefault("result", {})["error"] = "Stale session: runtime restarted"
state.setdefault("timestamps", {})["updated_at"] = datetime.now().isoformat()
@@ -303,6 +371,34 @@ class SessionManager:
except (json.JSONDecodeError, OSError) as e:
logger.warning("Failed to clean up stale session %s: %s", d.name, e)
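The two-layer protection described in the docstring can be expressed as a single predicate. The sketch below is an assumption-laden restatement, not the method itself: `should_cancel` is a hypothetical name, and the PID check is injected as a callable so the logic can be exercised without real processes.

```python
from typing import Any, Callable


def should_cancel(
    state: dict,
    dir_name: str,
    live_session_ids: set,
    is_pid_alive: Callable[[Any], bool],
) -> bool:
    """Return True only for 'active' sessions that nobody owns any more."""
    if state.get("status") != "active":
        return False  # paused/completed sessions are never touched
    session_id = state.get("session_id", dir_name)
    # Layer 1: the session is tracked in memory by this process.
    if session_id in live_session_ids or dir_name in live_session_ids:
        return False
    # Layer 2: another process that is still running owns the session.
    pid = state.get("pid")
    if pid is not None and is_pid_alive(pid):
        return False
    return True
```

Keeping the decision free of file I/O makes each layer easy to test on its own, mirroring the cases covered by `TestCleanupStaleActiveSessions`.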
@staticmethod
def _is_pid_alive(pid: int) -> bool:
"""Check whether a process with the given PID is still running."""
import os
import platform
if platform.system() == "Windows":
import ctypes
# PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
kernel32 = ctypes.windll.kernel32
handle = kernel32.OpenProcess(0x1000, False, pid)
if not handle:
# 5 is ERROR_ACCESS_DENIED, meaning the process exists but is protected
return kernel32.GetLastError() == 5
exit_code = ctypes.c_ulong()
kernel32.GetExitCodeProcess(handle, ctypes.byref(exit_code))
kernel32.CloseHandle(handle)
# 259 is STILL_ACTIVE
return exit_code.value == 259
else:
try:
os.kill(pid, 0)
except PermissionError:
# The process exists but belongs to another user
return True
except OSError:
return False
return True
async def load_worker(
self,
session_id: str,
@@ -312,7 +408,7 @@ class SessionManager:
) -> Session:
"""Load a worker agent into an existing session (with running queen).
Starts the worker runtime, health judge, and notifies the queen.
Starts the worker runtime and notifies the queen.
"""
agent_path = Path(agent_path)
@@ -328,11 +424,48 @@ class SessionManager:
)
# Notify queen about the loaded worker (skip for queen itself).
# Health judge disabled for simplicity.
if agent_path.name != "queen" and session.worker_runtime:
# await self._start_judge(session, session.runner._storage_path)
await self._notify_queen_worker_loaded(session)
# Restore previously active triggers from persisted session state
if session.available_triggers and session.worker_runtime:
try:
store = session.worker_runtime._session_store
state = await store.read_state(session_id)
if state and state.active_triggers:
from framework.tools.queen_lifecycle_tools import (
_start_trigger_timer,
_start_trigger_webhook,
)
saved_tasks = getattr(state, "trigger_tasks", {}) or {}
for tid in state.active_triggers:
tdef = session.available_triggers.get(tid)
if tdef:
# Restore user-configured task override
saved_task = saved_tasks.get(tid, "")
if saved_task:
tdef.task = saved_task
tdef.active = True
session.active_trigger_ids.add(tid)
if tdef.trigger_type == "timer":
await _start_trigger_timer(session, tid, tdef)
logger.info("Restored trigger timer '%s'", tid)
elif tdef.trigger_type == "webhook":
await _start_trigger_webhook(session, tid, tdef)
logger.info("Restored webhook trigger '%s'", tid)
else:
logger.warning(
"Saved trigger '%s' not found in worker entry points, skipping",
tid,
)
# Restore worker_configured flag
if state and getattr(state, "worker_configured", False):
session.worker_configured = True
except Exception as e:
logger.warning("Failed to restore active triggers: %s", e)
# Emit SSE event so the frontend can update UI
await self._emit_worker_loaded(session)
@@ -346,9 +479,6 @@ class SessionManager:
if session.worker_runtime is None:
return False
# Stop judge + escalation
self._stop_judge(session)
# Cleanup worker
if session.runner:
try:
@@ -356,6 +486,26 @@ class SessionManager:
except Exception as e:
logger.error("Error cleaning up worker '%s': %s", session.worker_id, e)
# Cancel active trigger timers
for tid, task in session.active_timer_tasks.items():
task.cancel()
logger.info("Cancelled trigger timer '%s' on unload", tid)
session.active_timer_tasks.clear()
# Unsubscribe webhook handlers (server stays alive — queen-owned)
for sub_id in session.active_webhook_subs.values():
try:
session.event_bus.unsubscribe(sub_id)
except Exception:
pass
session.active_webhook_subs.clear()
session.active_trigger_ids.clear()
# Clean up triggers
if session.available_triggers:
await self._emit_trigger_events(session, "removed", session.available_triggers)
session.available_triggers.clear()
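The timer teardown above cancels each task and clears the map. A slightly more defensive variant, shown here purely as a sketch, also waits for the cancelled tasks to finish so that no cancellation is left pending. `cancel_timer_tasks` is a hypothetical helper and is not part of the diff.

```python
import asyncio


async def cancel_timer_tasks(tasks: dict) -> list:
    """Cancel every timer task, wait for them to settle, and empty the map."""
    cancelled_ids = []
    for tid, task in tasks.items():
        task.cancel()
        cancelled_ids.append(tid)
    # return_exceptions=True collects each CancelledError instead of raising it.
    await asyncio.gather(*tasks.values(), return_exceptions=True)
    tasks.clear()
    return cancelled_ids
```

Awaiting the gather is optional for correctness of the unload path, but it guarantees the timers have actually stopped before the session fields are reset.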
worker_id = session.worker_id
session.worker_id = None
session.worker_path = None
@@ -386,8 +536,6 @@ class SessionManager:
_storage_id = getattr(session, "queen_resume_from", None) or session_id
_session_dir = Path.home() / ".hive" / "queen" / "session" / _storage_id
# Stop judge
self._stop_judge(session)
if session.worker_handoff_sub is not None:
try:
session.event_bus.unsubscribe(session.worker_handoff_sub)
@@ -407,6 +555,25 @@ class SessionManager:
session.queen_task = None
session.queen_executor = None
# Cancel active trigger timers
for task in session.active_timer_tasks.values():
task.cancel()
session.active_timer_tasks.clear()
# Unsubscribe webhook handlers and stop queen webhook server
for sub_id in session.active_webhook_subs.values():
try:
session.event_bus.unsubscribe(sub_id)
except Exception:
pass
session.active_webhook_subs.clear()
if session.queen_webhook_server is not None:
try:
await session.queen_webhook_server.stop()
except Exception:
logger.error("Error stopping queen webhook server", exc_info=True)
session.queen_webhook_server = None
# Cleanup worker
if session.runner:
try:
@@ -425,6 +592,9 @@ class SessionManager:
name=f"queen-memory-consolidation-{session_id}",
)
# Close per-session event log
session.event_bus.close_session_log()
logger.info("Session '%s' stopped", session_id)
return True
@@ -434,7 +604,7 @@ class SessionManager:
async def _handle_worker_handoff(self, session: Session, executor: Any, event: Any) -> None:
"""Route worker escalation events into the queen conversation."""
if event.stream_id in ("queen", "judge"):
if event.stream_id == "queen":
return
reason = str(event.data.get("reason", "")).strip()
@@ -523,6 +693,39 @@ class SessionManager:
except OSError:
pass
# Enable per-session event persistence so that all eventbus events
# survive server restarts and can be replayed on cold-session resume.
# Scan the existing event log to find the max iteration ever written,
# then use max+1 as offset so resumed sessions produce monotonically
# increasing iteration values — preventing frontend message ID collisions.
iteration_offset = 0
events_path = queen_dir / "events.jsonl"
try:
if events_path.exists():
max_iter = -1
with open(events_path, encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
try:
evt = json.loads(line)
it = evt.get("data", {}).get("iteration")
if isinstance(it, int) and it > max_iter:
max_iter = it
except (json.JSONDecodeError, TypeError, AttributeError):
continue
if max_iter >= 0:
iteration_offset = max_iter + 1
logger.info(
"Session '%s' resuming with iteration_offset=%d (from events.jsonl max)",
session.id,
iteration_offset,
)
except OSError:
pass
session.event_bus.set_session_log(events_path, iteration_offset=iteration_offset)
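The offset computation can be sketched as a function over an iterable of lines, which keeps it independent of the file system. `compute_iteration_offset` is a hypothetical name; this sketch also skips lines that parse as JSON but are not objects, which is an assumption about how malformed entries should be treated.

```python
import json
from typing import Iterable


def compute_iteration_offset(lines: Iterable[str]) -> int:
    """Return max(iteration) + 1 over a JSONL event log, or 0 if none is found."""
    max_iter = -1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            evt = json.loads(line)
            it = evt.get("data", {}).get("iteration")
        except (json.JSONDecodeError, TypeError, AttributeError):
            continue  # corrupt line, non-object JSON, or null "data"
        if isinstance(it, int) and it > max_iter:
            max_iter = it
    return max_iter + 1 if max_iter >= 0 else 0
```

An open file handle can be passed directly as `lines`, so the log is scanned one line at a time rather than loaded into memory.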
session.queen_task = await create_queen(
session=session,
session_manager=self,
@@ -531,6 +734,22 @@ class SessionManager:
initial_prompt=initial_prompt,
)
# Auto-load worker on cold restore — the queen's conversation expects
# the agent to be loaded, but the new session has no worker.
if session.queen_resume_from and not session.worker_runtime:
meta_path = queen_dir / "meta.json"
if meta_path.exists():
try:
_meta = json.loads(meta_path.read_text(encoding="utf-8"))
_agent_path = _meta.get("agent_path")
if _agent_path and Path(_agent_path).exists():
await self.load_worker(session.id, _agent_path)
if session.phase_state:
await session.phase_state.switch_to_staging(source="auto")
logger.info("Cold restore: auto-loaded worker from %s", _agent_path)
except Exception:
logger.warning("Cold restore: failed to auto-load worker", exc_info=True)
# Memory consolidation — triggered by context compaction events.
# Compaction is a natural signal that "enough has happened to be worth remembering".
_consolidation_llm = session.llm
@@ -550,116 +769,6 @@ class SessionManager:
handler=_on_compaction,
)
# ------------------------------------------------------------------
# Judge startup / teardown
# ------------------------------------------------------------------
async def _start_judge(
self,
session: Session,
worker_storage_path: str | Path,
) -> None:
"""Start the health judge for a session's worker."""
from framework.graph.executor import GraphExecutor
from framework.monitoring import judge_goal, judge_graph
from framework.runner.tool_registry import ToolRegistry
from framework.runtime.core import Runtime
from framework.runtime.event_bus import EventType as _ET
from framework.tools.worker_monitoring_tools import register_worker_monitoring_tools
worker_storage_path = Path(worker_storage_path)
try:
# Monitoring tools
monitoring_registry = ToolRegistry()
register_worker_monitoring_tools(
monitoring_registry,
session.event_bus,
worker_storage_path,
worker_graph_id=session.worker_runtime._graph_id,
)
hive_home = Path.home() / ".hive"
judge_dir = hive_home / "judge" / "session" / session.id
judge_dir.mkdir(parents=True, exist_ok=True)
judge_runtime = Runtime(hive_home / "judge")
monitoring_tools = list(monitoring_registry.get_tools().values())
monitoring_executor = monitoring_registry.get_executor()
async def _judge_loop():
interval = 300 # 5 minutes between checks
# Wait before the first check — let the worker actually do something
await asyncio.sleep(interval)
while True:
try:
executor = GraphExecutor(
runtime=judge_runtime,
llm=session.llm,
tools=monitoring_tools,
tool_executor=monitoring_executor,
event_bus=session.event_bus,
stream_id="judge",
storage_path=judge_dir,
loop_config=judge_graph.loop_config,
)
await executor.execute(
graph=judge_graph,
goal=judge_goal,
input_data={
"event": {"source": "timer", "reason": "scheduled"},
},
session_state={"resume_session_id": session.id},
)
except Exception:
logger.error("Health judge tick failed", exc_info=True)
await asyncio.sleep(interval)
session.judge_task = asyncio.create_task(_judge_loop())
# Escalation: judge → queen
async def _on_escalation(event):
ticket = event.data.get("ticket", {})
executor = session.queen_executor
if executor is None:
logger.warning("Escalation received but queen executor is None")
return
node = executor.node_registry.get("queen")
if node is not None and hasattr(node, "inject_event"):
msg = "[ESCALATION TICKET from Health Judge]\n" + json.dumps(
ticket, indent=2, ensure_ascii=False
)
await node.inject_event(msg)
else:
logger.warning("Escalation received but queen node not ready")
session.escalation_sub = session.event_bus.subscribe(
event_types=[_ET.WORKER_ESCALATION_TICKET],
handler=_on_escalation,
)
logger.info("Judge started for session '%s'", session.id)
except Exception as e:
logger.error(
"Failed to start judge for session '%s': %s",
session.id,
e,
exc_info=True,
)
def _stop_judge(self, session: Session) -> None:
"""Cancel judge task and unsubscribe escalation events."""
if session.judge_task is not None:
session.judge_task.cancel()
session.judge_task = None
if session.escalation_sub is not None:
try:
session.event_bus.unsubscribe(session.escalation_sub)
except Exception:
pass
session.escalation_sub = None
# ------------------------------------------------------------------
# Queen notifications
# ------------------------------------------------------------------
@@ -676,7 +785,22 @@ class SessionManager:
return
profile = build_worker_profile(session.worker_runtime, agent_path=session.worker_path)
await node.inject_event(f"[SYSTEM] Worker loaded.{profile}")
# Append available trigger info so the queen knows what's schedulable
trigger_lines = ""
if session.available_triggers:
parts = []
for t in session.available_triggers.values():
cfg = t.trigger_config
detail = cfg.get("cron") or f"every {cfg.get('interval_minutes', '?')} min"
task_info = f' -> task: "{t.task}"' if t.task else " (no task configured)"
parts.append(f" - {t.id} ({t.trigger_type}: {detail}){task_info}")
trigger_lines = (
"\n\nAvailable triggers (inactive — use set_trigger to activate):\n"
+ "\n".join(parts)
)
await node.inject_event(f"[SYSTEM] Worker loaded.{profile}{trigger_lines}")
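The per-trigger summary line built above can be isolated as a small formatter. This is only a sketch: `describe_trigger` is a hypothetical helper. Note that a trigger whose config has neither `cron` nor `interval_minutes` (for example a webhook) falls through to the `every ? min` wording.

```python
def describe_trigger(trigger_id: str, trigger_type: str, config: dict, task: str) -> str:
    """Format one line of the 'Available triggers' list sent to the queen."""
    # Prefer a cron expression; otherwise describe the interval.
    detail = config.get("cron") or f"every {config.get('interval_minutes', '?')} min"
    task_info = f' -> task: "{task}"' if task else " (no task configured)"
    return f" - {trigger_id} ({trigger_type}: {detail}){task_info}"
```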
async def _emit_worker_loaded(self, session: Session) -> None:
"""Publish a WORKER_LOADED event so the frontend can update."""
@@ -708,9 +832,35 @@ class SessionManager:
await node.inject_event(
"[SYSTEM] Worker unloaded. You are now operating independently. "
"Handle all tasks directly using your coding tools."
"Design or build the agent to solve the user's problem "
"according to your current phase."
)
async def _emit_trigger_events(
self,
session: Session,
kind: str,
triggers: dict[str, TriggerDefinition],
) -> None:
"""Emit TRIGGER_AVAILABLE or TRIGGER_REMOVED events for each trigger."""
from framework.runtime.event_bus import AgentEvent, EventType
event_type = (
EventType.TRIGGER_AVAILABLE if kind == "available" else EventType.TRIGGER_REMOVED
)
for t in triggers.values():
await session.event_bus.publish(
AgentEvent(
type=event_type,
stream_id="queen",
data={
"trigger_id": t.id,
"trigger_type": t.trigger_type,
"trigger_config": t.trigger_config,
},
)
)
async def revive_queen(self, session: Session, initial_prompt: str | None = None) -> None:
"""Revive a dead queen executor on an existing session.
@@ -782,13 +932,19 @@ class SessionManager:
# Check whether any message part files are actually present
has_messages = False
try:
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir():
continue
parts_dir = node_dir / "parts"
if parts_dir.exists() and any(f.suffix == ".json" for f in parts_dir.iterdir()):
has_messages = True
break
# Flat layout: conversations/parts/*.json
flat_parts = convs_dir / "parts"
if flat_parts.exists() and any(f.suffix == ".json" for f in flat_parts.iterdir()):
has_messages = True
else:
# Node-based layout: conversations/<node_id>/parts/*.json
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir() or node_dir.name == "parts":
continue
parts_dir = node_dir / "parts"
if parts_dir.exists() and any(f.suffix == ".json" for f in parts_dir.iterdir()):
has_messages = True
break
except OSError:
pass
@@ -865,21 +1021,27 @@ class SessionManager:
if convs_dir.exists():
try:
all_parts: list[dict] = []
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir():
continue
parts_dir = node_dir / "parts"
def _collect_parts(parts_dir: Path, _dest: list[dict] = all_parts) -> None:
if not parts_dir.exists():
continue
return
for part_file in sorted(parts_dir.iterdir()):
if part_file.suffix != ".json":
continue
try:
part = json.loads(part_file.read_text(encoding="utf-8"))
part.setdefault("created_at", part_file.stat().st_mtime)
all_parts.append(part)
_dest.append(part)
except (json.JSONDecodeError, OSError):
continue
# Flat layout: conversations/parts/*.json
_collect_parts(convs_dir / "parts")
# Node-based layout: conversations/<node_id>/parts/*.json
for node_dir in convs_dir.iterdir():
if not node_dir.is_dir() or node_dir.name == "parts":
continue
_collect_parts(node_dir / "parts")
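Reading both conversation layouts can be sketched as one standalone function. The directory names follow the comments in this hunk; `collect_parts` itself is a hypothetical helper, and skipping parts that are not JSON objects is an assumption of this sketch.

```python
import json
from pathlib import Path


def collect_parts(convs_dir: Path) -> list:
    """Gather message parts from the flat layout, then from each node directory."""
    parts: list = []

    def _collect(parts_dir: Path) -> None:
        if not parts_dir.is_dir():
            return
        for part_file in sorted(parts_dir.iterdir()):
            if part_file.suffix != ".json":
                continue
            try:
                part = json.loads(part_file.read_text(encoding="utf-8"))
                # Fall back to the file's mtime when no timestamp was stored.
                part.setdefault("created_at", part_file.stat().st_mtime)
            except (json.JSONDecodeError, OSError, AttributeError):
                continue
            parts.append(part)

    # Flat layout: conversations/parts/*.json
    _collect(convs_dir / "parts")
    # Node-based layout: conversations/<node_id>/parts/*.json
    for node_dir in sorted(convs_dir.iterdir()):
        if node_dir.is_dir() and node_dir.name != "parts":
            _collect(node_dir / "parts")
    return parts
```

Excluding the directory named `parts` from the node loop prevents the flat layout from being scanned twice.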
# Filter to client-facing messages only
client_msgs = [
p
+140
@@ -16,6 +16,9 @@ from aiohttp.test_utils import TestClient, TestServer
from framework.server.app import create_app
from framework.server.session_manager import Session
REPO_ROOT = Path(__file__).resolve().parents[4]
EXAMPLE_AGENT_PATH = REPO_ROOT / "examples" / "templates" / "deep_research_agent"
# ---------------------------------------------------------------------------
# Mock helpers
# ---------------------------------------------------------------------------
@@ -37,6 +40,7 @@ class MockNodeSpec:
client_facing: bool = False
success_criteria: str | None = None
system_prompt: str | None = None
sub_agents: list = field(default_factory=list)
@dataclass
@@ -67,6 +71,7 @@ class MockEntryPoint:
name: str = "Default"
entry_node: str = "start"
trigger_type: str = "manual"
trigger_config: dict = field(default_factory=dict)
@dataclass
@@ -130,6 +135,9 @@ class MockRuntime:
def get_stats(self):
return {"running": True, "executions": 1}
def get_timer_next_fire_in(self, ep_id):
return None
class MockAgentInfo:
name: str = "test_agent"
@@ -342,6 +350,35 @@ class TestHealth:
class TestSessionCRUD:
@pytest.mark.asyncio
async def test_create_session_with_worker_forwards_session_id(self):
app = create_app()
manager = app["manager"]
manager.create_session_with_worker = AsyncMock(
return_value=_make_session(agent_id="my-custom-session")
)
async with TestClient(TestServer(app)) as client:
resp = await client.post(
"/api/sessions",
json={
"session_id": "my-custom-session",
"agent_path": str(EXAMPLE_AGENT_PATH),
},
)
data = await resp.json()
assert resp.status == 201
assert data["session_id"] == "my-custom-session"
manager.create_session_with_worker.assert_awaited_once_with(
str(EXAMPLE_AGENT_PATH.resolve()),
agent_id=None,
session_id="my-custom-session",
model=None,
initial_prompt=None,
queen_resume_from=None,
)
@pytest.mark.asyncio
async def test_list_sessions_empty(self):
app = create_app()
@@ -1556,3 +1593,106 @@ class TestErrorMiddleware:
async with TestClient(TestServer(app)) as client:
resp = await client.get("/api/nonexistent")
assert resp.status == 404
class TestCleanupStaleActiveSessions:
"""Tests for _cleanup_stale_active_sessions with two-layer protection."""
def _make_manager(self):
from framework.server.session_manager import SessionManager
return SessionManager()
def _write_state(self, session_dir: Path, status: str, pid: int | None = None) -> None:
session_dir.mkdir(parents=True, exist_ok=True)
state: dict = {"status": status, "session_id": session_dir.name}
if pid is not None:
state["pid"] = pid
(session_dir / "state.json").write_text(json.dumps(state))
def _read_state(self, session_dir: Path) -> dict:
return json.loads((session_dir / "state.json").read_text())
def test_stale_session_is_cancelled(self, tmp_path, monkeypatch):
"""Truly stale active sessions (no live tracking, no PID) get cancelled."""
monkeypatch.setattr(Path, "home", lambda: tmp_path)
agent_path = Path("my_agent")
sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
session_dir = sessions_dir / "session_stale_001"
self._write_state(session_dir, "active")
mgr = self._make_manager()
mgr._cleanup_stale_active_sessions(agent_path)
state = self._read_state(session_dir)
assert state["status"] == "cancelled"
assert "Stale session" in state["result"]["error"]
def test_live_in_memory_session_is_skipped(self, tmp_path, monkeypatch):
"""Sessions tracked in self._sessions must NOT be cancelled (Layer 1)."""
monkeypatch.setattr(Path, "home", lambda: tmp_path)
agent_path = Path("my_agent")
sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
session_dir = sessions_dir / "session_live_002"
self._write_state(session_dir, "active")
mgr = self._make_manager()
# Simulate a live session in the manager's in-memory map
mgr._sessions["session_live_002"] = MagicMock()
mgr._cleanup_stale_active_sessions(agent_path)
state = self._read_state(session_dir)
assert state["status"] == "active", "Live in-memory session should NOT be cancelled"
def test_session_with_live_pid_is_skipped(self, tmp_path, monkeypatch):
"""Sessions whose owning PID is still alive must NOT be cancelled (Layer 2)."""
import os
monkeypatch.setattr(Path, "home", lambda: tmp_path)
agent_path = Path("my_agent")
sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
session_dir = sessions_dir / "session_pid_003"
# Use the current process PID — guaranteed to be alive
self._write_state(session_dir, "active", pid=os.getpid())
mgr = self._make_manager()
mgr._cleanup_stale_active_sessions(agent_path)
state = self._read_state(session_dir)
assert state["status"] == "active", "Session with live PID should NOT be cancelled"
def test_session_with_dead_pid_is_cancelled(self, tmp_path, monkeypatch):
"""Sessions whose owning PID is dead should be cancelled."""
monkeypatch.setattr(Path, "home", lambda: tmp_path)
agent_path = Path("my_agent")
sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
session_dir = sessions_dir / "session_dead_004"
# Use a PID that is almost certainly not running
self._write_state(session_dir, "active", pid=999999999)
mgr = self._make_manager()
mgr._cleanup_stale_active_sessions(agent_path)
state = self._read_state(session_dir)
assert state["status"] == "cancelled"
assert "Stale session" in state["result"]["error"]
def test_paused_session_is_never_touched(self, tmp_path, monkeypatch):
"""Paused sessions should remain intact regardless of PID or tracking."""
monkeypatch.setattr(Path, "home", lambda: tmp_path)
agent_path = Path("my_agent")
sessions_dir = tmp_path / ".hive" / "agents" / "my_agent" / "sessions"
session_dir = sessions_dir / "session_paused_005"
self._write_state(session_dir, "paused")
mgr = self._make_manager()
mgr._cleanup_stale_active_sessions(agent_path)
state = self._read_state(session_dir)
assert state["status"] == "paused", "Paused sessions must remain untouched"
-179
@@ -1,179 +0,0 @@
"""
State Writer - Dual-write adapter for migration period.
Writes execution state to both old (Run/RunSummary) and new (state.json) formats
to maintain backward compatibility during the transition period.
"""
import logging
import os
from datetime import datetime
from framework.schemas.run import Problem, Run, RunMetrics, RunStatus
from framework.schemas.session_state import SessionState, SessionStatus
from framework.storage.concurrent import ConcurrentStorage
from framework.storage.session_store import SessionStore
logger = logging.getLogger(__name__)
class StateWriter:
"""
Writes execution state to both old and new formats during migration.
During the dual-write phase:
- New format (state.json) is written when USE_UNIFIED_SESSIONS=true
- Old format (Run/RunSummary) is always written for backward compatibility
"""
def __init__(self, old_storage: ConcurrentStorage, session_store: SessionStore):
"""
Initialize state writer.
Args:
old_storage: ConcurrentStorage for old format (runs/, summaries/)
session_store: SessionStore for new format (sessions/*/state.json)
"""
self.old = old_storage
self.new = session_store
self.dual_write_enabled = os.getenv("USE_UNIFIED_SESSIONS", "false").lower() == "true"
async def write_execution_state(
self,
session_id: str,
state: SessionState,
) -> None:
"""
Write execution state to both old and new formats.
Args:
session_id: Session ID
state: SessionState to write
"""
# Write to new format if enabled
if self.dual_write_enabled:
try:
await self.new.write_state(session_id, state)
logger.debug(f"Wrote state.json for session {session_id}")
except Exception as e:
logger.error(f"Failed to write state.json for {session_id}: {e}")
# Don't fail - old format is still written
# Always write to old format for backward compatibility
try:
run = self._convert_to_run(state)
await self.old.save_run(run)
logger.debug(f"Wrote Run object for session {session_id}")
except Exception as e:
logger.error(f"Failed to write Run object for {session_id}: {e}")
# This is more critical - reraise if old format fails
raise
def _convert_to_run(self, state: SessionState) -> Run:
"""
Convert SessionState to legacy Run object.
Args:
state: SessionState to convert
Returns:
Run object
"""
# Map SessionStatus to RunStatus
status_mapping = {
SessionStatus.ACTIVE: RunStatus.RUNNING,
SessionStatus.PAUSED: RunStatus.RUNNING, # Paused is still "running" in old format
SessionStatus.COMPLETED: RunStatus.COMPLETED,
SessionStatus.FAILED: RunStatus.FAILED,
SessionStatus.CANCELLED: RunStatus.CANCELLED,
}
run_status = status_mapping.get(state.status, RunStatus.FAILED)
# Convert timestamps
started_at = datetime.fromisoformat(state.timestamps.started_at)
completed_at = (
datetime.fromisoformat(state.timestamps.completed_at)
if state.timestamps.completed_at
else None
)
# Build RunMetrics
metrics = RunMetrics(
total_decisions=state.metrics.decision_count,
successful_decisions=state.metrics.decision_count
- len(state.progress.nodes_with_failures), # Approximate
failed_decisions=len(state.progress.nodes_with_failures),
total_tokens=state.metrics.total_input_tokens + state.metrics.total_output_tokens,
total_latency_ms=state.progress.total_latency_ms,
nodes_executed=state.metrics.nodes_executed,
edges_traversed=state.metrics.edges_traversed,
)
# Convert problems (SessionState stores as dicts, Run expects Problem objects)
problems = []
for p_dict in state.problems:
# Handle both old Problem objects and new dict format
if isinstance(p_dict, dict):
problems.append(Problem(**p_dict))
else:
problems.append(p_dict)
# Convert decisions (SessionState stores as dicts, Run expects Decision objects)
from framework.schemas.decision import Decision
decisions = []
for d_dict in state.decisions:
# Handle both old Decision objects and new dict format
if isinstance(d_dict, dict):
try:
decisions.append(Decision(**d_dict))
except Exception:
# Skip invalid decisions
continue
else:
decisions.append(d_dict)
# Create Run object
run = Run(
id=state.session_id, # Use session_id as run_id
goal_id=state.goal_id,
started_at=started_at,
status=run_status,
completed_at=completed_at,
decisions=decisions,
problems=problems,
metrics=metrics,
goal_description="", # Not stored in SessionState
input_data=state.input_data,
output_data=state.result.output,
)
return run
async def read_state(
self,
session_id: str,
prefer_new: bool = True,
) -> SessionState | None:
"""
Read execution state from either format.
Args:
session_id: Session ID
prefer_new: If True, try new format first (default)
Returns:
SessionState or None if not found
"""
if prefer_new:
# Try new format first
state = await self.new.read_state(session_id)
if state:
return state
# Fall back to old format
run = await self.old.load_run(session_id)
if run:
return SessionState.from_legacy_run(run, session_id)
return None
File diff suppressed because it is too large
+2 -2
@@ -27,12 +27,12 @@ def write_to_diary(entry: str) -> str:
You do not need to include a timestamp or date heading; those are added
automatically.
"""
from framework.agents.hive_coder.queen_memory import append_episodic_entry
from framework.agents.queen.queen_memory import append_episodic_entry
append_episodic_entry(entry)
return "Diary entry recorded."
-def register_queen_memory_tools(registry: "ToolRegistry") -> None:
+def register_queen_memory_tools(registry: ToolRegistry) -> None:
"""Register the episodic memory tool into the queen's tool registry."""
registry.register_function(write_to_diary)
@@ -78,19 +78,6 @@ def register_graph_tools(registry: ToolRegistry, runtime: AgentRuntime) -> int:
isolation_level="shared",
)
-# Async entry points
-for aep in runner.graph.async_entry_points:
-entry_points[aep.id] = EntryPointSpec(
-id=aep.id,
-name=aep.name,
-entry_node=aep.entry_node,
-trigger_type=aep.trigger_type,
-trigger_config=aep.trigger_config,
-isolation_level=aep.isolation_level,
-priority=aep.priority,
-max_concurrent=aep.max_concurrent,
-)
await runtime.add_graph(
graph_id=graph_id,
graph=runner.graph,
@@ -1,20 +1,17 @@
-"""Worker monitoring tools for the Health Judge and Queen triage agents.
+"""Worker monitoring tools for Queen triage agents.
Three tools are registered by ``register_worker_monitoring_tools()``:
- ``get_worker_health_summary`` reads the worker's session log files and
returns a compact health snapshot (recent verdicts, step count, timing).
session_id is optional: if omitted, the most recent active session is
-auto-discovered from storage. No agent-side configuration required.
-Used by the Health Judge on every timer tick.
+auto-discovered from storage.
- ``emit_escalation_ticket`` validates and publishes an EscalationTicket
to the shared EventBus as a WORKER_ESCALATION_TICKET event.
-Used by the Health Judge when it decides to escalate.
- ``notify_operator`` emits a QUEEN_INTERVENTION_REQUESTED event so the TUI
can surface a non-disruptive operator notification.
-Used by the Queen's ticket_triage_node when it decides to intervene.
Usage::
@@ -45,7 +42,7 @@ def register_worker_monitoring_tools(
registry: ToolRegistry,
event_bus: EventBus,
storage_path: Path,
-stream_id: str = "judge",
+stream_id: str = "monitoring",
worker_graph_id: str | None = None,
) -> int:
"""Register worker monitoring tools bound to *event_bus* and *storage_path*.
@@ -55,7 +52,7 @@ def register_worker_monitoring_tools(
event_bus: The shared EventBus for the worker runtime.
storage_path: Root storage path of the worker runtime
(e.g. ``~/.hive/agents/{name}``).
-stream_id: Stream ID used when emitting events; defaults to judge's stream.
+stream_id: Stream ID used when emitting events.
worker_graph_id: The primary worker graph's ID. Included in health summary
so the judge can populate ticket identity fields accurately.
@@ -65,7 +62,7 @@ def register_worker_monitoring_tools(
from framework.llm.provider import Tool
storage_path = Path(storage_path)
-# Derive agent identity from storage path so the judge can fill ticket fields.
+# Derive agent identity from storage path for ticket fields.
# storage_path is ~/.hive/agents/{agent_name} — the name is the last component.
_worker_agent_id: str = storage_path.name
_worker_graph_id: str = worker_graph_id or storage_path.name
@@ -201,10 +198,9 @@ def register_worker_monitoring_tools(
description=(
"Read the worker agent's execution logs and return a compact health snapshot. "
"Returns worker_agent_id and worker_graph_id (use these for ticket identity fields), "
-"recent judge verdicts, step count, time since last step, and "
+"recent verdicts, step count, time since last step, and "
"a snippet of the most recent LLM output. "
-"session_id is optional — omit it to auto-discover the most recent active session. "
-"Use this on every health check to observe trends."
+"session_id is optional — omit it to auto-discover the most recent active session."
),
parameters={
"type": "object",
@@ -241,8 +237,7 @@ def register_worker_monitoring_tools(
"""Validate and publish an EscalationTicket to the shared EventBus.
ticket_json must be a JSON string containing all required EscalationTicket
-fields. The ticket is validated before publishing this ensures the judge
-has genuinely filled out all required evidence fields.
+fields. The ticket is validated before publishing.
Returns a confirmation JSON with the ticket_id on success, or an error.
"""
@@ -257,7 +252,7 @@ def register_worker_monitoring_tools(
try:
await event_bus.emit_worker_escalation_ticket(
stream_id=stream_id,
-node_id="judge",
+node_id="monitoring",
ticket=ticket.model_dump(),
)
logger.info(
@@ -280,7 +275,6 @@ def register_worker_monitoring_tools(
name="emit_escalation_ticket",
description=(
"Validate and publish a structured EscalationTicket to the shared EventBus. "
-"The Queen's ticket_receiver entry point will fire and triage the ticket. "
"ticket_json must be a JSON string with all required EscalationTicket fields: "
"worker_agent_id, worker_session_id, worker_node_id, worker_graph_id, "
"severity (low/medium/high/critical), cause, judge_reasoning, suggested_action, "
File diff suppressed because it is too large
-13
@@ -1,13 +0,0 @@
"""TUI screens package."""
from .account_selection import AccountSelectionScreen
from .add_local_credential import AddLocalCredentialScreen
from .agent_picker import AgentPickerScreen
from .credential_setup import CredentialSetupScreen
__all__ = [
"AccountSelectionScreen",
"AddLocalCredentialScreen",
"AgentPickerScreen",
"CredentialSetupScreen",
]
@@ -1,111 +0,0 @@
"""Account selection ModalScreen for picking a connected account before agent start."""
from __future__ import annotations
from rich.text import Text
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical
from textual.screen import ModalScreen
from textual.widgets import Label, OptionList
from textual.widgets._option_list import Option
class AccountSelectionScreen(ModalScreen[dict | None]):
"""Modal screen showing connected accounts for pre-run selection.
Returns the selected account dict, or None if dismissed.
"""
SCOPED_CSS = False
BINDINGS = [
Binding("escape", "dismiss_picker", "Cancel"),
]
DEFAULT_CSS = """
AccountSelectionScreen {
align: center middle;
}
#acct-container {
width: 70%;
max-width: 80;
height: 60%;
background: $surface;
border: heavy $primary;
padding: 1 2;
}
#acct-title {
text-align: center;
text-style: bold;
width: 100%;
color: $text;
}
#acct-subtitle {
text-align: center;
width: 100%;
margin-bottom: 1;
}
#acct-footer {
text-align: center;
width: 100%;
margin-top: 1;
}
"""
def __init__(self, accounts: list[dict]) -> None:
super().__init__()
self._accounts = accounts
def compose(self) -> ComposeResult:
n = len(self._accounts)
with Vertical(id="acct-container"):
yield Label("Select Account to Test", id="acct-title")
yield Label(
f"[dim]{n} connected account{'s' if n != 1 else ''}[/dim]",
id="acct-subtitle",
)
option_list = OptionList(id="acct-list")
# Group: Aden accounts first, then local
aden = [a for a in self._accounts if a.get("source") != "local"]
local = [a for a in self._accounts if a.get("source") == "local"]
ordered = aden + local
for i, acct in enumerate(ordered):
provider = acct.get("provider", "unknown")
alias = acct.get("alias", "unknown")
identity = acct.get("identity", {})
source = acct.get("source", "aden")
# Build identity label: prefer email, then username/workspace
identity_label = (
identity.get("email")
or identity.get("username")
or identity.get("workspace")
or ""
)
label = Text()
label.append(f"{provider}/", style="bold")
label.append(alias, style="bold cyan")
if source == "local":
label.append(" [local]", style="dim yellow")
if identity_label:
label.append(f" ({identity_label})", style="dim")
option_list.add_option(Option(label, id=f"acct-{i}"))
# Keep ordered list for index lookups
self._accounts = ordered
yield option_list
yield Label(
"[dim]Enter[/dim] Select [dim]Esc[/dim] Cancel",
id="acct-footer",
)
def on_mount(self) -> None:
ol = self.query_one("#acct-list", OptionList)
ol.styles.height = "1fr"
def on_option_list_option_selected(self, event: OptionList.OptionSelected) -> None:
idx = event.option_index
if 0 <= idx < len(self._accounts):
self.dismiss(self._accounts[idx])
def action_dismiss_picker(self) -> None:
self.dismiss(None)
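Reviewer note: the grouping and labelling done inside `compose` above can be expressed as two pure functions, which may make the behaviour easier to test without Textual. This is a sketch under assumptions: `order_accounts` and `plain_label` are hypothetical names, and the label is built as a plain string rather than a styled `rich.text.Text`.

```python
def order_accounts(accounts):
    """Put non-local (Aden) accounts first, then local ones.

    Input order is preserved within each group.
    """
    aden = [a for a in accounts if a.get("source") != "local"]
    local = [a for a in accounts if a.get("source") == "local"]
    return aden + local


def plain_label(acct):
    """Build an unstyled label: provider/alias, optional [local], optional identity."""
    provider = acct.get("provider", "unknown")
    alias = acct.get("alias", "unknown")
    identity = acct.get("identity", {})
    # Prefer email, then username, then workspace.
    identity_label = (
        identity.get("email")
        or identity.get("username")
        or identity.get("workspace")
        or ""
    )
    label = f"{provider}/{alias}"
    if acct.get("source", "aden") == "local":
        label += " [local]"
    if identity_label:
        label += f" ({identity_label})"
    return label
```

Because the screen reassigns `self._accounts` to the ordered list, the option index reported by the widget maps directly onto that list.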
@@ -1,244 +0,0 @@
"""Add Local Credential ModalScreen for storing named local API key accounts."""
from __future__ import annotations
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical, VerticalScroll
from textual.screen import ModalScreen
from textual.widgets import Button, Input, Label, OptionList
from textual.widgets._option_list import Option
class AddLocalCredentialScreen(ModalScreen[dict | None]):
"""Modal screen for adding a named local API key credential.
Phase 1: Pick credential type from list.
Phase 2: Enter alias + API key, run health check, save.
Returns a dict with credential_id, alias, and identity on success, or None on cancel.
"""
BINDINGS = [
Binding("escape", "dismiss_screen", "Cancel"),
]
DEFAULT_CSS = """
AddLocalCredentialScreen {
align: center middle;
}
#alc-container {
width: 80%;
max-width: 90;
height: 80%;
background: $surface;
border: heavy $primary;
padding: 1 2;
}
#alc-title {
text-align: center;
text-style: bold;
width: 100%;
color: $text;
}
#alc-subtitle {
text-align: center;
width: 100%;
margin-bottom: 1;
}
#alc-type-list {
height: 1fr;
}
#alc-form {
height: 1fr;
}
.alc-field {
margin-bottom: 1;
height: auto;
}
.alc-field Label {
margin-bottom: 0;
}
#alc-status {
width: 100%;
height: auto;
margin-top: 1;
padding: 1;
background: $panel;
}
.alc-buttons {
height: auto;
margin-top: 1;
align: center middle;
}
.alc-buttons Button {
margin: 0 1;
}
#alc-footer {
text-align: center;
width: 100%;
margin-top: 1;
}
"""
def __init__(self) -> None:
super().__init__()
# Load credential specs that support direct API keys
self._specs: list[tuple[str, object]] = self._load_specs()
# Selected credential spec (set in phase 2)
self._selected_id: str = ""
self._selected_spec: object = None
self._phase: int = 1 # 1 = type selection, 2 = form
@staticmethod
def _load_specs() -> list[tuple[str, object]]:
"""Return (credential_id, spec) pairs for direct-API-key credentials."""
try:
from aden_tools.credentials import CREDENTIAL_SPECS
return [
(cid, spec)
for cid, spec in CREDENTIAL_SPECS.items()
if getattr(spec, "direct_api_key_supported", False)
]
except Exception:
return []
# ------------------------------------------------------------------
# Compose
# ------------------------------------------------------------------
def compose(self) -> ComposeResult:
with Vertical(id="alc-container"):
yield Label("Add Local Credential", id="alc-title")
yield Label("[dim]Store a named API key account[/dim]", id="alc-subtitle")
# Phase 1: type selection
option_list = OptionList(id="alc-type-list")
for cid, spec in self._specs:
description = getattr(spec, "description", cid)
option_list.add_option(Option(f"{cid} [dim]{description}[/dim]", id=f"type-{cid}"))
yield option_list
# Phase 2: form (hidden initially)
with VerticalScroll(id="alc-form"):
with Vertical(classes="alc-field"):
yield Label("[bold]Alias[/bold] [dim](e.g. work, personal)[/dim]")
yield Input(value="default", id="alc-alias")
with Vertical(classes="alc-field"):
yield Label("[bold]API Key[/bold]")
yield Input(placeholder="Paste API key...", password=True, id="alc-key")
yield Label("", id="alc-status")
with Vertical(classes="alc-buttons"):
yield Button("Test & Save", variant="primary", id="btn-save")
yield Button("Back", variant="default", id="btn-back")
yield Label(
"[dim]Enter[/dim] Select [dim]Esc[/dim] Cancel",
id="alc-footer",
)
def on_mount(self) -> None:
self._show_phase(1)
# ------------------------------------------------------------------
# Phase switching
# ------------------------------------------------------------------
def _show_phase(self, phase: int) -> None:
self._phase = phase
type_list = self.query_one("#alc-type-list", OptionList)
form = self.query_one("#alc-form", VerticalScroll)
if phase == 1:
type_list.display = True
form.display = False
subtitle = self.query_one("#alc-subtitle", Label)
subtitle.update("[dim]Select the credential type to add[/dim]")
else:
type_list.display = False
form.display = True
spec = self._selected_spec
description = (
getattr(spec, "description", self._selected_id) if spec else self._selected_id
)
subtitle = self.query_one("#alc-subtitle", Label)
subtitle.update(f"[dim]{self._selected_id}[/dim] {description}")
self._clear_status()
# Focus the alias input
self.query_one("#alc-alias", Input).focus()
# ------------------------------------------------------------------
# Event handlers
# ------------------------------------------------------------------
def on_option_list_option_selected(self, event: OptionList.OptionSelected) -> None:
if self._phase != 1:
return
option_id = event.option.id or ""
if option_id.startswith("type-"):
cid = option_id[5:] # strip "type-" prefix
self._selected_id = cid
self._selected_spec = next(
(spec for spec_id, spec in self._specs if spec_id == cid), None
)
self._show_phase(2)
def on_button_pressed(self, event: Button.Pressed) -> None:
if event.button.id == "btn-save":
self._do_save()
elif event.button.id == "btn-back":
self._show_phase(1)
# ------------------------------------------------------------------
# Save logic
# ------------------------------------------------------------------
def _do_save(self) -> None:
alias = self.query_one("#alc-alias", Input).value.strip() or "default"
api_key = self.query_one("#alc-key", Input).value.strip()
if not api_key:
self._set_status("[red]API key cannot be empty.[/red]")
return
self._set_status("[dim]Running health check...[/dim]")
# Disable save button while running
btn = self.query_one("#btn-save", Button)
btn.disabled = True
try:
from framework.credentials.local.registry import LocalCredentialRegistry
registry = LocalCredentialRegistry.default()
info, health_result = registry.save_account(
credential_id=self._selected_id,
alias=alias,
api_key=api_key,
run_health_check=True,
)
if health_result is not None and not health_result.valid:
self._set_status(
f"[yellow]Saved with failed health check:[/yellow] {health_result.message}\n"
"[dim]You can re-validate later via validate_credential().[/dim]"
)
else:
identity = info.identity.to_dict()
identity_str = ""
if identity:
parts = [f"{k}: {v}" for k, v in identity.items() if v]
identity_str = " " + ", ".join(parts) if parts else ""
self._set_status(f"[green]Saved:[/green] {info.storage_id}{identity_str}")
# Dismiss with result so callers can react
self.set_timer(1.0, lambda: self.dismiss(info.to_account_dict()))
return
except Exception as e:
self._set_status(f"[red]Error:[/red] {e}")
finally:
btn.disabled = False
def _set_status(self, markup: str) -> None:
self.query_one("#alc-status", Label).update(markup)
def _clear_status(self) -> None:
self.query_one("#alc-status", Label).update("")
def action_dismiss_screen(self) -> None:
self.dismiss(None)
-362
@@ -1,362 +0,0 @@
"""Agent picker ModalScreen for selecting agents within the TUI."""
from __future__ import annotations
import json
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from rich.console import Group
from rich.text import Text
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical
from textual.screen import ModalScreen
from textual.widgets import Label, OptionList, TabbedContent, TabPane
from textual.widgets._option_list import Option
class GetStartedAction(Enum):
"""Actions available in the Get Started tab."""
RUN_EXAMPLES = "run_examples"
RUN_EXISTING = "run_existing"
BUILD_EDIT = "build_edit"
@dataclass
class AgentEntry:
"""Lightweight agent metadata for the picker."""
path: Path
name: str
description: str
category: str
session_count: int = 0
node_count: int = 0
tool_count: int = 0
tags: list[str] = field(default_factory=list)
last_active: str | None = None
def _get_last_active(agent_name: str) -> str | None:
"""Return the most recent updated_at timestamp across all sessions."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return None
latest: str | None = None
for session_dir in sessions_dir.iterdir():
if not session_dir.is_dir() or not session_dir.name.startswith("session_"):
continue
state_file = session_dir / "state.json"
if not state_file.exists():
continue
try:
data = json.loads(state_file.read_text(encoding="utf-8"))
ts = data.get("timestamps", {}).get("updated_at")
if ts and (latest is None or ts > latest):
latest = ts
except Exception:
continue
return latest
def _count_sessions(agent_name: str) -> int:
"""Count session directories under ~/.hive/agents/{agent_name}/sessions/."""
sessions_dir = Path.home() / ".hive" / "agents" / agent_name / "sessions"
if not sessions_dir.exists():
return 0
return sum(1 for d in sessions_dir.iterdir() if d.is_dir() and d.name.startswith("session_"))
def _extract_agent_stats(agent_path: Path) -> tuple[int, int, list[str]]:
"""Extract node count, tool count, and tags from an agent directory.
Prefers agent.py (AST-parsed) over agent.json for node/tool counts
since agent.json may be stale. Tags are only available from agent.json.
"""
import ast
node_count, tool_count, tags = 0, 0, []
# Try agent.py first — source of truth for nodes
agent_py = agent_path / "agent.py"
if agent_py.exists():
try:
tree = ast.parse(agent_py.read_text(encoding="utf-8"))
for node in ast.walk(tree):
# Find `nodes = [...]` assignment
if isinstance(node, ast.Assign):
for target in node.targets:
if isinstance(target, ast.Name) and target.id == "nodes":
if isinstance(node.value, ast.List):
node_count = len(node.value.elts)
except Exception:
pass
# Fall back to / supplement from agent.json
agent_json = agent_path / "agent.json"
if agent_json.exists():
try:
data = json.loads(agent_json.read_text(encoding="utf-8"))
json_nodes = data.get("nodes", [])
if node_count == 0:
node_count = len(json_nodes)
# Tool count: use whichever source gave us nodes, but agent.json
# has the structured tool lists so prefer it for tool counting
tools: set[str] = set()
for n in json_nodes:
tools.update(n.get("tools", []))
tool_count = len(tools)
tags = data.get("agent", {}).get("tags", [])
except Exception:
pass
return node_count, tool_count, tags
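Reviewer note: the AST walk in `_extract_agent_stats` can be exercised on a source string directly. The sketch below mirrors that logic under one assumption: `count_nodes` is a hypothetical helper name, not part of the removed module.

```python
import ast


def count_nodes(source: str) -> int:
    """Count elements of a `nodes = [...]` list assignment in Python source.

    Returns 0 when no such plain assignment exists.
    """
    count = 0
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Only plain assignments whose value is a list literal are considered.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.List):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "nodes":
                    count = len(node.value.elts)
    return count
```

An annotated assignment such as `nodes: list = [...]` parses as `ast.AnnAssign`, so neither this sketch nor the original loop counts it; in that case the original code falls back to `agent.json`.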
def discover_agents() -> dict[str, list[AgentEntry]]:
"""Discover agents from all known sources grouped by category."""
from framework.runner.cli import (
_extract_python_agent_metadata,
_get_framework_agents_dir,
_is_valid_agent_dir,
)
groups: dict[str, list[AgentEntry]] = {}
sources = [
("Your Agents", Path("exports")),
("Framework", _get_framework_agents_dir()),
("Examples", Path("examples/templates")),
]
for category, base_dir in sources:
if not base_dir.exists():
continue
entries: list[AgentEntry] = []
for path in sorted(base_dir.iterdir(), key=lambda p: p.name):
if not _is_valid_agent_dir(path):
continue
# config.py is source of truth for name/description
name, desc = _extract_python_agent_metadata(path)
config_fallback_name = path.name.replace("_", " ").title()
used_config = name != config_fallback_name
node_count, tool_count, tags = _extract_agent_stats(path)
if not used_config:
# config.py didn't provide values, fall back to agent.json
agent_json = path / "agent.json"
if agent_json.exists():
try:
data = json.loads(agent_json.read_text(encoding="utf-8"))
meta = data.get("agent", {})
name = meta.get("name", name)
desc = meta.get("description", desc)
except Exception:
pass
entries.append(
AgentEntry(
path=path,
name=name,
description=desc,
category=category,
session_count=_count_sessions(path.name),
node_count=node_count,
tool_count=tool_count,
tags=tags,
last_active=_get_last_active(path.name),
)
)
if entries:
groups[category] = entries
return groups
def _render_agent_option(agent: AgentEntry) -> Group:
"""Build a Rich renderable for a single agent option."""
# Line 1: name + session badge
line1 = Text()
line1.append(agent.name, style="bold")
if agent.session_count:
line1.append(f" {agent.session_count} sessions", style="dim cyan")
# Line 2: description (word-wrapped by the widget)
desc = agent.description if agent.description else "No description"
line2 = Text(desc, style="dim")
# Line 3: stats chips
chips = Text()
if agent.node_count:
chips.append(f" {agent.node_count} nodes ", style="on dark_green white")
chips.append(" ")
if agent.tool_count:
chips.append(f" {agent.tool_count} tools ", style="on dark_blue white")
chips.append(" ")
for tag in agent.tags[:3]:
chips.append(f" {tag} ", style="on grey37 white")
chips.append(" ")
parts = [line1, line2]
if chips.plain.strip():
parts.append(chips)
return Group(*parts)
def _render_get_started_option(title: str, description: str, icon: str = "") -> Group:
"""Build a Rich renderable for a Get Started option."""
line1 = Text()
line1.append(f"{icon} ", style="bold cyan")
line1.append(title, style="bold")
line2 = Text(description, style="dim")
return Group(line1, line2)
class AgentPickerScreen(ModalScreen[str | None]):
"""Modal screen showing available agents organized by tabbed categories.
Returns the selected agent path as a string, or None if dismissed.
For Get Started actions, returns a special prefix like "action:run_examples".
"""
BINDINGS = [
Binding("escape", "dismiss_picker", "Cancel"),
]
DEFAULT_CSS = """
AgentPickerScreen {
align: center middle;
}
#picker-container {
width: 90%;
max-width: 120;
height: 85%;
background: $surface;
border: heavy $primary;
padding: 1 2;
}
#picker-title {
text-align: center;
text-style: bold;
width: 100%;
color: $text;
}
#picker-subtitle {
text-align: center;
width: 100%;
margin-bottom: 1;
}
#picker-footer {
text-align: center;
width: 100%;
margin-top: 1;
}
TabPane {
padding: 0;
}
OptionList {
height: 1fr;
}
OptionList > .option-list--option {
padding: 1 2;
}
"""
def __init__(
self,
agent_groups: dict[str, list[AgentEntry]],
show_get_started: bool = False,
) -> None:
super().__init__()
self._groups = agent_groups
self._show_get_started = show_get_started
# Map (tab_id, option_index) -> AgentEntry
self._option_map: dict[str, dict[int, AgentEntry]] = {}
def compose(self) -> ComposeResult:
total = sum(len(v) for v in self._groups.values())
with Vertical(id="picker-container"):
yield Label("Hive Agent Launcher", id="picker-title")
yield Label(
f"[dim]{total} agents available[/dim]",
id="picker-subtitle",
)
with TabbedContent():
# Get Started tab (only on initial launch)
if self._show_get_started:
with TabPane("Get Started", id="get-started"):
option_list = OptionList(id="list-get-started")
option_list.add_option(
Option(
_render_get_started_option(
"Test and run example agents",
"Try pre-built example agents to learn how Hive works",
"📚",
),
id="action:run_examples",
)
)
option_list.add_option(
Option(
_render_get_started_option(
"Test and run existing agent",
"Load and run an agent you've already built (from exports/)",
"🚀",
),
id="action:run_existing",
)
)
option_list.add_option(
Option(
_render_get_started_option(
"Build or edit agent",
"Create a new agent or modify an existing one",
"🛠️ ",
),
id="action:build_edit",
)
)
yield option_list
# Agent category tabs
for category, agents in self._groups.items():
tab_id = category.lower().replace(" ", "-")
with TabPane(f"{category} ({len(agents)})", id=tab_id):
option_list = OptionList(id=f"list-{tab_id}")
self._option_map[f"list-{tab_id}"] = {}
for i, agent in enumerate(agents):
option_list.add_option(
Option(
_render_agent_option(agent),
id=str(agent.path),
)
)
self._option_map[f"list-{tab_id}"][i] = agent
yield option_list
yield Label(
"[dim]Enter[/dim] Select [dim]Tab[/dim] Switch category [dim]Esc[/dim] Cancel",
id="picker-footer",
)
def on_option_list_option_selected(self, event: OptionList.OptionSelected) -> None:
list_id = event.option_list.id or ""
# Handle Get Started tab options
if list_id == "list-get-started":
option = event.option
if option and option.id:
self.dismiss(option.id) # Returns "action:run_examples", etc.
return
# Handle agent selection from other tabs
idx = event.option_index
agent_map = self._option_map.get(list_id, {})
agent = agent_map.get(idx)
if agent:
self.dismiss(str(agent.path))
def action_dismiss_picker(self) -> None:
self.dismiss(None)
@@ -1,304 +0,0 @@
"""Credential setup ModalScreen for configuring missing agent credentials."""
from __future__ import annotations
import os
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Vertical, VerticalScroll
from textual.screen import ModalScreen
from textual.widgets import Button, Input, Label
from framework.credentials.setup import CredentialSetupSession, MissingCredential
class CredentialSetupScreen(ModalScreen[bool | None]):
"""Modal screen for configuring missing agent credentials.
Shows a form with one password Input per missing credential.
For Aden-backed credentials (``aden_supported=True``), prompts for
``ADEN_API_KEY`` and runs the Aden sync flow instead of storing a
raw value.
Returns True on successful save, or None on cancel/skip.
"""
BINDINGS = [
Binding("escape", "dismiss_setup", "Cancel"),
]
DEFAULT_CSS = """
CredentialSetupScreen {
align: center middle;
}
#cred-container {
width: 80%;
max-width: 100;
height: 80%;
background: $surface;
border: heavy $primary;
padding: 1 2;
}
#cred-title {
text-align: center;
text-style: bold;
width: 100%;
color: $text;
}
#cred-subtitle {
text-align: center;
width: 100%;
margin-bottom: 1;
}
#cred-scroll {
height: 1fr;
}
.cred-entry {
margin-bottom: 1;
padding: 1;
background: $panel;
height: auto;
}
.cred-entry Input {
margin-top: 1;
}
.cred-buttons {
height: auto;
margin-top: 1;
align: center middle;
}
.cred-buttons Button {
margin: 0 1;
}
#cred-footer {
text-align: center;
width: 100%;
margin-top: 1;
}
"""
def __init__(self, session: CredentialSetupSession) -> None:
super().__init__()
self._session = session
self._missing: list[MissingCredential] = session.missing
# Track which credentials need Aden sync vs direct API key
self._aden_creds: set[int] = set()
self._needs_aden_key = False
for i, cred in enumerate(self._missing):
if cred.aden_supported and not cred.direct_api_key_supported:
self._aden_creds.add(i)
self._needs_aden_key = True
def compose(self) -> ComposeResult:
n = len(self._missing)
with Vertical(id="cred-container"):
yield Label("Credential Setup", id="cred-title")
yield Label(
f"[dim]{n} credential{'s' if n != 1 else ''} needed to run this agent[/dim]",
id="cred-subtitle",
)
with VerticalScroll(id="cred-scroll"):
# If any credential needs Aden, show ADEN_API_KEY input first
if self._needs_aden_key:
aden_key = os.environ.get("ADEN_API_KEY", "")
with Vertical(classes="cred-entry"):
yield Label("[bold]ADEN_API_KEY[/bold]")
aden_names = [
self._missing[i].credential_name for i in sorted(self._aden_creds)
]
yield Label(f"[dim]Required for OAuth sync: {', '.join(aden_names)}[/dim]")
yield Label("[cyan]Get key:[/cyan] https://hive.adenhq.com")
yield Input(
placeholder="Paste ADEN_API_KEY..."
if not aden_key
else "Already set (leave blank to keep)",
password=True,
id="key-aden",
)
# Show direct API key inputs for non-Aden credentials
for i, cred in enumerate(self._missing):
if i in self._aden_creds:
continue # Handled via Aden sync above
with Vertical(classes="cred-entry"):
yield Label(f"[bold]{cred.env_var}[/bold]")
affected = cred.tools or cred.node_types
if affected:
yield Label(f"[dim]Required by: {', '.join(affected)}[/dim]")
if cred.description:
yield Label(f"[dim]{cred.description}[/dim]")
if cred.help_url:
yield Label(f"[cyan]Get key:[/cyan] {cred.help_url}")
yield Input(
placeholder="Paste API key...",
password=True,
id=f"key-{i}",
)
with Vertical(classes="cred-buttons"):
yield Button("Save & Continue", variant="primary", id="btn-save")
yield Button("Skip", variant="default", id="btn-skip")
yield Label(
"[dim]Enter[/dim] Submit [dim]Esc[/dim] Cancel",
id="cred-footer",
)
def on_button_pressed(self, event: Button.Pressed) -> None:
if event.button.id == "btn-save":
self._save_credentials()
elif event.button.id == "btn-skip":
self.dismiss(None)
def _save_credentials(self) -> None:
"""Collect inputs, store credentials, and dismiss."""
self._session._ensure_credential_key()
configured = 0
# Handle Aden-backed credentials
if self._needs_aden_key:
aden_input = self.query_one("#key-aden", Input)
aden_key = aden_input.value.strip()
if aden_key:
from framework.credentials.key_storage import save_aden_api_key
save_aden_api_key(aden_key)
configured += 1 # ADEN_API_KEY itself counts as configured
# Run Aden sync for all Aden-backed creds (best-effort)
if aden_key or os.environ.get("ADEN_API_KEY"):
self._sync_aden_credentials()
# Handle direct API key credentials
for i, cred in enumerate(self._missing):
if i in self._aden_creds:
continue
input_widget = self.query_one(f"#key-{i}", Input)
value = input_widget.value.strip()
if not value:
continue
try:
self._session._store_credential(cred, value)
configured += 1
except Exception as e:
self.notify(f"Error storing {cred.env_var}: {e}", severity="error")
if configured > 0:
self.dismiss(True)
else:
self.notify("No credentials configured", severity="warning", timeout=3)
def _sync_aden_credentials(self) -> int:
"""Sync Aden-backed credentials and return count of successfully synced."""
# Build the Aden sync components directly so we get real errors
# instead of CredentialStore.with_aden_sync() silently falling back.
try:
from framework.credentials.aden import (
AdenCachedStorage,
AdenClientConfig,
AdenCredentialClient,
AdenSyncProvider,
)
from framework.credentials.storage import EncryptedFileStorage
client = AdenCredentialClient(AdenClientConfig(base_url="https://api.adenhq.com"))
provider = AdenSyncProvider(client=client)
local_storage = EncryptedFileStorage()
cached_storage = AdenCachedStorage(
local_storage=local_storage,
aden_provider=provider,
)
except Exception as e:
self.notify(
f"Aden setup error: {e}",
severity="error",
timeout=8,
)
return 0
# Sync all integrations from Aden to get the provider index populated
try:
from framework.credentials import CredentialStore
store = CredentialStore(
storage=cached_storage,
providers=[provider],
auto_refresh=True,
)
num_synced = provider.sync_all(store)
if num_synced == 0:
self.notify(
"No active integrations found in Aden. "
"Connect integrations at hive.adenhq.com.",
severity="warning",
timeout=8,
)
except Exception as e:
self.notify(
f"Aden sync error: {e}",
severity="error",
timeout=8,
)
return 0
synced = 0
for i in sorted(self._aden_creds):
cred = self._missing[i]
cred_id = cred.credential_id or cred.credential_name
if store.is_available(cred_id):
try:
value = store.get_key(cred_id, cred.credential_key)
if value:
os.environ[cred.env_var] = value
self._persist_to_local_store(cred_id, cred.credential_key, value)
synced += 1
else:
self.notify(
f"{cred.credential_name}: key "
f"'{cred.credential_key}' not found "
f"in credential '{cred_id}'",
severity="warning",
timeout=8,
)
except Exception as e:
self.notify(
f"{cred.credential_name} extraction failed: {e}",
severity="error",
timeout=8,
)
else:
self.notify(
f"{cred.credential_name} (id='{cred_id}') "
f"not found in Aden. Connect this "
f"integration at hive.adenhq.com first.",
severity="warning",
timeout=8,
)
return synced
@staticmethod
def _persist_to_local_store(cred_id: str, key_name: str, value: str) -> None:
"""Save a synced token to the local encrypted store under the canonical ID."""
try:
from pydantic import SecretStr
from framework.credentials.models import CredentialKey, CredentialObject, CredentialType
from framework.credentials.storage import EncryptedFileStorage
cred_obj = CredentialObject(
id=cred_id,
credential_type=CredentialType.OAUTH2,
keys={
key_name: CredentialKey(
name=key_name,
value=SecretStr(value),
),
},
auto_refresh=True,
)
EncryptedFileStorage().save(cred_obj)
except Exception:
pass # Best-effort; env var is the primary delivery mechanism
def action_dismiss_setup(self) -> None:
self.dismiss(None)
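Reviewer note: the constructor above splits missing credentials into those that go through the Aden sync flow and those that get a direct API key input. A minimal sketch of that split follows; `Cred` is a hypothetical stand-in for `MissingCredential`, and `partition` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Cred:
    """Hypothetical stand-in for MissingCredential (only the fields used here)."""
    env_var: str
    aden_supported: bool = False
    direct_api_key_supported: bool = True


def partition(missing):
    """Return (aden_indices, direct_indices) for a list of credentials.

    A credential is routed to Aden sync only when Aden is supported
    and a direct API key is not.
    """
    aden = {
        i
        for i, cred in enumerate(missing)
        if cred.aden_supported and not cred.direct_api_key_supported
    }
    direct = [i for i in range(len(missing)) if i not in aden]
    return aden, direct
```

A credential that supports both paths is treated as a direct API key credential, which matches the condition in `__init__`.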
File diff suppressed because it is too large
-139
@@ -1,139 +0,0 @@
"""
Native OS file dialog for PDF selection.
Launches the platform's native file picker (macOS: NSOpenPanel via osascript,
Linux: zenity/kdialog, Windows: PowerShell OpenFileDialog) in a background
thread so Textual's event loop stays responsive.
Falls back to None when no GUI is available (SSH, headless).
"""
import asyncio
import os
import subprocess
import sys
from pathlib import Path
def _has_gui() -> bool:
"""Detect whether a GUI display is available."""
if sys.platform == "darwin":
# macOS: GUI is available unless running over SSH without display forwarding.
return "SSH_CONNECTION" not in os.environ or "DISPLAY" in os.environ
elif sys.platform == "win32":
return True
else:
# Linux/BSD: Need X11 or Wayland.
return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))
def _linux_file_dialog() -> subprocess.CompletedProcess | None:
"""Try zenity, then kdialog, on Linux. Returns CompletedProcess or None."""
# Try zenity (GTK)
try:
return subprocess.run(
[
"zenity",
"--file-selection",
"--title=Select a PDF file",
"--file-filter=PDF files (*.pdf)|*.pdf",
],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
)
except FileNotFoundError:
pass
# Try kdialog (KDE)
try:
return subprocess.run(
[
"kdialog",
"--getopenfilename",
".",
"PDF files (*.pdf)",
],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
)
except FileNotFoundError:
pass
return None
def _pick_pdf_subprocess() -> Path | None:
"""Run the native file dialog. BLOCKS until user picks or cancels.
Returns a Path on success, None on cancel or error.
Must be called from a non-main thread (via asyncio.to_thread).
"""
try:
if sys.platform == "darwin":
result = subprocess.run(
[
"osascript",
"-e",
'POSIX path of (choose file of type {"com.adobe.pdf"} '
'with prompt "Select a PDF file")',
],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
)
elif sys.platform == "win32":
ps_script = (
"Add-Type -AssemblyName System.Windows.Forms; "
"$f = New-Object System.Windows.Forms.OpenFileDialog; "
"$f.Filter = 'PDF files (*.pdf)|*.pdf'; "
"$f.Title = 'Select a PDF file'; "
"if ($f.ShowDialog() -eq 'OK') { $f.FileName }"
)
result = subprocess.run(
["powershell", "-NoProfile", "-Command", ps_script],
encoding="utf-8",
capture_output=True,
text=True,
timeout=300,
)
else:
result = _linux_file_dialog()
if result is None:
return None
if result.returncode != 0:
return None
path_str = result.stdout.strip()
if not path_str:
return None
path = Path(path_str)
if path.is_file() and path.suffix.lower() == ".pdf":
return path
return None
except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
return None
async def pick_pdf_file() -> Path | None:
"""Open a native OS file dialog to pick a PDF file.
Non-blocking: runs the dialog subprocess in a background thread via
asyncio.to_thread(), so the calling event loop stays responsive.
Returns:
Path to the selected PDF, or None if the user cancelled,
no GUI is available, or the dialog command was not found.
"""
if not _has_gui():
return None
return await asyncio.to_thread(_pick_pdf_subprocess)
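The result handling in `_pick_pdf_subprocess` can be isolated into a pure function, which makes it testable without opening a dialog. The following is a minimal sketch under that assumption; `parse_dialog_output` is a hypothetical helper name and is not part of the removed module.

```python
import tempfile
from pathlib import Path
from typing import Optional


def parse_dialog_output(returncode: int, stdout: str) -> Optional[Path]:
    """Hypothetical helper: map a dialog's exit code and stdout to a PDF path."""
    if returncode != 0:
        # Non-zero exit: the user cancelled or the dialog command failed.
        return None
    path_str = stdout.strip()  # dialogs typically print a trailing newline
    if not path_str:
        return None
    path = Path(path_str)
    # Accept only an existing regular file with a .pdf suffix (case-insensitive).
    if path.is_file() and path.suffix.lower() == ".pdf":
        return path
    return None
```

With this split, the subprocess call stays a thin platform-specific wrapper, and the acceptance rules (exit code, non-empty output, existing `.pdf` file) can be checked against temporary files.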
-585
@@ -1,585 +0,0 @@
"""
Graph/Tree Overview Widget - Displays real agent graph structure.
Supports rendering loops (back-edges) via right-side return channels:
arrows drawn on the right margin that visually point back up to earlier nodes.
"""
from __future__ import annotations
import re
import time
from textual.app import ComposeResult
from textual.containers import Vertical
from framework.runtime.agent_runtime import AgentRuntime
from framework.runtime.event_bus import EventType
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
# Width of each return-channel column (padding + │ + gap)
_CHANNEL_WIDTH = 5
# Regex to strip Rich markup tags for measuring visible width
_MARKUP_RE = re.compile(r"\[/?[^\]]*\]")
def _plain_len(s: str) -> int:
"""Return the visible character length of a Rich-markup string."""
return len(_MARKUP_RE.sub("", s))
class GraphOverview(Vertical):
"""Widget to display Agent execution graph/tree with real data."""
DEFAULT_CSS = """
GraphOverview {
width: 100%;
height: 100%;
background: $panel;
}
GraphOverview > RichLog {
width: 100%;
height: 100%;
background: $panel;
border: none;
scrollbar-background: $surface;
scrollbar-color: $primary;
}
"""
def __init__(self, runtime: AgentRuntime):
super().__init__()
self.runtime = runtime
self._override_graph = None # Set by switch_graph() for secondary graphs
self.active_node: str | None = None
self.execution_path: list[str] = []
# Per-node status strings shown next to the node in the graph display.
# e.g. {"planner": "thinking...", "searcher": "web_search..."}
self._node_status: dict[str, str] = {}
@property
def _graph(self):
"""The graph currently being displayed (may be a secondary graph)."""
return self._override_graph or self.runtime.graph
def switch_graph(self, graph) -> None:
"""Switch to displaying a different graph and refresh."""
self._override_graph = graph
self.active_node = None
self.execution_path = []
self._node_status = {}
self._display_graph()
def compose(self) -> ComposeResult:
# Use RichLog for formatted output
yield RichLog(id="graph-display", highlight=True, markup=True)
def on_mount(self) -> None:
"""Display initial graph structure."""
self._display_graph()
# Refresh every 1s so timer countdowns stay current
if self.runtime._timer_next_fire is not None:
self.set_interval(1.0, self._display_graph)
# ------------------------------------------------------------------
# Graph analysis helpers
# ------------------------------------------------------------------
def _topo_order(self) -> list[str]:
"""BFS from entry_node following edges."""
graph = self._graph
visited: list[str] = []
seen: set[str] = set()
queue = [graph.entry_node]
while queue:
nid = queue.pop(0)
if nid in seen:
continue
seen.add(nid)
visited.append(nid)
for edge in graph.get_outgoing_edges(nid):
if edge.target not in seen:
queue.append(edge.target)
# Append orphan nodes not reachable from entry
for node in graph.nodes:
if node.id not in seen:
visited.append(node.id)
return visited
def _detect_back_edges(self, ordered: list[str]) -> list[dict]:
"""Find edges where target appears before (or equal to) source in topo order.
Returns a list of dicts with keys: edge, source, target, source_idx, target_idx.
"""
order_idx = {nid: i for i, nid in enumerate(ordered)}
back_edges: list[dict] = []
for node_id in ordered:
for edge in self._graph.get_outgoing_edges(node_id):
target_idx = order_idx.get(edge.target, -1)
source_idx = order_idx.get(node_id, -1)
if target_idx != -1 and target_idx <= source_idx:
back_edges.append(
{
"edge": edge,
"source": node_id,
"target": edge.target,
"source_idx": source_idx,
"target_idx": target_idx,
}
)
return back_edges
def _is_back_edge(self, source: str, target: str, order_idx: dict[str, int]) -> bool:
"""Check whether an edge from *source* to *target* is a back-edge."""
si = order_idx.get(source, -1)
ti = order_idx.get(target, -1)
return ti != -1 and ti <= si
# ------------------------------------------------------------------
# Line rendering (Pass 1)
# ------------------------------------------------------------------
def _render_node_line(self, node_id: str) -> str:
"""Render a single node with status symbol and optional status text."""
graph = self._graph
is_terminal = node_id in (graph.terminal_nodes or [])
is_active = node_id == self.active_node
is_done = node_id in self.execution_path and not is_active
status = self._node_status.get(node_id, "")
if is_active:
sym = "[bold green]●[/bold green]"
elif is_done:
sym = "[dim]✓[/dim]"
elif is_terminal:
sym = "[yellow]■[/yellow]"
else:
sym = "○"
if is_active:
name = f"[bold green]{node_id}[/bold green]"
elif is_done:
name = f"[dim]{node_id}[/dim]"
else:
name = node_id
suffix = f" [italic]{status}[/italic]" if status else ""
return f" {sym} {name}{suffix}"
def _render_edges(self, node_id: str, order_idx: dict[str, int]) -> list[str]:
"""Render forward-edge connectors from *node_id*.
Back-edges are excluded here; they are drawn by the return-channel
overlay in Pass 2.
"""
all_edges = self._graph.get_outgoing_edges(node_id)
if not all_edges:
return []
# Split into forward and back
forward = [e for e in all_edges if not self._is_back_edge(node_id, e.target, order_idx)]
if not forward:
# All edges are back-edges — nothing to render here
return []
if len(forward) == 1:
return ["  │", "  ▼"]
# Fan-out: show branches
lines: list[str] = []
for i, edge in enumerate(forward):
connector = "└" if i == len(forward) - 1 else "├"
cond = ""
if edge.condition.value not in ("always", "on_success"):
cond = f" [dim]({edge.condition.value})[/dim]"
lines.append(f" {connector}──▶ {edge.target}{cond}")
return lines
# ------------------------------------------------------------------
# Return-channel overlay (Pass 2)
# ------------------------------------------------------------------
def _overlay_return_channels(
self,
lines: list[str],
node_line_map: dict[str, int],
back_edges: list[dict],
available_width: int,
) -> list[str]:
"""Overlay right-side return channels onto the line buffer.
Each back-edge gets a vertical channel on the right margin. Channels
are allocated left-to-right by increasing span length so that shorter
(inner) loops are closer to the graph body and longer (outer) loops are
further right.
If the terminal is too narrow to fit even one channel, we fall back to
simple inline ``↺`` annotations instead.
"""
if not back_edges:
return lines
num_channels = len(back_edges)
# Sort by span length ascending → inner loops get nearest channel
sorted_be = sorted(back_edges, key=lambda b: b["source_idx"] - b["target_idx"])
# --- Insert dedicated connector lines for back-edge sources ---
# Each back-edge source gets a blank line inserted after its node
# section (after any forward-edge lines). We process insertions in
# reverse order so that earlier indices remain valid.
all_node_lines_set = set(node_line_map.values())
insertions: list[tuple[int, int]] = [] # (insert_after_line, be_index)
for be_idx, be in enumerate(sorted_be):
source_node_line = node_line_map.get(be["source"])
if source_node_line is None:
continue
# Walk forward to find the last line in this node's section
last_section_line = source_node_line
for li in range(source_node_line + 1, len(lines)):
if li in all_node_lines_set:
break
last_section_line = li
insertions.append((last_section_line, be_idx))
source_line_for_be: dict[int, int] = {}
for insert_after, be_idx in sorted(insertions, reverse=True):
insert_at = insert_after + 1
lines.insert(insert_at, "") # placeholder for connector
source_line_for_be[be_idx] = insert_at
# Shift node_line_map entries that come after the insertion point
for nid in node_line_map:
if node_line_map[nid] > insert_after:
node_line_map[nid] += 1
# Also shift already-assigned source lines
for prev_idx in source_line_for_be:
if prev_idx != be_idx and source_line_for_be[prev_idx] > insert_after:
source_line_for_be[prev_idx] += 1
# Recompute max content width after insertions
max_content_w = max(_plain_len(ln) for ln in lines) if lines else 0
# Check if we have room for channels
channels_total_w = num_channels * _CHANNEL_WIDTH
if max_content_w + channels_total_w + 2 > available_width:
return self._inline_back_edge_fallback(lines, node_line_map, back_edges)
content_pad = max_content_w + 3 # gap between content and first channel
# Build channel info with final line positions
channel_info: list[dict] = []
for ch_idx, be in enumerate(sorted_be):
target_line = node_line_map.get(be["target"])
source_line = source_line_for_be.get(ch_idx)
if target_line is None or source_line is None:
continue
col = content_pad + ch_idx * _CHANNEL_WIDTH
channel_info.append(
{
"target_line": target_line,
"source_line": source_line,
"col": col,
}
)
if not channel_info:
return lines
# Build overlay grid — one row per line, columns for channel area
total_width = content_pad + num_channels * _CHANNEL_WIDTH + 1
overlay_width = total_width - max_content_w
overlays: list[list[str]] = [[" "] * overlay_width for _ in range(len(lines))]
for ci in channel_info:
tl = ci["target_line"]
sl = ci["source_line"]
col_offset = ci["col"] - max_content_w
if col_offset < 0 or col_offset >= overlay_width:
continue
# Target line: ◄──...──┐
if 0 <= tl < len(overlays):
for c in range(col_offset):
if overlays[tl][c] == " ":
overlays[tl][c] = "─"
overlays[tl][col_offset] = "┐"
# Source line: ──...──┘
if 0 <= sl < len(overlays):
for c in range(col_offset):
if overlays[sl][c] == " ":
overlays[sl][c] = "─"
overlays[sl][col_offset] = "┘"
# Vertical lines between target+1 and source-1
for li in range(tl + 1, sl):
if 0 <= li < len(overlays) and overlays[li][col_offset] == " ":
overlays[li][col_offset] = "│"
# Merge overlays into the line strings
result: list[str] = []
for i, line in enumerate(lines):
pw = _plain_len(line)
pad = max_content_w - pw
overlay_chars = overlays[i] if i < len(overlays) else []
overlay_str = "".join(overlay_chars)
overlay_trimmed = overlay_str.rstrip()
if overlay_trimmed:
is_target_line = any(ci["target_line"] == i for ci in channel_info)
if is_target_line:
overlay_trimmed = "◄" + overlay_trimmed[1:]
is_source_line = any(ci["source_line"] == i for ci in channel_info)
if is_source_line and not line.strip():
# Inserted blank line → build └───┘ connector.
# " └" = 3 chars of content prefix, so remaining pad = max_content_w - 3
remaining_pad = max_content_w - 3
full = list(" " * remaining_pad + overlay_trimmed)
# Find the ┘ corner for this source connector
corner_pos = -1
for ci_s in channel_info:
if ci_s["source_line"] == i:
corner_pos = remaining_pad + (ci_s["col"] - max_content_w)
break
# Fill everything up to the corner with ─
if corner_pos >= 0:
for c in range(corner_pos):
if full[c] not in ("│", "┐", "┘"):
full[c] = "─"
connector = "  └" + "".join(full).rstrip()
result.append(f"[dim]{connector}[/dim]")
continue
colored_overlay = f"[dim]{' ' * pad}{overlay_trimmed}[/dim]"
result.append(f"{line}{colored_overlay}")
else:
result.append(line)
return result
def _inline_back_edge_fallback(
self,
lines: list[str],
node_line_map: dict[str, int],
back_edges: list[dict],
) -> list[str]:
"""Fallback: add inline ↺ annotations when terminal is too narrow for channels."""
# Group back-edges by source node
source_to_be: dict[str, list[dict]] = {}
for be in back_edges:
source_to_be.setdefault(be["source"], []).append(be)
result = list(lines)
# Insert annotation lines after each source node's section
offset = 0
all_node_lines = sorted(node_line_map.values())
for source, bes in source_to_be.items():
source_line = node_line_map.get(source)
if source_line is None:
continue
# Find end of source node section
end_line = source_line
for nl in all_node_lines:
if nl > source_line:
end_line = nl - 1
break
else:
end_line = len(lines) - 1
# Insert after last content line of this node's section
insert_at = end_line + offset + 1
for be in bes:
cond = ""
edge = be["edge"]
if edge.condition.value not in ("always", "on_success"):
cond = f" [dim]({edge.condition.value})[/dim]"
annotation = f" [yellow]↺[/yellow] {be['target']}{cond}"
result.insert(insert_at, annotation)
insert_at += 1
offset += 1
return result
# ------------------------------------------------------------------
# Main display
# ------------------------------------------------------------------
def _display_graph(self) -> None:
"""Display the graph as an ASCII DAG with edge connectors and loop channels."""
display = self.query_one("#graph-display", RichLog)
display.clear()
graph = self._graph
display.write(f"[bold cyan]Agent Graph:[/bold cyan] {graph.id}\n")
ordered = self._topo_order()
order_idx = {nid: i for i, nid in enumerate(ordered)}
# --- Pass 1: Build line buffer ---
lines: list[str] = []
node_line_map: dict[str, int] = {}
for node_id in ordered:
node_line_map[node_id] = len(lines)
lines.append(self._render_node_line(node_id))
for edge_line in self._render_edges(node_id, order_idx):
lines.append(edge_line)
# --- Pass 2: Overlay return channels for back-edges ---
back_edges = self._detect_back_edges(ordered)
if back_edges:
# Try to get actual widget width; default to a reasonable value
try:
available_width = self.size.width or 60
except Exception:
available_width = 60
lines = self._overlay_return_channels(lines, node_line_map, back_edges, available_width)
# Write all lines
for line in lines:
display.write(line)
# Execution path footer
if self.execution_path:
display.write("")
display.write(f"[dim]Path:[/dim] {' → '.join(self.execution_path[-5:])}")
# Event sources section
self._render_event_sources(display)
# ------------------------------------------------------------------
# Event sources display
# ------------------------------------------------------------------
def _render_event_sources(self, display: RichLog) -> None:
"""Render event source info (webhooks, timers) below the graph."""
entry_points = self.runtime.get_entry_points()
# Filter to non-manual entry points (webhooks, timers, events)
event_sources = [ep for ep in entry_points if ep.trigger_type not in ("manual",)]
if not event_sources:
return
display.write("")
display.write("[bold cyan]Event Sources[/bold cyan]")
config = self.runtime._config
for ep in event_sources:
if ep.trigger_type == "timer":
cron_expr = ep.trigger_config.get("cron")
interval = ep.trigger_config.get("interval_minutes", "?")
schedule_label = f"cron: {cron_expr}" if cron_expr else f"every {interval} min"
display.write(f" [green]⏱[/green] {ep.name} [dim]→ {ep.entry_node}[/dim]")
# Show schedule + next fire countdown
next_fire = self.runtime._timer_next_fire.get(ep.id)
if next_fire is not None:
remaining = max(0, next_fire - time.monotonic())
hours, rem = divmod(int(remaining), 3600)
mins, secs = divmod(rem, 60)
if hours > 0:
countdown = f"{hours}h {mins:02d}m {secs:02d}s"
else:
countdown = f"{mins}m {secs:02d}s"
display.write(f" [dim]{schedule_label} — next in {countdown}[/dim]")
else:
display.write(f" [dim]{schedule_label}[/dim]")
elif ep.trigger_type in ("event", "webhook"):
display.write(f" [yellow]⚡[/yellow] {ep.name} [dim]→ {ep.entry_node}[/dim]")
# Show webhook endpoint if configured
route = None
for r in config.webhook_routes:
src = r.get("source_id", "")
if src and src in ep.id:
route = r
break
if not route and config.webhook_routes:
# Fall back to first route
route = config.webhook_routes[0]
if route:
host = config.webhook_host
port = config.webhook_port
path = route.get("path", "/webhook")
display.write(f" [dim]{host}:{port}{path}[/dim]")
else:
event_types = ep.trigger_config.get("event_types", [])
if event_types:
display.write(f" [dim]events: {', '.join(event_types)}[/dim]")
# ------------------------------------------------------------------
# Public API (called by app.py)
# ------------------------------------------------------------------
def update_active_node(self, node_id: str) -> None:
"""Update the currently active node."""
self.active_node = node_id
if node_id not in self.execution_path:
self.execution_path.append(node_id)
self._display_graph()
def update_execution(self, event) -> None:
"""Update the displayed node status based on execution lifecycle events."""
if event.type == EventType.EXECUTION_STARTED:
self._node_status.clear()
self.execution_path.clear()
entry_node = event.data.get("entry_node") or (
self._graph.entry_node if self.runtime else None
)
if entry_node:
self.update_active_node(entry_node)
elif event.type == EventType.EXECUTION_COMPLETED:
self.active_node = None
self._node_status.clear()
self._display_graph()
elif event.type == EventType.EXECUTION_FAILED:
error = event.data.get("error", "Unknown error")
if self.active_node:
self._node_status[self.active_node] = f"[red]FAILED: {error}[/red]"
self.active_node = None
self._display_graph()
# -- Event handlers called by app.py _handle_event --
def handle_node_loop_started(self, node_id: str) -> None:
"""A node's event loop has started."""
self._node_status[node_id] = "thinking..."
self.update_active_node(node_id)
def handle_node_loop_iteration(self, node_id: str, iteration: int) -> None:
"""A node advanced to a new loop iteration."""
self._node_status[node_id] = f"step {iteration}"
self._display_graph()
def handle_node_loop_completed(self, node_id: str) -> None:
"""A node's event loop completed."""
self._node_status.pop(node_id, None)
if self.active_node == node_id:
self.active_node = None
self._display_graph()
def handle_tool_call(self, node_id: str, tool_name: str, *, started: bool) -> None:
"""Show tool activity next to the active node."""
if started:
self._node_status[node_id] = f"{tool_name}..."
else:
# Restore to generic thinking status after tool completes
self._node_status[node_id] = "thinking..."
self._display_graph()
def handle_stalled(self, node_id: str, reason: str) -> None:
"""Highlight a stalled node."""
self._node_status[node_id] = f"[red]stalled: {reason}[/red]"
self._display_graph()
def handle_edge_traversed(self, source_node: str, target_node: str) -> None:
"""Highlight an edge being traversed."""
self._node_status[source_node] = f"[dim]→ {target_node}[/dim]"
self._display_graph()
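The ordering and loop detection used by `GraphOverview` can be illustrated without Textual or the runtime. The sketch below assumes the graph is given as a plain adjacency dict whose keys include every node; `bfs_order` and `find_back_edges` are hypothetical names that mirror `_topo_order` and `_detect_back_edges` rather than reproduce them.

```python
from collections import deque
from typing import Dict, List, Tuple


def bfs_order(edges: Dict[str, List[str]], entry: str) -> List[str]:
    """Breadth-first order from the entry node, with unreachable nodes appended."""
    order: List[str] = []
    seen = set()
    queue = deque([entry])
    while queue:
        nid = queue.popleft()
        if nid in seen:
            continue
        seen.add(nid)
        order.append(nid)
        for target in edges.get(nid, []):
            if target not in seen:
                queue.append(target)
    # Orphans: nodes that exist in the graph but are not reachable from entry.
    for nid in edges:
        if nid not in seen:
            seen.add(nid)
            order.append(nid)
    return order


def find_back_edges(edges: Dict[str, List[str]], order: List[str]) -> List[Tuple[str, str]]:
    """An edge is a back-edge when its target is at or before its source in the order."""
    idx = {nid: i for i, nid in enumerate(order)}
    result: List[Tuple[str, str]] = []
    for source in order:
        for target in edges.get(source, []):
            if target in idx and idx[target] <= idx[source]:
                result.append((source, target))
    return result
```

Because the comparison is `<=`, a self-loop counts as a back-edge. The order is a BFS visit order rather than a strict topological sort, so "back-edge" here means "points upward in the rendered list", which is what the return-channel overlay needs.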
-172
@@ -1,172 +0,0 @@
"""
Log formatting utilities and LogPane widget.
The module-level functions (format_event, extract_event_text, format_python_log)
can be used by any widget that needs to render log lines without instantiating LogPane.
"""
import logging
from datetime import datetime
from textual.app import ComposeResult
from textual.containers import Container
from framework.runtime.event_bus import AgentEvent, EventType
from framework.tui.widgets.selectable_rich_log import SelectableRichLog as RichLog
# --- Module-level formatting constants ---
EVENT_FORMAT: dict[EventType, tuple[str, str]] = {
EventType.EXECUTION_STARTED: (">>", "bold cyan"),
EventType.EXECUTION_COMPLETED: ("<<", "bold green"),
EventType.EXECUTION_FAILED: ("!!", "bold red"),
EventType.TOOL_CALL_STARTED: ("->", "yellow"),
EventType.TOOL_CALL_COMPLETED: ("<-", "green"),
EventType.NODE_LOOP_STARTED: ("@@", "cyan"),
EventType.NODE_LOOP_ITERATION: ("..", "dim"),
EventType.NODE_LOOP_COMPLETED: ("@@", "dim"),
EventType.LLM_TURN_COMPLETE: ("", "green"),
EventType.NODE_STALLED: ("!!", "bold yellow"),
EventType.NODE_INPUT_BLOCKED: ("!!", "yellow"),
EventType.GOAL_PROGRESS: ("%%", "blue"),
EventType.GOAL_ACHIEVED: ("**", "bold green"),
EventType.CONSTRAINT_VIOLATION: ("!!", "bold red"),
EventType.STATE_CHANGED: ("~~", "dim"),
EventType.CLIENT_INPUT_REQUESTED: ("??", "magenta"),
}
LOG_LEVEL_COLORS: dict[int, str] = {
logging.DEBUG: "dim",
logging.INFO: "",
logging.WARNING: "yellow",
logging.ERROR: "red",
logging.CRITICAL: "bold red",
}
# --- Module-level formatting functions ---
def extract_event_text(event: AgentEvent) -> str:
"""Extract human-readable text from an event's data dict."""
et = event.type
data = event.data
if et == EventType.EXECUTION_STARTED:
return "Execution started"
elif et == EventType.EXECUTION_COMPLETED:
return "Execution completed"
elif et == EventType.EXECUTION_FAILED:
return f"Execution FAILED: {data.get('error', 'unknown')}"
elif et == EventType.TOOL_CALL_STARTED:
return f"Tool call: {data.get('tool_name', 'unknown')}"
elif et == EventType.TOOL_CALL_COMPLETED:
name = data.get("tool_name", "unknown")
if data.get("is_error"):
preview = str(data.get("result", ""))[:80]
return f"Tool error: {name} - {preview}"
return f"Tool done: {name}"
elif et == EventType.NODE_LOOP_STARTED:
return f"Node started: {event.node_id or 'unknown'}"
elif et == EventType.NODE_LOOP_ITERATION:
return f"{event.node_id or 'unknown'} iteration {data.get('iteration', '?')}"
elif et == EventType.NODE_LOOP_COMPLETED:
return f"Node done: {event.node_id or 'unknown'}"
elif et == EventType.NODE_STALLED:
reason = data.get("reason", "")
node = event.node_id or "unknown"
return f"Node stalled: {node} - {reason}" if reason else f"Node stalled: {node}"
elif et == EventType.NODE_INPUT_BLOCKED:
return f"Node input blocked: {event.node_id or 'unknown'}"
elif et == EventType.GOAL_PROGRESS:
return f"Goal progress: {data.get('progress', '?')}"
elif et == EventType.GOAL_ACHIEVED:
return "Goal achieved"
elif et == EventType.CONSTRAINT_VIOLATION:
return f"Constraint violated: {data.get('description', 'unknown')}"
elif et == EventType.STATE_CHANGED:
return f"State changed: {data.get('key', 'unknown')}"
elif et == EventType.CLIENT_INPUT_REQUESTED:
return "Waiting for user input"
elif et == EventType.LLM_TURN_COMPLETE:
stop = data.get("stop_reason", "?")
model = data.get("model", "?")
inp = data.get("input_tokens", 0)
out = data.get("output_tokens", 0)
return f"{model} → {stop} ({inp}+{out} tokens)"
else:
return f"{et.value}: {data}"
def format_event(event: AgentEvent) -> str:
"""Format an AgentEvent as a Rich markup string with timestamp + symbol."""
ts = event.timestamp.strftime("%H:%M:%S")
symbol, color = EVENT_FORMAT.get(event.type, ("--", "dim"))
text = extract_event_text(event)
return f"[dim]{ts}[/dim] [{color}]{symbol} {text}[/{color}]"
def format_python_log(record: logging.LogRecord) -> str:
"""Format a Python log record as a Rich markup string with timestamp and severity color."""
ts = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
color = LOG_LEVEL_COLORS.get(record.levelno, "")
msg = record.getMessage()
if color:
return f"[dim]{ts}[/dim] [{color}]{record.levelname}[/{color}] {msg}"
else:
return f"[dim]{ts}[/dim] {record.levelname} {msg}"
# --- LogPane widget (kept for backward compatibility) ---
class LogPane(Container):
"""Widget to display logs with reliable rendering."""
DEFAULT_CSS = """
LogPane {
width: 100%;
height: 100%;
}
LogPane > RichLog {
width: 100%;
height: 100%;
background: $surface;
border: none;
scrollbar-background: $panel;
scrollbar-color: $primary;
}
"""
def compose(self) -> ComposeResult:
yield RichLog(id="main-log", highlight=True, markup=True, auto_scroll=False)
def write_event(self, event: AgentEvent) -> None:
"""Format an AgentEvent with timestamp + symbol and write to the log."""
self.write_log(format_event(event))
def write_python_log(self, record: logging.LogRecord) -> None:
"""Format a Python log record with timestamp and severity color."""
self.write_log(format_python_log(record))
def write_log(self, message: str) -> None:
"""Write a log message to the log pane."""
try:
if not self.is_mounted:
return
log = self.query_one("#main-log", RichLog)
if not log.is_mounted:
return
was_at_bottom = log.is_vertical_scroll_end
log.write(message)
if was_at_bottom:
log.scroll_end(animate=False)
except Exception:
pass
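One way to feed Python log records into a pane like this is a `logging.Handler` that formats each record and passes the string to a callback. The sketch below is an assumption about how such wiring could look, not the project's actual integration; `CallbackHandler`, `format_record`, and `LEVEL_COLORS` are hypothetical names, and the formatter is a simplified stand-in for `format_python_log`.

```python
import logging
from datetime import datetime
from typing import Callable

# Simplified stand-in for the module's level-to-color table.
LEVEL_COLORS = {logging.WARNING: "yellow", logging.ERROR: "red"}


def format_record(record: logging.LogRecord) -> str:
    """Render a record as a Rich-markup line: dim timestamp, colored level, message."""
    ts = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
    color = LEVEL_COLORS.get(record.levelno, "")
    level = f"[{color}]{record.levelname}[/{color}]" if color else record.levelname
    return f"[dim]{ts}[/dim] {level} {record.getMessage()}"


class CallbackHandler(logging.Handler):
    """Forward each formatted record to a sink such as a widget's write method."""

    def __init__(self, sink: Callable[[str], None]) -> None:
        super().__init__()
        self._sink = sink

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._sink(format_record(record))
        except Exception:
            # Standard logging convention: never let a handler raise into the caller.
            self.handleError(record)
```

In a Textual app the sink would presumably be something like `log_pane.write_log`; when records originate on other threads, the call would also need to be marshalled onto the UI thread, which this sketch does not cover.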
@@ -1,229 +0,0 @@
"""
SelectableRichLog - RichLog with mouse-driven text selection and clipboard copy.
Drop-in replacement for RichLog. Click-and-drag to select text, which is
visually highlighted. Press Ctrl+C to copy selection to clipboard (handled
by app.py). Press Escape or single-click to clear selection.
"""
from __future__ import annotations
import subprocess
import sys
from rich.segment import Segment as RichSegment
from rich.style import Style
from textual.geometry import Offset
from textual.selection import Selection
from textual.strip import Strip
from textual.widgets import RichLog
# Highlight style for selected text
_HIGHLIGHT_STYLE = Style(bgcolor="blue", color="white")
class SelectableRichLog(RichLog):
"""RichLog with mouse-driven text selection."""
DEFAULT_CSS = """
SelectableRichLog {
pointer: text;
}
"""
def __init__(self, **kwargs) -> None:
super().__init__(**kwargs)
self._sel_anchor: Offset | None = None
self._sel_end: Offset | None = None
self._selecting: bool = False
# -- Internal helpers --
def _apply_highlight(self, strip: Strip) -> Strip:
"""Apply highlight with correct precedence (highlight wins over base style)."""
segments = []
for text, style, control in strip._segments:
if control:
segments.append(RichSegment(text, style, control))
else:
new_style = (style + _HIGHLIGHT_STYLE) if style else _HIGHLIGHT_STYLE
segments.append(RichSegment(text, new_style, control))
return Strip(segments, strip.cell_length)
# -- Selection helpers --
@property
def selection(self) -> Selection | None:
"""Build a Selection from current anchor/end, or None if no selection."""
if self._sel_anchor is None or self._sel_end is None:
return None
if self._sel_anchor == self._sel_end:
return None
return Selection.from_offsets(self._sel_anchor, self._sel_end)
def _mouse_to_content(self, event_x: int, event_y: int) -> Offset:
"""Convert viewport mouse coords to content (line, col) coords."""
scroll_x, scroll_y = self.scroll_offset
return Offset(scroll_x + event_x, scroll_y + event_y)
def clear_selection(self) -> None:
"""Clear any active selection."""
had_selection = self._sel_anchor is not None
self._sel_anchor = None
self._sel_end = None
self._selecting = False
if had_selection:
self.refresh()
# -- Mouse handlers (left button only) --
def on_mouse_down(self, event) -> None:
"""Start selection on left mouse button."""
if event.button != 1:
return
self._sel_anchor = self._mouse_to_content(event.x, event.y)
self._sel_end = self._sel_anchor
self._selecting = True
self.capture_mouse()
self.refresh()
def on_mouse_move(self, event) -> None:
"""Extend selection while dragging."""
if not self._selecting:
return
self._sel_end = self._mouse_to_content(event.x, event.y)
self.refresh()
def on_mouse_up(self, event) -> None:
"""End selection on mouse release."""
if not self._selecting:
return
self._selecting = False
self.release_mouse()
# Single-click (no drag) clears selection
if self._sel_anchor == self._sel_end:
self.clear_selection()
# -- Keyboard handlers --
def on_key(self, event) -> None:
"""Clear selection on Escape."""
if event.key == "escape":
self.clear_selection()
# -- Rendering with highlight --
def render_line(self, y: int) -> Strip:
"""Override to apply selection highlight on top of the base strip."""
strip = super().render_line(y)
sel = self.selection
if sel is None:
return strip
# Determine which content line this viewport row corresponds to
_, scroll_y = self.scroll_offset
content_y = scroll_y + y
span = sel.get_span(content_y)
if span is None:
return strip
start_x, end_x = span
cell_len = strip.cell_length
if cell_len == 0:
return strip
scroll_x, _ = self.scroll_offset
# -1 means "to end of content line" — use viewport end
if end_x == -1:
end_x = cell_len
else:
# Convert content-space x to viewport-space x
end_x = end_x - scroll_x
# Convert content-space x to viewport-space x
start_x = start_x - scroll_x
# Clamp to viewport strip bounds
start_x = max(0, start_x)
end_x = min(end_x, cell_len)
if start_x >= end_x:
return strip
# Divide strip into [before, selected, after] and highlight the middle
parts = strip.divide([start_x, end_x])
if len(parts) < 2:
return strip
highlighted_parts: list[Strip] = []
for i, part in enumerate(parts):
if i == 1:
highlighted_parts.append(self._apply_highlight(part))
else:
highlighted_parts.append(part)
return Strip.join(highlighted_parts)
# -- Text extraction & clipboard --
def get_selected_text(self) -> str | None:
"""Extract the plain text of the current selection, or None."""
sel = self.selection
if sel is None:
return None
# Build full text from all lines
all_text = "\n".join(strip.text for strip in self.lines)
try:
extracted = sel.extract(all_text)
except (IndexError, ValueError):
# Selection coordinates can exceed line count when the virtual
# canvas is larger than the actual content (e.g. after scroll).
return None
return extracted if extracted else None
def copy_selection(self) -> str | None:
"""Copy selected text to system clipboard. Returns text or None."""
text = self.get_selected_text()
if not text:
return None
_copy_to_clipboard(text)
return text
def _copy_to_clipboard(text: str) -> None:
"""Copy text to system clipboard using platform-native tools."""
try:
if sys.platform == "darwin":
# input is passed as bytes, so encoding= must not be set: in text mode
# subprocess expects str input and raises on bytes.
subprocess.run(["pbcopy"], input=text.encode(), check=True, timeout=5)
elif sys.platform == "win32":
subprocess.run(
["clip.exe"],
input=text.encode("utf-16le"),
check=True,
timeout=5,
)
elif sys.platform.startswith("linux"):
try:
subprocess.run(
["xclip", "-selection", "clipboard"],
input=text.encode(),
check=True,
timeout=5,
)
except (subprocess.SubprocessError, FileNotFoundError):
subprocess.run(
["xsel", "--clipboard", "--input"],
input=text.encode(),
check=True,
timeout=5,
)
except (subprocess.SubprocessError, FileNotFoundError):
pass
+5
@@ -38,4 +38,9 @@ export const api = {
body: body ? JSON.stringify(body) : undefined,
}),
delete: <T>(path: string) => request<T>(path, { method: "DELETE" }),
patch: <T>(path: string, body?: unknown) =>
request<T>(path, {
method: "PATCH",
body: body ? JSON.stringify(body) : undefined,
}),
};
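A minimal sketch of how the `body ? JSON.stringify(body) : undefined` guard used by `patch` behaves. The helper names `serializeBody` and `serializeBodyStrict` are hypothetical and not part of the source; the strict variant is only one possible alternative.

```typescript
// Hypothetical helper mirroring the truthiness guard in `api.patch`.
function serializeBody(body?: unknown): string | undefined {
  return body ? JSON.stringify(body) : undefined;
}

// Assumed alternative: skip serialization only when no body was passed at all.
function serializeBodyStrict(body?: unknown): string | undefined {
  return body !== undefined ? JSON.stringify(body) : undefined;
}

const taskPayload = serializeBody({ task: "summarize inbox" });
const zeroLoose = serializeBody(0); // falsy payload is dropped by the guard
const zeroStrict = serializeBodyStrict(0); // strict check keeps it
```

For object payloads such as `{ task }` the two variants agree; they differ only for falsy values like `0`, `""`, or `false`.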
+11 -1
@@ -1,5 +1,5 @@
import { api } from "./client";
import type { GraphTopology, NodeDetail, NodeCriteria, ToolInfo } from "./types";
import type { GraphTopology, NodeDetail, NodeCriteria, ToolInfo, DraftGraph, FlowchartMap } from "./types";
export const graphsApi = {
nodes: (sessionId: string, graphId: string, workerSessionId?: string) =>
@@ -26,4 +26,14 @@ export const graphsApi = {
api.get<{ tools: ToolInfo[] }>(
`/sessions/${sessionId}/graphs/${graphId}/nodes/${nodeId}/tools`,
),
draftGraph: (sessionId: string) =>
api.get<{ draft: DraftGraph | null }>(
`/sessions/${sessionId}/draft-graph`,
),
flowchartMap: (sessionId: string) =>
api.get<FlowchartMap>(
`/sessions/${sessionId}/flowchart-map`,
),
};
+10 -12
@@ -1,11 +1,11 @@
import { api } from "./client";
import type {
AgentEvent,
LiveSession,
LiveSessionDetail,
SessionSummary,
SessionDetail,
Checkpoint,
Message,
EntryPoint,
} from "./types";
@@ -64,12 +64,18 @@ export const sessionsApi = {
`/sessions/${sessionId}/entry-points`,
),
updateTriggerTask: (sessionId: string, triggerId: string, task: string) =>
api.patch<{ trigger_id: string; task: string }>(
`/sessions/${sessionId}/triggers/${triggerId}`,
{ task },
),
graphs: (sessionId: string) =>
api.get<{ graphs: string[] }>(`/sessions/${sessionId}/graphs`),
/** Get queen conversation history for a session (works for cold/post-restart sessions too). */
queenMessages: (sessionId: string) =>
api.get<{ messages: Message[]; session_id: string }>(`/sessions/${sessionId}/queen-messages`),
/** Get persisted eventbus log for a session (works for cold sessions — used for full UI replay). */
eventsHistory: (sessionId: string) =>
api.get<{ events: AgentEvent[]; session_id: string }>(`/sessions/${sessionId}/events/history`),
/** List all queen sessions on disk — live + cold (post-restart). */
history: () =>
@@ -105,12 +111,4 @@ export const sessionsApi = {
api.post<{ execution_id: string }>(
`/sessions/${sessionId}/worker-sessions/${wsId}/checkpoints/${checkpointId}/restore`,
),
messages: (sessionId: string, wsId: string, nodeId?: string) => {
const params = new URLSearchParams({ client_only: "true" });
if (nodeId) params.set("node_id", nodeId);
return api.get<{ messages: Message[] }>(
`/sessions/${sessionId}/worker-sessions/${wsId}/messages?${params}`,
);
},
};
+63 -1
@@ -31,6 +31,8 @@ export interface EntryPoint {
entry_node: string;
trigger_type: string;
trigger_config?: Record<string, unknown>;
/** Worker task string when this trigger fires autonomously. */
task?: string;
/** Seconds until the next timer fire (only present for timer entry points). */
next_fire_in?: number;
}
@@ -41,6 +43,7 @@ export interface DiscoverEntry {
description: string;
category: string;
session_count: number;
run_count: number;
node_count: number;
tool_count: number;
tags: string[];
@@ -191,6 +194,56 @@ export interface GraphTopology {
entry_points?: EntryPoint[];
}
// --- Draft graph types (planning phase) ---
export interface DraftNode {
id: string;
name: string;
description: string;
node_type: string;
tools: string[];
input_keys: string[];
output_keys: string[];
success_criteria: string;
sub_agents: string[];
/** For decision nodes: the yes/no question evaluated during dissolution. */
decision_clause?: string;
flowchart_type: string;
flowchart_shape: string;
flowchart_color: string;
}
export interface DraftEdge {
id: string;
source: string;
target: string;
condition: string;
description: string;
/** Short label shown on the flowchart edge (e.g. "Yes", "No"). */
label?: string;
}
export interface DraftGraph {
agent_name: string;
goal: string;
description: string;
success_criteria: string[];
constraints: string[];
nodes: DraftNode[];
edges: DraftEdge[];
entry_node: string;
terminal_nodes: string[];
flowchart_legend: Record<string, { shape: string; color: string }>;
}
/** Mapping from runtime graph nodes → original flowchart draft nodes. */
export interface FlowchartMap {
/** runtime_node_id → list of original draft node IDs it absorbed. */
map: Record<string, string[]> | null;
/** Original draft graph preserved before planning-node dissolution (decision + subagent). */
original_draft: DraftGraph | null;
}
export interface NodeCriteria {
node_id: string;
success_criteria: string | null;
@@ -261,6 +314,7 @@ export type EventTypeName =
| "tool_call_completed"
| "client_output_delta"
| "client_input_requested"
| "client_input_received"
| "node_internal_output"
| "node_input_blocked"
| "node_stalled"
@@ -276,7 +330,14 @@ export type EventTypeName =
| "worker_loaded"
| "credentials_required"
| "queen_phase_changed"
| "subagent_report";
| "subagent_report"
| "draft_graph_updated"
| "flowchart_map_updated"
| "trigger_available"
| "trigger_activated"
| "trigger_deactivated"
| "trigger_fired"
| "trigger_removed";
export interface AgentEvent {
type: EventTypeName;
@@ -287,4 +348,5 @@ export interface AgentEvent {
timestamp: string;
correlation_id: string | null;
graph_id: string | null;
run_id?: string | null;
}
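One way a consumer might use `FlowchartMap.map` is to invert it, so that a draft flowchart node can be resolved to the runtime node that absorbed it. The sketch below assumes a hypothetical helper `invertFlowchartMap` and a local copy of the map's shape; neither is taken from the source.

```typescript
// Local copy of the `map` field's shape; `original_draft` is omitted here.
type RuntimeToDraft = Record<string, string[]> | null;

// Hypothetical helper: runtime_node_id -> draft IDs becomes draft ID -> runtime_node_id.
function invertFlowchartMap(map: RuntimeToDraft): Record<string, string> {
  const draftToRuntime: Record<string, string> = {};
  if (map === null) return draftToRuntime; // no mapping available yet
  for (const [runtimeId, draftIds] of Object.entries(map)) {
    for (const draftId of draftIds) {
      draftToRuntime[draftId] = runtimeId;
    }
  }
  return draftToRuntime;
}

// Example data is invented for illustration.
const inverted = invertFlowchartMap({
  fetch_data: ["fetch_data", "has_results"],
  report: ["report"],
});
```

If a draft ID were listed under two runtime nodes, the later entry would win in this sketch; the source does not say whether that can happen.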
+228 -84
@@ -1,4 +1,4 @@
import { memo, useMemo, useState, useRef } from "react";
import { memo, useMemo, useState, useRef, useEffect, useCallback } from "react";
import { Play, Pause, Loader2, CheckCircle2 } from "lucide-react";
export type NodeStatus = "running" | "complete" | "pending" | "error" | "looping";
@@ -20,7 +20,7 @@ export interface GraphNode {
edgeLabels?: Record<string, string>;
}
type RunState = "idle" | "deploying" | "running";
export type RunState = "idle" | "deploying" | "running";
interface AgentGraphProps {
nodes: GraphNode[];
@@ -35,7 +35,7 @@ interface AgentGraphProps {
}
// --- Extracted RunButton so hover state survives parent re-renders ---
interface RunButtonProps {
export interface RunButtonProps {
runState: RunState;
disabled: boolean;
onRun: () => void;
@@ -43,7 +43,7 @@ interface RunButtonProps {
btnRef: React.Ref<HTMLButtonElement>;
}
const RunButton = memo(function RunButton({ runState, disabled, onRun, onPause, btnRef }: RunButtonProps) {
export const RunButton = memo(function RunButton({ runState, disabled, onRun, onPause, btnRef }: RunButtonProps) {
const [hovered, setHovered] = useState(false);
const showPause = runState === "running" && hovered;
@@ -89,46 +89,94 @@ const MARGIN_RIGHT = 50; // space for back-edge arcs
const SVG_BASE_W = 320;
const GAP_X = 12;
// Unified amber/gold palette
const statusColors: Record<NodeStatus, { dot: string; bg: string; border: string; glow: string }> = {
running: {
dot: "hsl(45,95%,58%)",
bg: "hsl(45,95%,58%,0.08)",
border: "hsl(45,95%,58%,0.5)",
glow: "hsl(45,95%,58%,0.15)",
},
looping: {
dot: "hsl(38,90%,55%)",
bg: "hsl(38,90%,55%,0.08)",
border: "hsl(38,90%,55%,0.5)",
glow: "hsl(38,90%,55%,0.15)",
},
complete: {
dot: "hsl(43,70%,45%)",
bg: "hsl(43,70%,45%,0.05)",
border: "hsl(43,70%,45%,0.25)",
glow: "none",
},
pending: {
dot: "hsl(35,15%,28%)",
bg: "hsl(35,10%,12%)",
border: "hsl(35,10%,20%)",
glow: "none",
},
error: {
dot: "hsl(0,65%,55%)",
bg: "hsl(0,65%,55%,0.06)",
border: "hsl(0,65%,55%,0.3)",
glow: "hsl(0,65%,55%,0.1)",
},
};
// Read a CSS custom property value (space-separated HSL components)
function cssVar(name: string): string {
return getComputedStyle(document.documentElement).getPropertyValue(name).trim();
}
// Trigger node palette — cool blue-gray, visually distinct from amber execution nodes
const triggerColors = {
bg: "hsl(210,25%,14%)",
border: "hsl(210,30%,30%)",
text: "hsl(210,30%,65%)",
icon: "hsl(210,40%,55%)",
type StatusColorSet = Record<NodeStatus, { dot: string; bg: string; border: string; glow: string }>;
type TriggerColorSet = { bg: string; border: string; text: string; icon: string };
function buildStatusColors(): StatusColorSet {
const running = cssVar("--node-running") || "45 95% 58%";
const looping = cssVar("--node-looping") || "38 90% 55%";
const complete = cssVar("--node-complete") || "43 70% 45%";
const pending = cssVar("--node-pending") || "35 15% 28%";
const pendingBg = cssVar("--node-pending-bg") || "35 10% 12%";
const pendingBorder = cssVar("--node-pending-border") || "35 10% 20%";
const error = cssVar("--node-error") || "0 65% 55%";
return {
running: {
dot: `hsl(${running})`,
bg: `hsl(${running} / 0.08)`,
border: `hsl(${running} / 0.5)`,
glow: `hsl(${running} / 0.15)`,
},
looping: {
dot: `hsl(${looping})`,
bg: `hsl(${looping} / 0.08)`,
border: `hsl(${looping} / 0.5)`,
glow: `hsl(${looping} / 0.15)`,
},
complete: {
dot: `hsl(${complete})`,
bg: `hsl(${complete} / 0.05)`,
border: `hsl(${complete} / 0.25)`,
glow: "none",
},
pending: {
dot: `hsl(${pending})`,
bg: `hsl(${pendingBg})`,
border: `hsl(${pendingBorder})`,
glow: "none",
},
error: {
dot: `hsl(${error})`,
bg: `hsl(${error} / 0.06)`,
border: `hsl(${error} / 0.3)`,
glow: `hsl(${error} / 0.1)`,
},
};
}
function buildTriggerColors(): TriggerColorSet {
const bg = cssVar("--trigger-bg") || "210 25% 14%";
const border = cssVar("--trigger-border") || "210 30% 30%";
const text = cssVar("--trigger-text") || "210 30% 65%";
const icon = cssVar("--trigger-icon") || "210 40% 55%";
return {
bg: `hsl(${bg})`,
border: `hsl(${border})`,
text: `hsl(${text})`,
icon: `hsl(${icon})`,
};
}
/** Hook that reads node/trigger colors from CSS vars and updates on theme changes. */
function useThemeColors() {
const [statusColors, setStatusColors] = useState<StatusColorSet>(buildStatusColors);
const [triggerColors, setTriggerColors] = useState<TriggerColorSet>(buildTriggerColors);
useEffect(() => {
const rebuild = () => {
setStatusColors(buildStatusColors());
setTriggerColors(buildTriggerColors());
};
const obs = new MutationObserver(rebuild);
obs.observe(document.documentElement, { attributes: true, attributeFilter: ["class", "style"] });
return () => obs.disconnect();
}, []);
return { statusColors, triggerColors };
}
// Active trigger — brighter, more saturated blue
const activeTriggerColors = {
bg: "hsl(210,30%,18%)",
border: "hsl(210,50%,50%)",
text: "hsl(210,40%,75%)",
icon: "hsl(210,60%,65%)",
};
const triggerIcons: Record<string, string> = {
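The color builders above compose `hsl()` strings from space-separated components with a hard-coded fallback. A minimal sketch of that pattern, with the DOM read replaced by a hypothetical `VarLookup` function so it runs outside a browser (`hslFromVar` is likewise an illustrative name, not from the source):

```typescript
// Hypothetical stand-in for getComputedStyle(...).getPropertyValue(name).trim().
type VarLookup = (name: string) => string;

// Build an hsl() string from space-separated components, with optional alpha.
function hslFromVar(lookup: VarLookup, name: string, fallback: string, alpha?: number): string {
  const components = lookup(name) || fallback; // empty string means the variable is unset
  return alpha === undefined ? `hsl(${components})` : `hsl(${components} / ${alpha})`;
}

// Invented theme that overrides only --node-running.
const themed: VarLookup = (name) => (name === "--node-running" ? "50 90% 60%" : "");
const runningBorder = hslFromVar(themed, "--node-running", "45 95% 58%", 0.5);
const errorDot = hslFromVar(themed, "--node-error", "0 65% 55%");
```

Passing the lookup in as a parameter is a choice made for this sketch; it keeps the string composition testable without `document`.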
@@ -146,10 +194,96 @@ function truncateLabel(label: string, availablePx: number, fontSize: number): st
return label.slice(0, Math.max(maxChars - 1, 1)) + "\u2026";
}
// ─── Pan & Zoom wrapper ───
function PanZoomSvg({ svgW, svgH, className, children }: { svgW: number; svgH: number; className?: string; children: React.ReactNode }) {
const [zoom, setZoom] = useState(1);
const [pan, setPan] = useState({ x: 0, y: 0 });
const [dragging, setDragging] = useState(false);
const dragStart = useRef({ x: 0, y: 0, panX: 0, panY: 0 });
const MIN_ZOOM = 0.4;
const MAX_ZOOM = 3;
const handleWheel = useCallback((e: React.WheelEvent) => {
e.preventDefault();
const delta = e.deltaY > 0 ? 0.9 : 1.1;
setZoom(z => Math.min(MAX_ZOOM, Math.max(MIN_ZOOM, z * delta)));
}, []);
const handleMouseDown = useCallback((e: React.MouseEvent) => {
if (e.button !== 0) return;
setDragging(true);
dragStart.current = { x: e.clientX, y: e.clientY, panX: pan.x, panY: pan.y };
}, [pan]);
const handleMouseMove = useCallback((e: React.MouseEvent) => {
if (!dragging) return;
setPan({
x: dragStart.current.panX + (e.clientX - dragStart.current.x),
y: dragStart.current.panY + (e.clientY - dragStart.current.y),
});
}, [dragging]);
const handleMouseUp = useCallback(() => setDragging(false), []);
const resetView = useCallback(() => {
setZoom(1);
setPan({ x: 0, y: 0 });
}, []);
return (
<div className="flex-1 relative overflow-hidden px-1 pb-5">
<div
onWheel={handleWheel}
onMouseDown={handleMouseDown}
onMouseMove={handleMouseMove}
onMouseUp={handleMouseUp}
onMouseLeave={handleMouseUp}
className="w-full h-full"
style={{ cursor: dragging ? "grabbing" : "grab" }}
>
<svg
width="100%"
viewBox={`0 0 ${svgW} ${svgH}`}
preserveAspectRatio="xMidYMin meet"
className={`select-none ${className || ""}`}
style={{
fontFamily: "'Inter', system-ui, sans-serif",
transform: `translate(${pan.x}px, ${pan.y}px) scale(${zoom})`,
transformOrigin: "center top",
}}
>
{children}
</svg>
</div>
{/* Zoom controls */}
<div className="absolute bottom-7 right-3 flex items-center gap-1 bg-card/80 backdrop-blur-sm border border-border/40 rounded-lg p-0.5 shadow-sm">
<button
onClick={() => setZoom(z => Math.min(MAX_ZOOM, z * 1.2))}
className="w-6 h-6 flex items-center justify-center rounded text-muted-foreground hover:text-foreground hover:bg-muted/60 transition-colors text-xs font-bold"
aria-label="Zoom in"
>+</button>
<button
onClick={resetView}
className="px-1.5 h-6 flex items-center justify-center rounded text-[10px] font-mono text-muted-foreground hover:text-foreground hover:bg-muted/60 transition-colors"
aria-label="Reset zoom"
>{Math.round(zoom * 100)}%</button>
<button
onClick={() => setZoom(z => Math.max(MIN_ZOOM, z * 0.8))}
className="w-6 h-6 flex items-center justify-center rounded text-muted-foreground hover:text-foreground hover:bg-muted/60 transition-colors text-xs font-bold"
aria-label="Zoom out"
>{"\u2212"}</button>
</div>
</div>
);
}
export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, onPause, version, runState: externalRunState, building, queenPhase }: AgentGraphProps) {
const [localRunState, setLocalRunState] = useState<RunState>("idle");
const runState = externalRunState ?? localRunState;
const runBtnRef = useRef<HTMLButtonElement>(null);
const { statusColors, triggerColors } = useThemeColors();
const handleRun = () => {
if (runState !== "idle") return;
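The wheel handler in `PanZoomSvg` multiplies the zoom by a fixed factor and clamps the result. The sketch below isolates that arithmetic in a hypothetical pure function `nextZoom`, reusing the `MIN_ZOOM` and `MAX_ZOOM` values from the diff.

```typescript
const MIN_ZOOM = 0.4;
const MAX_ZOOM = 3;

// Hypothetical pure version of the wheel update: scroll down shrinks, anything else grows.
function nextZoom(current: number, deltaY: number): number {
  const factor = deltaY > 0 ? 0.9 : 1.1;
  return Math.min(MAX_ZOOM, Math.max(MIN_ZOOM, current * factor));
}

const zoomedOut = nextZoom(0.42, 100); // 0.378 falls below MIN_ZOOM, so it is clamped
const zoomedIn = nextZoom(2.9, -100); // 3.19 exceeds MAX_ZOOM, so it is clamped
const midRange = nextZoom(1, 100); // inside the range, no clamping
```

Note that a `deltaY` of exactly `0` takes the zoom-in branch, as in the original expression.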
@@ -344,18 +478,21 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
let d: string;
if (skipsLayers && hasCollision(fromLayer, toLayer, from.x, to.x)) {
// Route around intermediate nodes: curve to the left
// Route around intermediate nodes: orthogonal detour to the left
const detourX = Math.min(from.x, to.x) - nodeW * 0.4;
d = `M ${startX} ${y1} C ${startX} ${y1 + 20}, ${detourX} ${y1 + 20}, ${detourX} ${midY} S ${toCenterX} ${y2 - 20} ${toCenterX} ${y2}`;
d = `M ${startX} ${y1} L ${startX} ${midY} L ${detourX} ${midY} L ${detourX} ${y2 - 10} L ${toCenterX} ${y2 - 10} L ${toCenterX} ${y2}`;
} else if (Math.abs(startX - toCenterX) < 2) {
// Straight vertical line when aligned
d = `M ${startX} ${y1} L ${toCenterX} ${y2}`;
} else {
// Standard bezier: from source bottom to target top
d = `M ${startX} ${y1} C ${startX} ${midY}, ${toCenterX} ${midY}, ${toCenterX} ${y2}`;
// Orthogonal: down, across, down
d = `M ${startX} ${y1} L ${startX} ${midY} L ${toCenterX} ${midY} L ${toCenterX} ${y2}`;
}
const fromNode = nodes[edge.fromIdx];
const isActive = fromNode.status === "complete" || fromNode.status === "running" || fromNode.status === "looping";
const strokeColor = isActive ? "hsl(43,70%,45%,0.35)" : "hsl(35,10%,20%)";
const arrowColor = isActive ? "hsl(43,70%,45%,0.5)" : "hsl(35,10%,22%)";
const strokeColor = isActive ? statusColors.complete.border : statusColors.pending.border;
const arrowColor = isActive ? statusColors.complete.dot : statusColors.pending.border;
return (
<g key={`fwd-${i}`}>
@@ -368,7 +505,7 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
<text
x={(startX + toCenterX) / 2 + 8}
y={midY - 2}
fill="hsl(35,15%,40%)"
fill={statusColors.pending.dot}
fontSize={9}
fontStyle="italic"
>
@@ -394,9 +531,9 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
const fromNode = nodes[edge.fromIdx];
const isActive = fromNode.status === "complete" || fromNode.status === "running" || fromNode.status === "looping";
const color = isActive ? "hsl(38,80%,50%,0.3)" : "hsl(35,10%,20%)";
const color = isActive ? statusColors.looping.border : statusColors.pending.border;
// Bezier curve with rounded corners
// Bezier curve with rounded corners (kept as curves for back edges)
const path = `M ${startX} ${startY} C ${startX + r} ${startY}, ${curveX} ${startY}, ${curveX} ${startY - r} L ${curveX} ${endY + r} C ${curveX} ${endY}, ${endX + r} ${endY}, ${endX + 6} ${endY}`;
return (
@@ -404,7 +541,7 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
<path d={path} fill="none" stroke={color} strokeWidth={1.5} strokeDasharray="4 3" />
<polygon
points={`${endX + 6},${endY - 3} ${endX + 6},${endY + 3} ${endX},${endY}`}
fill={isActive ? "hsl(38,80%,50%,0.45)" : "hsl(35,10%,22%)"}
fill={isActive ? statusColors.looping.dot : statusColors.pending.border}
/>
</g>
);
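The forward-edge routing earlier in this file's diff replaces bezier curves with orthogonal segments (down, across, down). A rough sketch of the non-detour case follows; `orthogonalPath` is a hypothetical name, and computing `midY` as the vertical midpoint is an assumption, since its definition is not shown in the diff.

```typescript
// Hypothetical helper producing an SVG path string for a forward edge.
function orthogonalPath(startX: number, y1: number, toCenterX: number, y2: number): string {
  if (Math.abs(startX - toCenterX) < 2) {
    // Nearly aligned: a single straight vertical segment
    return `M ${startX} ${y1} L ${toCenterX} ${y2}`;
  }
  const midY = (y1 + y2) / 2; // assumption: the diff does not show how midY is derived
  return `M ${startX} ${y1} L ${startX} ${midY} L ${toCenterX} ${midY} L ${toCenterX} ${y2}`;
}

const aligned = orthogonalPath(100, 40, 100, 120);
const offset = orthogonalPath(100, 40, 200, 120);
```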
@@ -417,10 +554,12 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
const triggerAvailW = nodeW - 38;
const triggerDisplayLabel = truncateLabel(node.label, triggerAvailW, triggerFontSize);
const nextFireIn = node.triggerConfig?.next_fire_in as number | undefined;
const isActive = node.status === "running" || node.status === "complete";
const colors = isActive ? activeTriggerColors : triggerColors;
// Format countdown for display below node
let countdownLabel: string | null = null;
if (nextFireIn != null && nextFireIn > 0) {
if (isActive && nextFireIn != null && nextFireIn > 0) {
const h = Math.floor(nextFireIn / 3600);
const m = Math.floor((nextFireIn % 3600) / 60);
const s = Math.floor(nextFireIn % 60);
@@ -429,24 +568,28 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
: `next in ${m}m ${String(s).padStart(2, "0")}s`;
}
// Status label below countdown
const statusLabel = isActive ? "active" : "inactive";
const statusColor = isActive ? "hsl(140,40%,50%)" : "hsl(210,20%,40%)";
return (
<g key={node.id} onClick={() => onNodeClick?.(node)} style={{ cursor: onNodeClick ? "pointer" : "default" }}>
<title>{node.label}</title>
{/* Pill-shaped background with dashed border */}
{/* Pill-shaped background — solid border when active, dashed when inactive */}
<rect
x={pos.x} y={pos.y}
width={nodeW} height={NODE_H}
rx={NODE_H / 2}
fill={triggerColors.bg}
stroke={triggerColors.border}
strokeWidth={1}
strokeDasharray="4 2"
fill={colors.bg}
stroke={colors.border}
strokeWidth={isActive ? 1.5 : 1}
strokeDasharray={isActive ? undefined : "4 2"}
/>
{/* Trigger type icon */}
<text
x={pos.x + 18} y={pos.y + NODE_H / 2}
fill={triggerColors.icon} fontSize={13}
fill={colors.icon} fontSize={13}
textAnchor="middle" dominantBaseline="middle"
>
{icon}
@@ -455,7 +598,7 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
{/* Label */}
<text
x={pos.x + 32} y={pos.y + NODE_H / 2}
fill={triggerColors.text}
fill={colors.text}
fontSize={triggerFontSize}
fontWeight={500}
dominantBaseline="middle"
@@ -468,12 +611,21 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
{countdownLabel && (
<text
x={pos.x + nodeW / 2} y={pos.y + NODE_H + 13}
fill="hsl(210,30%,50%)" fontSize={9.5}
fill={triggerColors.text} fontSize={9.5}
textAnchor="middle" fontStyle="italic" opacity={0.7}
>
{countdownLabel}
</text>
)}
{/* Status label */}
<text
x={pos.x + nodeW / 2} y={pos.y + NODE_H + (countdownLabel ? 25 : 13)}
fill={statusColor} fontSize={9}
textAnchor="middle" opacity={0.8}
>
{statusLabel}
</text>
</g>
);
};
@@ -543,7 +695,7 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
{/* Label -- truncated with ellipsis for narrow nodes */}
<text
x={pos.x + 32} y={pos.y + NODE_H / 2}
fill={isActive ? "hsl(45,90%,85%)" : isDone ? "hsl(40,20%,75%)" : "hsl(35,10%,45%)"}
fill={isActive ? statusColors.running.dot : isDone ? statusColors.complete.dot : statusColors.pending.dot}
fontSize={fontSize}
fontWeight={isActive ? 600 : isDone ? 500 : 400}
dominantBaseline="middle"
@@ -556,7 +708,7 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
{node.statusLabel && isActive && (
<text
x={pos.x + nodeW + 10} y={pos.y + NODE_H / 2}
fill="hsl(45,80%,60%)" fontSize={10.5} fontStyle="italic"
fill={statusColors.running.dot} fontSize={10.5} fontStyle="italic"
dominantBaseline="middle" opacity={0.8}
>
{node.statusLabel}
@@ -600,27 +752,19 @@ export default function AgentGraph({ nodes, title: _title, onNodeClick, onRun, o
</div>
{/* Graph */}
<div className="flex-1 overflow-y-auto overflow-x-hidden px-3 pb-5 relative">
<svg
width={svgWidth}
height={svgHeight}
viewBox={`0 0 ${svgWidth} ${svgHeight}`}
className={`select-none${building ? " opacity-30" : ""}`}
style={{ fontFamily: "'Inter', system-ui, sans-serif" }}
>
{forwardEdges.map((e, i) => renderForwardEdge(e, i))}
{backEdges.map((e, i) => renderBackEdge(e, i))}
{nodes.map((n, i) => renderNode(n, i))}
</svg>
{building && (
<div className="absolute inset-0 flex items-center justify-center">
<div className="flex flex-col items-center gap-3">
<Loader2 className="w-6 h-6 animate-spin text-primary/60" />
<p className="text-xs text-muted-foreground/80">Rebuilding agent...</p>
</div>
<PanZoomSvg svgW={svgWidth} svgH={svgHeight} className={building ? "opacity-30" : ""}>
{forwardEdges.map((e, i) => renderForwardEdge(e, i))}
{backEdges.map((e, i) => renderBackEdge(e, i))}
{nodes.map((n, i) => renderNode(n, i))}
</PanZoomSvg>
{building && (
<div className="absolute inset-0 flex items-center justify-center">
<div className="flex flex-col items-center gap-3">
<Loader2 className="w-6 h-6 animate-spin text-primary/60" />
<p className="text-xs text-muted-foreground/80">Rebuilding agent...</p>
</div>
)}
</div>
</div>
)}
</div>
);
}
+36 -11
@@ -2,6 +2,7 @@ import { memo, useState, useRef, useEffect } from "react";
import { Send, Square, Crown, Cpu, Check, Loader2 } from "lucide-react";
import MarkdownContent from "@/components/MarkdownContent";
import QuestionWidget from "@/components/QuestionWidget";
import MultiQuestionWidget from "@/components/MultiQuestionWidget";
export interface ChatMessage {
id: string;
@@ -9,12 +10,14 @@ export interface ChatMessage {
agentColor: string;
content: string;
timestamp: string;
type?: "system" | "agent" | "user" | "tool_status" | "worker_input_request";
type?: "system" | "agent" | "user" | "tool_status" | "worker_input_request" | "run_divider";
role?: "queen" | "worker";
/** Which worker thread this message belongs to (worker agent name) */
thread?: string;
/** Epoch ms when this message was first created — used for ordering queen/worker interleaving */
createdAt?: number;
/** Queen phase active when this message was created */
phase?: "planning" | "building" | "staging" | "running";
}
interface ChatPanelProps {
@@ -34,8 +37,12 @@ interface ChatPanelProps {
pendingQuestion?: string | null;
/** Options for the pending question */
pendingOptions?: string[] | null;
/** Multiple questions from ask_user_multiple */
pendingQuestions?: { id: string; prompt: string; options?: string[] }[] | null;
/** Called when user submits an answer to the pending question */
onQuestionSubmit?: (answer: string, isOther: boolean) => void;
/** Called when user submits answers to multiple questions */
onMultiQuestionSubmit?: (answers: Record<string, string>) => void;
/** Called when user dismisses the pending question without answering */
onQuestionDismiss?: () => void;
/** Queen operating phase — shown as a tag on queen messages */
@@ -149,6 +156,18 @@ const MessageBubble = memo(function MessageBubble({ msg, queenPhase }: { msg: Ch
const isQueen = msg.role === "queen";
const color = getColor(msg.agent, msg.role);
if (msg.type === "run_divider") {
return (
<div className="flex items-center gap-3 py-2 my-1">
<div className="flex-1 h-px bg-border/60" />
<span className="text-[10px] text-muted-foreground font-medium uppercase tracking-wider">
{msg.content}
</span>
<div className="flex-1 h-px bg-border/60" />
</div>
);
}
if (msg.type === "system") {
return (
<div className="flex justify-center py-1">
@@ -200,13 +219,13 @@ const MessageBubble = memo(function MessageBubble({ msg, queenPhase }: { msg: Ch
}`}
>
{isQueen
? queenPhase === "running"
? "running phase"
: queenPhase === "staging"
? "staging phase"
: queenPhase === "planning"
? "planning phase"
: "building phase"
? ((msg.phase ?? queenPhase) === "running"
? "running"
: (msg.phase ?? queenPhase) === "staging"
? "staging"
: (msg.phase ?? queenPhase) === "planning"
? "planning"
: "building")
: "Worker"}
</span>
</div>
@@ -220,9 +239,9 @@ const MessageBubble = memo(function MessageBubble({ msg, queenPhase }: { msg: Ch
</div>
</div>
);
}, (prev, next) => prev.msg.id === next.msg.id && prev.msg.content === next.msg.content && prev.queenPhase === next.queenPhase);
}, (prev, next) => prev.msg.id === next.msg.id && prev.msg.content === next.msg.content && prev.msg.phase === next.msg.phase && prev.queenPhase === next.queenPhase);
export default function ChatPanel({ messages, onSend, isWaiting, isWorkerWaiting, isBusy, activeThread, disabled, onCancel, pendingQuestion, pendingOptions, onQuestionSubmit, onQuestionDismiss, queenPhase }: ChatPanelProps) {
export default function ChatPanel({ messages, onSend, isWaiting, isWorkerWaiting, isBusy, activeThread, disabled, onCancel, pendingQuestion, pendingOptions, pendingQuestions, onQuestionSubmit, onMultiQuestionSubmit, onQuestionDismiss, queenPhase }: ChatPanelProps) {
const [input, setInput] = useState("");
const [readMap, setReadMap] = useState<Record<string, number>>({});
const bottomRef = useRef<HTMLDivElement>(null);
@@ -332,7 +351,13 @@ export default function ChatPanel({ messages, onSend, isWaiting, isWorkerWaiting
</div>
{/* Input area — question widget replaces textarea when a question is pending */}
{pendingQuestion && pendingOptions && onQuestionSubmit ? (
{pendingQuestions && pendingQuestions.length >= 2 && onMultiQuestionSubmit ? (
<MultiQuestionWidget
questions={pendingQuestions}
onSubmit={onMultiQuestionSubmit}
onDismiss={onQuestionDismiss}
/>
) : pendingQuestion && pendingOptions && onQuestionSubmit ? (
<QuestionWidget
question={pendingQuestion}
options={pendingOptions}
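The phase tag on queen messages now prefers the phase stored on the message and falls back to the live queen phase. A minimal sketch of that resolution, assuming a hypothetical helper `phaseLabel` and assuming `queenPhase` uses the same four-value union as `ChatMessage.phase`:

```typescript
type Phase = "planning" | "building" | "staging" | "running";

// Hypothetical helper: message phase wins, then the live phase, then "building".
function phaseLabel(msgPhase: Phase | undefined, queenPhase: Phase | undefined): Phase {
  const phase = msgPhase ?? queenPhase;
  return phase === "running" || phase === "staging" || phase === "planning" ? phase : "building";
}

const stamped = phaseLabel("planning", "running");
const live = phaseLabel(undefined, "staging");
const fallback = phaseLabel(undefined, undefined);
```

With this ordering, older messages keep the label of the phase they were written in even after the queen moves on.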
File diff suppressed because it is too large
@@ -0,0 +1,215 @@
import { useState, useRef, useEffect, useCallback } from "react";
import { Send, MessageCircleQuestion, X } from "lucide-react";
export interface QuestionItem {
id: string;
prompt: string;
options?: string[];
}
export interface MultiQuestionWidgetProps {
questions: QuestionItem[];
onSubmit: (answers: Record<string, string>) => void;
onDismiss?: () => void;
}
export default function MultiQuestionWidget({ questions, onSubmit, onDismiss }: MultiQuestionWidgetProps) {
// Per-question state: selected index (null = nothing, options.length = "Other")
const [selections, setSelections] = useState<(number | null)[]>(
() => questions.map(() => null),
);
const [customTexts, setCustomTexts] = useState<string[]>(
() => questions.map(() => ""),
);
const [submitted, setSubmitted] = useState(false);
const containerRef = useRef<HTMLDivElement>(null);
// Scroll the question list to the top once, on mount
useEffect(() => {
containerRef.current?.scrollTo({ top: 0, behavior: "smooth" });
}, []);
const canSubmit = questions.every((q, i) => {
const sel = selections[i];
if (sel === null) return false;
// Mirror the render logic: fewer than two options means a free-text question
const hasOptions = !!q.options && q.options.length >= 2;
const isOther = hasOptions ? sel === q.options!.length : true;
if (isOther && !customTexts[i].trim()) return false;
return true;
});
const handleSubmit = useCallback(() => {
if (!canSubmit || submitted) return;
setSubmitted(true);
const answers: Record<string, string> = {};
for (let i = 0; i < questions.length; i++) {
const q = questions[i];
const sel = selections[i]!;
// Same rule as canSubmit, so a single-option question still submits the typed text
const hasOptions = !!q.options && q.options.length >= 2;
const isOther = hasOptions ? sel === q.options!.length : true;
answers[q.id] = isOther ? customTexts[i].trim() : q.options![sel];
}
onSubmit(answers);
}, [canSubmit, submitted, questions, selections, customTexts, onSubmit]);
// Enter to submit (only when not focused on a text input)
useEffect(() => {
const handleKeyDown = (e: KeyboardEvent) => {
if (submitted) return;
const target = e.target as HTMLElement;
const inInput = target.tagName === "INPUT" || target.tagName === "TEXTAREA";
if (e.key === "Enter" && !e.shiftKey && !inInput) {
e.preventDefault();
handleSubmit();
}
};
window.addEventListener("keydown", handleKeyDown);
return () => window.removeEventListener("keydown", handleKeyDown);
}, [handleSubmit, submitted]);
if (submitted) return null;
const answeredCount = selections.filter((s) => s !== null).length;
return (
<div className="p-4">
<div className="bg-card border border-border rounded-xl shadow-sm overflow-hidden">
{/* Header */}
<div className="px-5 pt-4 pb-2 flex items-center gap-3">
<div className="w-7 h-7 rounded-lg bg-primary/10 border border-primary/20 flex items-center justify-center flex-shrink-0">
<MessageCircleQuestion className="w-3.5 h-3.5 text-primary" />
</div>
<div className="flex-1 min-w-0">
<p className="text-sm font-medium text-foreground">
{questions.length} questions
</p>
<p className="text-[11px] text-muted-foreground">
{answeredCount}/{questions.length} answered
</p>
</div>
{onDismiss && (
<button
onClick={onDismiss}
className="p-1 rounded-md text-muted-foreground hover:text-foreground hover:bg-muted/60 transition-colors flex-shrink-0"
>
<X className="w-4 h-4" />
</button>
)}
</div>
{/* Questions */}
<div
ref={containerRef}
className="px-5 pb-3 space-y-4 max-h-[400px] overflow-y-auto"
>
{questions.map((q, qi) => {
const sel = selections[qi];
const hasOptions = q.options && q.options.length >= 2;
const otherIndex = hasOptions ? q.options!.length : 0;
const isOtherSelected = sel === otherIndex;
return (
<div key={q.id} className="space-y-1.5">
<p className="text-sm font-medium text-foreground">
<span className="text-xs text-muted-foreground mr-1.5">
{qi + 1}.
</span>
{q.prompt}
</p>
{hasOptions ? (
<>
{q.options!.map((opt, oi) => (
<button
key={oi}
onClick={() => {
setSelections((prev) => {
const next = [...prev];
next[qi] = oi;
return next;
});
}}
className={`w-full text-left px-4 py-2 rounded-lg border text-sm transition-colors ${
sel === oi
? "border-primary bg-primary/10 text-foreground"
: "border-border/60 bg-muted/20 text-foreground hover:border-primary/40 hover:bg-muted/40"
}`}
>
{opt}
</button>
))}
<input
type="text"
value={customTexts[qi]}
onFocus={() => {
setSelections((prev) => {
const next = [...prev];
next[qi] = otherIndex;
return next;
});
}}
onChange={(e) => {
setSelections((prev) => {
const next = [...prev];
next[qi] = otherIndex;
return next;
});
setCustomTexts((prev) => {
const next = [...prev];
next[qi] = e.target.value;
return next;
});
}}
placeholder="Type a custom response..."
className={`w-full px-4 py-2 rounded-lg border border-dashed text-sm transition-colors bg-transparent placeholder:text-muted-foreground focus:outline-none ${
isOtherSelected
? "border-primary bg-primary/10 text-foreground"
: "border-border text-muted-foreground hover:border-primary/40"
}`}
/>
</>
) : (
<input
type="text"
value={customTexts[qi]}
onFocus={() => {
setSelections((prev) => {
const next = [...prev];
next[qi] = 0;
return next;
});
}}
onChange={(e) => {
setSelections((prev) => {
const next = [...prev];
next[qi] = 0;
return next;
});
setCustomTexts((prev) => {
const next = [...prev];
next[qi] = e.target.value;
return next;
});
}}
placeholder="Type your answer..."
className="w-full px-4 py-2 rounded-lg border text-sm transition-colors bg-transparent placeholder:text-muted-foreground focus:outline-none border-border text-foreground hover:border-primary/40 focus:border-primary"
/>
)}
</div>
);
})}
</div>
{/* Submit */}
<div className="px-5 pb-4">
<button
onClick={handleSubmit}
disabled={!canSubmit}
className="w-full flex items-center justify-center gap-2 py-2.5 rounded-lg text-sm font-medium bg-primary text-primary-foreground hover:bg-primary/90 disabled:opacity-30 disabled:cursor-not-allowed transition-colors"
>
<Send className="w-3.5 h-3.5" />
Submit All
</button>
</div>
</div>
</div>
);
}
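The submit logic in `MultiQuestionWidget` can be read as a pure function from per-question state to an answers record. The sketch below is one way to express it outside React; `collectAnswers` is a hypothetical name, and returning `null` for incomplete input is a choice made for this example rather than the component's behavior.

```typescript
interface QuestionItem {
  id: string;
  prompt: string;
  options?: string[];
}

// Hypothetical pure version of the submit step. Returns null while any answer is missing.
function collectAnswers(
  questions: QuestionItem[],
  selections: (number | null)[],
  customTexts: string[],
): Record<string, string> | null {
  const answers: Record<string, string> = {};
  for (let i = 0; i < questions.length; i++) {
    const q = questions[i];
    const sel = selections[i];
    if (sel === null || sel === undefined) return null;
    // Choices are offered only when there are at least two options
    const options = q.options && q.options.length >= 2 ? q.options : null;
    let answer: string;
    if (options !== null && sel < options.length) {
      answer = options[sel]; // a listed option was picked
    } else {
      answer = (customTexts[i] ?? "").trim(); // "Other" or free-text question
    }
    if (!answer) return null;
    answers[q.id] = answer;
  }
  return answers;
}

// Example questions are invented for illustration.
const sampleQuestions: QuestionItem[] = [
  { id: "q1", prompt: "Environment?", options: ["dev", "prod"] },
  { id: "q2", prompt: "Agent name?" },
];
const picked = collectAnswers(sampleQuestions, [1, 0], ["", " hive "]);
const custom = collectAnswers(sampleQuestions, [2, 0], ["staging", "x"]);
const incomplete = collectAnswers(sampleQuestions, [null, 0], ["", "x"]);
```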
@@ -299,13 +299,13 @@ function SubagentsTab({ subAgentIds, allNodeSpecs, subagentReports }: { subAgent
);
}
type Tab = "overview" | "tools" | "logs" | "prompt" | "subagents";
type Tab = "overview" | "breakdown" | "tools" | "logs" | "subagents";
const tabs: { id: Tab; label: string; Icon: React.FC<{ className?: string }> }[] = [
{ id: "overview", label: "Overview", Icon: ({ className }) => <GitBranch className={className} /> },
{ id: "breakdown", label: "Breakdown", Icon: ({ className }) => <BookOpen className={className} /> },
{ id: "tools", label: "Tools", Icon: ({ className }) => <Wrench className={className} /> },
{ id: "logs", label: "Logs", Icon: ({ className }) => <Terminal className={className} /> },
{ id: "prompt", label: "Prompt", Icon: ({ className }) => <BookOpen className={className} /> },
{ id: "subagents", label: "Subagents", Icon: ({ className }) => <Bot className={className} /> },
];
@@ -331,7 +331,7 @@ export default function NodeDetailPanel({ node, nodeSpec, allNodeSpecs, subagent
// Fetch real criteria when Breakdown tab is active and session is loaded
useEffect(() => {
if (activeTab === "overview" && sessionId && graphId && node) {
if (activeTab === "breakdown" && sessionId && graphId && node) {
graphsApi.nodeCriteria(sessionId, graphId, node.id, workerSessionId || undefined)
.then(r => setRealCriteria(r))
.catch(() => setRealCriteria(null));
@@ -410,6 +410,10 @@ export default function NodeDetailPanel({ node, nodeSpec, allNodeSpecs, subagent
{/* Tab content */}
<div className="flex-1 overflow-auto px-4 py-4 flex flex-col gap-3">
{activeTab === "overview" && (
<SystemPromptTab systemPrompt={nodeSpec?.system_prompt} />
)}
{activeTab === "breakdown" && (
<>
<p className="text-[10px] font-medium text-muted-foreground uppercase tracking-wider">Action Plan</p>
{actionPlan ? (
@@ -489,10 +493,6 @@ export default function NodeDetailPanel({ node, nodeSpec, allNodeSpecs, subagent
<LogsTab nodeId={node.id} isActive={isActive} sessionId={sessionId} graphId={graphId} workerSessionId={workerSessionId} nodeLogs={nodeLogs} />
)}
{activeTab === "prompt" && (
<SystemPromptTab systemPrompt={nodeSpec?.system_prompt} />
)}
{activeTab === "subagents" && nodeSpec?.sub_agents && (
<SubagentsTab
subAgentIds={nodeSpec.sub_agents}
+3 -3
@@ -1,8 +1,8 @@
import { useState, useCallback } from "react";
import { useNavigate } from "react-router-dom";
import { Crown, X } from "lucide-react";
import { loadPersistedTabs, savePersistedTabs, TAB_STORAGE_KEY, type PersistedTabState } from "@/lib/tab-persistence";
import { sessionsApi } from "@/api/sessions";
import { loadPersistedTabs, savePersistedTabs, TAB_STORAGE_KEY, type PersistedTabState } from "@/lib/tab-persistence";
export interface TopBarTab {
agentType: string;
@@ -51,10 +51,10 @@ export default function TopBar({ tabs: tabsProp, onTabClick, onCloseTab, canClos
onCloseTab(agentType);
return;
}
// Kill the backend session (queen/judge/worker) even outside workspace
// Kill the backend session (queen/worker) even outside workspace
sessionsApi.list()
.then(({ sessions }) => {
const match = sessions.find(s => s.agent_path === agentType);
const match = sessions.find(s => s.agent_path.endsWith(agentType));
if (match) return sessionsApi.stop(match.session_id);
})
.catch(() => {}); // fire-and-forget
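The change from strict equality to `endsWith` suggests that `agent_path` can carry a directory prefix while the tab only knows the short agent name. A minimal sketch of that matching follows, assuming a hypothetical `SessionInfo` shape, a hypothetical helper name `findSessionForTab`, and made-up paths (none of these are taken from the real API):

```typescript
// Hypothetical session shape; only the two fields used in the diff are modelled.
interface SessionInfo {
  session_id: string;
  agent_path: string;
}

// Find the first session whose agent_path ends with the tab's agent type.
function findSessionForTab(
  sessions: SessionInfo[],
  agentType: string,
): SessionInfo | undefined {
  return sessions.find((s) => s.agent_path.endsWith(agentType));
}

// Made-up example data: paths include a directory prefix.
const sessions: SessionInfo[] = [
  { session_id: "s1", agent_path: "exports/inbox-management" },
  { session_id: "s2", agent_path: "exports/research" },
];

// Strict equality misses the session because of the prefix.
const strictMatch = sessions.find((s) => s.agent_path === "research");

// Suffix matching finds it.
const suffixMatch = findSessionForTab(sessions, "research");
```

Suffix matching is looser than equality: in this sketch a path such as `exports/deep-research` would also match `research`. If that matters, one option is to additionally require that the character before the suffix is a path separator.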
+27
@@ -72,6 +72,33 @@
--border: 240 3.7% 15.9%;
--input: 240 3.7% 15.9%;
--ring: 45 93% 47%;
/* Agent graph node status colors */
--node-running: 45 95% 58%;
--node-looping: 38 90% 55%;
--node-complete: 43 70% 45%;
--node-pending: 35 15% 28%;
--node-pending-bg: 35 10% 12%;
--node-pending-border: 35 10% 20%;
--node-error: 0 65% 55%;
/* Agent graph trigger node colors */
--trigger-bg: 210 25% 14%;
--trigger-border: 210 30% 30%;
--trigger-text: 210 30% 65%;
--trigger-icon: 210 40% 55%;
/* Draft graph chrome colors */
--draft-edge: 220 10% 30%;
--draft-edge-arrow: 220 10% 35%;
--draft-edge-label: 220 10% 45%;
--draft-back-edge: 220 10% 25%;
--draft-group-fill: 220 15% 18%;
--draft-group-stroke: 220 10% 40%;
--draft-chrome-text: 220 10% 50%;
--draft-chrome-text-dim: 220 10% 55%;
--draft-node-text: 0 0% 78%;
--draft-node-text-hover: 0 0% 92%;
}
}
+22 -65
@@ -1,60 +1,6 @@
import { describe, it, expect } from "vitest";
import { backendMessageToChatMessage, sseEventToChatMessage, formatAgentDisplayName } from "./chat-helpers";
import type { AgentEvent, Message } from "@/api/types";
// ---------------------------------------------------------------------------
// backendMessageToChatMessage
// ---------------------------------------------------------------------------
describe("backendMessageToChatMessage", () => {
it("converts a user message", () => {
const msg: Message = { seq: 1, role: "user", content: "hello", _node_id: "chat" };
const result = backendMessageToChatMessage(msg, "inbox-management");
expect(result.type).toBe("user");
expect(result.agent).toBe("You");
expect(result.role).toBeUndefined();
expect(result.content).toBe("hello");
expect(result.thread).toBe("inbox-management");
});
it("converts an assistant message with node_id as agent", () => {
const msg: Message = { seq: 2, role: "assistant", content: "hi", _node_id: "intake" };
const result = backendMessageToChatMessage(msg, "inbox-management");
expect(result.agent).toBe("intake");
expect(result.role).toBe("worker");
expect(result.type).toBeUndefined();
});
it("defaults agent to 'Agent' when _node_id is empty", () => {
const msg: Message = { seq: 3, role: "assistant", content: "ok", _node_id: "" };
const result = backendMessageToChatMessage(msg, "inbox-management");
expect(result.agent).toBe("Agent");
});
it("produces deterministic ID from seq", () => {
const msg: Message = { seq: 42, role: "user", content: "test", _node_id: "x" };
const result = backendMessageToChatMessage(msg, "thread");
expect(result.id).toBe("backend-42");
});
it("passes through the thread parameter", () => {
const msg: Message = { seq: 1, role: "user", content: "hi", _node_id: "x" };
const result = backendMessageToChatMessage(msg, "my-thread");
expect(result.thread).toBe("my-thread");
});
it("uses agentDisplayName instead of node_id when provided", () => {
const msg: Message = { seq: 2, role: "assistant", content: "hi", _node_id: "intake" };
const result = backendMessageToChatMessage(msg, "thread", "Competitive Intel Agent");
expect(result.agent).toBe("Competitive Intel Agent");
});
it("still shows 'You' for user messages even when agentDisplayName is provided", () => {
const msg: Message = { seq: 1, role: "user", content: "hello", _node_id: "chat" };
const result = backendMessageToChatMessage(msg, "thread", "My Agent");
expect(result.agent).toBe("You");
});
});
import { sseEventToChatMessage, formatAgentDisplayName } from "./chat-helpers";
import type { AgentEvent } from "@/api/types";
// ---------------------------------------------------------------------------
// sseEventToChatMessage
@@ -261,25 +207,36 @@ describe("sseEventToChatMessage", () => {
expect(result!.id).toMatch(/^stream-t-\d+-chat$/);
});
it("converts client_input_requested with prompt to message", () => {
it("returns null for client_input_requested (handled in workspace.tsx)", () => {
const event = makeEvent({
type: "client_input_requested",
node_id: "chat",
execution_id: "abc",
data: { prompt: "What next?" },
});
const result = sseEventToChatMessage(event, "t");
expect(result).not.toBeNull();
expect(result!.content).toBe("What next?");
expect(result!.role).toBe("worker");
expect(sseEventToChatMessage(event, "t")).toBeNull();
});
it("returns null for client_input_requested without prompt", () => {
it("converts client_input_received to user message", () => {
const event = makeEvent({
type: "client_input_requested",
node_id: "chat",
type: "client_input_received",
node_id: "queen",
execution_id: "abc",
data: { prompt: "" },
data: { content: "do the thing" },
});
const result = sseEventToChatMessage(event, "t");
expect(result).not.toBeNull();
expect(result!.agent).toBe("You");
expect(result!.type).toBe("user");
expect(result!.content).toBe("do the thing");
});
it("returns null for client_input_received with empty content", () => {
const event = makeEvent({
type: "client_input_received",
node_id: "queen",
execution_id: "abc",
data: { content: "" },
});
expect(sseEventToChatMessage(event, "t")).toBeNull();
});
+41 -29
@@ -1,10 +1,10 @@
/**
* Pure functions for converting backend messages and SSE events into ChatMessage objects.
* Pure functions for converting SSE events into ChatMessage objects.
* No React dependencies — just JSON in, object out.
*/
import type { ChatMessage } from "@/components/ChatPanel";
import type { AgentEvent, Message } from "@/api/types";
import type { AgentEvent } from "@/api/types";
/**
* Derive a human-readable display name from a raw agent identifier.
@@ -27,32 +27,6 @@ export function formatAgentDisplayName(raw: string): string {
.trim();
}
/**
* Convert a backend Message (from sessionsApi.messages()) into a ChatMessage.
* When agentDisplayName is provided, it is used as the sender for all agent
* messages instead of the raw node_id.
*/
export function backendMessageToChatMessage(
msg: Message,
thread: string,
agentDisplayName?: string,
): ChatMessage {
// Use file-mtime created_at (epoch seconds → ms) for cross-conversation
// ordering; fall back to seq for backwards compatibility.
const createdAt = msg.created_at ? msg.created_at * 1000 : msg.seq;
return {
id: `backend-${msg._node_id}-${msg.seq}`,
agent: msg.role === "user" ? "You" : agentDisplayName || msg._node_id || "Agent",
agentColor: "",
content: msg.content,
timestamp: "",
type: msg.role === "user" ? "user" : undefined,
role: msg.role === "user" ? undefined : "worker",
thread,
createdAt,
};
}
/**
* Convert an SSE AgentEvent into a ChatMessage, or null if the event
* doesn't produce a visible chat message.
@@ -101,6 +75,21 @@ export function sseEventToChatMessage(
// create a worker_input_request message and set awaitingInput state.
return null;
case "client_input_received": {
const userContent = (event.data?.content as string) || "";
if (!userContent) return null;
return {
id: `user-input-${event.timestamp}`,
agent: "You",
agentColor: "",
content: userContent,
timestamp: "",
type: "user",
thread,
createdAt,
};
}
case "llm_text_delta": {
const snapshot = (event.data?.snapshot as string) || (event.data?.content as string) || "";
if (!snapshot) return null;
@@ -121,7 +110,8 @@ export function sseEventToChatMessage(
id: `paused-${event.execution_id}`,
agent: "System",
agentColor: "",
content: "Execution paused by user",
content:
(event.data?.reason as string) || "Execution paused",
timestamp: "",
type: "system",
thread,
@@ -147,3 +137,25 @@ export function sseEventToChatMessage(
return null;
}
}
type QueenPhase = "planning" | "building" | "staging" | "running";
const VALID_PHASES = new Set<string>(["planning", "building", "staging", "running"]);
/**
* Scan an array of persisted events and return the last queen phase seen,
* or null if no phase event exists. Reads both `queen_phase_changed` events
* and the per-iteration `phase` metadata on `node_loop_iteration` events.
*/
export function extractLastPhase(events: AgentEvent[]): QueenPhase | null {
let last: QueenPhase | null = null;
for (const evt of events) {
const phase =
evt.type === "queen_phase_changed" ? (evt.data?.phase as string) :
evt.type === "node_loop_iteration" ? (evt.data?.phase as string | undefined) :
undefined;
if (phase && VALID_PHASES.has(phase)) {
last = phase as QueenPhase;
}
}
return last;
}
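A usage sketch for `extractLastPhase`, with the function body copied from the diff so the block runs on its own. `MiniEvent` is a hypothetical stand-in for `AgentEvent` that models only the fields the function reads, and the event list is invented for illustration:

```typescript
type QueenPhase = "planning" | "building" | "staging" | "running";
const VALID_PHASES = new Set<string>(["planning", "building", "staging", "running"]);

// Hypothetical, trimmed-down event shape (not the real AgentEvent type).
interface MiniEvent {
  type: string;
  data?: Record<string, unknown>;
}

// Same logic as in the diff: keep the last valid phase seen on either
// `queen_phase_changed` or `node_loop_iteration` events.
function extractLastPhase(events: MiniEvent[]): QueenPhase | null {
  let last: QueenPhase | null = null;
  for (const evt of events) {
    const phase =
      evt.type === "queen_phase_changed" ? (evt.data?.phase as string) :
      evt.type === "node_loop_iteration" ? (evt.data?.phase as string | undefined) :
      undefined;
    if (phase && VALID_PHASES.has(phase)) {
      last = phase as QueenPhase;
    }
  }
  return last;
}

const replayed: MiniEvent[] = [
  { type: "queen_phase_changed", data: { phase: "planning" } },
  // Ignored: this event type is not inspected for a phase.
  { type: "llm_text_delta", data: { phase: "running" } },
  { type: "node_loop_iteration", data: { phase: "building" } },
  // Ignored: the phase value is not in VALID_PHASES.
  { type: "queen_phase_changed", data: { phase: "bogus" } },
];

const lastPhase = extractLastPhase(replayed); // "building"
const emptyPhase = extractLastPhase([]);      // null
```

Because invalid or unrelated events are skipped rather than resetting the result, a restored session keeps the most recent valid phase even if later events carry unexpected data.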
+1
@@ -51,6 +51,7 @@ export function topologyToGraphNodes(topology: GraphTopology): GraphNode[] {
triggerConfig: {
...ep.trigger_config,
...(ep.next_fire_in != null ? { next_fire_in: ep.next_fire_in } : {}),
...(ep.task ? { task: ep.task } : {}),
},
next: [ep.entry_node],
});
+1 -1
@@ -113,7 +113,7 @@ export default function MyAgents() {
<div className="flex items-center gap-1">
<Activity className="w-3 h-3" />
<span>
{agent.session_count} session{agent.session_count !== 1 ? "s" : ""}
{agent.run_count} run{agent.run_count !== 1 ? "s" : ""}
</span>
</div>
<span>{agent.last_active ? timeAgo(agent.last_active) : "Never run"}</span>
File diff suppressed because it is too large
-2
@@ -11,12 +11,10 @@ dependencies = [
"litellm>=1.81.0",
"mcp>=1.0.0",
"fastmcp>=2.0.0",
"textual>=1.0.0",
"tools",
]
[project.optional-dependencies]
tui = ["textual>=0.75.0"]
webhook = ["aiohttp>=3.9.0"]
server = ["aiohttp>=3.9.0"]
testing = [
-90
@@ -1,90 +0,0 @@
"""Tests for ChatTextArea key handling (Enter submits, Shift+Enter / Ctrl+J insert newlines)."""
import pytest
from textual.app import App, ComposeResult
from framework.tui.widgets.chat_repl import ChatTextArea
class ChatTextAreaApp(App):
"""Minimal app that mounts a ChatTextArea for testing."""
submitted_texts: list[str]
def compose(self) -> ComposeResult:
yield ChatTextArea(id="input")
def on_mount(self) -> None:
self.submitted_texts = []
def on_chat_text_area_submitted(self, message: ChatTextArea.Submitted) -> None:
self.submitted_texts.append(message.text)
@pytest.fixture
def app():
return ChatTextAreaApp()
@pytest.mark.asyncio
async def test_enter_submits_text(app):
"""Pressing Enter should post a Submitted message and clear the widget."""
async with app.run_test() as pilot:
await pilot.press("h", "e", "l", "l", "o")
await pilot.press("enter")
assert app.submitted_texts == ["hello"]
@pytest.mark.asyncio
async def test_enter_on_empty_does_not_submit(app):
"""Pressing Enter with no text should not post a Submitted message."""
async with app.run_test() as pilot:
await pilot.press("enter")
assert app.submitted_texts == []
@pytest.mark.asyncio
async def test_shift_enter_inserts_newline(app):
"""Shift+Enter should insert a newline, not submit."""
async with app.run_test() as pilot:
widget = app.query_one("#input", ChatTextArea)
await pilot.press("a")
await pilot.press("shift+enter")
await pilot.press("b")
assert app.submitted_texts == []
assert "\n" in widget.text
assert widget.text.startswith("a")
assert widget.text.endswith("b")
@pytest.mark.asyncio
async def test_ctrl_j_inserts_newline(app):
"""Ctrl+J should insert a newline (fallback for terminals without Shift+Enter)."""
async with app.run_test() as pilot:
widget = app.query_one("#input", ChatTextArea)
await pilot.press("a")
await pilot.press("ctrl+j")
await pilot.press("b")
assert app.submitted_texts == []
assert "\n" in widget.text
assert widget.text.startswith("a")
assert widget.text.endswith("b")
@pytest.mark.asyncio
async def test_multiline_submit(app):
"""Typing multiline text via Ctrl+J then pressing Enter should submit all lines."""
async with app.run_test() as pilot:
await pilot.press("a")
await pilot.press("ctrl+j")
await pilot.press("b")
await pilot.press("enter")
assert len(app.submitted_texts) == 1
assert app.submitted_texts[0] == "a\nb"
+1 -1
@@ -572,7 +572,7 @@ async def test_event_loop_conversation_compaction():
judge = CountingJudge(retry_count=3)
node = EventLoopNode(
judge=judge,
config=LoopConfig(max_iterations=10, max_history_tokens=200),
config=LoopConfig(max_iterations=10, max_context_tokens=200),
)
result = await node.execute(ctx)
+8 -5
@@ -763,7 +763,7 @@ class TestClientFacingBlocking:
class TestEscalate:
@pytest.mark.asyncio
async def test_escalate_emits_event(self, runtime, node_spec, memory):
"""escalate() should publish ESCALATION_REQUESTED."""
"""escalate() should publish ESCALATION_REQUESTED and block for queen guidance."""
node_spec.output_keys = []
llm = MockStreamingLLM(
scenarios=[
@@ -772,7 +772,6 @@ class TestEscalate:
{
"reason": "tool failure",
"context": "HTTP 401 from upstream",
"wait_for_response": False,
},
tool_use_id="escalate_1",
),
@@ -790,6 +789,12 @@ class TestEscalate:
ctx = build_ctx(runtime, node_spec, memory, llm, stream_id="worker")
node = EventLoopNode(event_bus=bus, config=LoopConfig(max_iterations=5))
async def queen_reply():
await asyncio.sleep(0.05)
await node.inject_event("Acknowledged, proceed.")
task = asyncio.create_task(queen_reply())
async def queen_reply():
await asyncio.sleep(0.05)
await node.inject_event("Acknowledged, proceed.")
@@ -815,7 +820,6 @@ class TestEscalate:
{
"reason": "blocked",
"context": "dependency missing",
"wait_for_response": False,
},
tool_use_id="escalate_1",
),
@@ -856,7 +860,7 @@ class TestEscalate:
@pytest.mark.asyncio
async def test_escalate_waits_for_queen_input_and_skips_judge(self, runtime, node_spec, memory):
"""wait_for_response=true should block for queen input before judge evaluation."""
"""escalate() should block for queen input before judge evaluation."""
node_spec.output_keys = ["result"]
llm = MockStreamingLLM(
scenarios=[
@@ -865,7 +869,6 @@ class TestEscalate:
{
"reason": "need direction",
"context": "conflicting constraints",
"wait_for_response": True,
},
tool_use_id="escalate_1",
),
-13
@@ -40,16 +40,3 @@ class TestMCPDependencies:
from mcp.server import FastMCP
assert FastMCP is not None
class TestMCPPackageExports:
"""Tests for the framework.mcp package exports."""
def test_package_importable(self):
"""Test that framework.mcp package can be imported."""
if not MCP_AVAILABLE:
pytest.skip(MCP_SKIP_REASON)
import framework.mcp
assert framework.mcp is not None

Some files were not shown because too many files have changed in this diff